Twitter Live Feed with Oracle Database as a Service and Business Intelligence Cloud Service

In this post I am going to show how you can very quickly get a Twitter live feed streaming into your Oracle Database as a Service (DBAAS) instance.  Although this example will work on any machine with an Oracle client and for any Oracle Database instance putting it on DBAAS makes it extremely easy to utilize our streaming Twitter data inside of other Oracle Cloud applications such as Business Intelligence Cloud Service (BICS).

Acknowledgments:

  1. Thank you to geniuses at Tweepy for making it so I don’t have to deal with oauth or really anything related to the Twitter API.  I don’t know who you are but I love you! ♥
  2. Twitter so I never have to wonder what Kim or Kanye are thinking.
  3. My homey Przemyslaw Piotrowski (I don’t know him either but his examples sustain my laziness in coding) for writing a super helpful series called The Mastering Oracle+Python Series which I fully intend to finish someday.
  4. Who ever this dude is (https://pythonprogramming.net/twitter-api-streaming-tweets-python-tutorial/) whose example made this easy.

The great thing about DBAAS is I actually have access to the underlying server so I can take advantage of the operating system, and languages like python, as well as the database.  Some might be offended by such interloping but I’m not one of them and as I said above there isn’t any requirement to do this on the database server I’m just taking advantage of a machine that has the cycles to spare.

I will assume that you already have a DBAAS instance up and running and are familiar with how to administer it and are also familiar with using tools like MobaXterm to connect to it.

At the time of this example our Database as a Service is running Oracle Linux 6.7 and 12.0.1.2 of the database with all the bells and whistles.  I needed to install 5 things to make this work:

1. plliblzma to make it so I can get the latest EPEL repository without having to deal with “expletive” Error: xz compression not available message

wget https://kojipkgs.fedoraproject.org//packages/pyliblzma/0.5.3/3.el6/x86_64/pyliblzma-0.5.3-3.el6.x86_64.rpm

yum install pyliblzma-0.5.3-3.el6.x86_64.rpm

2. Latest EPEL repository for this version of linux

wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

yum install epel-release-6-8.noarch.rpm

3. Then I can install python-pip with ease

yum install python-pip

4. And then we can fetch Tweepy with equal ease

pip install tweepy

5. And last but not least Oracle’s python library for interacting with the database

pip install cx_Oracle

Now that we have all the chunks make sure you can import the stuff you need with python and not get any errors.

[oracle@testdrive-01 ~]$ python
Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tweepy
>>> import cx_Oracle
>>>

If you have issue with cx_Oracle you most likely just need to make sure the Oracle environment variables are set via bashrc or however you like to set them.

I created a table in my database to store the JSON document that is the API response

CREATE TABLE "TWITTER_USER"."EAT_MY_TWEET"
( "ID" NUMBER,
"USERNAME" VARCHAR2(500 BYTE),
"TIMEWHYUPUNISHME" TIMESTAMP (6) DEFAULT systimestamp,
"TWEET_JSON" CLOB,
CONSTRAINT "ENSURE_JSON" CHECK (TWEET_JSON is JSON) ENABLE
)

In order to use the Twitter API you will have to register your app with them here https://apps.twitter.com at which point they will give you the keys to make oauth work.

Then all we need to do is write (by which I mean copy paste and modify) a little python code and we are ready to rock:

#import libraries
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import cx_Oracle
import datetime
import json


#connection string for database
conn_str='WHODAT/Ap@ssw0rd@localhost:1521/MyPluggableDatabseServiceName'

#get me a connection
conn =cx_Oracle.connect(conn_str)

#turn on autocommit
conn.autocommit=1

#object for executing sql
c=conn.cursor()

#clob variable
bvar=c.var(cx_Oracle.CLOB)

#twitter api application keys
#consumer key, consumer secret, access token, access secret.
ckey='Dont'
csecret='Tell'
atoken='Anybody'
asecret='The Password'

#listen to the stream
class listener(StreamListener):

#get some
    def on_data(self, data):
         try:

            #barf response insto json object
            all_data = json.loads(data)

            #parse out tweet text, screenname and tweet id
            tweet = all_data["text"]
            if (all_data["user"]["screen_name"]) is not None:
                username = all_data["user"]["screen_name"]
            else:
                username = 'No User'
            tid = all_data["id"]

            #set clob variable to json doc
            bvar.setvalue(0,data)
            try:
                #create sql string with bind for clob var

                sql_str="INSERT INTO EAT_MY_TWEET (ID,USERNAME,TWEET_JSON) Values("+str(tid)+",q'["+username.encode('utf-8').strip()+"]',:EATIT)" 

                #insert into database 
                c.execute(sql_str,[bvar])                  

            except Exception: 
                sys.exc_clear() 

            #watch tweets go by in console 
            print((username,tweet)) 

            #in case you want to print response 
            #print(data) 
            return(True) 
        except Exception: 
                sys.exc_clear() 
def on_error(self, status): 
     print status 
     print(data) 
     print sql_str 

#log in to twitter api 
auth = OAuthHandler(ckey, csecret) 
auth.set_access_token(atoken, asecret)
 
#fire it up 
twitterStream = Stream(auth, listener()) 

#what to search for (not case sensitive) 
#comma separated for words use 
# for hashtag 
#phrases are words separated by spaces (like this comment) 
twitterS.filter(track=["ProveFilterWorks,Oracle,Vlamis,OBIEE,BICS,DVCS,Data Visualization Desktop"])

I named my file stream.py so to execute I just fire it up with nohup –

nohup python -u /home/oracle/stream.py>/home/oracle/stream.out 2>/home/oracle/stream.err &

and if I tail my stream.out file with tail -f stream.out I can watch the tweets go by as they are inserted into the database

now that I am inserting the json document that the twitter api sends back to me into the database

I can use Oracle Database 12c support for json to expose the document in my table as a view using the following SQL

CREATE OR REPLACE FORCE EDITIONABLE VIEW "TWITTER_USER"."LIVE_TWITTER_FEED" ("CST_DATE", "UTC_DATE", "UTC_HOUR", "UTC_MINUTE", "ID", "CREATED_ON", "SCREEN_NAME", "LOCATION", "FOLLOWERS_CNT", "FRIENDS_CNT", "LISTED_CNT", "FAVOURITES_CNT", "STATUSES_CNT", "RETWEET_CNT", "FAVOURITE_CNT", "URL", "PROFILE_IMAGE_URL", "BANNER_IMAGE_URL", "HASHTAGS", "TWEET", "EMT_TIMEWHYUPUNISHME") AS 
  SELECT cast(TO_TIMESTAMP_TZ(REPLACE(upper(b.created_on),'+0000','-00:00'),'DY MON DD HH24:MI:SS TZH:TZM YYYY')  at Time zone 'CST' as date) CST_DATE,
cast(TO_TIMESTAMP_TZ(REPLACE(upper(b.created_on),'+0000','-00:00'),'DY MON DD HH24:MI:SS TZH:TZM YYYY') as date) UTC_DATE,
cast(to_char(cast(TO_TIMESTAMP_TZ(REPLACE(upper(b.created_on),'+0000','-00:00'),'DY MON DD HH24:MI:SS TZH:TZM YYYY') as date),'HH24') as number) UTC_HOUR,
cast(to_char(cast(TO_TIMESTAMP_TZ(REPLACE(upper(b.created_on),'+0000','-00:00'),'DY MON DD HH24:MI:SS TZH:TZM YYYY') as date),'MI') as number) UTC_MINUTE,
a."ID",b."CREATED_ON",a."USERNAME",b."LOCATION",b."FOLLOWERS_CNT",b."FRIENDS_CNT",b."LISTED_CNT",b."FAVOURITES_CNT",b."STATUSES_CNT",
b."RETWEET_CNT",b."FAVOURITE_CNT",b."URL",b."PROFILE_IMAGE_URL",b."BANNER_IMAGE_URL",b."HASHTAGS",b."TWEET",
a.TIMEWHYUPUNISHME
FROM EAT_MY_TWEET a,
json_table(tweet_json,
'$' columns(
    id varchar(50) path '$.id',
    created_on varchar2(100) path '$.created_at',
    screen_name varchar2(200) path '$."user".screen_name',
    location varchar2(250) path '$."user"."location"',
    followers_cnt number path '$."user".followers_count',
    friends_cnt  number path '$."user".friends_count',
    listed_cnt  number path '$."user".listed_count',
    favourites_cnt  number path '$."user".favourites_count',
    statuses_cnt  number path '$."user".statuses_count',
    retweet_cnt number path '$.retweet_count',
    favourite_cnt number path '$.favorite_count',
     url varchar2(250) path '$."user"."url"',
    profile_image_url  varchar2(500) path '$."user".profile_image_url',
    banner_image_url varchar2(500) path '$."user".profile_banner_url',
     hashtags varchar2(500) format json with wrapper path '$.entities.hashtags[*].text',
      tweet varchar2(250) path '$.text'
    -- nested path '$.entities.hashtags[*]' columns (ind_hashtag varchar2(30) path '$.text'    )
   
)
) b

Then I can analyze the real time twitter data I’m interested in using a simple sql statement

SELECT * FROM LIVE_TWITTER_FEED ;

and while being able to hit real time twitter data with SQL is cool the real goal is being able to surface that data inside of our BI tools. Here we combine Answers and Visual Analyzer on a dashboard to show latest tweets and aggregate information over time utilizing Oracle Business Intelligence Cloud Service (BICS)