Custom Python Twitter Search Class

Twitter has massive amounts of unstructured text data that can be accessed through a series of REST APIs the company makes available. This information is a goldmine for researchers and marketers alike who want to extract features from text for use in various algorithms. In this blog post, we will build our own Twitter Search class to search for tweets using Python. By doing so, we will gain a better understanding of Twitter’s API and REST interfaces in general.

After reading this blog post you will be able to:

• Query Twitter’s Search API

Twitter Search API

Importance of Outside Data

I have realized there are companies that rarely look outside their transactional systems for data. Today it is worth gathering more data, because it can be put to better use than at any other time in history. For example, deep neural networks tend to perform better as the training set size and the number of features increase. In general, the more data you have, the better.

Social media provides a vast array of information that can be used. One of the most popular social media platforms is Twitter. Thankfully, Twitter makes it easy to access its tweets, which provide a great amount of information that can be used in various algorithms.

Obtain an API Key

In order to access Twitter’s APIs you must first create an application in Twitter. Go to the following URL and create your application: https://apps.twitter.com/app/new

When completed, open the Keys and Access Tokens tab of your application and copy the Consumer Key (API Key) and Consumer Secret (API Secret), which we will use later in our code.

Create Twitter Application

Twitter Search Class

Let’s now start building our Twitter Search class, which requires three Python libraries. The first is Requests, a very popular library that makes it easy to send HTTP requests and consume REST APIs. The base64 module will be used to encode our API keys into the format expected by Twitter’s API. The last one is the json module.

import requests
import base64
import json

We begin by defining the constructor, which requires the client_key and client_secret values you obtained in the previous steps. It then sets up several values that will be used throughout our class.

• base_url: Twitter’s API endpoint address

• auth_url: endpoint used in our authorization function

• search_url: the Search API endpoint, which returns results as JSON

• key_secret: built from our client_key and client_secret (a local variable, not stored on the class)

• KEY_B64: our Base64-encoded key and secret to be sent to Twitter

• Access_Token: upon authentication, this token is provided by Twitter

• search_headers: used as the header of our Twitter search requests

• cache: a dictionary to store paging metadata from request responses

class TwitterSearch(object):
    
    def __init__(self,client_key, client_secret):
        #Url Parameters
        self.base_url = 'https://api.twitter.com/'
        self.auth_url = '{}oauth2/token'.format(self.base_url)
        self.search_url = '{}1.1/search/tweets.json'.format(self.base_url)
       
        #Key Encode: join key and secret as "key:secret"
        key_secret = '{}:{}'.format(client_key, client_secret).encode('ascii')
        #Base64-encode the pair for the Basic authorization header
        self.KEY_B64 = base64.b64encode(key_secret).decode('ascii')
        
        self.Access_Token = self.Authenticate()
        #search Header
        self.search_headers = {'Authorization': 'Bearer {}'.format(self.Access_Token)}
        #cache for paging metadata (filled in by _updateCache)
        self.cache = {}
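
To see what the key-encoding step in the constructor produces, here is a minimal standalone sketch. The helper name encode_credentials and the sample key and secret are made up for illustration; only the "key:secret" format and the Base64 step come from the constructor itself.

```python
import base64


def encode_credentials(client_key, client_secret):
    """Join key and secret with a colon and Base64-encode the result."""
    key_secret = '{}:{}'.format(client_key, client_secret).encode('ascii')
    return base64.b64encode(key_secret).decode('ascii')


# Made-up credentials, for illustration only.
encoded = encode_credentials('my_key', 'my_secret')
print(encoded)

# Base64 is an encoding, not encryption: decoding recovers the original pair.
decoded = base64.b64decode(encoded).decode('ascii')
print(decoded)
```

Because the encoding is trivially reversible, the encoded value should be protected in the same way as the key and secret themselves.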

Next up is our Authenticate function, which authenticates our application with Twitter. Upon successful authentication, Twitter returns an access_token to be used in our calls to Twitter’s APIs.

Our function creates the auth_headers and auth_data dictionaries and executes a POST to the auth_url specified in our class constructor. The successful response is code 200.

If the response code is 200 we return the access token; otherwise we raise an error. There are various response codes that can be returned, and they are listed in Twitter’s documentation at the URL given in the code comment.

    def Authenticate(self):
        auth_headers = {
                            'Authorization': 'Basic {}'.format(self.KEY_B64),
                            'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8'
                            }

        auth_data = {
                        'grant_type': 'client_credentials'
                    }

        auth_resp = requests.post(self.auth_url, headers=auth_headers, data=auth_data)

        # Check status code okay
        #Find Response Status Codes Here: https://developer.twitter.com/en/docs/basics/response-codes
        if auth_resp.status_code == 200:
            print("Authentication Successful")
            return auth_resp.json()['access_token']
        else:
            raise ValueError("Error Authenticating (HTTP status {})".format(auth_resp.status_code))
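
If you want to be a little stricter about what comes back from the authentication call, the parsed response can be validated before the token is used. The sketch below is one possible approach, not part of the class: the helper name extract_access_token is hypothetical, the sample payload is made up, and it assumes the response body carries token_type and access_token fields as described in Twitter’s application-only authentication documentation.

```python
def extract_access_token(payload):
    """Return the bearer token from a parsed token response, or raise."""
    token_type = payload.get('token_type')
    if token_type != 'bearer':
        raise ValueError('Unexpected token type: {!r}'.format(token_type))
    token = payload.get('access_token')
    if not token:
        raise ValueError('Response did not contain an access token')
    return token


# Made-up payload shaped like a successful token response.
sample = {'token_type': 'bearer', 'access_token': 'AAAA-example-token'}
token = extract_access_token(sample)
print(token)
```

Inside Authenticate, this helper could be called on auth_resp.json() in place of the direct dictionary lookup.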

Now we create a function to cache the items from the results that are needed for paging. The _updateCache function takes a parameter sr, which holds our search results as a dictionary parsed from the JSON response (we will get to this in the Search function).

    def _updateCache(self, sr):
        #cache the results
        self.cache = {"max_id": sr['search_metadata']['max_id'],
                      "since_id": sr['search_metadata']['since_id'],
                      "refresh_url": sr['search_metadata']['refresh_url']
                     }
        
        if 'next_results' in sr['search_metadata']:
            self.cache['next_results']= sr['search_metadata']['next_results']

Now we implement our search functionality and define the required parameters and their defaults. The search string is the parameter q. By default we ask for the most recent tweets and a count of 50 results; the maximum is 100.

We then create a dictionary which contains all our query parameters. Additionally, the function accepts latitude, longitude and radius to perform geo-queries. If these are provided, our function adds them to the search_params dictionary.

Finally, we issue a GET to the search_url and receive the search response, which is parsed from JSON and stored in the sr variable. This variable is passed to the _updateCache function we previously defined.

    #Search API Documentation
    #https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
    def Search(self, q, result_type='recent',count=50, lang="en", lat = None, lon = None, radius=None):

        search_params = {
            'q': q,
            'result_type': result_type,
            'count': count,
            'lang': lang
        }
        
        
        #add the geocode only when latitude, longitude and radius (in miles) are all provided
        if lat is not None and lon is not None and radius is not None:
            search_params['geocode'] = "%f,%f,%dmi" % (lat, lon, radius)

        
        search_resp = requests.get(self.search_url, headers=self.search_headers, params=search_params)             
        sr = search_resp.json()
        
        #cache
        self._updateCache(sr)
        
        #return search results
        return sr
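
To check how the query parameters are assembled without calling Twitter, the parameter-building step can be pulled out into a plain function. This is a sketch under a few assumptions: the function name build_search_params is hypothetical, and capping count at 100 on the client side is an addition of this example (the Search function above sends the value unchanged).

```python
def build_search_params(q, result_type='recent', count=50, lang='en',
                        lat=None, lon=None, radius=None):
    """Build the query parameter dictionary for a search request."""
    params = {
        'q': q,
        'result_type': result_type,
        # Assumption of this sketch: cap at the documented maximum of 100.
        'count': min(count, 100),
        'lang': lang,
    }
    # Geo-queries need all three values; the radius is expressed in miles.
    if lat is not None and lon is not None and radius is not None:
        params['geocode'] = '%f,%f,%dmi' % (lat, lon, radius)
    return params


params = build_search_params('flu', count=200, lat=26.5, lon=-80.25, radius=50)
print(params['geocode'])  # 26.500000,-80.250000,50mi
print(params['count'])    # 100
```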

 

Our last function, used for paging, is called Get_Next_Results. Twitter’s maximum results per request is 100. If there are more results, the search_metadata will contain a next_results key holding the URL query string needed to obtain the next page. Our _updateCache function stores it in the cache when it is available.

The Get_Next_Results function makes a new request to Twitter’s Search API with the corresponding parameters to obtain the next results.

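    def Get_Next_Results(self):
        if 'next_results' in self.cache:
            #next_results is already a query string starting with '?', so append it to the url
            next_url = self.search_url + self.cache['next_results']
            search_resp = requests.get(next_url, headers=self.search_headers)
            sr = search_resp.json()
            
            self._updateCache(sr)
            
            return sr
        #no further pages
        return None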
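
The paging idea can also be written as a loop that keeps following next_results until it disappears. The sketch below is one possible approach rather than part of the class: parse_next_results, iterate_pages, and the max_pages safety limit are hypothetical names introduced here, and the fake pages stand in for real API responses so the example runs without network access.

```python
from urllib.parse import parse_qsl


def parse_next_results(next_results):
    """Turn a next_results string such as '?max_id=1&q=flu' into a dict."""
    return dict(parse_qsl(next_results.lstrip('?')))


def iterate_pages(first_page, fetch_page, max_pages=5):
    """Yield result pages until next_results is absent or max_pages is reached."""
    page = first_page
    pages_seen = 0
    while True:
        yield page
        pages_seen += 1
        next_results = page['search_metadata'].get('next_results')
        if not next_results or pages_seen >= max_pages:
            break
        page = fetch_page(parse_next_results(next_results))


# Fake two-page result set standing in for real API responses.
fake_pages = {
    '2': {'statuses': [{'text': 'second page'}], 'search_metadata': {}},
}
first = {'statuses': [{'text': 'first page'}],
         'search_metadata': {'next_results': '?max_id=2&q=flu'}}

texts = [page['statuses'][0]['text']
         for page in iterate_pages(first, lambda p: fake_pages[p['max_id']])]
print(texts)  # ['first page', 'second page']
```

With the real class, fetch_page could be a small function that issues a GET to search_url with search_headers and the parsed parameters, then returns the parsed JSON.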

Query Twitter with Python

Let’s now use our custom TwitterSearch class to find some tweets about the flu. To start, initialize the class, providing the client key and client secret you obtained when creating your Twitter application.

When the class is instantiated, it calls the Authenticate function and prints a message confirming that authentication was successful.

twittersearch = TwitterSearch(client_key='################', 
                              client_secret='###################################################')

Next, call the Search function as shown below and store the results in our search_results variable. I am also providing some coordinates, but most of these parameters are optional and you can leave them out altogether. Note that although we request 200 results here, Twitter returns at most 100 per request, which is why the metadata below reports a count of 100. The second line obtains the list of tweets from our results and the last line prints the search_metadata of our results.

search_results = twittersearch.Search('flu',count=200, lat=26.1079375, lon=-80.2592334, radius=50)
tweets = search_results['statuses']
print(search_results['search_metadata'])

You should now see search metadata similar to my results below:

{'completed_in': 0.118, 'count': 100, 'max_id': 944302280307822593, 'max_id_str': '944302280307822593', 'next_results': '?max_id=943262111974686726&q=flu&geocode=26.107937%2C-80.259233%2C50mi&lang=en&count=100&include_entities=1&result_type=recent', 'query': 'flu', 'refresh_url': '?since_id=944302280307822593&q=flu&geocode=26.107937%2C-80.259233%2C50mi&lang=en&result_type=recent&include_entities=1', 'since_id': 0, 'since_id_str': '0'}

To view the first 10 tweets you can run the following code:

#slicing avoids an IndexError when fewer than 10 tweets are returned
for tweet in tweets[:10]:
    print(tweet['text'])
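
If you plan to feed the tweets into an algorithm, it can help to reduce each one to the few fields you need. The following is a minimal sketch: the function name summarize_tweets is hypothetical, the sample tweets are made up, and it assumes each tweet dictionary exposes the id_str, created_at, text, and user.screen_name fields of Twitter’s v1.1 tweet object.

```python
def summarize_tweets(tweets, limit=10):
    """Reduce each tweet to a small dictionary of commonly used fields."""
    rows = []
    for tweet in tweets[:limit]:
        rows.append({
            'id': tweet.get('id_str'),
            'created_at': tweet.get('created_at'),
            # .get with defaults keeps the loop going if a field is missing.
            'user': tweet.get('user', {}).get('screen_name'),
            'text': tweet.get('text', ''),
        })
    return rows


# Made-up tweets, for illustration only.
sample_tweets = [
    {'id_str': '1', 'created_at': 'Fri Dec 22 20:00:00 +0000 2017',
     'user': {'screen_name': 'example_user'}, 'text': 'Flu season is here'},
    {'id_str': '2', 'text': 'Got my flu shot today'},
]

rows = summarize_tweets(sample_tweets)
for row in rows:
    print(row['user'], '-', row['text'])
```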

If there are more results, you can call our Get_Next_Results function, which returns the next page as a dictionary with the same structure as the first response.

twittersearch.Get_Next_Results()

Conclusion

We have now built our custom TwitterSearch class, which authenticates and allows you to query Twitter’s Search API. You should be able to extend this class to fit your needs, and hopefully you now have a better understanding of how to consume REST APIs with Python.

 

MJ

Advanced analytics professional currently practicing in the healthcare sector. Passionate about Machine Learning, Operations Research and Programming. Enjoys the outdoors and extreme sports.
