  Text Analytics - Sentiment, Translations and Beyond using AWS and Azure through Spotfire®


    This article is an original blog post about text analytics - sentiment, translations and beyond - using cloud services through Spotfire.


    Introduction

    Understanding natural language has been a crucial skill for as long as we have been communicating. In human interactions, we are not only listening to and interpreting the words that are spoken, but also analysing the tone, volume, and body language used, and drawing conclusions or suppositions from them.

    What happens, however, when language is communicated in written form? There are no tones, timings, or body language to analyse directly from the person. This has led many forms of written communication to imply an intent or tone, i.e. formal, humorous, accusatory. However, with the explosion in mediums for written communication - blogs, social media apps - and the worldwide way we can now communicate, how can we interpret natural written text? Furthermore, given the sheer volume and number of sources of this written text, we simply cannot cope with the time required to do this manually.

    This is where data science again comes into play. Machine learning and artificial intelligence models can be trained and constantly be learning from text. They can provide various services through these data science models such as:

    • Analysing the sentiment of any text, i.e. how positive or negative the intent of the text is
    • Analysing the text for keywords and phrases used
    • Analysing for references to known entities, i.e. products, names, places etc.
    • Instantly translating between languages

    These can play a major role in analysing text data for companies: assessing sentiment from customer or social media interactions such as Twitter posts, analysing keywords and sentiment in customer emails, or monitoring publications in the media on any topic. They can also remove language barriers by making text instantly available in any language, or drive content to users based upon their likes and interests.

    Watch my Dr. Spotfire video on this topic on YouTube.

    Using Spotfire® with Sentiment Analysis and Beyond

    Previously I have written about using Spotfire® to produce interactive and highly visual tools for image recognition, using the Amazon Web Services (AWS) machine learning service Rekognition. In this blog I want to continue on this theme but expand its usage to perform natural language processing and text analytics as described above. This time I wanted to compare and contrast the experience of using Microsoft's Azure services with Amazon's.

    Again, in this blog we will be using the Spotfire Python data function, as described previously in my blog: https://community.spotfire.com/articles/spotfire/image-recognition-tibco-spotfirer-using-python-and-aws/

     

    Here is a short video of what we are going to build in our blog post today:

     

    Our Example Data

    For this blog I chose to use the Airbnb review data, which you can download for many cities here: http://insideairbnb.com/get-the-data.html . From there I used the Edinburgh dataset, being the most local to me, and downloaded the listings summary data as well as the reviews. Bringing this into Spotfire is incredibly simple, as you just add local data files:


    From there I can use AI recommendations to get an overview of the data easily. For example, I built this dashboard in very few clicks:


    Dashboard built using AI recommendations (shown on left)

    Setting up your Environment for Python and Spotfire

    Follow these summary steps to set up your environment to run Python through Spotfire:

    1. Since this blog was written, Spotfire now includes the ability to run Python natively - here is the FAQ
    2. Install Python locally on your machine, or on the server you run Spotfire from (gotcha - make sure you add Python to your PATH variable!)
    3. Use pip to install any libraries you need:
    • Pandas is the minimum required - pip install pandas
    • Boto3 is the AWS Python library - pip install boto3
    • Azure - pip install azure
    • Azure then needs the individual service library installed, depending on your service: pip install --upgrade azure-cognitiveservices-language-textanalytics
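    Before registering any data functions, it can save time to confirm these packages actually resolve from the Python installation Spotfire uses. Here is a minimal sketch using only the standard library (the helper name is my own):

```python
import importlib.util

def missing_packages(names):
    """Return the packages from `names` that are not importable."""
    missing = []
    for name in names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # a missing parent package (e.g. `azure`) also raises
            missing.append(name)
    return missing

# The packages used by the data functions in this blog
required = ["pandas", "boto3", "azure.cognitiveservices.language.textanalytics"]
print(missing_packages(required))
```

    An empty list means everything the data functions import below is available.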

    In this blog we are using the Comprehend and Translate services from Amazon. For Azure, we used Cognitive Services for both text and translations.

    Building your Text and Sentiment Analytics Spotfire Tool

    Here is the machine learning dashboard I built which calls two services in AWS and Azure covering sentiment, key phrase extraction, language detection and translation (to English):


    Completed dashboard - calling text analytics in the cloud

    Calling any cloud service from Spotfire using the Spotfire Python data function follows the same pattern:


    Flow for calling cloud services from Spotfire

    In Spotfire I register a new Python data function (Tools -> Register Data Function) - one for AWS and another for Azure. You could combine these into one data function, but keeping them separate means you can call them simultaneously, control more precisely when each is called, and manage the code more easily.

    (Note that all the code examples and setup required are explained in more detail in this article: https://community.spotfire.com/s/article/text-analytics-sentiment-analysis-key-phrases-and-translations-spotfire-using-aws-and)

    # Copyright (c) 2017-2019 TIBCO Software Inc. All Rights Reserved.
    
    from Python_Data_Function import *
    # Put package imports here
    # Please make sure you have the correct packages installed in your Python environment
    
    import pandas as pd
    import boto3
    
    comprehend = boto3.client(service_name='comprehend', region_name='eu-west-1')
    
    if __name__ == "__main__":
    
    	## Empty results list
    	results_list = []
    
    	## Empty df to pass back if no results
    	sentiment_results = pd.DataFrame(columns=(idColumnName,'Mixed', 'Negative', 'Positive', 'Sentiment'))
    
    	## Loop text in table - note AWS has a batch mode that may be more efficient to use
    	for index, row in inputTable.iterrows():
    		if not pd.isna(row[idColumnName]):
    			##  run text analytics
    			text_results = comprehend.detect_sentiment(Text=row[textColumnName], LanguageCode='en')
    			text_results['SentimentScore']['Sentiment'] = text_results['Sentiment']
    			text_results['SentimentScore'][idColumnName] = int(row[idColumnName])
    			results_list.append(text_results['SentimentScore'])
    
    	if len(results_list) > 0:
    		sentiment_results = pd.DataFrame.from_dict(results_list, orient='columns')
     

    In this instance I used the AWS CLI installation to authenticate, which means I do not need to expose the credentials in the code. However, you can specify credentials in the code for both AWS and Azure. The inputs for this data function are defined as follows:
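    As the comment in the loop above notes, AWS also offers a batch endpoint (batch_detect_sentiment), which accepts up to 25 documents per call according to the Comprehend documentation. A minimal sketch of the chunking you would need before switching to it (the helper name is my own):

```python
def chunk(items, size=25):
    """Split a list into consecutive chunks of at most `size` items,
    matching the per-call document limit of batch_detect_sentiment."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Each chunk could then be sent in one call, e.g.:
#   comprehend.batch_detect_sentiment(TextList=batch, LanguageCode='en')
```

    The batch response includes an index per document, so the results can still be joined back to the review ids.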


    And the outputs:


    Sending our review data to this function, specifying the review_id column as the idColumnName input and the comments column as the textColumnName input, we get a sentiment table such as this from AWS:


    Here we can see Amazon gives you a score from 0 to 1 for each sentiment class, i.e. mixed, negative, neutral and positive, as well as a final sentiment result.
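    To make the flattening step in the data function concrete, here is roughly the shape of a detect_sentiment response and how the loop turns it into one flat row per review (the score values and the review id below are made up for illustration):

```python
# Illustrative shape of a Comprehend detect_sentiment response (values made up)
text_results = {
    'Sentiment': 'POSITIVE',
    'SentimentScore': {'Positive': 0.95, 'Negative': 0.01,
                       'Neutral': 0.03, 'Mixed': 0.01},
}
row_id = 12345  # stand-in for row[idColumnName]

# Same flattening as in the data function: fold the overall label and the
# review id into the score dict so each review becomes one flat row
flat = dict(text_results['SentimentScore'])
flat['Sentiment'] = text_results['Sentiment']
flat['review_id'] = row_id
print(flat)
```

    A list of such dicts is exactly what pandas.DataFrame.from_dict turns into the sentiment table shown above.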

    Let's compare this to the Azure code and output. Our Azure code is as follows:

    # Copyright (c) 2017-2019 TIBCO Software Inc. All Rights Reserved.
    
    from Python_Data_Function import *
    # Put package imports here
    # Please make sure you have the correct packages installed in your Python environment
    
    #-----------------------------------------------------------------------------
    
    #Libraries
    
    from azure.cognitiveservices.language.textanalytics import TextAnalyticsClient
    from msrest.authentication import CognitiveServicesCredentials
    import pandas as pd
    import numpy as np
    
    #-----------------------------------------------------------------------------
    
    # Azure Text Analytics Endpoint Configuration
    assert keyTextAnalytics
    
    # Set credentials
    credentials_text = CognitiveServicesCredentials(keyTextAnalytics)
    text_analytics = TextAnalyticsClient(endpoint=endpointTextAnalytics, credentials=credentials_text)
    
    #-----------------------------------------------------------------------------
    
    if __name__ == "__main__":
    
    	## Empty results list
    	sentiment_results_list = []
    
    	## Empty df to pass back if no results
    	sentiment_results = pd.DataFrame(columns=('Sentiment', idColumnName, 'Language', 'Sentiment_Category'))
    
    	## Loop text in table
    	for index, row in inputTable.iterrows():
    		if not pd.isna(row[idColumnName]):
    
    			## Convert to the format required by Azure
    			documents = [{"id": row[idColumnName], "text": row[textColumnName]}]
    
    			## Run Azure sentiment analysis
    			text_sentiment_result = text_analytics.sentiment(documents=documents)
    			sent_result_dict = {}
    			sent_result_dict.update({"Sentiment": text_sentiment_result.documents[0].score,
    									 idColumnName: text_sentiment_result.documents[0].id, 
    									 "Language": 'en'})
    
    			## Azure doesn't define sentiment categories so lets define our own
    			conditions = [(sent_result_dict['Sentiment'] >= 0.6),
    						  (sent_result_dict['Sentiment'] > 0.35) & (sent_result_dict['Sentiment'] < 0.6),
    						  (sent_result_dict['Sentiment'] <= 0.35)]
    			choices = ['positive', 'neutral', 'negative']
    
    			## Define sentiment category selected
    			sent_result_dict['Sentiment_Category'] = np.select(conditions, choices, default='')
    			sentiment_results_list.append(sent_result_dict)
    
    	if len(sentiment_results_list) > 0:
    		sentiment_results = pd.DataFrame.from_dict(sentiment_results_list, orient='columns')
     

    Note that this code sets the credentials in the script instead of using CLI authentication as we did with AWS. This is purely to compare the two approaches in terms of code; both Azure and AWS support either method. As I don't want to expose any credentials in the code itself, I have added two extra input parameters to the Azure data function: the Azure key and the Azure service endpoint:


    Running this code on some reviews returns a table from Azure that looks like this:


    Here we do not have a score per sentiment category as we did with AWS. Whereas AWS returns a JSON object with a score per sentiment category plus the overall selected sentiment, Azure simply returns a single score from 0 to 1, and it is up to you to decide how to represent that number: the closer to 1, the more positive the sentiment. In our case the code above maps it into three groups - negative (0 to 0.35), neutral (0.35 to 0.6) and positive (0.6 and above).
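    The same thresholding can be written as a plain function, which is easier to read and test than the np.select call, using the cut-offs above:

```python
def sentiment_category(score):
    """Map Azure's 0-1 sentiment score to a category using the
    thresholds chosen in this blog (0.35 and 0.6)."""
    if score >= 0.6:
        return 'positive'
    if score > 0.35:
        return 'neutral'
    return 'negative'
```

    The boundaries are a judgement call; you may want to tune them for your own data.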

    Data Preparation

    Of course, all good data science tasks involve data preparation to get the best out of the models. Jobs such as data cleansing, transformation and standardisation are commonplace, often followed by feature engineering. Our data science task, however, is a simple one: analyse natural text. The data is already in a form we can send to AWS and Azure, so no transformations are needed in this case. Text can still contain extra content that causes issues for text analytics, or for Python. For instance, you may want to remove stop words or punctuation, stem words, or use lemmatisation; in our case we want to retain that information, as it may be important for sentiment analysis. So we simply clean out extra characters and new lines to prevent issues in the code and in the cloud service calls.

    Spotfire has an extensive expression language, so we can easily achieve this using regular expressions and creating new columns. Below is the calculated column I used in this example:

     Substitute(RXReplace(RXReplace(RXReplace(RXReplace(Substitute([comments],"&amp;","&"),"https\\://.*","","g"),"[\\n\\t\\r]"," ","g"),"[^\\w\\s\\-\\?\\.]*","","g"),"\\?+","?","g"),"?.","?") 
     

    This removes non-alphanumerics, new lines, tabs etc., as well as removing any URLs and standardising ampersands.
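    If you would rather clean the text inside the Python data function instead, the same steps can be sketched with Python's re module. This is an approximate equivalent of the Spotfire expression, not a character-for-character translation:

```python
import re

def clean_comment(text):
    """Approximate Python equivalent of the Spotfire calculated column:
    standardise ampersands, strip URLs, collapse newlines/tabs, and drop
    stray punctuation."""
    text = text.replace('&amp;', '&')          # standardise ampersands
    text = re.sub(r'https?://\S+', '', text)   # remove URLs
    text = re.sub(r'[\n\t\r]', ' ', text)      # newlines/tabs to spaces
    text = re.sub(r'[^\w\s\-\?\.]', '', text)  # drop other punctuation
    text = re.sub(r'\?+', '?', text)           # collapse repeated ?
    return text
```

    Either approach works; doing it as a calculated column keeps the raw comments untouched in the data table.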

    Putting it All Together

    Just using my AWS or Azure data, I can put together a simple sentiment dashboard tool in Spotfire as shown below:


    Simple Sentiment Dashboard

    We use a map to select property reviews to analyse for sentiment: when we select/mark properties on the map, their reviews are sent to AWS or Azure for analysis. We then use KPI charts to summarise the sentiment, and a cross table to show the individual reviews with their sentiment.

    If you are wondering how I have shown satellite imagery on my Spotfire map, then watch this video by Neil Kanungo:

    Expanding to Key phrases, and Translations

    We can of course exploit additional services from AWS and Azure to also extract key phrases and entities, and to translate text between languages. For this we just expand the code we already have to handle these options and call the relevant services. Our end product in Spotfire is a tool such as this:


    This compares Azure and AWS results side by side, as well as extracting keywords and producing translations. Note that I have used property controls in Spotfire, added to a text area, to tell our Python data functions which cloud services to run:


    Giving the user the option of which cloud services to run

    We can then pass these to the Python data function as true/false values to check and handle as appropriate.
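    Inside the data function, those true/false inputs can then gate each service call. Here is a minimal sketch, where the flag names and the stubbed service callables are my own invention rather than the actual inputs used in the dashboard:

```python
def run_selected_services(text, run_sentiment, run_keyphrases, run_translate,
                          services):
    """Call only the cloud services the user switched on in the text area.
    `services` maps a name to a callable, so the real AWS/Azure calls
    (or test stubs) can be plugged in."""
    results = {}
    if run_sentiment:
        results['sentiment'] = services['sentiment'](text)
    if run_keyphrases:
        results['keyphrases'] = services['keyphrases'](text)
    if run_translate:
        results['translation'] = services['translate'](text)
    return results
```

    In the real data functions the callables would wrap calls such as comprehend.detect_sentiment or text_analytics.sentiment, so unchecked services cost nothing.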

    The inputs and outputs are the same as for AWS, with the exception of the credentials and endpoints passed into this script for authentication (see the description of authentication above).

    You can also watch a live and full explanation of how these examples work on our YouTube channel:

    Please feel free to ask any questions on the Spotfire community with a link to this blog.

    License:  TIBCO BSD-Style License

