  List and download items from AWS S3 Buckets in Spotfire®


    This article explains two code examples of how to list and download items from AWS S3 buckets in Spotfire®.

    Introduction

    Spotfire® can connect to, upload data to, and download data from Amazon Web Services (AWS) S3 stores using either the built-in Python engine that ships with Spotfire 10.7 and above, your own custom Python installation (again, 10.7 and above only), or the Python Data Function for Spotfire if you are using Spotfire 10.6 or earlier. This is achieved using Amazon's Boto3 Python library. This article will provide and explain two code examples:

    1. Listing items in an S3 bucket
    2. Downloading items from an S3 bucket

    These examples are just two demonstrations of the functionality available through the Boto3 library in Spotfire. The same library can also be used to call other AWS services such as SageMaker and Rekognition, or to connect to other sources such as DynamoDB. Read the blog on doing image recognition in Spotfire using AWS to find out more.
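    Each of those services is reached through the same Boto3 client/resource pattern used throughout this article. As a minimal sketch (assuming your AWS credentials are already configured), the names below are standard Boto3 service identifiers:

    import boto3

    # The same call pattern covers S3 and the other services mentioned above
    s3 = boto3.client('s3')
    rekognition = boto3.client('rekognition')
    dynamodb = boto3.resource('dynamodb')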

    Prerequisites

    • For Spotfire 10.7 and above: no extra Python requirements, as a Python interpreter is included with Spotfire. 
    • For Spotfire 10.6 or earlier: the Python Data Function for Spotfire must be installed on your Spotfire instance or server. Part of this setup is also installing Python itself and some key libraries; see the linked guide for more details. When installing Python, remember to add python to your PATH variable (the Windows installer has an option you can tick to do this automatically for you). 
    • The AWS CLI should be installed and configured from Amazon if you want to use authentication through the CLI. You must have an active AWS account to be able to configure the AWS CLI. See the instructions from AWS for full details. You can also set your authentication programmatically; see details/examples from AWS, and the sketch after this list.
    • Install the Boto3 Python library. This can be done using Tools->Python Tools->Package Management in later versions of Spotfire. For earlier versions, pip is required: see this article. 
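
    For example, here is a minimal sketch of setting credentials programmatically with a Boto3 session instead of relying on the AWS CLI configuration. The key values and region below are placeholders, not working credentials:

    import boto3

    # Placeholders only: substitute your own credentials, or better, use the
    # AWS CLI / environment variables so secrets never live in the script
    session = boto3.session.Session(
        aws_access_key_id='YOUR_ACCESS_KEY_ID',
        aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
        region_name='us-east-1',
    )
    s3 = session.client('s3')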

    Listing items in an S3 bucket

    To connect to AWS we use the Boto3 Python library. To connect to S3, all we need is the name of the bucket. Follow the steps below to implement a Python data function that lists the bucket's contents:

    • Register a new data function (from the Tools->Register Data Function menu) and select Python as your engine (a red error here most likely means Python is not added to your Windows PATH variable).
    • Give your script a name and paste in the following code:
       
    # Copyright (c) 2017-2019 TIBCO Software Inc. All Rights Reserved.
    
    from Python_Data_Function import *
    # Put package imports here
    # Please make sure you have the correct packages installed in your Python environment
    
    import pandas as pd
    import boto3
    
    def iterate_bucket_items(bucket):
        """Generator that iterates over all objects in a given S3 bucket.
    
        See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
        for the return data format.
    
        :param bucket: name of the S3 bucket
        :return: dict of metadata for an object
        """
        client = boto3.client('s3')
        paginator = client.get_paginator('list_objects_v2')
        page_iterator = paginator.paginate(Bucket=bucket)
    
        for page in page_iterator:
            if page['KeyCount'] > 0:
                for item in page['Contents']:
                    yield item
    
    
    if __name__ == "__main__":
    
        ## Create lists to hold the bucket item properties
        item_names = []
        item_sizes = []
    
        ## Check we got a bucket name passed from Spotfire
        if bucketName != "":
            ## Iterate over the bucket items
            for i in iterate_bucket_items(bucket=bucketName):
                item_names.append(i['Key'])
                item_sizes.append(i['Size'])
    
        ## Build a dictionary from the returned data and convert it to a data frame
        S3Dict = {'Name': item_names, 'Size': item_sizes}
        S3Table = pd.DataFrame.from_dict(S3Dict, orient='columns')
     
    • Add an input parameter for the bucket name called bucketName

    [Screenshot: s3-1.png, the bucketName input parameter in the Register Data Function dialog]

    • And finally, add an output of type Table called S3Table

    [Screenshot: s3-2.png, the S3Table output of type Table]

    • You can now click Run to save and run the function; you will be prompted to provide values for the inputs and outputs in your DXP.

    You can further edit this function and its inputs/outputs from the Data->Data function properties menu.

    When the function runs successfully, you will have a new table called S3 Table in your Spotfire analysis.
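
    If you want to verify the listing logic outside Spotfire first, here is a minimal standalone sketch. It assumes your AWS credentials are already configured (for example via the AWS CLI), and my-example-bucket is a placeholder for a bucket you can read:

    import boto3
    import pandas as pd

    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')

    # Collect the name and size of every object, page by page
    rows = []
    for page in paginator.paginate(Bucket='my-example-bucket'):
        for item in page.get('Contents', []):
            rows.append({'Name': item['Key'], 'Size': item['Size']})

    S3Table = pd.DataFrame(rows)
    print(S3Table.head())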

    Downloading items from an S3 bucket

    Using the same input for a new data function, you can change the script to download the files locally instead of listing them. Here is the code:

    # Copyright (c) 2017-2019 TIBCO Software Inc. All Rights Reserved.
    
    from Python_Data_Function import *
    # Put package imports here
    # Please make sure you have the correct packages installed in your Python environment
    
    import boto3
    import os
    import tempfile
    from datetime import datetime
    
    def iterate_bucket_items(bucket):
        """Generator that iterates over all objects in a given S3 bucket.
    
        See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
        for the return data format.
    
        :param bucket: name of the S3 bucket
        :return: dict of metadata for an object
        """
        client = boto3.client('s3')
        paginator = client.get_paginator('list_objects_v2')
        page_iterator = paginator.paginate(Bucket=bucket)
    
        for page in page_iterator:
            if page['KeyCount'] > 0:
                for item in page['Contents']:
                    yield item
    
    
    if __name__ == "__main__":
        ## Check we got a bucket name
        if bucketName != "":
            ## Set a default location using the temp directory
            ## You could change this to whatever you like, or let the user specify it
            downloadLocation = os.path.join(tempfile.gettempdir(), "spotfire-temp")
            if not os.path.exists(downloadLocation):
                os.mkdir(downloadLocation)
    
            ## Loop over the items found in the bucket
            for i in iterate_bucket_items(bucket=bucketName):
                ## Skip folder placeholder objects
                if i['Key'].endswith('/'):
                    continue
                ## Set the name and path to download to; keys can contain '/',
                ## so create any intermediate folders first
                itemPathAndName = os.path.join(downloadLocation, *i['Key'].split('/'))
                os.makedirs(os.path.dirname(itemPathAndName), exist_ok=True)
                ## Check if the file exists already
                if not os.path.exists(itemPathAndName):
                    ## Download the item
                    boto3.resource('s3').Bucket(bucketName).download_file(i['Key'], itemPathAndName)
    
            ## Escape backslashes so the path displays correctly in Spotfire
            downloadLocation = downloadLocation.replace('\\', '\\\\')
            timeCompleted = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
        else:
            downloadLocation = ""
            timeCompleted = ""

    In this instance, the script returns the download location and the completion time as values. These can be set as outputs of type Value when you register the data function, or simply ignored. 

    This script could be expanded to read the downloaded data, if it were a CSV or Excel file for example, and pass it back to Spotfire as a data table. 
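    As a minimal sketch of that expansion, the fragment below could be added to the download script before the backslash-escaping step. The file name data.csv is a placeholder for an object you know was downloaded, and S3Data would be registered as an output of type Table:

    import os
    import pandas as pd

    # Hypothetical example file name; substitute an object you downloaded
    csvPath = os.path.join(downloadLocation, 'data.csv')
    if os.path.exists(csvPath):
        # Register S3Data as an output of type Table to return it to Spotfire
        S3Data = pd.read_csv(csvPath)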

    Summary

    In Spotfire you are able to run data functions automatically when filtering or marking changes, or trigger them from an action control in a text area, for instance. This gives you full control over when you retrieve the list of items or download them from your bucket. You could also combine the two scripts above into one that lists and downloads at the same time, as sketched below.
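
    Here is a rough sketch of that combination under the same assumptions as the scripts above: bucketName is the data function input, and S3Table and downloadLocation are outputs:

    import os
    import tempfile
    import boto3
    import pandas as pd

    downloadLocation = os.path.join(tempfile.gettempdir(), "spotfire-temp")
    os.makedirs(downloadLocation, exist_ok=True)

    client = boto3.client('s3')
    bucket = boto3.resource('s3').Bucket(bucketName)

    rows = []
    for page in client.get_paginator('list_objects_v2').paginate(Bucket=bucketName):
        for i in page.get('Contents', []):
            ## Record metadata for the output table
            rows.append({'Name': i['Key'], 'Size': i['Size']})
            ## Download the object, creating intermediate folders as needed
            target = os.path.join(downloadLocation, *i['Key'].split('/'))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            if not i['Key'].endswith('/') and not os.path.exists(target):
                bucket.download_file(i['Key'], target)

    S3Table = pd.DataFrame(rows)
    downloadLocation = downloadLocation.replace('\\', '\\\\')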

    License:  TIBCO BSD-Style License

     

     

