  List and download items from AWS S3 Buckets in Spotfire®


    This article explains two code examples of how to list and download items from AWS S3 buckets in Spotfire®.

    Introduction

    Spotfire® can connect to, upload data to, and download data from Amazon Web Services (AWS) S3 stores using either the built-in Python engine that ships with Spotfire 10.7 and above, your own custom Python installation (again, 10.7 and above only), or the Python Data Function for Spotfire if you are using Spotfire 10.6 or earlier. This is achieved using Amazon's Boto3 Python library. This article will provide and explain two code examples:

    1. Listing items in an S3 bucket
    2. Downloading items from an S3 bucket

    These examples are just two demonstrations of the functionality available through the Boto3 library in Spotfire. The same library can also be used to call other AWS services such as SageMaker and Rekognition, or to connect to other sources such as DynamoDB. Read the blog on doing image recognition in Spotfire using AWS to find out more.
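    Each of those services is reached through the same Boto3 client/resource pattern used throughout this article. As a minimal sketch (assuming your AWS credentials are already configured), the names below are standard Boto3 service identifiers:

    import boto3

    # The same call pattern covers S3 and the other services mentioned above
    s3 = boto3.client('s3')
    rekognition = boto3.client('rekognition')
    dynamodb = boto3.resource('dynamodb')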

    Prerequisites

    • For Spotfire 10.7 and above: no extra Python requirements, as a Python interpreter is included with Spotfire. 
    • For Spotfire 10.6 or earlier: the Python Data Function for Spotfire must be installed on your Spotfire instance or server. Part of this setup is also installing Python itself and some key libraries; see the linked guide for more details. When installing Python, remember to add python to your PATH variable (the Windows installer has an option you can tick to do this automatically for you). 
    • The AWS CLI should be installed and configured from Amazon if you want to use authentication through the CLI. You must have an active AWS account to be able to configure the AWS CLI. See the instructions from AWS for full details. You can also set your authentication programmatically; see details/examples from AWS, and the sketch after this list.
    • Install the Boto3 Python library. This can be done using Tools->Python Tools->Package Management in later versions of Spotfire. For earlier versions, pip is required: see this article. 
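
    For example, here is a minimal sketch of setting credentials programmatically with a Boto3 session instead of relying on the AWS CLI configuration. The key values and region below are placeholders, not working credentials:

    import boto3

    # Placeholders only: substitute your own credentials, or better, use the
    # AWS CLI / environment variables so secrets never live in the script
    session = boto3.session.Session(
        aws_access_key_id='YOUR_ACCESS_KEY_ID',
        aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
        region_name='us-east-1',
    )
    s3 = session.client('s3')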

    Listing items in an S3 bucket

    To connect to AWS we use the Boto3 Python library. To connect to S3, all we need is the name of the bucket. Follow the steps below to implement a Python data function that lists the bucket's contents:

    • Register a new data function (from the Tools->Register Data Function menu) and select Python as your engine (a red error here most likely means Python is not added to your Windows PATH variable).
    • Give your script a name and paste in the following code:
       
    # Copyright (c) 2017-2019 TIBCO Software Inc. All Rights Reserved.
    
    from Python_Data_Function import *
    # Put package imports here
    # Please make sure you have the correct packages installed in your Python environment
    
    import pandas as pd
    import boto3
    
    def iterate_bucket_items(bucket):
        """Generator that iterates over all objects in a given S3 bucket.
    
        See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
        for the return data format.
    
        :param bucket: name of the S3 bucket
        :return: dict of metadata for an object
        """
        client = boto3.client('s3')
        paginator = client.get_paginator('list_objects_v2')
        page_iterator = paginator.paginate(Bucket=bucket)
    
        for page in page_iterator:
            if page['KeyCount'] > 0:
                for item in page['Contents']:
                    yield item
    
    
    if __name__ == "__main__":
    
        ## Create lists to hold the bucket item properties
        item_names = []
        item_sizes = []
    
        ## Check we got a bucket name passed from Spotfire
        if bucketName != "":
            ## Iterate over the bucket items
            for i in iterate_bucket_items(bucket=bucketName):
                item_names.append(i['Key'])
                item_sizes.append(i['Size'])
    
        ## Build a dictionary from the returned data and convert it to a data frame
        S3Dict = {'Name': item_names, 'Size': item_sizes}
        S3Table = pd.DataFrame.from_dict(S3Dict, orient='columns')
     
    • Add an input parameter for the bucket name called bucketName

    [Screenshot: s3-1.png, the bucketName input parameter in the Register Data Function dialog]

    • And finally, add an output of type Table called S3Table

    [Screenshot: s3-2.png, the S3Table output of type Table]

    • You can now click Run to save and run the function; you will be prompted to provide values for the inputs and outputs in your DXP.

    You can further edit this function and its inputs/outputs from the Data->Data function properties menu.

    When the function runs successfully, you will have a new table called S3 Table in your Spotfire analysis.
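
    If you want to verify the listing logic outside Spotfire first, here is a minimal standalone sketch. It assumes your AWS credentials are already configured (for example via the AWS CLI), and my-example-bucket is a placeholder for a bucket you can read:

    import boto3
    import pandas as pd

    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')

    # Collect the name and size of every object, page by page
    rows = []
    for page in paginator.paginate(Bucket='my-example-bucket'):
        for item in page.get('Contents', []):
            rows.append({'Name': item['Key'], 'Size': item['Size']})

    S3Table = pd.DataFrame(rows)
    print(S3Table.head())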

    Downloading items from an S3 bucket

    Using the same input for a new data function, you can change the script to download the files locally instead of listing them. Here is the code:

    # Copyright (c) 2017-2019 TIBCO Software Inc. All Rights Reserved.
    
    from Python_Data_Function import *
    # Put package imports here
    # Please make sure you have the correct packages installed in your Python environment
    
    import boto3
    import os
    import tempfile
    from datetime import datetime
    
    def iterate_bucket_items(bucket):
        """Generator that iterates over all objects in a given S3 bucket.
    
        See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
        for the return data format.
    
        :param bucket: name of the S3 bucket
        :return: dict of metadata for an object
        """
        client = boto3.client('s3')
        paginator = client.get_paginator('list_objects_v2')
        page_iterator = paginator.paginate(Bucket=bucket)
    
        for page in page_iterator:
            if page['KeyCount'] > 0:
                for item in page['Contents']:
                    yield item
    
    
    if __name__ == "__main__":
        ## Check we got a bucket name
        if bucketName != "":
            ## Set a default location using the temp directory
            ## You could change this to whatever you like, or let the user specify it
            downloadLocation = os.path.join(tempfile.gettempdir(), "spotfire-temp")
            if not os.path.exists(downloadLocation):
                os.mkdir(downloadLocation)
    
            ## Loop over the items found in the bucket
            for i in iterate_bucket_items(bucket=bucketName):
                ## Skip folder placeholder objects
                if i['Key'].endswith('/'):
                    continue
                ## Set the name and path to download to; keys can contain '/',
                ## so create any intermediate folders first
                itemPathAndName = os.path.join(downloadLocation, *i['Key'].split('/'))
                os.makedirs(os.path.dirname(itemPathAndName), exist_ok=True)
                ## Check if the file exists already
                if not os.path.exists(itemPathAndName):
                    ## Download the item
                    boto3.resource('s3').Bucket(bucketName).download_file(i['Key'], itemPathAndName)
    
            ## Escape backslashes so the path displays correctly in Spotfire
            downloadLocation = downloadLocation.replace('\\', '\\\\')
            timeCompleted = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
        else:
            downloadLocation = ""
            timeCompleted = ""

    In this instance, the script returns the download location and the completion time as values. These can be set as outputs of type Value when you register the data function, or simply ignored. 

    This script could be expanded to read the downloaded data, if it were a CSV or Excel file for example, and pass it back to Spotfire as a data table. 
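    As a minimal sketch of that expansion, the fragment below could be added to the download script before the backslash-escaping step. The file name data.csv is a placeholder for an object you know was downloaded, and S3Data would be registered as an output of type Table:

    import os
    import pandas as pd

    # Hypothetical example file name; substitute an object you downloaded
    csvPath = os.path.join(downloadLocation, 'data.csv')
    if os.path.exists(csvPath):
        # Register S3Data as an output of type Table to return it to Spotfire
        S3Data = pd.read_csv(csvPath)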

    Summary

    In Spotfire you are able to run data functions automatically when filtering or marking changes, or trigger them from an action control in a text area, for instance. This gives you full control over when you retrieve the list of items or download them from your bucket. You could also combine the two scripts above into one that lists and downloads at the same time, as sketched below.
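
    Here is a rough sketch of that combination under the same assumptions as the scripts above: bucketName is the data function input, and S3Table and downloadLocation are outputs:

    import os
    import tempfile
    import boto3
    import pandas as pd

    downloadLocation = os.path.join(tempfile.gettempdir(), "spotfire-temp")
    os.makedirs(downloadLocation, exist_ok=True)

    client = boto3.client('s3')
    bucket = boto3.resource('s3').Bucket(bucketName)

    rows = []
    for page in client.get_paginator('list_objects_v2').paginate(Bucket=bucketName):
        for i in page.get('Contents', []):
            ## Record metadata for the output table
            rows.append({'Name': i['Key'], 'Size': i['Size']})
            ## Download the object, creating intermediate folders as needed
            target = os.path.join(downloadLocation, *i['Key'].split('/'))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            if not i['Key'].endswith('/') and not os.path.exists(target):
                bucket.download_file(i['Key'], target)

    S3Table = pd.DataFrame(rows)
    downloadLocation = downloadLocation.replace('\\', '\\\\')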

    License:  TIBCO BSD-Style License

     

     

