Introduction
Spotfire® can connect to, upload data to, and download data from Amazon Web Services (AWS) S3 storage using either the built-in Python engine that ships with Spotfire 10.7 and above, your own custom Python installation (again, 10.7 and above only), or the Python Data Function for Spotfire if you are using Spotfire 10.6 or earlier. This is achieved using Amazon's Boto3 Python library. This article provides and explains two code examples:
- Listing items in an S3 bucket
- Downloading items in an S3 bucket
These examples are just two demonstrations of the functionality available through the Boto3 library in Spotfire. Boto3 can also be used to work with services such as SageMaker and Rekognition, and to connect to other data sources such as DynamoDB. Read the blog on doing image recognition in Spotfire using AWS to find out more.
Prerequisites
- For Spotfire 10.7 and above: no extra Python requirements, as Python is included with Spotfire.
- For Spotfire 10.6 or earlier: the Python Data Function for Spotfire must be installed on your Spotfire instance or server. Part of this setup is installing Python itself and some key libraries; see the linked guide for more details. When installing Python, remember to add Python to your PATH variable (the Windows installer has an option you can tick to do this automatically for you).
- The AWS CLI from Amazon should be installed and configured if you want to authenticate through the CLI. You must have an active AWS account to configure the AWS CLI; see the instructions from AWS for full details. You can also set your authentication programmatically; again, see the details and examples from AWS.
- Install the Boto3 Python library. This can be done using Tools->Python Tools->Package Management in later versions of Spotfire. For earlier versions, pip is required: see this article.
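When you run `aws configure`, the CLI writes a shared credentials file (on Windows, under `%UserProfile%\.aws\`; elsewhere, `~/.aws/`) that Boto3 reads automatically, so the data functions below need no credentials in the script itself. A minimal sketch of that file, using AWS's documented placeholder keys:

```ini
# ~/.aws/credentials — placeholder values written by `aws configure`
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```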
Listing items in an S3 bucket
To connect to AWS we use the Boto3 Python library. To connect to S3, only the bucket name is needed. Follow the steps below to implement a Python data function that lists the bucket's contents:
- Register a new data function (from the Tools->Register Data Function menu) and select Python as your engine (a red error here usually means Python has not been added to your Windows PATH variable).
- Give your script a name and paste in the following code:
```python
# Copyright (c) 2017-2019 TIBCO Software Inc. All Rights Reserved.
from Python_Data_Function import *

# Put package imports here
# Please make sure you have the correct packages installed in your Python environment
import pandas as pd
import boto3


def iterate_bucket_items(bucket):
    # Generator that iterates over all objects in a given S3 bucket.
    # See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
    # for the return data format.
    # :param bucket: name of the S3 bucket
    # :return: dict of metadata for an object
    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket)

    for page in page_iterator:
        if page['KeyCount'] > 0:
            for item in page['Contents']:
                yield item


if __name__ == "__main__":
    # Create arrays to hold bucket item properties
    item_names = []
    item_sizes = []

    # Check we got a bucket name passed from Spotfire
    if bucketName != "":
        # Iterate over bucket items
        for i in iterate_bucket_items(bucket=bucketName):
            item_names.append(i['Key'])
            item_sizes.append(i['Size'])

    # Build a dictionary from the data returned and convert it to a data frame
    S3Dict = {
        'Name': item_names,
        'Size': item_sizes
    }
    S3Table = pd.DataFrame.from_dict(S3Dict, orient='columns')
```
- Add an input parameter for the bucket name, called bucketName.
- Finally, add an output of type Table called S3Table.
- You can now click Run to save and run the function; you will be prompted to provide values for the inputs and outputs in your DXP.
You can further edit this function and its inputs/outputs from the Data->Data function properties menu.
When the function runs successfully, you will now have a new table called S3 Table in your Spotfire analysis.
Downloading items in an S3 bucket
Using the same input for a new data function, you can change the script to download the files locally instead of listing them. Here is the code:
```python
# Copyright (c) 2017-2019 TIBCO Software Inc. All Rights Reserved.
from Python_Data_Function import *

# Put package imports here
# Please make sure you have the correct packages installed in your Python environment
import pandas as pd
import boto3
import os
import tempfile
from datetime import datetime


def iterate_bucket_items(bucket):
    # Generator that iterates over all objects in a given S3 bucket.
    # See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
    # for the return data format.
    # :param bucket: name of the S3 bucket
    # :return: dict of metadata for an object
    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket)

    for page in page_iterator:
        if page['KeyCount'] > 0:
            for item in page['Contents']:
                yield item


if __name__ == "__main__":
    # Check we got a bucket name
    if bucketName != "":
        # Set the default location using the temp directory.
        # You could change this to what you like, or let the user specify it.
        downloadLocation = tempfile.gettempdir() + "\\spotfire-temp"
        if not os.path.exists(downloadLocation):
            os.mkdir(downloadLocation)

        # Loop over items found in the bucket
        for i in iterate_bucket_items(bucket=bucketName):
            # Set the name and path to download to
            itemPathAndName = downloadLocation + "\\" + i['Key']
            # Check if the file exists already
            if not os.path.exists(itemPathAndName):
                # Download the item
                boto3.resource('s3').Bucket(bucketName).download_file(i['Key'], itemPathAndName)

        # Escape backslashes before returning the path to Spotfire
        downloadLocation = downloadLocation.replace('\\', '\\\\')
        timeCompleted = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
    else:
        downloadLocation = ""
        timeCompleted = ""
```
In this instance, the script returns two values: the download location and the time the download completed. These can be set as outputs of type Value when you register the data function, or simply ignored.
This script could be expanded to read the downloaded data, if it were a CSV or Excel file for example, and then pass the data back to Spotfire as a data table.
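A minimal sketch of that extension, using pandas (the file name and contents below are invented for illustration; here the downloaded file is simulated locally rather than fetched from S3):

```python
import os
import tempfile

import pandas as pd

# Simulate a file that the download data function left in its temp folder
downloadLocation = os.path.join(tempfile.gettempdir(), "spotfire-temp")
os.makedirs(downloadLocation, exist_ok=True)
csvPath = os.path.join(downloadLocation, "items.csv")  # hypothetical file name
with open(csvPath, "w") as f:
    f.write("Name,Size\nreport.csv,1024\nimage.png,2048\n")

# Read the CSV into a DataFrame; registering S3Table as an output of
# type Table in the data function would return this data to Spotfire
S3Table = pd.read_csv(csvPath)
```

For Excel files, `pd.read_excel` would play the same role, provided the `openpyxl` package is installed in the Python environment.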
Summary
In Spotfire, data functions can run automatically, when filtering or marking changes, or even be triggered from an action control in a text area, so you can easily control when to retrieve this list or download items from your bucket. Using these scripts you can list and download items from S3, and if need be you could combine the two scripts above into one that lists and downloads at the same time.
License: TIBCO BSD-Style License