Jump to content
  • Python data function to merge and load multiple Excel (.xls) files from a directory / folder


    This article contains steps to implement a Python data function to merge and load multiple Excel (.xls) files from a directory / folder

    Overview

    Below are the steps to implement a simple Python data function that imports and merges multiple Excel (.xls) files into Spotfire as a single datatable.

    The data function (which can downloaded) assumes the Excel files all have the same structure and are located under the same folder / directory.

     

    Steps to implement

    To create a new data function select Data > Data function properties > Register New.. from the Spotfire menu.

    Update the Edit Script dialog with:

    • Name: This will be the name displayed in the Spotfire data canvas / library.
    • Type: Python script
    • Packages: xlrd
    • Script: Copy and paste the script below

    Register data function

     

    Python code

    import pandas as pd
    import os
    import glob
    
    def merge_excel_files(folder_path):
        # Create a list to hold dataframes
        df_list = []
        
        # Use glob to find all Excel files in the folder
        file_pattern = os.path.join(folder_path, '*.xls')
        excel_files = glob.glob(file_pattern)
        
        # Loop through the files and read each into a dataframe
        for file in excel_files:
            df = pd.read_excel(file)
            df_list.append(df)
        
        # Concatenate all dataframes into one
        merged_df = pd.concat(df_list, ignore_index=True)
        
        return merged_df
    
    # Output Spotfire datatable
    result = merge_excel_files(folder_path)

     

    Click Add on the Input Parameters tab to create a parameter with the following properties:

    • Name: folder_path
    • Type: Value
    • Allowed Data Types: String

     

    Screenshot2024-05-09at12_50_34.thumb.png.8efc8b4a7e464fe0c717bfdfe5da2b63.png

    Click Add on the Output Parameters tab to create a parameter with the following properties:

    • Name: result
    • Type: Table

     

    Screenshot2024-05-09at12_51_26.thumb.png.cf76d51c9b2cebbc09349ae868ec05b6.png

    Click Screenshot2024-05-09at13_33_13.png.b2bd1e2a18ff1060bed4d20cb87e873b.png to execute the data function and in the following screen (Edit Parameters) set the Input Handler to Document property. Click New... to create a new document property (with a String data type) whose value is set to the directory folder containing the Excel file(s).

     

    Screenshot2024-05-09at12_52_20.thumb.png.63d8360e33eaf97800e29cb47ca6d409.png

    Under the Output tab, set the Output handler to Data table, select the option to Create new data table: and provide a name for the Spotfire datatable that will be created by the Python data function.

    Screenshot2024-05-09at12_53_00.thumb.png.dd1586763d42f6094f3236c5b012796d.png

     

    Click OK to run the data function.

    The image below shows what the configured data function should look like when viewed using the Spotfire data canvas.

    Screenshot2024-05-09at12_53_45.thumb.png.b5f455356561ce9e0f00612374f041da.png

     

     

    Optional /extra step

    Note the script requires the xlrd Python package (see documentation: https://xlrd.readthedocs.io/en/latest/). To install this into Spotfire, navigate to Tools > Python Tools > Package Management search for xlrd in Available Pacckages:, select xlrd from the list of packages displayed and click Install.

     

    Screenshot2024-05-09at13_48_29.png.5ae57e7778534259e1b18fe6596d3d16.png

     

     

     

    Log in to download this file

    • Thanks 1

    User Feedback

    Recommended Comments

    There are no comments to display.


×
×
  • Create New...