
Whole folder of files as data source



Hello. I have a practical question; maybe somebody already has a solution. I have CSVs in a Hadoop folder (the CSVs all have the same column structure). I can pick one CSV as the data source in Team Studio, but I can also pick the whole folder as the data source (by dragging and dropping it onto the canvas). In that case, I get one dataset with all the data. I would also like to have the names of the files inside this folder as one column. Does anybody know how to do that?

Hi Tomas,

 

You may use a pySpark script from a Notebook within TIBCO Data Science Team Studio (formerly TIBCO Spotfire Data Science) to achieve that.

 

You'd loop through all the files in the folder, read each file into a Spark dataframe, add the filename column to each dataframe, and then union them all to form one table.

 

You may also use a Python script with pandas dataframes instead of pySpark to do the equivalent. The difference is that the data is moved to the Python environment for the manipulation. This would be OK if the dataset is not huge.
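
A rough pandas equivalent might look like the sketch below. It assumes the files are reachable as ordinary paths that `pandas.read_csv` can open, which may not hold for HDFS in your environment; the function name `concat_with_filename`, the `paths` list, and the column name `filename` are hypothetical.

```python
import os

import pandas as pd


def concat_with_filename(paths, column_name="filename"):
    """Read each CSV with pandas, tag its rows with the file name, and concatenate."""
    frames = []
    for path in paths:
        frame = pd.read_csv(path)
        # Every row from this file gets the file's base name
        frame[column_name] = os.path.basename(path)
        frames.append(frame)
    # ignore_index=True renumbers the rows of the combined table from 0
    return pd.concat(frames, ignore_index=True)
```

Everything is held in memory in the Python process, so this only suits folders whose combined size fits comfortably there.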

 

TIBCO Data Science Team Studio provides a convenient Python helper class called Chorus Commander ('cc') with APIs for reading (and writing) data in Notebooks. Here's an example of how it is used.

 

# Path of the file on HDFS
input_path = '/myfolder/myfile.csv'

# Read input as Spark dataframe
input_df = cc.read_input_file(input_path, sqlContext=sqlContext, header=True, use_input_substitution=False)

When the sqlContext is provided as an argument, the file is read as a Spark dataframe. If it is left out, the file is read as a pandas dataframe.

 

I'll be writing some posts on using pySpark in TIBCO Data Science Team Studio Notebooks and will post the links here.

 

Chia-Yui

TIBCO Data Science Team
