Paul Hickford 2 Posted April 2 Posted April 2 Hi, I have a data set that contains various columns, including a START and END date. I want to expand each date range with a row per month in the date period, so for example a set that starts looking like this: Start End Person Project 01-Jan-24 01-Apr-24 name1 project1 01-Feb-24 01-Apr-24 name1 project2 01-Jan-24 01-Mar-24 name2 project3 01-Feb-24 01-Apr-24 name2 project4 Would be expanded to more rows like this where there is a row for the 1st of the Month for every month a task is ongoing (in my test data the tasks all end on the 1st of the month but this isn't always the case). In this example the first 3 rows are from the 1st input row, then the next 2 are from the second input row and so on: Start End Person Project 01-Jan-24 01-Apr-24 name1 project1 01-Feb-24 01-Apr-24 name1 project1 01-Mar-24 01-Apr-24 name1 project1 01-Feb-24 01-Apr-24 name1 project2 01-Mar-24 01-Apr-24 name1 project2 01-Jan-24 01-Mar-24 name2 project3 01-Feb-24 01-Mar-24 name2 project3 01-Feb-24 01-Apr-24 name2 project4 01-Mar-24 01-Apr-24 name2 project4 It's similar to this example I've found in some old documentation https://support.tibco.com/s/article/How-to-create-a-continuous-date-range-given-start-and-end-dates-in-TIBCO-Spotfire-with-Data-Functions but I've not used R within Spotfire (or much at all) so am struggling to implement this. Is anyone able to point me in the right direction?
Paul Hickford 2 Posted April 2 Author Posted April 2 As a possibly clearer (or possibly not) indication of what I'm trying to do, the Sheet1 on left is the format of my original data set, the Sheet2 on right is an example I have manually prepared, but would like to automate on a large set. Also an example of the kind of output my users are looking for, tracking various outputs over time from a set of data loaded with start/end dates.
David Boot-Olazabal Posted April 2 Posted April 2 Hi Paul, Are you able to solve the above using a modified SQL Query? The article you are referring to not only proposes 2 solutions involving TERR, but also a 'SQL Query' solution. If that is also applicable to your use case, you can start with a query that is explained here. Kind regards, David
Paul Hickford 2 Posted April 2 Author Posted April 2 (edited) Yes, I'm looking at doing this in the database using SQL, but the person responsible for the database in question is currently reluctant to deploy the view that I had prepared. The article you linked too may be useful, that's not a function I'm aware of so will investigate. I don't want to spread the values out across the rows, but want to repeat them. I'd hoped to be able to adapt the examples in the article, but these seem to sum the values whereas I need a defined start and end date. Edited April 2 by Paul Hickford 2
Paul Hickford 2 Posted April 2 Author Posted April 2 Hmm, if I try and run the SQL suggested in Spotfire I just get a wall of errors, but running the same query directly in database either seems to run forever in a test set limited to a single user, or gets an ORA-01436 CONNECT BY loop in user data if I remove the dbms_random.string line. I'm on the verge of giving up and just accepting that Spotfire can't do this even though it feels like it should be trivial, and it seems to be a simple request in a lot of other software. I can make the transformation quite easily in Knime but was hoping to remove the need for any manual data manipulations.
Gaia Paolini Posted April 2 Posted April 2 You can do it with a data function, as you are generating a table that has more rows than the initial. Here I have done it with Python. You need to have the Python packages pandas and dateutils available. Probably dateutils needs installing with Menu > Tools > Python tools > Package management. import pandas as pd from dateutil.relativedelta import relativedelta # Expand as many rows as difference in months def months_delta(a,b): r = relativedelta(b, a) return (r.years * 12) + r.months #Add a running month to the start date def running_month(a,m): aa = a+relativedelta(months=m-1) return aa.strftime("%d-%b-%Y") df['TASK_START'] = pd.to_datetime(df['TASK_START']) df['TASK_END'] = pd.to_datetime(df['TASK_END']) df['__DELTA']= df.apply(lambda x: months_delta(x['TASK_START'], x['TASK_END']), axis=1) df2=df.reindex(df.index.repeat(df['__DELTA'])) df2['__MONTH_INDEX'] = df2.groupby(['USER','PROJECT']).cumcount()+1 df2['TASK_START']= df2.apply(lambda x: running_month(x['TASK_START'], x['__MONTH_INDEX']), axis=1) df2['TASK_END'] = df2['TASK_END'].dt.strftime("%d-%b-%Y") df2.drop(columns=['__MONTH_INDEX','__DELTA'],inplace=True)
Andre Kooy Posted April 3 Posted April 3 Hi Paul Have you tried to implement the SQL in an information link, in stead of having it run in the database? Kind regards, Andre
Andre Kooy Posted April 3 Posted April 3 Hi Paul, It can also be done just in Spotfire. You have your original file (data table 1). Add data table 2 that has all months of the year and a dummy column with value "X". Now create data table 3 that does a pivot on data table 1, that has only the project names as identifiers (so just 1 column). Then add a transformation for a calculated column called Dummy with value "X" and add it to data table 3. This gives you a table with all months for all projects. Then create data table 4 with the following steps: 1) load data table 1 2) join data table 3, joining on project name. 3) add the following calculated column If([Month]>[End],"Exclude","Include") Now filter to the included rows only. I added the dxp I created as an example Extra rows.dxp
Paul Hickford 2 Posted April 15 Author Posted April 15 Thanks all for the suggestions and apologies for not responding sooner, I will try and have a proper look at these tomorrow when I'm back in the office.
Paul Hickford 2 Posted April 17 Author Posted April 17 Many thanks all for the suggestions. I had a look at the two approaches of Python and using the separate tables in the Data Canvas. I got the Python version working, but I the Data Canvas approach seems to have much better performance; I set the various dummy columns and filters to be used in the necessary transformations and then removed so the end users won't need to deal with any extra columns. It will break in a hundred years or so when my list of months runs out, but will deal with that later. 1
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now