Aqif Mukhtar 2 Posted June 15, 2021 Share Posted June 15, 2021 Hi I have two data sources D1 and D2. Both have event dates. I want to lookup that from the D1 event date, what was the next closest date in D2 for the same person. To do this I have addedrows from D2 under D1 to bring all dates in one table. I have two tables like this D1: SOURCE PERSON_ID D1_EVENT_DATETIME D1 A123 13/11/2020 D1 A124 28/02/2020 D1 A125 15/08/2019 D1 A126 6/12/2017 D1 A127 29/11/2017 D2: SOURCE PERSON_ID D2_EVENT_DATETIME D2 A128 9/02/2021 17:07 D2 A129 3/12/2020 17:15 D2 A130 30/11/2020 11:39 D2 A131 30/11/2020 5:13 D2 A132 27/10/2019 10:30 D2 A133 20/12/2018 15:14 D2 A134 20/12/2018 12:40 D2 A135 20/12/2018 12:40 D2 A136 10/12/2018 13:21 D2 A137 9/10/2018 1:25 D2 A138 4/08/2018 17:13 So waht I want is that first date in D1 13/11/2020. The next date in D2 is 30/11/2020. Second row date in D1 is 28/02/2020. For this the next date in D2 will also be 30/11/2020. The third row is 15/08/2019. For this the next date in D2 should be 27/10/2019. I have merged the rows from D2 below D1 and created a single merge date column. What result I want it below as I have manually put in the dates in bold ROWID SOURCE PERSON_ID D1_EVENT_DATETIME D2_EVENT_DATETIME DATE_MERGE NEXT_D2_EVENT 6 D1 A123 13/11/2020 13/11/2020 16:35 30/11/2020 14 D1 A124 28/02/2020 28/02/2020 16:15 30/11/2020 12 D1 A125 15/08/2019 15/08/2019 17:00 27/10/2019 13 D1 A126 6/12/2017 6/12/2017 13:00 4/08/2018 15 D1 A127 29/11/2017 29/11/2017 15:10 4/08/2018 19 D2 A128 9/02/2021 17:07 9/02/2021 17:07 9/02/2021 18 D2 A129 3/12/2020 17:15 3/12/2020 17:15 9/02/2021 17 D2 A130 30/11/2020 11:39 30/11/2020 11:39 9/02/2021 25 D2 A131 30/11/2020 5:13 30/11/2020 5:13 30/11/2020 24 D2 A132 27/10/2019 10:30 27/10/2019 10:30 30/11/2020 16 D2 A133 20/12/2018 15:14 20/12/2018 15:14 30/11/2020 26 D2 A134 20/12/2018 12:40 20/12/2018 12:40 20/12/2018 23 D2 A135 20/12/2018 12:40 20/12/2018 12:40 20/12/2018 22 D2 A136 10/12/2018 13:21 10/12/2018 13:21 20/12/2018 21 D2 A137 9/10/2018 1:25 9/10/2018 1:25 20/12/2018 20 D2 A138 4/08/2018 17:13 4/08/2018 17:13 20/12/2018 What ever date will appear in D2 rows won't matter as I will filter that out. So far the closest formula that's half working is this Last([DATE_MERGE]) OVER (INTERSECT([PERSON_ID],AllNext([DATE_MERGE]))) I works fine for first two rows but after that is not the result I want. Any help will be appreciated SOURCE PERSON_ID D1_EVENT_DATETIME DATE_MERGE NEXT_D2_EVENT D1 A123 13/11/2020 13/11/2020 16:35 30/11/2020 D1 A124 28/02/2020 28/02/2020 16:15 30/11/2020 D1 A125 15/08/2019 15/08/2019 17:00 27/10/2019 D1 A126 6/12/2017 6/12/2017 13:00 4/08/2018 D1 A127 29/11/2017 29/11/2017 15:10 4/08/2018 Link to comment Share on other sites More sharing options...
Fabian Duerr Posted June 15, 2021 Share Posted June 15, 2021 In your tables I can see person_id A123-A127 in D1 and person_id A128-A138 in D2... So, it's not really clear to me where your next dates (per person_id) are coming from. Also, I expect that you can habe multiple dates for each person:id in each table. Can you please clarify BTW Are you familiar with data functions Link to comment Share on other sites More sharing options...
Aqif Mukhtar 2 Posted June 16, 2021 Author Share Posted June 16, 2021 Hi fabd My appologies as the person ID is wrong because I was trying to de-identify the data in excel and it auto-populated the person id. I have two datastes with date representing the appointment date and I will try and explain what I want Dataset 1 - D1 SOURCE PERSON_ID D1_APPT_DATE D1 A123 13/11/2020 D1 A123 28/02/2020 D1 A123 15/08/2019 D1 A123 6/12/2017 D1 A123 29/11/2017 Dataset 2 - D2 SOURCE PERSON_ID D2_APPT_DATE D2 A123 9/02/2021 D2 A123 3/12/2020 D2 A123 30/11/2020 D2 A123 30/11/2020 D2 A123 27/10/2019 D2 A123 10/12/2018 D2 A123 9/10/2018 D2 A123 4/08/2018 Both table contain data for one person but in the actual table there are thousands of persons. The aim is that for each appointment date in Dataset 1, bring next closest date from Dateset 2 with the results looking like this: ROWID SOURCE PERSON_ID D1_APPT_DATE NEXT_D2_APPT DATE 6 D1 A123 13/11/2020 30/11/2020 14 D1 A123 28/02/2020 30/11/2020 12 D1 A123 15/08/2019 27/10/2019 13 D1 A123 6/12/2017 4/08/2018 15 D1 A123 29/11/2017 4/08/2018 As I know that Spotfire can't lookup values from other tables so I tried to add combine row from both datasets in to one table and then tried to look for the next date. The combined dataset looks like this ROWID SOURCE PERSON_ID D1_APPT_DATE D2_APPT_DATE DATE_MERGE 6 D1 A123 13/11/2020 13/11/2020 14 D1 A123 28/02/2020 28/02/2020 12 D1 A123 15/08/2019 15/08/2019 13 D1 A123 6/12/2017 6/12/2017 15 D1 A123 29/11/2017 29/11/2017 19 D2 A123 9/02/2021 9/02/2021 18 D2 A123 3/12/2020 3/12/2020 17 D2 A123 30/11/2020 30/11/2020 25 D2 A123 30/11/2020 30/11/2020 24 D2 A123 27/10/2019 27/10/2019 22 D2 A123 10/12/2018 10/12/2018 21 D2 A123 9/10/2018 9/10/2018 20 D2 A123 4/08/2018 4/08/2018 I am familar with data functions so any lead in that direction would be appreciated. Link to comment Share on other sites More sharing options...
Fabian Duerr Posted June 16, 2021 Share Posted June 16, 2021 Thanks for clarification. My approach would be to use a data function that takes both tables D1 and D2 as input. (There's no need to combine them for this solution). The output could be either a new table or simply a column that you add to your master table D1. Let's go with the column solution for now. All you need to do is to loop thru every row of D1 and extract the person_id and date. Then filter your table D2 to this person_id and to dates larger than the extracted date. The smallest remaining date is the date you are looking for. Store those values into a vector that you initialize in the beginning with NAs (for example NEXT.DATES <- rep(NA, nrow(D1)) ). This vector is your output column that you add to table D1. This should be a simple loop and just a few lines of code. I hope you understand the idea. Please let me know your thoughts Link to comment Share on other sites More sharing options...
Aqif Mukhtar 2 Posted June 16, 2021 Author Share Posted June 16, 2021 Hi While I understand what you're suggesting but I am new to data functions so would it be possible for you to post sample script that can get me going Thanks Link to comment Share on other sites More sharing options...
Fabian Duerr Posted June 16, 2021 Share Posted June 16, 2021 Please read these discussions and follow the tutorial that I provided therein: https://community.spotfire.com/questions/how-use-value-n-th-row-calculated-... https://community.spotfire.com/questions/adding-value-previous-row-current-... https://community.spotfire.com/questions/self-referential-calculated-column This should help you with your data function. Are you familiar with R Or are you also a beginner in R programming Link to comment Share on other sites More sharing options...
Aqif Mukhtar 3 Posted June 16, 2021 Share Posted June 16, 2021 Hi By using Rank Function I think I am quite close to the desired output. Just one last bug that I will explain Still using the Date Merge approach, first I created a Rank Column as: Rank([DATE_MERGE],"asc",[PERSON_ID],"asc") Then I created another column to lookup next occuring date: First([DATE_MERGE]) OVER (intersect([PERSON_ID],Next([RANK_PERSON_DATE]))) That has given me quite close to what I want except that if the next occuring date is in Dataset D1, it brings it up. I want it to totally ignore Dataset 1 and only bring next date from dataset 2. My outpuut looks like this: Row SOURCE PERSON_ID D1_APPT_DATE D2_APPT_DATE DATE_MERGE RANK_PERSON_DATE NEXT_ROW_DATE NEXT_DATE_SOURCE 1 D1 A123 13/11/2020 13/11/2020 9 30/11/2020 D2 2 D1 A123 28/02/2020 28/02/2020 8 13/11/2020 D1 3 D1 A123 15/08/2019 15/08/2019 6 27/10/2019 D2 4 D1 A123 6/12/2017 6/12/2017 2 4/08/2018 D2 5 D1 A123 29/11/2017 29/11/2017 1 6/12/2017 D1 6 D2 A123 9/02/2021 9/02/2021 13 7 D2 A123 3/12/2020 3/12/2020 12 9/02/2021 D2 8 D2 A123 30/11/2020 30/11/2020 10 3/12/2020 D2 9 D2 A123 30/11/2020 30/11/2020 10 3/12/2020 D2 10 D2 A123 27/10/2019 27/10/2019 7 28/02/2020 D1 11 D2 A123 10/12/2018 10/12/2018 5 15/08/2019 D1 12 D2 A123 9/10/2018 9/10/2018 4 10/12/2018 D2 13 D2 A123 4/08/2018 4/08/2018 3 9/10/2018 D2 As you can see that for rows 2 and 5 it finds next date in Dataset 1, which is correct according to the expression. I somehow want to bypass date lookup in Dataset 1. I am quite close but result still not perfect. Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now