Jump to content

Join tables on date range


Thomas Siragusa

Recommended Posts

This is a two part question.

I need to join two tables (that already exist in the DXP) using a 'fuzzy' join where a.MY_DATE is =b.END_DATE. This match needs to execute each time the data is refreshed. note: the originating data connection isSalesforce. It seems Salesforce direct connection doesn't allow custom queries

Q1. Is there a way to do this without implementing an R or Python data function

I tested an R function,fuzzyjoin(), which works beautifully, but when trying to call it as a transformation step, I learned that Transformation Data Functions can only have 1 table as input.

Q2. If I can use R to do the match, how best to automate the process

Reference image attached.

Thanks, Thom

Link to comment
Share on other sites

Here you can find a interesting discussion:

https://stackoverflow.com/questions/37289405/dplyr-left-join-by-less-tha...

Q1: (Following this discussiion...) You could do a left join (which problably ends up in a huge table) and then create a additional column that checks your condition. And then filter your data to condition = TRUE.

Q2: Write a data function that takes both tables as an input. Use a fuzzy join or a left join in combination with filtering (see discussion) and output a new table. Make sure your data function has the checkmark to refresh automatically.

I prefer the data function approach. But maybe someone comes up with a good solution without a data function.

More about fuzzyjoins:

https://cran.r-project.org/web/packages/fuzzyjoin/fuzzyjoin.pdf

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...