Jump to content

Combine/Link Multiple Datasets using 'contain' instead of exact match


Stefan D 2

Recommended Posts

Hi all,

I'm new to Spotfire and stuck on something that I can't seem to figure out by searching the community.

I am trying to combine or link multiple datasets which works pretty easy and straightforward when working with exact matching data. However I want to merge data mostly nota 1:1 match because it contains multiple values or irrelevant data. Therefore I want to match on a 'contain' criteria but I can't figure out how.

Here's some sample data for reference:

Table 1:

 

 

 

ItemID

Alternative Item

 

 

Wheels 1

11

 

 

Brakes 2

21

 

 

Tires 3, 4, 5

31

 

 

Tires 3, 4, 5

41

 

 

3, 5

51

 

 

Window 6

61

 

 

 

Table 2:

 

 

 

ItemID

Value

 

 

1

200

 

 

2

300

 

 

3

400

 

 

4

500

 

 

5

600

 

 

6

700

 

 

 

Desired result:

 

 

 

ItemID

Value

Alternative

Alternative 2

Alternative 3

 

 

1

200

11

 

 

 

 

2

300

21

 

 

 

 

3

400

31

41

51

 

 

4

500

31

41

 

 

 

5

600

31

41

51

 

 

6

700

61

 

 

 

 

 

Is something like this possible

Thank you in advance.

Stefan

Link to comment
Share on other sites

There is no built-in join method for your needs. You need to clean up your input data.You could solve this with a data function that needs to do a few tasks:

 

remove all characters with a regular expression from ItemID column (eg "Tires 3, 4, 5" --> "3, 4, ,5")

split all lines into multiple lines (eg ("3, 4, ,5"; "41") --> ("3"; "41"), ("4"; "41"), ("5"; "41") and output the resulting table

then you could continue with your join of table1 and table2 on ItemID 

 

If you don't want to work with a data functions there might be a chance to do this with calculated columns and other operations. But this would only work if you now the maximun amount (=N) of possible numbers (1,2,3,...) in your ItemID column.  So, you would create N coloumns

 

RXExtract([itemID],"[0-9]",1)

 

RXExtract([itemID],"[0-9]",2)

 

RXExtract([itemID],"[0-9]",3)

 

...

 

RXExtract([itemID],"[0-9]",N)

 

Then you need to create a new table with an unpivot operation to get this result:

 

(Original table on top with calculated column; snapshot with unpivot operation below)

 

itemid_0.png

 

Now, you can join table1 and table2 on ItemID

Link to comment
Share on other sites

Hi fadb, Tibco Community,

Thank you for your time to reply! I think your first solution could work however I might have over simplified my sample data. In fact I am working with aeronautical part numbers which consists of both letters and numbers. I'll give an exact example of what I'm trying to do:

Database 1:

 

 

 

 

Part Number:

5762-02AAR

 

 

Replacement For:

P/N 8333-02H01 and P/N 23456-3H01

 

 

 

 

Database 2:

 

 

 

 

P/N:

8333-02H01

 

 

 

 

23456-3H01

 

 

 

 

Desired result:

 

 

 

 

P/N:

8333-02H01

 

 

Alternative:

5762-02AAR

 

 

 

 

23456-3H01

 

 

5762-02AAR

 

 

 

 

I really hope something like this is possible in Spotfire, would you have any idea

Kind regards,

Stefan

Link to comment
Share on other sites

Thanks, I'll check those out! Sadly the string format is very very random so that's why a lookup function would be the perfect solution...

 

What I'm thinking of now is splitting on the 'space' delimeter, then filtering out all rows that don't contain a number, deleting all comma's, etc.

 

Here's some more sample data:

 

Part Number: Replacement For:
5762-01AAR Manufacturer's Name PTX123-0000-1, PTX123-0000-2, PTX123-0000-1, Manufacturer's Name Z1243W431-1, Z1243W431-2, Z1243W431-2
5762-02AAR Manufacturer's Name P/N 2340H13-1 Use in Manufacturer X/Manufacturer Y Blower Ass'y (Waste System Vacuum Blower) P/N 402384-2
Link to comment
Share on other sites

Okay, so basically your original data is a complete mess... ;)

 

I think that even with a lookup function it would be rather complicated to clean your data set.

 

I assume you have a complete list off all possible Part Numbers If yes, my advice would be to use a data function to extract only the part numbers from [Replacement For]

 

If you have to do more of those complex analysis or data merges than I can highly recommend to learn more about data functions.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...