
I would like to have a calculated column which is a polynomial fit of my data. Does anyone know how it's done by TERR script?

Solved by Joel Dror 2

You should be able to do it with a TERR expression function that creates the fitted column.

To create an expression function, go to top menu > Data > Data Function Properties and then choose the tab Expression Functions.

Here, give it a name, e.g. polyFit, then type the script:

# polynomial order (inputs are vectors, so we need the first element only)
poly_order = input3[1]
model = lm(input2 ~ poly(input1, poly_order, raw = TRUE))
# predict(model) returns the fitted values in the same order as the input rows
output = predict(model)

The function type will be Column function, the return type Real, and the category Statistical functions (the default).

Then go to top menu > Data > Column properties and insert a new calculated column with expression:

polyFit([your x column],[your y column],4)

where 4 is just an example of the desired polynomial order (you can read it from a document property if you like instead of typing it explicitly).

So now the polyFit function is called with input1=your x column, input2=your y column and input3=4. The output column is returned as your polynomial fit.

I think the x order is dealt with inside the poly function, so you should be OK even if your input x is not sorted.
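As a quick illustration of that ordering point (a NumPy sketch, not the TERR function itself), a polynomial fit returns one fitted value per input row, in the same order as the input, even when x is unsorted:

```python
import numpy as np

x = np.array([3.0, 0.0, 2.0, 1.0])   # deliberately unsorted
y = x**2 + 1                          # exact quadratic relationship
coeffs = np.polyfit(x, y, 2)          # fit a degree-2 polynomial
fitted = np.polyval(coeffs, x)        # one fitted value per input row, same order
# fitted matches y row for row, so no re-sorting is needed
```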


Hi Gaia,

Thank you very much for your response.

This seems to work well; however, I have several such series that I would like to polyfit.

So I would like to do something like in the calculated column:

PolyFit([DacCodeSigned],[VPos],3) Over ([Temperature],[ChipID],[Voltage])

But I see that it is not allowed.

Is there any way to get around this limitation and compute the polyfit for each sub-set of my column?

I know that Spotfire has a polynomial fit in the trend lines, but I would like to eventually have that information in a column in order to use in further calculations.


It looks like you need an extra parameter (or a few extra parameters) for your grouping columns, and then to implement the grouping inside the function.

At least from this support article:


I think that if you input the grouping columns as Concatenate([Temperature],[ChipID],[Voltage]) you can get away with only one extra input parameter. Are you familiar with R so you can change the code?
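To illustrate the idea (a hypothetical sketch in Python rather than TERR, with made-up data and a toy single-column key standing in for the concatenation), grouping by a concatenated key and fitting each group separately could look like this:

```python
import numpy as np
import pandas as pd

# toy data: two groups identified by a single grouping column
df = pd.DataFrame({
    'Temperature': ['25C', '25C', '25C', '85C', '85C', '85C'],
    'x': [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    'y': [1.0, 2.0, 5.0, 2.0, 3.0, 6.0],
})
# in practice the key would concatenate Temperature, ChipID and Voltage
df['key'] = df['Temperature']

parts = []
for key, g in df.groupby('key', sort=False):
    coeffs = np.polyfit(g['x'], g['y'], 2)                      # per-group fit
    parts.append(pd.Series(np.polyval(coeffs, g['x']), index=g.index))
fit = pd.concat(parts).sort_index()  # realign with the original row order
```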


Hi Gaia,

No, I don't know R enough to change the code.

I tried instead to create a data function in Python.

I have X, Y, and ColumnForGroup, which is a calculated column in my data concatenating all the columns that I want to group by.

X.name = 'X'
Y.name = 'Y'
ColumnForGroup.name = 'ColumnForGroup'

df = pd.concat([X, Y, ColumnForGroup], axis=1)
df.groupby('ColumnForGroup').apply(lambda x: pd.Series(
    np.polyval(np.polyfit(x.X.to_numpy(dtype=float, na_value=0),
                          x.Y.to_numpy(dtype=float, na_value=0), 3),
               x.X))).sort_index(level=1)

I do get individual polyfitted curves; however, I see that the resulting column is not ordered in alignment with the original data columns.

So I get proper polyfitted curves, but they are jumbled.

I don't know enough to keep the output in the same order as the original data columns.




  • Solution


I think I managed to get the Python code to work.

This yields good results on my file.

Unfortunately, I don't fully understand how it worked 😕

Does it seem OK to you?

Do you have suggestions for improvement or an alternative method in Python or R?

  • X,Y are tied to the relevant columns in my table.
  • ColumnForGroup is another column in my table which is a concatenation of many other columns.
  • Order is a value (e.g. 3)
  • PolyFit is the output column
import numpy as np
import pandas as pd

X.name = 'X'
Y.name = 'Y'
ColumnForGroup.name = 'ColumnForGroup'

df = pd.concat([X, Y, ColumnForGroup], axis=1)
df = df.set_index('ColumnForGroup')
B = df.groupby('ColumnForGroup', sort=False)
C = B.apply(lambda x: pd.Series(
    np.polyval(np.polyfit(x.X.to_numpy(dtype=float, na_value=0),
                          x.Y.to_numpy(dtype=float, na_value=0), Order),
               x.X)))
D = C.values.ravel()
PolyFit = pd.Series(D)

The code works fine.

You are inputting three columns and renaming them to 'X','Y' and 'ColumnForGroup'.

(Not sure what the set_index is doing, but it is not harming.)

You are then creating a data frame out of the three columns, and grouping it.

You are applying the polynomial fit to each group: first polyfit gives you the coefficients, then polyval gives you the fitted values for each x.

You do a little bit of restructuring at the end to reshape your column.
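For reference, here is a minimal standalone example of the polyfit/polyval step described above: np.polyfit returns the coefficients highest power first, and np.polyval evaluates that polynomial at each x.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1                      # exact line: slope 2, intercept 1
coeffs = np.polyfit(x, y, 1)       # coefficients highest power first: ~[2.0, 1.0]
fitted = np.polyval(coeffs, x)     # evaluates 2*x + 1 at each x
```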


It seems that sort=False was critical to get the resulting column in the right order.
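A tiny demonstration of why sort=False matters: by default, pandas groupby sorts the group keys, which reorders the groups relative to their first appearance in the data.

```python
import pandas as pd

df = pd.DataFrame({'g': ['b', 'b', 'a', 'a'], 'v': [1, 2, 3, 4]})

# default sort=True reorders groups by key: 'a' before 'b'
sorted_keys = list(df.groupby('g', sort=True).groups)
# sort=False keeps groups in order of first appearance: 'b' before 'a'
appearance_keys = list(df.groupby('g', sort=False).groups)
```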

I'm just writing it down here in case it helps someone who's trying to accomplish something similar.

Thank you for your help 🙌

P.S. Is there a good resource (course/book) to be able to better master this kind of scripting?


Difficult to recommend resources, there are tons out there, both courses and free resources.

Every library (package) has its own rules, and to fine-tune (like, for instance, sort=False) you really need to look into the function itself. There is no consistency across functions or libraries, and in my opinion no course teaches that. Moreover, new versions of packages can add or remove input parameters and introduce disruptive changes.

I think it is important to set yourself up with a good interactive editor and always check the intermediate results of your code vs what you expect.

For data functions in Python, adding the following snippet stores the inputs Spotfire sent to the data function in a .pickle file. You can then read that file back from your editor to see exactly which parameters the data function received, and debug the code line by line:

'file path' is where you want to write your .pickle file; something like 'C:/myfolder/inputs.pickle'.

input1, input2 are the explicit list of your input parameters (however many there are).

You set spotfire=True in the Spotfire data function, to write, and spotfire=False when you run the same code from your editor, to read.

import pickle

filename = "file path"
spotfire = True  # set to False in the editor
if spotfire:
    with open(filename, 'wb') as f:  # save data
        pickle.dump((input1, input2), f)
else:
    with open(filename, 'rb') as f:  # load data
        input1, input2 = pickle.load(f)

For R, the equivalent would be the following, where 'file path' should be something like 'C:/myfolder/inputs.RData':

spotfire = TRUE  # set to FALSE in the editor
filename = 'file path'
if (spotfire == TRUE) {
  save.image(filename)
} else {
  load(filename)
}
