Interpolation Data Function Getting Error


Solved by Gaia Paolini

Posted

Hi,

I created the following data function script (with the help of an AI chat).
It's supposed to take grouped sets that contain both simulation data (IsSim=1) and measured data (IsSim=0), and interpolate the simulated Gain at the measured frequencies.
It works well in a pure Python environment on example data I created, but not within Spotfire.

In the Python environment the code started like this, to simulate the typical case (see the full code further down):

import numpy as np
import pandas as pd

FreqSim = pd.Series([1,2,3,4,5,6,7,8,9,10]*3)
GainSim = pd.Series([10,20,30,40,50,60,70,80,90,100,  -10,-20,-30,-40,-50,-60,-70,-80,-90,-100,  210,220,230,240,250,260,270,280,290,300])
IsSimSim = pd.Series([True]*len(FreqSim))
ColumnForGroupSim = pd.Series(['A']*10 + ['B']*10 + ['C']*10)

FreqMeas = pd.Series([1.1,2.2,3.3,4.9, 1.2,2.3,3.5,4.7,4.9, 1.3,3.6])
IsSimMeas = pd.Series([False]*len(FreqMeas))
ColumnForGroupMeas = pd.Series(['A','A','A','A', 'B','B','B','B','B', 'C','C'])

data = pd.DataFrame({
    'Freq': pd.concat([FreqSim, FreqMeas]).reset_index(drop=True),
    'Gain': pd.concat([GainSim, pd.Series([np.nan] * len(FreqMeas))]).reset_index(drop=True),
    'IsSim': pd.concat([IsSimSim, IsSimMeas]).reset_index(drop=True),
    'ColumnForGroup': pd.concat([ColumnForGroupSim, ColumnForGroupMeas]).reset_index(drop=True)
})

The resulting interpolated series was as expected:

    Freq   Gain  IsSim ColumnForGroup  InterpGainFromSim
0    1.0   10.0   True              A                NaN
1    2.0   20.0   True              A                NaN
2    3.0   30.0   True              A                NaN
3    4.0   40.0   True              A                NaN
4    5.0   50.0   True              A                NaN
5    6.0   60.0   True              A                NaN
6    7.0   70.0   True              A                NaN
7    8.0   80.0   True              A                NaN
8    9.0   90.0   True              A                NaN
9   10.0  100.0   True              A                NaN
10   1.0  -10.0   True              B                NaN
11   2.0  -20.0   True              B                NaN
12   3.0  -30.0   True              B                NaN
13   4.0  -40.0   True              B                NaN
14   5.0  -50.0   True              B                NaN
15   6.0  -60.0   True              B                NaN
16   7.0  -70.0   True              B                NaN
17   8.0  -80.0   True              B                NaN
18   9.0  -90.0   True              B                NaN
19  10.0 -100.0   True              B                NaN
20   1.0  210.0   True              C                NaN
21   2.0  220.0   True              C                NaN
22   3.0  230.0   True              C                NaN
23   4.0  240.0   True              C                NaN
24   5.0  250.0   True              C                NaN
25   6.0  260.0   True              C                NaN
26   7.0  270.0   True              C                NaN
27   8.0  280.0   True              C                NaN
28   9.0  290.0   True              C                NaN
29  10.0  300.0   True              C                NaN
30   1.1    NaN  False              A               11.0
31   2.2    NaN  False              A               22.0
32   3.3    NaN  False              A               33.0
33   4.9    NaN  False              A               49.0
34   1.2    NaN  False              B              -12.0
35   2.3    NaN  False              B              -23.0
36   3.5    NaN  False              B              -35.0
37   4.7    NaN  False              B              -47.0
38   4.9    NaN  False              B              -49.0
39   1.3    NaN  False              C              213.0
40   3.6    NaN  False              C              236.0




The full code in the Spotfire data function:
 

import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

Freq.name = 'Freq'
Gain.name = 'Gain'
IsSim.name = 'IsSim'
ColumnForGroup.name = 'ColumnForGroup'


data = pd.DataFrame({
    'Freq': Freq.reset_index(drop=True),
    'Gain': Gain.reset_index(drop=True),
    'IsSim': IsSim.reset_index(drop=True),
    'ColumnForGroup': ColumnForGroup.reset_index(drop=True)
})

print(data[0:100])

# Add a column to store the interpolated values
data['InterpGainFromSim'] = np.nan


# Function to perform interpolation for each group
def interpolate_group(group):
    # Reset index to avoid any NA issues
    group = group.reset_index(drop=True)
    # Separate the known and unknown data
    known = group[group['IsSim']]
    unknown = group[~group['IsSim']]

    if not known.empty and not unknown.empty:
        # Create interpolation function
        interpolation_function = interp1d(known['Freq'], known['Gain'], kind='linear', fill_value='extrapolate')

        # Apply interpolation to the unknown data and store the result in the new column
        # Interpolate and update the new column in the group
        interpolated_values = interpolation_function(unknown['Freq'])
        group.loc[unknown.index, 'InterpGainFromSim'] = interpolated_values
    return group


# Apply interpolation to each group
# interpolated_data = data.groupby('ColumnForGroup').apply(interpolate_group)
interpolated_data = data.groupby('ColumnForGroup',  group_keys=False).apply(interpolate_group).reset_index(drop=True)
# Extract the 'InterpGainFromSim' column as a Series
interp_gain_from_sim_series = interpolated_data['InterpGainFromSim']


When the data function is run I get the following error:

 

Could not execute function call 'CreateGainDeltaMeasSim' (2)


Error executing Python script:

KeyError: "None of [Index([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],\n      dtype='Int32')] are in the [columns]"

Traceback (most recent call last):
  File "data_function.py", line 364, in _execute_script
    exec(compiled_script, self.globals)
  File "<data_function>", line 45, in <module>
  File "groupby.py", line 1824, in apply
    result = self._python_apply_general(f, self._selected_obj)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "groupby.py", line 1885, in _python_apply_general
    values, mutated = self._grouper.apply_groupwise(f, data, self.axis)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "ops.py", line 919, in apply_groupwise
    res = f(group)
          ^^^^^^^^
  File "<data_function>", line 29, in interpolate_group
  File "frame.py", line 4108, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "base.py", line 6200, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "base.py", line 6249, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")


 

I think all my data is intact and I don't know where these ...0, 0,\n come from.
Any ideas?

Thanks in advance,

Joel

  • Solution
Posted

Your IsSim column is read as an "object" into Spotfire. Some data types need to be recast in Python data functions (notably dates; I did not know about booleans).
If I add the line below, to recast the column to a bool:
 

IsSim=IsSim.astype(bool)


just before defining data, the code runs without error in Spotfire (using your simulated data).
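
To illustrate why the cast matters, here is a minimal sketch, assuming the column reaches the script as a nullable integer Series (the traceback above reports dtype='Int32'). The small DataFrame is made up for the example. When the key inside df[...] is not boolean, pandas treats its values as column labels, which is where the "None of [Index([0, 0, ...])] are in the [columns]" message comes from.

```python
import pandas as pd

# Made-up data: IsSim arrives as nullable integers instead of booleans.
df = pd.DataFrame({
    "Freq": [1.0, 2.0, 1.5],
    "IsSim": pd.array([1, 1, 0], dtype="Int32"),
})

# An integer key is interpreted as a list of column labels, not as a mask.
try:
    df[df["IsSim"]]
    raised = False
except KeyError:
    raised = True

# After the cast, the same expression works as a boolean mask.
mask = df["IsSim"].astype(bool)
known = df[mask]
unknown = df[~mask]
```

Note that astype(bool) on a nullable column fails if it contains missing values, so empty IsSim cells would need to be handled first.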

Posted

Hi Gaia,

Thanks for your help. This line did indeed eliminate the error.
Values now appear in the output column, but the output column is not in sync with the original Freq and Gain columns.

I would appreciate any help.

Thanks,

Joel

 

Posted

I think I managed. Thanks.
It was probably some inconsistency in my data. I removed a lot of old data sources that were part of the eventual data table, and now it seems to work OK.
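
A note on the out-of-sync output: the thread attributes it to inconsistent source data, which may well be the whole story. Separately, the posted function resets the index inside each group and again after apply, so pandas concatenates the results group by group. If the output column is mapped back to the input rows by position, that reordering could produce a similar mismatch. The sketch below is one way to keep the original row order. It is not the code from the thread: interp_from_sim is a hypothetical helper name, the sample rows are made up, and np.interp is swapped in for interp1d (it clamps outside the simulated range instead of extrapolating).

```python
import numpy as np
import pandas as pd

def interp_from_sim(data):
    """Return interpolated gains aligned to data's original index (hypothetical helper)."""
    out = pd.Series(np.nan, index=data.index, name="InterpGainFromSim")
    is_sim = data["IsSim"].astype(bool)
    # .groups maps each group label to the original row labels of that group.
    for _, idx in data.groupby("ColumnForGroup").groups.items():
        grp = data.loc[idx]
        known = grp[is_sim.loc[idx]].sort_values("Freq")  # np.interp needs increasing x
        unknown = grp[~is_sim.loc[idx]]
        if known.empty or unknown.empty:
            continue
        # Write results back by original row label, so the order never changes.
        out.loc[unknown.index] = np.interp(unknown["Freq"], known["Freq"], known["Gain"])
    return out

# Made-up rows with the two groups interleaved.
data = pd.DataFrame({
    "Freq": [1.0, 1.0, 1.5, 1.5, 2.0, 2.0],
    "Gain": [10.0, -10.0, np.nan, np.nan, 20.0, -20.0],
    "IsSim": [1, 1, 0, 0, 1, 1],
    "ColumnForGroup": ["A", "B", "A", "B", "A", "B"],
})
result = interp_from_sim(data)
```

Because the result shares the input's index, it can be returned as a column that lines up with Freq and Gain row for row.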
