Gaia Paolini

Spotfire Team
Community Answers

  1. Gaia Paolini's post in Average of N largest values was marked as the answer   
    I can think of doing it with 2 calculated columns. 

    First define a document property, Nvalue, containing your desired N.

    Then define a column containing the sum of the N largest values, call it e.g. [sum_N_largest]:
    case when [var]>=NthLargest(Distinct([var]),${Nvalue}) then Sum([var]) end
    This will give you the sum over all values in the top N, but it will be empty for the rows where the value is not in the N largest.
    So the second column takes the max of that (since a defined value is always greater than an empty one) and divides it by N:
    Max([sum_N_largest])/${Nvalue}
    This is your result column, whether you create a calculated column or use it as the expression for your plot, and it will be defined for every row.
    I noticed that NthLargest only returns the true Nth value if all the records are distinct, so I amended the call to NthLargest by adding Distinct.
    I am not 100% sure of what you want to calculate though. You can keep or remove the Distinct(..) as needed.
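    For comparison, the same calculation as a quick Python/pandas sketch (hypothetical data; drop_duplicates mirrors the Distinct(..) call):

    import pandas as pd

    values = pd.Series([10, 8, 8, 7, 3])   # hypothetical data
    n = 3                                  # corresponds to ${Nvalue}
    # Average of the n largest distinct values
    avg_n_largest = values.drop_duplicates().nlargest(n).mean()
    print(avg_n_largest)                   # (10 + 8 + 7) / 3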
  2. Gaia Paolini's post in Customizing cross table totals was marked as the answer   
    You should be able to do so by adding the quarter column to the columns of the cross table
  3. Gaia Paolini's post in Spotfire Analyst 14.4 The data function '[Geospatial] Distance Matrix' could not be executed. was marked as the answer   
    We have a Python option in the newly released spotfire-dsml library.
    What you need to do is install spotfire-dsml via Spotfire (Menu > Tools > Python Tools > Package Management).

    Then use the following data function (the only slight modification I made to the code is to de-duplicate any id vector, as it expects unique ids):
    The distance method could be: 'haversine' (similar to the old TERR version), 'haversine_r' (slightly slower but more accurate) or 'geodesic' (slowest and most accurate).
    If buffer is not None, it applies a buffer cutoff to the returned distance matrix.
    Other than that, the inputs are the latitude, longitude and id for each dataset (lat1,lon1,id1,lat2,lon2,id2).
     
    # Import modules
    from spotfire_dsml.geo_analytics import distances, crs_utilities
    import pandas as pd

    # Make sure ids are not duplicated
    def deduplicate_id(id):
        id_list = list(id)
        map_list = map(lambda x: x[1] + "_" + str(id_list[:x[0]].count(x[1]) + 1) if id_list.count(x[1]) > 1 else x[1], enumerate(id_list))
        return list(map_list)

    id1 = deduplicate_id(id1)
    id2 = deduplicate_id(id2)

    distance_method = 'haversine'
    crs = 'EPSG:4326'
    unit = 'm'
    buffer = None

    distance_matrix = distances.calculate_distance_matrix(crs, unit, buffer, distance_method, lat1, lon1, id1, lat2, lon2, id2)
  4. Gaia Paolini's post in Python scripts in Spotfire producing different results than in Jupyter Notebooks was marked as the answer   
    The interpretation of a geopandas dataframe that is output into Spotfire from a Python data function is currently under review in Spotfire.
    For the time being, I suggest casting your output geopandas dataframe to a simple pandas dataframe and doing all the final data preparation (calculating bounds and centroid, turning geometry into WKB, and adding metadata) within your Python script.

    I have added these lines at the end of your code, generating a second output dataframe called PolyOutput2, then plotted it in the attached screenshot.

    I am not sure if this is the intended result, but it looks a bit better than what you are getting now and the bounds and centroid look consistent with your coordinates.

     
    # pandas and geopandas are assumed to be imported earlier in your script;
    # repeated here so the snippet is self-contained
    import pandas as pd
    import geopandas as gpd

    # Calculate bounds and centroid manually
    shape_bounds = PolyOutput.bounds
    shape_bounds = shape_bounds[['minx','maxx','miny','maxy']]
    shape_bounds.columns = ['XMin','XMax','YMin','YMax']
    shape_centroid_points = PolyOutput.centroid
    shape_centroid = pd.DataFrame({'XCenter':[],'YCenter':[]})
    shape_centroid['XCenter'] = shape_centroid_points.map(lambda p: p.x)
    shape_centroid['YCenter'] = shape_centroid_points.map(lambda p: p.y)

    # Translate geometry to WKB
    wkb_geometry = gpd.GeoSeries.to_wkb(PolyOutput['geometry'])

    # Downgrade to simple pandas data frame
    PolyOutput2 = pd.DataFrame({'geometry': wkb_geometry})
    PolyOutput2 = pd.concat([PolyOutput2, shape_bounds, shape_centroid], axis=1)

    # Geocode for Spotfire (should be the last statements before output)
    PolyOutput2['geometry'].spotfire_column_metadata = {"MapChart.ColumnTypeId": ["Geometry"], "ContentType": ["application/x-wkb"]}
    PolyOutput2['XMax'].spotfire_column_metadata = {"MapChart.ColumnTypeId": ["XMax"]}
    PolyOutput2['YMax'].spotfire_column_metadata = {"MapChart.ColumnTypeId": ["YMax"]}
    PolyOutput2['XMin'].spotfire_column_metadata = {"MapChart.ColumnTypeId": ["XMin"]}
    PolyOutput2['YMin'].spotfire_column_metadata = {"MapChart.ColumnTypeId": ["YMin"]}
    PolyOutput2['XCenter'].spotfire_column_metadata = {"MapChart.ColumnTypeId": ["XCenter"]}
    PolyOutput2['YCenter'].spotfire_column_metadata = {"MapChart.ColumnTypeId": ["YCenter"]}
  5. Gaia Paolini's post in Cumulated sum when One column is Zero Or one was marked as the answer   
    I understand that you already have a [Row_ID] column which is calculated as RowId()
    You don't say what [CalcCol2] is meant to represent; I assume it is the count of values in each group of [CalcCol1].
    The idea is to create a duplicate of [Col1] that increments every time [Col1] changes.
    Call it [Col1b] and define it as:
    Sum(Integer(SN(Lag([Col1]),[Col1])!=[Col1])) OVER (AllPrevious([Row_ID]))
    where SN(Lag([Col1]),[Col1])!=[Col1] is True when [Col1] changes and False when it stays the same.
    SN(..) is there to set the first value to [Col1]. Integer(..) turns True/False into 1/0.
    Then you calculate [CalcCol1] as:
    DenseRank([Row_ID],[Col1b])
    and [CalcCol2] as:
    Max([CalcCol1]) over ([Col1b])
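    If it helps to see the same logic outside Spotfire, here is a minimal Python/pandas sketch (hypothetical data; the Col1b values may be offset by 1 versus the Spotfire column, but only the grouping matters):

    import pandas as pd

    df = pd.DataFrame({'Col1': [0, 0, 1, 1, 1, 0]})   # hypothetical data
    # [Col1b]: increments every time Col1 changes (the Lag/SN trick)
    df['Col1b'] = (df['Col1'] != df['Col1'].shift()).astype(int).cumsum()
    # [CalcCol1]: running rank of each row within its Col1b group
    df['CalcCol1'] = df.groupby('Col1b').cumcount() + 1
    # [CalcCol2]: count of values in each group
    df['CalcCol2'] = df.groupby('Col1b')['CalcCol1'].transform('max')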
     
  6. Gaia Paolini's post in Render a hex colour in a column was marked as the answer   
    I guess you could do it with an IronPython script, but by far the easiest solution would be to create a custom colour scheme that you can then permanently assign to your column.
    To do that, you would first assign the corresponding colours to your variable within a visualization (say a bar chart).
    To save it as a colour scheme: go to the icon shown in the screenshot and click Save As > Document Color Scheme.

    Give it a name (e.g. hexColScheme), then go to the column properties and assign it as a categorical colour scheme (type the exact name).

    This way the scheme is kept with the variable, and every time you use that variable to colour, it will be used.
  7. Gaia Paolini's post in How to rank by multiple columns in a Calculated Column. was marked as the answer   
    To do it in steps:
    First define a column [Decode_Judge_Type] to 'decode' JUDGE_TYPE like this:
    case [JUDGE_TYPE] when 'Y' then 1 when 'N' then 2 when 'H' then 3 else 99 end
    Then define a column [Rank_Process_Time] to rank PROCESS_TIME (most recent first) like this:
    DenseRank([PROCESS_TIME],'desc',[PRODUCT_ID],[CHECK_TYPE])
    [Note that ranking can be done over multiple columns.]
    Then create your result like this:
    case
      when ([Rank_Process_Time]=1) and ([Decode_Judge_Type]=1) then 1
      when ([Rank_Process_Time]=1) and ([Decode_Judge_Type]=Min([Decode_Judge_Type]) over ([PRODUCT_ID],[CHECK_TYPE])) then 1
    end
    The empty rows can then be filtered out. See picture. Given the small example provided it looks ok, but I don't know if there are any edge cases.

  8. Gaia Paolini's post in Cumulative count between dates (Calculated column) was marked as the answer   
    I could think of a way without data functions, but it is a bit involved, and the final data structure is not exactly what you wanted. 
    Maybe there is an easier way?  Otherwise I would do it in a data function.

    You start with one 'initial' dataset and one 'additional' dataset.
    Read them in as separate data tables, with Dose DateTime read as type DateTime.
    Change the Dose DateTime column name to 'Initial' in one table and 'Additional' in the other.
    Add a new table with the + symbol on the left, choose 'Other' then select the 'initial' table.
    Rename it to e.g. 'combined'
    In data canvas, add transformation 'add columns' and bring in the 'additional' dataset from the existing table in the analysis
    In settings for added columns: join by Patient only, and select Full Outer Join as join type. In 'columns from new data' only import the column you named 'Additional'.
    This will do a cartesian join within each [Patient]; now you have to work out which rows make sense.
    On the 'combined' table, create a calculated column [Delta] as:
        DateDiff('day',[Initial],[Additional])
        
    Now create your target column [Goal] as:
       1 + sum(Integer(([Delta]>=0) and ([Delta]<=2))) OVER ([Patient],[Initial])
       
     The logical clause is: the time difference is at least zero and at most two days.
     Integer() turns True/False into 1/0.
     So when you sum, you ignore the zeros.
     You add 1 as you need to count the initial dose itself.
     You group by Patient and initial dose.
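     As a cross-check of the logic (not part of the Spotfire solution), the same computation as a Python/pandas sketch with hypothetical data:

     import pandas as pd

     initial = pd.DataFrame({'Patient': ['A', 'A'],
                             'Initial': pd.to_datetime(['2023-01-01', '2023-01-10'])})
     additional = pd.DataFrame({'Patient': ['A', 'A', 'A'],
                                'Additional': pd.to_datetime(['2023-01-02', '2023-01-03', '2023-01-11'])})

     # Full outer join on Patient = cartesian join within each patient
     combined = initial.merge(additional, on='Patient', how='outer')
     combined['Delta'] = (combined['Additional'] - combined['Initial']).dt.days
     in_window = ((combined['Delta'] >= 0) & (combined['Delta'] <= 2)).astype(int)
     # [Goal]: 1 (the initial dose itself) + additional doses within two days
     combined['Goal'] = 1 + in_window.groupby([combined['Patient'], combined['Initial']]).transform('sum')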

     
  9. Gaia Paolini's post in Get second Maximum using expression... was marked as the answer   
    You could define a column that ranks your [timeperiod] in descending order:
    DenseRank([timeperiod],'desc')
    Then use it to find the value of [timeperiod] for which the expression DenseRank([timeperiod],'desc')=2 is at its maximum (it is max when the statement is true), like this:
    ValueForMax(DenseRank([timeperiod],'desc')=2,[timeperiod])
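    For reference, the same lookup in Python/pandas (hypothetical data):

    import pandas as pd

    df = pd.DataFrame({'timeperiod': [2020, 2021, 2021, 2022]})   # hypothetical data
    # Second-highest distinct value of timeperiod
    second_max = df['timeperiod'].drop_duplicates().nlargest(2).iloc[-1]
    print(second_max)   # 2021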
  10. Gaia Paolini's post in How can I delete a data canvas operation with an iron python script? was marked as the answer   
    I have now found what looks like a complete solution. I was reminded that removing rows is a destructive operation, in the sense that a new source view is generated and the old one becomes invalid.

    from Spotfire.Dxp.Data.DataOperations import DataSourceOperation
    from Spotfire.Dxp.Data.DataOperations import DataOperation

    # tbl is an input parameter of type Data Table.

    ## functions
    def find_remaining_operations(sv):
        allOps = sv.GetAllOperations[DataOperation]()
        numOps = 0
        for op in allOps:
            if type(op).__name__ == "RemoveRowsOperation":
                numOps = numOps + 1
        return numOps

    ###---------------------------------------------
    # Remove last RemoveRows operation
    sourceView = tbl.GenerateSourceView()

    # remove operation is destructive and sourceView is updated after every remove
    while find_remaining_operations(sourceView) > 0:
        op = sourceView.LastOperation
        if type(op).__name__ == "RemoveRowsOperation" and \
                sourceView.CanRemoveOperation(op):
            sourceView = sourceView.RemoveOperation(op)

    # Find how many of these operations are left - this is now just optional
    # The document property needs to exist already (type Integer)
    numRemoveRowsOperations = find_remaining_operations(sourceView)
    Document.Properties['numRemoveRowsOperations'] = numRemoveRowsOperations
  11. Gaia Paolini's post in Market Basket Analysis - Data Function Help was marked as the answer   
    It appears to be a lack of defensive coding in the data function. Your input parameters (I think primarily the minimum support) do not return any rules, and the rules data frame should have been returned empty instead of throwing an error.
    I fixed it so that if no rules are found, it returns a data frame with one row, which should help a bit. See below.
    # Import pandas
    import pandas as pd

    # MBA packages
    # Import the transaction encoder function from mlxtend
    from mlxtend.preprocessing import TransactionEncoder
    # Import Apriori algorithm
    from mlxtend.frequent_patterns import apriori
    # Import the association rule function from mlxtend
    from mlxtend.frequent_patterns import association_rules

    # List transactions by group
    trans_df = customer_df.groupby(['Invoice', 'Customer_ID'])['CategoryGroup'].apply(list).reset_index(name='Transaction')
    trans_df['Transaction'] = trans_df['Transaction'].apply(lambda x: list(set(x)))

    # Data preprocessing
    trans_list = trans_df['Transaction'].to_list()
    encoder = TransactionEncoder()
    encode_arr = encoder.fit_transform(trans_list)

    # Converting to dataframe
    encode_df = pd.DataFrame(encode_arr, columns=encoder.columns_)

    # Compute frequent itemsets using the Apriori algorithm
    frequent_itemsets = apriori(encode_df, min_support=min_support, max_len=max_len, use_colnames=True)

    # Compute all association rules for frequent_itemsets
    rules = association_rules(frequent_itemsets, metric="lift", min_threshold=min_lift)

    if rules.shape[0] > 0:
        # Clean rules
        rules['antecedents'] = rules['antecedents'].apply(lambda x: ','.join(list(x))).apply(lambda x: x.replace(',', '|'))
        rules = rules[(rules['consequents'].apply(lambda x: len(x) == 1))]
        rules['consequents'] = rules['consequents'].apply(lambda x: ','.join(list(x))).apply(lambda x: x.replace(',', '|'))
    else:
        # No rules found: return a one-row data frame instead of throwing an error
        rules = pd.DataFrame(index=range(1), columns=rules.columns)
        rules['antecedents'] = 'None found'
        rules['consequents'] = 'None found'
        rules.fillna(0, inplace=True)

    # Copyright (c) 2024. TIBCO Software Inc.
    # This file is subject to the license terms contained in the license file that is distributed with this file.
  12. Gaia Paolini's post in I would like to replace data(Replacedata()) using ironPython script from Library Folder in Spotfire Server. but it fails. Please let me know how to avoid it. was marked as the answer   
    Your csv data would be stored in the library as an sbdf. The way to retrieve something from the library is quite different.
    This code should work. Set libraryPath to your library path (excluding the word 'Library'), ending with '/', and set your file name in filePath accordingly.
    The variable table is an input parameter of type DataTable.

    from Spotfire.Dxp.Data import *
    from Spotfire.Dxp.Framework.Library import LibraryManager, LibraryItemType
    from Spotfire.Dxp.Data.Import import SbdfLibraryDataSource

    libraryPath = '/yourpath/'
    filePath = 'yourfile'

    manager = Application.GetService[LibraryManager]()
    (found, item) = manager.TryGetItem(libraryPath + filePath, LibraryItemType.SbdfDataFile)

    print(found)
    if found:
        ds = SbdfLibraryDataSource(item)
        table.ReplaceData(ds)
  13. Gaia Paolini's post in Is it possible to set MARK using ironPython script in Sopotfire? was marked as the answer   
    Yes. See for instance this article:
    https://community.spotfire.com/s/article/How-to-Mark-Rows-based-on-Document-Property-Value-in-TIBCO-Spotfire-Using-IronPython-Scripting
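    As a rough illustration of the pattern that article describes, a minimal IronPython sketch (the table, marking, column and property names here are hypothetical):

    from Spotfire.Dxp.Data import DataValueCursor, IndexSet, RowSelection

    table = Document.Data.Tables['MyTable']        # hypothetical table name
    marking = Document.Data.Markings['Marking']    # hypothetical marking name
    cursor = DataValueCursor.CreateFormatted(table.Columns['Category'])

    # Collect the indices of rows whose Category matches a document property
    rows = IndexSet(table.RowCount, False)
    for row in table.GetRows(cursor):
        if cursor.CurrentValue == str(Document.Properties['myProp']):
            rows.AddIndex(row.Index)

    # Set the marking to exactly those rows
    marking.SetSelection(RowSelection(rows), table)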
  14. Gaia Paolini's post in How to change multiple columns data type in the script with IronPython was marked as the answer   
    I finally worked it out with the help of Spotfire Engineering: the expression
    ColumnSelection(column_name)
    needs the column name to be surrounded by double quotes inside the single quotes (sorry, it is difficult to see):
    ColumnSelection('"'+column_name+'"')
    to stop it searching for 'c' and also grabbing 'cola'. I tried it with your code and your latest generated example and it seems ok.
    Can you let me know if it works?
    Note that this cannot be applied to calculated columns, only to imported ones.
    from Spotfire.Dxp.Data.Transformations import ExpressionTransformation, ColumnSelection
    from Spotfire.Dxp.Data import *

    table = Document.Data.Tables['df2']
    t = ExpressionTransformation()
    t.ColumnReplacements.Add('sub_id', 'integer([sub_id])', ColumnSelection('sub_id'))
    table.AddTransformation(t)
    print("Done Changing wafer_id Datatypes....")

    ######################################################
    ## functions
    ######################################################
    def changeColumnDataTypeToReal(table, column, transformation):
        rowsToInclude = column.RowValues.ValidRows
        cursor = DataValueCursor.CreateFormatted(column)
        column_name = column.Name
        values = []
        for row in table.GetRows(rowsToInclude, cursor):
            values.append(cursor.CurrentValue)
        try:
            values = [float(x) for x in values]
            transformation.ColumnReplacements.Add(
                column_name,
                'real([' + column_name + '])',
                ColumnSelection('"' + column_name + '"')
            )
            table.AddTransformation(transformation)
            print('column changed to real:', column_name)
            #print ('real(['+column_name+'])')
        except:
            print('something went wrong, probably column was not numeric:', column_name)
            pass
        return

    ######################################################
    table = Document.Data.Tables['df2']
    t = ExpressionTransformation()
    columns = list(table.Columns)

    # input parameter
    use_case = 1
    if use_case == 1:
        excluded_columns = ['cola', 'id', 'sub_id']
        start_index = 3
    else:
        excluded_columns = ['cola', 'id', 'sub_id']
        start_index = 0

    for cc in columns[start_index:]:
        #print (cc)
        if cc.Name not in excluded_columns and cc.DataType != DataType.Real:
            changeColumnDataTypeToReal(table, cc, t)
  15. Gaia Paolini's post in Adding running/cumulative count to a data table was marked as the answer   
    Would this work:
    case
      when Trim([Treatment])="DrugA" then DenseRank(RowId(),[Patient])
      when ([Treatment] is null) and (Trim(LastValidBefore([Treatment]))="DrugA") then DenseRank(RowId(),[Patient])
    end
    I only used Trim(..) as in my case I had some spurious spaces in the drug name; you probably don't need it if the field does not have leading or trailing spaces.
  16. Gaia Paolini's post in Substracting Dynamically Previous Row Values in CrossTab was marked as the answer   
    It can be done in Spotfire if you add a row definition that creates a sort order that reflects the values of the weighted average.
    Note that this means the sorting is fixed.
    Horizontal: (Column Names)
    Vertical:
    <DenseRank(WeightedAverage([Volume],[Price]) OVER ([Company]),"desc") as [Rank] NEST [Company]>
    Note that in Settings you need to check the 'Current filtering only' option.
    Cell values:
    WeightedAverage([Volume],[Price]) as [Avg Price],
    WeightedAverage([Volume],[Price]) THEN [Value] - First([Value]) OVER (Previous([Axis.Rows])) as [Difference]
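    To see the same arithmetic outside the cross table, a small Python/pandas sketch (hypothetical data):

    import pandas as pd

    df = pd.DataFrame({'Company': ['A', 'A', 'B', 'C'],
                       'Volume': [10, 30, 20, 5],
                       'Price': [1.0, 2.0, 3.0, 4.0]})   # hypothetical data

    # Volume-weighted average price per company, i.e. WeightedAverage([Volume],[Price])
    wavg = (df['Volume'] * df['Price']).groupby(df['Company']).sum() / df.groupby('Company')['Volume'].sum()

    # Sort descending (the fixed sort order), then difference from the previous row
    out = wavg.sort_values(ascending=False).to_frame('Avg Price')
    out['Difference'] = out['Avg Price'] - out['Avg Price'].shift(1)
    print(out)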
  17. Gaia Paolini's post in When I imported the string “2023/01/01 0:00:00” using the ReplaceData method, it became a Date type. Is it possible to import it as a DateTime type? was marked as the answer   
    The simplest way, without changing your script too much, would be to add a datatype transformation at the end.
    Your first file is imported using the most likely data type settings: since your date has no time, it is set to a Date (interestingly, when I run it, it sets it as a String). And since you are copying all the settings from the first file, the other files get the same Date type as well.
    Try this: I added an import at the top and a transformation at the end:
    # Copyright © 2022. TIBCO Software Inc. Licensed under TIBCO BSD-style license.
    # Replace tables from files

    import clr
    import Spotfire.Dxp.Data.DataTable
    from Spotfire.Dxp.Data import *
    from Spotfire.Dxp.Data.Transformations import ExpressionTransformation, ColumnSelection

    clr.AddReference("System.Windows.Forms")
    from System.Windows.Forms import OpenFileDialog

    print "Start"

    myDataManager = Document.Data
    #myDataManager = Document.Data['BOM_Detail']
    #myDataManager = table

    d1 = OpenFileDialog()
    d1.Multiselect = True             # lets you select more than one file
    d1.InitialDirectory = 'C\\Data'   # the folder containing your source data files
    d1.ShowDialog()

    files = d1.FileNames

    # lazy workaround: replace data with first file then add the other selected files
    ds = myDataManager.CreateFileDataSource(files[0])
    table.ReplaceData(ds)

    for ff in files[1:]:
        ds = myDataManager.CreateFileDataSource(ff)
        settings = AddRowsSettings(table, ds)
        table.AddRows(ds, settings)

    print "Complete"

    # Added: transform the column back to DateTime at the end
    transformation = ExpressionTransformation()
    column_name = 'Update DateTime'
    transformation.ColumnReplacements.Add(
        column_name,
        'DateTime([' + column_name + '])',
        ColumnSelection(column_name)
    )
    table.AddTransformation(transformation)
  18. Gaia Paolini's post in Property value in CASE WHEN Statement was marked as the answer   
    What happens is that the whole expression is translated using the current value of the Metrics document property.
    So when Metrics is equal to Region, this is the expression (in the preview):
    case WHEN "Region"="Region" then UniqueCount([Region]) else SUM([Region]) END
    It tries to parse the expression and finds that you cannot evaluate Sum of Region, regardless of whether that statement is reached or not.
    A way to fool the expression interpreter that worked for me:
    case WHEN "${Metrics}"="Region" then UniqueCount([Region]) else SUM(Real([${Metrics}])) END
    You are obviously not going to turn Region into a real and sum it, but the interpreter seems satisfied that the expression is correct with respect to the data types used.
  19. Gaia Paolini's post in How to replace cell values with IronPython? was marked as the answer   
    See my reply on Stack Overflow: https://stackoverflow.com/questions/77244449/spotfire-ironpython-replacing-data-table-values
  20. Gaia Paolini's post in How Do I go about adding R script "geosphere" into my Spotfire analysis? was marked as the answer   
    Let me know if this helps:
    https://support.tibco.com/s/article/Tibco-KnowledgeArticle-Article-44673#:~:text=1)%20From%20the%20menu%2C%20click,use%2C%20and%20then%20click%20Load
  21. Gaia Paolini's post in Compare with values in other columns within an Over row as a calculated column. was marked as the answer   
    I could only achieve it by calculating a column using a TERR (basically R) expression function.
    Unfortunately there is no Python version for expression functions.
    What happens is you create this expression function and it becomes available to you for calculating a new column.
    This video might help:

    I only tested it with the sample data you provided.
    The syntax to call it would be:
    TERR_CalculateMatchingCount([Product Number],[Process Number],[X],[Y])
    You use this formula in a custom expression editor to calculate your new column.
    You need to load the R library data.table into Spotfire to use it.
    To do that: 
    go to top menu > Tools > TERR Tools 
    choose tab: Package Management
    Load the CRAN Package Repository
    into Available Packages type 'data.table'
    when it appears in the window below, select it and click Install.
    then Close and exit.
    To create a TERR Expression Function:
    go to top menu > Data > Data Function Properties
    choose tab Expression Functions
    click on New...
    give it the Name TERR_CalculateMatchingCount
    Function type: Column function
    Return type: Integer
    Category: Statistical functions
    Copy and paste the attached script.
    The script has 4 inputs (input1,input2,input3,input4) which will be the entire column specified at the start.
    It has one output, which is the new column.
    There are some hard-coded values there: 1000 and 2000 as process number values, and 2 as the maximum distance between coordinates.
    For each product number, distances are calculated between the rows with the selected process numbers.
    Then only the distances between the 1000 and 2000 rows that are less than or equal to 2 are kept.
    These are accumulated for each product number, then the counts of rows that are in the desired range are summed up and merged back to the original dataset.
    At this point, the count column is extracted and returned.
    It needs merging as it needs to be returned in exactly the same order that the data came in.
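    The attached script itself is TERR (R); purely to illustrate the logic just described, here is a hypothetical Python sketch (the function name, column handling and exact matching semantics are mine and may differ from the attached script):

    import numpy as np
    import pandas as pd

    def calculate_matching_count(product, process, x, y, max_dist=2.0):
        # For each product number, compare rows with process 1000 against rows
        # with process 2000 and count the 2000-rows within max_dist of each 1000-row
        df = pd.DataFrame({'product': product, 'process': process, 'x': x, 'y': y})
        counts = pd.Series(0, index=df.index)
        for _, g in df.groupby('product'):
            a = g[g['process'] == 1000]
            b = g[g['process'] == 2000]
            for i in a.index:
                d = np.hypot(b['x'] - df.at[i, 'x'], b['y'] - df.at[i, 'y'])
                counts.at[i] = int((d <= max_dist).sum())
        # Aligned to the original row order throughout, so no merge is needed here
        return counts.tolist()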
  22. Gaia Paolini's post in I want to get the last value of a particular group through the calculated column. was marked as the answer   
    Your original formula threw an error for me, I think because Last does not work after Over.
    Can you try this:
    ValueForMax([Production Time],[Target Value]) over ([Product ID],[Process Name])
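    In Python/pandas terms the expression corresponds to something like this (hypothetical data):

    import pandas as pd

    df = pd.DataFrame({'Product ID': [1, 1, 2],
                       'Process Name': ['P', 'P', 'P'],
                       'Production Time': [1, 2, 1],
                       'Target Value': [10.0, 20.0, 30.0]})   # hypothetical data

    # [Target Value] at the row with the max [Production Time] within each group
    df['Last'] = (df.sort_values('Production Time')
                    .groupby(['Product ID', 'Process Name'])['Target Value']
                    .transform('last'))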
  23. Gaia Paolini's post in Add different columns representing similar data on different days to x-axis and create a line chart was marked as the answer   
    I think it can be done; I put together a Spotfire dxp as an example. It looks like a lot of steps, but it is easy once you have done it a few times.
    I unpivoted the data in the data canvas.
    I added a [Day] calculated column to extract the numbers from the content of the column, as: Integer(RXExtract([Category],'\d+',1))
    I changed the colour of the marking to yellow: top menu > File > Document Properties > Markings.
    I created a line chart and changed its appearance so marking would appear as a separate colour: right click on plot > Properties > Appearance > Use separate color for marked items.
    I coloured the chart by [batch] but kept all the colours grey: right click on plot > Properties > Colors.
    I changed the Y limits to 60 and 105: right click on plot > Properties > Y Axis > Range.
    I added a calculated column [Avg_value] to represent the average: top menu > Data > Column properties > Insert: Avg([Value]) over ([Category])
    In the plot, I added a line representing this column: right click on plot > Properties > Lines & curves > Add > Line from column values.
    I adjusted line thickness in both normal lines and average. When you click on a line, it is marked in yellow.

  24. Gaia Paolini's post in Static Sub-totals when filtering a crosstable was marked as the answer   
    The only way I could find to see only the subset of filtered data and to preserve the percentages was to define a calculated column to represent your total count. Calculated columns do not react to filters.
    So if you change your expression to this, where [TotalTotal] = Count() was your new calculated column
    Count() / Max([TotalTotal]) as [%Total]
    ...then it should work. I had to add Max() as it wants an aggregation.
    You could try using a dropdown selection of the Body System and then setting a rule in Show/Hide items, but I could only make it work if only one Body System is selected each time.