
Gaia Paolini

Spotfire Team
  • Posts: 803
  • Joined
  • Last visited
  • Days Won: 5

Gaia Paolini last won the day on July 2

Gaia Paolini had the most liked content!


Gaia Paolini's Achievements

Experienced (11/14)

  • Dedicated (Rare)
  • Problem Solver (Rare)
  • Week One Done
  • One Month Later
  • One Year In

Recent Badges: 7

Reputation: 35

Community Answers

  1. Pending your reply, this is what worked for me (but it generates 250, not 13420, for TI1). You want to create a group index so that when a Zone re-occurs, you don't treat it as the same Zone.

     Create a [rowID] calculated column defined as:

         RowId()

     Create a [STEP] column like this:

         SN(Lag(Concatenate([Wellname],[Zone]))!=Concatenate([Wellname],[Zone]),True)

     This will be True when it is the first element, or when a change happens from the previous line, including a change of well name. Then the [GROUP_INDEX] is:

         sum(Integer([STEP])) over (AllPrevious([rowID]))

     You can avoid defining [STEP] separately once you are happy it works. Then the [DELTA] (your Footage) for each Zone, taking repeats into account, is:

         Last([MD]) OVER (Intersect([Zone],[GROUP_INDEX],[Wellname])) - First([MD]) OVER (Intersect([Zone],[GROUP_INDEX],[Wellname]))

     However, now you have [DELTA] defined on every row. To make display simple, you could define an associated column [DELTA2] that is only defined at every step change:

         case when [STEP]=True then [DELTA] end

     Now it is easier to use in a cross table, with [Zone] and [Wellname] on the column axes and Sum([DELTA2]) as the cell values. (For a plain-Python illustration of the same grouping idea, see the pandas sketch after this list.)
  2. Could you try this:

         from Spotfire.Dxp.Application.Visuals import *

         # 'mychart' is assumed to be a script parameter pointing at the visual.
         # Cast the generic visual to a scatter plot, then clear its color axis.
         mychart = mychart.As[ScatterPlot]()
         mychart.ColorAxis.Coloring.Clear()
  3. OK, I can see the calculations in Excel. So I understand that you want to calculate the difference between the last and first each time the Zone re-occurs, then sum it over the zones. But I would have put a zero, not 13170, for T1, as there is only one row there and so no difference. So I would say the total you want is 10+70+40+30+0+50+50, not 10+70+40+30+13170+50+50. Can you confirm either way?
  4. Thank you, I think I can see what you are calculating, but I don't understand what you want to calculate. Right now you are calculating, for each Well and Zone, the difference between the last and the first value of MD. I can see that each Zone can be repeated. But, for instance, where does 13420 for TI1 come from? How would you calculate it, were it not in Spotfire but on a piece of paper?
  5. Your version should be OK. Please download the example dxp that comes with the Holt Winters data functions in the Community; you will find that there is already a pre-processing data function. The only thing you need to make sure of is to set the Test data input parameter to None in both the pre-processing and the Holt Winters data functions. In terms of data, it is your business case that will dictate how you aggregate it. I don't know the meaning of the different columns, so I can only advise you to look into it. It seems that at any given date the number of cpk values varies from 1 to 1300, so not all the suppliers, programs and critical parameters are represented at all dates.
  6. The way to make it work with automatic frequency is to set Gamma=false. But that takes away the seasonality, and with it the main advantage of Holt Winters. (See the seasonality sketch after this list for an illustration.)
  7. I tried on Spotfire 14.3 and it works. However, your data does not have any seasonal trend, so I suspect Holt Winters is not the appropriate method:
     • setting the tolerance very high during pre-processing, I get a lot of interpolated data on a daily basis;
     • applying Holt Winters gets me a forecast, but only if I set the frequency manually, and I do not see any frequency in the data.
  8. Your data is not ready to be used in the Holt Winters application:

     1) There are multiple values of cpk per date: your cpk column has up to 1340 values for the same date. Is this right, or do you need to slice by some other column? The supplied pre-processing data function can aggregate multiple values, as long as you think this is correct. Note that the Holt Winters application provided does not forecast by group.

     2) The data is not a regular time series: there are gaps in it, sometimes very big ones. You can close some gaps by aggregating via the pre-processing data function. I have tested it, and after aggregation the data appears weekly and has only 77 values. (A pandas sketch of this kind of aggregation follows the list.)

     However, I am seeing an issue with the TERR engine (I am on Spotfire 14.4). If you are on an earlier Spotfire version, can you do the following test? Upload your dataset as the Training dataset into the provided example Spotfire dxp, use the pre-processing data function before the Holt Winters one, and set the Test data to None.
  9. I am not a domain expert and that picture looks quite advanced. Can you describe a simple use case?
  10. Can you look at using something like Sum([distance]) OVER ([Zone])? This will calculate the sum within each zone. You can add multiple slicing columns (normally by wrapping the multiple columns in an Intersect(..)).
  11. I can think of doing it with 2 calculated columns. First define a document property, Nvalue, containing your desired N. Then define a column containing the sum of the N largest values, call it e.g. [sum_N_largest]:

          case when [var]>=NthLargest(DISTINCT([var]),${Nvalue}) then Sum([var]) end

      This will give you the sum over all values in the top N, but it will be empty for the rows where the value is not in the N largest. So the second column takes the max of that (since a defined value is always greater than None) and divides it by N. This would be your result column, if creating a calculated column, or the expression for your plot:

          Max([sum_N_largest])/${Nvalue}

      This column will be defined for every row. I notice that NthLargest only returns the true Nth value if all the records are distinct, so I amended the call to NthLargest by adding DISTINCT. I am not 100% sure of what you want to calculate, though; you can keep or remove the DISTINCT(..) as needed. (A short Python illustration of the intended result follows the list.)
  12. Excellent! Yes, I set unit='m' in the code I gave you; you can set units in meters, kilometers, miles or feet (with various spellings and abbreviations). It's all pretty new, so documentation, examples etc. are still in progress.
  13. You should be able to do so by adding the quarter column to the columns of the cross table.
  14. We have a Python option in the newly released spotfire-dsml library. What you need to do is install spotfire-dsml via Spotfire (Menu > Tools > Python Tools > Package Management), then use the following data function (the only slight modification I made to the code is to de-duplicate any id vector, as it expects unique ids).

      The distance method can be: 'haversine' (similar to the old TERR version), 'haversine_r' (slightly slower but more accurate) or 'geodesic' (slowest and most accurate). If buffer is not None, it applies a buffer cutoff to the returned distance matrix. Other than that, the inputs are the latitude, longitude and id for each dataset (lat1, lon1, id1, lat2, lon2, id2).

          # Import modules
          from spotfire_dsml.geo_analytics import distances, crs_utilities
          import pandas as pd

          # Make sure ids are not duplicated: append a running suffix to repeats
          def deduplicate_id(id):
              id_list = list(id)
              map_list = map(lambda x: x[1] + "_" + str(id_list[:x[0]].count(x[1]) + 1)
                             if id_list.count(x[1]) > 1 else x[1],
                             enumerate(id_list))
              return list(map_list)

          id1 = deduplicate_id(id1)
          id2 = deduplicate_id(id2)

          # Parameters: distance method, coordinate reference system, unit, optional buffer
          distance_method = 'haversine'
          crs = 'EPSG:4326'
          unit = 'm'
          buffer = None

          distance_matrix = distances.calculate_distance_matrix(crs, unit, buffer, distance_method,
                                                                lat1, lon1, id1, lat2, lon2, id2)
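
A few illustrative sketches follow, referenced from the answers above. First, for answer 1: a minimal pandas sketch of the same run-length grouping idea, using made-up data and the column names from the thread ([Wellname], [Zone], [MD]). It illustrates the logic only; the Spotfire expressions above are the actual solution.

    import pandas as pd

    # Toy data: zone A re-occurs after zone B, so it must form a new group
    df = pd.DataFrame({
        'Wellname': ['W1', 'W1', 'W1', 'W1', 'W1'],
        'Zone':     ['A',  'A',  'B',  'A',  'A'],
        'MD':       [100,  110,  150,  180,  190],
    })

    # STEP: True on the first row or whenever Wellname+Zone changes
    key = df['Wellname'] + df['Zone']
    step = key.ne(key.shift())

    # GROUP_INDEX: running count of step changes (the AllPrevious sum)
    df['GROUP_INDEX'] = step.cumsum()

    # DELTA per group: last MD minus first MD within each re-occurrence
    delta = (df.groupby(['Wellname', 'Zone', 'GROUP_INDEX'])['MD']
               .agg(lambda s: s.iloc[-1] - s.iloc[0]))
    print(delta)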
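
For answer 6: the Spotfire data functions run in TERR, but as a rough Python analogue of what setting Gamma=false means, here is a statsmodels sketch on synthetic data (an assumption for illustration, not the shipped data function). Dropping the seasonal component reduces Holt Winters to plain double exponential smoothing.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # Synthetic weekly series with an upward trend and a seasonal wave
    y = pd.Series(np.sin(np.linspace(0, 12 * np.pi, 120)) + np.linspace(0, 3, 120),
                  index=pd.date_range('2020-01-06', periods=120, freq='W'))

    # Full Holt Winters: level + trend + seasonality (Gamma active)
    with_season = ExponentialSmoothing(y, trend='add', seasonal='add',
                                       seasonal_periods=20).fit()

    # Gamma=false analogue: no seasonal component, only level + trend
    no_season = ExponentialSmoothing(y, trend='add', seasonal=None).fit()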
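
For answer 8: a hedged pandas sketch (not the supplied pre-processing data function) of the kind of aggregation described there: collapse the multiple cpk values per date, then regularize to a weekly grid. The column names 'date' and 'cpk' are assumed from the thread.

    import pandas as pd

    # Toy example: duplicate dates and an irregular gap
    df = pd.DataFrame({
        'date': pd.to_datetime(['2024-01-01', '2024-01-01', '2024-01-09', '2024-02-05']),
        'cpk':  [1.1, 1.3, 1.2, 1.4],
    })

    weekly = (df.groupby('date')['cpk'].mean()   # aggregate duplicates per date
                .resample('W').mean()            # snap to a regular weekly grid
                .interpolate())                  # close remaining gaps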
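
For answer 11: the intended top-N average written in plain Python/pandas with made-up values, to make the goal concrete (the calculated columns above are the in-Spotfire solution).

    import pandas as pd

    var = pd.Series([5, 3, 5, 9, 1, 7])
    N = 3

    # DISTINCT, then keep the N largest values (mirrors the NthLargest cutoff)
    top_n = var.drop_duplicates().nlargest(N)

    # Max([sum_N_largest]) / ${Nvalue}: average of the N largest distinct values
    result = top_n.sum() / N
    print(result)   # (9 + 7 + 5) / 3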