Posted (edited)

Is there a known bug, or any known trigger, that would tell me when I need to double-check the data-on-demand settings of my information links?

I have a dashboard with dozens of information links connected. They are part of a huge pipeline in which each step creates the next table, with calculated columns that depend on user interaction to transform everything.

Automatic loading did not work for us: it seems to create loops when I add a button that updates a table, which makes Spotfire very slow. So I deactivated ALL options for automatically reloading the data links and data on demand. However, from time to time the dashboard gets slow again. Then I have to go through every link and check the auto-reload option, and I usually find one that is somehow checked again (reload automatically activated), which slows the pipeline down again.

Any clues as to why this could be happening?

I'm mostly creating new data functions and calculated columns, adding columns to existing tables, and occasionally renaming a column. The new or changed columns are not used by any calculations or scripts.
Spotfire 11.4.6

Thank you!

[Attached image: image.png]

Edited by Henry Heberle
Posted

Hi Henry,

When you create a data function, you can also check a box to refresh the function automatically:
[Attached image: image.png]

Could it be that you set the data function to refresh automatically and then your on-demand loading also kicks off?
Or are your data functions always refreshed manually?

Kind regards,

David

Posted

Hi David, the problem is happening in a pipeline without data functions, although we do have data functions that build other tables unrelated to the ones in this post.

It usually has one initial Information Link (SQL) with data on demand as a starting point, then I join a bunch of other tables in the data canvas, creating a long pipeline. At the end I have a Result Table. This is the table I tell Spotfire to reload with IronPython scripts, sometimes all data and sometimes only linked data, depending on the need. For this reason, it's very important that the "join connectors" don't reload automatically. I wish we could leave them on automatic, so that the pipeline would update from the point where that connection starts (e.g., from the middle). However, in our tests this seemed to create loops or something similar and made the dashboard extremely slow, so we stick to refreshing the pipelines in the data canvas with scripts only, never with auto-reload.

Typically, it is the join with other tables that somehow gets checked again by Spotfire itself. I couldn't find out when this happens; it feels really random. I have been changing Information Links here and there, such as adding columns to the IL, which seems to have caused some problems with broken filter schemes. Maybe this issue is related? I don't know. We have dozens of tables with hundreds of columns each, so it's getting a bit difficult to keep up with double-checks and performance tests.

By the way, if you have any tips on how to create automated checks, please let me know. For example: track the time of each small step and store the average time of those steps in a table, so we can compare the current time with previous times and averages. For data functions and IronPython I can imagine how to track time, but for the information links/SQL, the joins (the connector with the auto-reload settings) and the calculated columns, it's unclear to me.
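The bookkeeping part of this idea (record a duration per named step, keep the history, flag steps that suddenly take much longer than their past average) can be sketched in plain Python. This is only a minimal sketch: `StepTimer`, `record`, `run`, and `slow_steps` are hypothetical names, the history lives in memory only, and it can only time steps that a script triggers and waits for. It does not answer how to time joins or calculated columns inside Spotfire.

```python
import time


class StepTimer(object):
    """Collects durations per named step and compares them to past averages."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable clock, so the class can be tested
        self.history = {}     # step name -> list of durations in seconds

    def record(self, step, seconds):
        """Store one measured duration for a step."""
        self.history.setdefault(step, []).append(seconds)

    def run(self, step, func, *args, **kwargs):
        """Call func, store how long the call took, and return its result."""
        start = self._clock()
        try:
            return func(*args, **kwargs)
        finally:
            self.record(step, self._clock() - start)

    def average(self, step):
        """Mean duration of a step, or None if it was never measured."""
        runs = self.history.get(step, [])
        return sum(runs) / float(len(runs)) if runs else None

    def slow_steps(self, factor=2.0):
        """Steps whose latest run exceeded factor times the mean of earlier runs."""
        flagged = []
        for step, runs in self.history.items():
            if len(runs) < 2:
                continue  # no earlier runs to compare against
            baseline = sum(runs[:-1]) / float(len(runs) - 1)
            if baseline > 0 and runs[-1] > factor * baseline:
                flagged.append((step, runs[-1], baseline))
        return sorted(flagged)
```

In a button script, a reload call could be wrapped as `timer.run("reload result table", some_reload_function)`, where `some_reload_function` is a placeholder for whatever the script already does. If the reload runs asynchronously, the measured time would only cover the call itself, not the actual load.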

Posted

Hi Henry,

Thank you for the explanation. I think it depends on how you create your pipelines. Are they built from a combination of Information Links that are pulled in directly, or do the tables already reside within Spotfire (that is, you basically 'copy' them into the pipeline)?
If the latter is the case, you could 'simply' tell the IronPython script to load the parts you want reloaded. If not, I guess the reload is done for the whole pipeline (unless you are able to single out the specific Information Link that needs reloading).
I don't know what your script looks like, but would this reference be of any help?

 

Concerning your last question, you can see some information about the overall loading time and data table size under Help --> Support Diagnostics and Logging:
[Attached image: image.png]

Besides that, if you really want to compare different 'versions' of your analysis file, I would also recommend having a go at the Application Profiler (Tools --> Diagnostics --> Application Profiler) or the Qualification Tool (Tools --> Qualification --> Compare, one entry below the Application Profiler):
[Attached image: image.png]

Kind regards,

David

Posted

Thanks for sharing, David! When I hit a blocker again, I will come back to these tips to check running times.

Ah, I forgot to mention the main smell that makes me look for "reload automatically" in all connectors in our data canvas: I make a pop-up appear (the normal one showing processes in Spotfire) whenever the user clicks one of the buttons that refresh the tables. Usually it goes fine and fast, but sometimes it gets stuck on the message "Creating output data" and doesn't go any further, even though we might be working with the same data. That's when I start checking the automatic reload settings, and I typically find that it was activated again somewhere in the pipelines of the data canvas. Then I deactivate it, run it again, and the tasks run normally without getting stuck on "Creating output data". But after a few weeks of work I may face the same problem again, in the same or a different table: the setting is suddenly 'true' and no longer 'false'.

The reason I can think of is this: the button/script refreshes multiple dependency tables (which may themselves form a chain of dependencies) before refreshing the final table. If any of them is set to reload automatically, I believe Spotfire tries to refresh the final result table multiple times: once because the script refreshes it, and again each time a dependency refreshed by the script triggers an automatic reload (?). That would generate redundant and conflicting refresh actions.
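This hypothesis can be illustrated with a toy model. It is not how Spotfire actually schedules refreshes (that is not known here); it only counts how often each table would be refreshed if every automatic reload simply cascaded downstream. The function name `count_refreshes` and the table names are made up for the example, and the auto-reload links are assumed to contain no cycles.

```python
def count_refreshes(script_order, auto_edges):
    """Count refreshes per table in a toy cascade model.

    script_order: tables the script refreshes explicitly, in order.
    auto_edges:   dict mapping an upstream table to the downstream tables
                  that reload automatically whenever it refreshes.
                  Assumed to be acyclic, otherwise this would never finish.
    """
    counts = {}

    def refresh(table):
        counts[table] = counts.get(table, 0) + 1
        for downstream in auto_edges.get(table, []):
            refresh(downstream)  # an automatic reload cascades downstream

    for table in script_order:
        refresh(table)
    return counts


# Script refreshes A, then B, then the result table.
order = ["A", "B", "Result"]

# All connectors manual: every table is refreshed exactly once.
manual = count_refreshes(order, {})

# Two connectors on auto-reload: the result table is refreshed three times.
cascading = count_refreshes(order, {"A": ["B"], "B": ["Result"]})
```

In this model a single connector left on automatic is enough to add extra refreshes of the final table, and the count grows with the length of the dependency chain.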

Is there a script that checks all connections (entry points in the pipeline) and prints the data source and the status (true/false) of this setting? That would be a great check, because I could very quickly verify that all of them are false.
[Attached image: image.png]
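The reporting half of such a check can be sketched without touching the Spotfire API. The sketch below assumes the settings have already been collected into a plain dictionary keyed by `(table, source)`, with `True` meaning "reload automatically" is on. How to read that flag from the data canvas in IronPython is deliberately not shown, because the exact property is not confirmed in this thread. `find_auto_reload` and `diff_settings` are hypothetical helper names.

```python
def find_auto_reload(settings):
    """Return the (table, source) keys whose auto-reload flag is True, sorted."""
    return sorted(key for key, flag in settings.items() if flag)


def diff_settings(baseline, current):
    """Return entries that are new or whose flag differs from the baseline.

    The result maps each key to a (previous, current) pair; previous is
    None for entries that were not present in the baseline.
    """
    changed = {}
    for key, flag in current.items():
        previous = baseline.get(key)
        if previous != flag:
            changed[key] = (previous, flag)
    return changed


# Example data with made-up table and source names.
baseline = {
    ("Result Table", "IL Orders"): False,
    ("Result Table", "IL Customers"): False,
}
current = {
    ("Result Table", "IL Orders"): False,
    ("Result Table", "IL Customers"): True,   # flipped since the baseline
}

for table, source in find_auto_reload(current):
    print("auto-reload is ON: %s <- %s" % (table, source))
```

Saving a baseline after each working session and diffing against it the next time would also narrow down when a flag flipped, which might help connect it to a specific kind of change.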
