Jump to content
  • Spotfire Copilot: Interact with Spotfire in human language!


    This article introduces Spotfire Copilot. The Spotfire Copilot™ artificial intelligence tool is a free offering licensed under the Apache License 2.0 for use with Spotfire. As the tool is not part of the commercial Spotfire product, Cloud Software Group, Inc. does not offer paid support or maintenance for Spotfire Copilot.

    Check out the Spotfire Copilot AI tool, now available for download from the exchange. With the free add-on, any organization using the Spotfire® platform can stand up their own private instance of the Spotfire Copilot tool.

    What if you could ask Spotfire, "What is the best-selling product and its quarterly revenue" in this dashboard? Or, after installation, ask right inside Spotfire, "How do I add a new Python library?" The way we interact with computers has moved from cryptic punch cards to keyboards and mice to natural language. Through the Spotfire Copilot tool, we are on the cusp of offering our vast user community the same modern and intuitive experience inside the product. See this example where a Spotfire end user benefits from the Spotfire Copilot tool's automatic visual analytics prowess while having a smart conversation about formation pressure breakdown in an oil rig.

    image.thumb.png.e2ed1648952aee11ad370603340aae07.png

    The article below is a window into what we are currently designing, the architecture, and the use cases.

    The Spotfire Copilot core components

    The question-answering about oil rigs is done simply through the Spotfire platform's extensibility feature (data functions) calling OpenAI's API on the backend. While this example is very easy to build, we are continuing to work on much more significant enhancements to the Spotfire Copilot tool.

    image.thumb.png.41129797b6bb30cf6ac7acc4ad62d2e6.png

    From the OpenAI ChatGPT data function, we worked on creating a question-answering system, centric to Spotfire. The architecture is covered in more detail below. We "fed" the model Spotfire product documentation and multimedia. Thus, the user can ask a Spotfire product question and receive an answer (with citations if applicable).

    image.thumb.png.3dcc616534077af4ab8b083c1ff1dea1.png

    Our latest work is around autochart generation in Spotfire. Most major LLMs, including ChatGPT, are currently limited to producing text responses. Of course, our Spotfire users are interested in automating and maintaining Spotfire dashboards composed of visuals. The Spotfire Copilot tool has now evolved to a conversational chatbot with advanced natural language query abilities to answer analytics questions, produce innate Spotfire visuals, and perform Spotfire operations.

    image.thumb.png.1020a7af608b841bf119955273f66c0c.png

    image.thumb.png.839751c8dc4a0da44681125e9b05fa21.png

    Short introduction to Large Language Models (LLMs)

    NLP and LLMs - Glossary

    LLMs are advanced NLP models that use machine learning algorithms to analyze and understand human language. These models are trained on vast amounts of text data, such as books, articles, and websites, allowing them to generate human-like language and perform various language-related tasks.

    The recent advancements in deep learning and computational power have enabled the development of larger and more powerful language models. These models have been shown to excel at a variety of language tasks, including language generation, text summarization, translation, and sentiment analysis, among others. LLMs employ statistical techniques or probability distributions over sequences of words and range in complexity from N-grams to deep learning models. For a more detailed review of language models refer to this article.

    For the remainder of this article, we assume that the employed LLMs and accompanying services come from Microsoft Azure unless otherwise mentioned explicitly. We have been partnering closely with Microsoft to build a Spotfire Copilot tool to serve various types of Spotfire users and make their work more efficient.

    Why use the Spotfire Copilot tool?

    Talking to the countless number of Spotfire users, we have crafted several use cases for the Spotfire Copilot tool. Each one of the use cases helps at least one persona be more efficient, less error-prone, or have more satisfaction working with the product. The use cases cover a few different categories.

    In one category, the user may be looking for a specific feature of the product itself. The Copilot add-on can expedite and improve the experience of finding the answer to product questions. In another category, the Spotfire end user may be interested in generating an analytical narration of their data. The narration may accompany charts and graphs automatically generated based on the user data. In another set of use cases, the user is a developer or data scientist who is building new analyses inside Spotfire. When they need help with their Python or JavaScript code, the Copilot tool can help them achieve the result faster. The graph below lists a number of these specific use cases with some example prompts.

    image.thumb.png.5d5988e5ba282232b7434375ae710fa1.png

    The architecture

    To build a capable, secure, and customizable Copilot tool, several components are required. The logical overview of the architecture is depicted below. In the following short paragraphs, we briefly explain how the Copilot add-on works, keeps the data secure, and yet enables the user to customize their experience.

    At the heart of the user experience is a chat prompt where the user inputs their question or request. That happens in Spotfire. In the simplest of architectures, this prompt can be directly sent to an LLM (e.g., Azure OpenAI or OpenAI) through a Spotfire® Data Function. The result will be parsed and presented to the user. While this is useful, we aim to embed a much more powerful Copilot tool inside the product. 

    Under the hood, the prompt gets significantly processed before it is sent to the LLM. This whole process is managed by a component called "Orchestrator." The Orchestrator runs the raw prompt by a search engine, vector database, or otherwise cognitive service first. The reason to do so is to acquire context around the user's prompt. Imagine the following scenarios where in both the user asks "How do I install a Python package?" In the first scenario, the question is directly sent to the LLM. The LLM doesn't have the Spotfire context; therefore, the answer will likely be the most common answer where a user wants to install the package in their notebook through PyPi. In the second scenario, the raw prompt searches an indexed set of documents with the prompt. The resulting content will then be passed to the LLM as the context. That way, the LLM will know how to answer the question in the Spotfire context.

    The indexed documents are Spotfire product documentation, relevant articles, or any set of curated questions and answers. On top of that, and very importantly, we are designing features for any admin user to be able to plug in other documents per the needs of the business. That way, the user of the Spotfire Copilot tool will be able to converse about the internal documents, nomenclature, subject matter, policies, or anything else that is provided to the search engine. That is to say, the Copilot add-on is extensible.

    All of these processes and data flows are handled with the highest levels of security and privacy standards in mind. Our partner, Microsoft, provides several additional safety components that ensure the security and relevance of the user experience in Azure. We also provide citations to the resulting text so that the user knows the answer is trustworthy by looking at the references. We specifically instruct the LLM to avoid hallucinations and only respond within the context of relevant context.

    image.thumb.png.68748df50129b3dc07d58828ef4fa862.png

    Compare fine-tuning and prompt engineering

    One of the hot topics in employing LLMs is fine-tuning vs. prompt engineering. The LLMs are often called foundation models. These foundation models, such as GPT4, are trained by massive amounts of data, are very large (many billions of nodes), and are often frozen in time. When users need to customize their experience by providing context, the customization can happen in one of two ways. The more obvious way that comes to mind first is to fine-tune the foundation model with a relevant dataset the same way you would retrain any machine learning model. This process employs a significant set of prompt-completion pairs (at least a few hundred of them) that are curated by human vetters. This curated set is fed into a special process to adjust a subset of weights (probabilities) in the model based on the provided context. While this approach may sound obvious it comes at a significant cost. The process of fine-tuning is often quite pricey. Besides, the process of building the curated list of prompt-completion pairs is resource-intensive and expensive. At the time of writing this article, the fine-tuning processes for LLMs are not quite stable either. The incremental improvement in the performance should be important enough to justify the fine-tuning process.

    In contrast, the process of adjusting, padding, and otherwise engineering the raw prompt before it is sent to the LLM can achieve many of the desired benefits at a fraction of the cost. This process, aptly called prompt engineering, is much more common these days compared with fine-tuning. The first step is to send a system message to the LLM to provide operational instructions, the tone, and the parameters of the conversation. The prompt is also passed to a search engine separately as we explained in the last segment. The results from the search service will serve as the context for the foundation model. Adding the resulting context to the raw prompt before sending it to the LLM is another significant step in prompt engineering. Some more advanced features such as preserving the history of the conversation, bifurcating the search depending on the type of the request, choosing different models based on the prompt, and similar enhancements can all be accomplished by the Orchestrator in the prompt engineering approach. Libraries such as LangChain and Semantic Kernel provide several advanced features for prompt engineering. The table below depicts the two approaches and a comparison of relative pros and cons.

    image.thumb.png.cad5dd4c4d2f4dc9809ffd25c23099c8.png

    ChatGPT deployment in Spotfire

    The first demo in this article showed an example of ChatGPT in Spotfire. Embedding this service inside Spotfire can be easily accomplished through a Spotfire® Data Function. All you need is your own OpenAI API key in the code which can be obtained from your OpenAI account. If you prefer to use other APIs you can simply swap OpenAI with the service of your choice. There are many simple examples of similar deployments available as open-source online. Note that any data sharing would be as if the data is shared with the API service. Always check with your organization's policies before using the generative AI services and never share sensitive data or information.

    Have questions? You can email datascience@spotfire.com or leave a comment on this post!


    User Feedback

    Recommended Comments

    There are no comments to display.


×
×
  • Create New...