    Spotfire® Server Disaster Recovery Options


    Introduction

    Customers often want to set up a Spotfire environment that supports disaster recovery (DR).  This article discusses options for doing so.  The scenario discussed has a hot Production environment and a cold DR environment: Production is active while DR stays offline and is brought online only when needed.  The Spotfire Server documentation already has a section on what to back up and restore to recover from a crash, so that information is not repeated here.  This article focuses on how to configure Spotfire so that one can have a Production environment and a Disaster Recovery environment.

    The information in this article applies to Spotfire Server 7.5 and above.  Within the Spotfire environment, the Spotfire Server(s) and Node Manager(s) get unique ids in the Spotfire database such that one cannot easily make copies of environments and expect them to work. The three high-level options discussed in this article are:

    1. For Spotfire 7.5 and above, set up Production and DR to be clustered together and then turn off the DR environment until needed.
    2. For Spotfire 7.8 and above, make an OS level copy of the Production machines and then do some file level configuration manipulations.
    3. For Spotfire 7.9 and above, use the Sites feature to have a Production Site and a DR Site as separate environments in the same database.

    For all options, the Spotfire database needs to be replicated from Production to DR.  Typically, this replication uses the replication method supported by the database and is outside the scope of this article.  Option #3 builds on Option #1, but uses the Sites feature (added in 7.9) to treat Production and DR as two sites in the same installation.  The following sections go through the steps for each of the three options, showing some screens where appropriate.

    Option #1: Set up Production and DR environments clustered together

    This option assumes that the Production and DR environments can communicate over the network.  This option will work for any environment with Spotfire Server 7.5 or later. 

    Step 1.1: Install Components in Production and DR

    Install the Production and DR Spotfire Server(s) and the Production and DR Node Manager(s) following the installation documentation.  While it is not critical, one can use the URL for a Production Spotfire Server when installing the Production Node Manager(s) and the URL for a DR Spotfire Server when installing the DR Node Manager(s).  

    Step 1.2: Configure Production Spotfire Server using the Configuration Tool

    In this example, an alias for the actual database server is used so that one can more easily switch over to the DR database when needed.  This screen shot shows the bootstrap setup with the database hostname as win-db-alias instead of win-db.

    [Screenshot: prodconfig_aliaseddb.png]
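
    As a minimal sketch of how one might confirm where the alias currently points before and after a switchover, the Python helper below compares the addresses an alias resolves to with those of a target host.  The names win-db-alias and win-db come from the example above; win-db-dr is a hypothetical DR database host used only for illustration.

```python
import socket
from typing import Callable, List


def resolve_all(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IP addresses a host name resolves to."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def alias_points_to(alias: str, target: str,
                    resolver: Callable[[str], List[str]] = resolve_all) -> bool:
    """True if every address behind the alias is also an address of the target."""
    alias_addrs = set(resolver(alias))
    target_addrs = set(resolver(target))
    return bool(alias_addrs) and alias_addrs <= target_addrs
```

    During normal operation alias_points_to("win-db-alias", "win-db") should be true; after the DNS entry has been changed and caches have expired, the same check against the DR database host (win-db-dr in this sketch) should be true instead.  The resolver argument exists only so the check can be exercised without real DNS.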

    Complete the configuration for authentication and other items.  

    Step 1.3: Enable Clustering in the Spotfire Server Configuration

    In this option, the servers will be clustered with the DR server offline until needed.  This screen shows how to enable clustering in the Spotfire Server configuration.

    [Screenshot: prodclustering.png]

    After completing the configuration, save the configuration and complete the setup of the environment - administration users and deployment.

    Step 1.4: Set up Node Manager Services

    Make sure that the Node Manager service is running and then trust the node once it appears in the "Untrusted nodes" list.  

    [Screenshot: prodtrusnodes.png]

    Once the node comes on-line completely, one can then add Web Player or Automation Services instances as seen here.

    [Screenshot: prodnewnode.png]

    In this example, I end up with one Web Player instance and one Automation Services instance.

    [Screenshot: prodnodecomplete.png]

    Step 1.5: Configure DR Server Bootstrap file

    On the DR Spotfire Server, create the bootstrap which for now points to the Production Spotfire Server database.  One can use an alias, e.g. win-db-alias, as done in this example:

    [Screenshot: drbootstrapfile_0.png]

    Step 1.6: Start up DR Spotfire Server and Node Managers Services

    After starting up the DR Spotfire Server (win-dr-tss.pwm.com), it should appear in the list of Spotfire Servers.

    [Screenshot: prod_dr_spotfireservers.png]

    After the DR Spotfire Node Manager is started, it will show up in the list of untrusted nodes.

    [Screenshot: prodconfig_aliaseddb.png]

    Step 1.7: Add services to DR Spotfire Node Manager

    Once the DR Spotfire Node Manager is trusted, one can add Web Player and Automation Services services to it.  

    [Screenshot: prod_dr_nodes.png]

    Step 1.8: Stop the DR Spotfire Server service and the DR Spotfire Node Manager service

    After the DR Nodes have been configured, the DR Spotfire Server Windows service and the Node Manager Windows services should be stopped and possibly set to Manual startup.  The reason for Manual startup is to ensure that the services are only started when needed and not whenever the machines restart for some other reason.

    When the services are stopped, they will appear in the Spotfire Server Administration Console as offline.

    [Screenshot: prodonline_droffline.png]
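
    The stop-and-set-to-Manual step can be scripted.  The sketch below only builds the Windows sc.exe command lines and does not run them.  The service names in the usage note are placeholders, not the names Spotfire registers, so look up the actual names in the Windows Services console first.

```python
from typing import List


def standby_commands(service_names: List[str]) -> List[List[str]]:
    """Build sc.exe argument lists that stop each service and set it to Manual start."""
    commands: List[List[str]] = []
    for name in service_names:
        commands.append(["sc", "stop", name])
        # sc.exe expects "start=" and its value as separate tokens;
        # "demand" is its spelling of the Manual startup type.
        commands.append(["sc", "config", name, "start=", "demand"])
    return commands
```

    For example, standby_commands(["ExampleSpotfireServerService", "ExampleNodeManagerService"]) returns four argument lists that could each be passed to subprocess.run from an elevated prompt on the DR machines.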

    Step 1.9: (optional) Modify the DR Spotfire Server bootstrap file

    If needed, modify the bootstrap.xml file for the DR Spotfire Server(s), located in <install dir>/tomcat/webapps/spotfire/WEB-INF.  The database connection URL may need to be changed to point to the DR Spotfire database.  This is necessary when the bootstrap refers to the SQL Server database by a specific machine name rather than by a DNS alias.  In this example, I am using an alias for the Spotfire database, so I only need to modify the DNS entry when the Production environment goes down.

    [Screenshot: dr_tss_bootstrap.png]

    One would change the server name in the connection URL in the <database-url> tag of the bootstrap.xml file.
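
    A minimal sketch of that edit in Python follows, assuming the connection URL is the text of a <database-url> element as described above.  The JDBC URL in the usage note, the DR host name win-db-dr, and the trimmed-down XML fragment are illustrative assumptions; a real bootstrap.xml contains more elements, and this helper leaves them alone.

```python
import re
import xml.etree.ElementTree as ET


def swap_database_host(bootstrap_xml: str, old_host: str, new_host: str) -> str:
    """Return the XML text with old_host replaced by new_host inside <database-url>."""
    root = ET.fromstring(bootstrap_xml)
    # Match the host name only as a whole token, so "win-db" does not
    # accidentally match inside "win-db-alias".
    pattern = re.compile(r"(?<![\w.-])" + re.escape(old_host) + r"(?![\w.-])")
    changed = 0
    for elem in root.iter("database-url"):
        new_text, count = pattern.subn(lambda _m: new_host, elem.text or "")
        if count:
            elem.text = new_text
            changed += count
    if changed == 0:
        raise ValueError(f"{old_host!r} not found in any <database-url> element")
    return ET.tostring(root, encoding="unicode")
```

    Given a fragment such as <bootstrap><database-url>jdbc:sqlserver://win-db:1433;DatabaseName=spotfire_server</database-url></bootstrap>, calling swap_database_host(text, "win-db", "win-db-dr") rewrites only the host portion of the URL.  ElementTree drops the XML declaration and any comments when serializing, so work on a backup copy and compare the result with the original before replacing the file.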

    Step 1.10: Replicate the Production database to the DR database

    Using the vendor database technology or some other technology, the Production database can be replicated to the DR database.  This will keep the DR database up to date with any changes in the Spotfire database, including configuration changes, the Spotfire Library, user directory, etc.

    When a disaster recovery situation occurs: 

    (1) Start the DR Spotfire Server Service(s)

    (2) Start the DR Node Manager Service(s) 

    This screen shot shows what it would look like when one is in a DR scenario with the DR environment on-line and the Production environment off-line.

    [Screenshot: dr_online_monitordiag_0.png]

    Other items to consider: Hotfix updates and Resource Pools

    Note that any hotfix updates or changes to Resource Pool configuration have to be duplicated in the DR environment.  One can bring the DR environment up to apply updates and adjust the configuration.  Resource pools and scheduling may need additional attention, because schedules are assigned to resource pools.

    For hotfix updates, the DR Web Player Node Managers need to be up so that one can update the service.  Particular Web Player and Automation Services instances can be shut down so that users cannot access the Web Player nodes, as seen in this screen shot:

    [Screenshot: nm_dr_wpinstance_shutdown.png]

    Hotfix updates should be done during an outage time window.  When one is updating the Production Nodes, the DR Nodes can be brought online and updated at the same time.  

    For Resource Pools, the Node Manager and Web Player Services need to be up, but the specific Web Player instances can be down and offline as seen in the screen shot above.  When the Web Player instances are offline, they can still be assigned to Resource Pools as this screen shot shows:

    [Screenshot: dr_wpinstance_resourcepool_offline.png]

    To review what needs to be done for hot fix updates:

    (1) Any Spotfire Server hot fix updates need to be applied to all Spotfire Servers - Production and DR

    (2) All deployment updates need to be pushed out to the Node Managers and Services in Production and DR.   For DR, the Services need to be up but the instances can be shut down during the update.

    Once the updates have been done, one could test the DR site by adding it to its own Site (a feature added in 7.9), so that active users are not sent to the DR site during testing.

    For Resource Pool manipulation:

    (1) The Web Player services need to be available but the specific instances can be offline.  

    (2) One can bring up the node and then shut down the instances so that users are not sent to the instances.

    Option #1 used only out-of-the-box options and configuration settings to create a Production and a DR environment.   The next option involves some manipulation of lower-level components.

    Option #2: Copy Production Environment at the OS Level

    Much of the configuration information in the Spotfire environment is in the Spotfire database, with some configuration information in configuration files, especially on the Node Managers.  One option for creating a DR environment from a Production environment is to make a copy of the Production components at the file system level.  Basically, copy the entire file system at the OS level to another machine.

    Note that this option has a few "hacks" in it that may not work the same in all Spotfire versions.  I tested this on version 7.9 and have had customers test this on version 7.10.

    Once the Production Spotfire Server environment is up and running one can then copy the components to a new environment.  This option does not walk through setting up the Production environment but assumes that has already been done.

    In my test, I created copies of the machines using AWS.  I created an AMI from my Production Spotfire Server and Node Manager and then created the DR machines from those AMIs.  After bringing up the DR instances from the AMIs, I renamed the DR instances to win-tss-dr2 and win-tssnm-dr2.

    Step 2.1: Copy Production Database to DR Database

    Make a backup of the Production database and then restore it as the DR database.  In a real DR scenario, the database needs to be kept in sync, typically using the synchronization options that the database provides.  In my testing, I made a backup of the database in SQL Server, copied the backup file to my DR database server, and then restored the backup into my DR database server.

    Step 2.2: Created a new bootstrap on the DR Spotfire Server

    Since the Spotfire Server machine is a copy of the Production machine, it is pointing at the wrong database and has the wrong site id.  I created a new bootstrap on the DR Spotfire Server to point to the DR database, which also fixes any site id information in the database specific to the Spotfire Server.

    Step 2.3a: Started the Spotfire Server

    Started the Spotfire Server service after modifying the bootstrap to confirm that the new bootstrap works.

    Step 2.3b: Modified configuration files on the Node Manager

    This is one of the steps that is a bit of a "hack" and could cause issues in the future.  On the Node Manager machine(s), the configuration files on disk need to be modified to match the new machine name information.  These files were modified in the tibco/tsnm/<version>/nm/config directory:

    • Modified config.json - replaced Production machine name with DR machine name for the following properties: host, serviceURL, serverName and addresses properties, e.g. replaced win-nodemanager with win-tssnm-dr2 for Node Manager and replaced win-tss with win-tss-dr2 for Spotfire Server.
    • Modified nodemanager.properties - replaced Production Node Manager and TSS machine names with DR machine names for nodemanager.host.names, server.name, nodemanager.supervisor.known properties.

    Saved these files back to disk.
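
    The two file edits above can be sketched in Python as follows.  The property names and the example host names are the ones listed above; the layout of config.json is not documented here, so the sketch simply rewrites any string found under the named keys, wherever they occur.  Host names are matched as whole tokens in a single pass, which matters because the new name win-tssnm-dr2 itself begins with the old name win-tss.

```python
import re
from typing import Any, Dict, Set

# Property names taken from the step above.
JSON_KEYS = {"host", "serviceURL", "serverName", "addresses"}
PROP_KEYS = {"nodemanager.host.names", "server.name", "nodemanager.supervisor.known"}


def swap_hosts(text: str, mapping: Dict[str, str]) -> str:
    """Replace whole host names in one pass so new names are never re-matched."""
    if not mapping:
        return text
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w.-])(" + "|".join(re.escape(n) for n in names) + r")(?![\w-])"
    )
    return pattern.sub(lambda m: mapping[m.group(1)], text)


def rewrite_json(node: Any, mapping: Dict[str, str],
                 keys: Set[str] = JSON_KEYS, inside: bool = False) -> Any:
    """Rewrite strings that sit under one of the named keys in parsed JSON."""
    if isinstance(node, dict):
        return {k: rewrite_json(v, mapping, keys, inside or k in keys)
                for k, v in node.items()}
    if isinstance(node, list):
        return [rewrite_json(v, mapping, keys, inside) for v in node]
    if isinstance(node, str) and inside:
        return swap_hosts(node, mapping)
    return node


def rewrite_properties(text: str, mapping: Dict[str, str],
                       keys: Set[str] = PROP_KEYS) -> str:
    """Rewrite host names only on the key=value lines whose key is listed."""
    out = []
    for line in text.splitlines(keepends=True):
        key, sep, value = line.partition("=")
        if sep and key.strip() in keys:
            line = key + sep + swap_hosts(value, mapping)
        out.append(line)
    return "".join(out)
```

    With mapping = {"win-nodemanager": "win-tssnm-dr2", "win-tss": "win-tss-dr2"}, one would load config.json with json.load, pass the result through rewrite_json, and write it back; nodemanager.properties goes through rewrite_properties as plain text.  Running either function a second time leaves the output unchanged, so a repeated run does no harm.  Keep copies of the original files, since this step is one of the "hacks" noted above.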

    Step 2.4: Revoke trust of the Production node manager

    In the Spotfire Server Web Administration Console, one needs to revoke the trust of the Production node manager.  Before I did this, I was not able to get the new DR node listed or trusted at all, since it has the same id as the Production node manager.  It wouldn't even show up as an untrusted node.  

    [Screenshot: prod_revoketrust_dr_0.png]

    After revoking trust, Production Node Manager shows up as an untrusted node.  For now, do nothing and let it be. In the next step something interesting happens.

    [Screenshot: prod_nm_untrusted.png]

    Step 2.5: Started the Node Manager service

    Start the Node Manager service on the machines after revoking the trust for any Production Node Manager(s).  Because the machines are copies of each other, the production machines need to be untrusted so that the new nodes can be trusted and get new certificates.  The certificates have been copied over in the OS level copy but the DR Spotfire Server has a different name and therefore needs to provide a new certificate.

    After the DR Node Manager has been started for a few minutes, the name in the untrusted nodes will change from the Production name (win-nodemanager) to the DR name (win-TSSNM-DR2) as seen in this screen shot.  I did nothing but wait a few minutes.

    [Screenshot: dr_nm_untrusted.png]

    Step 2.6: Trust the Node Manager

    In the Spotfire Server Web Administration Console under Nodes & Services, once the node has started up, it should now show up as an untrusted node.  Once it is trusted and comes up, all the instances and services should show up.  Because I did not delete the Production Node Manager, all the Resource Pool information came over.  Remember that under-the-hood, the same ids are being used for the instances since the OS level copy has brought over all the same ids.

    Option #2 has shown how one can go under the covers a bit and modify configuration files to produce a DR environment.  The key point is that an OS level copy of the Production machines was used to create the DR environment.

    The next option discusses using the Sites feature.

    Option #3: Move the DR components to their own site

    This option can be done after Step 1.8 in Option #1.  If there is only one Spotfire Server in each of the Default site and the DRSite, then the clustering that was enabled in Step 1.3 can be disabled.  In this example, we will move the DR Spotfire Server and Node Manager to a new site called "DRSite."  Since we are starting after Step 1.8 from Option #1, it is assumed that the services are already stopped.  The steps to create a site and manipulate it are in the Sites section of the Spotfire Server documentation.

    Resource Pools are specified by site as are routing rules and schedules so that the DR site will not know about any of the scheduling, routing rules, nor resource pools created in the Production Site.  I would recommend AGAINST using the Site feature for supporting DR for this reason.  This section is kept in here for informational purposes.  The use of Sites could be useful when one wants to test the DR site and not bring down Production.  One could add the DR components to a DR Site, bring them up, manually go to the DR Site using a URL specific to the DR Site, do some testing, bring down the components and then move all the components back into the Default Site.  During testing, setting up a temporary DR Site will allow one to test the DR environment without affecting users.  The users will continue to use the Default Site (Production) and not go to the DR Site.

    If one has a Production environment with different Sites, then one needs to consider that when setting up DR environments.

    Step 3.10: Create the site, add the DR Spotfire Server to the site and start the DR Spotfire Server service

    Open a command line window and change to the <server installation directory>/tomcat/bin directory.  Run the following command:

    config create-site -s DRSite

    After creating the site, one can add the DR Spotfire Server to the site.  One needs the ID of the DR Spotfire Server, which can be found on the server information page in the Spotfire Server administration web console:

    [Screenshot: dr_tss_id_with_highlight.png]

    One can add the DR Spotfire Server to the site using this command:

    config set-site -n 7f9d0743-1085-4239-aabc-ae488087adb6 -s DRSite

    This screen shot shows the commands run to create the site and add the Spotfire Server to the site:

    [Screenshot: createsitecmds.png]

    As one can see from the screen shot, I made a typo in the server id on my first try so one may want to copy it instead of typing it out.
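
    To catch that kind of typo before running the command, one could check that the pasted ID is a well-formed UUID, which is the shape of the ID in the example above.  This is only a sketch: it builds the argument list for the set-site command shown earlier (same -n and -s flags) and does not run anything.  It catches a missing or extra character, not a wrong but well-formed ID.

```python
import uuid
from typing import List


def set_site_command(server_id: str, site_name: str) -> List[str]:
    """Validate the server ID as a hyphenated UUID, then build the set-site arguments."""
    candidate = server_id.strip()
    try:
        parsed = uuid.UUID(candidate)
    except ValueError as exc:
        raise ValueError(f"not a well-formed server ID: {server_id!r}") from exc
    # uuid.UUID also accepts braces or missing hyphens; insist on the plain form.
    if str(parsed) != candidate.lower():
        raise ValueError(f"server ID is not in canonical form: {server_id!r}")
    return ["config", "set-site", "-n", str(parsed), "-s", site_name]
```

    For the ID used in this article, set_site_command("7f9d0743-1085-4239-aabc-ae488087adb6", "DRSite") returns the same arguments as the command above, while an ID with a dropped character raises ValueError.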

    After adding the DR Spotfire Server to the DRSite, the Spotfire Server service can be started.

    Step 3.11: Edit the DR Node Manager properties to get the DR Node Manager in the DRSite

    Open the nodemanager.properties file in the <node manager installation>/nm/config directory for editing. 

    Delete the line that begins with nodemanager.supervisor.known.  This will be added back in when the Node Manager connects to the site.  The server.name, server.backend.registration.port, and server.backend.communication.port should be modified to point to a Spotfire Server in the DRSite.

    After the edits are complete, one can save and close the file, and then start the node manager service.
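
    The edit to nodemanager.properties can be sketched in Python as follows.  The property names are the ones named above; the values in the usage note (the DR server name from Step 1.6 and the two port numbers) are placeholders to be checked against one's own installation.

```python
from typing import Dict


def retarget_node_manager(text: str, updates: Dict[str, str]) -> str:
    """Drop the nodemanager.supervisor.known line and set the given properties."""
    out = []
    remaining = set(updates)
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith("nodemanager.supervisor.known"):
            continue  # re-created when the Node Manager connects to the site
        key, sep, _old = stripped.partition("=")
        key = key.strip()
        if sep and key in updates:
            eol = line[len(line.rstrip("\r\n")):]  # keep the original line ending
            line = f"{key}={updates[key]}{eol}"
            remaining.discard(key)
        out.append(line)
    if remaining:
        raise KeyError(f"properties not found: {sorted(remaining)}")
    return "".join(out)
```

    A call might look like retarget_node_manager(text, {"server.name": "win-dr-tss.pwm.com", "server.backend.registration.port": "9080", "server.backend.communication.port": "9443"}).  The function raises KeyError rather than silently skipping a property that is missing from the file, so a mistyped property name is noticed before the Node Manager is restarted.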

    One can then connect to the DR Spotfire Server and one should see the DR Services only.

    [Screenshot: drsite_nodes.png]

    Once the site is set up correctly, the DR Spotfire Server and Node Manager services can be stopped until they are needed in a DR scenario.  As before, the Production Spotfire database has to be replicated to the DR Spotfire database.

    It is unclear if using Sites for this would be an advantage or not since one would have to manipulate the resource pools and scheduling separately.  Using sites, one does not see the DR components as part of the Default Production site.

