Please note: this post is written primarily for Operations Manager 2007; the procedure may or may not work for 2012.
Most SCOM best practices dictate something like ‘back up your database, encryption key and management packs in order to be able to recover from a failure’. While that is very sound advice, it isn’t clear about how to use these backups to recover from the following scenario:
How do I restore my management group if:
- The VM running my RMS disappears without a backup (it does happen!)
- My physical RMS server explodes, burns down, gets stolen, resigns,…
- It was the ONLY management server
- And, optionally, my SQL database was running on that RMS as well
Well first, give yourself or your backup admin a slap on the back of the head for not protecting the monitoring workload correctly (using DPM ;)).
Then you begin to wonder how you can put the heart and brains back in your management group with:
- A database backup, but no RMS server that can use it
- An encryption key, but no RMS or database to import it on
Well, while you will of course have downtime, it is perfectly possible to restore your RMS without losing existing data, or touching the remaining agents.
Here is what you’ll have to do:
- Reimage the RMS server and make sure that the server name is identical to that of the previous instance.
- Perform a new RMS installation, using the same service accounts as the RMS you are planning to restore. Use the DBWizard tool to create new, empty databases, preferably with the same names as the ones you will restore later on.
- After the new RMS is up and running, patch it up to the same level as the previous installation.
- Once everything is patched and ready to go, shut down the following three RMS services: Data Access, System Center Management and Configuration.
- Delete the Health Service State folder in the installation directory of the RMS.
- Remove the new SCOM databases and restore the backups in their place. If necessary, alter the database settings in the registry and in the databases to reflect the new location/name of the databases.
- Use the SecureStorageBackup tool in the support tools folder of the SCOM installation media to restore the encryption key.
- Enable and start the services that were shut down in step 4
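If you would rather have the second half of this procedure written down as a checklist you can review before touching anything, here is a rough Python sketch that only builds and prints the commands. It is not a tested recovery script. The service short names (OMSDK, HealthService, OMCFG), the use of sqlcmd with RESTORE ... WITH REPLACE to put the backups in place, the `Restore <file>` syntax of SecureStorageBackup, and every path and server name in the example are my assumptions, so check them against your own environment first.

```python
import subprocess
from dataclasses import dataclass
from typing import Dict, List

# Assumed short names of the Data Access, System Center Management and
# Configuration services on an OpsMgr 2007 RMS. Verify with "sc query".
RMS_SERVICES = ["OMSDK", "HealthService", "OMCFG"]


@dataclass
class Step:
    description: str
    command: List[str]


def build_recovery_plan(install_dir: str, key_backup: str, tool_path: str,
                        sql_instance: str,
                        db_backups: Dict[str, str]) -> List[Step]:
    """Return the ordered commands for the stop/delete/restore/start steps."""
    plan = [Step(f"Stop {svc}", ["net", "stop", svc, "/y"])
            for svc in RMS_SERVICES]

    state_dir = install_dir.rstrip("\\") + "\\Health Service State"
    plan.append(Step("Delete the health service state folder",
                     ["cmd", "/c", "rmdir", "/s", "/q", state_dir]))

    # Assumption: overwriting the empty databases with WITH REPLACE is an
    # acceptable way to "put the backups in their place".
    for db_name, backup_file in db_backups.items():
        query = (f"RESTORE DATABASE [{db_name}] "
                 f"FROM DISK = N'{backup_file}' WITH REPLACE")
        plan.append(Step(f"Restore database {db_name}",
                         ["sqlcmd", "-S", sql_instance, "-E", "-Q", query]))

    # Assumption: the tool is called as "SecureStorageBackup.exe Restore <file>".
    plan.append(Step("Restore the encryption key",
                     [tool_path, "Restore", key_backup]))

    plan.extend(Step(f"Start {svc}", ["net", "start", svc])
                for svc in RMS_SERVICES)
    return plan


def run_plan(plan: List[Step], dry_run: bool = True) -> None:
    """Print every step; only execute the commands when dry_run is False."""
    for number, step in enumerate(plan, start=1):
        print(f"{number}. {step.description}: {' '.join(step.command)}")
        if not dry_run:
            subprocess.run(step.command, check=True)


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    example = build_recovery_plan(
        install_dir=r"C:\Program Files\System Center Operations Manager 2007",
        key_backup=r"D:\Backup\rms-key.bin",
        tool_path=r"E:\SupportTools\SecureStorageBackup.exe",
        sql_instance="SQLSERVER01",
        db_backups={"OperationsManager": r"D:\Backup\OperationsManager.bak"},
    )
    run_plan(example, dry_run=True)
```

Keeping it in dry-run mode gives you a numbered list to walk through by hand, which is probably what you want the first time. If the services were disabled rather than just stopped, they need to be re-enabled before the start commands will work.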
While this is a quick and dirty procedure to perform a recovery in an unoptimized environment, it will keep you from losing all your data and from spending ages setting everything up the way it was. The added benefit is that the agents will notice that father is home again, and start checking back in without much of a hitch. I noticed that some health service cache flushes might be needed, but that is a small price to pay in my opinion.
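For those cache flushes, the usual routine is to stop the health service, delete its state folder and start the service again. Below is a minimal Python sketch of that routine, assuming the service is called HealthService and that you pass in the path of the state folder yourself; the `run` parameter is a hypothetical hook so you can try the function without touching a real service.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List, Optional


def flush_health_service_cache(
        state_dir: str,
        service: str = "HealthService",  # assumed service short name
        run: Optional[Callable[[List[str]], object]] = None) -> None:
    """Stop the service, delete its state folder, then start it again."""
    if run is None:
        # Default: really execute the command and fail loudly on errors.
        def run(cmd: List[str]) -> object:
            return subprocess.run(cmd, check=True)

    run(["net", "stop", service])
    try:
        # The folder is expected to be rebuilt when the service starts.
        if Path(state_dir).is_dir():
            shutil.rmtree(state_dir)
    finally:
        # Always try to bring the service back, even if the delete failed.
        run(["net", "start", service])
```

Passing something like `run=print` lets you see the commands without executing them, although the state folder is still deleted in that case, so point it at a scratch folder when experimenting.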
Please note that I highly recommend that you set up your SCOM environment properly, with multiple management servers that can take over the RMS-role and a rigid backup plan. An easy way to get a lot of flexibility on this part is to upgrade to 2012, which eliminates the RMS and allows you to deploy management servers in a pool configuration, increasing availability. Bottom line: always deploy at least 2 management servers!
But, if shit hits the fan hard, and you are caught with your pants down, let this article help you get out of the mess ;).