Writing a management pack is an art.
There are so many ways to achieve a specific monitoring implementation, but choosing the best one is the challenge that makes it all worth it.
It is nice if a management pack does what it is supposed to do, but it is beautiful if it does it smoothly without requiring a lot of maintenance.
I want to provide some tips/practices you should take into account when designing your management packs.
Use native modules when possible
Operations Manager offers lots of modules out of the box. Why should you, for example, use the native service monitor or event collection modules instead of writing a killer script? Well:
- The native modules are trigger-based, which means they start generating events the instant a service goes down or an event is logged. A script always runs on an interval-based trigger. When a problem occurs, it is always better to be notified of it right away instead of waiting for the next probe action to run.
- The performance impact of a native module is lower than that of a script, because it uses native libraries that carry far less overhead.
- In terms of cost: it requires less time and effort to use a pre-built module than to create a whole new script that essentially does the same thing, but slower.
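As an illustration, a service monitor built on the stock Windows service monitor type might look like the sketch below. All IDs and the monitored service name are placeholders; the monitor type itself ships in the Microsoft.Windows library.

```xml
<!-- Sketch: a unit monitor using the built-in Windows service monitor type
     instead of a scripted service check. MyMP.SpoolerServiceMonitor and the
     alias names (Windows!, Health!) are placeholders/assumptions. -->
<UnitMonitor ID="MyMP.SpoolerServiceMonitor" Accessibility="Internal" Enabled="true"
             Target="Windows!Microsoft.Windows.Computer"
             ParentMonitorID="Health!System.Health.AvailabilityState"
             TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType">
  <Category>AvailabilityHealth</Category>
  <OperationalStates>
    <OperationalState ID="Running" MonitorTypeStateID="Running" HealthState="Success" />
    <OperationalState ID="NotRunning" MonitorTypeStateID="NotRunning" HealthState="Error" />
  </OperationalStates>
  <Configuration>
    <ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <ServiceName>Spooler</ServiceName>
  </Configuration>
</UnitMonitor>
```

Because this uses the native service control manager notifications, the health state flips the moment the service stops, with no polling script involved.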
If you have to script, avoid PowerShell
To make myself clear: I LOVE POWERSHELL! But I must be honest: compared to older scripting languages, it is really heavy in terms of startup time and resource usage. Scripts used to monitor your environment should be as unobtrusive as possible. I would use PowerShell only if achieving the same result in another scripting language would take tremendously more effort. Otherwise, the speed and reduced overhead of VBScript are generally preferred over the superior features of PowerShell.
Choose your intervals wisely
The number one rule to remember (also taking the two previous tips into account) is: minimize the overhead monitoring creates on a server. This can be done by choosing the best tools for the job, but it can also be as simple as tuning the parameters correctly. Trigger-based monitoring excepted, SCOM uses intervals to control how often a certain condition is checked on a target system. An interval of 10 seconds is, as you can imagine, very detrimental to your server's health, as it executes the probe action almost continuously. To give a system the most breathing room to perform its core functions, I recommend the following best practices:
- Discovery intervals: 1 day or higher
- Monitor/Rule intervals: 5 minutes or higher
Depending on specific requirements (volatile data, very critical monitoring), you can of course choose not to follow this, but it is best to keep the number of exceptions to a minimum!
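In management pack XML, these intervals are just the frequency setting on the workflow's data source. A sketch of a performance data source sampling every 5 minutes (the counter, instance, and alias names are placeholders):

```xml
<!-- Sketch: a rule data source sampling every 300 seconds rather than every
     few seconds. For a discovery data source, the same idea applies with
     Frequency set to 86400 (once per day) or higher. -->
<DataSource ID="PerfDS" TypeID="Perf!System.Performance.DataProvider">
  <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
  <CounterName>% Processor Time</CounterName>
  <ObjectName>Processor</ObjectName>
  <InstanceName>_Total</InstanceName>
  <AllInstances>false</AllInstances>
  <Frequency>300</Frequency>
</DataSource>
```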
Use optimized collection on performance collection rules
Optimized collection means that if two or more adjacent performance samples taken from a system are equal within a certain margin, no value is written to the database for the second sample. This saves database space, which is a valuable currency in a large SCOM environment. You can tune this further by specifying a maximum number of samples that can be skipped before a value is eventually stored in the database.
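A sketch of what this looks like in a collection rule's data source, using the optimized performance provider (counter names and values are placeholder assumptions):

```xml
<!-- Sketch: optimized performance collection. Tolerance/ToleranceType define
     the "equal within a margin" band; MaximumSampleSeparation caps how many
     consecutive samples may be skipped before a value is written anyway. -->
<DataSource ID="OptimizedPerfDS" TypeID="Perf!System.Performance.OptimizedDataProvider">
  <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
  <CounterName>Available MBytes</CounterName>
  <ObjectName>Memory</ObjectName>
  <InstanceName />
  <AllInstances>false</AllInstances>
  <Frequency>300</Frequency>
  <Tolerance>10</Tolerance>
  <ToleranceType>Percentage</ToleranceType>
  <MaximumSampleSeparation>12</MaximumSampleSeparation>
</DataSource>
```

With these example values, a sample within 10% of the last stored value is dropped, but at most 12 samples in a row can be skipped, so at least one value per hour still reaches the database.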
Design your modules for cookdown
When creating a management pack with custom modules, try to generalize your modules as much as possible. That is, try to create a single, solid module that can serve multiple monitors and rules. Why? Let the explanation from the cookdown analyzer say it all:
|The Operations Manager agent or server is capable of running many hundreds, or even thousands, of workflows at any given time. These workflows are defined in management packs as mechanisms that run on a system to provide monitoring and health data; for example, discoveries, monitors, and rules are all workflows. In a typical customer environment, where multiple management packs are being used at one time to monitor all applications in an environment, the result is that many workflows are running simultaneously.
|Each workflow that is loaded utilizes a small portion of system resources. For optimal performance of Operations Manager, the fewer system resources that are taken up for monitoring, the better. As a management pack author, it is important to consider minimizing the usage of system resources by your individual management pack to create an optimal experience for customers using your management pack.
|One of the most impactful ways to reduce the usage of system resources by your management pack is through cookdown. Cookdown is a concept in Operations Manager which consolidates workflows that use the same configuration. By consolidating workflows, fewer workflows are launched at once, and fewer resources are used to run, for example, a single workflow as opposed to fifty distinct workflows.
To put it simply: it is better to run a single script on a system that feeds data to one monitor and two rules than to run a separate script for each monitor and rule. This reduces overhead tremendously, and since cookdown is performed automatically, it is easy to implement. It is important, however, that all items using the same data source are configured identically (intervals, script parameters, and so on).
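A sketch of what cookdown-friendly reuse looks like in practice. Every ID and module type name here is a placeholder; the point is that each consumer passes the exact same configuration to the shared data source:

```xml
<!-- Sketch: a collection rule referencing a shared custom data source type.
     A monitor (not shown) that references MyMP.SharedScriptDS with the *same*
     IntervalSeconds and TimeoutSeconds cooks down to the same single running
     script. Change any configuration value in one consumer and the agent must
     launch a second copy of the script for it. -->
<Rule ID="MyMP.CollectAppPerfData" Enabled="true" Target="MyMP.AppComponent">
  <Category>PerformanceCollection</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="MyMP.SharedScriptDS">
      <IntervalSeconds>300</IntervalSeconds>
      <TimeoutSeconds>120</TimeoutSeconds>
    </DataSource>
  </DataSources>
  <WriteActions>
    <WriteAction ID="WriteToDB" TypeID="SC!Microsoft.SystemCenter.CollectPerformanceData" />
  </WriteActions>
</Rule>
```

A practical way to keep configurations identical is to promote values like the interval to overridable parameters with the same defaults everywhere, so an override on one workflow is the only thing that can break cookdown.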
Assess Benefit/Space Cost
Important to remember: for every rule, monitor, or discovery you create in Operations Manager, data is generated and stored in the database. Therefore, consider whether you really need a performance rule enabled out of the box, or whether you really want to collect a given event. In a big environment, imagine a frequently occurring event being collected on each server; if your data warehouse has a standard retention of 400 days, this can result in a large and unnecessarily bulky database. Try to make your management pack as slim as possible while retaining the base requirements for the service you are about to monitor.
These are some of the principal design principles I use for every management pack I create. Do you have any variations, or other tips you'd like to share? Feel free to comment!