OpsMgr’s processes are constantly collecting, transferring and aggregating data. Depending on the amount of data being processed and the overall amount of configuration churn, this can have a noticeable performance impact.
A nice blog post by J.C. Hornbeck describes the whole OpsMgr processing model within the context of the Exchange 2010 Management Pack.
It not only describes best practices for the management pack, but also includes tweaks for OpsMgr servers.
Below is a quick summary of these tweaks, taken from the blog post. Consider them for environments where large amounts of monitoring data are processed simultaneously.
SCOM performance tweaks
SCOM CU3 or later is highly recommended, as it contains quite a few performance-related fixes, including a default agent queue size of 100 MB instead of the old 15 MB.
Additionally, there are registry keys to update so that the RMS can use the server resources more effectively and avoid unnecessary churn. The table below covers some of these keys:
Finally, for the RMS, ensure whenever possible that no agents report directly to it. The Exchange 2010 MP hosts a lot of “Non-Hosted” managed objects on the RMS, so the RMS has to process a lot of health states, and all alerting occurs from the RMS as well. If at all possible, avoid having the RMS handle agent processing and data flow on top of that.
On all Management Servers:
For all Management Servers, including the RMS, there are a few more registry keys to update to allow better resource utilization for SCOM processing. The table below covers some of these keys:
Note: These updates do not apply to gateway servers.
SCOM Data Warehouse (where applicable):
The Exchange 2010 MP adds some new datasets to the SCOM DW for custom reporting. These datasets have their own set of aggregations, which can take more time to complete than the standard ones. You therefore need to increase the timeout for DW processing to allow these aggregations to finish.
Note: This needs to be done on every Management Server (including the RMS).
Note: The timeout may need to be as high as 30 minutes, depending on how much data flow there is (mainly event and performance collection).
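As a rough illustration of the 30-minute figure, the Python sketch below converts a timeout in minutes to seconds and renders it as a .reg fragment. The registry key path and the value name "Command Timeout Seconds" are assumptions that do not appear in this summary, as is the idea that the value is a DWORD expressed in seconds. Verify all of them against the original blog post before importing anything.

```python
# Assumed registry location and value name (NOT taken from this summary);
# check them against the original blog post before use.
DW_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft"
    r"\Microsoft Operations Manager\3.0\Data Warehouse"
)
DW_VALUE = "Command Timeout Seconds"


def minutes_to_seconds(minutes: int) -> int:
    """Convert a timeout in whole minutes to seconds."""
    if minutes <= 0:
        raise ValueError("timeout must be a positive number of minutes")
    return minutes * 60


def reg_fragment(key: str, value_name: str, seconds: int) -> str:
    """Render a .reg file that sets one DWORD value.

    .reg files encode a DWORD as eight hexadecimal digits.
    """
    return "\n".join(
        [
            "Windows Registry Editor Version 5.00",
            "",
            f"[{key}]",
            f'"{value_name}"=dword:{seconds:08x}',
            "",
        ]
    )


if __name__ == "__main__":
    # 30 minutes is the upper bound mentioned in the note above.
    print(reg_fragment(DW_KEY, DW_VALUE, minutes_to_seconds(30)))
```

The script only generates text and does not touch the registry. The output would still have to be reviewed and imported on every Management Server, including the RMS.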