A while ago, I was asked to set up alerting for a number of Windows servers that would trigger when a critical component changes status; in this example, when either the DNS or the Active Directory service is NOT RUNNING.
This post describes the setup for this kind of alerting. By no means do I claim that this is the only way to achieve it, because it probably is not. But at least it will provide a rundown of the components that are required. If anyone has any feedback or alternative methods, please drop me a comment below.
The primary requirement for this sort of monitoring in SolarWinds, be it through application templates or AppInsight, is WMI access to the servers in question. Usually this is done through an AD service account whose password does not expire. Simple SNMP and ICMP access to your server is not going to work.
This post will follow a number of steps in chronological order:
1-Create your application (include your component monitors)
2-Assign nodes to the application
3-Set up an alert that gets triggered when the application changes status
So let's get started.
1-Create your application (include your component monitors)
This is the first step. What you need to remember is that once you create an application, SolarWinds will start polling its components, and this way you can create a trigger that results in a certain action as soon as a component changes status. I am using the word component here (and so does SolarWinds), because an application is built up from one or more components, depending on what you deem necessary to monitor.
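To make the application/component relationship concrete, here is a minimal Python sketch of how an application status could be derived from its component statuses. It assumes a simple "worst status wins" roll-up; the status names, their ordering, and the component names are illustrative assumptions of mine, not taken from SolarWinds itself.

```python
from enum import IntEnum


class Status(IntEnum):
    # Ordered from best to worst so that max() picks the worst status.
    # These names and this ordering are assumptions for illustration.
    UP = 0
    WARNING = 1
    CRITICAL = 2
    DOWN = 3


def application_status(component_statuses):
    """Return the worst status among the application's component monitors."""
    if not component_statuses:
        raise ValueError("an application needs at least one component monitor")
    return max(component_statuses.values())


# Hypothetical application with two component monitors.
components = {
    "DNS Server": Status.UP,
    "Active Directory Domain Services": Status.DOWN,
}
print(application_status(components).name)  # DOWN
```

With this model, a single stopped service is enough to change the status of the whole application, which is what the alert later in this post relies on.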
For this post, I will use an existing template called "Active Directory 2008 R2 - 2012 Services and Counters". To find it, go to Settings > SAM Settings > Manage Templates.
Here you can copy an existing, out of the box template and modify it as required.
Let's drill into the template and see what components make up the application (template). I will continue to call it 'application' from now on, as this truly reflects what it actually does, although I still feel it's a somewhat confusing term to use. OK, so in the picture below you can see some of the component monitors that make up the application. In this example I disable all component monitors except the DNS service.
|Fig. 2 application component monitors|
The detailed view of this component monitor is as follows (sorry, I had to keep the picture this big so you can actually read it):
|Fig. 3 component monitor detail|
As you can see, the fetching method is WMI, so again, this will only work when SolarWinds has the appropriate AD access/service account to actually tap into WMI. You don't really need to fill out the credentials just yet, because this will be done when assigning nodes to this application, but you can test whether access is successful on certain nodes by setting a test node and clicking Test, as can be seen at the top of the Fig. 3 screenshot.
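For readers who wonder what a WMI service check boils down to, here is a rough Python sketch that builds the kind of WQL query one could run against the standard Win32_Service class and interprets the returned State value. This is not what SolarWinds runs internally (I don't know its exact query); the helper names are hypothetical, and "DNS" is used as the example service name.

```python
def service_state_query(service_name):
    """Build a WQL query that returns the state of one Windows service."""
    # Refuse characters that could break out of the quoted WQL string
    # instead of trying to escape them.
    if any(ch in service_name for ch in "'\"\\"):
        raise ValueError(f"unsupported character in service name: {service_name!r}")
    return f"SELECT Name, State FROM Win32_Service WHERE Name = '{service_name}'"


def is_running(state):
    """Win32_Service reports State as text such as 'Running' or 'Stopped'."""
    return state == "Running"


print(service_state_query("DNS"))
print(is_running("Stopped"))  # False
```

Running the query itself still needs WMI access with valid credentials on the target server, which is exactly why the service account matters.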
2-Assign nodes to the application
Now that you have created your application, you will need to assign it to individual nodes or to a group of nodes. Because I want to assign my application to Active Directory servers, I will add it to a group that contains all AD servers in my organisation. In my case I used a dynamic query, so that should new servers get added in the future, they would automatically be assigned the application to be monitored (see Fig. 4).
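The idea behind a dynamic query can be sketched in a few lines of Python: membership is computed from a rule each time, not kept as a fixed list, so new servers that match the rule are picked up automatically. The Node class, the role property and the node names below are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str  # hypothetical property used by the group rule


def dynamic_group(nodes, role):
    """Return the names of all nodes that currently match the group rule."""
    return [node.name for node in nodes if node.role == role]


inventory = [Node("dc01", "domain-controller"), Node("web01", "web-server")]
print(dynamic_group(inventory, "domain-controller"))  # ['dc01']

# A new AD server is added later; the same rule now includes it.
inventory.append(Node("dc02", "domain-controller"))
print(dynamic_group(inventory, "domain-controller"))  # ['dc01', 'dc02']
```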
|Fig. 4 assign application to nodes|
Next, you will need to assign the WMI credentials that the group will use. In this case we use a single service account (see Fig. 5).
|Fig. 5 choose the WMI creds|
So at this point we have created our application and assigned it to a group of nodes (Active Directory servers in this case). The final step is to actually alert when these services get stopped, or better, when the application changes state because one or more of its components changes status.
3- Create an alert that triggers when the application changes status.
As always, go to Settings > Manage Alerts and create a new alert.
Really the only interesting part of this alert is the trigger, because somehow it needs to tie back to the application you created in step 1. So create an application trigger for this, as per Fig. 6 below.
|Fig. 6 Alert Trigger|
I add a double criterion for the scope: it can only alert when the node is actually up, otherwise it will also alert when the node goes down for a reboot. Because I have already applied the application to the node group in step 2, there is no need to redefine the nodes that are in scope; it's just the application name (which needs to be the same as the one used in step 1) that we need to define. After that it's the same as any alert. Maybe spend some time on the action, and if you add an email action, have it contain the service that triggered the alert so your NOC people know what to do, rather than trying to solve a puzzle.
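Written out as plain logic, the trigger condition looks roughly like the Python sketch below. The function name, the status strings and the choice to treat any application status other than "Up" as alert-worthy are my own assumptions for illustration; the real condition is built in the alert trigger editor.

```python
def should_alert(app_name, app_status, node_status, target_app):
    """Alert only for the target application, and only while the node is up."""
    if app_name != target_app:
        return False  # a different application, out of scope
    if node_status != "Up":
        return False  # node is down or rebooting, so stay quiet
    return app_status != "Up"


target = "Active Directory 2008 R2 - 2012 Services and Counters"
print(should_alert(target, "Down", "Up", target))    # True
print(should_alert(target, "Down", "Down", target))  # False
```

The second call shows why the node status check matters: without it, every reboot of a server would also raise an application alert.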
In the screenshot below I have added the component status to be included in the email.
The actual email will then look something like this:
It's not a really good example, because the email above is the result of an alert simulation, so the services are up. But when a service in the application is truly down, it will tell the engineer which service has the issue.
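To illustrate what a useful alert email could contain, here is a small Python sketch that lists only the components that are not up. The function, the wording of the message and the example names are hypothetical; in SolarWinds itself the message is composed in the alert action, not in code like this.

```python
def build_alert_email(node, application, components):
    """Compose a plain-text alert body listing the components that are not Up."""
    failing = [
        f"- {name}: {status}"
        for name, status in components.items()
        if status != "Up"
    ]
    lines = [
        f"Application '{application}' on {node} changed status.",
        "Components needing attention:",
    ]
    # During an alert simulation everything may be Up, so say so explicitly.
    lines.extend(failing or ["- none (all components report Up)"])
    return "\n".join(lines)


print(build_alert_email("dc01", "AD Services", {"DNS Server": "Down", "Netlogon": "Up"}))
```

An engineer reading this sees straight away that DNS Server on dc01 is the problem, without having to open the console first.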