Alarms Engine
Alarms Engine helps decide whether a network resource has a problem. It applies the conditions you define to the data obtained through resource monitoring and marks the status of the resource (a monitor) as Down, Critical, Trouble, or Up. You can configure this uptime check in the Threshold and Availability Profile and the Notification Profile.
Internet Services Monitoring
Monitors such as Website, Web Application, DNS, and FTP are categorized as Internet Services Monitors. For these monitors, the Alarms Engine checks performance and availability from multiple locations. Site24x7 also eliminates false alarms by applying the "False Alarm Protector".
Whenever a downtime is detected, Site24x7 takes a screenshot via a Real Browser for website checks. To rule out network failures, Site24x7 looks for any other monitored resource that is available during the same period. If any other monitor is up, it concludes that this particular monitor is down and triggers the alert. If no up notification is received for any other monitor, Site24x7 checks the accessibility of known websites to determine the network status. Moreover, when a website downtime is reported through error codes thrown by the browser, the Alarms Engine rechecks from other global (secondary) locations and then confirms whether the website is down. Once a website is marked down, persistent monitoring runs every minute to reduce the downtime period.
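The confirmation flow described above can be sketched roughly as follows. This is only an illustration, not Site24x7's implementation: the `DowntimeEvidence` fields and the `confirm_down` function are hypothetical names, and the rule that every secondary location must also report the failure is an assumption, since the text does not say how many locations have to agree.

```python
from dataclasses import dataclass


@dataclass
class DowntimeEvidence:
    """Hypothetical inputs gathered after a check fails from the primary location."""
    primary_failed: bool
    other_monitor_up: bool       # another monitored resource was up in the same period
    known_sites_reachable: bool  # fallback probe of known websites
    secondary_failures: int      # secondary locations that also saw the failure
    secondary_locations: int     # secondary locations that rechecked


def confirm_down(e: DowntimeEvidence) -> bool:
    """Return True only when the failure does not look like a network problem."""
    if not e.primary_failed:
        return False
    # Network check: another monitor being up, or known websites being
    # reachable, suggests the monitoring network itself is healthy.
    network_ok = e.other_monitor_up or e.known_sites_reachable
    if not network_ok:
        return False  # likely a network failure, so no alert is raised
    # Assumption: all secondary locations must agree before confirming.
    return e.secondary_failures == e.secondary_locations
```

For example, `confirm_down(DowntimeEvidence(True, True, False, 2, 2))` confirms the downtime, while the same evidence with both network checks failing does not.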
Thresholds on Performance
Apart from uptime monitoring, Site24x7 also examines the performance of your resources, validates responses, and notifies you of any detected problem with a severity status such as Trouble, Critical, or Down. Alarms Engine validates the response content so that corrective action can be taken when a particular keyword is present in, or missing from, your web page. For example, keywords like "Exception", "Error", or "Page Not Found" will trigger an alert when present in the web page. Site24x7 also checks for the presence of non-static keywords in your site that are either generated by your scripts (JSP or ASP) or output by your back-end server, and triggers an alert when unauthorized changes are made to the web page.
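As a minimal sketch of the keyword check, the function below flags unwanted keywords that appear in a page and expected keywords that are missing. The name `keyword_alerts`, its parameters, and the case-insensitive matching are assumptions made for illustration; they are not taken from the product.

```python
def keyword_alerts(page_text, unwanted=(), required=()):
    """Return alert messages for a page body (hypothetical helper).

    unwanted: keywords that should NOT appear, e.g. "Exception".
    required: keywords that SHOULD appear, e.g. text produced by a script.
    Matching is case-insensitive, which is an assumption of this sketch.
    """
    text = page_text.lower()
    alerts = []
    for keyword in unwanted:
        if keyword.lower() in text:
            alerts.append(f"unexpected keyword present: {keyword}")
    for keyword in required:
        if keyword.lower() not in text:
            alerts.append(f"expected keyword missing: {keyword}")
    return alerts


alerts = keyword_alerts(
    "<h1>Page Not Found</h1>",
    unwanted=["Exception", "Error", "Page Not Found"],
    required=["Welcome"],
)
```

Here `alerts` holds one message for the unwanted "Page Not Found" text and one for the missing "Welcome" keyword.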
Site24x7 has built-in smart alerting for some metrics, such as response time for URLs and CPU and memory utilization for servers.
The Trouble or Critical status is generated based on the conditions described below:
Advanced Threshold Settings (Strategy):
Threshold and Availability Profiles help the Alarms Engine decide whether a specific resource has to be declared Critical or Trouble. Configure Downtime Rules to reduce false alerts for monitors. Each monitor type has its own set of threshold values that can be configured. Once defined, the threshold profile can be associated with a monitor to trigger Trouble or Critical alerts when the set threshold is breached. Using the advanced threshold settings, you can set Trouble or Critical alert conditions for all parameters. For example, you can configure thresholds for response time spikes for both Primary and Secondary locations.
Poll count serves as the default strategy to validate a threshold breach. You can validate a threshold breach by applying a condition (>, <, >=, <=) to your specified threshold strategy. The monitor’s status changes to Trouble or Critical when the condition applied to any of the threshold strategies below holds true:
- Threshold condition validated during the poll count (number of polls): The monitor’s status changes to Trouble or Critical when the condition applied to the threshold value holds continuously for the specified poll count.
- Average value during poll count (number of polls): The monitor’s status changes to Trouble or Critical when the average of the attribute values over the configured number of polls satisfies the condition applied to the threshold value.
- Condition validated during time duration (in minutes): The monitor’s status changes to Trouble or Critical when the condition applied to the threshold value holds for every poll during the configured time duration.
- Average value during time duration (in minutes): The monitor’s status changes to Trouble or Critical when the average of the attribute values over the configured time duration satisfies the condition applied to the threshold value.
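The four strategies above can be sketched in a few lines. This is an illustrative reading of the descriptions, not the product's code: all function names are hypothetical, polls are assumed to arrive as plain numbers or as `(minute, value)` pairs, and the time-based functions do not verify that the samples cover the whole window.

```python
import operator
from statistics import mean

# The four comparison conditions named in the text.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def breach_poll_count(values, op, threshold, poll_count):
    """Condition holds for each of the last `poll_count` polls."""
    if len(values) < poll_count:
        return False
    return all(OPS[op](v, threshold) for v in values[-poll_count:])


def breach_average_polls(values, op, threshold, poll_count):
    """Average of the last `poll_count` polls satisfies the condition."""
    if len(values) < poll_count:
        return False
    return OPS[op](mean(values[-poll_count:]), threshold)


def _window(samples, minutes, now):
    # samples: list of (minute, value) pairs; keep those inside the window.
    return [v for t, v in samples if now - minutes <= t <= now]


def breach_duration(samples, op, threshold, minutes, now):
    """Condition holds for every poll inside the last `minutes` minutes."""
    window = _window(samples, minutes, now)
    return bool(window) and all(OPS[op](v, threshold) for v in window)


def breach_average_duration(samples, op, threshold, minutes, now):
    """Average of the polls inside the last `minutes` minutes satisfies the condition."""
    window = _window(samples, minutes, now)
    return bool(window) and OPS[op](mean(window), threshold)


# Example: response times in ms, polled every 5 minutes, threshold 1000 ms.
values = [900, 1200, 1300, 1500]
samples = [(0, 900), (5, 1200), (10, 1300), (15, 1500)]
```

With this data, a poll count of 3 reports a breach because the last three polls all exceed 1000 ms, while a poll count of 4 does not because the first poll was 900 ms. The average over all four polls (1225 ms) still exceeds the threshold, which shows how the average-based strategies tolerate a single good poll.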
The multiple poll check strategy is not applied by default. When no strategy can be applied, the threshold breach is validated against a single poll.
For the two time-based strategies (condition validated during time duration, and average value during time duration) to detect threshold breaches as intended, specify a time duration that is at least twice the check frequency applied to that monitor.
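The rule is simple arithmetic: a monitor checked every 5 minutes needs a time duration of at least 10 minutes. A small validation helper might look like the sketch below, where the function name and parameters are hypothetical.

```python
def duration_is_valid(duration_minutes, check_frequency_minutes):
    """True when the duration is at least twice the check frequency."""
    return duration_minutes >= 2 * check_frequency_minutes


ok = duration_is_valid(10, 5)         # 10 >= 2 * 5
too_short = duration_is_valid(8, 5)   # 8 < 10
```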
To know how Alarms Engine keeps an eye on your server uptime, refer here.
E-mail sample of the RCA report generated during a server downtime