Threshold and Availability

Thresholds are preset values for monitored metrics; when a metric crosses its threshold, an alert is triggered so that performance issues can be addressed promptly. Availability monitoring regularly checks resource accessibility to determine operational status (Up, Down, Trouble, or Critical), and an alert notifies you of any status change through your preferred communication medium.

Note

When you log in to Site24x7 for the first time, click Admin > Inventory > Monitors > Add Monitor, and select the desired service, a default threshold and availability profile for that resource type is automatically listed on the Threshold and Availability screen.

Add Threshold and Availability

To add a new threshold and availability profile:

Log in to Site24x7.
Click Admin > Configuration Profiles > Threshold and Availability.
Click Add Threshold and Availability on the Threshold and Availability screen.
Specify the following details to add threshold and downtime rules for the service:
- Choose Monitor Type: Select the desired monitor type from the drop-down list.
- Display Name: Provide a label to identify the profile.
- Number of location to report monitor as down: Select from the drop-down list the number of locations from which the web service must be down before an alert notification is sent.
Threshold Configuration
The threshold configuration is based on the values you provide for the fields below. Which of these fields are available depends on the monitor type you choose.
- Condition: Choose among <, >, =, <=, or >= to set the comparison between the polled value and the threshold value that determines whether an alert is triggered.
- Threshold value: This is a value you can provide for any performance metric. For example, it can be CPU usage percentage, memory usage, or network latency. The values are compared against the conditions you've defined to determine whether an alert should be triggered.
- Poll Strategy: Poll strategy defines the interval at which Site24x7 sends requests to gather metrics. It varies depending on the resource type and the level of granularity you require in monitoring. For example, you might choose to poll a critical resource more frequently but poll less critical resources at much more extended intervals.
- Poll Value: The poll value represents the latest data point collected from a monitored resource during a polling interval. It's the actual measurement that is being monitored. This value is used to evaluate against the conditions you've set.
- Notify As: Defines the status under which alerts are sent, letting you customize when you are notified about a resource's status change. For instance, you can choose to be notified immediately when a value is breached, according to its criticality.
  
  5.1 Zia-based threshold:
  The AI-based (Zia) threshold uses anomaly detection to track abnormal spikes and offers a dynamic threshold that is updated accordingly. If you choose the AI-based threshold, select the associated anomaly severity and the corresponding status.
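The Condition, Threshold value, and Notify As fields described above can be pictured as a simple comparison. The following is a minimal sketch under stated assumptions, not Site24x7's implementation: the function name `evaluate_threshold`, the `CONDITIONS` table, and the sample CPU values are hypothetical and only illustrate how a polled value might be checked against a configured threshold.

```python
import operator

# The five comparison operators offered by the Condition field.
CONDITIONS = {
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
}


def evaluate_threshold(poll_value, condition, threshold_value, notify_as):
    """Return notify_as if the polled value breaches the threshold, else "Up".

    Hypothetical helper: returning "Up" when nothing is breached is an
    assumption made for this illustration.
    """
    breached = CONDITIONS[condition](poll_value, threshold_value)
    return notify_as if breached else "Up"


# Hypothetical rule: CPU usage > 85 is reported as Critical.
print(evaluate_threshold(91, ">", 85, "Critical"))  # Critical
print(evaluate_threshold(60, ">", 85, "Critical"))  # Up
```

In this sketch a single polled value decides the outcome; a real profile also applies the configured poll strategy before notifying.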
Advanced Threshold

Advanced thresholds allow users to combine multiple conditions across different attributes, ensuring that alerts are triggered only when meaningful patterns or anomalies occur. You can send customized alerts based on multiple dependent attributes for a single resource (for example, CPU and process metrics of a server) using the logical operators && (AND) and || (OR).
Conditions combining multiple real-time attributes can be set using logical operators. For example, suppose the expression A && ((B && C) || D) is set to trigger a Critical alert. Here, you can configure:
- A as CPU Utilization and set its threshold as > 80%
- B as Memory Utilization and set its threshold as > 75%
- C as Disk I/O wait time and set its threshold as > 60 ms
- D as Number of Active Processes and set its threshold as > 200
  
  An attribute cannot be used multiple times within a single condition, but it can be used across different conditions. For example, CPU Utilization cannot be configured multiple times within a single condition; however, it can be used in two different conditions. While setting advanced thresholds, you can also configure Poll Strategy, Poll Value, Notify As, and Automation.
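To make the grouping in A && ((B && C) || D) concrete, here is a minimal sketch of how that expression evaluates. The function name and sample readings are hypothetical, and the comparison for Disk I/O wait time is assumed to be "greater than", since the example above gives only the value 60 ms.

```python
def advanced_condition(cpu_pct, memory_pct, disk_io_wait_ms, active_processes):
    """Evaluate A && ((B && C) || D) for one set of readings."""
    a = cpu_pct > 80            # A: CPU Utilization > 80%
    b = memory_pct > 75         # B: Memory Utilization > 75%
    c = disk_io_wait_ms > 60    # C: Disk I/O wait time (">" is an assumption)
    d = active_processes > 200  # D: Number of Active Processes > 200
    return a and ((b and c) or d)


# High CPU with both memory and disk I/O wait above their thresholds.
print(advanced_condition(90, 80, 70, 100))  # True
# High CPU with too many processes; memory and disk are fine.
print(advanced_condition(90, 50, 30, 250))  # True
# CPU is below its threshold, so nothing else matters.
print(advanced_condition(70, 95, 90, 500))  # False
```

Because A is joined to the rest with &&, the alert is never raised unless CPU Utilization is breached, regardless of the other three attributes.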
Note
- You can configure a condition to trigger one of the three statuses: Trouble, Critical, or Down.
- Only one condition can be configured for each status. To add a condition for a different status, click the + icon on the right.
- Advanced thresholds are available only for monitor-level attributes, not for child attributes.
Use cases
- A DevOps team managing a data center needs to identify early signs of resource contention to prevent unnecessary scaling. Using advanced thresholds, they configure an alert that triggers when CPU Utilization exceeds 80% and Memory Utilization surpasses 85%, or when Swap Utilization exceeds 70%. This helps detect early resource contention while keeping the system operational.
  Condition: ((a > 80% && b > 85%) || c > 70%)
- When a web application experiences performance degradation due to high resource consumption, the IT operations team sets up a Critical alert that triggers if both CPU Utilization exceeds 85% and Memory Utilization exceeds 90%, or if Disk Space falls below 10% and Network Utilization surpasses 90%. This flags performance issues requiring immediate attention.
  Condition: (a > 85% && b > 90%) || (c < 10% && d > 90%)
- When a database server nears complete system failure, a Down alert is warranted. To prevent false positives from isolated spikes, a system administrator configures the Down alert to trigger only when CPU Utilization > 90%, Memory Utilization > 95%, and Swap Utilization > 90% occur simultaneously, ensuring the system is truly in a complete overload state before an alert is raised.
  Condition: (a > 90% && b > 95% && c > 90%)
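The three use cases above each attach one condition to one status. As a rough sketch, they could be pictured as a single profile with one rule per status. This is an illustration under assumptions: the use cases describe separate scenarios, the metric keys (`cpu`, `memory`, `swap`, `disk_free`, `network`) are hypothetical names, and checking the most severe status first is an assumed ordering rather than documented behavior.

```python
# One rule per status, listed most severe first (assumed ordering).
RULES = [
    ("Down", lambda m: m["cpu"] > 90 and m["memory"] > 95 and m["swap"] > 90),
    ("Critical", lambda m: (m["cpu"] > 85 and m["memory"] > 90)
                           or (m["disk_free"] < 10 and m["network"] > 90)),
    ("Trouble", lambda m: (m["cpu"] > 80 and m["memory"] > 85)
                          or m["swap"] > 70),
]


def resolve_status(metrics):
    """Return the first status whose condition holds, or "Up" if none does."""
    for status, rule in RULES:
        if rule(metrics):
            return status
    return "Up"


# Hypothetical readings, all values in percent.
sample = {"cpu": 87, "memory": 91, "swap": 10, "disk_free": 50, "network": 10}
print(resolve_status(sample))  # Critical
```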
Click Save.
The threshold and availability profile created for the service is automatically listed on the Threshold and Availability screen along with the profiles already created.

Edit Threshold and Availability

Click the profile that you want to edit.
Edit the parameters that need to be changed in the Add Threshold and Availability window.
Click Save.

Delete the Threshold and Availability

On the Threshold and Availability screen, click the profile that needs to be deleted.
This opens the Add Threshold and Availability window.
Click Delete.

You can also clone or delete a threshold profile from the Threshold Profiles list by clicking the hamburger icon. Configure Downtime Rules to reduce false alerts for the following monitors: