Threshold and Availability for SQL Server
Once the SQL server monitor is successfully added to your Site24x7 account, add a threshold and availability profile to help the alarms engine decide if a specific resource has to be declared critical or down. Configure downtime rules to reduce false alerts.
Tip: While setting up a threshold profile, you can also map IT Automation(s) to desired attribute(s). Once the threshold is breached, the corrective automation will be executed and the issue can be fixed without manual intervention. You can map upto five corrective automations per attribute.
- Add a threshold profile
- List of metrics supported
- Edit a threshold and availability profile
- Delete a threshold and availability profile
Add a Threshold Profile
- Log in to Site24x7.
- Go to Admin > Configuration Profiles > Threshold and Availability > Add Threshold Profile. You can also navigate via Server > Server Monitor > Servers > click on the server monitor > hover on the hamburger icon beside the display name > Edit > Configuration Profiles > pencil icon beside Threshold and Availability.
- Specify the following details:
- Monitor Type: Select Microsoft SQL Server from the drop-down.
- Display Name: Specify an appropriate name for the threshold and availability profile.
- Threshold Type: You can choose between Static and AI-based thresholds. Refer the below section for the entire list of metrics for which thresholds can be set.
- Static Thresholds: From the drop-down menu, choose the desired metrics for which thresholds need to be configured. Enter a value specific to the unit, and set the threshold conditions (<, <=, =, >, or >=) and the monitor state (to be notified as) for each metric. You'll receive alerts when these threshold conditions are violated.
- AI-based Thresholds: The AI-based threshold will track the abnormal spikes using anomaly detection and will offer a dynamic threshold which will be updated accordingly. If you're choosing AI-based threshold, choose associated anomaly severity and the status accordingly.
- General Thresholds: Logins, batch requests, SQL compilations, replication merge conflicts
- Memory Thresholds: Target server memory, total server memory, target server memory, SQL cache memory, optimizer memory, granted workspace memory, memory grants pending
- Buffer Manager Thresholds: Checkpoint pages, lazy writes, page reads, page splits, page writes, full scans, number of errors
- Job and Plan Thresholds: Plan cache hit ratio, cache pages, cache objects, queued jobs, failed jobs
- Lock and Latch Thresholds: Lock requests, lock timeouts, deadlocks, average lock wait time
- Advanced Threshold Settings (Strategy):
Poll count serves as the default strategy to validate the threshold breach. You can validate threshold breach by applying multiple conditions (>, <, =, >=, <=)on your specified threshold strategy. The monitor’s status changes to Trouble or Critical when the condition applied to any of the below threshold strategies hold true:- Threshold condition validated during the poll count (number of polls): Monitor’s status changes to Trouble or Critical when the condition applied to the threshold value is continuously validated for the specified “Poll count”.
- Average value during poll count (number of polls): Monitor’s status changes to Trouble or Critical, when the average of the attribute values, for the number of polls configured, continuously justifies the condition applied on the threshold value.
- Condition validated during time duration (in minutes): When the specified condition applied on the threshold value is continuously validated, for all the polls, during the time duration configured, monitor’s status changes to Trouble or Critical.
- Average value during time duration (in minutes): Monitor’s status changes to Trouble or Critical, when the average of the attribute values, for the time duration configured, continuously justifies the condition applied on the threshold value.
Multiple poll check strategy will not be applied by default. During the conditions where no strategy could be applied, the threshold breach will be validated for a single poll alone.
NoteTo make sure the condition applied on the strategy “Strategy-3: Time duration or Strategy-4: Average value during time duration” for threshold breach detection works as intended, you must ensure that you specify a time duration which is at least twice the applied check frequency for that monitor.
- Click Save. The threshold and availability profile created for the SQL server monitor will be automatically listed in the Threshold and Availability screen along with the others already created.
List of Metrics Supported
- General Thresholds:
- Notify when SQL agent service is down: Notifies when SQL agent service is down. By default, it is set as No.
- Notify when SQL browser service is down: Notifies when SQL browser service is down. By default, it is set as No.
- User Connection: Get alerted as Trouble or Critical when the number of users connected to the system exceeds the specified value.
- Login: Get alerted as Trouble or Critical when the total number of logins started per second exceed the specified value.
- Logout: Get alerted as Trouble or Critical when the total number of logouts started per second exceeds the specified value.
- Batch Request: Enter a value to get notified as Trouble or Critical when the number of SQL batch requests received by the server per second meets the condition specified.
- SQL Compilations: Enter a value to get notified as Trouble or Critical when the number of SQL compilations per second meets the condition specified for the set value.
- SQL Re-Compilations: Enter a value to get notified as Trouble or Critical when the number of SQL re-compilations per second meets the condition specified for the set value.
- Replication Merge Conflicts: Specify a value to get notified as Trouble or Critical when the number of conflicts per second occurring during the merge process meets the condition specified for the configured value.
- Backup Device Throughput: Specify a value to get notified as Trouble or Critical when the read/write throughput for a backup device per second meets the condition specified for the configured value.
- Transaction Exceeds: Get alerted as Trouble or Critical when the longest running time of any transaction in seconds meets the condition specified for the specified value.
- Number of Errors: Get alerted as Trouble or Critical when the number of errors per second meets the condition set for the specified value.
- Page Splits: Specify a value to get notified as Trouble or Critical when the number of page split per second meets the condition specified for the set value.
- Full Scans: Get alerted as Trouble or Critical when number of unrestricted full scan per second meets the condition specified for the configured value.
- Memory Thresholds:
- Target Server Memory: Get alerted as Trouble or Critical when the total amount of dynamic memory the server is willing to consume meets the condition specified for the configured value.
- Total Server Memory: Get alerted as Trouble or Critical when the total amount of dynamic memory the server is currently consuming meets the condition specified for the configured value.
- SQL Cache Memory: Get alerted as Trouble or Critical when the total amount of dynamic memory the server is using for the dynamic SQL cache meets the condition specified for the configured value.
- Optimizer Memory: Get alerted as Trouble or Critical when the total amount of dynamic memory the server is using for query optimization meets the condition specified for the configured value.
- Connection Memory: Get alerted as Trouble or Critical when the total amount of dynamic memory the server is using for maintaining connections meets the condition specified for the configured value.
- Lock Memory: Get alerted as Trouble or Critical when the total amount of dynamic memory the server is using for locks meets the condition specified for the configured value.
- Granted Workspace Memory: Get alerted as Trouble or Critical when the total amount of memory granted to executing processes meets the condition specified for the configured value.
- Pending Memory Grants: Get alerted as Trouble or Critical when the current number of processes waiting for a workspace memory grant meets the condition specified for the configured value.
- Buffer Manager Thresholds:
- Buffer Cache Hit Ratio: Get alerted as Trouble or Critical when the percentage of pages that were found in the buffer pool without having to incur a read from disk is below the configured value.
- Checkpoint Pages: Get alerted as Trouble or Critical when the number of pages flushed by checkpoint or other operations that require all dirty pages to be flushed meets the condition specified for the configured value.
- Lazy Writes: Get alerted as Trouble or Critical when the number of buffers written by buffer manager's lazy writer meets the configured value.
- Page Life Expectancy: Get alerted as Trouble or Critical when the number of seconds a page will stay in the buffer pool without references meets the condition specified for the configured value.
- Page Read: Get alerted as Trouble or Critical when the number of physical database page reads issued per second meets the condition specified for the configured value.
- Page Write: Get alerted as Trouble or Critical when the number of physical database page writes issued per second meets the condition specified for the configured value.
- Total Page: Get alerted as Trouble or Critical when the number of pages in the buffer pool meets the condition specified for the configured value.
- DB Pages: Get alerted as Trouble or Critical when the number of pages in the buffer pool with database content meets the condition specified for the configured value.
- Free Pages: Get alerted as Trouble or Critical when the total number of pages on all free lists meets the condition specified for the configured value.
- Lock and Latch Thresholds:
- Lock Requests: Get alerted as Trouble or Critical when the number of new locks and lock conversions requested per second from the lock manager meets the condition specified for the configured value.
- Lock Timeouts: Get alerted as Trouble or Critical when the number of lock requests that timed out per second meets the condition specified for the configured value.
- Lock Waits: Get alerted as Trouble or Critical when the total wait time (milliseconds) for locks in the last second meets the condition specified for the configured value.
- Deadlocks: Get alerted as Trouble or Critical when the number of lock requests per second that resulted in a deadlock meets the condition specified for the configured value.
- Average Lock Wait Time: Get alerted as Trouble or Critical when the average amount of wait time (milliseconds) for each lock request that resulted in a wait meets the condition specified for the configured value.
- Average Latch Wait Time: Get alerted as Trouble or Critical when the average latch wait time (milliseconds) for latch requests that had to wait meets the condition specified for the configured value.
- Latch Wait: Get alerted as Trouble or Critical when the number of latch requests that could not be granted immediately and had to wait before being granted meets the condition specified for the configured value.
- Job and Plan Thresholds:
- Plan Cache Hit Ratio: Get alerted as Trouble or Critical when the ratio between cache hits and lookups meet the condition specified for the configured value.
- Cache Pages: Get alerted as Trouble or Critical when the number of 8k pages used by cache objects meet the condition specified for the configured value.
- Cache Objects: Get alerted as Trouble or Critical when the number of cache objects in the cache meets the condition specified for the configured value.
- Active Jobs Count: Get alerted as Trouble or Critical when the number of running jobs meet the condition specified for the configured value.
- Job Success Rate: Get alerted as Trouble or Critical when the percentage of successful jobs from the total number of executed jobs meet the condition specified for the configured value.
- Queued Jobs: Get alerted as Trouble or Critical when the number of jobs queued meets the condition specified for the configured value.
- Failed Jobs: Get alerted as Trouble or Critical when the number of failed jobs meets the condition specified for the configured value.
- Database Thresholds:
- Data File Size: Trouble or Critical alerts when the data file size meets the condition specified for the configured value.
- Log Cache Hit Ratio: Receive Trouble or Critical alerts when the percentage of log cache meets the condition specified for the configured value.
- Log File Size: Receive Trouble or Critical alerts when the log file size meets the condition specified for the configured value.
- Log File Used Size: Receive Trouble or Critical alerts when the log file used size meets the condition specified for the configured value.
- Log File Used Size Percent: Receive Trouble or Critical alerts when the percentage of log file used size meets the condition specified for the configured value.
- Log Growths: Receive Trouble or Critical alerts when the total number of log growths meets the condition specified for the configured value.
- Log Shrinks: Receive Trouble or Critical alerts when the number of log shrinks meets the condition specified for the configured value.
- Transaction/sec: Receive Trouble or Critical alerts when the number of transactions meets the condition specified for the configured value.
- Active Transactions: Receive Trouble or Critical alerts when the number of active update transactions meets the condition specified for the configured value.
Edit a Threshold and Availability Profile
- Click the profile which you want to edit.
- Edit the parameters which needs to be changed in Add Threshold and Availability window.
- Click Save.
Delete a Threshold and Availability Profile
- Click the profile in the Threshold and Availability screen which needs to be deleted.
- This will navigate to Add Threshold and Availability window.
- Click Delete.