AWS Database Migration Service Monitoring Integration

AWS Database Migration Service (DMS) migrates data from one database to another. It supports both homogeneous migrations, such as Oracle to Oracle, and heterogeneous migrations between different database platforms, such as Oracle or Microsoft SQL Server to Amazon Aurora.

With Site24x7's integration with AWS DMS, you can monitor database endpoints at source and target, and ensure a seamless data migration. We help you address database workload challenges during migration by keeping a close watch on your AWS DMS replication tasks and replication instances.

Setup and configuration

1. If you haven't already, enable access to your AWS resources between your AWS account and Site24x7's AWS account by either:

  • Creating an IAM user for Site24x7.
  • Creating a cross-account IAM role. Learn more

2. On the Integrate AWS Account page, select the check boxes for DMS Replication Task and DMS Replication Instance. Learn more

Policy and permissions

Site24x7 uses various AWS DMS APIs to collect information about your migration service. Assign the AWS managed policy ReadOnlyAccess to the Site24x7 entity (IAM user or IAM role) to help Site24x7 collect metrics and metadata. If you want to assign a custom policy, please make sure the following read-level actions are present in the policy JSON. Learn more

  • "dms:DescribeAccountAttributes",
  • "dms:DescribeReplicationInstances",
  • "dms:DescribeReplicationTasks",
  • "dms:DescribeTableStatistics",
  • "dms:DescribeCertificates",
  • "dms:DescribeConnections",
  • "dms:DescribeEndpoints",
  • "dms:ListTagsForResource",
  • "dms:DescribeEvents",
  • "logs:DescribeLogStreams",
  • "logs:GetLogEvents"
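
As a rough sketch, the read-level actions listed above could be assembled into a custom policy document as follows. The statement ID and the wildcard resource are assumptions made for illustration and do not come from Site24x7; scope the resource to suit your own account.

```python
import json

# Read-level actions listed in this document.
SITE24X7_DMS_ACTIONS = [
    "dms:DescribeAccountAttributes",
    "dms:DescribeReplicationInstances",
    "dms:DescribeReplicationTasks",
    "dms:DescribeTableStatistics",
    "dms:DescribeCertificates",
    "dms:DescribeConnections",
    "dms:DescribeEndpoints",
    "dms:ListTagsForResource",
    "dms:DescribeEvents",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
]


def build_policy(actions):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Site24x7DmsReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",  # assumption: not scoped to specific resources
            }
        ],
    }


# Serialize the policy so it can be pasted into the IAM policy editor.
policy_json = json.dumps(build_policy(SITE24X7_DMS_ACTIONS), indent=2)
print(policy_json)
```

The printed JSON can be attached to the IAM user or IAM role that Site24x7 uses.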

Polling Frequency

Site24x7 queries AWS to collect AWS DMS performance metrics according to the configured polling frequency. The polling interval is one hour by default. Learn more

IT Automations

You can add automations for the AWS services supported by Site24x7. Log in to Site24x7 and go to Admin > IT Automation Templates (+) > Add Automation Templates. Once automations are added, you can schedule them to be executed one after the other.

You can now start, stop, resume, and reload AWS DMS replication tasks automatically using the AWS Database Migration Service automations.
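
This page does not describe how Site24x7 implements these automations. For orientation only, the sketch below shows how the four actions correspond to AWS DMS API operations as exposed by boto3 (`start_replication_task` and `stop_replication_task`). The helper name `run_task_action` and the action keywords are hypothetical.

```python
# Start-type values accepted by the StartReplicationTask API operation.
START_TYPES = {
    "start": "start-replication",
    "resume": "resume-processing",
    "reload": "reload-target",
}


def run_task_action(dms_client, action, task_arn):
    """Run one of start/stop/resume/reload against a replication task.

    dms_client is expected to behave like boto3.client("dms").
    """
    if action == "stop":
        return dms_client.stop_replication_task(ReplicationTaskArn=task_arn)
    if action in START_TYPES:
        return dms_client.start_replication_task(
            ReplicationTaskArn=task_arn,
            StartReplicationTaskType=START_TYPES[action],
        )
    raise ValueError(f"unsupported action: {action!r}")


# Usage (requires boto3 and AWS credentials; my_task_arn is a placeholder):
#   import boto3
#   run_task_action(boto3.client("dms"), "resume", my_task_arn)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.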

Performance metrics for AWS DMS replication tasks

Attribute | Description | Statistic | Data type
Full Load Throughput Bandwidth Source | Incoming data received from a full load from the source, measured in kilobytes per second. | Average | KB/sec
Full Load Throughput Bandwidth Target | Outgoing data transmitted from a full load for the target, measured in kilobytes per second. | Average | KB/sec
Full Load Throughput Rows Source | Incoming changes from a full load from the source, measured in rows per second. | Average | Count/sec
Full Load Throughput Rows Target | Outgoing changes from a full load for the target, measured in rows per second. | Average | Count/sec
CDC Incoming Changes | The total number of change events at a point in time that are waiting to be applied to the target. Note that this is not the same as a measure of the transaction change rate of the source endpoint. A large number for this metric usually indicates that AWS DMS is unable to apply captured changes in a timely manner, thus causing high target latency. | Sum | Count
CDC Changes Memory Source | The number of rows accumulated in memory and waiting to be committed from the source. You can view this metric together with CDCChangesDiskSource. | Sum | Count
CDC Changes Memory Target | The number of rows accumulated in memory and waiting to be committed to the target. You can view this metric together with CDCChangesDiskTarget. | Sum | Count
CDC Changes Disk Source | The number of rows accumulated on disk and waiting to be committed from the source. You can view this metric together with CDCChangesMemorySource. | Sum | Count
CDC Changes Disk Target | The number of rows accumulated on disk and waiting to be committed to the target. You can view this metric together with CDCChangesMemoryTarget. | Sum | Count
CDC Throughput Bandwidth Source | Incoming data received for the source, measured in kilobytes per second. CDCThroughputBandwidth records incoming data received on sampling points. If no task network traffic is found, the value is zero. Because CDC does not issue long-running transactions, network traffic may not be recorded. | Average | KB/sec
CDC Throughput Bandwidth Target | Outgoing data transmitted for the target, measured in kilobytes per second. CDCThroughputBandwidth records outgoing data transmitted on sampling points. If no task network traffic is found, the value is zero. Because CDC does not issue long-running transactions, network traffic may not be recorded. | Average | KB/sec
CDC Throughput Rows Source | Incoming task changes from the source, measured in rows per second. | Average | Count/sec
CDC Throughput Rows Target | Outgoing task changes for the target, measured in rows per second. | Average | Count/sec
CDC Latency Source | The gap, in seconds, between the last event captured from the source endpoint and the current system timestamp of the AWS DMS instance. CDCLatencySource represents the latency between the source and the replication instance. A high CDCLatencySource means the process of capturing changes from the source is delayed. To identify latency in an ongoing replication, you can view this metric together with CDCLatencyTarget. If both CDCLatencySource and CDCLatencyTarget are high, investigate CDCLatencySource first. | Average | Seconds
CDC Latency Target | The latency between the replication instance and the target. A high CDCLatencyTarget indicates that the process of applying change events to the target is delayed. | Average | Seconds
CPU Utilization | The percentage of CPU being used by a task. | Average | Percent
CPU Allocated | The maximum percentage of CPU allocated to the task (0 means no limit). | Average | Percent
Memory Allocated | The maximum allocation of memory for the task (0 means no limit). | Average | MB
Swap Usage | The amount of swap used by the task. | Average | Bytes
Validation Succeeded Record Count | The number of rows that AWS DMS validated per minute. | Sum | Count
Validation Attempted Record Count | The number of rows where validation was attempted per minute. | Sum | Count
Validation Failed Overall Count | The number of rows where validation failed. | Sum | Count
Validation Suspended Overall Count | The number of rows where validation was suspended. | Sum | Count
Validation Pending Overall Count | The number of rows where validation is still pending. | Sum | Count
Validation Bulk Query Source Latency | AWS DMS can do data validation in bulk, especially in certain scenarios during a full load or ongoing replication when there are many changes. This metric indicates the latency required to read a bulk set of data from the source endpoint. | Average | Milliseconds
Validation Bulk Query Target Latency | AWS DMS can do data validation in bulk, especially in certain scenarios during a full load or ongoing replication when there are many changes. This metric indicates the latency required to read a bulk set of data on the target endpoint. | Average | Milliseconds
Validation Item Query Source Latency | During ongoing replication, data validation can identify ongoing changes and validate them row by row. This metric indicates the latency in reading those changes from the source. Validation may run more queries than required, based on the number of changes, if there are errors during validation. | Average | Milliseconds
Validation Item Query Target Latency | During ongoing replication, data validation can identify ongoing changes and validate them row by row. This metric indicates the latency in reading those changes from the target. Validation may run more queries than required, based on the number of changes, if there are errors during validation. | Average | Milliseconds
Full Load Throughput Bandwidth Total | The total full load throughput bandwidth at the target and source. | Average | KB/sec
Full Load Throughput Rows Total | The total full load throughput rows at the target and source. | Average | Count/sec
CDC Changes Memory Total | The total number of CDC changes in memory at the target and source. | Sum | Count
CDC Changes Disk Total | The total number of CDC changes on disk at the target and source. | Sum | Count
CDC Throughput Bandwidth Total | The total CDC throughput bandwidth at the target and source. | Average | KB/sec
CDC Throughput Rows Total | The total CDC throughput rows at the target and source. | Average | Count/sec
CDC Latency Total | The total CDC latency at the target and source. | Average | Seconds
Validation Bulk Query Total Latency | The total latency of validation bulk queries at the target and source. | Average | Milliseconds
Validation Item Query Total Latency | The total latency of validation item queries at the target and source. | Average | Milliseconds
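
The CDC Latency Source guidance in the table above (when both latencies are high, investigate the source first) can be expressed as a small triage helper. This is a minimal sketch: the function name and the 60-second threshold are hypothetical, so choose a threshold that matches your own replication requirements.

```python
def triage_cdc_latency(latency_source_s, latency_target_s, threshold_s=60.0):
    """Suggest where to look first for replication lag.

    Returns "source", "target", or "ok". threshold_s is a hypothetical
    cut-off, in seconds, above which a latency counts as high.
    """
    source_high = latency_source_s > threshold_s
    target_high = latency_target_s > threshold_s
    if source_high:
        # Covers both "only source high" and "both high": the table
        # advises investigating CDCLatencySource first in either case.
        return "source"
    if target_high:
        # Capture is keeping up; applying changes to the target is delayed.
        return "target"
    return "ok"


print(triage_cdc_latency(120, 150))
print(triage_cdc_latency(5, 150))
```

The first call reports the source because both values exceed the threshold; the second reports the target because only target latency is high.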

Performance metrics for AWS DMS replication instances

Attribute | Description | Statistic | Data type
CPU Utilization | The percentage of CPU used. | Average | Percent
Free Storage Space | The amount of available storage space. | Average | Bytes
Freeable Memory | The amount of available random access memory. | Average | Bytes
Write IOPS | The average number of disk write I/O operations per second. | Average | Count/sec
Read IOPS | The average number of disk read I/O operations per second. | Average | Count/sec
Read Throughput | The average number of bytes read from disk per second. | Average | Bytes/sec
Read Latency | The average amount of time taken per disk I/O (input) operation. | Average | Milliseconds
Swap Usage | The amount of swap space used on the replication instance. | Average | Bytes
Network Receive Throughput | The incoming (Receive) network traffic on the replication instance, including both customer database traffic and AWS DMS traffic used for monitoring and replication. | Average | Bytes/sec
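
Site24x7 collects these metrics for you. To spot-check a value directly against Amazon CloudWatch, a request along the following lines could be used. This sketch assumes the `AWS/DMS` namespace, the `ReplicationInstanceIdentifier` dimension, and CloudWatch metric names such as `CPUUtilization`, as described in the AWS documentation; verify them for your account. The helper name and its default values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_instance_metric_query(instance_id, metric_name, hours=3,
                                period_s=300, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    Requests the Average statistic, matching the Statistic column above.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DMS",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "ReplicationInstanceIdentifier", "Value": instance_id}
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_s,  # seconds per data point
        "Statistics": ["Average"],
    }


query = build_instance_metric_query("my-replication-instance", "CPUUtilization")
print(query["Namespace"], query["MetricName"])

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   response = boto3.client("cloudwatch").get_metric_statistics(**query)
```

Building the request separately from sending it makes it easy to review the time window and dimensions before any API call is made.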

Forecast

Estimate future values of the following AWS DMS replication instance performance metrics and make informed decisions about adding capacity or scaling your AWS infrastructure.

  • CPU Utilization
  • Read IOPS
  • Write IOPS
  • Freeable Memory
  • Swap Usage
  • Disk Queue Depth

You can also view the forecast for the following AWS DMS replication task metrics:

  • CPU Utilization
  • Memory Usage 
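
This page does not describe the model behind Site24x7's forecasts. Purely to illustrate the idea of estimating a future value from past samples, the sketch below fits a least-squares straight line to evenly spaced readings and extrapolates it. The function name and the sample values are hypothetical, and a real forecast may use a very different technique.

```python
def linear_forecast(values, steps_ahead):
    """Extrapolate evenly spaced samples with a least-squares linear trend.

    values: past readings, oldest first. steps_ahead: how many sampling
    intervals beyond the last reading to predict.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    # Slope and intercept of the best-fit line through (index, value) pairs.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical CPU utilization readings (percent), one per polling interval.
cpu_samples = [10.0, 20.0, 30.0, 40.0]
print(linear_forecast(cpu_samples, steps_ahead=2))
```

A straight-line trend ignores seasonality and saturation, so treat it as a rough capacity signal rather than a prediction.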

Site24x7's AWS DMS monitoring interface

Summary

Gain an overview of the different events occurring within each replication task or replication instance with time series charts. This section provides you with operational details like CPU utilization, memory usage, full load bandwidth, full load throughput rows, change data capture (CDC) incoming changes, CDC changes in disk and memory, CDC latency, and many more metrics.

There is a separate Task Summary tab for replication instances, which displays task details and real-time statistics for individual tasks. For each task detail, you have the option to bulk edit the threshold profiles as well.

Monitored Resources

Various resource availability statuses are provided here, with information on resource name, type, display name, status, and action. The Action column allows you to set alerts and add automations for when a monitored resource is marked as Down, Critical, or Trouble. 

Endpoint Details

The DMS Replication Task section provides you with the endpoint details of each task. This section has various details on connections, source endpoints, and target endpoints. The Connections section lets you configure thresholds, set alerts, and add automations for each endpoint when it is Down.

Outages

A history of your resources’ various states, like down, trouble, critical, or maintenance, is displayed in the Outages tab. Details on the start time and end time of an outage, duration, and comments (if any) are provided in this section. You can also edit or delete comments.

Log Report

Here you can view the audit log data for a replication instance or replication task, along with details on the timestamp, status, CPU utilization, free storage, and freeable memory.
