Pagerduty Cloudwatch integration
It is possible to send your own custom payload to the Pagerduty Cloudwatch integration from a Lambda (instead of via a Cloudwatch alarm). Pagerduty does not document the internals, but if you publish a custom message to an SNS topic that has an HTTPS subscription pointing at Pagerduty, and the message follows the simple rules below, you will see the event in Pagerduty.
SNS Subject:
The message subject is important: it must start with "ALARM: " (note the space after the colon).
It doesn't matter what you put after the colon.
The alarm status (in this case ALARM) must match the NewStateValue in the SNS message body, or the message will be discarded.
You can also clear the incident in Pagerduty by following the above rules and replacing ALARM with OK.
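The subject rules above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and the truncation to 99 characters reflects my understanding that SNS requires subjects to be shorter than 100 characters with no line breaks, not anything Pagerduty documents.

```python
# Hypothetical helpers illustrating the subject rules; names are not from any library.
VALID_STATES = ("ALARM", "OK")


def build_subject(state: str, detail: str) -> str:
    """Return a subject of the form '<STATE>: <detail>'."""
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {VALID_STATES}, got {state!r}")
    # Line breaks are not allowed in an SNS subject, so flatten them.
    detail = detail.replace("\r", " ").replace("\n", " ")
    # The space after the colon is required by the rule above.
    subject = f"{state}: {detail}"
    # Assumption: SNS subjects must be shorter than 100 characters.
    return subject[:99]


def subject_matches_state(subject: str, new_state_value: str) -> bool:
    """Check that the subject prefix agrees with NewStateValue in the body."""
    return subject.startswith(f"{new_state_value}: ")
```

Building the subject and the message body from the same state variable is the simplest way to keep the two in sync.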
SNS Message:
The integration is very strict when it parses the JSON message: any slight syntax error will cause the message to be discarded.
You can put anything else you want into the JSON payload and it will be visible in Pagerduty.
A minimal message looks like this:
{
  "NewStateValue": "ALARM",
  "foo": "bar"
}
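Because a malformed message is silently discarded, it can help to check the body locally before publishing. The following is a minimal sketch using Python's standard json module. The function name is hypothetical, and restricting NewStateValue to ALARM and OK is an assumption based on the two states discussed here; the integration's real parser may accept or reject other things.

```python
import json


def validate_message(raw: str) -> dict:
    """Parse a candidate SNS message body and check the fields described above."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"message is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("message must be a JSON object")
    # Assumption: only the two states covered in this page are checked.
    if payload.get("NewStateValue") not in ("ALARM", "OK"):
        raise ValueError("NewStateValue must be ALARM or OK")
    return payload
```

Generating the body with json.dumps from a dict, rather than assembling the string by hand, avoids most syntax errors in the first place.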
This is what Cloudwatch sends to Pagerduty through SNS:
{
"Type" : "Notification",
"MessageId" : "c2228c71-f550-5e3d-b92c-d7dada9f6d76",
"TopicArn" : "arn:aws:sns:ap-southeast-1:003422198502:testbc",
"Subject" : "ALARM: \"2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c...\" in Asia Pacific (Singapore)",
"Message" : "{\"AlarmName\":\"2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3\",\"AlarmDescription\":\"DO NOT EDIT OR DELETE. For TargetTrackingScaling policy arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e.\",\"AWSAccountId\":\"003422198502\",\"AlarmConfigurationUpdatedTimestamp\":\"2022-09-26T04:41:30.103+0000\",\"NewStateValue\":\"ALARM\",\"NewStateReason\":\"Threshold Crossed: 1 out of the last 1 datapoints [0.010321114212274551 (26/09/22 04:40:00)] was less than the threshold (11.700000000000001) (minimum 1 datapoint for OK -> ALARM transition).\",\"StateChangeTime\":\"2022-09-26T04:41:51.610+0000\",\"Region\":\"Asia Pacific (Singapore)\",\"AlarmArn\":\"arn:aws:cloudwatch:ap-southeast-1:003422198502:alarm:2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3\",\"OldStateValue\":\"INSUFFICIENT_DATA\",\"OKActions\":[],\"AlarmActions\":[\"arn:aws:sns:ap-southeast-1:003422198502:testbc\",\"arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e\"],\"InsufficientDataActions\":[],\"Trigger\":{\"MetricName\":\"CPUUtilization\",\"Namespace\":\"AWS/ECS\",\"StatisticType\":\"Statistic\",\"Statistic\":\"AVERAGE\",\"Unit\":\"Percent\",\"Dimensions\":[{\"value\":\"testservice\",\"name\":\"ServiceName\"},{\"value\":\"test\",\"name\":\"ClusterName\"}],\"Period\":60,\"EvaluationPeriods\":1,\"DatapointsToAlarm\":1,\"ComparisonOperator\":\"LessThanThreshold\",\"Threshold\":11.700000000000001,\"TreatMissingData\":\"breaching\",\"EvaluateLowSampleCountPercentile\":\"\"}}",
"Timestamp" : "2022-09-26T04:41:51.652Z",
"SignatureVersion" : "1",
"Signature" : "Zr8NlG6+KlEfOcj1ZS96BU4Z3K3aKWpJpf8pWc9/u84rbG6Q5kPdqJEY0jiLK4WCbEwmrZFols/ULvKB/W0Z5goBnyQmMlW7XIxpDIoU7I4aGd9XvQNyDed/TEUQ3IK280PerWmBRPPsxgTKN48emazGbch5Ea84DThT/tpw8L98KvC0yzgV04mB2fPgXGdytoRupn/bYitwcgTkkccynzHFHDAWCQkhcYql/wCt41eANLtIAfbdg02uKVs44LPwcoiJv5fO/jo/qMOQZd7i2xNBh6yD9Vn8kkNE6FCmEiIzRmiiOA6sqB9HZB/xQueBhJz/kboyR/Qe6IMpcjb21A==",
"SigningCertURL" : "https://sns.ap-southeast-1.amazonaws.com/SimpleNotificationService-56e67fcb41f6fec09b0196692625d385.pem",
"UnsubscribeURL" : "https://sns.ap-southeast-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:ap-southeast-1:003422198502:testbc:894babc8-8186-4b49-b68d-ff18e204e59a"
}
Cleaned-up Message field extracted from the above:
{
"AlarmName": "2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3",
"AlarmDescription": "DO NOT EDIT OR DELETE. For TargetTrackingScaling policy arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e.",
"AWSAccountId": "003422198502",
"AlarmConfigurationUpdatedTimestamp": "2022-09-26T04:41:30.103+0000",
"NewStateValue": "ALARM",
"NewStateReason": "Threshold Crossed: 1 out of the last 1 datapoints [0.010321114212274551 (26/09/22 04:40:00)] was less than the threshold (11.700000000000001) (minimum 1 datapoint for OK -> ALARM transition).",
"StateChangeTime": "2022-09-26T04:41:51.610+0000",
"Region": "Asia Pacific (Singapore)",
"AlarmArn": "arn:aws:cloudwatch:ap-southeast-1:003422198502:alarm:2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3",
"OldStateValue": "INSUFFICIENT_DATA",
"OKActions": [],
"AlarmActions": ["arn:aws:sns:ap-southeast-1:003422198502:testbc", "arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e"],
"InsufficientDataActions": [],
"Trigger": {
"MetricName": "CPUUtilization",
"Namespace": "AWS/ECS",
"StatisticType": "Statistic",
"Statistic": "AVERAGE",
"Unit": "Percent",
"Dimensions": [{
"value": "testservice",
"name": "ServiceName"
}, {
"value": "test",
"name": "ClusterName"
}],
"Period": 60,
"EvaluationPeriods": 1,
"DatapointsToAlarm": 1,
"ComparisonOperator": "LessThanThreshold",
"Threshold": 11.700000000000001,
"TreatMissingData": "breaching",
"EvaluateLowSampleCountPercentile": ""
}
}
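The extraction shown above can be reproduced with two rounds of JSON decoding, since the Message field of the SNS envelope is itself a JSON document stored as a string. This is a sketch with hypothetical function names, using only the standard library.

```python
import json


def extract_alarm(envelope_text: str) -> dict:
    """Decode the SNS envelope, then decode its string-valued "Message" field."""
    envelope = json.loads(envelope_text)
    return json.loads(envelope["Message"])


def pretty(alarm: dict) -> str:
    """Render the decoded alarm with indentation for reading."""
    return json.dumps(alarm, indent=2)
```

For example, calling pretty(extract_alarm(body)) on a captured POST body yields an indented version of the alarm document.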
This code will let you send the Pagerduty alarm.
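A minimal sketch of such a sender, assuming boto3 (bundled with the AWS Lambda Python runtimes) and a topic ARN supplied through a hypothetical PAGERDUTY_TOPIC_ARN environment variable. The function names, the environment variable, and the example summary text are illustrative only. The one real API used is the boto3 SNS client's publish call with its TopicArn, Subject and Message parameters.

```python
import json
import os


def publish_pagerduty_event(sns_client, topic_arn, state, summary, details=None):
    """Publish a custom event that follows the subject and message rules above.

    sns_client is anything with a boto3-style publish(TopicArn=, Subject=, Message=).
    """
    if state not in ("ALARM", "OK"):
        raise ValueError("state must be ALARM or OK")
    # Extra keys are passed through; NewStateValue always reflects `state`.
    message = dict(details or {})
    message["NewStateValue"] = state
    return sns_client.publish(
        TopicArn=topic_arn,
        # Subject prefix must match NewStateValue; keep it under 100 characters.
        Subject=f"{state}: {summary}"[:99],
        # json.dumps guarantees syntactically valid JSON.
        Message=json.dumps(message),
    )


def lambda_handler(event, context):
    # Imported here so the helper above can be exercised without boto3 installed.
    import boto3

    topic_arn = os.environ["PAGERDUTY_TOPIC_ARN"]  # hypothetical variable name
    response = publish_pagerduty_event(
        boto3.client("sns"), topic_arn, "ALARM", "custom check failed", {"foo": "bar"}
    )
    return {"MessageId": response["MessageId"]}
```

Calling the same helper with state "OK" and the same summary would, per the rules above, clear the incident.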
This is a small tool that was run behind ngrok, with the SNS HTTPS subscription pointed at it, to inspect the SNS content of a Cloudwatch alarm payload.
"""
Very simple HTTP server in python for logging requests
Usage::
    ./server.py [<port>]
"""
from http.server import BaseHTTPRequestHandler, HTTPServer
import logging


class S(BaseHTTPRequestHandler):
    def _set_response(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_GET(self):
        logging.info("GET request,\nPath: %s\nHeaders:\n%s\n", str(self.path), str(self.headers))
        self._set_response()
        self.wfile.write("GET request for {}".format(self.path).encode('utf-8'))

    def do_POST(self):
        content_length = int(self.headers['Content-Length'])  # <--- Gets the size of data
        post_data = self.rfile.read(content_length)  # <--- Gets the data itself
        logging.info("POST request,\nPath: %s\nHeaders:\n%s\n\nBody:\n%s\n",
                     str(self.path), str(self.headers), post_data.decode('utf-8'))
        self._set_response()
        self.wfile.write("POST request for {}".format(self.path).encode('utf-8'))


def run(server_class=HTTPServer, handler_class=S, port=8080):
    logging.basicConfig(level=logging.INFO)
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    logging.info('Starting httpd...\n')
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()
    logging.info('Stopping httpd...\n')


if __name__ == '__main__':
    from sys import argv

    if len(argv) == 2:
        run(port=int(argv[1]))
    else:
        run()
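When an HTTPS subscription is first pointed at a tool like this, SNS posts a SubscriptionConfirmation message before any alarm notifications, and the subscription stays pending until the SubscribeURL in that message is visited. The sketch below shows one way to tell the logged POST bodies apart. The function name is hypothetical; the Type and SubscribeURL fields are part of the SNS message format.

```python
import json


def describe_sns_post(body: str) -> str:
    """Summarise a logged SNS POST body by its "Type" field."""
    doc = json.loads(body)
    kind = doc.get("Type")
    if kind == "SubscriptionConfirmation":
        # Visiting this URL confirms the pending subscription.
        return f"Confirm the subscription by visiting: {doc['SubscribeURL']}"
    if kind == "Notification":
        return f"Notification with subject: {doc.get('Subject')}"
    return f"Other SNS message type: {kind}"
```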