PagerDuty CloudWatch integration

You can send your own custom payload to the PagerDuty CloudWatch integration from a Lambda function instead of from a CloudWatch alarm. PagerDuty does not document the internals, but if you publish a custom message to an SNS topic that has an HTTPS subscription pointing at PagerDuty, and the message follows the simple rules below, the event will show up in PagerDuty.

SNS Subject:

  • The message subject is important. It must start with

ALARM:
# Note the space after the colon
  • It doesn’t matter what you put after the colon

  • The alarm status in the subject (in this case ALARM) must match the NewStateValue in the SNS message body, or the message will be discarded.

  • You can also clear the incident in PagerDuty by following the above rules and replacing ALARM with OK in both the subject and NewStateValue.
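
The subject rules above can be checked before publishing. The following is a minimal sketch, assuming the integration compares the text before the first colon-and-space in the subject with NewStateValue in the body. The helper name subject_matches_body is hypothetical and is not part of any PagerDuty or AWS API.

```python
import json


def subject_matches_body(subject: str, message: str) -> bool:
    """Return True if the subject's state prefix equals NewStateValue.

    Hypothetical helper: it mirrors the rules described above, not
    PagerDuty's actual (undocumented) parser.
    """
    # The subject must look like "<STATE>: <anything>" (note the space).
    state, sep, _rest = subject.partition(": ")
    if not sep:
        return False
    try:
        body = json.loads(message)
    except json.JSONDecodeError:
        # Malformed JSON would be discarded by the integration anyway.
        return False
    return isinstance(body, dict) and body.get("NewStateValue") == state
```

Running a check like this inside the Lambda before publishing makes a silently discarded event easier to diagnose.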

SNS Message:

  • The integration is very strict when it parses the JSON message; any slight syntax error will cause the message to be discarded.

  • You can put anything else you want into the JSON payload and it will be visible in Pagerduty.

  • A minimal message looks like this:

{
  "NewStateValue": "ALARM",
  "foo": "bar"
}
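
Putting the subject and message rules together, a Lambda could build and publish the event roughly as follows. This is a sketch under assumptions: build_event and publish_event are hypothetical names, the topic ARN is a placeholder, and sns_client is expected to be a boto3 SNS client (boto3.client("sns")), whose publish call accepts TopicArn, Subject and Message. The truncation to 99 characters reflects the SNS rule that a subject must be shorter than 100 characters.

```python
import json


def build_event(state: str, summary: str, details: dict) -> dict:
    """Build the Subject and Message for a PagerDuty-bound SNS publish.

    state should be "ALARM" to trigger an incident or "OK" to clear it.
    """
    # NewStateValue is set last so it always matches the subject prefix,
    # even if details happens to contain a key of the same name.
    body = {**details, "NewStateValue": state}
    return {
        # SNS subjects must be shorter than 100 characters.
        "Subject": f"{state}: {summary}"[:99],
        # json.dumps always emits syntactically valid JSON.
        "Message": json.dumps(body),
    }


def publish_event(sns_client, topic_arn: str, event: dict) -> dict:
    """Publish the event through an SNS client (e.g. boto3.client("sns"))."""
    return sns_client.publish(
        TopicArn=topic_arn,
        Subject=event["Subject"],
        Message=event["Message"],
    )


# Usage inside a Lambda handler (assumes boto3 is available):
#   import boto3
#   event = build_event("ALARM", "disk full", {"foo": "bar"})
#   publish_event(boto3.client("sns"), "<your-topic-arn>", event)
```

Building the body with json.dumps rather than by hand avoids the syntax slips that the integration rejects.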

This is what CloudWatch sends to PagerDuty through SNS:

{
  "Type" : "Notification",
  "MessageId" : "c2228c71-f550-5e3d-b92c-d7dada9f6d76",
  "TopicArn" : "arn:aws:sns:ap-southeast-1:003422198502:testbc",
  "Subject" : "ALARM: \"2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c...\" in Asia Pacific (Singapore)",
  "Message" : "{\"AlarmName\":\"2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3\",\"AlarmDescription\":\"DO NOT EDIT OR DELETE. For TargetTrackingScaling policy arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e.\",\"AWSAccountId\":\"003422198502\",\"AlarmConfigurationUpdatedTimestamp\":\"2022-09-26T04:41:30.103+0000\",\"NewStateValue\":\"ALARM\",\"NewStateReason\":\"Threshold Crossed: 1 out of the last 1 datapoints [0.010321114212274551 (26/09/22 04:40:00)] was less than the threshold (11.700000000000001) (minimum 1 datapoint for OK -> ALARM transition).\",\"StateChangeTime\":\"2022-09-26T04:41:51.610+0000\",\"Region\":\"Asia Pacific (Singapore)\",\"AlarmArn\":\"arn:aws:cloudwatch:ap-southeast-1:003422198502:alarm:2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3\",\"OldStateValue\":\"INSUFFICIENT_DATA\",\"OKActions\":[],\"AlarmActions\":[\"arn:aws:sns:ap-southeast-1:003422198502:testbc\",\"arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e\"],\"InsufficientDataActions\":[],\"Trigger\":{\"MetricName\":\"CPUUtilization\",\"Namespace\":\"AWS/ECS\",\"StatisticType\":\"Statistic\",\"Statistic\":\"AVERAGE\",\"Unit\":\"Percent\",\"Dimensions\":[{\"value\":\"testservice\",\"name\":\"ServiceName\"},{\"value\":\"test\",\"name\":\"ClusterName\"}],\"Period\":60,\"EvaluationPeriods\":1,\"DatapointsToAlarm\":1,\"ComparisonOperator\":\"LessThanThreshold\",\"Threshold\":11.700000000000001,\"TreatMissingData\":\"breaching\",\"EvaluateLowSampleCountPercentile\":\"\"}}",
  "Timestamp" : "2022-09-26T04:41:51.652Z",
  "SignatureVersion" : "1",
  "Signature" : "Zr8NlG6+KlEfOcj1ZS96BU4Z3K3aKWpJpf8pWc9/u84rbG6Q5kPdqJEY0jiLK4WCbEwmrZFols/ULvKB/W0Z5goBnyQmMlW7XIxpDIoU7I4aGd9XvQNyDed/TEUQ3IK280PerWmBRPPsxgTKN48emazGbch5Ea84DThT/tpw8L98KvC0yzgV04mB2fPgXGdytoRupn/bYitwcgTkkccynzHFHDAWCQkhcYql/wCt41eANLtIAfbdg02uKVs44LPwcoiJv5fO/jo/qMOQZd7i2xNBh6yD9Vn8kkNE6FCmEiIzRmiiOA6sqB9HZB/xQueBhJz/kboyR/Qe6IMpcjb21A==",
  "SigningCertURL" : "https://sns.ap-southeast-1.amazonaws.com/SimpleNotificationService-56e67fcb41f6fec09b0196692625d385.pem",
  "UnsubscribeURL" : "https://sns.ap-southeast-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:ap-southeast-1:003422198502:testbc:894babc8-8186-4b49-b68d-ff18e204e59a"
}

Cleaned-up Message field extracted from the above:

{
    "AlarmName": "2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3",
    "AlarmDescription": "DO NOT EDIT OR DELETE. For TargetTrackingScaling policy arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e.",
    "AWSAccountId": "003422198502",
    "AlarmConfigurationUpdatedTimestamp": "2022-09-26T04:41:30.103+0000",
    "NewStateValue": "ALARM",
    "NewStateReason": "Threshold Crossed: 1 out of the last 1 datapoints [0.010321114212274551 (26/09/22 04:40:00)] was less than the threshold (11.700000000000001) (minimum 1 datapoint for OK -> ALARM transition).",
    "StateChangeTime": "2022-09-26T04:41:51.610+0000",
    "Region": "Asia Pacific (Singapore)",
    "AlarmArn": "arn:aws:cloudwatch:ap-southeast-1:003422198502:alarm:2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3",
    "OldStateValue": "INSUFFICIENT_DATA",
    "OKActions": [],
    "AlarmActions": ["arn:aws:sns:ap-southeast-1:003422198502:testbc", "arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e"],
    "InsufficientDataActions": [],
    "Trigger": {
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/ECS",
        "StatisticType": "Statistic",
        "Statistic": "AVERAGE",
        "Unit": "Percent",
        "Dimensions": [{
            "value": "testservice",
            "name": "ServiceName"
        }, {
            "value": "test",
            "name": "ClusterName"
        }],
        "Period": 60,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": 11.700000000000001,
        "TreatMissingData": "breaching",
        "EvaluateLowSampleCountPercentile": ""
    }
}
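
The cleaned-up view above can be reproduced by decoding the envelope twice, since the Message field is itself a JSON document encoded as a string. A minimal sketch, assuming the raw POST body has been captured as text; extract_message is a hypothetical helper name and the sample envelope is a small stand-in, not real alarm data.

```python
import json


def extract_message(envelope_json: str) -> str:
    """Return the inner Message of an SNS notification, pretty-printed."""
    envelope = json.loads(envelope_json)     # outer SNS envelope
    inner = json.loads(envelope["Message"])  # Message is a JSON string
    return json.dumps(inner, indent=4)


# Small stand-in envelope; a real one carries the fields shown above.
sample = json.dumps({
    "Type": "Notification",
    "Subject": "ALARM: example",
    "Message": json.dumps({"NewStateValue": "ALARM", "foo": "bar"}),
})
print(extract_message(sample))
```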

The following small tool was run behind ngrok, with the SNS HTTPS subscription pointed at it, to inspect the SNS content of a CloudWatch alarm payload.

"""
Very simple HTTP server in python for logging requests
Usage::
    ./server.py [<port>]
"""
from http.server import BaseHTTPRequestHandler, HTTPServer
import logging

class S(BaseHTTPRequestHandler):
    def _set_response(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_GET(self):
        logging.info("GET request,\nPath: %s\nHeaders:\n%s\n", str(self.path), str(self.headers))
        self._set_response()
        self.wfile.write("GET request for {}".format(self.path).encode('utf-8'))

    def do_POST(self):
        # Size of the request body; default to 0 if the header is missing
        content_length = int(self.headers.get('Content-Length', 0))
        post_data = self.rfile.read(content_length)  # the request body itself
        logging.info("POST request,\nPath: %s\nHeaders:\n%s\n\nBody:\n%s\n",
                str(self.path), str(self.headers), post_data.decode('utf-8'))

        self._set_response()
        self.wfile.write("POST request for {}".format(self.path).encode('utf-8'))

def run(server_class=HTTPServer, handler_class=S, port=8080):
    logging.basicConfig(level=logging.INFO)
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    logging.info('Starting httpd...\n')
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()
    logging.info('Stopping httpd...\n')

if __name__ == '__main__':
    from sys import argv

    if len(argv) == 2:
        run(port=int(argv[1]))
    else:
        run()
