Profile Applicability:

  • Level 1

Description:

AWS Systems Manager Incident Manager (referred to in this SOP as SSM Incidents; its CLI commands are under aws ssm-incidents) is a service that helps you prepare for, respond to, and automate the response to incidents in your AWS environment. Once it is enabled, you can create response plans that define how incidents are opened, which contacts are engaged, and which runbooks run automatically.

Enabling SSM Incidents with response plans ensures that your organization has a structured, automated, and repeatable process for responding to incidents. Response plans can include automated actions, runbooks, and incident documentation, making it easier to handle incidents efficiently while maintaining a record for compliance and post-incident analysis.

This SOP ensures that SSM Incidents is enabled and that response plans are created and configured properly.

Rationale:

Enabling SSM Incidents with response plans is essential for:

  • Efficient Incident Response: Provides a structured approach to handling incidents, reducing response times and ensuring consistent actions.

  • Automation: Automates common tasks and actions during an incident, improving the speed and accuracy of response efforts.

  • Documentation and Compliance: Ensures that all actions taken during an incident are logged and documented, aiding in compliance audits and post-incident reviews.

  • Improved Coordination: Ensures that multiple team members can collaborate efficiently, with tasks and responsibilities clearly defined.

Impact:

Pros:

  • Improved Efficiency: Automates parts of the incident response process, reducing the need for manual intervention and improving response times.

  • Consistent Procedures: Ensures that all incidents are handled according to predefined response plans, leading to more consistent and effective actions.

  • Better Incident Tracking: Provides detailed documentation and tracking for all incidents, helping with root cause analysis and continuous improvement.

Cons:

  • Initial Setup Complexity: Creating and configuring incident response plans and automation may take time initially, especially for complex environments.

  • Over-Reliance on Automation: Automated runbooks can become stale or take the wrong action if they are not maintained, so response plans must be reviewed and updated regularly to remain effective and accurate.

Default Value:

By default, SSM Incidents is not enabled (no replication set exists in the account), and no response plans are created. Both must be set up manually to align with your organization's incident response requirements.

Pre-requisite:

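  • AWS IAM Permissions (Incident Manager API actions):

    • ssm-incidents:ListReplicationSets, ssm-incidents:CreateReplicationSet, and ssm-incidents:DeleteReplicationSet (backout only)

    • ssm-incidents:ListResponsePlans, ssm-incidents:GetResponsePlan, ssm-incidents:CreateResponsePlan, ssm-incidents:UpdateResponsePlan, and ssm-incidents:DeleteResponsePlan

    • ssm-incidents:StartIncident, ssm-incidents:ListIncidentRecords, ssm-incidents:UpdateIncidentRecord, and ssm-incidents:ListTimelineEvents

    • iam:PassRole for the IAM role that response plan runbooks assume

    • Optionally, ssm:CreateOpsItem, ssm:DescribeOpsItems, and ssm:UpdateOpsItem if OpsCenter OpsItems are used alongside incidents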

  • AWS CLI installed and configured.

  • Familiarity with SSM (AWS Systems Manager) and incident response processes.

  • Optionally, Amazon CloudWatch alarms configured to start incidents automatically and Amazon SNS topics for incident notifications.
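The permissions above can be packaged as a customer-managed IAM policy. This sketch writes one to a local file; it assumes the Incident Manager action names under the ssm-incidents prefix, uses a hypothetical runbook role ARN (arn:aws:iam::111122223333:role/IncidentRunbookRole), and grants Resource "*" for brevity, which should be narrowed in production.

```shell
# Write an IAM policy document for operators who check and manage
# Incident Manager. The role ARN is a hypothetical placeholder, and
# Resource "*" is a simplification to scope down before production use.
write_incident_manager_policy() {
  out="${1:-incident-manager-operator-policy.json}"
  cat > "$out" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "IncidentManagerOperations",
      "Effect": "Allow",
      "Action": [
        "ssm-incidents:ListReplicationSets",
        "ssm-incidents:CreateReplicationSet",
        "ssm-incidents:DeleteReplicationSet",
        "ssm-incidents:ListResponsePlans",
        "ssm-incidents:GetResponsePlan",
        "ssm-incidents:CreateResponsePlan",
        "ssm-incidents:UpdateResponsePlan",
        "ssm-incidents:DeleteResponsePlan",
        "ssm-incidents:StartIncident",
        "ssm-incidents:ListIncidentRecords",
        "ssm-incidents:UpdateIncidentRecord",
        "ssm-incidents:ListTimelineEvents"
      ],
      "Resource": "*"
    },
    {
      "Sid": "PassRunbookRole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::111122223333:role/IncidentRunbookRole"
    }
  ]
}
EOF
  # Print the path so callers can pass it on to the IAM CLI.
  echo "$out"
}
```

After reviewing the file, it could be created with aws iam create-policy --policy-name IncidentManagerOperator --policy-document file://incident-manager-operator-policy.json (the policy name is illustrative).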

Remediation:

Test Plan:

Using AWS Console:

  1. Sign in to the AWS Management Console.

  2. Navigate to Systems Manager under Services.

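  3. In the navigation pane, under Operations Management, choose Incident Manager.

  4. Check whether SSM Incidents is enabled: if the console prompts you to complete the initial setup, it is not enabled; once enabled, the Settings page shows the account's replication set.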

  5. Under Incident Manager, go to Response Plans:

    • Ensure that response plans have been created for different types of incidents (e.g., security breach, system failures).

    • Review the automation and tasks that are defined for each response plan to ensure they align with your incident response strategy.

  6. Ensure that incident templates, runbooks, and automated actions are properly configured for incident resolution.

  7. Test a response plan by starting a low-impact test incident from it, and confirm that the plan triggers the expected runbooks, engagements, and notifications.

Using AWS CLI:

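To check whether SSM Incidents is enabled, list the replication set that Incident Manager creates during onboarding:

aws ssm-incidents list-replication-sets

If replicationSetArns is empty, Incident Manager has not been enabled in the account.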

To list existing response plans, run:

aws ssm-incidents list-response-plans
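The enablement and response plan checks can be combined into one scripted pass/fail test, for example to run on a schedule or in a pipeline. This is a minimal sketch: it assumes the aws ssm-incidents commands and their replicationSetArns and responsePlanSummaries response fields, and the function name check_incident_manager and default Region us-east-1 are hypothetical. JSON output is used because, as we understand the AWS CLI documentation, text output applies --query to each page separately, which would break length().

```shell
# Print PASS or FAIL and return 0 or 1 depending on whether Incident Manager
# is onboarded (a replication set exists) and at least one response plan exists.
check_incident_manager() {
  region="${1:-${AWS_REGION:-us-east-1}}"

  # Onboarding creates a replication set; zero sets means "not enabled".
  sets=$(aws ssm-incidents list-replication-sets --region "$region" \
    --query 'length(replicationSetArns)' --output json) || return 1

  # Count the response plans defined in this Region.
  plans=$(aws ssm-incidents list-response-plans --region "$region" \
    --query 'length(responsePlanSummaries)' --output json) || return 1

  if [ "${sets:-0}" -gt 0 ] && [ "${plans:-0}" -gt 0 ]; then
    echo "PASS: replication set present, $plans response plan(s) defined"
    return 0
  fi
  echo "FAIL: replication sets=$sets, response plans=$plans"
  return 1
}
```

Running check_incident_manager us-east-1 in each Region that should host response plans gives a quick compliance signal; the non-zero exit status lets a CI job fail on a missing setup.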


If no response plans are listed, create one. The incident template and actions are JSON documents, loaded here from local files:

aws ssm-incidents create-response-plan --name "ExampleResponsePlan" --incident-template file://template.json --actions file://actions.json
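The incident template and actions passed to create-response-plan are JSON documents. This sketch writes example versions of both files; the title, impact value, SNS topic ARN, IAM role ARN, and runbook name Example-IncidentTriageRunbook are all hypothetical placeholders, and the field names follow our reading of the Incident Manager CreateResponsePlan API.

```shell
# Write template.json and actions.json into a directory for use with
# create-response-plan. Every ARN and name below is a hypothetical placeholder.
write_response_plan_inputs() {
  dir="${1:-.}"
  mkdir -p "$dir" || return 1

  # Incident defaults; impact runs from 1 (critical) to 5 (no impact).
  cat > "$dir/template.json" <<'EOF'
{
  "title": "Example critical incident",
  "impact": 1,
  "summary": "Created from ExampleResponsePlan",
  "notificationTargets": [
    { "snsTopicArn": "arn:aws:sns:us-east-1:111122223333:incident-alerts" }
  ]
}
EOF

  # Runbook action: a Systems Manager Automation document started with the incident.
  cat > "$dir/actions.json" <<'EOF'
[
  {
    "ssmAutomation": {
      "roleArn": "arn:aws:iam::111122223333:role/IncidentRunbookRole",
      "documentName": "Example-IncidentTriageRunbook",
      "targetAccount": "RESPONSE_PLAN_OWNER_ACCOUNT"
    }
  }
]
EOF
}
```

With the files written to ./plan, the plan could then be created with aws ssm-incidents create-response-plan --name "ExampleResponsePlan" --incident-template file://plan/template.json --actions file://plan/actions.json.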

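To verify that the response plan triggers the expected actions, start a test incident from it and review the incident's timeline:

aws ssm-incidents start-incident --response-plan-arn <response-plan-arn> --title "Test incident"

aws ssm-incidents list-timeline-events --incident-record-arn <incident-record-arn>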
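To see what a response plan actually did during a test incident, the incident's timeline can be listed. This sketch assumes the list-timeline-events command and its eventSummaries response field; it takes the incident record ARN returned by start-incident, and the function name show_incident_timeline is hypothetical.

```shell
# Print one line per timeline event (time, type) for an incident record.
# Runbook starts, engagements, and status changes are expected to appear here.
show_incident_timeline() {
  record_arn="$1"
  aws ssm-incidents list-timeline-events \
    --incident-record-arn "$record_arn" \
    --query 'eventSummaries[].[eventTime,eventType]' \
    --output text
}
```

Runbook actions may take a short time to start, so an empty timeline immediately after opening the incident is not necessarily a failure.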


Implementation Steps:

Using AWS Console:

  1. Sign in to the AWS Management Console and navigate to Systems Manager.

  2. Open Incident Manager (complete the initial setup first if it has not been enabled) and choose Response plans.

  3. If no response plans exist, click Create response plan.

  4. Name your response plan (e.g., “Critical Incident Response”).

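  5. Define the incident defaults applied to incidents started from this plan: title, impact (1 = critical to 5 = no impact), and optionally a summary and the SNS topics to notify.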

  6. Define tasks that need to be executed during the incident response (e.g., notifying stakeholders, invoking automation).

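  7. Configure the runbook action: a Systems Manager Automation document that starts automatically when an incident is opened from the plan. A runbook can in turn invoke AWS Lambda functions or publish to SNS topics.

  8. Add engagements (contacts and escalation plans) so that responders are paged and the incident is escalated to the next contact if an engagement is not acknowledged within the defined time frame.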

Using AWS CLI:

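To create a response plan, run the following CLI command (the incident template and actions are JSON documents, loaded here from local files):

aws ssm-incidents create-response-plan \
    --name "IncidentResponsePlan" \
    --incident-template file://path-to-template.json \
    --actions file://path-to-actions.json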
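When the create step runs from automation, re-running it should not fail just because the plan already exists. This is a minimal sketch of an idempotent wrapper, assuming response plan names are unique within an account and Region and that the name contains no single quotes (it is interpolated into a JMESPath filter); ensure_response_plan is a hypothetical helper name.

```shell
# Create a response plan only if no plan with the same name exists.
# Prints "exists: <arn>" or the ARN of the newly created plan.
ensure_response_plan() {
  name="$1"; template="$2"; actions="$3"

  # Look up an existing plan by name. The CLI may apply --query per page
  # for text output, so strip whitespace left by pages without a match.
  existing=$(aws ssm-incidents list-response-plans \
    --query "responsePlanSummaries[?name=='$name'].arn" \
    --output text) || return 1
  existing=$(printf '%s' "$existing" | tr -d '[:space:]')

  if [ -n "$existing" ] && [ "$existing" != "None" ]; then
    echo "exists: $existing"
    return 0
  fi

  # Not found: create the plan and print its ARN.
  aws ssm-incidents create-response-plan \
    --name "$name" \
    --incident-template "file://$template" \
    --actions "file://$actions" \
    --query 'arn' --output text
}
```

For example, ensure_response_plan IncidentResponsePlan path-to-template.json path-to-actions.json can be called on every pipeline run without creating duplicates.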


To test a response plan, start a low-impact incident from it using the CLI:

aws ssm-incidents start-incident --response-plan-arn <response-plan-arn> --title "Test incident" --impact 5
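A periodic drill can open a low-impact incident, give its runbook time to start, and then resolve it so that no open incident is left behind. This sketch assumes the start-incident and update-incident-record commands; run_incident_drill and the 60-second default wait are hypothetical choices. Engagements defined in the plan may still page contacts even at impact 5, so warn responders before running it.

```shell
# Open a test incident from a response plan, wait, then resolve it.
# Prints the incident record ARN when it is opened and when it is resolved.
run_incident_drill() {
  plan_arn="$1"
  wait_seconds="${2:-60}"

  # Impact 5 is the lowest ("no impact") level.
  record_arn=$(aws ssm-incidents start-incident \
    --response-plan-arn "$plan_arn" \
    --title "DRILL - do not escalate" \
    --impact 5 \
    --query 'incidentRecordArn' --output text) || return 1
  echo "opened: $record_arn"

  # Give the plan's runbook and engagements time to start.
  sleep "$wait_seconds"

  aws ssm-incidents update-incident-record \
    --arn "$record_arn" --status RESOLVED || return 1
  echo "resolved: $record_arn"
}
```

Between opening and resolving, the incident's timeline (aws ssm-incidents list-timeline-events) shows whether the expected actions ran.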


Backout Plan:

If enabling SSM Incidents or configuring response plans causes issues:

  1. Identify the affected response plan or incident configuration.

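To remove a response plan, run:

aws ssm-incidents delete-response-plan --arn <response-plan-arn>

To disable Incident Manager entirely, delete the replication set (this also deletes the Incident Manager data it holds):

aws ssm-incidents delete-replication-set --arn <replication-set-arn>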


  2. Verify that the problematic response plan no longer appears in the list of response plans (aws ssm-incidents list-response-plans).

  3. Adjust the response plan configuration or tasks to ensure the correct actions are taken in future incidents.
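The removal step can be scripted by looking up the plan's ARN by name and deleting it. This is a minimal sketch, assuming the list-response-plans and delete-response-plan commands and a plan name with no single quotes; remove_response_plan is a hypothetical helper name, and it returns non-zero if no plan with that name exists.

```shell
# Delete a response plan identified by name. Prints "deleted: <arn>" on success.
remove_response_plan() {
  name="$1"

  # Resolve the plan's ARN from its name. The CLI may apply --query per page
  # for text output, so strip whitespace left by pages without a match.
  arn=$(aws ssm-incidents list-response-plans \
    --query "responsePlanSummaries[?name=='$name'].arn" \
    --output text) || return 1
  arn=$(printf '%s' "$arn" | tr -d '[:space:]')

  if [ -z "$arn" ] || [ "$arn" = "None" ]; then
    echo "no response plan named $name"
    return 1
  fi

  aws ssm-incidents delete-response-plan --arn "$arn" || return 1
  echo "deleted: $arn"
}
```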

Note:

  • Incident Simulation: Periodically simulate incidents to ensure that SSM Incidents and response plans work as expected, and that the team is well-practiced in responding to incidents using automated processes.

  • Automation and Runbooks: Ensure that your automation and runbooks are kept up-to-date to address evolving incident response requirements.

References:

CIS Controls Mapping:

Version | Control ID | Control Description | IG1 | IG2 | IG3
v8 | 3.4 | Encrypt Data on End-User Devices – Ensure data encryption during file system access. | | |
v8 | 6.7 | Implement Application Layer Filtering and Content Control – Ensure appropriate content filtering is applied to sensitive files. | | |
v8 | 6.8 | Define and Maintain Role-Based Access Control – Implement and manage role-based access for file systems. | | |
v8 | 14.6 | Protect Information Through Access Control Lists – Apply strict access control to file systems. | | |