Profile Applicability:

  • Level 1

Description:

In Docker Swarm mode, it is recommended to run at least three manager nodes for high availability and fault tolerance. With three managers, the Swarm can tolerate the failure of one manager node and still maintain the quorum required to manage the cluster, ensuring resilient and consistent management of the Swarm environment.

Rationale:

A Swarm with only one or two manager nodes cannot tolerate any manager failure. Swarm managers use Raft consensus, which requires a majority of managers (a quorum) to be available; with one or two managers, the loss of a single manager node makes a quorum unreachable, leaving the cluster unable to perform management operations until it is recovered. Running a minimum of three manager nodes provides fault tolerance while preserving quorum.
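The quorum arithmetic behind this recommendation can be illustrated with a short sketch (the manager counts below are illustrative):

```shell
# Raft quorum math: a Swarm with N managers needs a majority of
# managers (N/2 + 1, integer division) to operate, and can therefore
# tolerate (N - 1)/2 manager failures.
for N in 1 2 3 5; do
  QUORUM=$(( N / 2 + 1 ))
  TOLERATED=$(( (N - 1) / 2 ))
  echo "managers=$N quorum=$QUORUM tolerated_failures=$TOLERATED"
done
```

Note that two managers are no better than one: the quorum is 2, so a single failure still halts management operations, which is why three is the recommended minimum.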

Impact:

Pros:

  • High availability and fault tolerance in case of node failures.

  • Maintains consistent management and orchestration in a Docker Swarm cluster.

Cons:

  • More resources are required to run additional manager nodes.

  • Potential for additional overhead in terms of coordination and consensus.

Default Value:

By default, initializing a Swarm (docker swarm init) creates a single manager node. Additional manager nodes must be added explicitly.

Pre-requisites:

  • Docker Swarm mode must be enabled.

  • Sufficient resources available on the system for additional manager nodes.

Remediation:

Test Plan:

Using AWS Console:

  1. Navigate to the EC2 instances running the Docker Swarm manager.

  2. Verify the number of manager nodes by running docker node ls on a manager node and confirming that at least three nodes show a value (Leader or Reachable) in the MANAGER STATUS column.

Using AWS CLI:

  1. Connect to the EC2 instance running the Docker Swarm manager.

  2. Run the following command and confirm that at least three listed nodes have a MANAGER STATUS:

docker node ls
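An automated version of this check might look like the following sketch. The node listing here is a hard-coded illustrative sample; in practice you would pipe the live output of docker node ls instead:

```shell
# Count nodes that report a MANAGER STATUS (Leader or Reachable).
# SAMPLE stands in for the live output of:
#   docker node ls --format '{{.Hostname}} {{.ManagerStatus}}'
SAMPLE='node-1 Leader
node-2 Reachable
node-3 Reachable
node-4'
MANAGERS=$(printf '%s\n' "$SAMPLE" | awk 'NF > 1' | wc -l | tr -d ' ')
if [ "$MANAGERS" -ge 3 ]; then
  echo "PASS: $MANAGERS manager nodes found"
else
  echo "FAIL: only $MANAGERS manager nodes found"
fi
```

Worker nodes have an empty MANAGER STATUS column, so counting lines with more than one field isolates the managers.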

Implementation Plan:

Using AWS Console:

  1. Log in to the EC2 instances running Docker Swarm.

  2. On an existing manager, obtain the manager join token with docker swarm join-token manager, then add manager nodes by running the following command on each node that should join as a manager:

docker swarm join --token <manager-join-token> <manager-node-ip>:2377

  3. On each manager node, verify that the node has been added with:

docker node ls
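The join command in step 2 is typically assembled from the token printed by docker swarm join-token manager -q. A minimal sketch, using placeholder values for the token and manager address:

```shell
# Assemble the manager join command. Both values are hypothetical
# placeholders; on a live Swarm the token comes from:
#   docker swarm join-token manager -q
TOKEN="SWMTKN-1-placeholder-token"   # hypothetical token
MANAGER_IP="10.0.0.10"               # hypothetical manager address
JOIN_CMD="docker swarm join --token $TOKEN $MANAGER_IP:2377"
echo "$JOIN_CMD"
```

Port 2377 is the default Swarm cluster-management port.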

Using AWS CLI:

  1. Use SSM to add manager nodes to the Docker Swarm.

  2. Run the following command to add a manager node:

aws ssm send-command --document-name "AWS-RunShellScript" --targets "Key=InstanceIds,Values=instance_id" --parameters 'commands=["docker swarm join --token <manager-join-token> <manager-node-ip>:2377"]'
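If the target instances are already Swarm workers, an alternative approach (a sketch with hypothetical values, not part of the plan above) is to promote them with docker node promote, which is run from an existing manager. The block below only assembles and prints the SSM command; nothing is executed against AWS:

```shell
# Build the SSM command that would promote an existing worker to a
# manager. Values are hypothetical placeholders.
INSTANCE_ID="i-0123456789abcdef0"   # hypothetical manager instance ID
NODE_NAME="worker-node-1"           # hypothetical worker to promote
CMD="aws ssm send-command --document-name \"AWS-RunShellScript\" --targets \"Key=InstanceIds,Values=$INSTANCE_ID\" --parameters 'commands=[\"docker node promote $NODE_NAME\"]'"
echo "$CMD"
```

Promoting avoids distributing join tokens to nodes that are already cluster members.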

Backout Plan:

Using AWS Console:

  1. Log in to the manager node that is to be removed from the swarm.

  2. Remove the node from the swarm by running the following command on that node:

    docker swarm leave --force

  3. Verify the node has been removed by running docker node ls on a remaining manager node.
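A gentler backout ordering, sketched below with a placeholder node name, is to demote the manager before it leaves so the Raft quorum shrinks cleanly rather than losing a member abruptly. The commands are printed, not run:

```shell
# Backout ordering sketch: demote, leave, then clean up the node entry.
# NODE is a hypothetical placeholder.
NODE="manager-node-3"
echo "1. On another manager: docker node demote $NODE"
echo "2. On $NODE itself:    docker swarm leave"
echo "3. On another manager: docker node rm $NODE"
```

Demoting first converts the node to a worker, so its departure no longer affects manager quorum.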

Using AWS CLI:

  1. Use SSM to remove the manager node from the Docker Swarm.

  2. Run the following command:

aws ssm send-command --document-name "AWS-RunShellScript" --targets "Key=InstanceIds,Values=instance_id" --parameters 'commands=["docker swarm leave --force"]'

References:

  • Docker Swarm Mode Documentation

  • Docker Swarm Best Practices