Profile Applicability:

  • Level 1

Description:

Elasticsearch and OpenSearch are distributed search and analytics engines used to store, search, and analyze large volumes of data. In a production environment, fault tolerance is critical to ensure high availability and resilience to hardware failures. Data nodes are the core components of Elasticsearch/OpenSearch clusters that store data and handle search and indexing requests. Ensuring that data nodes are fault-tolerant means configuring your cluster to have sufficient replicas and availability zones to handle node failures without data loss or significant service disruption.

Rationale:

Fault-tolerant data nodes help ensure that Elasticsearch/OpenSearch clusters remain highly available and resilient to failures. This reduces the risk of service interruptions and data loss in case of node or infrastructure failure. By configuring the cluster with multiple data nodes and replica shards across different availability zones (AZs), the system can continue to function even when individual nodes or entire availability zones experience failures. This aligns with best practices for high availability in distributed systems.

Impact:

If Elasticsearch/OpenSearch domains do not have fault-tolerant data nodes:

  • The cluster could lose data if a node holding primary shards fails and those shards have no replicas.

  • Service disruptions may occur if critical data nodes go down and there are insufficient replicas.

  • The deployment would not comply with high availability and disaster recovery best practices, which could lead to downtime during hardware or network failures.

Default Value:

Fault tolerance is not guaranteed by default. Elasticsearch/OpenSearch supports fault-tolerant data nodes, but depending on the cluster setup, some clusters may have indexes without replicas or may not distribute nodes across multiple availability zones, which reduces fault tolerance.

Pre-requisites:

  • Access to AWS Management Console or the OpenSearch service dashboard.

  • Existing Elasticsearch/OpenSearch domain configured and running.

  • Basic understanding of Elasticsearch/OpenSearch architecture, including data nodes and replica shards.

  • Access to AWS EC2 instances or Amazon OpenSearch Service configurations.

Remediation:

Test Plan:

Using AWS Console:

  1. Go to the Amazon OpenSearch Service in the AWS Management Console.

  2. Select the domain you want to check.

  3. Verify the following:

    • Under the Cluster Configuration section, ensure the data nodes are distributed across at least two Availability Zones (AZs).

    • Check that every index has a replica count of at least 1 (preferably more, depending on the cluster’s size and usage). The replica count is an index-level setting, so it is read from the index settings rather than from the domain's cluster configuration.

  4. Review the automated snapshot settings to confirm that backups are available for recovery. Node-to-node encryption is a security control and does not by itself add fault tolerance.

Using AWS CLI:

  1. List the domains in the OpenSearch service:

aws opensearch list-domain-names

  2. Describe the domain configuration:

aws opensearch describe-domain --domain-name <domain-name>

  3. In the ClusterConfig section of the output, check InstanceCount for the number of data nodes and confirm that ZoneAwarenessEnabled is true for a multi-AZ deployment. The replica count is not part of this output; it is an index-level setting and must be greater than 0 on each index.
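The CLI check above can be scripted. The following is a minimal sketch, assuming the AWS CLI is configured with credentials; the function name evaluate_cluster_config and the DOMAIN_NAME environment variable are illustrative choices, not part of the AWS CLI. It relies on the CLI's text output printing booleans as True/False and missing values as None.

```shell
#!/bin/sh
# Sketch: judge the ClusterConfig values returned by describe-domain.
# evaluate_cluster_config and DOMAIN_NAME are hypothetical names used for illustration.

# Args: ZoneAwarenessEnabled (True/False), InstanceCount, AvailabilityZoneCount (or None)
evaluate_cluster_config() {
  zone_aware="$1"
  instance_count="$2"
  az_count="$3"

  if [ "$zone_aware" != "True" ]; then
    echo "FAIL: zone awareness is disabled"
    return 1
  fi
  if [ "$instance_count" -lt 2 ]; then
    echo "FAIL: fewer than two data nodes"
    return 1
  fi
  if [ -z "$az_count" ] || [ "$az_count" = "None" ]; then
    echo "PASS: $instance_count data nodes, zone awareness enabled (AZ count not reported)"
  else
    echo "PASS: $instance_count data nodes across $az_count AZs"
  fi
}

# Only query AWS when DOMAIN_NAME is set, so the function can be reused on its own.
if [ -n "${DOMAIN_NAME:-}" ]; then
  # Word splitting is intentional: the text output is three whitespace-separated values.
  set -- $(aws opensearch describe-domain --domain-name "$DOMAIN_NAME" \
    --query 'DomainStatus.ClusterConfig.[ZoneAwarenessEnabled,InstanceCount,ZoneAwarenessConfig.AvailabilityZoneCount]' \
    --output text)
  evaluate_cluster_config "$1" "$2" "$3"
fi
```

Run it as DOMAIN_NAME=<domain-name> sh check_domain.sh (the file name is arbitrary). The script covers only the domain-level settings; index replica counts still need to be checked against the domain endpoint.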

For Self-Hosted Elasticsearch/OpenSearch Clusters:

Check the cluster health using the _cluster/health API:

curl -X GET "localhost:9200/_cluster/health?pretty"

Verify that the cluster is set up with at least two data nodes by checking the node information:

curl -X GET "localhost:9200/_cat/nodes?v&h=id,ip,role"

Ensure that the shard allocation includes replicas by checking the index settings:

curl -X GET "localhost:9200/_settings?pretty"
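The three self-hosted checks can be condensed with the _cat APIs, which return one line per node or index. This is a sketch under assumptions: ES_URL is a hypothetical environment variable holding the cluster address, the cluster is reachable without authentication, and a node counts as a data node when its role string contains the letter d (nodes that carry only tiered data roles would need a broader match).

```shell
#!/bin/sh
# Sketch: summarise data node count and unreplicated indexes.
# ES_URL, find_unreplicated_indices and count_data_nodes are illustrative names.

# Reads "index rep" lines on stdin (as printed by _cat/indices?h=index,rep)
# and prints the names of indexes whose replica count is 0.
find_unreplicated_indices() {
  awk '$2 == 0 { print $1 }'
}

# Reads node role strings on stdin (as printed by _cat/nodes?h=node.role)
# and prints how many of them contain the "d" (data) role letter.
count_data_nodes() {
  grep -c 'd' || true
}

# Only contact the cluster when ES_URL is set, e.g. ES_URL=http://localhost:9200
if [ -n "${ES_URL:-}" ]; then
  echo "cluster status: $(curl -s "$ES_URL/_cat/health?h=status")"
  echo "data nodes: $(curl -s "$ES_URL/_cat/nodes?h=node.role" | count_data_nodes)"
  echo "indexes without replicas:"
  curl -s "$ES_URL/_cat/indices?h=index,rep" | find_unreplicated_indices
fi
```

An empty list under "indexes without replicas" together with at least two data nodes matches the expectations described above.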

Implementation Plan:

Using AWS Console:

  1. Open the Amazon OpenSearch Service console.

  2. Select the domain to review.

  3. Modify the domain settings if necessary:

    • Ensure the domain is spread across at least two availability zones (AZs).

    • Set the number of replica shards to at least 1. This is an index-level setting and is applied through the index settings API on the domain endpoint, not through the domain's cluster configuration.

  4. Save the settings and monitor the cluster to confirm that the configuration changes are applied successfully.

Using AWS CLI:

Use the describe-domain command to check the domain configuration:

aws opensearch describe-domain --domain-name <domain-name>

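  1. If necessary, modify the domain to ensure fault tolerance:

    • Enable multi-AZ configuration by adjusting the Zone Awareness settings.

Enable zone awareness with the following command (the instance type and instance count shown are example values; adjust them to your domain):

aws opensearch update-domain-config --domain-name <domain-name> --cluster-config "InstanceType=m5.large.search,InstanceCount=3,ZoneAwarenessEnabled=true"

This command does not set the replica count for indexes. The replica count is an index-level setting and is changed through the index settings API (_settings) on the domain endpoint.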
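A configuration change is applied asynchronously, so it helps to wait until the domain reports that processing has finished before re-running the checks. The sketch below polls the Processing field of describe-domain. The function names, the DOMAIN_NAME variable, and the attempt count and interval are illustrative assumptions, not AWS-defined values.

```shell
#!/bin/sh
# Sketch: wait until a domain has finished applying a configuration change.
# fetch_processing, wait_until_idle and DOMAIN_NAME are hypothetical names.

# Prints True while the domain is still processing a change, False otherwise.
fetch_processing() {
  aws opensearch describe-domain --domain-name "$1" \
    --query 'DomainStatus.Processing' --output text
}

# Usage: wait_until_idle DOMAIN MAX_ATTEMPTS INTERVAL_SECONDS
# Returns 0 once Processing is False, 1 if it is still True after MAX_ATTEMPTS polls.
wait_until_idle() {
  domain="$1"
  max_attempts="$2"
  interval="$3"
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    if [ "$(fetch_processing "$domain")" = "False" ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 1
}

# Only poll AWS when DOMAIN_NAME is set.
if [ -n "${DOMAIN_NAME:-}" ]; then
  if wait_until_idle "$DOMAIN_NAME" 60 30; then
    echo "Domain $DOMAIN_NAME has finished processing."
  else
    echo "Domain $DOMAIN_NAME is still processing; check again later." >&2
    exit 1
  fi
fi
```

The example values (60 attempts, 30 seconds apart) give a window of about 30 minutes; large domains may need longer.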

For Self-Hosted Elasticsearch/OpenSearch Clusters:

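  1. If the cluster is not spread across multiple nodes or AZs, adjust the configuration in elasticsearch.yml or opensearch.yml to ensure nodes are properly distributed. For example, tag each node with its zone through a custom node attribute (node.attr.<name>) and list that attribute in cluster.routing.allocation.awareness.attributes, so that a primary shard and its replicas are allocated to different zones.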

Increase the replica count by updating the index settings:

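curl -X PUT "localhost:9200/my_index/_settings" -H 'Content-Type: application/json' -d '{
  "number_of_replicas": 1
}'

  2. Restart the cluster nodes only if settings in elasticsearch.yml or opensearch.yml were changed. The replica count is a dynamic index setting and takes effect without a restart.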
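After the replica count is raised, the new replica shards still have to be allocated, and the index stays yellow until they are. The following sketch applies the setting and then uses the wait_for_status parameter of the _cluster/health API to wait for green. ES_URL, INDEX_NAME and REPLICAS are hypothetical environment variables, and the 60-second timeout is an example value.

```shell
#!/bin/sh
# Sketch: set the replica count on one index and wait for it to turn green.
# replica_settings_body, ES_URL, INDEX_NAME and REPLICAS are illustrative names.

# Prints the JSON body for the index settings API; rejects non-numeric input.
replica_settings_body() {
  case "$1" in
    ''|*[!0-9]*)
      echo "replica count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"index":{"number_of_replicas":%s}}\n' "$1"
}

# Only contact the cluster when both ES_URL and INDEX_NAME are set.
if [ -n "${ES_URL:-}" ] && [ -n "${INDEX_NAME:-}" ]; then
  body=$(replica_settings_body "${REPLICAS:-1}") || exit 1
  curl -s -X PUT "$ES_URL/$INDEX_NAME/_settings" \
    -H 'Content-Type: application/json' -d "$body"
  echo
  # Blocks until the index is green or the timeout expires; the response
  # contains "timed_out": true if the replicas could not be allocated in time.
  curl -s "$ES_URL/_cluster/health/$INDEX_NAME?wait_for_status=green&timeout=60s&pretty"
fi
```

If the health call times out while the status remains yellow, the usual cause is that there are not enough data nodes (or zones, when allocation awareness is enabled) to place every replica.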

Backout Plan:

Using AWS Console:

  1. Sign in to the AWS Management Console and navigate to Amazon OpenSearch Service.
  2. Go to Domains and select the domain you want to modify.
  3. On the Domain details page, click Modify.
  4. In the Cluster Configuration section, restore the previous Availability Zone setting (for example, a single Availability Zone) and save the changes.

Using AWS CLI:

If deploying data nodes across multiple AZs causes issues (e.g., performance degradation or unexpected costs):

Revert the OpenSearch domain to use data nodes in a single availability zone:

aws opensearch update-domain-config --domain-name <domain-name> --cluster-config ZoneAwarenessEnabled=false

  1. Monitor the domain to ensure it operates correctly and that the issue is resolved.
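A backout is easier when the cluster configuration was recorded before the change. The sketch below saves the ClusterConfig section of describe-domain to a file and refuses to overwrite an existing copy. The function names, the DOMAIN_NAME variable and the output file name are illustrative assumptions; the saved file is meant as a reference for restoring values, not as direct input to update-domain-config.

```shell
#!/bin/sh
# Sketch: record a domain's ClusterConfig before changing it.
# fetch_cluster_config, save_cluster_config and DOMAIN_NAME are hypothetical names.

# Prints the current ClusterConfig of the domain as JSON.
fetch_cluster_config() {
  aws opensearch describe-domain --domain-name "$1" \
    --query 'DomainStatus.ClusterConfig' --output json
}

# Usage: save_cluster_config DOMAIN OUT_FILE
# Writes the config to OUT_FILE; fails if OUT_FILE exists or the fetch fails.
save_cluster_config() {
  domain="$1"
  out_file="$2"
  if [ -e "$out_file" ]; then
    echo "refusing to overwrite $out_file" >&2
    return 1
  fi
  # Write to a temporary file first so a failed fetch leaves nothing behind.
  if fetch_cluster_config "$domain" > "$out_file.tmp"; then
    mv "$out_file.tmp" "$out_file"
  else
    rm -f "$out_file.tmp"
    return 1
  fi
}

# Only query AWS when DOMAIN_NAME is set.
if [ -n "${DOMAIN_NAME:-}" ]; then
  save_cluster_config "$DOMAIN_NAME" "$DOMAIN_NAME-cluster-config-before.json"
fi
```

Keeping the earlier values of InstanceCount, ZoneAwarenessEnabled and the AZ count at hand makes it clear what "reverted" means for a given domain.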

References:

CIS Controls Mapping:

Version | Control ID | Control Description | IG1 | IG2 | IG3
v8 | 3.4 | Encrypt Data on End-User Devices – Ensure data encryption during file system access. | | |
v8 | 6.7 | Implement Application Layer Filtering and Content Control – Ensure appropriate content filtering is applied to sensitive files. | | |
v8 | 6.8 | Define and Maintain Role-Based Access Control – Implement and manage role-based access for file systems. | | |
v8 | 14.6 | Protect Information Through Access Control Lists – Apply strict access control to file systems. | | |