Profile Applicability:
Level 2

Description:
Amazon SageMaker runs machine learning training jobs in managed containers. Network isolation is a per-job setting that prevents the training container from making any outbound network calls, including calls to other AWS services such as Amazon S3. SageMaker still downloads the input data and uploads the model artifacts on the job's behalf, outside the container. This is critical when handling sensitive data or private datasets, or when training code must not communicate with external resources.

Rationale:
Enabling network isolation prevents the training container from reaching the public internet or any other network endpoint. This protects training data and model artifacts from exfiltration by compromised or untrusted training code, and supports compliance with privacy requirements that mandate isolated processing environments.

Impact:
Pros:

  • Ensures that training jobs cannot communicate with external networks, reducing the risk of unauthorized access.

  • Provides an additional layer of security, especially for sensitive data.

  • Helps meet regulatory requirements that mandate isolated network environments.

Cons:

  • Blocks access to external resources such as public datasets, external APIs, package indexes, and data repositories during training. The container also cannot call AWS service APIs directly.

  • Requires careful management of resources and dependencies that need to be available to the training job.

Default Value:
By default, SageMaker training jobs do not have network isolation enabled (EnableNetworkIsolation is false). It must be set explicitly when the job is created; it cannot be changed on an existing job.

Pre-requisites:

  • AWS IAM permissions:

        sagemaker:ListTrainingJobs
        sagemaker:CreateTrainingJob
        sagemaker:DescribeTrainingJob
        iam:PassRole (for the execution role passed to the training job)
        ec2:DescribeSecurityGroups
        ec2:DescribeSubnets

  • A VPC with private subnets and security groups, if the training job is also attached to a VPC. Network isolation itself does not require a VPC.

  • Access to Amazon SageMaker and permission to create training jobs.
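
The permissions can be captured in an IAM policy document. The following is a minimal sketch, not a vetted least-privilege policy: it covers the prerequisite permissions together with sagemaker:ListTrainingJobs and iam:PassRole (needed by list-training-jobs and by passing a role to create-training-job). The function name, the statement IDs, the "*" resource, and the example role ARN are illustrative assumptions.

```python
import json

# Actions used by the audit and remediation steps in this document.
SAGEMAKER_AND_EC2_ACTIONS = [
    "sagemaker:CreateTrainingJob",
    "sagemaker:DescribeTrainingJob",
    "sagemaker:ListTrainingJobs",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeSubnets",
]


def build_policy(execution_role_arn):
    """Return an IAM policy document (as a dict) for the steps in this document.

    execution_role_arn is the role handed to SageMaker via --role-arn.
    Resource "*" is a placeholder; narrow it for production use.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TrainingJobAuditAndCreate",
                "Effect": "Allow",
                "Action": SAGEMAKER_AND_EC2_ACTIONS,
                "Resource": "*",
            },
            {
                # Passing the execution role to SageMaker requires iam:PassRole.
                "Sid": "PassExecutionRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": execution_role_arn,
                "Condition": {
                    "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
                },
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID and role name, for illustration only.
    policy = build_policy("arn:aws:iam::111122223333:role/ExampleSageMakerRole")
    print(json.dumps(policy, indent=2))
```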

Remediation:

Test Plan:

Using AWS Console:

  1. Sign in to the AWS Management Console.

  2. Navigate to Amazon SageMaker and go to Training jobs.

  3. Select the training job to check for network isolation.

  4. In the Job details section, look for the Network isolation setting.

  5. Verify that Network isolation is enabled.

  6. If network isolation is not enabled, the setting cannot be changed on the existing job. Create a new training job with network isolation enabled, as described in the Implementation Plan.

Using AWS CLI:

  1. List the SageMaker training jobs:

    aws sagemaker list-training-jobs --query "TrainingJobSummaries[*].TrainingJobName"

  2. For each training job, check whether network isolation is enabled:

    aws sagemaker describe-training-job --training-job-name <TRAINING_JOB_NAME> --query "EnableNetworkIsolation"

  3. The command returns the EnableNetworkIsolation field of the response. A value of true means network isolation is enabled.

  4. If it is not enabled, the existing job cannot be changed. Create a new training job, under a new name, with network isolation enabled:

    aws sagemaker create-training-job \
      --training-job-name <NEW_TRAINING_JOB_NAME> \
      --role-arn <IAM_ROLE> \
      --algorithm-specification TrainingImage=<IMAGE_URL>,TrainingInputMode=File \
      --input-data-config <INPUT_DATA_CONFIG> \
      --output-data-config <OUTPUT_DATA_CONFIG> \
      --resource-config <RESOURCE_CONFIG> \
      --stopping-condition MaxRuntimeInSeconds=<MAX_RUNTIME_SECONDS> \
      --vpc-config "Subnets=<SUBNET_ID>,SecurityGroupIds=<SECURITY_GROUP_ID>" \
      --enable-network-isolation
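
The per-job check can be automated. The sketch below assumes a boto3 SageMaker client (or any object with the same get_paginator and describe_training_job methods) and reads the EnableNetworkIsolation field of each DescribeTrainingJob response. The function name is hypothetical.

```python
def find_non_isolated_jobs(client):
    """Return the names of training jobs that do not have network isolation.

    client is expected to behave like boto3.client("sagemaker").
    """
    flagged = []
    # ListTrainingJobs is paginated; walk every page of summaries.
    paginator = client.get_paginator("list_training_jobs")
    for page in paginator.paginate():
        for summary in page["TrainingJobSummaries"]:
            name = summary["TrainingJobName"]
            job = client.describe_training_job(TrainingJobName=name)
            # A missing field is treated as "not isolated".
            if not job.get("EnableNetworkIsolation", False):
                flagged.append(name)
    return flagged


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   print(find_non_isolated_jobs(boto3.client("sagemaker")))
```

Passing the client in as an argument keeps the function free of credentials and makes it easy to exercise with a stub.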

Implementation Plan:

Using AWS Console:

  1. Navigate to Amazon SageMaker and select Create Training Job.

  2. Under Network Isolation, enable the Network Isolation checkbox.

  3. Select the appropriate VPC, subnets, and security groups for the isolated environment.

  4. Complete the setup and start the training job with network isolation enabled.

  5. Monitor the training job to ensure it operates in isolation and has no access to the internet.
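
One way to confirm the absence of internet access is to have the training code attempt an outbound connection at startup and log the result. This is a minimal sketch, assuming it is called from the training entry point; the function name and the example host are hypothetical. In an isolated container, both DNS resolution and the connection attempt are expected to fail.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Usage inside the training entry point (example.com is a placeholder):
#   if can_reach("example.com", 443):
#       print("WARNING: outbound network access is available")
```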

Using AWS CLI:

  1. When creating a training job, enable network isolation with the --enable-network-isolation flag:

    aws sagemaker create-training-job \
      --training-job-name <TRAINING_JOB_NAME> \
      --role-arn <IAM_ROLE> \
      --algorithm-specification TrainingImage=<IMAGE_URL>,TrainingInputMode=File \
      --input-data-config <INPUT_DATA_CONFIG> \
      --output-data-config <OUTPUT_DATA_CONFIG> \
      --resource-config <RESOURCE_CONFIG> \
      --stopping-condition MaxRuntimeInSeconds=<MAX_RUNTIME_SECONDS> \
      --vpc-config "Subnets=<SUBNET_ID>,SecurityGroupIds=<SECURITY_GROUP_ID>" \
      --enable-network-isolation

  2. Verify that the job was created with network isolation. The command should return true:

    aws sagemaker describe-training-job --training-job-name <TRAINING_JOB_NAME> --query "EnableNetworkIsolation"
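
The same request can be assembled for the SDK. The sketch below builds the keyword arguments for a CreateTrainingJob call with EnableNetworkIsolation set to true. The function name and the default instance type, instance count, volume size, and runtime limit are illustrative assumptions, not recommendations.

```python
def build_isolated_job_request(job_name, role_arn, image_uri, output_s3_uri,
                               subnets, security_group_ids,
                               instance_type="ml.m5.xlarge",
                               max_runtime_seconds=3600):
    """Return CreateTrainingJob keyword arguments with network isolation on."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
        "VpcConfig": {
            "Subnets": list(subnets),
            "SecurityGroupIds": list(security_group_ids),
        },
        # The setting this control is about.
        "EnableNetworkIsolation": True,
    }


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   request = build_isolated_job_request(...)
#   boto3.client("sagemaker").create_training_job(**request)
```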

Backout Plan: 

Using AWS Console:

  1. Sign in to the AWS Management Console.

  2. Navigate to Amazon SageMaker.

  3. In the Training jobs section, select the training job that has network isolation enabled.

  4. If the job is still running, stop it. The network settings of an existing training job cannot be edited.

  5. Create a new training job with the same configuration, leaving the network isolation option unselected. Adjust the VPC settings if the job needs internet access.

  6. Monitor the new training job to ensure it functions correctly without network isolation.

Using AWS CLI:

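  1. Network isolation cannot be turned off on an existing training job. If it is causing issues, stop the running job (this requires the sagemaker:StopTrainingJob permission):

    aws sagemaker stop-training-job --training-job-name <TRAINING_JOB_NAME>

  2. Create a new training job with the same configuration and network isolation disabled:

    aws sagemaker create-training-job \
      --training-job-name <NEW_TRAINING_JOB_NAME> \
      --role-arn <IAM_ROLE> \
      --algorithm-specification TrainingImage=<IMAGE_URL>,TrainingInputMode=File \
      --input-data-config <INPUT_DATA_CONFIG> \
      --output-data-config <OUTPUT_DATA_CONFIG> \
      --resource-config <RESOURCE_CONFIG> \
      --stopping-condition MaxRuntimeInSeconds=<MAX_RUNTIME_SECONDS> \
      --vpc-config "Subnets=<SUBNET_ID>,SecurityGroupIds=<SECURITY_GROUP_ID>" \
      --no-enable-network-isolation

  3. Verify that network isolation is disabled on the new job. EnableNetworkIsolation is a top-level field of the response, not part of VpcConfig, and the command should return false:

    aws sagemaker describe-training-job --training-job-name <NEW_TRAINING_JOB_NAME> --query "EnableNetworkIsolation"

  4. Monitor the new training job to ensure it functions correctly without network isolation.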
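
Because a training job's network isolation setting is fixed at creation, backing out means re-creating the job. The sketch below derives a new CreateTrainingJob request from a DescribeTrainingJob response with the isolation flag changed. The function name and the list of copied fields are assumptions; the service may reject some sub-fields that the describe call echoes back, so treat the result as a starting point to review before submitting.

```python
import copy

# Fields of a DescribeTrainingJob response that CreateTrainingJob also accepts.
# Assumption: this subset is enough for the job at hand; extend it for
# features such as checkpoints, tags, or debugger settings.
COPIED_FIELDS = (
    "RoleArn",
    "AlgorithmSpecification",
    "HyperParameters",
    "InputDataConfig",
    "OutputDataConfig",
    "ResourceConfig",
    "StoppingCondition",
    "VpcConfig",
)


def rebuild_request(described_job, new_job_name, enable_network_isolation):
    """Build a CreateTrainingJob request from a described job.

    Only the fields in COPIED_FIELDS are carried over; status fields,
    timestamps, and ARNs from the description are dropped.
    """
    request = {
        key: copy.deepcopy(described_job[key])
        for key in COPIED_FIELDS
        if key in described_job
    }
    request["TrainingJobName"] = new_job_name
    request["EnableNetworkIsolation"] = bool(enable_network_isolation)
    return request


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   sm = boto3.client("sagemaker")
#   old = sm.describe_training_job(TrainingJobName="<TRAINING_JOB_NAME>")
#   sm.create_training_job(**rebuild_request(old, "<NEW_TRAINING_JOB_NAME>", False))
```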

Reference:

CIS Controls:

Version   Control ID   Control Description
7.1       3.1          Ensure network isolation is enabled for all sensitive cloud workloads, including SageMaker training jobs, to prevent unauthorized external access.
7.1       8.1          Enable network isolation for cloud services to ensure that critical resources, such as training jobs, operate in a secure, isolated environment.