Profile Applicability:
Level 1
Description:
AWS Glue is a fully managed ETL (Extract, Transform, Load) service for preparing and loading data for analytics. Logging ETL job runs makes it possible to track execution, detect errors, and troubleshoot issues as they occur. It is also the basis for monitoring job performance, understanding data transformations, and confirming that data flows work as expected.
This SOP verifies that AWS Glue ETL jobs have logging enabled, so that every job run is recorded in Amazon CloudWatch Logs or Amazon S3 (depending on the configuration) and ETL job execution stays visible.
Rationale:
Enabling logging for AWS Glue ETL jobs provides several benefits:
Debugging: Logs provide detailed information that can help identify and troubleshoot errors in the ETL process.
Audit and Monitoring: Logs allow for tracking job performance and monitoring the execution of ETL workflows.
Compliance: Helps meet requirements for logging and audit trails, ensuring the integrity and traceability of the ETL process.
Operational Transparency: Ensures transparency by recording job events, error messages, and execution details for analysis.
Impact:
Pros:
Improved Troubleshooting: Logs provide the necessary details for debugging and identifying errors in the ETL process.
Better Monitoring: CloudWatch Logs integration allows for real-time monitoring of ETL job runs and metrics.
Compliance: Provides an audit trail for ETL jobs, supporting compliance requirements for traceability and security.
Operational Insights: Helps monitor job performance, detect inefficiencies, and optimize data workflows.
Cons:
Storage Costs: Storing job logs in CloudWatch Logs or S3 could incur additional costs, especially for large volumes of log data.
Potential Overhead: Enabling logging adds some overhead to job execution, which may slightly impact job performance.
Default Value:
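By default, continuous logging is not enabled for AWS Glue ETL jobs. Only the standard output and error streams are sent to the /aws-glue/jobs/output and /aws-glue/jobs/error CloudWatch log groups. Continuous CloudWatch logging or an Amazon S3 log location must be enabled explicitly when creating or modifying an ETL job.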
Pre-requisite:
AWS IAM Permissions:
glue:GetJob
glue:UpdateJob
iam:PassRole (needed when update-job resubmits the job's IAM role)
logs:CreateLogGroup
logs:CreateLogStream
logs:PutLogEvents
s3:PutObject
AWS CLI installed and configured.
Basic knowledge of AWS Glue ETL jobs, CloudWatch Logs, and S3.
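The Glue, CloudWatch Logs, and S3 actions this SOP relies on could be collected into one IAM policy document. The following is a minimal sketch rather than a vetted policy: the function name write_glue_logging_policy and the wildcard Resource are assumptions made for illustration. In practice the Glue actions usually belong to the operator, the CloudWatch Logs and S3 actions belong to the job's IAM role, and iam:PassRole is granted separately on the specific role ARN.

```shell
# Hypothetical helper: writes an example IAM policy document to the given path.
# "Resource": "*" is a placeholder assumption; scope it to real ARNs before use.
write_glue_logging_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetJob",
        "glue:UpdateJob",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "s3:PutObject"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}
```

For example, write_glue_logging_policy glue-logging-policy.json writes the file, which can then be attached with aws iam put-role-policy --role-name <role-name> --policy-name <policy-name> --policy-document file://glue-logging-policy.json.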
Remediation:
Test Plan:
Using AWS Console:
Sign in to the AWS Management Console.
Navigate to AWS Glue under Services.
In the AWS Glue Dashboard, go to Jobs and select the ETL job you want to inspect.
In the Job Details section, check the Logging settings:
CloudWatch Logs: Ensure that CloudWatch Logs is enabled, and a log group and stream are specified.
S3: Ensure that logs are being written to an S3 bucket if that option is chosen.
If logging is not enabled, enable CloudWatch Logs or S3 logging and save the job settings.
Using AWS CLI:
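To check whether a log location is set for the ETL job, run:
aws glue get-job --job-name <job-name> --query 'Job.LogUri'
If a log location is configured, the command prints it as a JSON string. Example output:
"s3://my-log-bucket/glue/jobs/my-etl-job/"
If the output is null, no LogUri is set for the job.
Continuous CloudWatch logging is configured through job arguments rather than LogUri, so also inspect the job's default arguments:
aws glue get-job --job-name <job-name> --query 'Job.DefaultArguments'
Continuous logging is enabled when the output contains "--enable-continuous-cloudwatch-log": "true".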
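The single-job check above can be repeated across an account. The following is a minimal sketch, assuming that continuous logging is signalled by the --enable-continuous-cloudwatch-log default argument and that the caller also has glue:ListJobs; the function name is hypothetical.

```shell
# Hypothetical helper: prints the name of every Glue job in the current region
# whose default arguments do not set --enable-continuous-cloudwatch-log to true.
# Assumes configured AWS CLI credentials with glue:ListJobs and glue:GetJob.
list_glue_jobs_without_continuous_logging() {
  aws glue list-jobs --query 'JobNames[]' --output text |
    tr '\t' '\n' |
    while IFS= read -r job; do
      [ -n "$job" ] || continue
      # JMESPath needs double quotes around keys that contain dashes.
      value=$(aws glue get-job --job-name "$job" \
        --query 'Job.DefaultArguments."--enable-continuous-cloudwatch-log"' \
        --output text </dev/null)
      # With --output text, a missing key is printed as "None".
      [ "$value" = "true" ] || printf '%s\n' "$job"
    done
}
```

Running list_glue_jobs_without_continuous_logging prints one job name per line; no output means every job sets the argument.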
Implementation Steps:
Using AWS Console:
Sign in to the AWS Management Console and navigate to AWS Glue.
In the AWS Glue Dashboard, go to Jobs and select the ETL job to modify.
In the Job Details page, find the Logging section.
Enable CloudWatch Logs or S3 logging:
For CloudWatch Logs, select Enable logging and specify the Log Group and Log Stream.
For S3, select Enable logging and specify the S3 bucket and prefix.
Save the changes to apply logging configuration.
Using AWS CLI:
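To enable continuous CloudWatch logging for an existing job, run the following command. Because update-job overwrites the entire job definition, include the job's existing Role, Command, and any other settings you want to keep (retrieve them first with get-job):
aws glue update-job \
  --job-name <job-name> \
  --job-update '{"Role": "<role-arn>", "Command": {"Name": "glueetl", "ScriptLocation": "<script-location>"}, "DefaultArguments": {"--enable-continuous-cloudwatch-log": "true", "--continuous-log-logGroup": "/aws/glue/jobs"}}'
To write logs to S3 for an existing job, use the following command. The Glue API reference describes LogUri as reserved for future use, so the example also enables Spark event logs, which are written to the given S3 path:
aws glue update-job \
  --job-name <job-name> \
  --job-update '{"Role": "<role-arn>", "Command": {"Name": "glueetl", "ScriptLocation": "<script-location>"}, "LogUri": "s3://<bucket-name>/logs/", "DefaultArguments": {"--enable-spark-ui": "true", "--spark-event-logs-path": "s3://<bucket-name>/logs/"}}'
Verify that logging is now enabled by using the get-job command:
aws glue get-job --job-name <job-name> --query 'Job.{LogUri: LogUri, DefaultArguments: DefaultArguments}'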
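Because update-job replaces the stored job definition, it can be useful to try continuous logging on a single run before changing the job itself. The following is a minimal sketch, assuming the run-level --arguments option of start-job-run is acceptable for such a trial; the function name is hypothetical.

```shell
# Hypothetical helper: starts one run of the given job with continuous
# CloudWatch logging requested for that run only, then prints the run ID.
# The stored job definition is not modified.
start_glue_run_with_continuous_logging() {
  aws glue start-job-run \
    --job-name "$1" \
    --arguments '{"--enable-continuous-cloudwatch-log": "true"}' \
    --query 'JobRunId' \
    --output text
}
```

For example, start_glue_run_with_continuous_logging <job-name> starts the run; if the logs look as expected, the same argument can then be made permanent in the job's default arguments.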
Backout Plan:
If enabling logging causes issues, such as excessive logging or resource limitations:
Identify the affected ETL job.
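To disable continuous logging, resubmit the job definition without the logging arguments, keeping the job's Role, Command, and any other default arguments it needs:
aws glue update-job \
  --job-name <job-name> \
  --job-update '{"Role": "<role-arn>", "Command": {"Name": "glueetl", "ScriptLocation": "<script-location>"}, "DefaultArguments": {}}'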
Verify that logging has been disabled and the job is functioning correctly.
Note:
CloudWatch Logs: Set a retention policy on the job's CloudWatch log groups so that old job logs expire instead of accumulating cost.
Log Storage Costs: Large log volumes incur storage charges in both S3 and CloudWatch. Review log volume periodically and apply S3 lifecycle rules or CloudWatch retention settings as needed.
Granularity: The level of detail logged (e.g., ERROR, INFO, or DEBUG) is controlled by the logger configuration in the job script. With continuous logging, the --enable-continuous-log-filter argument controls whether verbose Spark driver and executor messages are filtered out.
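The retention recommendation in the notes above can be applied with the CloudWatch Logs CLI. The following is a minimal sketch: the function name is hypothetical, and the log group and the 30-day period in the usage example are assumptions to adapt.

```shell
# Hypothetical helper: applies a retention period (in days) to a log group.
# CloudWatch Logs accepts only specific values (for example 1, 7, 30, 90, 365).
set_glue_log_retention() {
  aws logs put-retention-policy \
    --log-group-name "$1" \
    --retention-in-days "$2"
}
```

For example, set_glue_log_retention /aws-glue/jobs/logs-v2 30 expires events in the default continuous-logging log group after 30 days.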