Profile Applicability:
- Level 2
Description:
BigQuery tables may store sensitive data that requires classification for security and compliance purposes. Google's Sensitive Data Protection tools can be used to automatically discover, classify, and protect data within BigQuery across an organization.
Rationale:
Classifying data in BigQuery is crucial for managing and protecting sensitive information. Leveraging tools like Google Cloud's Sensitive Data Protection, which employs machine learning and pattern matching, automates the discovery and classification of sensitive data, ensuring robust data governance and reducing the risk of accidental exposure.
Impact:
Cost: Implementing Google Cloud's Sensitive Data Protection or third-party tools incurs additional costs.
Resource Management: Continuous monitoring and classification require regular configuration and oversight.
Default Value:
By default, BigQuery data is not classified unless manually configured.
Audit Steps:
Using Google Cloud Console
Navigate to Cloud DLP.
Confirm the presence of a discovery scan configuration for either the organization or project.
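Where the Console is unavailable, the same check can be sketched against the DLP REST API (a hedged example; `PROJECT_ID` is a placeholder, and the API must be enabled with appropriate IAM permissions):

```shell
# List discovery scan configurations for a project (DLP API v2).
# A non-empty "discoveryConfigs" array indicates profiling is configured.
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://dlp.googleapis.com/v2/projects/PROJECT_ID/locations/global/discoveryConfigs"
```

For organization-level configurations, the parent path would be `organizations/ORG_ID/locations/global` instead.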
Remediation Steps:
Enable Data Profiling
Access Cloud DLP Configurations.
Click Create Configuration.
Set up data profiling:
For projects, refer to Profiling Projects.
For organizations or folders, refer to Profiling Organizations or Folders.
Review Findings:
Identify columns or tables with high data risk that contain sensitive data without proper protections.
Mitigation options:
Apply BigQuery policy tags to restrict access to specific roles.
Use de-identification techniques, such as masking or tokenization, to protect sensitive data.
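As a minimal sketch of the policy-tag option, assuming a Data Catalog taxonomy and policy tag already exist (all IDs below are placeholders):

```shell
# 1. Dump the table's current schema.
bq show --schema --format=prettyjson PROJECT_ID:DATASET_NAME.TABLE_NAME > schema.json

# 2. Edit schema.json, adding to the sensitive column's definition:
#    "policyTags": {"names": ["projects/PROJECT_ID/locations/us/taxonomies/TAXONOMY_ID/policyTags/TAG_ID"]}

# 3. Apply the updated schema; access to the tagged column is then
#    limited to principals granted the Fine-Grained Reader role.
bq update PROJECT_ID:DATASET_NAME.TABLE_NAME schema.json
```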
Integrate Findings Into Security Operations:
Publish data profiles to other Google Cloud services. For example, send profile updates to Pub/Sub to automate remediation or to alert on new or changed data risks.
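A hedged sketch of the Pub/Sub wiring (the topic and subscription names are hypothetical):

```shell
# Create a topic and subscription to receive data profile updates.
gcloud pubsub topics create dlp-profile-updates
gcloud pubsub subscriptions create dlp-profile-alerts --topic=dlp-profile-updates

# After pointing the discovery configuration's Pub/Sub action at this
# topic, pull messages to drive alerting or automated remediation:
gcloud pubsub subscriptions pull dlp-profile-alerts --auto-ack --limit=5
```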
Backout Plan:
Step 1: Disable Data Profiling & Classification Scans
If classification scans cause disruptions, delete the running Cloud DLP job via the DLP REST API:
curl -X DELETE -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://dlp.googleapis.com/v2/projects/PROJECT_ID/dlpJobs/JOB_ID"
Replace PROJECT_ID and JOB_ID with the actual project and data profiling job IDs. To stop future scans, pause or delete the discovery scan configuration in the Cloud DLP console.
Step 2: Remove Data Classification Labels
If classification labels were applied incorrectly, remove them from the dataset (one --clear_label flag per label key, with a trailing colon):
bq update --clear_label LABEL_KEY: PROJECT_ID:DATASET_NAME
Replace LABEL_KEY, PROJECT_ID, and DATASET_NAME accordingly.
Step 3: Notify Stakeholders
- Inform data security teams before making classification changes.