Description:

Deploying Azure Databricks in a customer-managed virtual network (VNet), also known as VNet injection, places cluster resources in a network that the organization controls, so traffic between Azure Databricks clusters and other Azure resources can be kept on private address space. This configuration lets organizations control network access, isolate services, and support compliance with regulatory requirements for data transmission and network access.

Rationale:

Deploying Azure Databricks in a customer-managed VNet helps maintain strict control over the network environment. It allows the customer to define private IP ranges, enforce network security groups (NSGs), and apply routing controls to isolate Databricks workloads from public networks. This approach strengthens security and helps organizations meet compliance obligations such as GDPR, HIPAA, or SOC 2.

Impact:

Deploying Azure Databricks in a customer-managed VNet limits external access by placing Databricks clusters and associated resources on private address space. However, it requires additional configuration steps, such as creating the VNet and subnets and configuring DNS settings. This setup may also make it more complex to reach other Azure services through the VNet.

Default Value:

By default, Azure Databricks is deployed in an Azure-managed VNet, with public access enabled. Deployment in a customer-managed VNet must be configured explicitly.

Pre-requisites:

  • Azure account.

  • Customer-managed Virtual Network (VNet).

  • Azure Databricks Workspace.

  • The user must have appropriate permissions to manage virtual networks and Databricks (e.g., Network Admin, Databricks Admin, Owner).

Audit:

  1. Sign in to the Azure portal as a Network Admin, Owner, or Databricks Admin.

  2. Navigate to the Azure Databricks Workspace and verify the network settings.

  3. Ensure that the Databricks workspace is deployed in a customer-managed VNet by reviewing the VNet configuration in the Networking section of the Databricks workspace.
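
The portal check above can also be scripted. The following is a minimal sketch, assuming the workspace definition has already been exported as JSON (for example with `az databricks workspace show`) and that a VNet-injected workspace carries the VNet resource ID in the `customVirtualNetworkId` parameter. The function names and the sample resource ID are hypothetical and used only for illustration.

```python
from typing import Any, Mapping, Optional


def custom_vnet_id(workspace: Mapping[str, Any]) -> Optional[str]:
    """Return the customer-managed VNet resource ID, or None if none is set.

    Assumption: the exported workspace JSON exposes the VNet under
    parameters.customVirtualNetworkId.value, either at the top level or
    nested under "properties", depending on the tool that produced it.
    """
    props = workspace.get("properties") or workspace
    params = props.get("parameters") or {}
    entry = params.get("customVirtualNetworkId") or {}
    return entry.get("value") or None


def uses_customer_managed_vnet(workspace: Mapping[str, Any]) -> bool:
    """True when the workspace definition references a custom VNet."""
    return custom_vnet_id(workspace) is not None


if __name__ == "__main__":
    # Hypothetical sample data; replace with the real exported JSON.
    sample = {
        "name": "example-workspace",
        "parameters": {
            "customVirtualNetworkId": {
                "value": "/subscriptions/<sub-id>/resourceGroups/example-rg"
                         "/providers/Microsoft.Network/virtualNetworks/example-vnet"
            }
        },
    }
    status = "compliant" if uses_customer_managed_vnet(sample) else "non-compliant"
    print(f"{sample['name']}: {status}")
```

A workspace for which the function returns None is running in the default managed VNet and should be flagged for review.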

Implementation Steps:

  1. Sign in to the Azure portal with Network Admin, Owner, or Databricks Admin permissions.

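  2. Create a Virtual Network (VNet) or use an existing one in the same region where the Databricks workspace will be deployed.

  3. Ensure that the VNet has two subnets dedicated to the Databricks workspace (a host subnet and a container subnet), each with sufficient IP addresses to support the Databricks cluster nodes.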

  4. Go to the Azure Databricks workspace creation page and select the option to deploy in a custom VNet.

  5. Under the Networking section of the Databricks workspace creation page, select the customer-managed VNet and subnets.

    • VNet injection places the cluster nodes in the selected VNet. Add Azure Private Link where connections to the workspace or to other Azure resources must avoid the public internet.

  6. Complete the Databricks workspace creation process and verify the successful deployment.

  7. Configure DNS settings (private DNS zones) so that the hostnames used by Azure Databricks resolve correctly from within the VNet.

  8. Apply Network Security Groups (NSGs) or firewall rules to control traffic flow into and out of the Databricks workspace.
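
The subnet sizing in step 3 can be checked before deployment. The following is a minimal sketch using Python's standard `ipaddress` module. It assumes the two-subnet layout used by VNet injection (a host subnet and a container subnet), that each cluster node takes one address in each subnet, and that Azure reserves five addresses in every subnet. The function names and CIDR ranges are hypothetical examples, not recommended values.

```python
import ipaddress

# Azure reserves five addresses in every subnet (network address,
# default gateway, two for Azure DNS, and the broadcast address).
AZURE_RESERVED_IPS_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Number of addresses in the subnet that Azure can assign."""
    net = ipaddress.ip_network(cidr, strict=True)
    return max(net.num_addresses - AZURE_RESERVED_IPS_PER_SUBNET, 0)


def max_cluster_nodes(host_cidr: str, container_cidr: str) -> int:
    """Upper bound on concurrent cluster nodes across the workspace.

    Assumption: each node consumes one address in the host subnet and
    one in the container subnet, so the smaller subnet is the limit.
    """
    return min(usable_ips(host_cidr), usable_ips(container_cidr))


def subnets_fit(vnet_cidr: str, host_cidr: str, container_cidr: str) -> bool:
    """True when both subnets lie inside the VNet and do not overlap."""
    vnet = ipaddress.ip_network(vnet_cidr, strict=True)
    host = ipaddress.ip_network(host_cidr, strict=True)
    container = ipaddress.ip_network(container_cidr, strict=True)
    inside = host.subnet_of(vnet) and container.subnet_of(vnet)
    return inside and not host.overlaps(container)


if __name__ == "__main__":
    # Hypothetical address plan for illustration only.
    vnet, host, container = "10.10.0.0/16", "10.10.0.0/24", "10.10.1.0/24"
    print("layout valid:", subnets_fit(vnet, host, container))
    print("max nodes:", max_cluster_nodes(host, container))
```

Because subnet ranges are difficult to change once a workspace is running, it is safer to size both subnets for the largest expected number of concurrent nodes rather than for current usage.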

Backout Plan:

  1. Sign in to the Azure portal as a Network Admin, Owner, or Databricks Admin.

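  2. Navigate to the Azure Databricks workspace and check whether its network configuration can be changed in place. The VNet is selected when the workspace is created, so reverting may require deploying a replacement workspace.

  3. Revert the workspace to the default Azure-managed VNet, or deploy a replacement workspace without a custom VNet. Delete the customer-managed VNet only after confirming that no workspace or other resource still depends on it.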

  4. Reapply any DNS and networking settings needed for public or default network access.

  5. Confirm that the Databricks workspace has been reverted to the original configuration.

References: