Integrating Azure Databricks with Azure SQL Database using Private Endpoints

In today’s data-driven world, efficiently processing and analysing data is pivotal. Two essential tools that provide such capabilities are Azure Databricks and Azure SQL Database. Azure Databricks is an Apache Spark-based analytics platform optimised for the Azure cloud services platform, while Azure SQL Database is a fully managed relational database service on Microsoft’s Azure cloud.

This blog post explores how to integrate Azure Databricks with Azure SQL Database using Azure Private Link’s Private Endpoint feature.

Azure Private Link ensures that your data is kept secure within the Azure network without exposing it to the public Internet. This service creates a private network interface within your Azure Virtual Network (VNet) for each Private Endpoint.

The following step-by-step guide illustrates the process.

First, navigate to your Azure SQL Database on the Azure portal.

  1. Click on the ‘Private Endpoint Connections’ under the Security section.
  2. Click on ‘Add’.
  3. Fill in the details (Subscription, Resource Group, Name, Region, etc.).
  4. In the ‘Resource’ section, select the Azure SQL Database instance you’d like to connect to.
  5. Create a new ‘Private DNS Zone’ if you don’t already have one.
  6. Review the settings and click ‘Create’.
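Once the private endpoint and Private DNS Zone are in place, a quick sanity check from a machine inside the VNet is to confirm that the server name resolves to a private IP address rather than a public one. A minimal sketch (the hostname is a hypothetical placeholder):

```python
import ipaddress
import socket

def resolves_to_private_ip(hostname: str) -> bool:
    """Return True if hostname resolves to a private address.

    From inside the VNet, the Private DNS Zone should map the SQL
    server's FQDN to the private endpoint's IP (e.g. 10.x.x.x).
    """
    ip = socket.gethostbyname(hostname)
    return ipaddress.ip_address(ip).is_private

# Hypothetical server name; run this from a cluster attached to the VNet:
# resolves_to_private_ip("yourserver.database.windows.net")
```

If this returns False, traffic is still routing over the public endpoint and the DNS zone link to the VNet is worth double-checking.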

Next, in your Databricks workspace, use the following JDBC connection format to connect to the Azure SQL Database:

jdbc_url = "jdbc:sqlserver://<private_endpoint_url>:1433;database=<your_database>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"

Ensure you replace <private_endpoint_url> with your server’s fully qualified domain name (for example, yourserver.database.windows.net), and <your_database> with the name of your database. Note that with the Private DNS Zone in place, the server’s public FQDN resolves to the private endpoint’s IP address from inside the VNet, so no special hostname is needed.
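To keep the connection string readable, it can also be assembled from its parts in Python. The server and database names below are hypothetical placeholders:

```python
def build_jdbc_url(server_fqdn: str, database: str) -> str:
    """Assemble a SQL Server JDBC URL with encryption enforced."""
    return (
        f"jdbc:sqlserver://{server_fqdn}:1433;"
        f"database={database};"
        "encrypt=true;"
        "trustServerCertificate=false;"
        "hostNameInCertificate=*.database.windows.net;"
        "loginTimeout=30;"
    )

# Hypothetical values for illustration:
jdbc_url = build_jdbc_url("yourserver.database.windows.net", "yourdatabase")
```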

In a Databricks notebook, write the following code:

# Credentials are pulled from a Databricks secret scope rather than hard-coded
connection_properties = {
  "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver",
  "user" : dbutils.secrets.get(scope = "<scope>", key = "<key>"),
  "password" : dbutils.secrets.get(scope = "<scope>", key = "<key>")
}

df = spark.read.jdbc(url=jdbc_url, table="<table_name>", properties=connection_properties)

Make sure to replace <scope> and <key> with the correct values for your Databricks secret scope and the keys holding the username and password, and replace <table_name> with the name of the table you’d like to read from.
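Rather than pulling an entire table across, you can also push a query down to Azure SQL Database: the `table` argument of `spark.read.jdbc` accepts a parenthesised subquery with an alias. A small helper illustrates the pattern (the query and table names are hypothetical):

```python
def as_subquery(sql: str, alias: str = "src") -> str:
    """Wrap a SQL query so it can be passed as the `table` argument
    to spark.read.jdbc; the query executes on the database server."""
    return f"({sql}) {alias}"

# Usage in a Databricks notebook (requires a running Spark session):
# df = spark.read.jdbc(
#     url=jdbc_url,
#     table=as_subquery("SELECT id, name FROM dbo.customers WHERE active = 1"),
#     properties=connection_properties,
# )
```

Pushing the filter down keeps network traffic over the private endpoint to a minimum, since only matching rows leave the database.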

You can verify the connection by displaying the DataFrame:

display(df)

This code will show the data in the table, confirming the connection to the Azure SQL Database.

Integrating Azure Databricks with Azure SQL Database using Azure Private Link provides secure and streamlined data processing and analysis.

Bear in mind that understanding the security, management, and operational implications of using Private Link is crucial before implementation. However, its advantages in creating a secure and scalable data analytics platform are undeniable, making it a worthy addition to your Azure toolkit.

Remember to follow the principle of least privilege and only grant access to those who require it. Stay tuned for more insights on Azure services and happy data crunching!