As of 1 April 2024, DBFS init scripts are deprecated. Databricks does provide an option to extend this functionality until the end of September 2024. In any case, it is the customer's responsibility to migrate away from DBFS init scripts, which are now called legacy init scripts.
The alternative is to host the init script in one of the places below and reference it from there.
- Cloud storage: Azure storage account, GCP or AWS bucket
- Workspace
- Inline (for global init scripts)
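For illustration, here is a rough sketch of how a legacy entry and its replacements look in a cluster's init_scripts setting. The key names (dbfs, workspace, abfss) are the ones the Databricks Clusters API uses for these locations; the paths and the storage account name are made up for the example.

```powershell
# Legacy (deprecated) entry: the script is stored on DBFS.
$legacyEntry = @{ dbfs = @{ destination = 'dbfs:/databricks/scripts/install.sh' } }

# Replacement entries. All paths and the storage account are hypothetical.
$workspaceEntry = @{ workspace = @{ destination = '/Shared/init/install.sh' } }
$storageEntry   = @{ abfss = @{ destination = 'abfss://scripts@examplestorage.dfs.core.windows.net/init/install.sh' } }

# The cluster setting after migration no longer contains a dbfs entry.
$clusterSetting = @{ init_scripts = @($workspaceEntry, $storageEntry) }
$clusterSetting | ConvertTo-Json -Depth 5
```

The top-level key of each entry (dbfs, workspace, abfss) is the value to look at when deciding whether a script is legacy.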
One of our customers has a lot of workspaces with many init scripts, most of them legacy. We needed an automated way to find all the legacy (DBFS) init scripts so that we could migrate them.
I came up with a PowerShell script that finds them across all the workspaces in a given subscription. The script collects the global init scripts of each workspace as well as the init scripts attached to interactive clusters and job clusters.
As you may have guessed, the customer's cloud provider is Azure and this script is for Azure. It is not a big deal to change the script to run against AWS or GCP. Only the login, the workspace discovery and the token retrieval are Azure-specific; the major chunk of the script, the Databricks REST API calls, will remain the same.
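As a rough sketch of what that change could look like outside Azure: the workspace list and the Authorization header are built by hand instead of coming from the Az cmdlets. The workspace name and URL are made-up examples, and reading a personal access token from the DATABRICKS_TOKEN environment variable is an assumption for this sketch, not something the original script does. A personal access token is scoped to one workspace, so a real run would need one token per workspace.

```powershell
# Hypothetical workspace list; on AWS or GCP there is no Get-AzDatabricksWorkspace to call.
$allWorkspace = @(
    [PSCustomObject]@{ Name = 'example-ws'; Url = 'example-workspace.cloud.databricks.com' }
)

# Assumption: a personal access token is available in this environment variable.
$token = $env:DATABRICKS_TOKEN
$headers = @{ Authorization = "Bearer $token" }

foreach ($workspace in $allWorkspace) {
    # From here on the REST calls are the same as in the Azure version.
    $baseUri = "https://$($workspace.Url)"
    Write-Output "$($workspace.Name): $baseUri/api/2.0/clusters/list"
}
```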
The script writes the collected dataset to the temp directory on the C drive. Change the subscription name and the output path; the rest stays the same. The script reports all types of init scripts, so filter on the dbfs type, which is the one considered legacy and deprecated.
Script
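# Requires the Az.Accounts and Az.Databricks modules
# Replace with the name of the subscription that hosts the workspaces
$subscriptionName = "<<subscription name>>"
Connect-AzAccount -Subscription $subscriptionName
$allWorkspace = Get-AzDatabricksWorkspace | Select-Object Name, Location, ResourceGroupName, SkuName, Url
# 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the Azure Databricks resource ID
$getAuthToken = Get-AzAccessToken -Resource "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"
# Newer Az.Accounts versions return the token as a SecureString, so convert it to plain text when needed
$token = $getAuthToken.Token
if ($token -is [securestring]) { $token = [System.Net.NetworkCredential]::new("", $token).Password }
$output = New-Object System.Collections.Generic.List[Object]
$headers = @{
    Authorization = "Bearer $token"
}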
foreach ($workspace in $allWorkspace) {
    Write-Output "Collecting init script details for $($workspace.Name) Workspace"
    $baseUri = "https://$($workspace.Url)"
    $getGlobalInitScriptsListUri = "$baseUri/api/2.0/global-init-scripts"
    $getClusterUri = "$baseUri/api/2.0/clusters/list"
    $getJobsUri = "$baseUri/api/2.1/jobs/list"
    $getJobDetailsUri = "$baseUri/api/2.1/jobs/get"
    #Check Global Init Scripts
    $globalInitScriptsList = (Invoke-WebRequest -Uri $getGlobalInitScriptsListUri -Headers $headers -Method Get).Content | ConvertFrom-Json
    foreach ($globalInitScript in $globalInitScriptsList.scripts) {
        # The list call returns metadata only; fetch each script to get its Base64-encoded content
        $getGlobalInitScriptContent = (Invoke-WebRequest -Uri "$getGlobalInitScriptsListUri/$($globalInitScript.script_id)" -Headers $headers -Method Get).Content | ConvertFrom-Json
        $outObject = [PSCustomObject]@{
            WorkSpace     = $workspace.name
            Cluster       = "N/A"
            JobName       = "N/A"
            Scope         = "Global"
            ScriptName    = $getGlobalInitScriptContent.name
            ScriptType    = "N/A"
            ScriptContent = [Text.Encoding]::Utf8.GetString([Convert]::FromBase64String($getGlobalInitScriptContent.script))
        }
        $output.Add($outObject)
    }
    #Check Interactive Init Scripts
    $clusterUIList = (Invoke-WebRequest -Uri $getClusterUri -Headers $headers -Method Get).Content | ConvertFrom-Json
    # Interactive clusters created through the REST API report cluster_source "API", so include them as well
    foreach ($clusterUI in $clusterUIList.clusters | Where-Object cluster_source -in "UI", "API") {
        foreach ($script in $clusterUI.init_scripts) {
            # Each entry has a single property named after the storage type (dbfs, workspace, abfss, ...)
            $outObject = [PSCustomObject]@{
                WorkSpace     = $workspace.name
                Cluster       = $clusterUI.cluster_name
                JobName       = "N/A"
                Scope         = "Cluster"
                ScriptName    = "N/A"
                ScriptType    = $script.psobject.properties.Name
                ScriptContent = $script.psobject.properties.Value.destination
            }
            $output.Add($outObject)
        }
    }
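    #Get job clusters
    # jobs/list is paginated (20 jobs per page by default), so follow next_page_token until has_more is false
    $allJobs = New-Object System.Collections.Generic.List[Object]
    $jobListParam = @{ limit = 100 }
    do {
        $jobPage = (Invoke-WebRequest -Uri $getJobsUri -Headers $headers -Method Get -Body $jobListParam).Content | ConvertFrom-Json
        foreach ($listedJob in $jobPage.jobs) { $allJobs.Add($listedJob) }
        $jobListParam.page_token = $jobPage.next_page_token
    } while ($jobPage.has_more)
    foreach ($job in $allJobs) {
        # On a GET request the hashtable body is sent as query parameters (?job_id=...)
        $jobParam = @{job_id = $job.job_id }
        $jobDetail = (Invoke-WebRequest -Uri $getJobDetailsUri -Headers $headers -Method Get -Body $jobParam).Content | ConvertFrom-Json
        foreach ($jobCluster in $jobDetail.settings.job_clusters) {
            $newJobCluster = $jobCluster.new_cluster
            foreach ($script in $newJobCluster.init_scripts) {
                $outObject = [PSCustomObject]@{
                    WorkSpace     = $workspace.name
                    Cluster       = $jobCluster.job_cluster_key
                    JobName       = $jobDetail.settings.name
                    Scope         = "Job"
                    ScriptName    = "N/A"
                    ScriptType    = $script.psobject.properties.Name
                    ScriptContent = $script.psobject.properties.Value.destination
                }
                $output.Add($outObject)
            }
        }
    }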
}
Write-Output $output
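# -NoTypeInformation stops Windows PowerShell from writing a "#TYPE" header line; the C:\temp folder must already exist
$output | Export-Csv C:\temp\output.csv -NoTypeInformation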
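The filtering step mentioned earlier can be sketched as follows. The sample rows are made up and only mimic the shape of the objects the script produces; against a real run you would read the exported CSV instead, as shown in the trailing comment.

```powershell
# Hypothetical sample rows with the same columns the script exports.
$sample = @(
    [PSCustomObject]@{ WorkSpace = 'ws-a'; Cluster = 'etl'; JobName = 'N/A'; Scope = 'Cluster'; ScriptType = 'dbfs'; ScriptContent = 'dbfs:/databricks/scripts/install.sh' }
    [PSCustomObject]@{ WorkSpace = 'ws-a'; Cluster = 'adhoc'; JobName = 'N/A'; Scope = 'Cluster'; ScriptType = 'workspace'; ScriptContent = '/Shared/init/install.sh' }
    [PSCustomObject]@{ WorkSpace = 'ws-b'; Cluster = 'nightly'; JobName = 'load'; Scope = 'Job'; ScriptType = 'dbfs'; ScriptContent = 'dbfs:/init/libs.sh' }
)

# Keep only the legacy rows: those whose ScriptType is dbfs.
$legacy = $sample | Where-Object ScriptType -eq 'dbfs'
$legacy | Format-Table WorkSpace, Scope, Cluster, JobName, ScriptContent

# Against the real export (path taken from the script above):
# Import-Csv C:\temp\output.csv | Where-Object ScriptType -eq 'dbfs'
```

Global init scripts are reported with ScriptType "N/A" because they are stored inline, so they drop out of this filter.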