DBFS init scripts – deprecated – Databricks

Starting from 1st of April 2024, DBFS init scripts are deprecated. Databricks does provide an option to extend this function until the end of September 2024. In any case, it is the customer's responsibility to migrate away from DBFS init scripts, which are now called legacy init scripts.

The alternative is to host the init script in one of the places below and reference it from there.

  • Azure storage account, GCP or AWS bucket
  • Workspace
  • Inline (for global)

One of our customers has a lot of workspaces with many init scripts, most of them legacy. We needed an automated way to find all the legacy (DBFS) init scripts so that we can migrate them.

I came up with a PowerShell script to find these across all the workspaces in a given subscription. The script collects init scripts from three places: the workspace (global init scripts), interactive clusters, and job clusters.

As you may have guessed, the customer's cloud provider is Azure, and this script is for Azure. It's not a big deal to adapt the script to run against AWS or GCP; the major chunk of it will remain the same.

The script writes the collected dataset to the temp directory on the C drive. Change the subscription name and the output path; the rest remains the same. The script reports all types of init scripts, so you need to filter on the dbfs type, which is the one considered legacy and deprecated.
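As a rough illustration of that last filtering step, the sketch below keeps only the rows whose ScriptType is dbfs. The sample rows, workspace names and paths are made up for the example; only the column names come from the script.

```powershell
# Sample rows shaped like the objects the script collects (illustrative data only).
$rows = @(
    [PSCustomObject]@{ WorkSpace = 'ws-demo-01'; Cluster = 'etl-cluster'; JobName = 'N/A'; Scope = 'Cluster'; ScriptType = 'dbfs'; ScriptContent = 'dbfs:/databricks/scripts/install-libs.sh' }
    [PSCustomObject]@{ WorkSpace = 'ws-demo-01'; Cluster = 'ml-cluster'; JobName = 'N/A'; Scope = 'Cluster'; ScriptType = 'workspace'; ScriptContent = '/Shared/init/install-libs.sh' }
    [PSCustomObject]@{ WorkSpace = 'ws-demo-02'; Cluster = 'N/A'; JobName = 'N/A'; Scope = 'Global'; ScriptType = 'N/A'; ScriptContent = '#!/bin/bash' }
)

# Keep only the legacy entries: ScriptType holds the key of the init_scripts
# entry, so DBFS-hosted scripts show up as 'dbfs'.
$legacy = $rows | Where-Object ScriptType -eq 'dbfs'

# Show where each legacy script is used and where it lives.
$legacy | Format-Table WorkSpace, Cluster, JobName, ScriptContent
```

Against the real output you would replace the sample rows with something like `Import-Csv C:\temp\output.csv`, using whatever output path you configured.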

Script

$subscriptionName = "<<subscription name>>"
Login-AzAccount -Subscription $subscriptionName
$allWorkspace = Get-AzDatabricksWorkspace | Select-Object Name, Location, ResourceGroupName, SkuName, Url 
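# 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the resource ID of the Azure Databricks service
$getAuthToken = Get-AzAccessToken -Resource "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"
$output = New-Object System.Collections.Generic.List[Object]
# Newer Az.Accounts versions return the token as a SecureString; convert it to plain text if so
$token = $getAuthToken.Token
if ($token -is [System.Security.SecureString]) {
    $token = [System.Net.NetworkCredential]::new('', $token).Password
}
$headers = @{
    Authorization = "Bearer $token"
}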
foreach ($workspace in $allWorkspace) {
    Write-Output "Collecting init script details for $($workspace.Name) Workspace"
    $baseUri = "https://$($workspace.Url)"
    $getGlobalInitScriptsListUri = "$baseUri/api/2.0/global-init-scripts"
    $getClusterUri = "$baseUri/api/2.0/clusters/list"
    $getJobsUri = "$baseUri/api/2.1/jobs/list"
    $getJobDetailsUri = "$baseUri/api/2.1/jobs/get"

    #Check Global Init Scripts
    $globalInitScriptsList = (Invoke-WebRequest -Uri $getGlobalInitScriptsListUri -Headers $headers -Method Get).Content | ConvertFrom-Json
    foreach ($globalInitScript in $globalInitScriptsList.scripts) {
        $getGlobalInitScriptContent = (Invoke-WebRequest -Uri "$getGlobalInitScriptsListUri/$($globalInitScript.script_id)" -Headers $headers -Method Get).Content | ConvertFrom-Json
        $outObject = [PSCustomObject]@{
            WorkSpace     = $workspace.name
            Cluster       = "N/A"
            JobName       = "N/A"
            Scope         = "Global"
            ScriptName    = $getGlobalInitScriptContent.name
            ScriptType    = "N/A"
            ScriptContent = [Text.Encoding]::Utf8.GetString([Convert]::FromBase64String($getGlobalInitScriptContent.script))
        }    
        $output.Add($outObject)
      
    }
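    #Check Interactive Init Scripts
    #Note: only clusters created through the UI are checked; clusters created via the API have a different cluster_source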
    $clusterUIList = (Invoke-WebRequest -Uri $getClusterUri -Headers $headers -Method Get).Content | ConvertFrom-Json
    foreach ($clusterUI in $clusterUIList.clusters | Where-Object cluster_source -eq "UI") {
        foreach ($script in $clusterUI.init_scripts) {
            $outObject = [PSCustomObject]@{
                WorkSpace     = $workspace.name
                Cluster       = $clusterUI.cluster_name
                JobName       = "N/A"
                Scope         = "Cluster"
                ScriptName    = "N/A"
                ScriptType    = $script.psobject.properties.Name
                ScriptContent = $script.psobject.properties.Value.destination
            }    
            $output.Add($outObject)
        }
    }

    #Get job cluster
    #Note: jobs/list returns one page at a time (20 jobs by default), so larger workspaces need paging
    $allJobs = (Invoke-WebRequest -Uri $getJobsUri -Headers $headers -Method Get).Content | ConvertFrom-Json
    foreach ($job in $allJobs.jobs) {
        $jobParam = @{job_id = $job.job_id }
        $jobDetail = (Invoke-WebRequest -Uri $getJobDetailsUri -Headers $headers -Method Get -Body $jobParam).Content | ConvertFrom-Json
        foreach ($jobCluster in $jobDetail.settings.job_clusters) {
            $newJobCluster = $jobCluster.new_cluster
            foreach ($script in $newJobCluster.init_scripts) {
                $outObject = [PSCustomObject]@{
                    WorkSpace     = $workspace.name
                    Cluster       = $jobCluster.job_cluster_key
                    JobName       = $jobDetail.settings.name
                    Scope         = "Job"
                    ScriptName    = "N/A"
                    ScriptType    = $script.psobject.properties.Name
                    ScriptContent = $script.psobject.properties.Value.destination
                }    
                $output.Add($outObject)
            }

        }

    }
}
# Show the collected rows and save them as CSV (change the path as needed)
Write-Output $output
$output | Export-Csv C:\temp\output.csv -NoTypeInformation
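
One gap worth knowing about: the script reads only the first page of jobs/list, so a workspace with many jobs would be reported incompletely. Below is a minimal sketch of paging through the list, assuming the Jobs API 2.1 response carries has_more and next_page_token fields and accepts limit and page_token as query parameters. The function name Get-AllJobs and its -Fetch parameter are hypothetical helpers, not part of the original script; -Fetch exists only so the loop can be exercised without a live workspace.

```powershell
function Get-AllJobs {
    param(
        [Parameter(Mandatory)] [string] $BaseUri,
        [Parameter(Mandatory)] [hashtable] $Headers,
        # Hypothetical seam: defaults to a real REST call, can be swapped for a stub.
        [scriptblock] $Fetch = {
            param($uri, $headers, $query)
            # For GET requests a hashtable body is sent as query-string parameters.
            Invoke-RestMethod -Uri $uri -Headers $headers -Method Get -Body $query
        }
    )

    $jobs = New-Object System.Collections.Generic.List[Object]
    $uri = "$BaseUri/api/2.1/jobs/list"
    $query = @{ limit = 100 }   # assumed maximum page size for jobs/list

    while ($true) {
        $page = & $Fetch $uri $Headers $query
        foreach ($job in $page.jobs) { $jobs.Add($job) }

        # Stop when the service reports no further pages.
        if (-not $page.has_more -or -not $page.next_page_token) { break }
        $query = @{ limit = 100; page_token = $page.next_page_token }
    }

    return $jobs
}
```

In the script, the single jobs/list call could then be swapped for something like `Get-AllJobs -BaseUri $baseUri -Headers $headers`, looping over the returned jobs directly instead of over `$allJobs.jobs`.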

