[M] DBT Core (Self Hosted DBT)
DBT Core is the self-managed version of DBT. Artifacts generated by DBT Core need to be renamed to include their project name and then pushed to the K landing directory.
Files to be loaded
DBT generates manifest, catalog and run_results files that can be used to load metadata and logs into K.
The following are the naming conventions for these dbt artefacts.
The project name can be found in the dbt_project.yml file.
<project_name>_manifest.json | Generated by dbt. |
<project_name>_catalog.json | Generated by dbt. |
<project_name>_run_results_YYYYMMDDHHmmss.json | Generated by dbt. Optional, but required for usage information in KADA. |
mapping.json | Manually created for your environment. |
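The naming conventions above can be sketched as a small helper. This is a minimal illustration, not part of KADA or dbt: the function name kada_artifact_names is hypothetical, and it assumes the artifacts keep dbt's default file names (manifest.json, catalog.json, run_results.json). The timezone used for the timestamp is not stated on this page, so the caller supplies the run time.

```python
from datetime import datetime


def kada_artifact_names(project_name: str, run_time: datetime) -> dict[str, str]:
    """Map dbt's default artifact file names to the names expected in the landing directory.

    Hypothetical helper for illustration; only the naming pattern comes from this page.
    """
    # YYYYMMDDHHmmss, e.g. 20240131090500
    stamp = run_time.strftime("%Y%m%d%H%M%S")
    return {
        "manifest.json": f"{project_name}_manifest.json",
        "catalog.json": f"{project_name}_catalog.json",
        "run_results.json": f"{project_name}_run_results_{stamp}.json",
    }


names = kada_artifact_names("raw", datetime(2024, 1, 31, 9, 5, 0))
print(names["run_results.json"])  # raw_run_results_20240131090500.json
```

Only the run_results file carries a timestamp, so each run produces a distinct file while the manifest and catalog files are overwritten.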
Generating the mapping.json
A mapping file is required to map dbt projects to KADA sources. Example:
{
"raw": "af33141.australia-east.azure",
"transformed": "af33141.australia-east.azure"
}
The file is a JSON payload containing key-value pairs:
key - dbt project name.
value - host of the database, matching the host of the onboarded KADA source.
This can be found in Platform Admin > Sources > Edit source > See host name.
You can find the project name in the name field of the dbt_project.yml file.
Update the mapping file when you create a new project in dbt.
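As a rough sketch of how the mapping payload could be produced, the snippet below reads the project name from the text of a dbt_project.yml file and builds the JSON. The function name project_name_from_text is hypothetical, and the line-based parsing is a simplification that assumes a plain top-level name entry; a YAML parser would be more robust. The host value is the example from this page.

```python
import json
import re


def project_name_from_text(project_yaml: str) -> str:
    """Return the top-level `name` value from the text of a dbt_project.yml file.

    Simplified line-based parsing (an assumption for this sketch, not a full YAML parser).
    """
    pattern = re.compile(r"^name:\s*['\"]?([A-Za-z0-9_]+)['\"]?\s*(?:#.*)?$")
    for line in project_yaml.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    raise ValueError("no top-level name entry found")


# Example dbt_project.yml content; in practice, read the file from the dbt project root.
project_text = "name: 'raw'\nversion: '1.0.0'\n"
project_name = project_name_from_text(project_text)

# key = dbt project name, value = host of the matching onboarded KADA source
mapping = {project_name: "af33141.australia-east.azure"}
payload = json.dumps(mapping, indent=2)
print(payload)
```

Writing payload to a file named mapping.json gives the file to upload. Add one entry per dbt project.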
Onboarding dbt source in KADA
Create a DBT source by going to Platform Settings > Integration / Sources
Click Add Source and choose Dbt
Select Load DBT File
Fill in the following details about your DBT instance
Name: Give your DBT source a name
Host: Add the host name of your DBT server
DBT Server: Same as your host name of your DBT Server
Ignore DBT Cloud Account ID
Click Finish Setup
Obtain credentials for the KADA Platform landing directory by contacting KADA Support for keys
Manually create the mapping.json file and upload to the landing directory.
Configure your orchestration tool (e.g. Airflow) that executes dbt to:
Rename the files as per the naming conventions under Files to be loaded
Send the files to the landing directory.
The landing directory can be found by going to Platform Settings > Sources > Select Edit on the DBT source you created
You will see the landing directory at the bottom
Go back to Sources and Select Edit Schedule
Select a load frequency that aligns with your pipeline
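The rename-and-send step that the orchestration tool performs could look roughly like the sketch below. It assumes the landing directory is reachable as a local or mounted path, which this page does not state; if it is cloud storage, replace the copy with the matching upload call. The function name stage_artifacts is hypothetical, and the demo uses temporary directories in place of a real dbt target folder.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def stage_artifacts(target_dir: Path, landing_dir: Path,
                    project_name: str, run_time: datetime) -> list[Path]:
    """Copy dbt artifacts into the landing directory under their K file names.

    Hypothetical helper; assumes landing_dir is a local or mounted path.
    """
    stamp = run_time.strftime("%Y%m%d%H%M%S")
    renames = {
        "manifest.json": f"{project_name}_manifest.json",
        "catalog.json": f"{project_name}_catalog.json",
        "run_results.json": f"{project_name}_run_results_{stamp}.json",
    }
    landing_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source_name, landing_name in renames.items():
        source = target_dir / source_name
        if not source.is_file():
            # Skip artifacts this run did not produce,
            # e.g. catalog.json before `dbt docs generate` has run.
            continue
        copied.append(Path(shutil.copy2(source, landing_dir / landing_name)))
    return copied


# Demo with temporary directories standing in for dbt's target folder and the landing directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "target"
    target.mkdir()
    (target / "manifest.json").write_text("{}")
    (target / "run_results.json").write_text("{}")
    copied = stage_artifacts(target, Path(tmp) / "landing", "raw",
                             datetime(2024, 1, 31, 9, 5, 0))
    staged_names = sorted(path.name for path in copied)
    print(staged_names)
```

In an Airflow pipeline, a function like this could run as a task immediately after the dbt tasks, so the files land before the scheduled load.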