About Collectors
...
Pre-requisites
Python 3.8 - 3.11
Access to K landing directory
Access to DBT Cloud
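As a quick way to confirm the Python prerequisite before installing the collector, the check below is a minimal sketch. It assumes the supported range is 3.8 to 3.11 inclusive; the helper name `python_version_supported` is hypothetical and not part of the collector library.

```python
import sys

# Assumed supported range (inclusive), taken from the prerequisite list above.
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 11)


def python_version_supported(version=None):
    """Return True if the (major, minor) version falls inside the assumed range.

    `version` is any sequence starting with (major, minor); it defaults to
    the running interpreter's sys.version_info.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    if python_version_supported():
        print("Python version is within the assumed supported range.")
    else:
        print("Python %d.%d is outside the assumed supported range." % sys.version_info[:2])
```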
...
You can download the Latest Core Library and Athena whl via Platform Settings → Sources → Download Collectors
...
You will also need to install the common library kada_collectors_lib-1.0.0 for this collector to function properly.
...
The collector requires a set of parameters to connect to and extract metadata from DBT Cloud.
FIELD | FIELD TYPE | DESCRIPTION | EXAMPLE
---|---|---|---
account_id | string | DBT Cloud account ID | "xxxxx.australia-east.azure"
environment_ids | list<string> | List of environment IDs to extract | 12345,234234
token | string | Token generated from the DBT console |
output_path | string | Absolute path to the output location where files are to be written | "/tmp/output"
timeout | integer | Number of seconds to wait for the API to respond. The default is 20; increase it for slower connections. | 20
mapping | JSON | Mapping between DBT project IDs and their corresponding database host values in K. | The keys are DBT project IDs, whereas each value is the corresponding onboarded host in K
dry_run | boolean | If enabled, the extractor produces only the mapping.json file, which helps you map each project to a corresponding database host. | false
compress | boolean | Whether to gzip the output | true
These parameters can be passed directly into the run, or you can supply them via a JSON file. The following is an example that is used by the example run code below.
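For illustration, a config file with these fields could be generated as sketched here. The field names come from the parameter table, and the file name matches the default the run script looks for (`kada_dbt_extractor_config.json`). Every value is a placeholder, the shape of `mapping` (project ID to host) is inferred from the table description, and the helpers `build_dbt_config` and `write_config` are hypothetical names, not part of the collector library.

```python
import json
import os
import tempfile


def build_dbt_config(output_path):
    """Return a config dict keyed by the field names in the parameter table.

    All values are placeholders and must be replaced with your own.
    """
    return {
        "account_id": "xxxxx.australia-east.azure",
        "environment_ids": ["12345", "234234"],
        "token": "",  # paste the token generated from the DBT console
        "output_path": output_path,
        "timeout": 20,
        # Assumed shape: DBT project ID -> onboarded database host in K
        "mapping": {"<dbt_project_id>": "<k_database_host>"},
        "dry_run": False,
        "compress": True,
    }


def write_config(config, path):
    """Write the config dict to `path` as JSON and return the path."""
    with open(path, "w") as handle:
        json.dump(config, handle, indent=4)
    return path


if __name__ == "__main__":
    target = os.path.join(tempfile.gettempdir(), "kada_dbt_extractor_config.json")
    print(write_config(build_dbt_config("/tmp/output"), target))
```

Setting `dry_run` to true first may be a convenient way to obtain the mapping.json file before filling in `mapping`.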
...
```python
import os
import argparse
from kada_collectors.extractors.utils import load_config, get_hwm, publish_hwm, get_generic_logger
from kada_collectors.extractors.dbt import Extractor

get_generic_logger('root')  # Set to use the root logger, you can change the context accordingly or define your own logger

_type = 'dbt'
dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'kada_{}_extractor_config.json'.format(_type))

parser = argparse.ArgumentParser(description='KADA DBT Extractor.')
parser.add_argument('--config', '-c', dest='config', default=filename, help='Location of the configuration json, default is the config json in the same directory as the script.')
parser.add_argument('--name', '-n', dest='name', default=_type, help='Name of the collector instance.')
args = parser.parse_args()

# Fetch the high water mark (HWM) window for this collector instance
start_hwm, end_hwm = get_hwm(args.name)

# Build the extractor from the JSON config, verify connectivity, then extract
ext = Extractor(**load_config(args.config))
ext.test_connection()
ext.run(**{"start_hwm": start_hwm, "end_hwm": end_hwm})

# Record the new high water mark once the run has completed
publish_hwm(_type, end_hwm)
```
...
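Before handing a loaded config to the extractor, it can help to check it against the parameter table. The sketch below is one way to do that. It assumes every field in the table is required and maps the table's types onto Python types (string to `str`, list<string> to `list`, JSON to `dict`); `validate_config` is a hypothetical helper, not part of the collector library.

```python
# Expected Python type per field, derived from the parameter table.
# Treating every field as required is an assumption of this sketch.
EXPECTED_TYPES = {
    "account_id": str,
    "environment_ids": list,
    "token": str,
    "output_path": str,
    "timeout": int,
    "mapping": dict,
    "dry_run": bool,
    "compress": bool,
}


def validate_config(config):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in config:
            problems.append("missing field: {}".format(field))
            continue
        value = config[field]
        # bool is a subclass of int, so reject True/False where an integer is expected
        if expected is int and isinstance(value, bool):
            problems.append("{} should be an integer, got a boolean".format(field))
        elif not isinstance(value, expected):
            problems.append(
                "{} should be {}, got {}".format(field, expected.__name__, type(value).__name__)
            )
    return problems
```

For example, calling `validate_config(load_config(args.config))` in the run script and stopping when the returned list is non-empty would surface a mistyped field before any API call is made.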