About Collectors
Collectors are extractors that are developed and managed by you (a customer of K).
...
Deploying and orchestrating the extract code
Managing a high water mark so the extract only pulls the latest metadata
Storing and pushing the extracts to your K instance.
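To make the high water mark responsibility above more concrete, here is a minimal sketch of one way to track it in a local JSON file. The function names `read_window` and `save_hwm`, the file layout, and the timestamp format are all hypothetical; this is not how the `get_hwm` and `publish_hwm` helpers in `kada_collectors` are implemented.

```python
import json
import os
from datetime import datetime, timezone

# Hypothetical timestamp format, chosen only for this illustration.
HWM_FORMAT = "%Y-%m-%d %H:%M:%S"


def read_window(hwm_file, now=None):
    """Return (start, end) for the next extract.

    start is the last stored high water mark, or None on the first run.
    end is the current UTC time formatted with HWM_FORMAT.
    """
    now = now or datetime.now(timezone.utc)
    start = None
    if os.path.exists(hwm_file):
        with open(hwm_file) as fh:
            start = json.load(fh)["hwm"]
    return start, now.strftime(HWM_FORMAT)


def save_hwm(hwm_file, end):
    """Persist the new high water mark after a successful extract."""
    tmp_file = hwm_file + ".tmp"
    with open(tmp_file, "w") as fh:
        json.dump({"hwm": end}, fh)
    # Replace the old file in one step so a crash cannot leave a half-written mark.
    os.replace(tmp_file, hwm_file)
```

The intent is to call `save_hwm` only after the extract has completed and its output has been stored, so a failed run is retried from the same starting point.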
...
Pre-requisites
Python 3.6 - 3.9
Access to K landing directory
Access to Postgres (see section below)
Postgres Access
Collector Server Minimum Requirements
Postgres Requirements
Access to Postgres
The user account used by the extractor needs access to the pg_catalog tables listed below.
PG Catalog
Generally, all users have access to the pg_catalog tables when a database is created. If the user does not have access, explicit grants must be applied for each new database in Postgres.
...
pg_class
pg_namespace
pg_proc
pg_database
pg_language
pg_type
pg_collation
pg_depend
pg_sequence
pg_constraint
pg_authid
pg_auth_members
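If explicit grants are needed, the statements can be generated from the table list above. The following is a minimal sketch: the helper name `build_grants` and the user name `kada_user` are hypothetical, and it only builds the SQL text. A suitably privileged Postgres role would still need to run the statements in each database.

```python
# pg_catalog tables taken from the list above.
PG_CATALOG_TABLES = [
    "pg_class", "pg_namespace", "pg_proc", "pg_database",
    "pg_language", "pg_type", "pg_collation", "pg_depend",
    "pg_sequence", "pg_constraint", "pg_authid", "pg_auth_members",
]


def build_grants(user, tables=PG_CATALOG_TABLES):
    """Return one GRANT SELECT statement per pg_catalog table."""
    # Quote the role name as an SQL identifier, doubling any embedded quotes.
    quoted_user = '"' + user.replace('"', '""') + '"'
    return [
        "GRANT SELECT ON pg_catalog.{} TO {};".format(table, quoted_user)
        for table in tables
    ]


if __name__ == "__main__":
    # "kada_user" is a placeholder; substitute the extractor's user.
    for statement in build_grants("kada_user"):
        print(statement)
```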
Databases
All other databases that you want onboarded
...
Go to Settings, select Sources and click Add Source
Select the “Load from File” option
Give the source a Name - e.g. Postgres Production
Add the Host name for the Postgres Server
Click Finish Setup
...
You can download the latest Core Library and whl via Platform Settings → Sources → Download Collectors
...
Download Collectors
...
You can request the whl from the Kada support team (support@kada.ai).
Info: From 5.33 (late October 2023) you can download the whl directly from the Platform.
Run the following command to install the collector.
...
The collector requires a set of parameters to connect to and extract metadata from Postgres.
FIELD | FIELD TYPE | DESCRIPTION | EXAMPLE |
---|---|---|---|
host | string | Postgres host as onboarded in the K platform. This is generally the same value as server; if you onboarded it with a different value, use that value | “example.postgres.localhost” |
server | string | Postgres host to establish a connection | “example.postgres.localhost” |
username | string | Username to log into Postgres | “postgres_user” |
password | string | Password to log into Postgres | |
databases | list<string> | A list of databases to extract from Postgres | [“dwh”, “adw”] |
port | integer | Postgres port; the default is 5432 | 5432 |
output_path | string | Absolute path to the output location where files are to be written | “/tmp/output” |
mask | boolean | Whether to enable masking | true |
compress | boolean | Whether to gzip the output | true |
meta_only | boolean | Whether to extract metadata only. Note: as of the current version, only metadata can be extracted regardless of this value | true |
These parameters can be passed directly into the run, or you can pass them in via a JSON file. The following example is the one used by the example run code below.
...
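As a rough illustration of how the parameters in the table map onto a JSON file, the sketch below builds a config from the table's example values, checks the field types, and writes it out. The helper names `validate_config` and `write_config` are hypothetical and not part of `kada_collectors`. The file name follows the `kada_{}_extractor_config.json` pattern used by the run script, and the password is a placeholder.

```python
import json
import os

# Example values copied from the parameter table; the password is a placeholder.
EXAMPLE_CONFIG = {
    "host": "example.postgres.localhost",
    "server": "example.postgres.localhost",
    "username": "postgres_user",
    "password": "CHANGE_ME",
    "databases": ["dwh", "adw"],
    "port": 5432,
    "output_path": "/tmp/output",
    "mask": True,
    "compress": True,
    "meta_only": True,
}

# Field types as documented in the parameter table.
EXPECTED_TYPES = {
    "host": str, "server": str, "username": str, "password": str,
    "databases": list, "port": int, "output_path": str,
    "mask": bool, "compress": bool, "meta_only": bool,
}


def validate_config(cfg):
    """Return a list of problems; an empty list means the types look right."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in cfg:
            problems.append("missing field: {}".format(field))
            continue
        value = cfg[field]
        # bool is a subclass of int in Python, so reject True/False for port.
        bool_as_other = isinstance(value, bool) and expected is not bool
        if not isinstance(value, expected) or bool_as_other:
            problems.append("{} should be {}".format(field, expected.__name__))
    return problems


def write_config(cfg, directory):
    """Write cfg as JSON into directory and return the file path."""
    path = os.path.join(directory, "kada_postgres_extractor_config.json")
    with open(path, "w") as fh:
        json.dump(cfg, fh, indent=2)
    return path
```

Calling `write_config(EXAMPLE_CONFIG, os.path.dirname(__file__))` from a script would place the file next to that script, which is where the run code looks by default.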
```python
import os
import argparse

from kada_collectors.extractors.utils import load_config, get_hwm, publish_hwm, get_generic_logger
from kada_collectors.extractors.postgres import Extractor

# Set to use the root logger; you can change the context accordingly or define your own logger
get_generic_logger('root')

_type = 'postgres'
dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'kada_{}_extractor_config.json'.format(_type))

parser = argparse.ArgumentParser(description='KADA Postgres Extractor.')
parser.add_argument('--config', '-c', dest='config', default=filename,
                    help='Location of the configuration json, default is the config json in the same directory as the script.')
parser.add_argument('--name', '-n', dest='name', default=_type,
                    help='Name of the collector instance.')
args = parser.parse_args()

# Look up the high water mark window for this collector instance
start_hwm, end_hwm = get_hwm(args.name)

# Build the extractor from the JSON config, check connectivity, then extract
ext = Extractor(**load_config(args.config))
ext.test_connection()
ext.run(**{"start_hwm": start_hwm, "end_hwm": end_hwm})

# Record the new high water mark only after the run has finished
publish_hwm(_type, end_hwm)
```
...