Scroll ignore | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||
About Collectors
Insert excerpt | ||||||
---|---|---|---|---|---|---|
|
...
You can download the latest Core Library and Snowflake whl via Platform Settings → Sources → Download Collectors
...
The collector requires a set of parameters to connect to and extract metadata from Snowflake
FIELD | FIELD TYPE | DESCRIPTION | EXAMPLE | ||
---|---|---|---|---|---|
account | string | Snowflake account | “abc123.australia-east.azure” | ||
username | string | Username to log into the snowflake account, if use_private_key is true, this must be the user associated to the private key | |||
password | string | Password to log into the snowflake account, if use_private_key is true then this is the password/passphrase to that private key, if your private key for some reason is NOT encrypted, then you can leave this blank. | |||
information_database | string | Database where all the required tables are located, generally this is snowflake | “snowflake” | ||
role | string | The role to access the required account_usage tables, generally this is accountadmin | “accountadmin” | ||
warehouse | string | The warehouse to execute the queries against | “xs_analytics” | ||
databases | list<string> | A list of databases to extract from Snowflake | [“dwh”, “adw”] | ||
login_timeout | integer | The max amount of time in seconds allowed for the extractor to establish and authenticate a connection, generally 5 is sufficient but if you have a slow network you can increase this up to 20 | 5 | ||
output_path | string | Absolute path to the output location where files are to be written | “/tmp/output” | ||
mask | boolean | To enable masking or not | true | ||
compress | boolean | To gzip the output or not | true | ||
use_private_key | boolean | To use private key or not | true | ||
private_key | string | The private key value as text. The key requires formatting
|
| ||
host | string | The host value for snowflake that was onboarded in K | “abc123.australia-east.azure.snowflakecomputing.com” |
These parameters can be added directly into the run or you can use pass the parameters in via a JSON file. The following is an example you can use that is included in the example run code below.
...
Code Block | ||
---|---|---|
| ||
import os import argparse from kada_collectors.extractors.utils import load_config, get_hwm, publish_hwm, get_generic_logger from kada_collectors.extractors.snowflake import Extractor get_generic_logger('root') # Set to use the root logger, you can change the context accordingly or define your own logger _type = 'snowflake' dirname = os.path.dirname(__file__) filename = os.path.join(dirname, 'kada_{}_extractor_config.json'.format(_type)) parser = argparse.ArgumentParser(description='KADA Snowflake Extractor.') parser.add_argument('--config', '-c', dest='config', default=filename, help='Location of the configuration json, default is the config json in the same directory as the script.') args = parser.parse_args() start_hwm, end_hwm = get_hwm(_type) ext = Extractor(**load_config(args.config)) ext.test_connection() ext.run(**{"start_hwm": start_hwm, "end_hwm": end_hwm}) publish_hwm(_type, end_hwm) |
Info |
---|
If your organisation has a proxy operating on where this script runs and you are using an private link for snowflake you may encounter an issue resulting in a 403 error when fetching result batches. In such a scenario this is due to the private link not requiring a proxy but the s3 data fetch which snowflake uses requires a proxy, you will need to set the following. export HTTP_PROXY=”http://username:password@proxyserver.company.com:80” Then explicitly call out snowflake itself to not use a proxy export NO_PROXY=”.snowflakecomputing.com” On a windows setup you would use “set” instead of “export” in the command line. |
Advance options:
If you wish to maintain your own high water mark files elsewhere you can use the above section’s script as a guide on how to call the extractor. The configuration file is simply the keyword arguments in JSON format. Refer to this document for more information https://kadaai.atlassian.net/wiki/spaces/KSL/pages/1902411777/Additional+Notes#Storing-HWM-in-another-location
...