Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Scroll ignore
scroll-viewporttrue
scroll-pdftrue
scroll-officetrue
scroll-chmtrue
scroll-docbooktrue
scroll-eclipsehelptrue
scroll-htmltrue
scroll-epubtrue

Open in new tab

About Collectors

Insert excerpt
Collector Method
Collector Method
nameabout

...

  • account_usage.history

  • account_usage.views

  • account_usage.tables

  • account_usage.columns

  • account_usage.copy_history

  • account_usage.grants_to_roles

  • account_usage.grants_to_users

  • account_usage.schemata

  • account_usage.databases

You can use the following code:

...

It is recommended to use a python environment such as pyenv or pipenv if you are not intending to install this package at the system level.Some python packages also have dependencies on the OS level packages, so you may be required to install additional OS packages if the below fails to install.

You can download the Latest latest Core Library and Athena Snowflake whl via Platform Settings → SourcesDownload Collectors

...

Run the following command to install the collector.

Code Block
pip install kada_collectors_extractors_<version>-none-any.whl

...

Code Block
pip install kada_collectors_lib-<version>-none-any.whl

Some python packages also have dependencies on the OS level packages, so you may be required to install additional OS packages if the below fails to install. These are some known possible packages you may require depending on your OS, this is not exhaustive and only serves as a guide.

...

The collector requires a set of parameters to connect to and extract metadata from Snowflake

FIELD

FIELD TYPE

DESCRIPTION

EXAMPLE

account

string

Snowflake account

“abc123.australia-east.azure”

username

string

Username to log into the snowflake account, if use_private_key is true, this must be the user associated to the private key

password

string

Password to log into the snowflake account, if use_private_key is true then this is the password/passphrase to that private key, if your private key for some reason is NOT encrypted, then you can leave this blank.

information_database

string

Database where all the required tables are located, generally this is snowflake

“snowflake”

role

string

The role to access the required account_usage tables, generally this is accountadmin

“accountadmin”

warehouse

string

The warehouse to execute the queries against

“xs_analytics”

databases

list<string>

A list of databases to extract from Snowflake

[“dwh”, “adw”]

login_timeout

integer

The max amount of time in seconds allowed for the extractor to establish and authenticate a connection, generally 5 is sufficient but if you have a slow network you can increase this up to 20

5

output_path

string

Absolute path to the output location where files are to be written

“/tmp/output”

mask

boolean

To enable masking or not

true

compress

boolean

To gzip the output or not

true

use_private_key

boolean

To use private key or not

true

private_key

string

The private key value as text. The key requires formatting

  1. Convert your key file to UNIX end of line character.

  2. Convert unix end of line characters to literal \n

  3. Include the header and footer BEGIN and END

Code Block
-----BEGIN ENCRYPTED PRIVATE KEY-----\nblah\nblah\nblah\n-----END ENCRYPTED PRIVATE KEY-----

host

string

The host value for snowflake that was onboarded in K

“abc123.australia-east.azure.snowflakecomputing.com”

These parameters can be added directly into the run or you can use pass the parameters in via a JSON file. The following is an example you can use that is included in the example run code below.

...

Code Block
languagepy
import os
import argparse
from kada_collectors.extractors.utils import load_config, get_hwm, publish_hwm, get_generic_logger
from kada_collectors.extractors.snowflake import Extractor

get_generic_logger('root') # Set to use the root logger, you can change the context accordingly or define your own logger

_type = 'snowflake'
dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'kada_{}_extractor_config.json'.format(_type))

parser = argparse.ArgumentParser(description='KADA Snowflake Extractor.')
parser.add_argument('--config', '-c', dest='config', default=filename, help='Location of the configuration json, default is the config json in the same directory as the script.')
args = parser.parse_args()

start_hwm, end_hwm = get_hwm(_type)

ext = Extractor(**load_config(args.config))
ext.test_connection()
ext.run(**{"start_hwm": start_hwm, "end_hwm": end_hwm})

publish_hwm(_type, end_hwm)
Info

If your organisation has a proxy operating on where this script runs and you are using an private link for snowflake you may encounter an issue resulting in a 403 error when fetching result batches.

In such a scenario this is due to the private link not requiring a proxy but the s3 data fetch which snowflake uses requires a proxy, you will need to set the following.

export HTTP_PROXY=”http://username:password@proxyserver.company.com:80”
export HTTPS_PROXY=”http://username:password@proxyserver.company.com:80”

Then explicitly call out snowflake itself to not use a proxy

export NO_PROXY=”.snowflakecomputing.com”

On a windows setup you would use “set” instead of “export” in the command line.

Advance options:

If you wish to maintain your own high water mark files elsewhere you can use the above section’s script as a guide on how to call the extractor. The configuration file is simply the keyword arguments in JSON format. Refer to this document for more information https://kadaai.atlassian.net/wiki/spaces/KSL/pages/1902411777/Additional+Notes#Storing-HWM-in-another-location

...