Configuring the KADA Great Expectations Plugin
Introducing the KADA Great Expectations (GX) Plugin
The KADA GX plugin is used by Great Expectations to push validation results to your K instance. The plugin is currently provided on request - please reach out to support@kada.ai. It will be made available for download directly from K in the near future.
The plugin handles uploading the validation results to the correct landing directory in K, formatting the file name, and adding some additional metadata to the validation result so you get the best experience in K.
1. Installing the KADA GX Plugin
Install the python wheel into your GX environment
pip install kada-ge-store-plugin
The KADA plugin has been tested with GX versions 0.15.41 - 0.17.2 and Python 3.8 - 3.11.
Once installed, you will need to complete the following:
Add the storage action to your checkpoint yamls
Add batch_metadata to configured datasource and predefined assets inside your great_expectations.yaml
Add kada_targets to query based batches inside checkpoint yamls
LIMITATIONS
If you are running expectations over query results in GX, we can only associate your validation results to tables at the lower grain; we will not be able to associate the validation to specific columns within that query.
2. Add the Plugin to Checkpoint Action List
Add the plugin to your checkpoint.yaml files as part of the action_list
AZURE_STORAGE_ACCOUNT is the Storage account name.
AZURE_SAS_TOKEN is the Storage level SAS Token; it should have permissions to the Blob Service with all Resource Types allowed and Read/Write/List/Add/Create permissions.
AZURE_CONTAINER is the container within the Storage account in which files will be stored under the prefix path.
action_list:
  - name: store_kada_validation_result
    action:
      class_name: KadaStoreValidationResultsAction
      module_name: kada_ge_store_plugin.kada_store_validation
      azure_container: ${AZURE_CONTAINER}
      prefix: lz/ge_landing/landing
      azure_storage_account: ${AZURE_STORAGE_ACCOUNT}
      azure_sas_token: ${AZURE_SAS_TOKEN}
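The ${AZURE_CONTAINER}, ${AZURE_STORAGE_ACCOUNT} and ${AZURE_SAS_TOKEN} references are resolved through Great Expectations' standard config variable substitution, so they can come from environment variables or from uncommitted/config_variables.yml. A minimal sketch of the latter, with purely illustrative placeholder values:

# uncommitted/config_variables.yml
AZURE_STORAGE_ACCOUNT: mystorageaccount
AZURE_SAS_TOKEN: "?sv=...&sig=..."
AZURE_CONTAINER: kada-landing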
If you simply want to test the action locally and target a local file directory first, you can provide a test_directory to the action.
For example, the below configuration will push formatted validation results to /tmp/ge_validations/lz/ge_landing/landing on your local file system.
action_list:
  - name: store_kada_validation_result
    action:
      class_name: KadaStoreValidationResultsAction
      module_name: kada_ge_store_plugin.kada_store_validation
      container: whatever
      prefix: lz/ge_landing/landing
      test_directory: /tmp/ge_validations
Remove the test_directory parameter once you are ready to push to the K Landing Area.
If you already have another action that stores the results and you add this action, GX will simply push the validations to both locations, so it won't impact any existing process you may have that requires the validation results.
3. Coding Standards
To get the best experience when viewing your Data Quality objects in K, you should add the following to your existing setup or keep these conventions in mind when coding for GX.
As a general rule, assets defined upfront within the great_expectations.yaml should include batch_metadata, while assets not defined upfront and query assets should include kada_targets under evaluation_parameters in the checkpoint.yaml files.
3.1. Great Expectations Configuration
You will need to add batch_metadata / batch_spec_passthrough with the following values to the different connection types listed below:

kada_database_name - holds the name of the targeted database
kada_host_name - holds the service name or host of the targeted database
3.1.1. ConfiguredDatasourceConnectors
For datasources where the assets associated to the datasource are defined upfront in the great_expectations.yaml, add the batch_metadata section to each defined asset. Note: for non-fluent style (v0.15.x or older) datasources, please use batch_spec_passthrough instead of batch_metadata.
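For example, a fluent-style SQL datasource asset in great_expectations.yaml could carry the metadata as sketched below (the datasource and asset names are illustrative, and the exact serialization of fluent assets varies slightly between GX versions):

fluent_datasources:
  my_snowflake_datasource:
    type: snowflake
    connection_string: ${SNOWFLAKE_CONNECTION_STRING}
    assets:
      my_table_asset:
        type: table
        table_name: MY_TABLE
        schema_name: MY_SCHEMA
        batch_metadata:
          kada_database_name: MY_DB
          kada_host_name: MY_HOST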
Where MY_DB and MY_HOST can be either hard coded or environment driven.
For query type assets you have the option to do the same, but this is not required, as you will be adding a value called kada_targets in your checkpoint file, which is explained in 3.2. Checkpoints.
If using GX v0.15.x or older (non-fluent style), use batch_spec_passthrough instead.
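As a sketch of the older block-config style (names here are illustrative), the same metadata can be supplied via batch_spec_passthrough on each configured asset:

datasources:
  my_sql_datasource:
    class_name: Datasource
    execution_engine:
      class_name: SqlAlchemyExecutionEngine
      connection_string: ${MY_CONNECTION_STRING}
    data_connectors:
      my_configured_connector:
        class_name: ConfiguredAssetSqlDataConnector
        assets:
          my_table_asset:
            table_name: MY_TABLE
            schema_name: MY_SCHEMA
            batch_spec_passthrough:
              kada_database_name: MY_DB
              kada_host_name: MY_HOST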
3.1.2. InferredDatasourceConnectors
No additions are required here; these additions are made at the checkpoint yaml level instead (applicable for non file based datasources).
3.1.3. RuntimeDatasourceConnectors
No additions are required here; these additions are made at the checkpoint yaml level instead.
3.2. Checkpoints
For query based assets or run time assets that are query based, add evaluation_parameters (if it does not already exist) to each applicable batch request. Under this element add kada_targets.
See 3.1. Great Expectations Configuration for the definition of kada_targets.
This defines the intended target table for the query asset or run time query asset. It should be a fully qualified, period delimited name; the period delimitation is important as it tells K which part of the naming is the Database/Schema/Table etc., so if your names contain a period, please replace it with an underscore (_).
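For illustration, a checkpoint validation entry for a runtime query asset might look like the sketch below (the datasource, connector, asset and suite names and the target value are illustrative, and the exact placement of evaluation_parameters - per validation or at the checkpoint level - may vary with your setup):

validations:
  - batch_request:
      datasource_name: my_sql_datasource
      data_connector_name: default_runtime_data_connector_name
      data_asset_name: runtime_defined_test_node
      runtime_parameters:
        query: SELECT * FROM MY_SCHEMA.MY_TABLE
      batch_identifiers:
        default_identifier_name: kada_example_run
    expectation_suite_name: my_suite
    evaluation_parameters:
      kada_targets: MY_HOST.MY_DB.MY_SCHEMA.MY_TABLE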
The same approach applies to run time query assets such as runtime_defined_test_node (as in the sketch above), to inferred asset types, and similarly to configured query assets such as the predefined query asset query_asset_node_ref.
If you define your batch requests in python, simply add evaluation_parameters to the kwargs for the BatchRequest / RuntimeBatchRequest object.
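As a sketch of one way to do this from Python (here the evaluation parameter is passed when the checkpoint is run rather than on the batch request itself - adjust to however you construct and run your batch requests; all names and the target value are illustrative):

# Sketch only: supplying kada_targets as an evaluation parameter from Python.
import great_expectations as gx
from great_expectations.core.batch import RuntimeBatchRequest

context = gx.get_context()

# A runtime (query based) batch request; names are illustrative.
batch_request = RuntimeBatchRequest(
    datasource_name="my_sql_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="runtime_defined_test_node",
    runtime_parameters={"query": "SELECT * FROM MY_SCHEMA.MY_TABLE"},
    batch_identifiers={"default_identifier_name": "kada_example_run"},
)

results = context.run_checkpoint(
    checkpoint_name="my_checkpoint",
    validations=[
        {
            "batch_request": batch_request,
            "expectation_suite_name": "my_suite",
        }
    ],
    # kada_targets tells K which table the query results relate to.
    evaluation_parameters={"kada_targets": "MY_HOST.MY_DB.MY_SCHEMA.MY_TABLE"},
)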
Future Improvements
In the future, K will allow certain properties to be assigned to the expectation meta within each expectation suite, allowing further enrichment of the expectation Data Quality object in the K platform.