
Setup Infrastructure

Kubernetes deployment

  1. Platform:
    Set up a Kubernetes cluster on any cloud provider (AWS, Azure, Google Cloud) or an on-premise solution (e.g. OpenShift).

  2. Specification:
    The Kubernetes cluster requires a minimum of 4 nodes, with each node having 4 CPUs and 16GB memory.

  3. Storage:

    1. Set up an object store such as AWS S3, Azure Blob Storage, etc.

      1. Setup: s3 bucket setup

    2. ~200GB storage to be mounted into the Kubernetes cluster. In cloud setups the Kubernetes services will automatically provision this.

    3. Where the organisation defines its own PV definitions (e.g. OpenShift), set the Reclaim Policy to Retain. This is important to ensure no data is lost during a prolonged outage at the Kubernetes layer (see the example after this list).

  4. Networking:

    1. Firewall rules may be required to enable access to HTTPS (443)

    2. You may choose to use your own Kubernetes ingress services or use the one provided by KADA's configuration scripts.
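
For reference, on platforms where you manage your own PV definitions, the Reclaim Policy from the storage step above can be checked and set with kubectl. This is a minimal sketch only; the volume name kada-pv is a placeholder for your actual PersistentVolume.

Code Block
# Check the reclaim policy of the provisioned volumes
kubectl get pv

# Patch an existing PersistentVolume to Retain (kada-pv is a placeholder name)
kubectl patch pv kada-pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'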

...

  1. This setup requires a single machine with a minimum spec of 16 CPUs, 64GB memory and 200GB storage.

  2. Install docker: https://docs.docker.com/engine/install/

  3. Install docker compose: https://docs.docker.com/compose/install/
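
Once installed, it is worth confirming both tools are available before proceeding. A minimal check (exact output varies by version; older installations expose Compose as docker-compose):

Code Block
# Confirm the Docker engine is installed and the daemon is reachable
docker --version
docker info

# Confirm Docker Compose is installed (older setups use: docker-compose --version)
docker compose version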

Configuring Access to KADA Image Repository

The KADA installation requires access to the KADA image repository.

...

Code Block
docker login -u _json_key --password-stdin https://asia.gcr.io < /tmp/kada-client-[code]-key.json

# Run the following to test connectivity
docker pull busybox:1.28
docker pull asia.gcr.io/kada-external/postgres:1.7.0-pg11

KADA Platform Installation

  1. KADA is packaged as a set of configuration files.

  2. Download the latest package.

...

  1. Edit kada_docker_compose.env and set the following values:

    Code Block
    KADA_ROOT_DATA_DIR=
    KADA_DOCKER_NETWORK=
    KADA_URL=
  2. In conf/kada-realm.json, replace DOMAIN_URL with the base URL of your installation, e.g. https://example.com

  3. Generate CA certificates based on the domain name of the host. In conf/, rename and replace cortex.crt and cortex.key with your generated certificates (an example sketch follows these steps).

  4. Deploy the environment

    Code Block
    ./kada_docker_compose.sh setup
    
    #Create storage_root_folder folder (default name is kada-data)
    cd ${KADA_ROOT_DATA_DIR}/cerebrum/data
    mkdir kada-data 
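
If your organisation does not issue certificates, a self-signed certificate can be generated for testing step 3 above. This is a minimal sketch only; kada.example.com is a placeholder for your domain, and production deployments should use certificates issued by your CA.

Code Block
# Generate a self-signed certificate for testing (replace kada.example.com with your domain)
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout cortex.key -out cortex.crt \
  -subj "/CN=kada.example.com" \
  -addext "subjectAltName=DNS:kada.example.com"

# Place the generated files in the package conf/ directory
cp cortex.key cortex.crt conf/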

KADA Platform Configuration

  1. Platform Settings
    On the bottom left of the screen, click the GEAR icon and select Platform Settings.

    Then set the following properties depending on your deployment:

    Code Block
    storage_type = [LOCAL, S3, AZURE]
    
    # For Azure blob storage
    root_folder = <azure storage container name>
    storage_azure_access_key
    storage_azure_storage_account
    
    # For aws S3 configuration
    root_folder = <s3 bucket name>
    storage_aws_region 
    storage_aws_access_key
    storage_aws_secret_access_key
    
    # Set this for LOCAL storage
    storage_local_root_directory
    
    # Set this if connecting to sources directly
    enable_connection_test = true
  2. Integrating sources to KADA
    KADA needs to be configured for each source that you want to integrate. Setup can be configured via the KADA front end. See How to: Onboard a new source.

  3. KADA Platform Initial load

    1. Setup the following Platform Setting values for initial load

      Code Block
      celery_batch_task_soft_time_limit = 0
      celery_batch_task_time_limit = 0
      metric_window = 30
    2. KADA provides a built-in Batch Manager for triggering the loading of sources.

    3. See https://kadaai.atlassian.net/wiki/spaces/KS/pages/675708946/How+to%3A+Onboard+a+new+source#4.-Manually-Triggering-Source-loads

    4. Once the sources have been loaded, manually trigger the following platform jobs. See https://kadaai.atlassian.net/wiki/spaces/KS/pages/1740931226/KADA+Batch+Manager#Manually-triggering-a-Platform-job
      1. GATHER_METRICS_AND_STATS
      2. POST_PROCESS_QUERIES
      3. DAILY

  4. Schedule sources to load.
    KADA provides a scheduler to periodically load the sources you have configured.
    Set the following Platform Setting value to enable the scheduler to run.

    Code Block
    enable_platform_batch = true

    Each Source can now be scheduled to run. See https://kadaai.atlassian.net/wiki/spaces/KS/pages/675708946/How+to%3A+Onboard+a+new+source#3.-Scheduling-a-Source

Upgrading KADA

KADA generally releases new updates each month. See our Release versions page for the latest available version.

To check your version, see How to: Check the version of K platform.

...

Code Block
# Download new package
# Transfer package over to kada server under /kada

# unzip package
cd /kada
tar -xvf kada_5.5.0_docker_compose.tar.gz
cd /kada/d_pkg_5.5.0

# From the current version directory copy over the following config
cp /kada/d_pkg_5.4.0/conf/cortex.key /kada/d_pkg_5.5.0/conf/cortex.key
cp /kada/d_pkg_5.4.0/conf/cortex.crt /kada/d_pkg_5.5.0/conf/cortex.crt
cp /kada/d_pkg_5.4.0/conf/kada-realm.json /kada/d_pkg_5.5.0/conf/kada-realm.json

# Update these variables in the newer version's kada_docker_compose.env
KADA_ROOT_DATA_DIR=
KADA_DOCKER_NETWORK=
KADA_URL=

# Then restart the environment
./kada_docker_compose_setup.sh update yes-i-can-proceed

KADA Integrations

1. Updating Kubernetes Configs for Source Onboarding

Some sources in KADA require additional configuration to establish connectivity. This section details the additional configuration steps per integration source.

...

Code Block
# Upload edited config
kubectl apply -f path/to/config/file.yaml

# Get service details
kubectl get pods

# Restart cerebrum services. 
# Replace the XXXXXXXXXX-XXXXX with the ID in your environment
kubectl delete pod cerebrum-deployment-XXXXXXXXXX-XXXXX 
kubectl delete pod cerebrum-celery-batch-worker-deployment-XXXXXXXXXX-XXXXX 
kubectl delete pod cerebrum-celery-worker-deployment-XXXXXXXXXX-XXXXX 

1.1. Teradata

Teradata uses ODBC, so an update is required to the cerebrum-odbc-ini.yaml file.

...

  1. Read access to the DBC and PDCRINFO schemas.

  2. Specifically these tables:

    1. PDCRINFO.DBQLogTbl, otherwise DBC.DBQLogTbl

    2. PDCRINFO.DBQLSqlTbl, otherwise DBC.DBQLSqlTbl

    3. DBC.TABLESV

    4. DBC.INDICESV

    5. DBC.ALL_RI_CHILDRENV
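
For illustration only, the read access above could be granted to the KADA service account along the following lines. kada_user is a placeholder; adjust the grants to your site's security standards.

Code Block
-- Illustrative Teradata grants (kada_user is a placeholder for the KADA service account)
GRANT SELECT ON DBC TO kada_user;
GRANT SELECT ON PDCRINFO TO kada_user;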

1.2. SQLServer 2012+

SQL Server uses ODBC, so an update is required to the cerebrum-odbc-ini.yaml file.

...

Code Block
--Query To Create Extended Events Session
CREATE EVENT SESSION [KADA] ON SERVER ADD EVENT sqlserver.sp_statement_completed (
	ACTION(package0.collect_system_time, package0.event_sequence, sqlos.task_time, sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.database_id, sqlserver.database_name, sqlserver.nt_username, sqlserver.query_hash, sqlserver.server_instance_name, sqlserver.server_principal_name, sqlserver.server_principal_sid, sqlserver.session_id, sqlserver.session_nt_username, sqlserver.transaction_id, sqlserver.username) WHERE (
		(
			[statement] LIKE '%CREATE %'
			OR [statement] LIKE '%DROP %'
			OR [statement] LIKE '%MERGE %'
			OR [statement] LIKE '%FROM %'
			)
		AND [sqlserver].[server_principal_name] <> N'USERS_TO_EXCLUDE'
		AND [sqlserver].[is_system] = (0)
		AND NOT [statement] LIKE 'Insert into % Values %'
		AND [sqlserver].[Query_hash] <> (0)
		)
	), ADD EVENT sqlserver.sql_statement_completed (
	SET collect_statement = (1) ACTION(package0.collect_system_time, package0.event_sequence, sqlos.task_time, sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.database_id, sqlserver.database_name, sqlserver.nt_username, sqlserver.query_hash, sqlserver.server_instance_name, sqlserver.server_principal_name, sqlserver.session_id, sqlserver.session_nt_username, sqlserver.transaction_id, sqlserver.username) WHERE (
		(
			[statement] LIKE '%CREATE %'
			OR [statement] LIKE '%DROP %'
			OR [statement] LIKE '%MERGE %'
			OR [statement] LIKE '%FROM %'
			)
		AND [sqlserver].[server_principal_name] <> N'USERS_TO_EXCLUDE'
		AND [sqlserver].[is_system] = (0)
		AND NOT [statement] LIKE 'Insert into % Values %'
		AND [sqlserver].[Query_hash] <> (0)
		)
	) ADD TARGET package0.event_file (SET filename = N'G:\extended events\Extendedevents.xel', max_file_size = (20), max_rollover_files = (100))
	WITH (MAX_MEMORY = 4096 KB, EVENT_RETENTION_MODE = ALLOW_MULTIPLE_EVENT_LOSS, MAX_DISPATCH_LATENCY = 30 SECONDS, MAX_EVENT_SIZE = 0 KB, MEMORY_PARTITION_MODE = NONE, TRACK_CAUSALITY = ON, STARTUP_STATE = ON)
GO
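
-- Note (suggested step, not part of the original script): STARTUP_STATE = ON only
-- starts the session at the next server restart. To begin capturing immediately,
-- start the session manually:
ALTER EVENT SESSION [KADA] ON SERVER STATE = START;
GO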

-- Check if the session is dropping events and see other data about the session
-- https://sqlperformance.com/2019/10/extended-events/understanding-event-loss-with-extended-events
SELECT
   s.name, 
   s.total_regular_buffers,
   s.regular_buffer_size,
   s.total_large_buffers,
   s.large_buffer_size,
   s.dropped_event_count,
   s.dropped_buffer_count,
   s.largest_event_dropped_size
FROM sys.dm_xe_sessions AS s;

-- Also check log growth rate. Apply filters to remove noise.
-- some filters:
-- [sqlserver].[server_principal_name] = N'name of principal'
-- [sqlserver].[is_system] = (0)
-- [sqlserver].[client_app_name] = N'name of app'

-- NB grouping by XML is not supported in SQLServer. Need to use an external tool for this.

1.3. Oracle Cloud and Oracle Analytics

Requires an Oracle wallet and the following items to be updated in the cerebrum-oic.yaml file.

...

  1. Read access to the SYS.DBA_* tables

  2. Specifically the following tables:

    1. SYS.DBA_PROCEDURES

    2. SYS.DBA_VIEWS

    3. SYS.DBA_MVIEWS

    4. SYS.DBA_CONSTRAINTS

    5. SYS.DBA_CONS_COLUMNS

    6. SYS.DBA_TAB_COLUMNS

    7. SYS.DBA_HIST_ACTIVE_SESS_HISTORY

    8. SYS.DBA_HIST_SNAPSHOT

    9. SYS.DBA_USERS

    10. SYS.DBA_HIST_SQLTEXT
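
For illustration only, read access to these dictionary views could be granted to the KADA service account as sketched below. kada_user is a placeholder; whether to use per-object grants or SELECT ANY DICTIONARY depends on your DBA standards.

Code Block
-- Illustrative Oracle grants (kada_user is a placeholder for the KADA service account)
GRANT SELECT ON SYS.DBA_PROCEDURES TO kada_user;
GRANT SELECT ON SYS.DBA_VIEWS TO kada_user;
GRANT SELECT ON SYS.DBA_HIST_SQLTEXT TO kada_user;
-- ...repeat for the remaining views listed above, or grant SELECT ANY DICTIONARY
-- if your security policy allows it.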

1.4. Snowflake

No additional configuration is required. Snowflake uses a native Python driver.

...

  1. Read access to the SNOWFLAKE.ACCOUNT_USAGE schema

  2. User must have access to role: ACCOUNTADMIN

    1. Alternatively, grant access to another role (see the example below): https://docs.snowflake.com/en/sql-reference/account-usage.html#enabling-account-usage-for-other-roles
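
As a sketch of the alternative described in the linked Snowflake documentation, imported privileges on the SNOWFLAKE database can be granted to a dedicated role. KADA_ROLE is a placeholder for the role used by the KADA service account.

Code Block
-- Grant access to SNOWFLAKE.ACCOUNT_USAGE to a custom role (KADA_ROLE is a placeholder)
USE ROLE ACCOUNTADMIN;
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE KADA_ROLE;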

1.5. Tableau

Permissions

  1. A read user for the Tableau workgroup PostgreSQL (repository) database

  2. Tableau enabled for Metadata API

    1. https://help.tableau.com/current/api/metadata_api/en-us/docs/meta_api_start.html#enable-the-tableau-metadata-api-for-tableau-server

  3. Tableau Server enabled for repository access (see the example command after this list)

    1. https://help.tableau.com/current/server/en-us/perf_collect_server_repo.htm

  4. Create a Tableau Server user with one of the following access roles: Site Administrator Creator or Server Administrator.
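
The repository access step above is enabled with TSM on the Tableau Server node. A sketch based on the linked Tableau documentation; replace the password placeholder with your own value.

Code Block
# Enable read access to the Tableau Server repository for the readonly user
# (note: this may restart Tableau Server; schedule accordingly)
tsm data-access repository-access enable --repository-username readonly --repository-password <PASSWORD>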

1.6. Informatica (coming soon)

Permissions

Read access to all Informatica repository tables.

1.7. DBT (coming soon)

Permissions

Read access to a location that contains the manifest.json and catalog.json files for each dbt project.