Info

Applicable to customer on-premise/cloud deployments

Table of Contents

Setup Infrastructure

Kubernetes deployment

  1. Platform:
    Set up a Kubernetes cluster on any cloud provider (AWS, Azure, Google Cloud) or on-premise solution (e.g. OpenShift)

  2. Environment Sizing for Deployment of KADA into your infrastructure:

    1. Small Environments ( <1 Million objects)

      1. 4 nodes, each with 4 CPU, 16GB memory, and provisioned SSD storage (any IOPS)

    2. Large Environments (more than 1 Million objects)

      1. 5 nodes

        1. 4 nodes, each with 4 CPU, 16GB memory, and provisioned SSD storage (any IOPS)

        2. 1 node with 8 CPU, 32GB memory, and provisioned SSD storage with a minimum of 1100 IOPS.

    3. For very complex environments (10M+ objects) or large volumes of historical data, infrastructure requirements scale out according to data volumes.

  3. Storage:

    1. Set up an object store such as AWS S3, Azure Blob Storage, etc.

      1. Setup: s3 bucket setup

    2. Minimum 200GB of storage, to be mounted as Persistent Volumes in the Kubernetes cluster. PV class definitions need to be configured to meet the minimum IOPS requirements above.

    3. Where the organisation defines its own PV definitions (e.g. OpenShift), ensure the Reclaim Policy is set to Retain, as shown in the example below. This is important to ensure no data is lost during a prolonged outage at the Kubernetes layer.
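
    4. For reference, a minimal StorageClass sketch with a Retain reclaim policy is shown below; the name, provisioner, and parameters are placeholders to adapt to your cluster.

      Code Block
      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: kada-ssd-retain              # placeholder name
      provisioner: kubernetes.io/aws-ebs   # replace with your cluster's provisioner
      parameters:
        type: gp3                          # an SSD volume type; pick one meeting the IOPS requirements above
      reclaimPolicy: Retain                # keeps the underlying volume if the PV/PVC is deleted
      allowVolumeExpansion: true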

  4. Networking:

    1. Firewall rules may be required to enable outbound access over HTTPS (443); a quick check is sketched below.

    2. You may choose to use your own Kubernetes ingress services or use the one provided by KADA's configuration scripts.
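
    3. A quick way to verify outbound HTTPS access is a request against the KADA image repository used later in this guide (a hedged check; any HTTPS endpoint reachable from your nodes will do):

      Code Block
      # Exits non-zero if port 443 to the repository is blocked
      curl -sS -o /dev/null https://asia.gcr.io && echo "HTTPS (443) reachable"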

Docker deployment (not recommended for production environments)

  1. This setup requires a single machine with a minimum spec of 16 CPU, 64GB memory, and 200GB (minimum) storage

  2. Install docker: https://docs.docker.com/engine/install/

  3. Install docker compose: https://docs.docker.com/compose/install/
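
  4. To verify the installation, run the following (a hedged check; the hello-world pull requires internet access):

    Code Block
    docker --version
    docker compose version   # or: docker-compose --version for the standalone binary
    docker run --rm hello-world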

Configuring Access to KADA Image Repository

The KADA installation will require access to the KADA Image repository.

The Kubernetes or Docker environment will need internet access to this repository to install the platform.

If your environment is air-gapped, please advise the KADA team and we will arrange an alternative method for loading images into your environment.

KADA will provide customers with a unique kada-client-[unique_customer_code]-key.json to access the repository.

Kubernetes deployment

To set up the access key in your Kubernetes environment, run the following.

Code Block
kubectl create secret docker-registry kada-image-credentials \
  --docker-server=https://asia.gcr.io \
  --docker-username=_json_key \
  --docker-email=kada-client-[unique_customer_code]@kada-external.iam.gserviceaccount.com \
  --docker-password="$(cat kada-client-[unique_customer_code]-key.json)"
  
kubectl patch serviceaccount <REPLACE WITH THE SERVICE ACCOUNT NAME OR default> \
  -p "{\"imagePullSecrets\": [{\"name\": \"kada-image-credentials\"}]}"
  
# Run the following to test connectivity
docker pull busybox:1.28
docker pull asia.gcr.io/kada-external/postgres:1.7.0-pg11

Docker deployment

To set up the access key in your Docker environment, run the following.

Code Block
docker login -u _json_key --password-stdin https://asia.gcr.io < /tmp/kada-client-[unique_customer_code]-key.json

# Run the following to test connectivity
docker pull busybox:1.28
docker pull asia.gcr.io/kada-external/postgres:1.7.0-pg11

KADA Platform Installation

  1. KADA is packaged as a set of configuration files.

  2. Download the latest package.

Kubernetes deployments

  1. In keycloak/k8s/keycloak-kada-realm.yaml replace DOMAIN_URL with the base URL of your installation, e.g. https://example.com

  2. Platform credentials for internal services can be updated from their default values

    1. Edit postgres/k8s/credentials.yaml to set your own password

      Code Block
      POSTGRES_PASS=
    2. Edit keycloak/k8s/keycloak-credentials.yaml to set your own password

      Code Block
      POSTGRES_PASSWORD=
      KEYCLOAK_PASSWORD=
  3. Generate CA Certificates

    1. Generate CA certs based on the domain name of the host.
      Once generated, run the following command to upload the certs into Kubernetes (a self-signed example for testing follows below):

      Code Block
      kubectl create secret tls <SET A CERT SECRET NAME> --cert /path/to/yourdomain.cer --key /path/to/yourdomain.key
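
      For a test installation, a self-signed certificate can be generated with openssl; a hedged sketch (the yourdomain.* files and the CN are placeholders, and production deployments should use CA-issued certificates):

      Code Block
      openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
        -keyout yourdomain.key -out yourdomain.cer \
        -subj "/CN=yourdomain.com"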
    2. If you are using your own Kubernetes ingress service, the service needs to map the ports as per cortex/k8s/ingress-service.yaml. Make sure certs have been added to your ingress service.

    3. If you are using the KADA ingress services, update cortex/k8s/ingress-service.yaml and set the following:

      Code Block
      tls:
        - secretName: <SET TO THE CERT SECRET NAME from step i) >
  4. Deploy the Kubernetes config to start the platform

    1. Upload config

      Code Block
      kubectl apply -f postgres/k8s
      kubectl apply -f zookeeper/k8s
      kubectl apply -f solr/k8s
      kubectl apply -f redis/k8s
      kubectl apply -f keycloak/k8s
      kubectl apply -f cerebrum/k8s
      kubectl apply -f cortex/k8s/know-app.yaml
    2. Check the environment is up

      Code Block
      kubectl get pods
      
      NAME                                                       READY   STATUS      RESTARTS   AGE
      cerebrum-celery-batch-worker-deployment-6fb6f7b99c-p79xd   1/1     Running     0          6h24m
      cerebrum-celery-scheduler-deployment-7f6d98c74c-6tfp7      1/1     Running     0          6h24m
      cerebrum-celery-worker-deployment-788897cd9f-nmpsj         1/1     Running     0          6h24m
      cerebrum-deployment-5dcd4df76c-s972x                       1/1     Running     0          6h24m
      cortex-deployment-f54885458-llltw                          1/1     Running     0          7h16m
      keycloak-deployment-7d9fc89664-4mq75                       1/1     Running     0          7h12m
      keycloak-postgres-deployment-c4b8bf9b6-7dnvp               1/1     Running     0          7h14m
      postgres-deployment-6dfdd95569-jjjl8                       1/1     Running     0          7h45m
      redis-deployment-5db49b6796-9mqrj                          1/1     Running     0          7h16m
      setup-solr-56nxk                                           0/1     Completed   0          8h
      solr-gatekeeper-deployment-867d67db98-lwvck                1/1     Running     0          8h
      solr-statefulset-0                                         1/1     Running     0          8h
      solr-statefulset-1                                         1/1     Running     0          8h
      zookeeper-statefulset-0                                    1/1     Running     0          7h52m
      zookeeper-statefulset-1                                    1/1     Running     0          7h52m
      zookeeper-statefulset-2                                    1/1     Running     0          7h52m
    3. Deploy ingress-service (if not using your own)

      Code Block
      kubectl apply -f cortex/k8s/ingress-service.yaml
      kubectl apply -f cortex/k8s-ingress-nginx
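
    4. Confirm the ingress is serving (a hedged check; the exact namespace and service names depend on how the ingress-nginx manifests are configured):

      Code Block
      kubectl get ingress
      kubectl get svc --all-namespaces | grep -i ingress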

Docker deployment

  1. Edit kada_docker_compose.env and set the following values:

    Code Block
    KADA_ROOT_DATA_DIR=
    KADA_DOCKER_NETWORK=
    KADA_URL=
  2. In conf/kada-realm.json replace DOMAIN_URL with the base URL of your installation, e.g. https://example.com

  3. Generate CA certs based on the domain name of the host. In conf/, rename and replace cortex.crt and cortex.key with your generated CA certificates.

  4. Deploy the environment

    Code Block
    ./kada_docker_compose.sh setup
    
    # Create the storage_root_folder directory (default name is kada-data)
    cd ${KADA_ROOT_DATA_DIR}/cerebrum/data
    mkdir kada-data 
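
  5. Verify the environment is up (a hedged check; container names depend on the compose configuration):

    Code Block
    docker ps --format 'table {{.Names}}\t{{.Status}}'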

KADA Platform Configuration

  1. Platform Settings
    On the bottom left of the screen, click the GEAR icon and select Platform Settings.

    Then set the following properties depending on your deployment setup:

    Code Block
    storage_type = [LOCAL, S3, AZURE]
    
    # For Azure blob storage
    root_folder = <blob container name>
    storage_azure_access_key
    storage_azure_storage_account
    
    # For aws S3 configuration
    root_folder = <s3 bucket name>
    storage_aws_region 
    storage_aws_access_key
    storage_aws_secret_access_key
    
    # Set this for LOCAL storage
    storage_local_root_directory
    
    # Set this if connecting to sources directly
    enable_connection_test = true
  2. Integrating sources to KADA
    KADA needs to be configured for each source that you want to integrate. Setup can be configured via the KADA front end. See How to: Onboard a new source

  3. KADA Platform Initial load

    1. Setup the following Platform Setting values for initial load

      Code Block
      celery_batch_task_soft_time_limit = 0
      celery_batch_task_time_limit = 0
      metric_window = 30
    2. KADA provides a built-in Batch Manager for triggering the loading of sources.

    3. See https://kadaai.atlassian.net/wiki/spaces/KS/pages/675708946/How+to%3A+Onboard+a+new+source#4.-Manually-Triggering-Source-loads

    4. Once the sources have been loaded, manually trigger the following platform jobs. See https://kadaai.atlassian.net/wiki/spaces/KS/pages/1740931226/KADA+Batch+Manager#Manually-triggering-a-Platform-job
      1. GATHER_METRICS_AND_STATS
      2. POST_PROCESS_QUERIES
      3. DAILY

  4. Schedule sources to load.
    KADA provides a scheduler to periodically load the sources you have configured.
    Set the following Platform Setting value to enable the scheduler to run.

    Code Block
    enable_platform_batch = true

    Each Source can now be scheduled to run. See https://kadaai.atlassian.net/wiki/spaces/KS/pages/675708946/How+to%3A+Onboard+a+new+source#3.-Scheduling-a-Source

Upgrading KADA

KADA generally releases new updates each month. See our Release versions page for the latest available version.

To check your version see How to: Check the version of K platform

If a new version is available, perform the following steps to upgrade, then follow any manual steps outlined in the release notes.

Kubernetes deployments

Code Block
# Delete the following from the installer package
cerebrum/k8s/cerebrum-auth-credentials.yaml
cerebrum/k8s/cerebrum-fernet-key.yaml
cerebrum/k8s/cerebrum-odbc-ini.yaml
cerebrum/k8s/cerebrum-oic.yaml
keycloak/k8s/keycloak-credentials.yaml
keycloak/k8s/keycloak-ldap-cert.yaml
keycloak/k8s/keycloak-kada-realm.yaml
postgres/k8s/credentials.yaml

# Set KEYCLOAK_FRONTEND_URL value in keycloak/k8s/keycloak.yaml
value: https://example.com/keycloak/auth

# Compare this file with your current version.
# Update the file with any changes
cerebrum/k8s/cerebrum-extract-scripts.yaml

kubectl scale deployment --replicas 0 cerebrum-deployment 
kubectl scale deployment --replicas 0 cerebrum-celery-scheduler-deployment

# Check in Platform Settings > Monitor that no batches are currently running.

kubectl apply -f postgres/k8s
kubectl apply -f zookeeper/k8s
kubectl apply -f solr/k8s
kubectl apply -f redis/k8s
kubectl apply -f keycloak/k8s
kubectl apply -f cerebrum/k8s
kubectl apply -f cortex/k8s/know-app.yaml
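
After re-applying the config, a hedged way to confirm the services come back up:

Code Block
# Watch pods until all deployments report Running
kubectl get pods -w

# Or check a specific rollout, e.g.
kubectl rollout status deployment/cerebrum-deployment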

Docker deployments

Code Block
# Download new package
# Transfer package over to kada server under /kada

# unzip package
cd /kada
tar -xvf kada_5.5.0_docker_compose.tar.gz
cd /kada/d_pkg_5.5.0

# From the current version directory copy over the following config
cp /kada/d_pkg_5.4.0/conf/cortex.key /kada/d_pkg_5.5.0/conf/cortex.key
cp /kada/d_pkg_5.4.0/conf/cortex.crt /kada/d_pkg_5.5.0/conf/cortex.crt
cp /kada/d_pkg_5.4.0/conf/kada-realm.json /kada/d_pkg_5.5.0/conf/kada-realm.json

# Update these variables in the newer version's kada_docker_compose.env
KADA_ROOT_DATA_DIR=
KADA_DOCKER_NETWORK=
KADA_URL=

# Then restart the environment
./kada_docker_compose_setup.sh update yes-i-can-proceed

KADA Integrations

1. Updating Kubernetes Configs for Source Onboarding

Some sources in KADA require additional configuration to establish connectivity. This section details the additional configuration steps per integration source.

Queries to extract from a source may need to be altered for a customer's deployment. These can be edited in the cerebrum-extract-scripts.yaml file. Each extract script is prefixed with the relevant vendor source name.

After editing any of the config yaml files, upload the edited yaml file and restart the cerebrum services for the new configurations to take effect.

Code Block
# Upload edited config
kubectl apply -f path/to/config/file.yaml

# Get service details
kubectl get pods

# Restart cerebrum services (an alternative using rollout restart follows this block).
# Replace the XXXXXXXXXX-XXXXX with the ID in your environment
kubectl delete pod cerebrum-deployment-XXXXXXXXXX-XXXXX 
kubectl delete pod cerebrum-celery-batch-worker-deployment-XXXXXXXXXX-XXXXX 
kubectl delete pod cerebrum-celery-worker-deployment-XXXXXXXXXX-XXXXX 
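
On Kubernetes 1.15+, an alternative sketch is to restart the deployments instead of deleting pods by ID (deployment names as listed in the kubectl get pods output above):

Code Block
kubectl rollout restart deployment cerebrum-deployment \
  cerebrum-celery-batch-worker-deployment \
  cerebrum-celery-worker-deployment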

1.1. Teradata

Teradata uses ODBC, so an update is required to the cerebrum-odbc-ini.yaml file.

Code Block
apiVersion: v1
data:
  odbc.ini: |-
    [Kada Teradata Extractor]
    Description = Teradata Connection
    Driver      = /opt/teradata/client/ODBC_64/lib/tdataodbc_sb64.so
    DBCName     = <UPDATE WITH YOUR SERVER DNS>

    [Kada SQLServer Extractor]
    Description = SQLServer Connection
    Driver      = FreeTDS
    Server      = sql1
    Port        = 1433
    TDS_Version = 8.0
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: cerebrum-odbc-ini

Update the DBCName to your Teradata server. If there are multiple sources, create a new DSN entry for each one, making sure to use the same format as the populated Kada Teradata Extractor example. Do not change the Driver path.
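
For example, a second DSN entry might look like the following (the entry name and host are hypothetical):

Code Block
[Kada Teradata Extractor 2]
Description = Teradata Connection (second source)
Driver      = /opt/teradata/client/ODBC_64/lib/tdataodbc_sb64.so
DBCName     = teradata2.example.com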

Permissions:

  1. Read access to the DBC and PDCRINFO schemas.

  2. Specifically these tables:

    1. PDCRINFO.DBQLogTbl (otherwise DBC.DBQLogTbl)

    2. PDCRINFO.DBQLSqlTbl (otherwise DBC.DBQLSqlTbl)

    3. DBC.TABLESV

    4. DBC.INDICESV

    5. DBC.ALL_RI_CHILDRENV

1.2. SQLServer 2012+

SQLServer uses ODBC, so an update is required to the cerebrum-odbc-ini.yaml file.

Code Block
apiVersion: v1
data:
  odbc.ini: |-
    [Kada Teradata Extractor]
    Description = Teradata Connection
    Driver      = /opt/teradata/client/ODBC_64/lib/tdataodbc_sb64.so
    DBCName     = <TERADATA SERVER DSN>

    [Kada SQLServer Extractor]
    Description = SQLServer Connection
    Driver      = FreeTDS
    Server      = <UPDATE WITH YOUR SERVER DNS>
    Port        = 1433
    TDS_Version = 8.0
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: cerebrum-odbc-ini

Update the Server and Port according to your SQLServer. If there are multiple sources, create a new DSN entry for each one, making sure to use the same format as the populated Kada SQLServer Extractor example. Do not change the Driver or TDS_Version values.
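
If the unixODBC isql utility is available in the cerebrum container (an assumption about the image), a DSN can be tested with placeholder credentials:

Code Block
isql -v "Kada SQLServer Extractor" <username> <password>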

Permissions

  1. Read access to the information_schema per database

  2. Permission to create extended events

  3. Permission to read extended events log file.

Log Capture

SQLServer Extended Events need to be set up to capture query log data.

Here is a template for KADA's Extended Events session. Note that this will require some tuning depending on the level of activity and the types of queries occurring in your SQLServer environment.

Code Block
--Query To Create Extended Events Session
CREATE EVENT SESSION [KADA] ON SERVER ADD EVENT sqlserver.sp_statement_completed (
	ACTION(package0.collect_system_time, package0.event_sequence, sqlos.task_time, sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.database_id, sqlserver.database_name, sqlserver.nt_username, sqlserver.query_hash, sqlserver.server_instance_name, sqlserver.server_principal_name, sqlserver.server_principal_sid, sqlserver.session_id, sqlserver.session_nt_username, sqlserver.transaction_id, sqlserver.username) WHERE (
		(
			[statement] LIKE '%CREATE %'
			OR [statement] LIKE '%DROP %'
			OR [statement] LIKE '%MERGE %'
			OR [statement] LIKE '%FROM %'
			)
		AND [sqlserver].[server_principal_name] <> N'USERS_TO_EXCLUDE'
		AND [sqlserver].[is_system] = (0)
		AND NOT [statement] LIKE 'Insert into % Values %'
		AND [sqlserver].[Query_hash] <> (0)
		)
	), ADD EVENT sqlserver.sql_statement_completed (
	SET collect_statement = (1) ACTION(package0.collect_system_time, package0.event_sequence, sqlos.task_time, sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.database_id, sqlserver.database_name, sqlserver.nt_username, sqlserver.query_hash, sqlserver.server_instance_name, sqlserver.server_principal_name, sqlserver.session_id, sqlserver.session_nt_username, sqlserver.transaction_id, sqlserver.username) WHERE (
		(
			[statement] LIKE '%CREATE %'
			OR [statement] LIKE '%DROP %'
			OR [statement] LIKE '%MERGE %'
			OR [statement] LIKE '%FROM %'
			)
		AND [sqlserver].[server_principal_name] <> N'USERS_TO_EXCLUDE'
		AND [sqlserver].[is_system] = (0)
		AND NOT [statement] LIKE 'Insert into % Values %'
		AND [sqlserver].[Query_hash] <> (0)
		)
	) ADD TARGET package0.event_file (SET filename = N'G:\extended events\Extendedevents.xel', max_file_size = (20), max_rollover_files = (100))
	WITH (MAX_MEMORY = 4096 KB, EVENT_RETENTION_MODE = ALLOW_MULTIPLE_EVENT_LOSS, MAX_DISPATCH_LATENCY = 30 SECONDS, MAX_EVENT_SIZE = 0 KB, MEMORY_PARTITION_MODE = NONE, TRACK_CAUSALITY = ON, STARTUP_STATE = ON)
GO
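
-- The session above is created with STARTUP_STATE = ON (it starts on server restart),
-- but it must be started manually the first time (a hedged example):
ALTER EVENT SESSION [KADA] ON SERVER STATE = START;
GO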

-- Check if the session is dropping events and see other data about the session
-- https://sqlperformance.com/2019/10/extended-events/understanding-event-loss-with-extended-events
SELECT
   s.name, 
   s.total_regular_buffers,
   s.regular_buffer_size,
   s.total_large_buffers,
   s.large_buffer_size,
   s.dropped_event_count,
   s.dropped_buffer_count,
   s.largest_event_dropped_size
FROM sys.dm_xe_sessions AS s;

-- Also check log growth rate. Apply filters to remove noise.
-- some filters:
-- [sqlserver].[server_principal_name] = N'name of principal'
-- [sqlserver].[is_system] = (0)
-- [sqlserver].[client_app_name] = N'name of app'

-- NB grouping by XML is not supported in SQLServer. Need to use an external tool for this.

1.3. Oracle Cloud and Oracle Analytics

Requires an Oracle wallet and the following items to be updated in the cerebrum-oic.yaml file:

  1. cwallet.sso → Binary Text

  2. ewallet.p12 → Binary Text

  3. tnsnames.ora → Text

NB: sqlnet.ora, if updated, must have DIRECTORY="/etc/oracle_config"

Code Block
apiVersion: v1
binaryData:
  cwallet.sso: <USERs CWALLET>
  ewallet.p12: <USER EWALLET>
data:
  sqlnet.ora: |-
    WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/etc/oracle_config")))
    SSL_SERVER_DN_MATCH=yes
  tnsnames.ora: ""
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: cerebrum-oic

To generate the binary or text replacement, run the following to get the output in the console:

Code Block
kubectl create configmap test --from-file=<path to file> -o yaml --dry-run

You can use the output to replace the specific data/binaryData in each section of the cerebrum-oic.yaml file.

Alternatively, if you have all the files, add each file as a --from-file argument to generate the whole config file again:

Code Block
kubectl create configmap cerebrum-oic --from-file=../image/conf/cwallet.sso --from-file=../image/conf/ewallet.p12 --from-file=../image/conf/sqlnet.ora --from-file=../image/conf/tnsnames.ora  -o yaml --dry-run > cerebrum-oic.yaml

Permissions

  1. Read access to the SYS.DBA_* tables (a grant sketch follows the list below)

  2. Specifically the following tables:

    1. SYS.DBA_PROCEDURES

    2. SYS.DBA_VIEWS

    3. SYS.DBA_MVIEWS

    4. SYS.DBA_CONSTRAINTS

    5. SYS.DBA_CONS_COLUMNS

    6. SYS.DBA_TAB_COLUMNS

    7. SYS.DBA_HIST_ACTIVE_SESS_HISTORY

    8. SYS.DBA_HIST_SNAPSHOT

    9. SYS.DBA_USERS

    10. SYS.DBA_HIST_SQLTEXT
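
A hedged grant sketch for the extract user (kada_user is a placeholder; your DBA may prefer narrower object-level grants):

Code Block
-- Broad catalog read access via the built-in role
GRANT SELECT_CATALOG_ROLE TO kada_user;

-- Or object-level grants, e.g.
GRANT SELECT ON SYS.DBA_VIEWS TO kada_user;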

1.4. Snowflake

No additional configuration is required. Snowflake uses a native Python driver.

Permissions

  1. Read access to the SNOWFLAKE.ACCOUNT_USAGE schema

  2. User must have access to role: ACCOUNTADMIN

    1. Alternatively, grant access to another role (a sketch follows below): https://docs.snowflake.com/en/sql-reference/account-usage.html#enabling-account-usage-for-other-roles
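
A hedged sketch of the grant described in the linked Snowflake documentation (kada_role is a placeholder):

Code Block
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE kada_role;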

1.5. Tableau

Permissions

  1. A read user for the Tableau workgroup (repository) PostgreSQL database

  2. Tableau enabled for Metadata API

    1. https://help.tableau.com/current/api/metadata_api/en-us/docs/meta_api_start.html#enable-the-tableau-metadata-api-for-tableau-server

  3. Tableau Server enabled for repository access

    1. https://help.tableau.com/current/server/en-us/perf_collect_server_repo.htm

  4. Create a Tableau Server user with one of the following access roles: Site Administrator Creator or Server Administrator (a repository connectivity check is sketched below)
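
A hedged repository connectivity check (Tableau's repository listens on port 8060 once repository access is enabled, and readonly is its read-only user; the host is a placeholder):

Code Block
psql -h <tableau-server-host> -p 8060 -U readonly workgroup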

1.6 Informatica (coming soon)

Permissions

Read access to all Informatica repository tables.

1.7 DBT (coming soon)

Permissions

Read access to a location that contains the manifest.json and catalog.json files for each dbt project.