Overview

Introduction to K

K is a Data Knowledge platform for discovering, profiling and understanding how data products (data sets, analyses, reports etc.) across an Enterprise are used.

K focuses on identifying and storing how users work with data, and on leveraging this information to enable data producers to improve their products, data owners to take accountability for the proper use of their data, and hidden knowledge to be scaled to all data workers. The product vision is to become the central platform for all Enterprise data users to easily discover, understand and govern the use of data.

K Architecture

...

Services

Inventory

Component	Description
Extractors	The service is used for connecting to, extracting and loading metadata and logs from data sources and tools. The extractors can also be deployed as a collector service for on-premise sources when using the K SaaS offering, if access between the on-premise source and the SaaS offering is not available.
Profiler	The service is used to identify and profile data assets and their usage. A set of proprietary algorithms is used to automatically match and analyse data assets over their lifecycle.	The service is used to monitor and track data assets over time.
Identity	The service is used to integrate with the Enterprise Identity Management service to provide single sign on.
Search	The service provides fast, accurate and contextual search for all assets within K.
Applications	The service is used to access dedicated applications built to solve specific data problems (e.g. migration assessment, impact assessment etc.).	The service manages the hierarchical structure for all assets within K.
Scheduler	The service manages the integration and scheduling of ingestion of metadata and logs into K.

Interfaces

Component	Description
API	This interface is used by applications and services to interact with and access data managed by K.
Web Portal	This interface is used by end users (e.g. data managers, analysts etc.) to access K and its services.
Chrome Extension	This interface is used to connect web-based data tools to K to enable inline data profiling and search.
Notifications	This interface is used to engage with end users via push notifications (e.g. email).

...

Stores

Component	Description
Metadata	The metadata store is used to store the details of, and relationships between, data assets, reports, users, teams and other objects within the data ecosystem.
Timeseries	The timeseries store is used to store each data asset, person or content item and its lifecycle over time.
Index	Each object in the data ecosystem is added to a search index to enable the contextual search service.
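As a rough illustration of how the three stores divide responsibilities, the following Python sketch keeps one object in a metadata map, a timeseries log and a search index. It is a toy, in-memory model only: the class `ToyStores`, its method names and the sample identifiers are hypothetical and do not reflect K's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ToyStores:
    """In-memory stand-ins for the three stores (hypothetical, not K's schema)."""
    # Metadata store: object details plus relationships between objects.
    metadata: Dict[str, dict] = field(default_factory=dict)
    relationships: Set[Tuple[str, str, str]] = field(default_factory=set)
    # Timeseries store: dated lifecycle events per object.
    timeseries: Dict[str, List[Tuple[str, str]]] = field(
        default_factory=lambda: defaultdict(list))
    # Index: lower-cased name token -> ids of objects whose name contains it.
    index: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))

    def add_object(self, object_id: str, kind: str, name: str) -> None:
        """Store the object's details and add its name tokens to the index."""
        self.metadata[object_id] = {"kind": kind, "name": name}
        for token in name.lower().split():
            self.index[token].add(object_id)

    def relate(self, source_id: str, relation: str, target_id: str) -> None:
        """Record a relationship such as (report, "reads", table)."""
        self.relationships.add((source_id, relation, target_id))

    def record_event(self, object_id: str, date: str, event: str) -> None:
        """Append a dated event to the object's lifecycle."""
        self.timeseries[object_id].append((date, event))

    def search(self, term: str) -> List[str]:
        """Return the sorted ids of objects whose name contains the term."""
        return sorted(self.index.get(term.lower(), set()))


if __name__ == "__main__":
    stores = ToyStores()
    stores.add_object("t1", "table", "Customer Orders")
    stores.add_object("r1", "report", "Orders Dashboard")
    stores.relate("r1", "reads", "t1")
    stores.record_event("t1", "2022-03-12", "queried")
    print(stores.search("orders"))  # ['r1', 't1']
```

The point of the sketch is the separation of concerns: details and relationships live in one place, history in another, and the index exists purely to serve search.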

...

Component	Description
Data Sources	Data sources (e.g. Teradata, Hadoop, Snowflake, SQL Server etc.) where data is stored and used by the Enterprise data teams. K has integrators for many on-premise and cloud data sources and can also ingest custom data sources through the K ingestion framework.
Data Tools	Reporting and Analytics applications (e.g. Tableau, Power BI etc.) used by the Enterprise data teams to create, manage and distribute content. K has integrators for common data tools and can also ingest custom data tools through the K ingestion framework.
Identity / SSO	Identity provider and user management sources (e.g. LDAP, SAML, OpenID Connect) that can provide single sign on and user and team data.

...

Deploying into the Enterprise

Kubernetes

K is deployed using Kubernetes on infrastructure that is managed by the Client. This can be on premise or in the Client’s cloud. A SaaS offering is also available.

...

There are two options for deploying the K platform: in your cloud or on the KADA-hosted platform. This document covers deploying the K platform in your cloud. Please contact us for more details about the SaaS option.

Your cloud

K is deployed in your cloud using a Kubernetes service.

Typical Kubernetes services used to deploy K include OpenShift, AWS’s Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE). The following diagram outlines how K is deployed in a typical Enterprise environment.

[Diagram: K deployed in a typical Enterprise environment]

Kubernetes Service

Components	Details
Nodes	K is deployed across a number of nodes (minimum of 4 nodes). Each node requires a minimum of 4 vCPU and 16 GB of memory. Contact the KADA team to work through the right sizing for your data ecosystem. Example specifications for cloud services include: AWS Elastic Kubernetes Service (EKS): m5.xlarge; Azure Kubernetes Service (AKS): D4as_v4; OpenShift: OCP v3/v4. Unsupported point versions will need to be reviewed but are unlikely to cause any issues.
Image Registry	The Client Image Registry connects to the KADA repository hosted externally (internet access required) to deploy and update the K platform.
Object store	A location for landing files from data sources and data tools before processing by K. This location must be accessible by the Kubernetes Service. The typical size for the Object store is 200 GB but may need to expand depending on your data retention needs.
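The node minimums above can be checked before installation with a small script. The following Python sketch is illustrative only: the `Node` class and the `sizing_problems` function are hypothetical helpers, and the node details are assumed to have been gathered by hand or from your own tooling. It encodes just the documented minimums (4 nodes, each with 4 vCPU and 16 GB of memory); actual sizing should still be agreed with the KADA team.

```python
from dataclasses import dataclass
from typing import List

# Minimums taken from the Nodes row of the table above.
MIN_NODES = 4
MIN_VCPU_PER_NODE = 4
MIN_MEMORY_GB_PER_NODE = 16


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one Kubernetes worker node."""
    name: str
    vcpu: int
    memory_gb: int


def sizing_problems(nodes: List[Node]) -> List[str]:
    """Return readable problems; an empty list means the minimums are met."""
    problems = []
    if len(nodes) < MIN_NODES:
        problems.append(f"need at least {MIN_NODES} nodes, found {len(nodes)}")
    for node in nodes:
        if node.vcpu < MIN_VCPU_PER_NODE:
            problems.append(
                f"{node.name}: {node.vcpu} vCPU is below {MIN_VCPU_PER_NODE}")
        if node.memory_gb < MIN_MEMORY_GB_PER_NODE:
            problems.append(
                f"{node.name}: {node.memory_gb} GB memory is below "
                f"{MIN_MEMORY_GB_PER_NODE} GB")
    return problems


if __name__ == "__main__":
    # Four nodes shaped like the example instance types (4 vCPU, 16 GB).
    cluster = [Node(f"node-{i}", vcpu=4, memory_gb=16) for i in range(4)]
    print(sizing_problems(cluster) or "cluster meets the documented minimums")
```

An empty result only confirms the minimums; it says nothing about whether the cluster is large enough for a particular data ecosystem.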

Other Components

Components	Details
KADA Repository	KADA provides a repository for clients. Your Image Registry connects to the KADA repository, which is hosted externally (internet access required), to deploy and update the K platform. This approach enables you to quickly and easily download the K product and updates.

Considerations

There are several considerations that should be checked prior to setting up K in your Kubernetes environment.

...

Version	Old Version 8	New Version Current
Changes made by	Dean Nguyen	Chichi Lo (Unlicensed)
Saved on	Sept 10, 2021	Mar 12, 2022
