K Platform 5.6.0

Technical Architecture Document

K is a Data Product Analytics platform for capturing, profiling and discovering how data products (data sets, analyses, reports etc.) across an Enterprise are used. K focuses on identifying and storing how users work with data, and on leveraging this information so that data producers can improve their products, data owners can take accountability for the proper use of their data, and hidden knowledge can be scaled to all data workers. The product vision is to become the central platform for all Enterprise data users to easily discover, understand and use data.

 

K Architecture

Services

Component

Description

Ingestion

The service is used for loading metadata and logs from data sources and tools.

Profiler

The service is used to identify and profile data assets and their usage. A set of proprietary algorithms is used to automatically match and analyse data assets over their lifecycle.

Usage

The service is used to monitor and track data assets over time.

Identity

The service is used to integrate with the Enterprise Identity Management service to provide single sign-on.

Search

The service provides fast, accurate and contextual search for all assets within K.

Applications

The service is used to access dedicated applications built to solve specific data problems, e.g. migration assessment and impact assessment.

Inventory

The service manages the hierarchical structure for all assets within K.

Scheduler

The service manages the integration and scheduling of the ingestion of metadata and logs into K.

Interfaces

Component

Description

API

This interface is used by applications and services to interact with and access data managed by K.

Web Portal

This interface is used by end users (e.g. data managers, analysts etc.) to access K and its services.

Chrome Extension

This interface is used to connect web-based data tools to K to enable inline data profiling and search.

Notifications

This interface is used to engage with end users via push notifications, e.g. email.

 

Stores

Component

Description

Metastore

The metastore is used to store the details of, and relationships between, data assets, reports, users, teams and other objects within the data ecosystem.

Timeseries

The timeseries store is used to record each data asset, person or content item and its lifecycle over time.

Index

Each object in the data ecosystem is added to a search index to enable the contextual search service.
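
To make the division of responsibility between the three stores concrete, the following is a minimal in-memory sketch. It is an illustration only: the class `Stores`, its methods and the example object names are hypothetical and are not K's schema or API. It assumes the metastore can be pictured as a set of relationship edges, the timeseries as a per-object list of timestamped events, and the index as a token-to-object lookup.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Set, Tuple


@dataclass
class Stores:
    """In-memory stand-ins for the three stores (hypothetical, not K's schema)."""

    # Metastore: (source, relationship, target) edges between objects.
    relationships: Set[Tuple[str, str, str]] = field(default_factory=set)
    # Timeseries: per-object list of (timestamp, event) lifecycle entries.
    lifecycle: Dict[str, List[Tuple[datetime, str]]] = field(
        default_factory=lambda: defaultdict(list)
    )
    # Index: lower-cased token -> names of objects containing that token.
    index: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))

    def register(self, name: str, when: datetime) -> None:
        """Record an object's creation and add its name tokens to the index."""
        self.lifecycle[name].append((when, "created"))
        for token in name.lower().replace(".", " ").replace("_", " ").split():
            self.index[token].add(name)

    def relate(self, source: str, relationship: str, target: str) -> None:
        """Store one relationship edge, e.g. a report reading from a table."""
        self.relationships.add((source, relationship, target))

    def search(self, term: str) -> Set[str]:
        """Return the names of objects whose tokens match the search term."""
        return set(self.index.get(term.lower(), set()))


stores = Stores()
stores.register("sales.orders", datetime(2021, 5, 11))
stores.register("weekly_sales_report", datetime(2021, 5, 11))
stores.relate("weekly_sales_report", "reads_from", "sales.orders")
```

In this sketch a search for `sales` finds both objects through the index, while the relationship between the report and the table lives only in the metastore edges.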

 

Inputs

Component

Description

Data Sources

Data sources (e.g. Teradata, Hadoop, Snowflake, SQL Server etc.) where data is stored and used by the Enterprise data teams. K has integrators for many on-premises and cloud data sources and can also ingest custom data sources through the K ingestion framework.

Data Tools

Reporting and Analytics applications (e.g. Tableau, Power BI etc.) used by the Enterprise data teams to create, manage and distribute content. K has integrators for common data tools and can also ingest custom data tools through the K ingestion framework.

Identity / SSO

Identity provider and user management sources (e.g. LDAP, SAML, OpenID Connect) that can provide single sign-on as well as user and team data.

Deploying into the Enterprise

 

Kubernetes

K is deployed using Kubernetes on infrastructure that is managed by the Client. This can be on premises or in the Client’s cloud. A SaaS offering is also available.

 

Deploying on Kubernetes

Typical Kubernetes services used to deploy K include OpenShift, Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE). The following diagram outlines how K is deployed in a typical Enterprise environment.

Last updated: 11/05/2021

Components

Details

Nodes

K is deployed across 3 nodes. Each node requires a minimum of 4 vCPU and 16 GB of memory.

Common deployment options:

Amazon Elastic Kubernetes Service (EKS) - m5.xlarge

Azure Kubernetes Service (AKS) - D4as_v4

OpenShift OCP v3.11/v4.6. Older versions may be supported depending on the available Kubernetes version.

  • Unsupported point versions will need to be reviewed but are unlikely to cause any issues.

Image Registry

The Client Image Registry connects to the KADA repository hosted externally (internet access required) to deploy and update the K platform.

File Storage

A location for landing files from data sources and data tools before processing by K. This location must be accessible by the Kubernetes Service.
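
The Nodes and File Storage requirements above lend themselves to a simple preflight check before installation. The following is a minimal sketch under stated assumptions, not part of the K installer: the `Node` type and both function names are hypothetical, node details are assumed to have been gathered by hand or from the cluster, and the landing location is assumed to be mounted as a filesystem path (an object store would need a different probe).

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List

# Minimums quoted in the Nodes row above.
MIN_NODES = 3
MIN_VCPU = 4
MIN_MEMORY_GB = 16


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one worker node (not a Kubernetes API type)."""

    name: str
    vcpu: int
    memory_gb: float


def sizing_problems(nodes: List[Node]) -> List[str]:
    """Return a list of sizing problems; an empty list means the pool qualifies."""
    problems = []
    if len(nodes) < MIN_NODES:
        problems.append(f"need at least {MIN_NODES} nodes, found {len(nodes)}")
    for node in nodes:
        if node.vcpu < MIN_VCPU:
            problems.append(f"{node.name}: {node.vcpu} vCPU < {MIN_VCPU}")
        if node.memory_gb < MIN_MEMORY_GB:
            problems.append(f"{node.name}: {node.memory_gb} GB < {MIN_MEMORY_GB} GB")
    return problems


def landing_location_is_usable(path: str) -> bool:
    """Return True if `path` is an existing directory that can be written and read."""
    directory = Path(path)
    if not directory.is_dir():
        return False
    try:
        # Write and re-read a throwaway file; it is deleted when closed.
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".k-probe-") as probe:
            probe.write(b"ok")
            probe.flush()
            probe.seek(0)
            return probe.read() == b"ok"
    except OSError:
        return False


if __name__ == "__main__":
    # Three nodes shaped like an m5.xlarge or D4as_v4 (4 vCPU, 16 GB each).
    pool = [Node(f"node-{i}", vcpu=4, memory_gb=16) for i in range(1, 4)]
    print(sizing_problems(pool))
    print(landing_location_is_usable(tempfile.gettempdir()))
```

Returning a list of problems rather than a single boolean keeps the sizing check useful for reporting, since every undersized node is named in one pass.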

...