Overview
Introduction to K
K is a Data Knowledge platform for discovering, profiling and understanding how data products (data sets, analyses, reports etc.) are used across an Enterprise.
K focuses on identifying and storing how users work with data. It uses this information to enable data producers to improve their products, to enable data owners to take accountability for the proper use of their data, and to scale hidden knowledge to all data workers. The product vision is to become the central platform for all Enterprise data users to easily discover, understand and use data.
K Architecture
Services
Component | Description |
Ingestion | The service is used for loading metadata and logs from data sources and tools. |
Profiler | The service is used to identify and profile data assets and their usage. A set of proprietary algorithms is used to automatically match and analyse data assets over their lifecycle. |
Usage | The service is used to monitor and track data assets over time. |
Identity | The service is used to integrate with the Enterprise Identity Management service to provide single sign on. |
Search | The service provides fast, accurate and contextual search for all assets within K. |
Applications | The service is used to access dedicated applications built to solve specific data problems (e.g. migration assessment, impact assessment etc.). |
Inventory | The service manages the hierarchical structure for all assets within K. |
Scheduler | The service manages the integration and scheduling of ingestion of metadata and logs into K. |
Interfaces
Component | Description |
API | This interface is used by applications and services to interact and access data managed by K. |
Web Portal | This interface is used by end users (e.g. data managers, analysts etc.) to access K and its services. |
Chrome Extension | This interface is used to connect web-based data tools to K to enable inline data profiling and search. |
Notifications | This interface is used to engage with end users via push notifications (e.g. email). |
Stores
Component | Description |
Metastore | The metastore is used to store the details and relationships between data assets, reports, users, teams and other objects within the data ecosystem. |
Timeseries | The timeseries is used to store each data asset, person or content item and its lifecycle over time. |
Index | Each object in the data ecosystem is added to a search index to enable the contextual search service. |
Inputs
Component | Description |
Data Sources | Data sources (e.g. Teradata, Hadoop, Snowflake, SQL Server etc.) where data is stored and used by the Enterprise data teams. K has integrators for many on-premise and cloud data sources and can also ingest custom data sources through the K ingestion framework. |
Data Tools | Reporting and Analytics applications (e.g. Tableau, Power BI etc.) used by the Enterprise data teams to create, manage and distribute content. K has integrators for common data tools and can also ingest custom data tools through the K ingestion framework. |
Identity / SSO | Identity provider and user management sources (e.g. LDAP, SAML, OpenID Connect) that can provide single sign on and user and team data. |
Deploying into the Enterprise
Kubernetes
K is deployed using Kubernetes on infrastructure that is managed by the Client. This can be on-premise or in the Client’s cloud. A SaaS offering is also available.
Deploying on Kubernetes
Typical Kubernetes services used to deploy K include OpenShift, AWS’s Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE). The following diagram outlines how K is deployed in a typical Enterprise environment.
Kubernetes Service
Components | Details |
Nodes | K is deployed across 3 nodes. Each node requires a minimum of 4 vCPU and 16 GB of memory. Common deployment options: AWS Elastic Kubernetes Service (EKS) - m5.xlarge; Azure Kubernetes Service (AKS) - D4as_v4; OpenShift OCP v3/v4. |
Image Registry | The Client Image Registry connects to the KADA repository hosted externally (internet access required) to deploy and update the K platform. |
File Storage | A location for landing files from data sources and data tools before processing by K. This location must be accessible by the Kubernetes Service. |
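The node sizing in the table above can be checked before deployment. The following is a minimal sketch in Python, not part of the K product: the `Node` class, the `sizing_problems` function and the node names are hypothetical, and the sketch assumes that the 3 nodes are a lower bound rather than an exact count.

```python
from __future__ import annotations

from dataclasses import dataclass

# Figures taken from the Nodes row of the Kubernetes Service table.
# Treating the node count as a minimum is an assumption of this sketch.
MIN_NODES = 3
MIN_VCPU = 4
MIN_MEMORY_GB = 16


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one Kubernetes worker node."""
    name: str
    vcpu: int
    memory_gb: float


def sizing_problems(nodes: list[Node]) -> list[str]:
    """Return one human-readable message per sizing shortfall (empty if none)."""
    problems: list[str] = []
    if len(nodes) < MIN_NODES:
        problems.append(f"need at least {MIN_NODES} nodes, found {len(nodes)}")
    for node in nodes:
        if node.vcpu < MIN_VCPU:
            problems.append(f"{node.name}: {node.vcpu} vCPU is below {MIN_VCPU}")
        if node.memory_gb < MIN_MEMORY_GB:
            problems.append(
                f"{node.name}: {node.memory_gb} GB memory is below {MIN_MEMORY_GB} GB"
            )
    return problems
```

An empty list means the described cluster meets the stated minimums. The node figures would need to be supplied by hand or read from your own cluster inventory.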
Other Components
Components | Details |
KADA Repository | KADA provides a repository for clients to quickly and easily download the K product and updates. |
Considerations
There are several considerations that should be checked prior to setting up K on your Kubernetes environment.
Considerations | Details |
Policies | The Kubernetes service must have access to the Object Store. Where the Kubernetes service is a Cloud Provider’s managed service (e.g. AWS, GCP, Azure), cloud policies may need to be created to give the service the right read/write permissions. Please consult your Cloud Provider’s documentation. |
Internet Access | The K platform itself does NOT need internet access. However, the Kubernetes service needs internet access to download the K images from the KADA repository. |
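The internet access consideration above can be verified with a simple outbound connectivity test run from the environment hosting the Kubernetes service. The following is a minimal sketch in Python using only the standard library. The `can_reach` function is hypothetical, and the host name and port of the KADA repository are not given in this document, so they must be supplied by you.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only confirms network reachability. It does not authenticate
    or pull any image. The default port 443 is an assumption (HTTPS).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts.
        return False
```

For example, `can_reach("registry.example.com")` would test a placeholder host. Replace it with the repository address provided by KADA.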