Dataset upstream source extract
There may be a time when you need to see all the upstream objects for datasets in K.
A current workaround is to follow the below instructions to generate an extract
Extract details
Columns | Description | Example |
---|---|---|
name | Name of the dataset | Customer model |
object_type | Type of the dataset | Dataset |
object_id | ID of the dataset | 753f5c32-5cf3-3af6-a499-364d749344e5 |
source_name | Name of the dataset source | Power BI |
upstream_name | Name of the upstream object | dim_customer |
upstream_object_type | Type of the upstream object | Table |
upstream_object_signature | Fully qualified location fo the upstream object | source.database.schema.table |
upstream_id | ID of the upstream object | 8a05f40a-0e6a-3ab1-93eb-9a6db10e0601 |
upstream_source_name | Name of the upstream source | Snowflake |
Instructions to produce the extract
# CONNECT to the postgres pod and start a psql session
kubectl exec -it postgres-statefulset-0 -- psql -U postgres -d cerebrum
# Run this query
COPY (
SELECT DISTINCT
ds.name AS name,
dsr.name AS object_type,
ds.id AS object_id,
dss.name AS source_name,
r.name AS upstream_name,
rr.name AS upstream_object_type,
r.signature AS upstream_object_signature,
r.id AS upstream_id,
rs.name AS upstream_source_name
FROM node ds
INNER JOIN node_ref dsr ON dsr.id = ds.node_ref_id
INNER JOIN source dss ON dss.id = ds.source_id
INNER JOIN edge ON edge.source_node_id = ds.id AND edge.source_node_ref_id = ds.node_ref_id
INNER JOIN node r ON r.id = edge.target_node_id AND r.node_ref_id = edge.target_node_ref_id
INNER JOIN source rs ON rs.id = r.source_id
INNER JOIN node_ref rr ON rr.id = r.node_ref_id
WHERE ds.node_ref_id IN (15,27)
AND edge.edge_ref_id IN (6,26,32)
AND edge.source_node_ref_id IN (15,27)
AND edge.target_node_ref_id IN (3,4)
AND r.node_ref_id IN (3,4)
ORDER BY source_name, object_type, name, upstream_source_name, upstream_object_type, upstream_object_signature
) TO '/tmp/extract.csv' DELIMITER ',' CSV HEADER;
# Exist out of postgres pod.
# Use kubectl to copy the csv out of the pod.
kubectl cp postgres-statefulset-0:/tmp/extract.csv extract_table_size.csv