Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Scroll ignore
scroll-viewporttrue
scroll-pdftrue
scroll-officetrue
scroll-chmtrue
scroll-docbooktrue
scroll-eclipsehelptrue
scroll-htmltrue
scroll-epubtrue

Open in new tab

About Collectors

Insert excerpt
Collector Method
Collector Method
nameabout

...

Setting up Azure Synapse for metadata extraction using a service principal (Application).

Step 1: Registing an Entra App for KADA Azure Synapse Collector Application

Create an Entra Application for the kada synapse collector or reusing an existing kada application in Entra. TBD (Same steps as setup for power bi / AD integration. https://docs.kada.ai/home/powerbi#PowerBI-Step2)RegisteringPowerBIAppinAzureAD

Generate a secret for the Entra Application and note down the application id, tenant id.

Step 2: Establish Azure Synapse Access

Note

Apply in MASTER using an Azure Synapse Admin user

Code Block
CREATE USER [<SERVICE_PRINCIPAL_<ENTRA APPLICATION NAME>] FROM EXTERNAL PROVIDER; 
Note

Apply per database in scope for metadata collection.

Code Block
CREATE USER [<SERVICE_PRINCIPAL_<ENTRA APPLICATION NAME>] FROM EXTERNAL PROVIDER;
GRANT VIEW DEFINITION TO [<SERVICE_PRINCIPAL_<ENTRA APPLICATION NAME>];
GRANT VIEW DATABASE STATE TO [<SERVICE_PRINCIPAL_<ENTRA APPLICATION NAME>];

The following table should also be available to SELECT by the user created in each database

  • INFORMATION_SCHEMA.ROUTINES

  • INFORMATION_SCHEMA.VIEWS

  • INFORMATION_SCHEMA.TABLE_CONSTRAINTS

  • INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE

  • INFORMATION_SCHEMA.TABLES

  • INFORMATION_SCHEMA.COLUMNS

  • INFORMATION_SCHEMA.VIEWS

  • sys.foreign_key_columns

  • sys.objects

  • sys.tables

  • sys.schemas

  • sys.columns

  • sys.databases

Note

Synapse has the concept of a serverless and dedicated sql pool per workspace.

If you use both serverless and dedicated sql pools in a workspace you will need to onboard each as a separate source in K.

For serverless sql pools the master database can’t be selected for metadata extraction.

To onboard multiple Synapse workspace each workspace will need to be onboarded as a new source in K.

...

Step 1: Create the Source in K

...

FIELD

FIELD TYPE

DESCRIPTION

EXAMPLE

client

string

Onboarded client in Azure to access Azure Synapse

 

secret

string

Onboarded client secret in Azure to access Azure Synapse

 

tenant

string

Tenant ID of where Azure Synapse exists

server

string

Azure Synapse server.

If using a custom port append with comma Example: “tcp:<workspace“<workspace-name>.sql.azuresynapse.net,1433”

“tcp:<workspace“<workspace-name>.sql.azuresynapse.net,1433”

host

string

The onboarded host value in K, generally this would be the same as the server value, depending on what you onboard it as.

“<workspace-name>.sql.azuresynapse.net,1433”

database_name

string

The name of the database that will be used to test the connection

master

Note for serverlesss. sqlpools master is not accessible.

databases

list<string>

A list of databases to extract from SQLServer Azure

[“dwh”, “adw”]

driver

string

This is the ODBC driver, generally its ODBC Driver 17 for SQL Server, if you another driver installed please use that instead.

“ODBC Driver 17 for SQL Server”

meta_only

boolean

Do you want to extract metadata only without enabling extended events? We currently only support true

true

output_path

string

Absolute path to the output location where files are to be written

“/tmp/output”

mask

boolean

To enable masking or not

true

compress

boolean

To gzip the output or not

true

connection_timeout

integer

Timeout in seconds allowed against Synapse Sql Pool connection, this is defaulted as 30

30

...

Code Block
languagejson
{
    "client": "",
    "secret": "",
    "tenant": "",
    "server": "",
    "host": "",
    "driver": "ODBC Driver 17 for SQL Server",
    "database_name": "master",
    "databases": [""],
    "output_path": "/tmp/output",
    "mask": true,
    "compress": true,
    "meta_only": true,
    "connection_timeout": 30
}

...