...
Deploying and orchestrating the extract code
Managing a high water mark so the extract only pull the latest metadata
Storing and pushing the extracts to your K instance.
...
Pre-requisites
...
Collector Server Minimum Requirements
Insert excerpt | ||||||||
---|---|---|---|---|---|---|---|---|
|
SSIS Requirements
MSDB database / SQLServer DB access
if using SSISDB to store package you will need to download each project via SQL Server Management Studio. See https://kadaai.atlassian.net/wiki/spaces/KSL/pages/1960706049/SSIS+Collector+method+-+v3.1.0#Downloading-SSISDB-packages
The collector will need access to the underlying SQLServer Database with permissions to read the following tables is the SSIS main databases:
MSDB.DBO.SYSSSISPACKAGES
<SSIS Logging Database>.DBO.SYSSSISLOG where <SSIS Logging Database> is the database configured for SSIS logging
Check your SSIS instance port
Run the following query and note the local tcp port.
Code Block SELECT local_tcp_port FROM sys.dm_exec_connections WHERE session_id = @@SPID GO
Downloading SSISDB packages
In SQL Server Studio Manager Go to Integration Services Catalog > Projects.
Right click on a Project > Export…
Save project file
Change the file extraction from
.ispac
to.zip
Repeat for each project.
...
The collector requires a set of parameters to connect to and extract metadata from SSIS.
FIELD | FIELD TYPE | DESCRIPTION | EXAMPLE | ||
---|---|---|---|---|---|
server | string | SQLServer server host Note if the default port is not used append the port to the server name. Example
| “10.1.18.19” | ||
username | string | Username to log into the SQLServer account | “myuser” | ||
password | string | Password to log into the SQLServer account |
| ||
logging_database | string | Database where the SSIS Logging has been setup | “ssis_logging” | ||
mapping | JSON | Mapping file of data source names against the onboarded host and database name in K | Assuming I have a “myDSN” data source name in powerbi, I’ll map it to host “myhost” and database “mydatabase” onboarded in K, snowflake type references are handled automatically
| ||
driver | string | This is the ODBC driver, generally its ODBC Driver 17 for SQL Server, if you another driver installed please use that instead | “ODBC Driver 17 for SQL Server” | ||
output_path | string | Absolute path to the output location where files are to be written | “/tmp/output” | ||
input_path | string | Absolute path to the input folder where SSISDB project zips are extract to. See https://kadaai.atlassian.net/wiki/spaces/KSL/pages/1960706049/SSIS+Collector+method+-+v3.1.0#Downloading-SSISDB-packages Note that within this folder there should be seperate folders for each project and the .manifest/.dtsx/.conmgr (conmgr may or may nor be applicable) files should sit within this folder, for e.g.
| “/tmp/input” | ||
mask | boolean | To enable masking or not | true | ||
compress | boolean | To gzip the output or not | true | ||
uses_ssissdb | boolean | Are packages deployed to ssisdb or msdb? | false | ||
meta_only | boolean | Do you wish to extract metadata only and don’t want to set up logging for SSIS packages? | false |
These parameters can be added directly into the run or you can use pass the parameters in via a JSON file. The following is an example you can use that is included in the example run code below.
...
Code Block | ||
---|---|---|
| ||
import os import argparse from kada_collectors.extractors.utils import load_config, get_hwm, publish_hwm, get_generic_logger from kada_collectors.extractors.ssis import Extractor get_generic_logger('root') # Set to use the root logger, you can change the context accordingly or define your own logger _type = 'ssis' dirname = os.path.dirname(__file__) filename = os.path.join(dirname, 'kada_{}_extractor_config.json'.format(_type)) parser = argparse.ArgumentParser(description='KADA SSIS Extractor.') parser.add_argument('--config', '-c', dest='config', default=filename, help='Location of the configuration json, default is the config json in the same directory as the script.') parser.add_argument('--name', '-n', dest='name', default=_type, help='Name of the collector instance.') args = parser.parse_args() start_hwm, end_hwm = get_hwm(_typeargs.name) ext = Extractor(**load_config(args.config)) ext.test_connection() ext.run(**{"start_hwm": start_hwm, "end_hwm": end_hwm}) publish_hwm(_type, end_hwm) |
...