AWS setup

The KADA Athena extractor requires an IAM user that either has the IAM policy described below attached, or can assume a role with an equivalent IAM policy.

The IAM policy for the KADA Athena extractor requires the following:

  1. Permissions to execute queries against Athena INFORMATION_SCHEMA. In particular the following tables:

    1. information_schema.views

    2. information_schema.tables

    3. information_schema.columns

  2. Executing queries in Athena requires an S3 bucket to temporarily store results.
    The policy must allow read, write and list access to the bucket and its objects. Conversely, the bucket policy must allow the same access for the user or role.

  3. Permission to call the following Athena APIs:

    1. list_databases

    2. list_table_metadata

    3. list_query_executions

    4. list_work_groups

    5. batch_get_query_execution

    6. start_query_execution

    7. get_query_execution

  4. The IAM policy needs permission to access all Athena workgroups in order to extract query log data. Without access to the workgroups, KADA can’t track usage.

    1. See https://docs.aws.amazon.com/athena/latest/ug/workgroups-iam-policy.html for how to add policy entries for fine-grained control at the workgroup level. Note that the extractor runs queries on Athena. If you choose to restrict workgroup access, ensure that query-based actions (e.g. StartQueryExecution) are allowed for the workgroup that the service user/account/role is associated with.
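The API names in the list above are written as snake_case client method names, while IAM policies use PascalCase actions with an `athena:` prefix (for example, `athena:BatchGetQueryExecution`). The following is a minimal sketch, not part of the extractor, for checking that a list of allowed actions covers the calls listed above. The helper names `to_iam_action` and `missing_actions` are hypothetical, and the check assumes actions are listed explicitly (it does not expand wildcards such as `athena:*`).

```python
# Athena API calls listed above, as snake_case client method names.
REQUIRED_CALLS = [
    "list_databases",
    "list_table_metadata",
    "list_query_executions",
    "list_work_groups",
    "batch_get_query_execution",
    "start_query_execution",
    "get_query_execution",
]


def to_iam_action(method_name: str) -> str:
    """Convert a snake_case method name to an 'athena:PascalCase' IAM action."""
    pascal = "".join(part.capitalize() for part in method_name.split("_"))
    return "athena:" + pascal


def missing_actions(allowed_actions):
    """Return the required IAM actions that are absent from allowed_actions.

    Hypothetical helper: only exact action names are matched, wildcards
    in the policy are not expanded.
    """
    allowed = set(allowed_actions)
    return [a for a in map(to_iam_action, REQUIRED_CALLS) if a not in allowed]


if __name__ == "__main__":
    policy_actions = ["athena:ListDatabases", "athena:StartQueryExecution"]
    for action in missing_actions(policy_actions):
        print("missing:", action)
```

Running the check against the `Action` list of a candidate policy statement gives a quick view of which Athena permissions still need to be granted.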

Athena reports usage at the workgroup level. This means usage cannot be attributed to the individual user that executed the query. Athena usage will be reported against each workgroup in the format “athena_workgroup_<name>”.
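To illustrate the naming format, here is a small sketch that groups query executions by workgroup label. It is an illustration only and is not how KADA is known to implement this. It assumes each record is a dictionary with a `WorkGroup` key, as in the query execution details returned by Athena; the function names are hypothetical.

```python
from collections import Counter


def workgroup_label(name: str) -> str:
    """Build the usage label for a workgroup: athena_workgroup_<name>."""
    return f"athena_workgroup_{name}"


def usage_by_workgroup(executions):
    """Count query executions per workgroup label.

    `executions` is assumed to be an iterable of dicts that each carry
    a "WorkGroup" key (hypothetical input shape for this sketch).
    """
    return Counter(workgroup_label(e["WorkGroup"]) for e in executions)


if __name__ == "__main__":
    sample = [
        {"QueryExecutionId": "a", "WorkGroup": "primary"},
        {"QueryExecutionId": "b", "WorkGroup": "analytics"},
        {"QueryExecutionId": "c", "WorkGroup": "primary"},
    ]
    for label, count in usage_by_workgroup(sample).items():
        print(label, count)
```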

Example Role Policy to allow Athena Access with least privileges for actions

AWSTemplateFormatVersion: "2010-09-09"
Description: 'AWS IAM Role - Athena and Cloudtrail Access to KADA'
Resources: 
  KadaAthenaRole: 
    Type: "AWS::IAM::Role"
    Properties: 
      RoleName: "KadaAthenaRole"
      MaxSessionDuration: 43200
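      Path: "/"
      # CloudFormation requires an AssumeRolePolicyDocument on AWS::IAM::Role.
      # Add one here naming the principal that is allowed to assume this role.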

  KadaAthenaPolicy: 
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyName: root
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action: 
              - athena:BatchGetQueryExecution
              - athena:GetQueryExecution
              - athena:GetQueryResults
              - athena:GetQueryResultsStream
              - athena:ListQueryExecutions
              - athena:StartQueryExecution
              - athena:ListWorkGroups
              - athena:ListDataCatalogs
              - athena:ListDatabases
              - athena:ListTableMetadata
            Resource: '*'
          - Effect: Allow
            Action: 
              - s3:GetBucketLocation
              - s3:GetObject
              - s3:ListBucket
              - s3:ListBucketMultipartUploads
              - s3:ListMultipartUploadParts
              - s3:AbortMultipartUpload
              - s3:PutObject
              - s3:PutBucketPublicAccessBlock
              - s3:DeleteObject
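            Resource:
              # Bucket-level actions (e.g. s3:ListBucket) match the bucket ARN.
              # Object-level actions (e.g. s3:GetObject, s3:PutObject) match the object ARN.
              - arn:aws:s3:::[ATHENA RESULTS BUCKET NAME]
              - arn:aws:s3:::[ATHENA RESULTS BUCKET NAME]/*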
      Roles:
        - !Ref KadaAthenaRole
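When filling in the S3 `Resource` entries, note that bucket-level actions such as `s3:ListBucket` apply to the bucket ARN, while object-level actions such as `s3:GetObject` and `s3:PutObject` apply to object ARNs of the form `arn:aws:s3:::<bucket>/*`. The sketch below derives both ARNs from an `s3://` results location. The function name `results_bucket_arns` is hypothetical, and it grants access to every object in the bucket rather than to a specific prefix.

```python
from urllib.parse import urlparse


def results_bucket_arns(results_location: str):
    """Return [bucket ARN, object ARN pattern] for an s3:// results location."""
    parsed = urlparse(results_location)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://<bucket> location, got {results_location!r}")
    bucket_arn = f"arn:aws:s3:::{parsed.netloc}"
    # The second entry covers all objects in the bucket (assumption: no prefix scoping).
    return [bucket_arn, f"{bucket_arn}/*"]


if __name__ == "__main__":
    for arn in results_bucket_arns("s3://kada-athena-temp-results"):
        print(arn)
```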

Running the extractor

Set up a Python environment (Python 3.8+).

KADA Support will provide the extractor on request.

Run the extractor from the kada-athena-extractor directory

To run the extractor you will need the following inputs:

  1. The host name of the Athena Source in K. Create the Athena Source in K first if it does not already exist.

  2. An AWS user access key and secret.

  3. Optionally, if assuming a role, the ARN of the role to assume.

  4. A list of catalogs to extract from Athena. If not provided, the default is AwsDataCatalog.

pipenv run python kada-athena-extractor.py -k <aws user key> -s <aws user secret> -hn <k host name for athena> -b <s3 temp results location for athena> -ro <optional role to assume if needed> -c <comma separated list of catalogs to extract, default is just AwsDataCatalog>

Fully populated example

pipenv run python kada-athena-extractor.py -k mykey -s mysecret -hn athena -b s3://kada-athena-temp-results -ro arn:aws:iam::xxxxx:role/myrole -c AwsDataCatalog,CustomCatalog
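If the extractor is launched from a scheduler or wrapper script, the command can be assembled programmatically. The following is a minimal sketch that uses only the flags shown above (`-k`, `-s`, `-hn`, `-b`, `-ro`, `-c`). The function `build_command` and its parameter names are hypothetical, and the optional flags are simply omitted when no value is given.

```python
import shlex


def build_command(key, secret, host_name, results_location,
                  role_arn=None, catalogs=None):
    """Build the extractor command as an argument list."""
    args = [
        "pipenv", "run", "python", "kada-athena-extractor.py",
        "-k", key,
        "-s", secret,
        "-hn", host_name,
        "-b", results_location,
    ]
    if role_arn:
        # Only needed when the user must assume a role.
        args += ["-ro", role_arn]
    if catalogs:
        # Catalogs are passed as one comma separated value.
        args += ["-c", ",".join(catalogs)]
    return args


if __name__ == "__main__":
    cmd = build_command(
        "mykey", "mysecret", "athena", "s3://kada-athena-temp-results",
        role_arn="arn:aws:iam::xxxxx:role/myrole",
        catalogs=["AwsDataCatalog", "CustomCatalog"],
    )
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the kada-athena-extractor directory, which avoids shell quoting issues.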
