AWS setup
Info: The KADA Athena extractor requires either an IAM user with the following IAM policy, or an IAM user that can assume a role with an equivalent IAM policy.
Note: The Athena integration is currently in beta and may change in the near future. Please check with the KADA team before using it.
The IAM policy for the KADA Athena extractor requires the following:
Permission to execute queries against the Athena INFORMATION_SCHEMA, in particular the following tables:
information_schema.views
information_schema.tables
information_schema.columns
Executing queries in Athena requires an S3 bucket to temporarily store results. The IAM policy must therefore allow read, write, and list access to objects in that bucket, and the bucket policy must in turn grant the same permissions to the IAM user / role.
Permission to call the following Athena APIs:
list_databases
list_table_metadata
list_query_executions
list_work_groups
batch_get_query_execution
start_query_execution
get_query_execution
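To illustrate why both the query permissions and the results bucket are needed, here is a minimal sketch of running one INFORMATION_SCHEMA query with the boto3 Athena client. It is not KADA's implementation: the function name `run_athena_query`, the `primary` workgroup default and the S3 path are assumptions for illustration. It also reads results with `get_query_results`, which corresponds to the `athena:GetQueryResults` action in the example policy.

```python
import time
from typing import Any, Dict, List


def run_athena_query(athena, sql: str, output_location: str,
                     workgroup: str = "primary",
                     poll_seconds: float = 1.0) -> List[List[str]]:
    """Run one Athena query and return its rows as lists of strings.

    `athena` is expected to be a boto3 Athena client, e.g. boto3.client("athena").
    Hypothetical helper for illustration only.
    """
    # start_query_execution writes the result files under output_location,
    # which is why the user/role needs read/write/list access to that bucket.
    execution_id = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
        WorkGroup=workgroup,
    )["QueryExecutionId"]

    # Poll get_query_execution until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=execution_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        reason = status["QueryExecution"]["Status"].get("StateChangeReason", "")
        raise RuntimeError(f"Query {execution_id} ended in {state}: {reason}")

    # get_query_results is paginated with NextToken. For a SELECT, the first
    # row is the column header row; NULL cells have no VarCharValue key.
    rows: List[List[str]] = []
    kwargs: Dict[str, Any] = {"QueryExecutionId": execution_id}
    while True:
        page = athena.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            rows.append([cell.get("VarCharValue", "") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token
```

For example, `run_athena_query(boto3.client("athena"), "SELECT table_schema, table_name FROM information_schema.tables", "s3://my-athena-results/kada/")` would list tables, assuming a bucket named `my-athena-results` that the caller can write to.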
The IAM policy needs access to all Athena workgroups to extract query log data. Without access to a workgroup, KADA cannot track data usage for queries run in it.
To limit access to workgroups, see https://docs.aws.amazon.com/athena/latest/ug/workgroups-iam-policy.html for how to add policy entries that give fine-grained control at the workgroup level. Note that the extractor runs queries on Athena: if you do choose to restrict workgroup access, ensure that query actions (e.g. StartQueryExecution) are allowed for the workgroup the service user/account/role is associated with.
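As a rough sketch of workgroup-level restriction, the helper below builds an IAM policy document that scopes the query-related actions to named workgroup ARNs (format `arn:aws:athena:<region>:<account-id>:workgroup/<name>`) while leaving listing actions on `Resource: "*"`, since `ListWorkGroups` cannot be scoped to a single workgroup. The function names, region and account ID are hypothetical, and keeping the catalog actions on `"*"` is a simplifying assumption. Remember that query logs will then only be extracted from the listed workgroups.

```python
from typing import Dict, List

# Query and query-log actions from the example policy; these accept
# workgroup ARNs as resources.
QUERY_ACTIONS = [
    "athena:BatchGetQueryExecution",
    "athena:GetQueryExecution",
    "athena:GetQueryResults",
    "athena:GetQueryResultsStream",
    "athena:ListQueryExecutions",
    "athena:StartQueryExecution",
]

# Listing actions kept on Resource "*" in this sketch.
UNSCOPED_ACTIONS = [
    "athena:ListWorkGroups",
    "athena:ListDataCatalogs",
    "athena:ListDatabases",
    "athena:ListTableMetadata",
]


def workgroup_arn(region: str, account_id: str, name: str) -> str:
    """ARN of a single Athena workgroup."""
    return f"arn:aws:athena:{region}:{account_id}:workgroup/{name}"


def athena_policy_for_workgroups(region: str, account_id: str,
                                 workgroups: List[str]) -> Dict:
    """Hypothetical helper: IAM policy document limited to the given workgroups."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": UNSCOPED_ACTIONS, "Resource": "*"},
            {
                "Effect": "Allow",
                "Action": QUERY_ACTIONS,
                "Resource": [workgroup_arn(region, account_id, wg) for wg in workgroups],
            },
        ],
    }
```

The resulting dictionary can be serialised with `json.dumps` and attached to the KADA user or role in place of the `Resource: '*'` Athena statement.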
Info: Athena reports usage at the workgroup level, so usage cannot be attributed to the individual user who executed a query. In K, Athena usage is reported against each workgroup in the format “athena_workgroup_<name>”.
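The sketch below shows how query history can be grouped per workgroup using the APIs listed earlier: `list_work_groups`, then `list_query_executions` per workgroup, then `batch_get_query_execution` in chunks of up to 50 IDs (the API's per-call limit). It is an illustration assuming a boto3 Athena client, not the extractor's actual code; `usage_by_workgroup` is a hypothetical name, and `UnprocessedQueryExecutionIds` is ignored for brevity.

```python
from typing import Any, Dict, Iterator, List


def _pages(method, result_key: str, **kwargs: Any) -> Iterator[Any]:
    """Yield items from a NextToken-paginated Athena API call."""
    while True:
        page = method(**kwargs)
        yield from page.get(result_key, [])
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token


def usage_by_workgroup(athena, batch_size: int = 50) -> Dict[str, List[dict]]:
    """Group query executions under "athena_workgroup_<name>" keys.

    Athena records executions per workgroup, not per user, so the workgroup
    is the finest grain usage can be attributed to.
    """
    usage: Dict[str, List[dict]] = {}
    for wg in _pages(athena.list_work_groups, "WorkGroups"):
        name = wg["Name"]
        ids = list(_pages(athena.list_query_executions, "QueryExecutionIds",
                          WorkGroup=name))
        executions: List[dict] = []
        # batch_get_query_execution accepts at most 50 IDs per call.
        for start in range(0, len(ids), batch_size):
            batch = athena.batch_get_query_execution(
                QueryExecutionIds=ids[start:start + batch_size])
            executions.extend(batch["QueryExecutions"])
        usage[f"athena_workgroup_{name}"] = executions
    return usage
```

This is why the policy needs `ListWorkGroups`, `ListQueryExecutions` and `BatchGetQueryExecution` on every workgroup whose usage should appear in K.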
...
Example role policy (CloudFormation) that allows Athena access with least-privilege actions.
Note: This policy has access to ALL Athena workgroups because of Resource: '*'. Replace [ATHENA RESULTS BUCKET NAME] with the name of your Athena results bucket.
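```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: 'AWS IAM Role - Athena and Cloudtrail Access to KADA'
Resources:
  KadaAthenaRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName: "KadaAthenaRole"
      MaxSessionDuration: 43200
      Path: "/"
      # AssumeRolePolicyDocument is required for AWS::IAM::Role.
      # Replace [KADA USER ARN] with the ARN of the IAM user that assumes this role.
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              AWS: "[KADA USER ARN]"
            Action: sts:AssumeRole
  KadaAthenaPolicy:
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyName: root
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Athena query, query log and catalog listing actions (all workgroups).
          - Effect: Allow
            Action:
              - athena:BatchGetQueryExecution
              - athena:GetQueryExecution
              - athena:GetQueryResults
              - athena:GetQueryResultsStream
              - athena:ListQueryExecutions
              - athena:StartQueryExecution
              - athena:ListWorkGroups
              - athena:ListDataCatalogs
              - athena:ListDatabases
              - athena:ListTableMetadata
            Resource: '*'
          # Results bucket: bucket-level actions apply to the bucket ARN,
          # object-level actions (GetObject, PutObject, ...) need the "/*" ARN.
          - Effect: Allow
            Action:
              - s3:GetBucketLocation
              - s3:GetObject
              - s3:ListBucket
              - s3:ListBucketMultipartUploads
              - s3:ListMultipartUploadParts
              - s3:AbortMultipartUpload
              - s3:PutObject
              - s3:PutBucketPublicAccessBlock
              - s3:DeleteObject
            Resource:
              - "arn:aws:s3:::[ATHENA RESULTS BUCKET NAME]"
              - "arn:aws:s3:::[ATHENA RESULTS BUCKET NAME]/*"
      Roles:
        - !Ref KadaAthenaRole
```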
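The template only covers the IAM side; as noted earlier, the results bucket's own policy must also grant the same permissions to the KADA user or role. A minimal sketch of such a bucket policy, assuming the principal is the role's ARN (the bucket name, account ID and helper names here are hypothetical), splits bucket-level and object-level actions onto the matching resource ARNs:

```python
import json
from typing import Dict


def results_bucket_policy(bucket: str, principal_arn: str) -> Dict:
    """Hypothetical helper: S3 bucket policy for the Athena results bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    principal = {"AWS": principal_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Actions that operate on the bucket itself.
                "Sid": "KadaAthenaBucketLevel",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetBucketLocation", "s3:ListBucket",
                           "s3:ListBucketMultipartUploads"],
                "Resource": bucket_arn,
            },
            {
                # Actions that operate on objects (query result files).
                "Sid": "KadaAthenaObjectLevel",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                           "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


def results_bucket_policy_json(bucket: str, principal_arn: str) -> str:
    """Serialise the policy, e.g. for s3.put_bucket_policy(Policy=...)."""
    return json.dumps(results_bucket_policy(bucket, principal_arn), indent=2)
```

With boto3, the JSON string could be applied via `boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=...)`, or pasted into the bucket's permissions in the S3 console.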
Running the extractor
Set up a Python 3.8+ environment.
...
To run the extractor, you will need the following inputs:
Onboard the Athena source in K. Note the host name used in onboarding; it is an input for the Athena extractor.
An AWS User access key and secret.
Optionally, if assuming a role: the ARN of the role to assume.
A list of catalogs to extract from Athena. If not provided, the default is AwsDataCatalog.
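A minimal sketch of how these inputs might fit together, assuming boto3: the user's access key and secret are used directly, or exchanged for temporary role credentials via STS `assume_role` when a role ARN is given, and tables are listed per catalog with `AwsDataCatalog` as the fallback. The names `client_credentials`, `list_tables` and the session name `kada-athena-extractor` are hypothetical, not the extractor's real interface.

```python
from typing import Any, Dict, Iterator, List, Optional, Sequence

DEFAULT_CATALOGS = ["AwsDataCatalog"]


def _pages(method, result_key: str, **kwargs: Any) -> Iterator[Any]:
    """Yield items from a NextToken-paginated Athena API call."""
    while True:
        page = method(**kwargs)
        yield from page.get(result_key, [])
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token


def client_credentials(access_key: str, secret_key: str,
                       role_arn: Optional[str] = None, sts=None,
                       session_name: str = "kada-athena-extractor") -> Dict[str, str]:
    """Keyword arguments for boto3.client("athena", ...)."""
    if not role_arn:
        return {"aws_access_key_id": access_key, "aws_secret_access_key": secret_key}
    if sts is None:
        raise ValueError("an STS client is required when a role ARN is given")
    # Exchange the user's keys for temporary credentials of the role.
    creds = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def list_tables(athena, catalogs: Optional[Sequence[str]] = None) -> List[str]:
    """Return "catalog.database.table" names, defaulting to AwsDataCatalog."""
    names: List[str] = []
    for catalog in catalogs or DEFAULT_CATALOGS:
        for db in _pages(athena.list_databases, "DatabaseList", CatalogName=catalog):
            for table in _pages(athena.list_table_metadata, "TableMetadataList",
                                CatalogName=catalog, DatabaseName=db["Name"]):
                names.append(f"{catalog}.{db['Name']}.{table['Name']}")
    return names
```

In practice the STS client would itself be created from the user's keys, e.g. `boto3.client("sts", aws_access_key_id=..., aws_secret_access_key=...)`, and the returned dictionary passed as `boto3.client("athena", region_name=..., **creds)`.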
...