
Athena

This page walks through the setup of Athena in K using the direct connect method.

Integration details

Scope                     Included    Comments
Metadata                  YES         See below
Lineage                   YES
Usage                     YES
Sensitive Data Scanner    N/A

Known limitations

  • TBC


Step 1: Establish Athena Access

We recommend creating a new role and a separate S3 bucket for the service user provided to KADA, with a policy that allows the permissions below. See Identity and access management in Athena - Amazon Athena.

The service user/account/role requires the following permissions:

  1. Execute queries against Athena with access to INFORMATION_SCHEMA, in particular the following tables:

    1. information_schema.views

    2. information_schema.tables

    3. information_schema.columns

  2. Executing queries in Athena requires an S3 bucket to temporarily store results.
    The policy must allow read, write, and list access to objects within that bucket, and the bucket's own policy must allow the same access.

  3. Call the following Athena APIs

    1. list_databases

    2. list_table_metadata

    3. list_query_executions

    4. list_work_groups

    5. batch_get_query_execution

    6. start_query_execution

    7. get_query_execution

  4. The service user/account/role needs permission to access all workgroups in order to extract all data. If you omit a workgroup, its information will not be extracted and you may not see the complete picture in K.

  5. See IAM policies for accessing workgroups - Amazon Athena for how to add policy entries for fine-grained control at the workgroup level. Note that the extractor runs queries on Athena. If you choose to restrict workgroup access, ensure that query-based actions (e.g. StartQueryExecution) are allowed for the workgroup the service user/account/role is associated with.
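The permissions listed above can be checked before onboarding with a small preflight script. The following is a minimal sketch, not part of K: it assumes a boto3-style Athena client is passed in, and the function name `check_athena_access`, the catalog name, and the default workgroup `primary` are illustrative assumptions. It calls each listed API once and runs a single query against information_schema.tables, recording any error per API.

```python
import time


def check_athena_access(client, catalog, results_bucket, workgroup="primary",
                        poll_seconds=1.0, max_polls=30):
    """Exercise each Athena API listed above; return {api_name: error text or None}."""
    report = {}

    def attempt(name, call):
        # Record None on success, or the error text (e.g. an access denial) on failure.
        try:
            result = call()
            report[name] = None
            return result
        except Exception as exc:  # broad on purpose: this is only a diagnostic
            report[name] = str(exc)
            return None

    attempt("list_work_groups", lambda: client.list_work_groups())

    dbs = attempt("list_databases",
                  lambda: client.list_databases(CatalogName=catalog))
    db_names = [d["Name"] for d in (dbs or {}).get("DatabaseList", [])]
    if db_names:
        attempt("list_table_metadata",
                lambda: client.list_table_metadata(CatalogName=catalog,
                                                   DatabaseName=db_names[0]))

    listed = attempt("list_query_executions",
                     lambda: client.list_query_executions(WorkGroup=workgroup))
    recent_ids = (listed or {}).get("QueryExecutionIds", [])[:5]
    if recent_ids:
        attempt("batch_get_query_execution",
                lambda: client.batch_get_query_execution(QueryExecutionIds=recent_ids))

    # Run one small query so that the results bucket permissions are exercised too.
    started = attempt("start_query_execution", lambda: client.start_query_execution(
        QueryString="SELECT table_schema, table_name FROM information_schema.tables LIMIT 1",
        QueryExecutionContext={"Catalog": catalog},
        ResultConfiguration={"OutputLocation": results_bucket},
        WorkGroup=workgroup,
    ))

    if started:
        query_id = started["QueryExecutionId"]

        def wait_for_query():
            # Poll until the query reaches a terminal state.
            for _ in range(max_polls):
                execution = client.get_query_execution(QueryExecutionId=query_id)
                state = execution["QueryExecution"]["Status"]["State"]
                if state == "SUCCEEDED":
                    return state
                if state in ("FAILED", "CANCELLED"):
                    raise RuntimeError(f"query ended in state {state}")
                time.sleep(poll_seconds)
            raise TimeoutError("query did not finish in time")

        attempt("get_query_execution", wait_for_query)

    return report
```

With boto3 installed, the client could be created with `boto3.client("athena", region_name="ap-southeast-2")`. Any entry in the returned report that is not None points at a permission or configuration gap for that API.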

Note that usage is associated with workgroups rather than individual users. These workgroups are published as users in K in the form “athena_workgroup_<name>”.
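To illustrate the naming convention, the sketch below maps workgroup names to the user names described above and counts queries per resulting user. It only demonstrates the “athena_workgroup_<name>” form. The helper names are hypothetical, the input is assumed to be a list of query execution records carrying a `WorkGroup` field (as returned by the Athena get_query_execution API), and how K aggregates usage internally is not described here.

```python
from collections import Counter


def k_user_for_workgroup(workgroup_name):
    """Build the K user name for an Athena workgroup (hypothetical helper)."""
    return f"athena_workgroup_{workgroup_name}"


def usage_by_k_user(query_executions):
    """Count query executions per K user name, keyed by workgroup."""
    return Counter(k_user_for_workgroup(q["WorkGroup"]) for q in query_executions)


# Example records; only the WorkGroup field is used here.
sample = [
    {"WorkGroup": "primary"},
    {"WorkGroup": "analytics"},
    {"WorkGroup": "primary"},
]
usage = usage_by_k_user(sample)
```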

Below is an example role policy that allows Athena access with least privileges for actions. This example allows the ACCOUNT ARN to assume the role. Replace the variables [ACCOUNT ARN] and [ATHENA RESULTS BUCKET NAME] with your own values. You may also choose to assign the policy directly to a new user and use that user without assuming a role. If you do wish to assume a role, note down the role ARN to use when onboarding/extracting.

AWSTemplateFormatVersion: "2010-09-09"
Description: 'AWS IAM Role - Athena and Cloudtrail Access to KADA'
Resources:
  KadaAthenaRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName: "KadaAthenaRole"
      MaxSessionDuration: 43200
      Path: "/"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              AWS: "[ACCOUNT ARN]"
            Action: "sts:AssumeRole"
  KadaAthenaPolicy:
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyName: root
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - athena:BatchGetQueryExecution
              - athena:GetQueryExecution
              - athena:GetQueryResults
              - athena:GetQueryResultsStream
              - athena:ListQueryExecutions
              - athena:StartQueryExecution
              - athena:ListWorkGroups
              - athena:ListDataCatalogs
              - athena:ListDatabases
              - athena:ListTableMetadata
            Resource: '*'
          - Effect: Allow
            Action:
              - s3:GetBucketLocation
              - s3:GetObject
              - s3:ListBucket
              - s3:ListBucketMultipartUploads
              - s3:ListMultipartUploadParts
              - s3:AbortMultipartUpload
              - s3:PutObject
              - s3:PutBucketPublicAccessBlock
              - s3:DeleteObject
            Resource:
              - arn:aws:s3:::[ATHENA RESULTS BUCKET NAME]
              # Object-level actions (e.g. s3:GetObject, s3:PutObject) apply to object ARNs, so the /* form is needed as well
              - arn:aws:s3:::[ATHENA RESULTS BUCKET NAME]/*
      Roles:
        - !Ref KadaAthenaRole
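If you use the assume-role option, the role can be tested outside K before onboarding. The sketch below is one way to do that, assuming a boto3-style STS client is passed in. The function name `assume_kada_role` and the session name are hypothetical. It exchanges the role ARN for temporary credentials, shaped as keyword arguments that `boto3.client` accepts.

```python
def assume_kada_role(sts_client, role_arn, session_name="kada-athena-check"):
    """Assume the role and return temporary credentials as boto3 client kwargs."""
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Usage, assuming boto3 is installed and the caller is the principal
# allowed by the role's trust policy:
#
#   import boto3
#   kwargs = assume_kada_role(boto3.client("sts"), "<role ARN noted in Step 1>")
#   athena = boto3.client("athena", region_name="ap-southeast-2", **kwargs)
```

If the assume_role call fails, check that the trust policy principal in the template matches the account or user making the call.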

 

 


Step 2: Create the Source in K

Create an Athena source in K

  • Select Platform Settings in the side bar

  • In the pop-out side panel, under Integrations click on Sources

  • Click Add Source and select Athena

     

  • Select Direct Connect and add your Athena details

    • Name: Give the Athena source a name in K.

    • Host: Enter a hostname for your Athena instance

    • Region: Set the AWS region where Athena is hosted, e.g. ap-southeast-2

    • Athena Results bucket: Bucket location used to temporarily store Athena query results. Use the full path starting with s3://

  • Add Connection Details and click Save & Next

    • Assume Role: Add the Role from Step 1

    • Key: Add the Key from Step 1

    • Secret: Add the Secret from Step 1

  • Test your connection and click Next
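Before entering the details above, it can help to sanity-check the values. The sketch below is a hypothetical helper, not part of K: the dictionary keys are illustrative names for the Name, Host, Region and Athena Results bucket fields, and the region pattern is only a rough shape check rather than an authoritative list of AWS regions.

```python
import re
from urllib.parse import urlparse


def validate_athena_source(settings):
    """Return a list of problems found in the source settings (empty if none)."""
    problems = []

    # All four fields from the Direct Connect form are expected to be filled in.
    for field in ("name", "host", "region", "results_bucket"):
        if not str(settings.get(field) or "").strip():
            problems.append(f"{field} is required")

    # Rough shape check for region codes such as ap-southeast-2.
    region = str(settings.get("region") or "")
    if region and not re.fullmatch(r"[a-z]{2}(-[a-z]+)+-\d+", region):
        problems.append("region does not look like an AWS region code")

    # The results bucket must be a full path starting with s3://
    bucket = str(settings.get("results_bucket") or "")
    if bucket:
        parsed = urlparse(bucket)
        if parsed.scheme != "s3" or not parsed.netloc:
            problems.append("results_bucket must be a full path starting with s3://")

    return problems
```

For example, a region of `Sydney` or a bucket given as `my-bucket/results` would each be reported as a problem, while `ap-southeast-2` and `s3://my-bucket/results/` would pass.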


Step 3: Schedule Athena source load

  • Select Platform Settings in the side bar

  • In the pop-out side panel, under Integrations click on Sources

  • Locate your new Athena Source and click on the Schedule Settings (clock) icon to set the schedule


Step 4: Manually run an ad hoc load to test Athena

  • Next to your new Source, click on the Run manual load icon

  • Confirm how you want the manual run to be completed

     

  • After the source load is triggered, a pop-up bar will appear, taking you to the Monitor tab in the Batch Manager page. This is the usual page to visit to view the progress of source loads.