Cross-account Athena query

Update (13/5/2021): Looks like AWS's introduced native support for cross-account Athena's queries. Take a look here

Imagine the world in which you have different AWS accounts for different purposes. There’s dev/staging/production AWS account for your shiny software. Also somewhere in the clouds, there’s AWS account used by the data team to prepare one source of truth about your universe (AWS Glue, S3 buckets, catalogues, Athena etc. etc.). Something like that:

Do you see that? Cool.

Now you’re adding a new feature to your software, but it requires to query the data from this huge data store. Peanuts! You modify your role to give it access to the data team’s Athena in another AWS account and you run a query. Well and here things get complicated. You can not make cross-account Athena queries just like that. It requires more effort and involves Lambda functions.

The approach is described in AWS documentation. In that article, I’ll try to help you set up this quickly using Terraform.

Assumptions

As you can see we need two AWS accounts:

  • Account A — the account in which you have S3 buckets with data and AWS Glue catalogue.
  • Account B — the account in which you have your own Athena you’d like to use AWS Glue from Account A and query data in the S3 bucket belonging to Account A.

Here we will not look at how to setup Glue, S3 buckets for catalogue etc. We will just focus on a cross-account query.

Step 1

We need Lambda which will query AWS Glue catalogue. You can use the function from AWS Labs project. So grab from here or use one provided in Terraform module. In this project, you’ll also find more details about the required IAM roles etc. It’s really worth to check their README.md.

Step 2

You need your IAM role which will allow you to:

  • execute Lambda from step 1.
  • query the Glue catalogue in Account A.
  • get access to S3 bucket with data in Account A.
  • run queries and get results from Athena placed in Account B.

Step 3

Create the Athena’s data source.

Terraform

It’s quite easy to perform that steps through AWS console but here you can find Terraform module which will create everything for you. When you use this module you will have one important outcome: ARN of Lambda execution role. You will need to use this ARN in Account A to give Lambda access to Glue catalogue (check section What about Account A).

Current situation

After running Terraform you should have:

  • Lambda and it’s execution role ARN.
  • New data source in Athena — catalogue type: Hive metastore connected to your Lambda.

The final step in Account B

So right now we have the infrastructure to access Glue catalogue in Account A. We need one more thing — the role which you can attach for example to your ECS Fargate task so you can make actual queries to the data in Account A. So you need to create new role (and have its ARN) and attach such policy to it:

This policy allows:

  • write access bucket with results in Account B;
  • read-only access to the bucket with data in Account A;
  • Lambda execution;
  • Run Athena’s queries;

What about Account A

Account A should allow for cross-account access by:

  1. Attaching policy to Glue catalogue

2. Attaching policy to S3 bucket with data:

You should put some effort to specify S3 buckets ARNs properly so the role has access to proper data.

IAM_ROLE_ACCOUNT_B it’s the role with the policy created in Account B which is used for example by ECS Fargate task.

And that’s it!

Using this approach you can give access also to users in different account — it’s just about setting IAM in Account B and Account A.

Summary

I hope that this article was helpful to you. If so please drop a comment or clap :)

Tech Lead / Senior Software Engineer @ Zoopla

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store