Working with Athena in Python using Boto3

Kelvin Galabuzi
Kelvin Galabuzi

Amazon Athena is a serverless interactive query service used to analyze data in Amazon S3. Athena analyses data sets in multiple well-known data formats such as CSV, JSON, Apache ORC, Avro, and Parquet and uses standard SQL queries, which are easy to understand and use for existing data management teams. Athena executes queries in parallel, resulting in faster query results without setting up complex ETL (Extract, Load, and Transform) data pipelines. With Athena, you can define your own data schema and query the data customized according to your business or application requirements. This article will cover how you can automate the management of Amazon Athena by using Python and the Boto3 library.

Prerequisites

To get started with Amazon Route 53 using Boto3, you need to set up your Python environment on your laptop.

As a short summary, you need to install:

  • Python 3
  • Boto3
  • AWS CLI Tools

Alternatively, you can set up and launch a Cloud9 IDE Instance.

Managing Athena named queries using Boto3

Named Queries in AWS Athena are saved query statements that make it simple to re-use query statements on data stored in S3. In the following section, we are going to look at how to manage named queries using Boto3.

Create a named query

To create a named query, use the create_named_query() method and assign the required parameters: the Name of the query, the Database, and the QueryString that you want to run on the data stored in S3.

import boto3

client = boto3.client('athena')

response = client.create_named_query(
    Name='TestQuery',
    Database='wbc',
    QueryString='SELECT * FROM "wbc"."thirdpartydata" limit 10;',
)
print(response)

Here is the execution output.

Creating a named query
Creating a named query

List named queries

To list available named queries in Athena, you can use the list_named_queries() method and pass some optional parameters such as MaxResults, which allows you to specify the number of queries to return, and the WorkGroup parameter, which sets the workgroup from which the queries are being returned.

import boto3

client = boto3.client('athena')

response = client.list_named_queries(
    MaxResults=10,
    WorkGroup='primary'
)
print(response)

Here is the execution output.

Listing named queries
Listing named queries

Get a named query

To get a named query, you need to use the get_named_query() method and assign the required parameter, which is the NamedQueryId.

import boto3

client = boto3.client('athena')

response = client.get_named_query(
    NamedQueryId='b1fc5c6e-cd83-451e-b69f-6d7b1ae73795'
)
print(response)

Here is the execution output:

Getting a named query
Getting a named query

Batch get named queries

To get detailed information about multiple named queries, you can use the batch_get_named_query() and assign the required parameter, the NamedQueryIds. Next, you can pass an array of query IDs to the parameter.

import boto3

client = boto3.client('athena')
response = client.batch_get_named_query(
    NamedQueryIds=[
        '9a09d73c-6a9a-4b86-8c33-1ecc68e161de',
        '1dc56214-96e0-47ef-8944-1882c4043da2'
    ]
)
print(response)

Here is the execution output.

Batch get named queries
Batch get named queries

Delete a named query

To delete a named query, you can use the delete_named_query() method and pass the required parameter, which is the NamedQueryId

import boto3

client = boto3.client('athena')
response = client.delete_named_query(
    NamedQueryId='b1fc5c6e-cd83-451e-b69f-6d7b1ae73795'
)
print(response)

Here is the execution output.

Deleting named query
Deleting named query

Managing Athena prepared query statements using Boto3

Prepared Statements in Amazon Athena enable you to prepare SQL statements that can be used for routine execution of the same query with different query parameters. Using prepared statements introduces the reusability of SQL queries and security against SQL injections. In the following section, we are going to discuss the various ways that you can work with prepared statements using Boto3 in Python.

Create a prepared statement

To create a prepared statement, use the create_prepared_statement() method and pass the required parameters: the StatmentName, WorkGroup name, and QueryStatement.

import boto3

client = boto3.client('athena')
response = client.create_prepared_statement(
    StatementName='Test_Statement',
    WorkGroup='primary',
    QueryStatement='SELECT * FROM "wbc"."thirdpartydata";',
)
print(response)

Here is the execution output:

Creating prepared statement
Creating prepared statement

Get a prepared statement

To get a prepared statement, you can use the get_prepared_statement() method and pass the required parameters which are the StatementName and WorkGroup name.

import boto3

client = boto3.client('athena')
response = client.get_prepared_statement(
    StatementName='Test_Statement',
    WorkGroup='primary'
)
print(response)

Here is the execution output.

Get a prepared statement
Getting a prepared statement

List prepared statements

To list prepared statements, use the list_prepared_statements() method and pass a required parameter called the WorkGroup name.

import boto3


client = boto3.client('athena')
response = client.list_prepared_statements(
    WorkGroup='primary',
)
print(response)

Here is the execution output:

Listing prepared statements
Listing prepared statements

Update a prepared statement

To update a prepared statement, you can use the update_prepared_statement() parameter and assign the required parameters, which are the StatementName, WorkGroup, and QueryStatement

import boto3

client = boto3.client('athena')
response = client.update_prepared_statement(
    StatementName='Test_Statement',
    WorkGroup='primary',
    QueryStatement='SELECT * FROM "wbc"."wbcdata";',
)
print(response)

Here is the execution output:

Updating prepared statement
Updating prepared statement

Delete a prepared statement

To delete a prepared statement, you can use the delete_prepared_statement() method and assign the required parameters: the StatementName and WorkGroup.

import boto3

client = boto3.client('athena')
response = client.delete_prepared_statement(
    StatementName='Test_Statement',
    WorkGroup='primary'
)
print(response)

Here is the execution output:

Deleting prepared statement

Managing Athena query executions using Boto3

Start query execution

To start the execution of a new query, you can use the start_query_execution() method and pass the required parameter, the QueryString and assign the s3 output location of the query results.

import boto3

client = boto3.client('athena')
response = client.start_query_execution(
    QueryString='SELECT * FROM "wbc"."wbcdata";',
    QueryExecutionContext={
        'Database': 'wbc'
    },
     ResultConfiguration={
        'OutputLocation': 's3://aws-athena-query-results-wbc01/',
        },
    WorkGroup='primary'
)
print(response)

Here is the execution output:

Starting query execution
Starting query execution

Stop query execution

To stop query execution, use the stop_query_execution() method and assign the required parameter, the QueryExecutionId of the query, you would like to stop.

import boto3

client = boto3.client('athena')
response = client.stop_query_execution(
    QueryExecutionId='da20d77e-b714-465f-a98f-b3586da66585'
)
print(response)

Here is the execution output.

Stopping Athena query execution
Stopping query execution

Get query execution

To get information about a query execution, use the get_query_execution() method and assign the required parameter, which is the QueryExecutionId

import boto3

client = boto3.client('athena')
response = client.get_query_execution(
    QueryExecutionId='da20d77e-b714-465f-a98f-b3586da66585'
)
print(response)

Here is the execution output.

Getting query execution information
Getting query execution information

Get query execution results

To stream the results from a query execution, use the get_query_results() method and assign the required parameter, which is the QueryExecutionId

import boto3

client = boto3.client('athena')
response = client.get_query_results(
    QueryExecutionId='da20d77e-b714-465f-a98f-b3586da66585',
    MaxResults=10
)
print(response)

Here is the execution output.

Streaming query results
Get query results

List query executions

To list query executions, you can use the list_query_executions() and assign the WorkGroup parameter, which specifies the workgroup you want to get the results from.

import boto3

client = boto3.client('athena')
response = client.list_query_executions(
    WorkGroup='primary',
    MaxResults=12
)
print(response)

Here is the execution output.

Listing query executions
Listing query executions

Batch get query executions

To return details of one or more query executions, you can use the batch_get_query_execution() method and assign the query execution IDs to the QueryExecutionIds parameter as an array of strings.

import boto3

client = boto3.client('athena')
response = client.batch_get_query_execution(
    QueryExecutionIds=[
        '3a427d8f-2582-49b5-a752-1d87b8fe9107',
        'f395e1f6-4a68-4ead-bb8d-cb7064a12341'
    ]
)
print(response)

Here is the execution output.

Batch get query executions
Batch get query executions

Managing Athena workgroups using Boto3

Workgroups in AWS Athena are used to separate users, teams, applications e.t.c., and to set limits on the query execution workloads. This enables the ease of management of query execution requirements of various teams when using the same instance of Amazon Athena. In the following section, we are going to look at how to manage Amazon Athena workgroups using Boto3 in Python.

Create a work group

To create an Amazon Athena workgroup, you can use the create_work_group() method and assign the required parameter: the Name of the workgroup.

import boto3

client = boto3.client('athena')
response = client.create_work_group(
    Name='AITeam',
    Configuration={
        'ResultConfiguration': {
            'OutputLocation': 's3://aiteam01/'
        }
    }
)
print(response)

Here is the execution output.

Creating a workgroup
Creating a workgroup

List work groups

To list all the available workgroups, you can use the list_work_groups() method.

import boto3

client = boto3.client('athena')
response = client.list_work_groups()
print(response)

Here is the execution output.

Listing work groups
Listing workgroups

Get a work group

To get a specific workgroup, you can use the get_work_group() method and assign the required parameter, the WorkGroup name.

import boto3

client = boto3.client('athena')
response = client.get_work_group(
    WorkGroup='AITeam'
)
print(response)

Here is the execution output.

Getting a workgroup
Getting a workgroup

Updating a workgroup

To update the properties of a specified workgroup, use the update_work_group() method and assign the required parameter which is the WorkGroup name.

import boto3

client = boto3.client('athena')
response = client.update_work_group(
    WorkGroup='AITeam',
    ConfigurationUpdates={
        'ResultConfigurationUpdates': {
            'OutputLocation': 's3://wbc-card-data01/',
            }
    }
)
print(response)

Here is the execution output.

Updating a workgroup
Updating a workgroup

Deleting a workgroup

To delete a workgroup, use the delete_workgroup() method and assign the required parameter, which is the WorkGroup name. Additionally, you can use the RecursiveDeleteOption parameter to delete the workgroup and its contents, such as named queries ad query executions.

import boto3

client = boto3.client('athena')
response = client.delete_work_group(
    WorkGroup='AITeam',
    RecursiveDeleteOption=True
)
print(response)

Here is the execution output.

Deleting a workgroup
Deleting a workgroup

Managing Athena data catalogs using Boto3

To work with Athena tables and databases, you need a data catalog, and tables are registered with a data catalog to store and retrieve metadata which Athena can then query. In the following section, we will look at how to work with data catalogs using boto3.

Create a data catalog

To create a data catalog, you can use the create_data_catalog() method and assign the required parameters: the Name and the Type of the data catalog.

The parameter catalog-id is the account ID of your AWS account.

import boto3

client = boto3.client('athena')
response = client.create_data_catalog(
    Name='catalog01',
    Type='GLUE',
    Parameters={
        'catalog-id':'AWS Account ID'
    }
)
print(response)

Here is the execution output.

Creating a data catalog
Creating a data catalog

List data catalogs

To list available data catalogs in your AWS account, you can use the list_data_catalogs() method.

List data catalogs
List data catalogs

Get a data catalog

To get information about a data catalog, you can use the get_data_catalog() method and pass the required parameter, the Name of the data catalog.

import boto3

client = boto3.client('athena')
response = client.get_data_catalog(
    Name='catalog01'
)
print(response)

Here is the execution output.

Getting a data catalog
Getting a data catalog

Update a data catalog

To update a data catalog, use the update_data_catalog() method and pass the required parameters: the Name and Type of the data catalog.

client = boto3.client('athena')
response = client.update_data_catalog(
    Name='catalog01',
    Type='GLUE',
    Description='Test environment data catalog',
    Parameters={
        'catalog-id':'AWS Account ID'
    }
)
print(response)

Here is the execution output.

Updating a data catalog

Delete a data catalog

To delete a data catalog, you can use the delete_data_catalog() and assign the required parameter: the Name of the data catalog. 

import boto3

client = boto3.client('athena')
response = client.delete_data_catalog(
    Name='catalog01'
)
print(response)

Here is the execution output.

Deleting a data catalog
Deleting a data catalog

Managing Athena table metadata using Boto3

List table metadata

To list metadata for tables in a specified data catalog, use the list_table_metdata() method and pass the required parameters, which are the CatalogName and the DatabaseName

import boto3

client = boto3.client('athena')
response = client.list_table_metadata(
    CatalogName='AwsDataCatalog',
    DatabaseName='wbc',
)
print(response)

Here is the execution output.

Listing table metadata
Listing table metadata

Get table metadata

To get the table metadata for a specified data catalog, database, and table, you can use the get_table_metdata() method and assign the required parameters: CatalogName, DatabaseName, and TableName.

import boto3

client = boto3.client('athena')
response = client.get_table_metadata(
    CatalogName='AwsDataCatalog',
    DatabaseName='wbc',
    TableName='wbcdata'
)
print(response)

Here is the execution output.

Getting table metadata
Getting table metadata

Managing resource tags using Boto3

Resource tags are used to group Amazon Web Service resources for different purposes, such as grouping and easing the identification of multiple environments and other use cases such as budgeting and cost allocation.

In the following section, we will describe the different ways that you can work with resource tags in Amazon Athena.

Add resource tag

To add a tag to an Amazon Athena resource such as workgroup or data catalog, use the tag_resource() method and pass the required parameters, the ResourceARN and Tags parameter.

import boto3

client = boto3.client('athena')
response = client.tag_resource(
    ResourceARN='arn:aws:athena:us-east-2:585584209241:workgroup/primary',
    Tags=[
        {
            'Key': 'Environment',
            'Value': 'Test'
        },
    ]
)
print(response)

Here is the execution output.

Adding a tag to a resource
Adding a tag to a resource

List resource tags

To list tags that are either associated with an Athena workgroup or data catalog, you can use the list_tags_for_resource() method and pass the required parameter, which is the ResourceARN

import boto3

client = boto3.client('athena')
response = client.list_tags_for_resource(
    ResourceARN='arn:aws:athena:us-east-2:585584209241:workgroup/primary'

)
print(response)

Here is the execution output.

Listing resource tags
Listing resource tags

Untag resource

To untag a resource, use the untag_resource() method and pass the required parameter, which is the ResourceARN

import boto3

client = boto3.client('athena')
response = client.untag_resource(
    ResourceARN='arn:aws:athena:us-east-2:585584209241:workgroup/primary',
    TagKeys=[
        'Environment',
    ]
)
print(response)

Here is the execution output.

Untag resource
Untag resource

List Athena engine versions

To list engine versions that you can work with, you can use the list_engine_versions() method.

import boto3

client = boto3.client('athena')
response = client.list_engine_versions()
print(response)

Here is the execution output.

Listing engine versions
Listing engine versions

Summary

This article covered Python and the Boto3 library to perform some common management tasks on AWS Athena.

If you’d like to learn more about using the Boto3 library, especially in combination with AWS Lambda, we encourage you to check out one of the top-rated Udemy courses on the topic – AWS Automation with Boto3 of Python and Lambda Functions.

Like this article?

Share on Facebook
Share on Twitter
Share on Linkdin
Share on Pinterest

Want to be an author of another post?

We’re looking for skilled technical authors for our blog!

Leave a comment

If you’d like to ask a question about the code or piece of configuration, feel free to use https://codeshare.io/ or a similar tool as Facebook comments are breaking code formatting.