December 27, 2022

AWS Kinesis – The most important information

Share this

By Andrei Maksimov

December 27, 2022

kinesis

Amazon Kinesis is a collection of services for collecting, processing, and analyzing streaming data. It enables you to capture and process data as it flows in real time, so you can get timely insights and take action quickly. Kinesis is commonly used in BigData, IoT, Gaming, and Media solutions.

AWS Kinesis streaming data platform is built to process real-time video and data streams, for example, application logs and metrics (AWS Logging – Centralized Logging Best Practices), mobile data, gaming data, website clickstreams, IoT telemetry data, video streams, etc.

AWS Kinesis consists of four services:

  • Kinesis Data Streams – responsible for capturing streaming data and making it available for other AWS services for future processing
  • Kinesis Data Firehose – responsible for processing and delivering streaming data into AWS data stores (S3), analytics services (Redshift, OpenSearch), or 3rd-party services such as New Relic and Splunk, for example.
  • Kinesis Data Analytics – allows analyzing and getting actional insights from real-time streaming data with SQL queries using Apache Flink
  • Kinesis Video Streams – capturing, processing, and storing video streams for future playbacks and analytics.

Amazon Kinesis Data Streams

The Amazon Kinesis Data Streams service acts as a durable and immutable data store helping you to ingest and buffer real-time data streams to allow other AWS services to process and analyze the incoming data later. To achieve this goal, you need to set up a Kinesis stream.

Each Kinesis Data Stream consists of multiple shards. The number of shards you use defines your overall data ingestion capacity throughput.

A Quick Introduction To AWS Kinesis – Getting Started Guide - Stream structure

Each shard gives you 1 MiB/second and 1,000 records/second of write capacity, and 2 MiB/second of read capacity.

If traffic exceeds capacity, your data stream will throttle.

You can use launch Data Stream in one of the two capacity modes:

  • On-demand mode automatically scales the stream’s throughput to accommodate the traffic of up to 200 MiB per second and 200,000 records per second for the write capacity. You pay per stream per hour, and the data transfers in/out per GB.
  • Provisioned mode allows you to use a fixed number of shards in your stream. By default, the limit is set to 500 shards in N. Virginia region (AWS Global Infrastructure: Regions and Availability Zones), 200 shards in other regions, can be increased by Amazon Web Services after opening quota increase service request. You pay per shard per hour.
A Quick Introduction To Amazon Kinesis – Getting Started Guide - Data Streams Architecture

Data producers for Kinesis Data Stream

A Kinesis stream producer might be any device or application that uses SDK, Kinesis Producer Library (KPL), or AWS CLI. Additionally, a Kinesis Agent application might be installed on your server to stream, for example, application logs to the Kinesis Data Streams.

Kinesis record

Each Kinesis client streaming data records (maximum record size is 1 MB), which consist of the “Partition key” and “Data” fields:

  • The “Partition key” defines to which shard the record will go.
  • The “Data” field contains the actual transmitted data.

When the record is received by the consumer, the “Sequence number” field is added to the record by Kinesis to represent the order of the message in the shard.

Data Stream consumers

Kinesis Data Streams can be consumed and processed by any application using SDK or Amazon Kinesis Client Library (KCL), Lambda functions, Kinesis Data Firehose, and Kinesis Data Analytics.

Kinesis consumers can read records from each shard in two modes:

  • Shared – 2 MB/sec per shard throughput is dedicated to all consumers reading from the shard
  • Enhanced – 2MB/sec per shard throughput is dedicated per consumer

Amazon Kinesis Data Streams Capabilities

  • The retention period is from 24 hours and up to 365 days
  • You can reprocess any data from Kinesis Data Streams at any time during the retention period
  • Data in the Kinesis data store is immutable and can’t be deleted
  • Data with the same Partition key will go to the same shard in the order it was received

Additional features

Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose is a service for streaming data to AWS or AWS partner services in near real-time. It can optionally use the Lambda function to transform streamed data and load it into Amazon S3, Redshift, OpenSearch Service, or a third-party service.

A Quick Introduction To Amazon Kinesis – Getting Started Guide - Data Firehose Architecture

Also, you can configure Kinesis Data Firehose to convert raw streaming data from your data sources into formats like Apache Parquet and Apache ORC required by destination data stores without building your own data processing pipelines.

Data producers for Amazon Kinesis Data Firehose

Producers for the Kinesis Firehose are the same as for Kinesis Data Streams (any device or application that uses SDK, Kinesis Producer Library (KPL), AWS CLI, Kinesis Agent) and Amazon Kinesis Data Streams itself.

Kinesis Data Firehose destinations

Amazon Kinesis Firehose allows uploading streaming data in batches to three destination types:

  • AWS Services – Amazon S3, Amazon Redshift (COPY through S3 bucket), Amazon OpenSearch
  • Third-party platforms – New Relic, Dynatrace, DataDog, Splunk, and others.
  • Custom HTTP endpoint

For any option where S3 is not a part of the configuration, Kinesis Data Firehose also saves all or failed data into an S3 bucket.

Kinesis Data Firehose can capture and store data records for up to 24 hours. If data delivery fails for more than 24 hours, your data is lost.

Amazon Kinesis Data Analytics

Amazon Kinesis Data Analytics is a service that allows you to build applications to process and analyze streaming data. It can be used to detect patterns and insights and to take action in real-time based on the data that is streaming through your data pipeline.

Kinesis Data Analytics uses streamed data sent between Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (MSK) services, allowing you to analyze data in near real-time.

Kinesis Data Analytics Studio allows you to use a Studio notebook to develop your application visually using Apache Flink and Apache Zeppelin in a browser-based IDE for managing code, computation inputs and outputs, visualizations, and labels.

Amazon Kinesis Video Streams

Amazon Kinesis Video Streams is a service that captures, processes, and stores video stream data for real-time analysis and playback. It can power various applications, such as security, marketing, customer service, and IoT.

A Quick Introduction To Amazon Kinesis – Getting Started Guide - Kinesis Video Streams

Kinesis Video Streams provisions and elastically scales all required infrastructure to automatically ingest streaming video data from multiple devices. Kinesis Video Streams makes it easy to securely stream video from various devices to AWS for real-time processing, analytics, and Machine Learning (ML). Existing business intelligence tools can also digest extracted video metadata.

Free hands-on AWS workshops

Currently, AWS provides two great hands-on Kinesis workshops:

  • Real-Time Streaming with Kinesis – you’ll build a real-time data streaming and processing application from capturing real-time data, acting on real-time insights to persisting processed data.
  • Amazon Kinesis Video Streams Workshop – will teach you how to ingest and store video from camera devices, do live and on-demand playback, and download video files using Amazon Kinesis Video Streams. You will also learn how to do live face recognition and near real-time video analysis using Amazon Rekognition Video.

FAQ

What can Kinesis do?

Amazon Kinesis services can help you collect, process, and analyze streaming data to get timely insights and take action. You can use Amazon Kinesis to ingest data from various sources, including applications, websites, sensors, and social media feeds. Amazon Kinesis then makes it easy to stream that data to various destinations, including Amazon DynamoDB, S3, Amazon OpenSearch, and Redshift. You can also use Amazon Kinesis to build custom applications that analyze streaming data in real time.

What is AWS Kinesis used for?

AWS Kinesis is used to process and analyze streaming data. It can track user behavior, monitor social media, or detect fraud.

Is Kinesis the same as Kafka?

Amazon Kinesis and Kafka are two different technologies, and they are not the same. Kafka is a publish-subscribe messaging system, while Kinesis is a data stream processing service.

Is Kinesis an ETL tool?

You can configure Kinesis Data Firehose to convert raw data from your data sources into formats like Apache Parquet and Apache ORC required by destination data stores or do other data transformations on the fly using AWS Lambda, but generally speaking, Kinesis is not an ETL tool.

Summary

In this article, we’ve covered the most important information you have to know about Amazon Kinesis services and provided links to hands-on AWS workshops. If you have any questions or suggestions, let us know in the comments below.

Andrei Maksimov

I’m a passionate Cloud Infrastructure Architect with more than 15 years of experience in IT.

Any of my posts represent my personal experience and opinion about the topic.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

Related Posts

AWS Spot Instance – The most important information
AWS Elastic IP – Everything you need to know
AWS CloudTrail – The most important information
AWS Snowball – The most important information

Subscribe now to get the latest updates!

>