AWS DataSync – The most important information

Transferring large amounts of data from on-premises to the cloud can be time-consuming and tricky, especially with limited bandwidth or slow network connections. Transferring and synchronizing data between cloud providers can be challenging too. Thankfully, AWS released the AWS DataSync service, which is designed to address these challenges.

This article covers the most important information about AWS DataSync that you need to know before deciding whether to use it. Let’s get started!

What is AWS DataSync?

AWS DataSync is an online data transfer service that simplifies and automates moving and synchronizing data between on-premises storage systems and AWS storage services, such as Amazon S3, Amazon EFS, or Amazon FSx file systems. It also makes data transfers into and out of AWS faster and more secure.

DataSync offers several features, including data integrity checks, automatic scheduling of transfers, encryption in transit, synchronization of file systems across AWS Regions, and the ability to run multiple tasks simultaneously.

The concept is simple: you deploy the AWS DataSync agent as a virtual machine in your environment and connect it to the shared file system over the NFS or SMB protocol. The agent then reads data from the file system and writes it to AWS, or copies data from AWS back to the file system.
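To make the agent-plus-file-system model more concrete, here is a minimal sketch of the request parameters for registering an NFS share, an SMB share, and an S3 bucket as DataSync "locations". The parameter names follow the DataSync API (CreateLocationNfs, CreateLocationSmb, CreateLocationS3); the helper function names, ARNs, hostnames, and paths are hypothetical placeholders. The sketch only builds dictionaries and makes no AWS calls; with boto3 installed and credentials configured, you could pass a dictionary as, for example, `boto3.client("datasync").create_location_nfs(**nfs)`.

```python
# Sketch only: builds DataSync request dicts and makes no AWS calls.
# Helper names, ARNs, hostnames, and paths are hypothetical placeholders.


def nfs_location_params(agent_arn: str, server_hostname: str, subdirectory: str) -> dict:
    """CreateLocationNfs parameters: the agent mounts this NFS export."""
    return {
        "ServerHostname": server_hostname,
        "Subdirectory": subdirectory,
        "OnPremConfig": {"AgentArns": [agent_arn]},
        "MountOptions": {"Version": "AUTOMATIC"},  # let DataSync pick the NFS version
    }


def smb_location_params(agent_arn: str, server_hostname: str, subdirectory: str,
                        user: str, password: str, domain: str = "") -> dict:
    """CreateLocationSmb parameters: the agent mounts this SMB share."""
    params = {
        "ServerHostname": server_hostname,
        "Subdirectory": subdirectory,
        "User": user,
        "Password": password,
        "AgentArns": [agent_arn],
    }
    if domain:  # only include the Windows domain when one is given
        params["Domain"] = domain
    return params


def s3_location_params(bucket_arn: str, role_arn: str, subdirectory: str = "/") -> dict:
    """CreateLocationS3 parameters: the IAM role lets DataSync access the bucket."""
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": subdirectory,
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }


if __name__ == "__main__":
    agent = "arn:aws:datasync:us-east-1:111122223333:agent/agent-0example"
    nfs = nfs_location_params(agent, "nas.example.internal", "/exports/projects")
    s3 = s3_location_params(
        "arn:aws:s3:::example-target-bucket",
        "arn:aws:iam::111122223333:role/ExampleDataSyncS3Role",
        "/projects",
    )
    print(nfs)
    print(s3)
```

Each create call returns a location ARN; a transfer task then pairs a source location with a destination location.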

AWS DataSync automatically encrypts your data and accelerates its transfer over the wide area network (WAN). It also performs automatic data integrity checks in transit and at rest.

DataSync seamlessly connects to AWS services such as Amazon Simple Storage Service (S3), Amazon FSx for Windows File Server, or Amazon Elastic File System (EFS) to transfer data and metadata to and from AWS.

DataSync also supports data transfers to and from the Hadoop Distributed File System (HDFS), object storage systems, Google Cloud Storage buckets, and Azure Files.

Although DataSync can run over the internet (for example, through a VPN connection), a dedicated AWS Direct Connect connection is recommended for large data transfers.
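A quick back-of-the-envelope estimate shows why link capacity matters for large transfers. This sketch assumes decimal units, a steady effective throughput (link speed times a utilization factor, 80% by default, an arbitrary assumption), and ignores protocol overhead, compression, and any DataSync-side throughput limits. The link speeds in the usage block are illustrative only.

```python
def transfer_time_hours(data_terabytes: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the hours needed to move data over a network link.

    Assumes 1 TB = 10**12 bytes, 1 Mbps = 10**6 bit/s, and a constant
    effective throughput of link_mbps * utilization.
    """
    if link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_mbps must be positive and utilization in (0, 1]")
    bits = data_terabytes * 10**12 * 8
    bits_per_second = link_mbps * 10**6 * utilization
    return bits / bits_per_second / 3600


if __name__ == "__main__":
    # Illustrative link speeds: 100 Mbps, 1 Gbps, 10 Gbps.
    for mbps in (100, 1_000, 10_000):
        print(f"{mbps:>6} Mbps: {transfer_time_hours(10, mbps):.1f} h for 10 TB")
```

Under these assumptions, 10 TB takes about 277.8 hours (roughly 11.6 days) at 100 Mbps, 27.8 hours at 1 Gbps, and 2.8 hours at 10 Gbps.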

Features

AWS DataSync features include:

  • Supports NFS, SMB, and HDFS file systems, object storage, Google Cloud Storage, and Azure Files
  • Scheduled replication (hourly, daily, weekly)
  • A DataSync agent is required for transfers involving on-premises storage or other clouds
  • Data encryption
  • Data integrity checks

Requirements

AWS DataSync requirements are straightforward:

  • A deployed DataSync agent
  • Network connectivity between the agent and the supported target storage system
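One simple way to sanity-check the connectivity requirement before deploying is to test whether the storage system's protocol port accepts TCP connections from a machine on the same network as the agent. This is a minimal sketch, assuming the default ports for NFS (TCP 2049) and SMB (TCP 445); the `can_reach` helper is hypothetical. It only confirms that a TCP port is open, not that mounting or authentication will succeed, and it does not cover the agent's outbound connectivity to AWS.

```python
import socket

# Default TCP ports for the file-sharing protocols the agent uses.
DEFAULT_PORTS = {"nfs": 2049, "smb": 445}


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

For example, `can_reach("nas.example.internal", DEFAULT_PORTS["nfs"])` (with a hypothetical hostname) returns `True` only if the NFS port on that host is accepting connections.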

Use Cases

AWS DataSync can transfer large amounts of data to and from AWS and covers the following use cases:

  • Data migration or replication from on-premises to AWS
  • Data replication from AWS to on-premises or another cloud provider
  • Data replication between AWS storage services, e.g., EFS to EFS or EFS to FSx (these transfers do not require an agent)

Additional use cases include:

  • Data migration
  • Archiving cold data
  • Data protection
  • Data movement for timely in-cloud processing

FAQ

What is the difference between DataSync and Storage Gateway?

DataSync is a data transfer service that uses agents to securely move data from on-premises storage to AWS, or from AWS to on-premises storage or another cloud platform, using protocols such as NFS and SMB. Storage Gateway is a hybrid-cloud storage appliance, closely integrated with Amazon S3, that extends on-premises storage to the cloud. Storage Gateway caches local storage operations and provides on-premises clients with iSCSI block storage devices as well as SMB and NFS file shares.

What protocol does AWS DataSync use?

AWS DataSync transfers data to and from file systems compatible with the Network File System (NFS) and Server Message Block (SMB) protocols. It also supports HDFS and object storage systems.