26-01-16

AWS Database Choices, Use Cases and Characteristics

I’m currently studying for my AWS Certified Solutions Architect Pro exam, and one of the main topics I always seem to have something of a blind spot about is databases. Below I’ve tried to summarise some points about RDS and the other database offerings on AWS as a quick reference guide for the exam. If it helps you either for the exam or in real life, great!

The notes were put together using AWS documentation and Re:Invent 2015 videos. If you spot any factual errors, please tweet me @ChrisBeckett and I’ll correct them.

Key differences between SQL and NoSQL

  • NoSQL is schema-less, with easy reads and writes and a simple data model
  • NoSQL scaling is easy
  • NoSQL focusses on performance and availability at any scale
  • SQL has a strong schema, complex relationships, transactions and joins
  • SQL scaling is difficult
  • SQL focusses on data consistency over scale and availability
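To make the contrast concrete, here is a minimal sketch using only Python's standard library. The in-memory SQLite database stands in for a relational engine and a plain dict stands in for a key-value NoSQL table. The table names, columns and sample data are invented for illustration and are not taken from any AWS service.

```python
import sqlite3

# Relational side: the schema is declared up front and enforced.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL REFERENCES users(id), total REAL NOT NULL)"
)

# A transaction: both inserts commit together, or neither does.
with conn:
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
    conn.execute("INSERT INTO orders (id, user_id, total) VALUES (10, 1, 25.0)")

# A join across the relationship between the two tables.
sql_rows = conn.execute(
    "SELECT users.name, orders.total FROM users "
    "JOIN orders ON orders.user_id = users.id"
).fetchall()

# NoSQL side (hypothetical stand-in): items are fetched by key and
# do not have to share the same attributes.
nosql_table = {
    "user#1": {"name": "Alice", "last_order_total": 25.0},
    "user#2": {"name": "Bob", "device": "mobile"},  # different attributes
}
item = nosql_table["user#1"]
```

The relational half buys you joins and transactions at the cost of a fixed schema. The key-value half gives simple reads and writes by key, which is the access pattern that scales easily.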

What is DynamoDB?

  • NoSQL database offering
  • Fully managed by AWS (no need for separate EC2 instances, etc)
  • Single digit millisecond latency
  • Massive and seamless scalability at low cost

Use cases for DynamoDB

  • Internet of Things (Tracking data, real time notifications, high volumes of data)
  • Ad Tech (ad serving, ID lookup, session tracking)
  • Gaming (gaming leader boards, usage history, logs)
  • Mobile and Web (Storing user profiles, session data)
  • All above use cases require high performance, high scale and high volume

DynamoDB Characteristics

  • Writes are automatically replicated across three Availability Zones in a single region and persisted to SSD. A write is confirmed once the master copy and one replica have been updated
  • Reads can be eventually or strongly consistent. Strongly consistent reads are served from the master copy only, consume twice the read capacity of eventually consistent reads and may have slightly higher latency
  • DynamoDB consists of tables; tables have items and items have attributes. Because there is no fixed schema beyond the key, every item must carry the table's primary key (a hash key, optionally with a range key) so it can be identified
  • To scale, you tell DynamoDB what read and write throughput you need and it provisions the underlying infrastructure for you
  • Pay for the amount of storage used and the amount of throughput (reads and writes)
  • Free tier entitlement of 25GB and 60 million reads and 60 million writes per month. Very cost effective
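The throughput you tell DynamoDB you need is expressed in read and write capacity units. The sketch below estimates those numbers for provisioned throughput. It assumes the unit sizes as I understand them from the AWS documentation (not from the notes above): one read unit covers one strongly consistent read per second of an item up to 4KB, or two eventually consistent reads, and one write unit covers one write per second of an item up to 1KB. The function names are my own.

```python
import math

def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Estimate read capacity units, rounding item size up to 4KB blocks."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much (assumption above).
        total = math.ceil(total / 2)
    return total

def write_capacity_units(item_size_kb, writes_per_second):
    """Estimate write capacity units, rounding item size up to 1KB blocks."""
    return math.ceil(item_size_kb) * writes_per_second

# Example: 6KB items, 100 reads and 50 writes per second.
strong_reads = read_capacity_units(6, 100)
eventual_reads = read_capacity_units(6, 100, strongly_consistent=False)
writes = write_capacity_units(6, 50)
```

With these assumptions, keeping items small and accepting eventually consistent reads where possible both reduce the throughput you have to pay for.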

What is RDS?

  • Fully managed relational databases (MySQL, PostgreSQL, SQL Server, Oracle, Aurora)
  • Fast, predictable performance
  • Simple and fast to scale
  • Low cost, pay for what you use

Use cases for RDS

  • Anything that requires a SQL back end, such as existing corporate applications
  • RDS supports VPC, high availability, instance scaling, encryption and read replicas for Aurora, MySQL and PostgreSQL
  • MySQL even supports cross-region deployment
  • Maximum storage is 6TB for Oracle, PostgreSQL and MySQL, 64TB for Aurora and 4TB for SQL Server
  • Storage can be scaled on all engines except SQL Server
  • Provisioned IOPS of up to 30,000 for MySQL, PostgreSQL and Oracle, and up to 20,000 for SQL Server
  • Largest supported RDS instance is r3.8xlarge on all platforms
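For exam-style "which engine fits?" questions, the storage limits above can be turned into a small lookup. This is only a sketch: the figures are the ones quoted in these notes (early 2016) and will have changed since, and the dictionary and function names are my own.

```python
# Maximum storage per engine in TB, as quoted in the notes above.
# These are a snapshot in time; check the current AWS documentation.
RDS_MAX_STORAGE_TB = {
    "mysql": 6,
    "postgresql": 6,
    "oracle": 6,
    "sqlserver": 4,
    "aurora": 64,
}

def engines_with_enough_storage(required_tb):
    """Return the engines whose storage limit covers the requirement."""
    return sorted(
        engine
        for engine, limit_tb in RDS_MAX_STORAGE_TB.items()
        if limit_tb >= required_tb
    )

candidates = engines_with_enough_storage(10)
```

Under these figures, a 10TB requirement leaves Aurora as the only candidate.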

What is Aurora?

  • 99.99% availability
  • 5x faster than MySQL on the same hardware
  • Distributed storage layer, so scales better
  • Data replicated six times across three availability zones
  • 15 read replicas maximum
  • Fully MySQL compatible
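It can help to translate the 99.99% availability figure into a downtime budget. The arithmetic below is a plain calculation, not an AWS formula, and the function name is my own.

```python
def allowed_downtime_minutes(availability_percent, period_days=365):
    """Minutes of downtime permitted over the period at a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * 24 * 60

yearly_budget = allowed_downtime_minutes(99.99)
```

For 99.99% over a 365-day year this works out to roughly 52.6 minutes.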

RDS Characteristics

  • Supports three types of storage
    • General purpose SSD for most use cases
    • Provisioned IOPS for guaranteed storage performance of up to 30,000 IOPS
    • Magnetic for inexpensive, very small workloads
  • Multi-AZ support provides automatic failover and synchronous replication, and is inexpensive and simple to set up
  • Can create read replicas in other regions, which can be promoted to a master for easy DR or used to put data close to the users that need it
  • Pay for what you consume; some free tier entitlements exist for RDS (20GB of data storage, 20GB of backups, 10 million I/Os, 750 micro DB instance hours)

What is ElastiCache?

  • In memory key value store
  • High performance
  • Choice of Redis or Memcached
  • Fully managed, zero admin

ElastiCache Use Cases

  • Caching layer for performance or cost optimisation of an underlying database
  • Storage of ephemeral key-value data
  • High performance application patterns such as session management, event counters, etc
  • Memcached is the simpler of the two. It supports cache node auto discovery and multi-AZ node placement
  • Redis is Multi-AZ with auto failover, persistence and read replicas
  • Redis handles more complex data types
  • Monthly bill is the number of nodes multiplied by the duration they were used for
  • Some free tier eligibility – 750 micro cache node hours
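The caching-layer use case is usually implemented as a "cache-aside" pattern: check the cache first and only go to the database on a miss. The sketch below shows the idea with a plain dict standing in for Redis or Memcached. The class, the `slow_db_lookup` function and the TTL value are all hypothetical, and no ElastiCache client library is used.

```python
import time

class CacheAside:
    """Minimal cache-aside wrapper around a slow lookup function."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from the database and repopulate.
        self.misses += 1
        value = self._load(key)
        self._store[key] = (value, self._clock() + self._ttl)
        return value

db_calls = []

def slow_db_lookup(user_id):
    """Stand-in for a query against the underlying database."""
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}

cache = CacheAside(slow_db_lookup, ttl_seconds=30.0)
first = cache.get(42)   # miss: goes to the database
second = cache.get(42)  # hit: served from the cache
```

The second call never touches the database, which is where the performance and cost savings come from. Because cached entries are ephemeral, the database remains the source of truth.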

What is Amazon Redshift?

  • Relational data warehouse
  • Massively parallel, petabyte scale
  • Fully managed

Redshift Use Cases

  • Leverage BI tools, Hadoop, machine learning, streaming
  • Pay as you go, grow as you need
  • Analysis in line with data flows
  • Managed availability and data recovery

Redshift Characteristics

 

  • HDD and SSD platforms
  • Architecture has a leader node (endpoint for communication) and compute nodes. These are linked by 10Gbps networking
  • Leader node also stores metadata and optimises the query plan
  • Compute nodes have local columnar storage and carry out queries, backups, loads, restores and resizes in a distributed, parallel fashion
  • Backed up continuously and incrementally to S3 and across regions
  • Streamed restores
  • Tolerates disk failures, node failures, network failures and AZ/region failures
  • Leader node is free, compute nodes are charged as they are used
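The leader and compute node split can be pictured with a toy model. In the sketch below, each "compute node" holds a slice of every column, the "leader" fans a query out to all nodes in parallel and then merges the partial results. This is only an illustration of the idea in plain Python; the data, function names and use of threads are my own assumptions and say nothing about how Redshift is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Columnar layout: each compute node stores a slice of every column.
compute_nodes = [
    {"region": ["eu", "us", "eu"], "sales": [10, 20, 30]},
    {"region": ["us", "us", "eu"], "sales": [5, 15, 25]},
    {"region": ["eu", "ap", "ap"], "sales": [40, 50, 60]},
]

def partial_sum_by_region(node):
    """Work done on one compute node: aggregate its local slice."""
    totals = {}
    for region, amount in zip(node["region"], node["sales"]):
        totals[region] = totals.get(region, 0) + amount
    return totals

def leader_query(nodes):
    """Leader: run the partial aggregation on every node, then merge."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(partial_sum_by_region, nodes))
    merged = {}
    for partial in partials:
        for region, amount in partial.items():
            merged[region] = merged.get(region, 0) + amount
    return merged

result = leader_query(compute_nodes)
```

The query only reads the two columns it needs, and each node works on its own slice independently, which is why this style of architecture suits large analytical scans.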

 

 
