26-01-16

AWS Database Choices, Use Cases and Characteristics

I’m currently studying for my AWS Certified Solutions Architect Professional exam, and one of the main topics I always seem to have something of a blind spot about is databases. Below I’ve tried to summarise some points about RDS and the other database offerings on AWS as a quick reference guide for the exam. If it helps you either for the exam or in real life, great!

The notes were put together using AWS documentation and re:Invent 2015 videos. If you spot any factual errors, please tweet me @ChrisBeckett and I’ll correct them.

Key differences between SQL and NoSQL

NoSQL is schema-less, with simple reads and writes and a simple data model
NoSQL scales out easily
NoSQL focusses on performance and availability at any scale
SQL has a strong schema and supports complex relationships, transactions and joins
SQL is harder to scale
SQL favours data consistency over scale and availability
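
To make the contrast concrete, here is a minimal sketch using only the Python standard library. The in-memory SQLite database stands in for a relational engine and a plain dictionary stands in for a key-value store; the table names, keys and attributes are made up for illustration and are not AWS APIs.

```python
import sqlite3

# Relational side: a fixed schema, a foreign key and a join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL REFERENCES users(id), "
    "total REAL NOT NULL)"
)
with conn:  # one transaction: both inserts commit together or not at all
    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")

sql_rows = conn.execute(
    "SELECT u.name, o.total FROM users u JOIN orders o ON o.user_id = u.id"
).fetchall()

# Key-value side: items are fetched by key and need not share attributes.
kv_table = {
    "user#1": {"name": "alice", "orders": [{"id": 10, "total": 25.0}]},
    "user#2": {"name": "bob", "twitter": "@bob"},  # extra attribute, no schema change
}
kv_item = kv_table["user#1"]

print(sql_rows)         # [('alice', 25.0)]
print(kv_item["name"])  # alice
```

The relational version answers the question with a join across two tables; the key-value version stores the order inside the user item so that a single key lookup returns everything.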

What is DynamoDB?

NoSQL database offering
Fully managed by AWS (no need for separate EC2 instances, etc.)
Single-digit millisecond latency
Massive and seamless scalability at low cost

Use cases for DynamoDB

Internet of Things (tracking data, real-time notifications, high volumes of data)
Ad Tech (ad serving, ID lookup, session tracking)
Gaming (leaderboards, usage history, logs)
Mobile and Web (storing user profiles, session data)
All of the above use cases require high performance at high scale and high volume

DynamoDB Characteristics

Writes are automatically replicated across three Availability Zones in a single region and persisted to SSD. A write is confirmed once the master copy and one replica are updated
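Reads can be eventually or strongly consistent; strongly consistent reads are served from the master copy only. AWS documentation notes that strongly consistent reads use more read capacity and may have higher latency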
DynamoDB consists of tables, tables have items, items have attributes. Because there is no fixed schema, every item must carry a hash key (item ID) that identifies it in its table
To scale, you tell DynamoDB the read and write throughput you need and it provisions the infrastructure for you
Pay for the amount of storage used and the amount of throughput (reads and writes)
Free tier entitlement of 25GB and 60 million reads and 60 million writes per month. Very cost effective
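
Since you pay for throughput, it helps to be able to estimate it. The sketch below is my own helper, not an AWS API, and assumes the provisioned capacity rules from the DynamoDB documentation: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one write capacity unit covers one write per second of an item up to 1 KB, with item sizes rounded up.

```python
import math

def read_capacity_units(reads_per_sec, item_kb, strongly_consistent=True):
    """Estimate read capacity units for a steady read rate."""
    units_per_read = math.ceil(item_kb / 4)  # reads are billed in 4 KB steps
    total = reads_per_sec * units_per_read
    if not strongly_consistent:
        total = total / 2  # eventually consistent reads cost half
    return math.ceil(total)

def write_capacity_units(writes_per_sec, item_kb):
    """Estimate write capacity units for a steady write rate."""
    return writes_per_sec * math.ceil(item_kb)  # writes are billed in 1 KB steps

print(read_capacity_units(100, 6))                             # 200
print(read_capacity_units(100, 6, strongly_consistent=False))  # 100
print(write_capacity_units(50, 1.5))                           # 100
```

For example, 100 strongly consistent reads per second of 6 KB items needs 200 read capacity units, because each 6 KB item rounds up to two 4 KB steps.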

What is RDS?

Fully managed relational databases (MySQL, PostgreSQL, SQL Server, Oracle, Aurora)
Fast, predictable performance
Simple and fast to scale
Low cost, pay for what you use

Use cases for RDS

Anything that requires a SQL back end, such as existing corporate applications
RDS supports VPC, high availability, instance scaling, encryption and read replicas for Aurora, MySQL, PostgreSQL
MySQL even supports cross-region deployment
6TB maximum storage for Oracle, PostgreSQL and MySQL; 64TB for Aurora and 4TB for SQL Server
Storage can be scaled for every engine except SQL Server
Provisioned IOPS of up to 30,000 for MySQL, PostgreSQL and Oracle; up to 20,000 for SQL Server
The largest RDS instance supported is R3.8XL on all platforms

What is Aurora?

99.99% availability
Up to 5x the throughput of MySQL on the same hardware, according to AWS
Distributed storage layer, so scales better
Data replicated six times across three availability zones
15 read replicas maximum
Fully MySQL compatible
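
As a quick worked example of what 99.99% means in practice, the sketch below converts an availability percentage into a downtime budget. The function name and the choice of a 365-day year and 30-day month are my own assumptions for illustration, not how AWS measures the figure.

```python
MINUTES_PER_YEAR = 365 * 24 * 60
MINUTES_PER_30_DAYS = 30 * 24 * 60

def allowed_downtime_minutes(availability_pct, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime permitted in the period at the given availability."""
    return (100 - availability_pct) / 100 * period_minutes

print(round(allowed_downtime_minutes(99.99), 2))                       # 52.56
print(round(allowed_downtime_minutes(99.99, MINUTES_PER_30_DAYS), 2))  # 4.32
```

So "four nines" leaves a budget of a little under an hour of downtime per year.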

RDS Characteristics

Supports three types of storage
- General purpose SSD for most use cases
- Provisioned IOPS for guaranteed storage performance of up to 30,000 IOPS
- Magnetic for inexpensive storage of very small workloads
Multi-AZ support provides automatic failover and synchronous replication, and is inexpensive and simple to set up
Read replicas can be created in other regions, then promoted to master for easy DR or used to put data close to the users who read it
Pay for what you consume; some free tier entitlements exist for RDS (20GB of data storage, 20GB of backups, 10 million I/Os, 750 micro DB instance hours)
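
To picture how an application might use read replicas, here is a toy router that sends writes to the master and spreads reads across replicas. It is only a sketch: the class and the endpoint names are hypothetical, RDS does not do this routing for you, and promoting a real replica is done through the RDS console or API rather than in application code.

```python
import itertools

class ReadWriteRouter:
    """Toy router: writes go to the master, SELECTs rotate across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self._reset_cycle()

    def _reset_cycle(self):
        self._cycle = itertools.cycle(self.replicas) if self.replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._cycle is not None:
            return next(self._cycle)  # round-robin over replicas
        return self.master            # writes, or reads when no replica exists

    def promote(self, replica):
        """Model a DR event: the chosen replica becomes the new master."""
        self.replicas.remove(replica)
        self.master = replica
        self._reset_cycle()

router = ReadWriteRouter(
    master="master.eu-west-1",  # hypothetical endpoint names
    replicas=["replica.eu-west-1", "replica.us-east-1"],
)
print(router.endpoint_for("SELECT * FROM users"))         # replica.eu-west-1
print(router.endpoint_for("INSERT INTO users VALUES (1)"))  # master.eu-west-1
router.promote("replica.us-east-1")
print(router.master)                                      # replica.us-east-1
```

Bear in mind that read replicas are updated asynchronously, so a read routed to a replica may lag slightly behind the master.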

What is ElastiCache?

In-memory key-value store
High performance
Choice of Redis or Memcached
Fully managed, zero admin

ElastiCache Use Cases

Caching layer for performance or cost optimisation of an underlying database
Storage of ephemeral key-value data
High performance application patterns such as session management, event counters, etc
Memcached is the simpler of the two, offering cache node auto discovery and multi-AZ node placement
Redis is Multi-AZ with auto failover, persistence and read replicas
Redis handles more complex data types
Monthly bill is the number of nodes multiplied by the hours they ran, at the node type's hourly rate
Some free tier eligibility – 750 micro cache node hours
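
The caching-layer use case usually follows the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result with an expiry. The sketch below uses a dictionary as a stand-in for Redis or Memcached; the class, the TTL value and the `slow_lookup` function are all hypothetical and only illustrate the flow.

```python
import time

class CacheAside:
    """Cache-aside with a TTL; a dict stands in for the cache cluster."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read the database and repopulate the cache.
        self.misses += 1
        value = self._load(key)
        self._store[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call after writing to the database so stale data is not served."""
        self._store.pop(key, None)

db_calls = []

def slow_lookup(key):  # pretend this is an expensive database query
    db_calls.append(key)
    return {"user": key}

cache = CacheAside(slow_lookup, ttl_seconds=30)
cache.get("u1")
cache.get("u1")
print(cache.hits, cache.misses, len(db_calls))  # 1 1 1
```

The second read is served from the cache, so the database is only queried once. That is where the performance or cost saving on the underlying database comes from.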

What is Amazon Redshift?

Relational data warehouse
Massively parallel, petabyte scale
Fully managed

Redshift Use Cases

Works with BI tools, Hadoop, machine learning and streaming data
Pay as you go, grow as you need
Analysis in line with data flows
Managed availability and data recovery

Redshift Characteristics

HDD and SSD platforms
Architecture has a leader node (endpoint for communication) and compute nodes. These are linked by 10Gbps networking
Leader node also stores metadata and optimises the query plan
Compute nodes have local columnar storage and execute queries, backups, loads, restores and resizes in a distributed, parallel fashion
Backed up continuously and incrementally to S3, including across regions
Streamed restores
Tolerates disk failures, node failures, network failures and AZ/region failures
Leader node is free, compute nodes are charged as they are used
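
The leader and compute node split can be pictured with a toy model: rows are spread across compute nodes, each node keeps its data column by column, and the leader fans a query out and combines the partial results. This is only an illustration in plain Python; the function names, the round-robin distribution and the sample data are my own assumptions and say nothing about Redshift's real internals.

```python
from concurrent.futures import ThreadPoolExecutor

def distribute(rows, node_count):
    """Spread rows round-robin across nodes, storing each column separately."""
    nodes = [{"region": [], "amount": []} for _ in range(node_count)]
    for i, (region, amount) in enumerate(rows):
        node = nodes[i % node_count]
        node["region"].append(region)
        node["amount"].append(amount)
    return nodes

def compute_partial(node, region):
    """Work done on one compute node: scan only the two columns needed."""
    return sum(a for r, a in zip(node["region"], node["amount"]) if r == region)

def leader_sum(nodes, region):
    """Leader node: send the query to every compute node, then combine results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda n: compute_partial(n, region), nodes))
    return sum(partials)

rows = [("eu", 10), ("us", 5), ("eu", 7), ("eu", 3), ("us", 1)]
nodes = distribute(rows, node_count=2)
print(leader_sum(nodes, "eu"))  # 20
```

Each compute node only touches its own slice of the data, which is why adding nodes lets the same query run over more data in roughly the same time.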

Blue Clouds
