AWS Database Choices, Use Cases and Characteristics
I’m currently studying for my AWS Certified Solutions Architect Pro exam, and one of the main topics I always seem to have something of a blind spot about is databases. What I’ve done below is to try to summarise some points about RDS and the other database offerings on AWS as a quick reference guide for the exam. If it helps you either for the exam or in real life, great!
The notes were put together using AWS documentation and Re:Invent 2015 videos. If you spot any factual errors, please tweet me @ChrisBeckett and I’ll correct them.
Key differences between SQL and NoSQL
- NoSQL is schema-less, easy reads and writes, simple data model
- NoSQL scaling is easy
- NoSQL focusses on performance and availability at any scale
- SQL has a strong schema, complex relationships, transactions and joins
- SQL scaling is difficult
- SQL focusses on data consistency over scale and availability
What is DynamoDB?
- NoSQL database offering
- Fully managed by AWS (no need for separate EC2 instances, etc)
- Single digit millisecond latency
- Massive and seamless scalability at low cost
Use cases for DynamoDB
- Internet of Things (Tracking data, real time notifications, high volumes of data)
- Ad Tech (ad serving, ID lookup, session tracking)
- Gaming (leaderboards, usage history, logs)
- Mobile and Web (Storing user profiles, session data)
- All of the above use cases require high performance, high scale and high volume
DynamoDB Characteristics
- Writes are automatically replicated across three Availability Zones in a single region and persisted to SSD. A write is confirmed once the master copy and one replica are updated
- Reads can be eventually or strongly consistent, with no latency trade-off. Strongly consistent data is read from the master only
- DynamoDB consists of tables, tables have items, items have attributes. Because there is no schema, hash keys or item IDs must be present to identify an entry in a table
- In order to scale properly, you just tell DynamoDB what throughput you need and it will configure the infrastructure for you
- Pay for the amount of storage used and the amount of throughput (reads and writes)
- Free tier entitlement of 25GB and 60 million reads and 60 million writes per month. Very cost effective
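As a rough illustration of the "tell DynamoDB what throughput you need" point above, here is a minimal Python sketch of how provisioned throughput can be estimated. It assumes the standard capacity unit definitions (one read unit covers one strongly consistent read per second of an item up to 4 KB, eventually consistent reads cost half as much, and one write unit covers one write per second of an item up to 1 KB). The function names are mine, not part of any AWS SDK.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one read capacity unit covers an item up to 4 KB
WRITE_UNIT_BYTES = 1024      # one write capacity unit covers an item up to 1 KB


def read_capacity_units(item_bytes, reads_per_second, strongly_consistent=True):
    """Estimate the read capacity units needed for a steady read rate."""
    # Larger items consume one unit per 4 KB block, rounded up.
    units_per_read = math.ceil(item_bytes / READ_UNIT_BYTES)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = total / 2
    return math.ceil(total)


def write_capacity_units(item_bytes, writes_per_second):
    """Estimate the write capacity units needed for a steady write rate."""
    # Larger items consume one unit per 1 KB block, rounded up.
    return math.ceil(item_bytes / WRITE_UNIT_BYTES) * writes_per_second


if __name__ == "__main__":
    # Hypothetical workload: 6,000-byte items, 100 reads/s and 100 writes/s.
    print(read_capacity_units(6000, 100))                             # 200
    print(read_capacity_units(6000, 100, strongly_consistent=False))  # 100
    print(write_capacity_units(6000, 100))                            # 600
```

The numbers you get out of a calculation like this are what you would provision on the table, and they feed directly into the throughput part of the bill.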
What is RDS?
- Fully managed relational databases (MySQL, PostgreSQL, SQL Server, Oracle, Aurora)
- Fast, predictable performance
- Simple and fast to scale
- Low cost, pay for what you use
Use cases for RDS
- Anything that requires a SQL back end, such as existing corporate applications
- RDS supports VPC, high availability, instance scaling, encryption and read replicas for Aurora, MySQL, PostgreSQL
- MySQL even supports cross region deployment
- 6TB max storage limit for Oracle, PostgreSQL and MySQL. Aurora is 64TB and SQL Server 4TB
- Storage can be scaled for all engines except SQL Server
- Provisioned IOPS of up to 30,000 for MySQL, PostgreSQL and Oracle, and up to 20,000 for SQL Server
- Largest RDS instance supported is R3.8XL for all platforms
What is Aurora?
- 99.99% availability
- 5x faster than MySQL on the same hardware
- Distributed storage layer, so scales better
- Data replicated six times across three availability zones
- 15 read replicas maximum
- Fully MySQL compatible
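To put the 99.99% availability figure in context, the short Python sketch below converts an availability percentage into a downtime budget. The function name and the choice of a 365-day year and 30-day month are my own assumptions for illustration.

```python
def downtime_minutes(availability_percent, period_hours):
    """Return the minutes of downtime allowed in a period at a given availability."""
    unavailable_fraction = (100 - availability_percent) / 100
    return unavailable_fraction * period_hours * 60


if __name__ == "__main__":
    # Assumes a 365-day year and a 30-day month.
    print(round(downtime_minutes(99.99, 365 * 24), 2))  # 52.56
    print(round(downtime_minutes(99.99, 30 * 24), 2))   # 4.32
```

So "four nines" leaves a budget of a little under an hour of downtime per year.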
RDS Characteristics
- Supports three types of storage:
  - General purpose SSD for most use cases
  - Provisioned IOPS for guaranteed storage performance of up to 30,000 IOPS
  - Magnetic for inexpensive very small workloads
- Multi-AZ support – provides automatic failover, synchronous replication and is inexpensive and simple to set up
- Read replicas can be created in other regions and promoted to a master for easy DR, or used to put data close to the users who need it
- Pay for what you consume, some free tier entitlements exist for RDS (20GB of data storage, 20GB of backups, 10m IOPS, 750 micro DB instance hours)
What is ElastiCache?
- In memory key value store
- High performance
- Choice of Redis or Memcached
- Fully managed, zero admin
ElastiCache Use Cases
- Caching layer for performance or cost optimisation of an underlying database
- Storage of ephemeral key-value data
- High performance application patterns such as session management, event counters, etc
- Memcached is the simpler of the two, with cache node auto discovery and multi-AZ node placement
- Redis is Multi-AZ with auto failover, persistence and read replicas
- Redis handles more complex data types
- Monthly bill is based on the number of nodes multiplied by the length of time the nodes were in use
- Some free tier eligibility – 750 micro cache node hours
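The caching layer use case above is commonly implemented with a pattern known as cache-aside: check the cache first, and only go to the database on a miss. The Python sketch below shows the idea under some loud assumptions: a plain in-process dictionary stands in for Redis or Memcached, `load_from_db` is a hypothetical callable standing in for the real database query, and the class name is my own.

```python
import time


class CacheAside:
    """Cache-aside sketch: a dict stands in for an ElastiCache node."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # hypothetical database lookup function
        self._ttl = ttl_seconds        # how long a cached entry stays valid
        self._clock = clock            # injectable clock, handy for testing
        self._store = {}               # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            # Cache hit: the database is not touched.
            self.hits += 1
            return entry[0]
        # Cache miss or expired entry: read from the database and repopulate.
        self.misses += 1
        value = self._load(key)
        self._store[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Drop a key, for example after the underlying row is updated."""
        self._store.pop(key, None)


if __name__ == "__main__":
    def slow_lookup(user_id):
        # Pretend this is an expensive RDS query.
        return {"id": user_id, "name": "user-" + str(user_id)}

    cache = CacheAside(slow_lookup, ttl_seconds=30)
    cache.get(42)   # miss, goes to the "database"
    cache.get(42)   # hit, served from the cache
    print(cache.hits, cache.misses)  # 1 1
```

With a real ElastiCache cluster the dictionary would be replaced by calls to a Redis or Memcached client, but the read path stays the same.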
What is Amazon Redshift?
- Relational data warehouse
- Massively parallel, petabyte scale
- Fully managed
Redshift Use Cases
- Leverage BI tools, Hadoop, machine learning, streaming
- Pay as you go, grow as you need
- Analysis in line with data flows
- Managed availability and data recovery
Redshift Characteristics
- HDD and SSD platforms
- Architecture has a leader node (endpoint for communication) and compute nodes. These are linked by 10Gbps networking
- Leader node also stores metadata and optimises the query plan
- Compute nodes have local columnar storage and carry out queries, backups, loads, restores and resizes in a distributed, parallel fashion
- Backed up continuously and incrementally to S3 and across regions
- Streamed restores
- Tolerates disk failures, node failures, network failures and AZ/region failures
- Leader node is free, compute nodes are charged as they are used
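The leader node and compute node split described above can be pictured as a scatter-gather pattern. The toy Python sketch below is only an illustration of that idea and not how Redshift is implemented: rows are spread across a few pretend compute nodes, each node keeps its slice column by column and computes a partial result, and the leader merges the partials. All function names are hypothetical, and threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, node_count):
    """Distribute rows round-robin across the pretend compute nodes."""
    return [rows[i::node_count] for i in range(node_count)]


def to_columns(rows):
    """Store one node's slice column by column rather than row by row."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def node_partial_sum(columns, column_name):
    """Work done on one compute node: scan a single column only."""
    values = columns.get(column_name, [])
    return sum(values), len(values)


def leader_average(rows, column_name, node_count=4):
    """Leader: fan the query out, then merge the partial results."""
    slices = [to_columns(s) for s in split_rows(rows, node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(
            pool.map(lambda cols: node_partial_sum(cols, column_name), slices)
        )
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count if count else None


if __name__ == "__main__":
    sales = [{"region": "eu", "sales": n} for n in range(1, 11)]
    print(leader_average(sales, "sales", node_count=3))  # 5.5
```

The point of the columnar layout is visible in `node_partial_sum`: an aggregate over one column never has to read the other columns at all.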