29-08-17

AWS Certification – Changes To Resit Policies?



As I tweeted at the end of last week, I failed the AWS Advanced Networking exam on Friday and I was looking earlier to see when I could reschedule this and jump back on the horse. Originally when I first started sitting AWS exams back in the dark depths of December 2015, you could sit an exam three times before you had to wait 12 months to sit it again.

As you can imagine, sitting my SA Pro exam at the third time of asking was pressure enough, but having that sword hanging over my head made the situation practically unbearable. I’m pleased to note that when I logged into the Training and Certification portal this morning, the resit policy has been relaxed quite a bit. From three attempts in a single year, all exams now have the following terms:-

  • You can sit any AWS exam a total of 10 times (initial sitting plus nine retakes)
  • You must wait 14 days after any failed attempt before you can register for a resit
  • The maximum number of exam sittings in a 12-month period seems to have been removed

This is a much better approach for test sitters and takes some of the pressure off. It also makes sense from AWS’s point of view, as they can now generate more revenue from exams. I’m not sure when this policy changed (I quickly Googled it and found nothing), but it’s well worth knowing if you’re sitting any exams soon.

As regards the maximum sittings in a single year, if you need more than 10 attempts, it’s probably safe to say you should consider something a bit different. 😉

Screen grab from the T&C portal showing the new resit policy for all exams


16-08-16

AWS Certified Solutions Architect Professional – Study Guide – Domain 8.0: Cloud Migration and Hybrid Architecture (10%)


The final part of the study guide is below – thanks to all those who have tuned in over the past few weeks and given some very positive feedback. I hope it helps (or has helped) you get into the Solutions Architect Pro club. It’s a tough exam to pass and the feeling of achievement is immense. Good luck!

8.1 Plan and execute for applications migrations

  • The AWS Management Portal for vCenter is available to plug AWS infrastructure into vCenter. This uses a virtual appliance and can enable migration of vSphere workloads into AWS
  • Right-click on a VM and select “Migrate to EC2”
  • You then select the region, environment, subnet, instance type, security group and private IP address
  • Use cases:-
    • Migrate VMs to EC2 (VM must be powered off and configured for DHCP)
    • Reach new regions from vCenter to use for DR etc
    • Self service AWS portal in vCenter
    • Create new EC2 instances using VM templates
  • The inventory view is presented as:-
    • Region
      • Environment (family of templates and subnets in AWS)
        • Template (prototype for EC2 instance)
          • Running instance
            • Folder for storing migrated VMs
  • Templates map to AMIs and can be used to let admins pick a type for their deployment
  • Storage Gateway can be used as a migration tool
    • Gateway cached volumes (block based iSCSI)
    • Gateway stored volumes (block based iSCSI)
    • Virtual tape library (iSCSI based VTL)
    • Takes snapshots of mounted iSCSI volumes and replicates them via HTTPS to AWS. From here they are stored in S3 as snapshots and then you can mount them as EBS volumes
    • It is recommended to get a consistent snapshot of the VM by powering it off, taking a VM snapshot and then replicating this
  • AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).
  • AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premises data silos
  • Pipeline has the following concepts (a minimal definition sketch follows this list):-
    • Pipeline (container node that is made up of the items below, can run on either EC2 instance or EMR node which are provisioned automatically by DP)
    • Datanode (end point destination, such as S3 bucket)
    • Activity (job kicked off by DP, such as database dump, command line script)
    • Precondition (readiness check optionally associated with a data source or activity. The activity will not run if the check fails. Standard and custom preconditions are available: DynamoDBTableExists, DynamoDBDataExists, S3KeyExists, S3PrefixExists, ShellCommandPrecondition)
    • Schedule
  • Pipelines can also be used with on-premises resources such as databases etc
  • The Task Runner package is installed on the on-premises resource to poll the Data Pipeline queue for work to do (database dump, copy to S3, etc.)
  • Much of the functionality has been replaced by Lambda
  • Set up logging to S3 so you can troubleshoot it
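
A pipeline definition is just these objects (schedule, resource, activity, data node) wired together by references. Below is a minimal sketch using boto3, assuming the default Data Pipeline IAM roles already exist; the bucket name, command and sizes are placeholders rather than a recommended configuration.

```python
import boto3

# Minimal AWS Data Pipeline definition: a daily schedule, an EC2 resource to run
# Task Runner, a ShellCommandActivity and an S3 data node for the output.
# Bucket name and command are placeholders; the default roles
# (DataPipelineDefaultRole / DataPipelineDefaultResourceRole) are assumed to exist.
dp = boto3.client("datapipeline")

pipeline_id = dp.create_pipeline(name="example-pipeline",
                                 uniqueId="example-pipeline-001")["pipelineId"]

objects = [
    {"id": "Default", "name": "Default", "fields": [
        {"key": "scheduleType", "stringValue": "cron"},
        {"key": "schedule", "refValue": "DailySchedule"},
        {"key": "failureAndRerunMode", "stringValue": "CASCADE"},
        {"key": "pipelineLogUri", "stringValue": "s3://example-bucket/logs/"},
        {"key": "role", "stringValue": "DataPipelineDefaultRole"},
        {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
    ]},
    {"id": "DailySchedule", "name": "DailySchedule", "fields": [
        {"key": "type", "stringValue": "Schedule"},
        {"key": "period", "stringValue": "1 day"},
        {"key": "startAt", "stringValue": "FIRST_ACTIVATION_DATE_TIME"},
    ]},
    {"id": "WorkerInstance", "name": "WorkerInstance", "fields": [
        {"key": "type", "stringValue": "Ec2Resource"},
        {"key": "instanceType", "stringValue": "t2.micro"},
        {"key": "terminateAfter", "stringValue": "30 Minutes"},
    ]},
    {"id": "OutputData", "name": "OutputData", "fields": [
        {"key": "type", "stringValue": "S3DataNode"},
        {"key": "directoryPath", "stringValue": "s3://example-bucket/output/"},
    ]},
    {"id": "DumpActivity", "name": "DumpActivity", "fields": [
        {"key": "type", "stringValue": "ShellCommandActivity"},
        {"key": "command", "stringValue": "echo hello > ${OUTPUT1_STAGING_DIR}/out.txt"},
        {"key": "stage", "stringValue": "true"},
        {"key": "runsOn", "refValue": "WorkerInstance"},
        {"key": "output", "refValue": "OutputData"},
    ]},
]

dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=objects)
dp.activate_pipeline(pipelineId=pipeline_id)
```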

8.2 Demonstrate ability to design hybrid cloud architectures

  • The biggest CIDR block you can reserve for a VPC is a /16 and the smallest is a /28
  • First four IP addresses and last one are reserved by AWS – always 5 reserved
    • 10.0.0.0 – Network address
    • 10.0.0.1 – Reserved for VPC router
    • 10.0.0.2 – Reserved by AWS for DNS services
    • 10.0.0.3 – Reserved by AWS for future use
    • 10.0.0.255 – Reserved for network broadcast. Network broadcast not supported in a VPC, so this is reserved
  • When migrating from a VPN to Direct Connect, run the VPN connection and the Direct Connect connection(s) as part of the same BGP configuration, then make the VPN the less preferred path, for example by using AS path prepending on the VPN side. BGP prefers the shorter AS path, so a path containing a single ASN is preferred over one where the same ASN has been prepended three or four times
  • For applications that require multicast, you need to configure a VPN between the EC2 instances with in-instance software, so the underlying AWS infrastructure is not aware of it. Multicast is not supported by AWS
  • VPN network must be a different CIDR block than the underlying instances are using (for example 10.x address for EC2 instances and 172.16.x addresses for VPN connection to another VPC)
  • SQL Server can be migrated by exporting the database as flat files from SQL Server Management Studio; it can’t be replicated to another region or from on-premises to AWS
  • CloudSearch can index documents stored in S3 and is powered by Apache SOLR
    • Full text search
    • Drill down searching
    • Highlighting
    • Boolean search
    • Autocomplete
    • CSV, PDF, HTML, Office docs and text files supported
  • Can also search DynamoDB with CloudSearch
  • CloudSearch can automatically scale based on load or can be manually scaled ahead of expected load increase
  • Multi-AZ is supported; CloudSearch is basically a service hosted on EC2, and that is how the costs are derived
  • EMR can be used to run batch processing jobs, such as filtering log files and putting results into S3
  • EMR uses Hadoop which uses HDFS, a distributed file system across all nodes in the cluster where there are multiple copies of the data, meaning resilience of the data and also enables parallel processing across multiple nodes
  • Hive is used to perform SQL like queries on the data in Hadoop, uses simple syntax to process large data sets
  • Pig is used to write MapReduce programs
  • An EMR cluster has three node types (see the sketch after this list):-
    • Master node (manages data distribution)
    • Core node (stores data on HDFS from tasks run by task nodes and are managed by the master node)
    • Task nodes (managed by the master node and perform processing tasks only, do not form part of HDFS and pass processed data back to core nodes for storage)
  • EMRFS can be used to output data to S3 instead of HDFS
  • Can use spot, on demand or reserved instances for EMR cluster nodes
  • S3DistCp is an extension of DistCp that is optimized to work with AWS, particularly Amazon S3. You use S3DistCp by adding it as a step in a cluster or at the command line. Using S3DistCp, you can efficiently copy large amounts of data from Amazon S3 into HDFS where it can be processed by subsequent steps in your Amazon EMR cluster
  • Larger data files are more efficient than smaller ones in EMR
  • Storing data persistently on S3 may well be cheaper than leveraging HDFS as large data sets will require large instances sizes in the EMR cluster
  • Smaller EMR cluster with larger nodes may be just as efficient but more cost effective
  • Try to complete jobs within 59 minutes to save money (EMR billed by hour)
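
To make the master/core/task split and the spot-for-task-nodes point concrete, here is a hedged boto3 sketch that launches a small EMR cluster with Hive installed and logs shipped to S3. The bucket, key pair, release label and bid price are placeholders, and the default EMR service roles are assumed to exist.

```python
import boto3

emr = boto3.client("emr")

# Minimal EMR cluster: on-demand master and core nodes (HDFS lives on core),
# spot task nodes for extra processing capacity, logs shipped to S3.
# Bucket name, key pair, release label and bid price are placeholders.
response = emr.run_job_flow(
    Name="example-batch-cluster",
    ReleaseLabel="emr-5.0.0",                      # placeholder release label
    LogUri="s3://example-bucket/emr-logs/",        # placeholder bucket
    Applications=[{"Name": "Hadoop"}, {"Name": "Hive"}],
    Instances={
        "Ec2KeyName": "example-keypair",           # placeholder key pair
        "KeepJobFlowAliveWhenNoSteps": False,      # terminate when steps finish
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "Market": "ON_DEMAND", "InstanceType": "m4.large", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "Market": "ON_DEMAND", "InstanceType": "m4.large", "InstanceCount": 2},
            {"Name": "Task", "InstanceRole": "TASK",
             "Market": "SPOT", "BidPrice": "0.10",
             "InstanceType": "m4.large", "InstanceCount": 2},
        ],
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster ID:", response["JobFlowId"])
```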

09-08-16

AWS Certified Solutions Architect Professional – Study Guide – Domain 7.0: Scalability and Elasticity (15%)


7.1 Demonstrate the ability to design a loosely coupled system

  • Amazon CloudFront is a web service (CDN) that speeds up distribution of your static and dynamic web content, for example, .html, .css, .php, image, and media files, to end users. CloudFront delivers your content through a worldwide network of edge locations. When an end user requests content that you’re serving with CloudFront, the user is routed to the edge location that provides the lowest latency, so content is delivered with the best possible performance. If the content is already in that edge location, CloudFront delivers it immediately. If the content is not currently in that edge location, CloudFront retrieves it from an Amazon S3 bucket or an HTTP server (for example, a web server) that you have identified as the source for the definitive version of your content.
  • CloudFront has two aspects – origin and distribution. You create a distribution and link it to an origin, such as S3, an EC2 instance, existing website etc
  • Two types of distributions, web and RTMP
  • Geo restriction can be used to whitelist or blacklist specific countries, allowing or blocking access to the distribution
  • GET, HEAD, PUT, POST, PATCH, DELETE and OPTIONS HTTP commands supported
  • Allowed methods are what CloudFront will pass on to the origin server. If you do not need to modify content, consider not allowing PUT, POST, PATCH and DELETE to ensure users cannot modify content
  • CloudFront does not cache responses to POST, PUT, DELETE and PATCH requests; content can be POSTed to an edge location and is then sent on to the origin server
  • SSL can be used to provide HTTPS. Can either use CloudFront’s own certificate or use your own
    • To support older browsers you need a dedicated-IP SSL certificate served from every edge location, which can be very expensive
    • SNI (Server Name Indication) custom SSL certs can be used by adding all hostnames behind the certificate but it is presented as a single IP address. Uses SNI extensions in newer browsers
  • 100 CNAME aliases per distribution, can use wildcard CNAMEs
  • Use Invalidation Requests to forcibly remove content from Edge locations. Need to use API call to do this or do it from the console, or set a TTL on the content
  • Alias records can be used to map a friendly name to a CloudFront URL (Route 53 supports this). Supports zone apex entry (name without www, such as example.com). DNS records for the same name must have the same routing type (simple, weighted, latency, etc) or you will get an error in the console
  • Alias records can then have “evaluate target” set to yes so that existing health checks are used to ensure the underlying resources are up before sending traffic onwards. If a health check for the underlying resource does not exist, evaluate target settings have no effect
  • AWS doesn’t charge for mapping alias records to CloudFront distributions
  • CloudFront supports dynamic web content by forwarding cookies on to the origin server
  • Forward Query Strings passes the query string through to the origin if configured in CloudFront, but only for a web server or application, as S3 does not support this feature
  • Cookie values can then be logged into CloudFront access logs
  • CloudFront can be used to proxy upload requests back to the origin to speed up data transfers
  • Use a zero value TTL for dynamic content
  • Different URL patterns can send traffic to different origins
  • Whitelist certain HTTP headers such as cloudfront-viewer-country so that locale details can be passed through to the web server for custom content
  • Device detection can serve different content based on the User Agent string in the header request
  • Invalidating objects removes them from CloudFront edge caches. A faster and less expensive method is to use versioned object or directory names
  • Enable access logs in CloudFront and then send them to an S3 bucket. EMR can be used to analyse the logs
  • Signed URLs can be used to provide time limited access or access to private content on CloudFront. Signed cookies can be used to limit secure access to certain parts of the site. Use cases are signed URLs for a marketing e-mail and signed cookies for web site streaming or whole site authentication
  • Cache-control max-age header will be sent to browser to control how long the content is in the local browser cache for, can help improve delivery, especially of static items
  • If-modified-since will allow the browser to send a request for content only if it is newer than the modification date specified in the request. If the content has not changed, content is pulled from the browser cache
  • Set a low TTL for dynamic content as most content can be cached even if it’s only for a few seconds. CloudFront can also present stale data if TTL is long
  • Popular Objects report and cache statistics can help you tune CloudFront behaviour
  • Only forward cookies that are used to vary or tailor user based content
  • Use Smooth Streaming on a web distribution for live streaming using Microsoft technology
  • RTMP is true media streaming; progressive download delivers the file in chunks (to a mobile device, for example). RTMP is Flash only
  • Supports existing WAF policies
  • You can create custom error response pages
  • Two ElastiCache engines available – Redis and Memcached. Exam will give scenarios and you must select the most appropriate
  • As a rule of thumb, simple caching is done by memcached and complex caching is done by Redis
  • Only Redis is multi-AZ and has backup and restore and persistence capabilities, sorting, publisher/subscriber, failover
  • Redis can be used as a persistent key store or as a caching engine
  • Redis has backup and restore and automatic failover and is best used for frequently changing, more complex data sets
  • Redis doesn’t need a database behind it in the way Memcached does
  • Leader boards is a good use case for Redis
  • Redis can be configured to use an Append Only File (AOF) that will repopulate the cache in case all nodes are lost and cache is cleared. This is disabled by default. AOF is like a replay log
  • Redis has a primary node and read only nodes. If the primary fails, a read only node is promoted to primary. Writes done to primary node, reads done from read replicas (asynchronous replication)
  • Redis snapshots are used to increase the size of nodes. This is not the same as an EC2 snapshot; the snapshot creates a new node and the new size is picked when launching
  • Redis can be configured to back up automatically each day within a window, or you can take manual snapshots. Automatic backups have retention limits, manual snapshots don’t
  • Memcached can scale horizontally and is multi-threaded, supports sharding
  • Memcached uses lazy loading, so if an app doesn’t get a hit from the cache, it requests it from the DB and then puts that into cache. Write through updates the cache when the database is updated
  • TTL can be used to expire out stale or unread data from the cache
  • Memcached does not maintain its own data persistence (the database does this); scale by adding more nodes to a cluster
  • Vertically scaling memcached nodes requires standing up a new cluster of required instance sizes/types. All instance types in a cluster are the same type
  • Single endpoint for all memcached nodes
  • Put memcached nodes in different AZs
  • Memcached nodes are empty when first provisioned; bear this in mind when scaling out, as cache performance will be affected while the nodes warm up
  • For low latency applications, place Memcached clusters in the same AZ as the application stack. More configuration and management, but better performance
  • When deciding between Memcached and Redis, here are a few questions to consider:
    • Is object caching your primary goal, for example to offload your database? If so, use Memcached.
    • Are you interested in as simple a caching model as possible? If so, use Memcached.
    • Are you planning on running large cache nodes, and require multithreaded performance with utilization of multiple cores? If so, use Memcached.
    • Do you want the ability to scale your cache horizontally as you grow? If so, use Memcached.
    • Does your app need to atomically increment or decrement counters? If so, use either Redis or Memcached.
    • Are you looking for more advanced data types, such as lists, hashes, and sets? If so, use Redis.
    • Does sorting and ranking datasets in memory help you, such as with leaderboards? If so, use Redis.
    • Are publish and subscribe (pub/sub) capabilities of use to your application? If so, use Redis.
    • Is persistence of your key store important? If so, use Redis.
    • Do you want to run in multiple AWS Availability Zones (Multi-AZ) with failover? If so, use Redis.
  • Amazon Kinesis is a managed service that scales elastically for real-time processing of streaming data at a massive scale. The service collects large streams of data records that can then be consumed in real time by multiple data-processing applications that can be run on Amazon EC2 instances.
  • You’ll create data-processing applications, known as Amazon Kinesis Streams applications. A typical Amazon Kinesis Streams application reads data from an Amazon Kinesis stream as data records. These applications can use the Amazon Kinesis Client Library, and they can run on Amazon EC2 instances. The processed records can be sent to dashboards, used to generate alerts, dynamically change pricing and advertising strategies, or send data to a variety of other AWS services. The PutRecord command is used to put data into a stream
  • Data is stored in Kinesis for 24 hours, but this can go up to 7 days
  • You can use Streams for rapid and continuous data intake and aggregation. The type of data used includes IT infrastructure log data, application logs, social media, market data feeds, and web clickstream data. Because the response time for the data intake and processing is in real time, the processing is typically lightweight
  • The following are typical scenarios for using Streams
    • Accelerated log and data feed intake and processing
    • Real-time metrics and reporting
    • Real-time data analytics
    • Complex stream processing
  • An Amazon Kinesis stream is an ordered sequence of data records. Each record in the stream has a sequence number that is assigned by Streams. The data records in the stream are distributed into shards
  • A data record is the unit of data stored in an Amazon Kinesis stream. Data records are composed of a sequence number, partition key, and data blob, which is an immutable sequence of bytes. Streams does not inspect, interpret, or change the data in the blob in any way. A data blob can be up to 1 MB
  • Retention Period is the length of time data records are accessible after they are added to the stream. A stream’s retention period is set to a default of 24 hours after creation. You can increase the retention period up to 168 hours (7 days) using the IncreaseRetentionPeriod operation
  • A partition key is used to group data by shard within a stream
  • Each data record has a unique sequence number. The sequence number is assigned by Streams after you write to the stream with client.putRecords or client.putRecord
  • In summary, a record has three things:-
    • Sequence number
    • Partition key
    • Data BLOB
  • Producers put records into Amazon Kinesis Streams. For example, a web server sending log data to a stream is a producer
  • Consumers get records from Amazon Kinesis Streams and process them. These consumers are known as Amazon Kinesis Streams Applications
  • An Amazon Kinesis Streams application is a consumer of a stream that commonly runs on a fleet of EC2 instances
  • A shard is a uniquely identified group of data records in a stream. A stream is composed of one or more shards, each of which provides a fixed unit of capacity
  • Once a stream is created, you can add data to it in the form of records. A record is a data structure that contains the data to be processed in the form of a data blob. After you store the data in the record, Streams does not inspect, interpret, or change the data in any way. Each record also has an associated sequence number and partition key
  • There are two different operations in the Streams API that add data to a stream, PutRecords and PutRecord. The PutRecords operation sends multiple records to your stream per HTTP request, and the singular PutRecord operation sends records to your stream one at a time (a separate HTTP request is required for each record). You should prefer using PutRecords for most applications because it will achieve higher throughput per data producer (see the sketch after this list)
  • An Amazon Kinesis Streams producer is any application that puts user data records into an Amazon Kinesis stream (also called data ingestion). The Amazon Kinesis Producer Library (KPL) simplifies producer application development, allowing developers to achieve high write throughput to an Amazon Kinesis stream.
  • You can monitor the KPL with Amazon CloudWatch
  • The agent is a stand-alone Java software application that offers an easier way to collect and ingest data into Streams. The agent continuously monitors a set of log files and sends new data records to your Amazon Kinesis stream. By default, records within each file are determined by a new line, but can also be configured to handle multi-line records. The agent handles file rotation, checkpointing, and retry upon failures. It delivers all of your data in a reliable, timely, and simple manner. It also emits CloudWatch metrics to help you better monitor and troubleshoot the streaming process.
  • You can install the agent on Linux-based server environments such as web servers, front ends, log servers, and database servers. After installing, configure the agent by specifying the log files to monitor and the Amazon Kinesis stream names. After it is configured, the agent durably collects data from the log files and reliably submits the data to the Amazon Kinesis stream
  • SNS is the Simple Notification Service – a publisher creates a topic and subscribers then receive updates published to that topic. This can be push to Android, iOS, etc
  • Use SNS to send push notifications to desktops and mobile devices: Amazon Device Messaging, Apple Push Notification Service for iOS and OS X, Baidu Cloud Push, Google Cloud Messaging for Android, Microsoft Push Notification Service for Windows Phone and Windows Push Notification Services
  • Steps to create mobile push:-
    • Request credentials from mobile platforms
    • Request token from mobile platforms
    • Create platform application object
    • Publish message to mobile endpoint
  • Grid computing vs cluster computing
    • Grid computing is generally loosely coupled, often used with spot instances and tend to grow and shrink as required. Use different regions and instance types
    • Distributed workloads
    • Designed for resilience (auto scaling) – horizontal scaling rather than vertical scaling
    • Cluster computing has two or more instances working together in low latency, high throughput environments
    • Uses same instance types
    • GPU instances do not support SR-IOV networking
  • Elastic Transcoder encodes media files and uses a pipeline with a source and destination bucket, a job and a preset (media type, watermarks etc). Presets are templates and may be altered to provide custom settings. Pipelines can only have one source and one destination bucket
  • Integrates into SNS for job status updates and alerts
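
As a concrete illustration of PutRecord versus PutRecords mentioned above, here is a small boto3 sketch. The stream name is a placeholder and the stream is assumed to already exist with enough shards for the write rate.

```python
import json
import boto3

kinesis = boto3.client("kinesis")
STREAM = "example-stream"   # placeholder; stream assumed to exist

# Single record: one HTTP request per record.
kinesis.put_record(
    StreamName=STREAM,
    Data=json.dumps({"event": "click", "page": "/home"}).encode(),
    PartitionKey="user-42",   # records with the same key land on the same shard
)

# Batched records: one HTTP request for many records - higher throughput per producer.
batch = [
    {"Data": json.dumps({"event": "click", "page": f"/page/{i}"}).encode(),
     "PartitionKey": f"user-{i % 10}"}
    for i in range(100)
]
result = kinesis.put_records(StreamName=STREAM, Records=batch)
print("Failed records:", result["FailedRecordCount"])
```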


22-07-16

AWS Certified Solutions Architect Professional – Study Guide – Domain 6.0: Security (20%)


6.1 Design information security management systems and compliance controls

  • AWS Directory Services is a hosted service that allows you to hook up your EC2 instances with an AD either on prem or standalone in the AWS cloud
  • Comes in two flavours:-
    • AD Connector
    • Simple AD
  • AD Connector permits access to resources such as Workspaces, WorkMail, EC2 etc via existing AD credentials using IAM
  • AD Connector enforces on premises policies such as password complexities, history, lockout policies etc
  • AD Connector can also use MFA by leveraging your existing RADIUS infrastructure
  • Simple AD is based within AWS and runs on a Samba 4 compatible server. Supports:-
    • User and group accounts
    • Kerberos based SSO
    • GPOs
    • Domain joining EC2 instances
    • Automated daily snapshots
    • Simple AD limitations:-
      • Does not support MFA
      • Cannot add additional AD servers
      • Can’t create trust relationships
      • Cannot transfer FSMO roles
      • Doesn’t support PowerShell scripting
  • In most cases, Simple AD is the least expensive option and your best choice if you have 5,000 or less users and don’t need the more advanced Microsoft Active Directory features.
  • AWS Directory Service for Microsoft Active Directory (Enterprise Edition) is a managed Microsoft Active Directory hosted on the AWS Cloud. It provides much of the functionality offered by Microsoft Active Directory plus integration with AWS applications. With the additional Active Directory functionality, you can, for example, easily set up trust relationships with your existing Active Directory domains to extend those directories to AWS services.
  • Microsoft AD is your best choice if you have more than 5,000 users and need a trust relationship setup between an AWS hosted directory and your on-premises directories
  • AD Connector is your best choice when you want to use your existing on-premises directory with AWS services.
  • CloudTrail is used for logging all API calls and events made in all regions in your AWS account. This can be either from the console or via the command line. It is more an auditing tool rather than a logging tool
  • CloudWatch is a monitoring service for AWS services. You can collect and track metrics, collect and track log files and set alarms. Works with EC2, DynamoDB, RDS instances as well as any custom metrics from your applications or log files those apps generate
  • By default, CloudWatch Logs will store your log files indefinitely. You can change the log group retention period at any time
  • Log groups are used to capture log files from instances and can gather them in a single folder structure, grouped by instance ID
  • CloudWatch alarm history is only stored for 14 days
  • CloudWatch Logs is billed per GB ingested and per GB archived per month; alarms are charged per alarm per month
  • Can work out cheaper to store your logs in S3, depending on your environment
  • CloudWatch can be used to monitor CloudTrail by creating log groups and alerting when particular terms, phrases or values are found in a log file (“error”, etc.). This is the CloudWatch Logs feature. Define a metric filter to create a measurable metric based on keywords or phrases in the log files, then alert on it (see the sketch after this list)
  • Events can be monitored and shipped to CloudWatch, S3 or to a third party product such as Splunk
  • Don’t log to non-persistent storage, such as instance store volumes or an EC2 root volume that is deleted on termination. Log to S3 or CloudWatch
  • CloudTrail can log across multiple accounts and put logs in a single S3 bucket (needs cross account access)
  • CloudWatch can be used to monitor multiple AWS accounts
  • Awslogs package in Linux installs the log agent and forwards system logs to CloudWatch for collection and alerting
  • Awslogs.conf can be configured to send logging specific information to CloudWatch
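
As a worked example of the metric filter approach above, this boto3 sketch counts occurrences of “error” in a log group and raises an alarm on that metric. The log group name, namespace and SNS topic ARN are placeholders.

```python
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

LOG_GROUP = "/example/cloudtrail"                               # placeholder log group
TOPIC_ARN = "arn:aws:sns:eu-west-1:111111111111:ops-alerts"     # placeholder SNS topic

# Turn every log line containing "error" into a data point on a custom metric.
logs.put_metric_filter(
    logGroupName=LOG_GROUP,
    filterName="error-count",
    filterPattern="error",
    metricTransformations=[{
        "metricName": "ErrorCount",
        "metricNamespace": "Example/Logs",                      # placeholder namespace
        "metricValue": "1",
    }],
)

# Alarm when more than 5 errors are seen in a 5 minute period.
cloudwatch.put_metric_alarm(
    AlarmName="log-errors",
    Namespace="Example/Logs",
    MetricName="ErrorCount",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=5,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[TOPIC_ARN],
)
```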

6.2 Design security controls with the AWS shared responsibility model and global infrastructure

  • Inline policies are policies that are directly associated to an object (user, for example) and are deleted when the object is deleted. Use cases include:-
    • Requirement for strict one to one policy relationship
    • Ensuring the policy is deleted when the object is deleted
  • Managed policies are created and managed separately, use cases include:-
    • Version management (up to five versions)
    • Configuration rollback
    • Reusability
    • Central management
    • Delegation of permissions management
    • Larger policy size (up to 5K)
    • Can be customer managed or AWS managed (they have little AWS icon next to them)
    • Assign to groups, roles, users etc
    • Up to 10 managed policies may be assigned per object
  • Variables also supported in policies
  • Default policy position is to deny. Explicit deny trumps everything
  • Tags can be used to control access by adding a condition clause into policies – the condition must match a tag for access to be effective (eg. All EC2 instances where the tag matches Cost Centre : IT)
  • IAM policies follow the PARC model
    • Principal (IAM user, group, role)
    • Action (Launch instance, terminate instance, etc.)
    • Resource (EC2 instance, S3 bucket, etc)
    • Condition (where instance = i23523, for example)
      • Effect (Deny, Allow)
  • Wildcards are supported, both asterisks and question marks, for granularity
  • NotAction provides a method to exempt or exclude permissions from a resource set; for example, an Allow with NotAction iam:* grants permissions for everything except IAM actions
  • When specifying multiple values in a policy JSON file, this is classed as an array and therefore the values must be wrapped in square brackets [] (a minimal policy sketch follows this list)
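
The sketch below pulls the pieces above together: a customer managed policy created via boto3 with an Effect/Action/Resource/Condition statement, a tag-based condition and an action array in square brackets. The policy name, tag key and tag value are illustrative only.

```python
import json
import boto3

iam = boto3.client("iam")

# Effect / Action / Resource / Condition - multiple actions form a JSON array,
# so they must be wrapped in square brackets. Tag key and value are illustrative.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["ec2:StartInstances", "ec2:StopInstances"],
        "Resource": "*",
        "Condition": {
            "StringEquals": {"ec2:ResourceTag/CostCentre": "IT"}
        }
    }]
}

# Create a customer managed policy that can then be attached to users, groups or roles.
iam.create_policy(
    PolicyName="ExampleStartStopITInstances",     # placeholder name
    PolicyDocument=json.dumps(policy_document),
)
```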

6.3 Design identity and access management controls

  • The AWS Security Token Service (STS) is a web service that enables you to request temporary, limited-privilege credentials for AWS Identity and Access Management (IAM) users or for users that you authenticate (federated users)
  • Users come from one of three sources:-
    • Federated (Active Directory, SAML). Uses AD credentials and does not need to be an IAM user, SSO allows login to console without assigning IAM credentials
    • Federation with OpenID web applications (Facebook, Google, Amazon etc)
    • Cross account access (IAM user from another account)
  • Federation is joining users in one domain (IAM) with another (AD, Facebook etc)
  • Identity Broker joins domain A to domain B
  • Identity Store/Provider is AD, Facebook etc
  • Identity is a user of that service or member of that domain
  • On a correct user ID and password, STS returns 4 items – access key, secret access key, token and duration (the token’s lifetime: between 15 minutes and 36 hours, defaulting to 12 hours for GetFederationToken and 1 hour for AssumeRole)
  • The Identity Broker takes credentials from the application and checks LDAP. If they are correct, it calls STS (GetFederationToken, using its own IAM credentials) and receives a token for a role. STS passes the access token with permissions back to the broker, which passes it back to the app, which then accesses the respective resource (such as S3). The resource then verifies the token has appropriate access
    • Develop an identity broker to communicate with LDAP and STS
    • Broker always communicates with LDAP first and then with STS
    • Application gets temporary access to AWS resources
  • The AssumeRole action returns a set of temporary security credentials (consisting of an access key ID, a secret access key, and a security token) that you can use to access AWS resources that you might not normally have access to. Typically, you use AssumeRole for cross-account access or federation. You can optionally include multi-factor authentication (MFA) information when you call AssumeRole. This is useful for cross-account scenarios in which you want to make sure that the user who is assuming the role has been authenticated using an AWS MFA device (see the sketch after this list)
  • AssumeRoleWithWebIdentity returns a set of temporary security credentials for users who have been authenticated in a mobile or web application with a web identity provider, such as Amazon Cognito, Login with Amazon, Facebook, Google, or any OpenID Connect-compatible identity provider. Calling AssumeRoleWithWebIdentity does not require the use of AWS security credentials. Therefore, you can distribute an application (for example, on mobile devices) that requests temporary security credentials without including long-term AWS credentials in the application, and without deploying server-based proxy services that use long-term AWS credentials. Instead, the identity of the caller is validated by using a token from the web identity provider.
  • AssumeRoleWithSAML generally used for AD Federation requests.
  • DecodeAuthorizationMessage decodes additional information about the authorization status of a request from an encoded message returned in response to an AWS request. For example, if a user is not authorized to perform an action that he or she has requested, the request returns a Client.UnauthorizedOperation response (an HTTP 403 response). Some AWS actions additionally return an encoded message that can provide details about this authorization failure
  • GetFederationToken returns a set of temporary security credentials (consisting of an access key ID, a secret access key, and a security token) for a federated user. A typical use is in a proxy application that gets temporary security credentials on behalf of distributed applications inside a corporate network. Because you must call the GetFederationToken action using the long-term security credentials of an IAM user, this call is appropriate in contexts where those credentials can be safely stored, usually in a server-based application. If you are creating a mobile-based or browser-based app that can authenticate users using a web identity provider like Login with Amazon, Facebook, Google, or an OpenID Connect-compatible identity provider, we recommend that you use Amazon Cognito or AssumeRoleWithWebIdentity
  • GetSessionToken returns a set of temporary credentials for an AWS account or IAM user. The credentials consist of an access key ID, a secret access key, and a security token. Typically, you use GetSessionToken if you want to use MFA to protect programmatic calls to specific AWS APIs like Amazon EC2 StopInstances. MFA-enabled IAM users would need to call GetSessionToken and submit an MFA code that is associated with their MFA device. Using the temporary security credentials that are returned from the call, IAM users can then make programmatic calls to APIs that require MFA authentication. If you do not supply a correct MFA code, then the API returns an access denied error.
  • The GetSessionToken action must be called by using the long-term AWS security credentials of the AWS account or an IAM user. Credentials that are created by IAM users are valid for the duration that you specify, between 900 seconds (15 minutes) and 129600 seconds (36 hours); credentials that are created by using account credentials have a maximum duration of 3600 seconds (1 hour)
  • Assertions are used in SAML to map AD groups to AWS roles
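
A minimal boto3 sketch of AssumeRole with MFA for cross-account access, as described above. The role ARN, MFA device serial and session name are placeholders; the temporary credentials returned are then used to build a new session.

```python
import boto3

sts = boto3.client("sts")

# Assume a role in another account, supplying an MFA code.
# The role ARN and MFA device serial below are placeholders.
resp = sts.assume_role(
    RoleArn="arn:aws:iam::222222222222:role/ExampleCrossAccountRole",
    RoleSessionName="example-session",
    DurationSeconds=3600,                                # 1 hour
    SerialNumber="arn:aws:iam::111111111111:mfa/alice",  # placeholder MFA device
    TokenCode="123456",                                  # code from the MFA device
)

creds = resp["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration

# Use the temporary credentials for subsequent calls.
session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(session.client("s3").list_buckets()["Buckets"])
```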

6.4 Design protection of Data at Rest controls

  • HSM is a Hardware Security Module – a physical device that safeguards and manages cryptographic keys, usually either a plug-in card or a physical appliance
  • HSMs would previously have to be hosted on premises, which could mean latency between the application in AWS and the HSM on the customer site
  • Amazon provides CloudHSM. Keys can be created, stored and managed in a way that is only accessible to you
  • CloudHSM is charged with an upfront fee and then per hour until the instance is terminated. A two week eval is available by request
  • CloudHSM is single tenanted. When you purchase an instance, it’s dedicated to you
  • Has to be deployed in a VPC (accounts on EC2-Classic will need to create a VPC)
  • VPC peering can be used to access CloudHSM
  • You can use EBS volume encryption, S3 object encryption and key management with CloudHSM, but this does require custom scripting
  • If you need fault tolerance, you need to add a second CloudHSM in a cluster as if you lose your single one, you lose all the keys
  • Can integrate CloudHSM with RDS as well as Redshift
  • Monitor with syslog
  • AWS Key Management Service is used from the IAM console and allows an administrator to define keys for the encryption of data
  • KMS is region based
  • CMK is the Customer Master Key and sits at the top of the key hierarchy. You can add KMS administrators using IAM. Users also need permissions via IAM or they are not allowed to use keys to perform encryption tasks (a minimal KMS sketch follows this list)
  • Accounts from other AWS accounts can be added as users
  • Key rotation changes the backing key and all backing keys are kept. These are used to encrypt and decrypt data. CMKs would need to be disabled to prevent any of the backing keys being used for encryption or decryption
  • Data encrypted using a key is lost if the key is lost
  • You can select which encryption key is used to create an encrypted EBS volume, for example. If none is selected, the default is the EBS key pre-created in KMS
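
To show how a CMK is typically used (envelope encryption: the CMK protects a data key, the data key encrypts your data locally), here is a hedged boto3 sketch. The key alias is a placeholder and the caller is assumed to have been granted use of the key via IAM and the key policy.

```python
import boto3

kms = boto3.client("kms")
KEY_ID = "alias/example-app-key"   # placeholder CMK alias; caller needs key permissions

# Envelope encryption: KMS returns a plaintext data key plus the same key
# encrypted under the CMK. Encrypt data locally with the plaintext key,
# store the encrypted copy of the key alongside the data, then discard
# the plaintext key.
data_key = kms.generate_data_key(KeyId=KEY_ID, KeySpec="AES_256")
plaintext_key = data_key["Plaintext"]          # use with a local cipher (e.g. AES-GCM)
encrypted_key = data_key["CiphertextBlob"]     # safe to store with the data

# Later: recover the plaintext data key by asking KMS to decrypt the stored blob.
restored = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
assert restored == plaintext_key
```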

6.5 Design protection of Data in Flight and Network Perimeter controls

  • NTP amplification can be used with a spoof IP address to return a large packet back to a different target (the intended victim) and flood the target with traffic
  • Reflection attacks involve eliciting a response from a server to a spoofed IP address where the compromised server acts like a reflector
  • Attacks can also take place at the application layer (layer 7) by flooding the web server with GET requests.
  • Slowloris attack is deliberately slow GET requests to open up lots of connections on the web server
  • Limit the attack surface by opening only required ports, use bastion hosts where appropriate and use private subnets
  • WAF is a web application firewall and provides protection at layer 7
  • Can use a community based WAF appliance or use the AWS WAF service
  • Stacks can also be scaled horizontally and vertically to meet the additional load placed on your infrastructure by a DDoS attack
  • Scaling out is easier than scaling up, as instances can be added with no downtime
  • Geo restrictions or blocking can be used with CloudFront to prevent attacks from certain countries. This can be achieved by either using white or black listing
  • Origin Access Identity. Restrict access to S3 buckets by preventing direct user access and forcing them to access objects via CloudFront URLs
  • Alias records in Route 53 can be used to redirect traffic from an existing infrastructure to a new one with greater capacity and WAFs, built to withstand a DDoS attack, with no client-side DNS changes and no propagation delays (see the sketch after this list)
  • You also need to learn normal behaviour for an application so that you don’t block any traffic during month end spikes, for example
  • With C3, C4, R3, D2, and I2 instances, you can enable Enhanced Networking capabilities, which provides higher network performance (packets per second). This feature uses a network virtualization stack that provides higher I/O performance and lower CPU utilization compared to traditional implementations. With Enhanced Networking, your application can benefit from features that can aid in building resilience against DDoS attacks, such as high packet-per-second performance, low latency networking, and improved scalability.
  • Amazon Route 53 has two capabilities that work together to help ensure end users can access your application even under DDoS attack: shuffle sharding and anycast routing
  • Amazon Route 53 uses shuffle sharding to spread DNS requests over numerous PoPs, thus providing multiple paths and routes for your application
  • Anycast routing increases redundancy by advertising the same IP address from multiple PoPs. In the event that a DDoS attack overwhelms one endpoint, shuffle sharding isolates failures while providing additional routes to your infrastructure
  • Alias Record Sets can save you time and provide additional tools while under attack. For example, suppose an Alias Record Set for example.com points to an ELB load balancer, which is distributing traffic across several EC2 instances running your application. If your application came under attack, you could change the Alias Record Set to point to an Amazon CloudFront distribution or to a different ELB load balancer with higher capacity EC2 instances running WAFs or your own security tools. Amazon Route 53 would then automatically reflect those changes in DNS answers for example.com without any changes to the hosted zone that contains Alias Record Sets for example.com.
  • IDS is Intrusion Detection, IPS is Intrusion Protection
  • IDS/IPS is a virtual appliance installed into the public subnet that may communicate with a SOC (security operations centre) such as Trend Micro’s; it sends logs to S3 and an agent is required in each instance to capture and analyse traffic and requests
  • It is possible to restrict access to resources using tags. You can do an explicit deny permission and this overrides everything. Use Action:API permissions to prevent actions via the command line or AWS console
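
As an example of the alias record switch described earlier (repointing a zone apex at a higher-capacity endpoint such as a CloudFront distribution during an attack), here is a hedged boto3 sketch. The hosted zone ID and distribution domain are placeholders; the alias target zone ID shown is the fixed value generally documented for CloudFront distributions.

```python
import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z1EXAMPLE"                          # placeholder: your hosted zone
CLOUDFRONT_DOMAIN = "d111111abcdef8.cloudfront.net"   # placeholder distribution
CLOUDFRONT_ALIAS_ZONE = "Z2FDTNDATAQYW2"              # documented fixed alias zone for CloudFront

# Repoint the zone apex at a CloudFront distribution via an alias record.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Comment": "Shift traffic to CloudFront during DDoS mitigation",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "example.com.",
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": CLOUDFRONT_ALIAS_ZONE,
                    "DNSName": CLOUDFRONT_DOMAIN,
                    "EvaluateTargetHealth": False,
                },
            },
        }],
    },
)
```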

18-07-16

AWS Certified Solutions Architect Professional – Study Guide – Domain 5.0: Data Storage for a complex large scale deployment


5.1 Demonstrate ability to make architectural trade off decisions involving storage options

  • S3 is highly available, replicated object based data within a region
  • S3 can be optimised for specific use cases and costs
  • Glacier is cheaper storage, object based but recovery of data from this storage takes several hours and is not suitable for quick recovery. Archiving is a good use case
  • Backup S3 by copying data to another bucket in another account (cross-account access)
  • EBS is block level storage used with EC2 instances. Depending on the use case, EBS can provide magnetic storage (cheaper but slower) or SSD storage and is suitable for persistent storage and random read/write workloads
  • EBS provides 99.999% availability and is AZ specific
  • EBS and S3 offer encryption at rest
  • S3 offers versioning functionality
  • EBS offers snapshot functionality. Snapshots only copy the updated blocks and do not affect performance. Snapshots may also be used to create a new volume that has been resized, or you can also change storage type from GP2 to Magnetic or Provisioned IOPS, for example
  • You can create EBS Magnetic volumes from 1 GiB to 1 TiB in size; you can create EBS General Purpose (SSD) and Provisioned IOPS (SSD) volumes up to 16 TiB in size. You can mount these volumes as devices on your Amazon EC2 instances. You can mount multiple volumes on the same instance, but each volume can be attached to only one instance at a time
  • Delete on terminate flag can be changed at any time
  • With General Purpose (SSD) volumes, your volume receives a base performance of 3 IOPS/GiB, with the ability to burst to 3,000 IOPS for extended periods of time. Burst credits are accumulated over time, much like T2 instances and CPU.
  • General Purpose (SSD) volumes are ideal for a broad range of use cases such as boot volumes, small and medium size databases, and development and test environments. General Purpose (SSD) volumes support up to 10,000 IOPS and 160 MB/s of throughput
  • With Provisioned IOPS (SSD) volumes, you can provision a specific level of I/O performance. Provisioned IOPS (SSD) volumes support up to 20,000 IOPS and 320 MB/s of throughput. This allows you to predictably scale to tens of thousands of IOPS per EC2 instance
  • EBS volumes are created in a specific Availability Zone, and can then be attached to any instances in that same Availability Zone. To make a volume available outside of the Availability Zone, you can create a snapshot and restore that snapshot to a new volume anywhere in that region. You can copy snapshots to other regions and then restore them to new volumes there, making it easier to leverage multiple AWS regions for geographical expansion, data center migration, and disaster recovery
  • Performance metrics, such as bandwidth, throughput, latency, and average queue length, are available through the AWS Management Console. These metrics are provided by Amazon CloudWatch
  • Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket and eventual consistency for overwrite PUTS and DELETES in all regions
  • Amazon CloudFront is a web service that speeds up distribution of your static and dynamic web content, for example, .html, .css, .php, image, and media files, to end users. CloudFront delivers your content through a worldwide network of edge locations.
  • When an end user requests content that you’re serving with CloudFront, the user is routed to the edge location that provides the lowest latency, so content is delivered with the best possible performance.
  • If the content is already in that edge location, CloudFront delivers it immediately. If the content is not currently in that edge location, CloudFront retrieves it from an Amazon S3 bucket or an HTTP server (for example, a web server) that you have identified as the source for the definitive version of your content
  • If you need to design for a high number (> 300 per second) of GETs (reads), then CloudFront can be an appropriate option. Caches content from S3. RTMP and web distribution
  • HTTPS can be used on a CloudFront URL (distro.cloudfront.net) and then from CloudFront to the actual source (origin) itself if it is an HTTPS web server. Origin traffic to S3 uses the same protocol as the viewer connection, so if the CloudFront connection is HTTPS, then onward traffic to S3 is HTTPS (“Match Viewer” setting)
  • You can use a custom SSL certificate on the CF distribution if you want it to match your domain name. You can upload this or use Certificate Manager to do this. The CA must be supported by Mozilla in order to work with CF
  • If you anticipate that your workload will consistently exceed 100 requests per second, you should avoid sequential key names. If you must use sequential numbers or date and time patterns in key names, add a random prefix to the key name. The randomness of the prefix more evenly distributes key names across multiple index partitions
  • S3 versioning can be enabled or suspended, but once enabled it cannot be removed
  • MFA deletion policies can be used
  • Bucket policies can secure S3 contents
  • In summary, AWS storage options are:-
    • Amazon S3 Scalable storage in the cloud
    • Amazon Glacier Low-cost archive storage in the cloud
    • Amazon EBS Persistent block storage volumes for Amazon EC2 virtual machines
    • Amazon EC2 Instance Storage Temporary block storage volumes for Amazon EC2 virtual machines
    • AWS Import/Export Large volume data transfer
    • AWS Storage Gateway Integrates on-premises IT environments with cloud storage
    • Amazon CloudFront Global content delivery network (CDN)
    • CF can have reserved capacity purchased up front to save costs
  • S3 usage summary:-
    • One very common use for Amazon S3 is storage and distribution of static web content and media. This content can be delivered directly from Amazon S3, since each object in Amazon S3 has a unique HTTP URL address, or Amazon S3 can serve as an origin store for a content delivery network (CDN), such as Amazon CloudFront. Because of Amazon S3’s elasticity, it works particularly well for hosting web content with extremely spiky bandwidth demands. Also, because no storage provisioning is required, Amazon S3 works well for fast growing websites hosting data intensive, user-generated content, such as video and photo sharing sites.
    • Amazon S3 is also frequently used to host entire static websites. Amazon S3 provides a highly-available and highly scalable solution for websites with only static content, including HTML files, images, videos, and client-side scripts such as JavaScript.
    • Amazon S3 is also commonly used as a data store for computation and large-scale analytics, such as analyzing financial transactions, clickstream analytics, and media transcoding. Because of the horizontal scalability of Amazon S3, you can access your data from multiple computing nodes concurrently without being constrained by a single connection.
    • Finally, Amazon S3 is often used as a highly durable, scalable, and secure solution for backup and archival of critical data, and to provide disaster recovery solutions for business continuity. Because Amazon S3 stores objects redundantly on multiple devices across multiple facilities, it provides the highly-durable storage infrastructure needed for these scenarios.
    • Amazon S3’s versioning capability is available to protect critical data from inadvertent deletion
    • To speed access to relevant data, many developers pair Amazon S3 with a database, such as Amazon DynamoDB or Amazon RDS
  • Amazon S3 has the following anti-patterns:  
    • File system—Amazon S3 uses a flat namespace and isn’t meant to serve as a standalone, POSIX-compliant file system. However, by using delimiters (commonly either the ‘/’ or ‘\’ character) you are able to construct your keys to emulate the hierarchical folder structure of a file system within a given bucket.
    • Structured data with query—Amazon S3 doesn’t offer query capabilities: to retrieve a specific object you need to already know the bucket name and key. Thus, you can’t use Amazon S3 as a database by itself. Instead, pair Amazon S3 with a database to index and query metadata about Amazon S3 buckets and objects.  
    • Rapidly changing data—Data that must be updated very frequently might be better served by a storage solution with lower read / write latencies, such as Amazon EBS volumes, Amazon RDS or other relational databases, or Amazon DynamoDB.  
    • Backup and archival storage—Data that requires long-term encrypted archival storage with infrequent read access may be stored more cost-effectively in Amazon Glacier.  
    • Dynamic website hosting—While Amazon S3 is ideal for websites with only static content, dynamic websites that depend on database interaction or use server-side scripting should be hosted on Amazon EC2
  • Amazon Glacier summary:-
    • Organizations are using Amazon Glacier to support a number of use cases. These include archiving offsite enterprise information, media assets, research and scientific data, digital preservation and magnetic tape replacement
  • Amazon Glacier has the following anti-patterns:  
    • Rapidly changing data—Data that must be updated very frequently might be better served by a storage solution with lower read/write latencies, such as Amazon EBS or a database.  
    • Real time access—Data stored in Amazon Glacier is not available in real time. Retrieval jobs typically require 3-5 hours to complete, so if you need immediate access to your data, Amazon S3 is a better choice
  • EBS usage summary:-
    • Amazon EBS is meant for data that changes relatively frequently and requires long-term persistence.
    • Amazon EBS is particularly well-suited for use as the primary storage for a database or file system, or for any applications that require access to raw block-level storage.
    • Amazon EBS Provisioned IOPS volumes are particularly well-suited for use with databases applications that require a high and consistent rate of random disk reads and writes.
  • Amazon EBS has the following anti-patterns:  
    • Temporary storage—If you are using Amazon EBS for temporary storage (such as scratch disks, buffers, queues, and caches), consider using local instance store volumes, Amazon SQS, or ElastiCache (Memcached or Redis).  
    • Highly-durable storage—If you need very highly-durable storage, use Amazon S3 or Amazon Glacier. Amazon S3 standard storage is designed for 99.999999999% annual durability per object. In contrast, Amazon EBS volumes with less than 20 GB of modified data since the last snapshot are designed for between 99.5% and 99.9% annual durability; volumes with more modified data can be expected to have proportionally lower durability.  
    • Static data or web content—If your data doesn’t change that often, Amazon S3 may represent a more cost effective and scalable solution for storing this fixed information. Also, web content served out of Amazon EBS requires a web server running on Amazon EC2, while you can deliver web content directly out of Amazon S3.
  • Instance Store volumes usage:-
    • In general, local instance store volumes are ideal for temporary storage of information that is continually changing, such as buffers, caches, scratch data, and other temporary content, or for data that is replicated across a fleet of instances, such as a load-balanced pool of web servers. Amazon EC2 instance storage is well-suited for this purpose. It consists of the virtual machine’s boot device (for instance store AMIs only), plus one or more additional volumes that are dedicated to the Amazon EC2 instance (for both Amazon EBS AMIs and instance store AMIs). This storage is usable only from a single Amazon EC2 instance during its lifetime. Note that, unlike Amazon EBS volumes, instance store volumes cannot be detached or attached to another instance.
    • High I/O and high storage provide Amazon EC2 instance storage targeted to specific use cases. High I/O instances provide instance store volumes backed by SSD, and are ideally suited for many high performance database workloads. Example applications include NoSQL databases like Cassandra and MongoDB.
    • High storage instances support much higher storage density per Amazon EC2 instance, and are ideally suited for applications that benefit from high sequential I/O performance across very large datasets. Example applications include data warehouses, Hadoop storage nodes, seismic analysis, cluster file systems, etc.
    • Note that applications using instance storage for persistent data generally provide data durability through replication, or by periodically copying data to durable storage.
  • Amazon EC2 instance store volumes have the following anti-patterns:  
    • Persistent storage—If you need persistent virtual disk storage similar to a physical disk drive for files or other data that must persist longer than the lifetime of a single Amazon EC2 instance, Amazon EBS volumes or Amazon S3 are more appropriate.  
    • Relational database storage—In most cases, relational databases require storage that persists beyond the lifetime of a single Amazon EC2 instance, making Amazon EBS volumes the natural choice.  
    • Shared storage—Instance store volumes are dedicated to a single Amazon EC2 instance, and cannot be shared with other systems or users. If you need storage that can be detached from one instance and attached to a different instance, or if you need the ability to share data easily, Amazon S3 or Amazon EBS volumes are the better choice.  
    • Snapshots—If you need the convenience, long-term durability, availability, and shareability of point-in-time disk snapshots, Amazon EBS volumes are a better choice
  • AWS Import/Export summary:-
    • AWS Import/Export is ideal for transferring large amounts of data in and out of the AWS cloud, especially in cases where transferring the data over the Internet would be too slow or too costly. In general, if loading your data over the Internet would take a week or more, you should consider using AWS Import/Export.
    • Common use cases include initial data upload to AWS, content distribution or regular data interchange to/from your customers or business associates, transfer to Amazon S3 or Amazon Glacier for off-site backup and archival storage, and quick retrieval of large backups from Amazon S3 or Amazon Glacier for disaster recovery
  • AWS Import/Export Anti-Patterns
    • AWS Import/Export is optimal for large data that would take too long to load over the Internet, so the anti-pattern is simply data that is more easily transferred over the Internet.
    • If your data can be transferred over the Internet in less than one week, AWS Import/Export may not be the ideal solution
  • AWS Storage Gateway summary
    • Organizations are using AWS Storage Gateway to support a number of use cases. These include corporate file sharing, enabling existing on-premises backup applications to store primary backups on Amazon S3, disaster recovery, and data mirroring to cloud-based compute resources.
  • AWS Storage Gateway has the following anti-patterns:  
    • Database storage—Amazon EC2 instances using Amazon EBS volumes are a natural choice for database storage and workloads.
  • Amazon CloudFront usage patterns:-
    • Amazon CloudFront is ideal for distribution of frequently-accessed static content that benefits from edge delivery—like popular website images, videos, media files or software downloads.
    • Amazon CloudFront can also be used to deliver dynamic web applications over HTTP. These applications may include static content, dynamic content, or a whole site with a mixture of the two.
    • Amazon CloudFront is also commonly used to stream audio and video files to web browsers and mobile devices.
  • Amazon CloudFront has the following anti-patterns:  
    • Programmatic cache invalidation—While Amazon CloudFront supports cache invalidation, AWS recommends using object versioning rather than programmatic cache invalidation.  
    • Infrequently requested data—It may be better to serve infrequently-accessed data directly from the origin server, avoiding the additional cost of origin fetches for data that is not likely to be reused at the edge
  • Could also use instance-backed storage (i.e. host local) as a caching mechanism by striping a set of volumes as RAID 0 and then mirroring or syncing them to EBS volumes for persistence
  • S3 also has the ability to use events to push notifications to SNS or to Lambda. For example, you could have a configuration where when a video file is uploaded to S3, the bucket then triggers a create event to add metadata to DynamoDB and add a thumbnail to the bucket. Conversely, a delete event could remove the thumbnail and also clean up the entry in DynamoDB.
    • Prefixes can be used to subscribe to events only in certain folders in buckets
    • Suffixes can be used to subscribe to events only of a certain file type, i.e. JPG
    • S3 permissions are required for this to work
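  • A minimal boto3 sketch of the event configuration described above (the bucket name and Lambda ARN are illustrative placeholders, not from the original notes):

    import boto3

    s3 = boto3.client('s3')

    # Hypothetical bucket and function names, for illustration only
    s3.put_bucket_notification_configuration(
        Bucket='my-video-bucket',
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [{
                'LambdaFunctionArn': 'arn:aws:lambda:eu-west-1:123456789012:function:make-thumbnail',
                'Events': ['s3:ObjectCreated:*'],
                # Prefix/suffix filters limit events to a "folder" and a file type
                'Filter': {'Key': {'FilterRules': [
                    {'Name': 'prefix', 'Value': 'uploads/'},
                    {'Name': 'suffix', 'Value': '.mp4'},
                ]}},
            }],
        },
    )
    # The Lambda function's resource policy must also allow s3.amazonaws.com to
    # invoke it, and the caller needs s3:PutBucketNotification on the bucket.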

5.2 Demonstrate ability to make architectural trade off decisions involving database options

  • Amazon Relational Database Service (Amazon RDS) is a web service that provides the capabilities of MySQL, Oracle, or Microsoft SQL Server relational database as a managed, cloud-based service. It also eliminates much of the administrative overhead associated with launching, managing, and scaling your own relational database on Amazon EC2 or in another computing environment.
  • Amazon RDS Usage Patterns:-
    • Amazon RDS is ideal for existing applications that rely on MySQL, Oracle, or SQL Server traditional relational database engines. Since Amazon RDS offers full compatibility and direct access to native database engines, most code, libraries, and tools designed for these databases should work unmodified with Amazon RDS.
    • Amazon RDS is also optimal for new applications with structured data that requires more sophisticated querying and joining capabilities than that provided by Amazon’s NoSQL database offering, Amazon DynamoDB.
    • When creating a new DB instance using the Amazon RDS Provisioned IOPS storage, you can specify the IOPS your instance needs from 1,000 IOPS to 30,000 IOPS and Amazon RDS provisions that IOPS rate for the lifetime of the instance.
    • Amazon RDS leverages Amazon EBS volumes as its data store
    • The Amazon RDS Multi-AZ deployment feature enhances both the durability and the availability of your database by synchronously replicating your data between a primary Amazon RDS DB instance and a standby instance in another Availability Zone. In the unlikely event of a DB component failure or an Availability Zone failure, Amazon RDS will automatically failover to the standby (which typically takes about three minutes) and the database transactions can be resumed as soon as the standby is promoted
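  • A hedged boto3 sketch of provisioning a Multi-AZ instance with Provisioned IOPS storage (identifiers, sizes and credentials are illustrative placeholders):

    import boto3

    rds = boto3.client('rds')

    # Multi-AZ synchronous replication plus Provisioned IOPS (io1) storage
    rds.create_db_instance(
        DBInstanceIdentifier='orders-db',        # hypothetical name
        Engine='mysql',
        DBInstanceClass='db.m4.large',
        AllocatedStorage=100,                    # GB
        StorageType='io1',
        Iops=1000,                               # provisioned for the life of the instance
        MultiAZ=True,                            # standby in another AZ, automatic failover
        MasterUsername='admin',
        MasterUserPassword='change-me-please',
    )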
  • Amazon RDS anti-patterns:-
    • Index and query-focused data—Many cloud-based solutions don’t require advanced features found in a relational database, such as joins and complex transactions. If your application is more oriented toward indexing and querying data, you may find Amazon DynamoDB to be more appropriate for your needs.  
    • Numerous BLOBs—While all of the database engines provided by Amazon RDS support binary large objects (BLOBs), if your application makes heavy use of them (audio files, videos, images, and so on), you may find Amazon S3 to be a better choice.  
    • Automated scalability—As stated previously, Amazon RDS provides pushbutton scaling. If you need fully automated scaling, Amazon DynamoDB may be a better choice.  
    • Other database platforms—At this time, Amazon RDS provides MySQL, MariaDB (a MySQL fork), PostgreSQL, Oracle, and SQL Server databases. If you need another database platform (such as IBM DB2, Informix or Sybase) you need to deploy a self-managed database on an Amazon EC2 instance by using a relational database AMI, or by installing database software on an Amazon EC2 instance.
    • Complete control—If your application requires complete, OS-level control of the database server, with full root or admin login privileges (for example, to install additional third-party software on the same server), a self managed database on Amazon EC2 may be a better match.
    • BYOL options for RDS are limited (Oracle offers a BYOL licensing model; SQL Server does not). If you need to bring your own licence for a database RDS doesn't support in this way, provision an EC2 instance and install the database on it.
  • Amazon DynamoDB is a fast, fully-managed NoSQL database service that makes it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic. Amazon DynamoDB helps offload the administrative burden of operating and scaling a highly-available distributed database cluster
  • Amazon DynamoDB stores structured data in tables, indexed by primary key, and allows low-latency read and write access to items ranging from 1 byte up to 64 KB. Amazon DynamoDB supports three data types: number, string, and binary, in both scalar and multi-valued sets
  • The primary key uniquely identifies each item in a table. The primary key can be simple (partition key) or composite (partition key and sort key).
  • When it stores data, DynamoDB divides a table’s items into multiple partitions, and distributes the data primarily based upon the partition key value. The provisioned throughput associated with a table is also divided evenly among the partitions, with no sharing of provisioned throughput across partitions.
  • Amazon DynamoDB is integrated with other services, such as Amazon Elastic MapReduce (Amazon EMR), Amazon Redshift, Amazon Data Pipeline, and Amazon S3, for analytics, data warehouse, data import/export, backup, and archive.
  • DynamoDB supports Query and Scan operations for searching tables. Query uses existing indexes, so it is quicker and lighter on resources. Scan ignores indexes and reads every item in the table, which is slow and resource intensive
  • Secondary indexes can be used to create additional indexes for query if the standard primary key search is not appropriate. Global secondary key and local secondary keys are supported
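  • A short boto3 sketch contrasting Query and Scan (the table and attribute names are made up for illustration):

    import boto3
    from boto3.dynamodb.conditions import Key, Attr

    table = boto3.resource('dynamodb').Table('GameScores')   # hypothetical table

    # Query: uses the table's key/index, only reads matching items
    scores = table.query(KeyConditionExpression=Key('UserId').eq('alice'))

    # Scan: reads every item in the table, then filters - slow and read-capacity hungry
    high_scores = table.scan(FilterExpression=Attr('TopScore').gt(1000))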
  • DynamoDB Usage Patterns:-
    • Amazon DynamoDB is ideal for existing or new applications that need a flexible NoSQL database with low read and write latencies, and the ability to scale storage and throughput up or down as needed without code changes or downtime.
    • Common use cases include: mobile apps, gaming, digital ad serving, live voting and audience interaction for live events, sensor networks, log ingestion, access control for web-based content, metadata storage for Amazon S3 objects, e-commerce shopping carts, and web session management. Many of these use cases require a highly available and scalable database because downtime or performance degradation has an immediate negative impact on an organization’s business.
  • Amazon DynamoDB has the following anti-patterns:  
    • Prewritten application tied to a traditional relational database—If you are attempting to port an existing application to the AWS cloud, and need to continue using a relational database, you may elect to use either Amazon RDS (MySQL, Oracle, or SQL Server), or one of the many preconfigured Amazon EC2 database AMIs. You are also free to create your own Amazon EC2 instance, and install your own choice of database software.  
    • Joins and/or complex transactions—While many solutions are able to leverage Amazon DynamoDB to support their users, it’s possible that your application may require joins, complex transactions, and other relational infrastructure provided by traditional database platforms. If this is the case, you may want to explore Amazon RDS or Amazon EC2 with a self-managed database.  
    • BLOB data—If you plan on storing large (greater than 64 KB) BLOB data, such as digital video, images, or music, you’ll want to consider Amazon S3. However, Amazon DynamoDB still has a role to play in this scenario, for keeping track of metadata (e.g., item name, size, date created, owner, location, and so on) about your binary objects.  
    • Large data with low I/O rate—Amazon DynamoDB uses SSD drives and is optimized for workloads with a high I/O rate per GB stored. If you plan to store very large amounts of data that are infrequently accessed, other storage options, such as Amazon S3, may be a better choice
  • ElastiCache is a web service that makes it easy to deploy, operate, and scale a distributed, in-memory cache in the cloud.
  • ElastiCache improves the performance of web applications by allowing you to retrieve information from a fast, managed, in-memory caching system, instead of relying entirely on slower disk-based databases.
  • ElastiCache supports two popular open-source caching engines: Memcached and Redis (master/slave, cross AZ redundancy)
  • ElastiCache usage patterns:-
    • ElastiCache improves application performance by storing critical pieces of data in memory for low-latency access.
    • It is frequently used as a database front end in read-heavy applications, improving performance and reducing the load on the database by caching the results of I/O-intensive queries.
    • It is also frequently used to manage web session data, to cache dynamically-generated web pages, and to cache results of computationally-intensive calculations, such as the output of recommendation engines.
    • For applications that need more complex data structures than strings, such as lists, sets, hashes, and sorted sets, the Redis engine is often used as an in-memory NoSQL database.
    • Sorted sets make Redis a good choice for gaming applications such as leaderboards, top tens, most popular etc
    • Pub/sub for messaging, so real time chat etc
  • Amazon ElastiCache has the following anti-patterns:  
    • Persistent data—If you need very fast access to data, but also need strong data durability (persistence), Amazon DynamoDB is probably a better choice
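  • A quick sketch of the sorted-set leaderboard and pub/sub patterns mentioned above, using the redis-py client (version 3+) against an ElastiCache Redis endpoint; the endpoint and key names are placeholders:

    import redis

    r = redis.Redis(host='my-cache.abc123.0001.euw1.cache.amazonaws.com', port=6379)

    # Sorted set as a leaderboard
    r.zadd('leaderboard', {'alice': 1200, 'bob': 950})
    top_ten = r.zrevrange('leaderboard', 0, 9, withscores=True)

    # Pub/sub for real-time chat style messaging
    r.publish('chat:lobby', 'hello world')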
  • Amazon Redshift is a fast, fully-managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. It is optimized for datasets that range from a few hundred gigabytes to a petabyte or more.
  • Amazon Redshift usage patterns:-
    • Amazon Redshift is ideal for analyzing large datasets using your existing business intelligence tools.
    • Analyze global sales data for multiple products  
    • Store historical stock trade data  
    • Analyze ad impressions and clicks  
    • Aggregate gaming data  
    • Analyze social trends  
    • Measure clinical quality, operation efficiency, and financial performance in the healthcare space
  • Amazon Redshift has the following anti-patterns:  
    • OLTP workloads—Amazon Redshift is a column-oriented database suited to data warehouse and analytics, where queries are typically performed over very large datasets.
    • If your application involves online transaction processing, a traditional row-based database system, such as Amazon RDS, is a better match.
    • BLOB data—If you plan on storing binary (e.g., video, pictures, or music), you’ll want to consider Amazon S3.
  • Redshift can be a single node or multi node cluster
  • Provision a Redshift node type to select the size of storage included
  • Redshift is a single AZ solution
  • Redshift clusters have a leader node which co-ordinates queries and spreads them across worker nodes in the cluster. Data is written as a stripe across all nodes to their local storage
  • To scale a cluster, just add nodes to add compute and add its local storage to the cluster. Redshift manager manages adding nodes and spreading queries out
  • Redshift is like a massive SQL server
  • Encryption can be enabled on Redshift but must be done when the cluster is first spun up
  • When a cluster is resized, the cluster is restarted in read only mode, all connections are terminated and the old cluster is used as a data source to re-populate the new cluster. The new cluster is read only until the replication is complete. The end point is then updated and the old cluster terminates connections
  • Redshift may be purchased on demand or use reserved instances but spot instances are not possible. Clusters can be scaled up, down, in or out
  • When you shut down a cluster, you can take a final manual snapshot from which you can recover from later. If you delete a cluster, all automatic snapshots are deleted
  • Snapshots can be manually or automatically copied from one region to another, but data charges are incurred
  • Redshift snapshot includes cluster size, instance types, cluster data and cluster configuration
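  • A boto3 sketch of spinning up a multi-node cluster with encryption enabled at creation time (identifiers and credentials are illustrative placeholders):

    import boto3

    redshift = boto3.client('redshift')

    redshift.create_cluster(
        ClusterIdentifier='analytics-cluster',   # hypothetical name
        ClusterType='multi-node',
        NodeType='dc1.large',                    # node type determines per-node storage
        NumberOfNodes=4,                         # compute nodes; a leader node is added automatically
        Encrypted=True,                          # must be set when the cluster is first created
        MasterUsername='admin',
        MasterUserPassword='change-me-please',
    )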
  • Amazon EC2, together with Amazon EBS volumes, provides an ideal platform for you to operate your own self-managed relational database in the cloud. Many leading database solutions are available as pre-built, ready-to-use Amazon EC2 AMIs, including IBM DB2 and Informix, Oracle Database, MySQL, Microsoft SQL Server, PostgreSQL, Sybase, EnterpriseDB, and Vertica.
  • Databases on EC2 usage patterns:-
    • Running a relational database on Amazon EC2 and Amazon EBS is the ideal scenario for users whose application requires a specific traditional relational database not supported by Amazon RDS, or for those users who require a maximum level of administrative control and configurability.
  • Self-managed relational databases on Amazon EC2 have the following anti-patterns:
    • Index and query-focused data—Many cloud-based solutions don’t require advanced features found in a relational database, such as joins or complex transactions.
    • If your application is more oriented toward indexing and querying data, you may find Amazon DynamoDB to be more appropriate for your needs, and significantly easier to manage.  
    • Numerous BLOBs—Many relational databases support BLOBs (audio files, videos, images, and so on). If your application makes heavy use of them, you may find Amazon S3 to be a better choice. You can use a database to manage the metadata.
    • Automatic scaling—Users of relational databases on AWS can, in many cases, leverage the scalability and elasticity of the underlying AWS platform, but this requires system administrators or DBAs to perform a manual or scripted task. If you need pushbutton scaling or fully-automated scaling, you may opt for another storage choice such as Amazon DynamoDB or Amazon RDS.
    • MySQL, Oracle, SQL Server—If you are running a self-managed MySQL, Oracle, or SQL Server database on Amazon EC2, you should consider the automated backup, patching, Provisioned IOPS, replication, and pushbutton scaling features offered by a fully-managed Amazon RDS database

5.3 Demonstrate ability to implement the most appropriate data storage architecture

  • S3 for highly available, multi-AZ resilience, BLOB storage, versioning, bucket policies, lifecycling, encryption in transit and at rest
  • CloudFront for lots of read requests, caches to geographical edge location
  • Glacier for long term storage, infrequent access
  • Amazon EBS – persistent block storage volumes for Amazon EC2 virtual machines, encryption of volumes, snapshotting
  • Amazon EC2 Instance Storage – temporary block storage volumes for Amazon EC2 virtual machines
  • AWS Import/Export – large volume data transfer
  • AWS Storage Gateway – integrates on-premises IT environments with cloud storage
  • RDS most appropriate for applications pre-built to leverage SQL, Oracle, MySQL etc and structured data
  • If asked about ACID capabilities, use an RDS based solution

5.4 Determine use of synchronous versus asynchronous replication

11-07-16

AWS Certified Solutions Architect Professional – Study Guide – Domain 4.0: Network Design for a complex large scale deployment (10%)


4.1 Demonstrate ability to design and implement networking features of AWS

  • VPC is a Virtual Private Cloud. You can have up to 5 on an AWS account, if you need more you can raise a ticket
  • To create a VPC you need a CIDR block, a name tag and a tenancy type of default or dedicated. Dedicated costs money; default doesn't
  • Creating a VPC automatically creates a routing table
  • Subnets map to AZs on a one to one basis
  • Amazon reserves five IP addresses in each subnet
  • One Internet Gateway per VPC
  • There is a default routing table but you can also create your own routing tables and assign them to subnets
  • Public subnets are publicly accessible from the internet, private ones aren't
  • An Amazon-provided AMI (running Amazon Linux) is used for the NAT instance
  • Remember to disable source/destination checks on your NAT instance, or traffic will not be routed
  • NAT Gateways can be used to provide up to 10Gbps traffic out from a private subnet to the internet. For more bandwidth or to scale, add more gateways. Remember though that NAT Gateways and subnets have a one to one relationship in the sense that if you add a route to 0.0.0.0/0 to the NAT Gateway, you can’t add another route to 0.0.0.0/0 for failover. You would need to split the routes up.
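  • A minimal boto3 sketch of adding a NAT Gateway and pointing a private subnet's route table at it (all resource IDs are placeholders):

    import boto3

    ec2 = boto3.client('ec2')

    # The NAT Gateway lives in a public subnet and needs an Elastic IP allocation
    nat = ec2.create_nat_gateway(SubnetId='subnet-11111111',
                                 AllocationId='eipalloc-22222222')
    nat_id = nat['NatGateway']['NatGatewayId']

    # Only one 0.0.0.0/0 route per route table - no second default route for failover
    ec2.create_route(RouteTableId='rtb-33333333',
                     DestinationCidrBlock='0.0.0.0/0',
                     NatGatewayId=nat_id)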
  • Create an Endpoint to send S3 traffic over the AWS backbone rather than the public internet. You add a route table entry to use the Endpoint reference to send S3 traffic via the endpoint. Endpoints are created at VPC level. Create a VPC policy to restrict access to buckets within S3 from certain principals. Can also be used in concert with bucket policies for further security.
  • Can’t cross regions with S3 Endpoints, so can’t copy a bucket from one region to another using an Endpoint
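  • A boto3 sketch of creating an S3 endpoint, attaching it to a route table and restricting it to one bucket with an endpoint policy (region, IDs and bucket name are placeholders):

    import boto3, json

    ec2 = boto3.client('ec2')

    ec2.create_vpc_endpoint(
        VpcId='vpc-44444444',
        ServiceName='com.amazonaws.eu-west-1.s3',     # endpoints are region-specific
        RouteTableIds=['rtb-33333333'],               # S3 traffic from these tables uses the endpoint
        PolicyDocument=json.dumps({
            'Statement': [{
                'Effect': 'Allow',
                'Principal': '*',
                'Action': 's3:*',
                'Resource': ['arn:aws:s3:::my-bucket', 'arn:aws:s3:::my-bucket/*'],
            }]
        }),
    )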
  • VPC peering connects two VPCs together within the same region. This can be the same AWS account or different accounts
  • There is no single point of failure and it does not use a VPN, bridge or gateway to make the connection
  • Transitive peering is not supported and peered VPCs must not have overlapping CIDR blocks
  • Soft limit of 50 VPC peers per account
  • Placement groups can span peered VPCs but you will not get the full bandwidth between instances in peered VPCs
  • A placement group is a logical grouping of instances within a single Availability Zone. Using placement groups enables applications to participate in a low-latency, 10 Gbps network. Placement groups are recommended for applications that benefit from low network latency, high network throughput, or both. To provide the lowest latency, and the highest packet-per-second network performance for your placement group, choose an instance type that supports enhanced networking
  • You are charged for data transfer within a VPC peering connection at the same rate as you are charged for data transfer across Availability Zones.
  • The Maximum Transmission Unit (MTU) across a VPC peering connection is 1500 bytes
  • You cannot have more than one VPC peering connection between the same two VPCs at the same time
  • Previously you could not reference a security group from the peer VPC as a source or destination for ingress or egress rules in your security group; you had to reference CIDR blocks of the peer VPC instead. As of 1st March 2016 you can reference peer VPC security groups directly
  • An instance’s public DNS hostname will not resolve to its private IP address across peered VPCs
  • Some use cases for VPC peering:-
    • Your company’s IT department has a VPC for file sharing. You want to peer other VPCs to that central VPC, however, you do not want the other VPCs to send traffic to each other
    • Your company has a VPC that you want to share with your customers. Each customer can create a VPC peering connection with your VPC, however, your customers cannot route traffic to other VPCs that are peered to yours, nor are they aware of the other customers’ routes
    • You have a central VPC that is used for Active Directory services. Specific instances in peer VPCs send requests to the Active Directory servers and require full access to the central VPC. The central VPC does not require full access to the peer VPCs; it only needs to route response traffic to the specific instances
  • The routing table entry must have the CIDR block to be reached in the peered VPC (such as 10.0.2.0/24) and the target VPC connection (such as pcx-aaaaeeee)
  • VPCs can be configured to peer and access resources in a specific subnet by using routing table entries to match that subnet. This also allows for peered connections to other VPCs where CIDR blocks will overlap. For example:-
    • The route table for subnet X points to VPC peering connection pcx-aaaabbbb to access the entire CIDR block of VPC B. VPC B’s route table points to pcx-aaaabbbb to access the CIDR block of only subnet X in VPC A. Similarly, the route table for subnet Y points to VPC peering connection pcx-aaaacccc to access the entire CIDR block of VPC C. VPC C’s route table points to pcx-aaaacccc to access the CIDR block of only subnet Y in VPC A
  • Peered connections can be configured to route between one subnet and a VPC only by creating a routing table and adding it to that specific subnet
  • If you have a VPC peered with multiple VPCs that have overlapping or matching CIDR blocks, ensure that your route tables are configured to avoid sending response traffic from your VPC to the incorrect VPC. AWS currently does not support unicast reverse path forwarding in VPC peering connections that checks the source IP of packets and routes reply packets back to the source
  • For example, you have the same configuration of one VPC peered to specific subnets in two VPCs. VPC B and VPC C have matching CIDR blocks, and their subnets have matching CIDR blocks. The route tables for VPC A, subnet A in VPC B, and subnet B in VPC C remain unchanged. The route table for subnet B in VPC B points to the VPC peering connection pcx-aaaabbbb to access VPC A's subnet

Diagram showing VPC A peered to specific subnets in VPC B and VPC C where CIDR blocks overlap

  • To route traffic to a specific instance in another VPC, add a routing table entry with the IP address and /32 –  for example, if you need to route from 10.0.1.20 in VPC A to 192.168.1.56 in VPC B, add a route table entry for 192.168.1.56/32 with a target of pcx-aaabbbb to ensure traffic is routed to the correct VPC
  • Longest prefix match is used when routing traffic to a specific instance in a VPC peer when CIDR blocks overlap. Traffic routing to a specific IP address takes precedence over a subnet entry as the prefix is longer (24 vs 32 for example)
  • There can only be one route per destination prefix in a route table when overlapping CIDR blocks are in play, so where an entry already exists in VPC A for 192.168.1.0/24 routing to VPC B, you cannot add another 192.168.1.0/24 entry routing to VPC C; you would have to restrict the peering to a different subnet in VPC C, such as 192.168.2.0/24
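  • The /32 host route described above, sketched with boto3 (the route table ID is a placeholder; the peering connection ID matches the example in the notes):

    import boto3

    ec2 = boto3.client('ec2')

    # Route only 192.168.1.56 over the peering connection; longest prefix match wins
    ec2.create_route(RouteTableId='rtb-55555555',           # VPC A's route table
                     DestinationCidrBlock='192.168.1.56/32',
                     VpcPeeringConnectionId='pcx-aaaabbbb')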
  • Invalid VPC peering configurations:-
    • Overlapping CIDR blocks
    • Transitive VPC peering
    • Edge to edge routing. If VPC A has a peer to VPC B and VPC B has a VPN or Direct Connect connection to a corporate LAN, VPC A cannot use this connection to access resources in the corporate network

Diagram showing that edge to edge routing through a peered VPC is not supported

  • Other scenarios not permitted:-
    • A VPN connection or an AWS Direct Connect connection to a corporate network
    • An Internet connection through an Internet gateway
    • An Internet connection in a private subnet through a NAT device
    • A ClassicLink connection to an EC2-Classic instance
    • A VPC endpoint to an AWS service; for example, an endpoint to Amazon S3.
  • To configure VPC peering:-
    • Owner of VPC A sends a peering request to owner of VPC B
    • VPC B owner accepts request
    • VPC A and VPC B owners add a routing table entry to route traffic to the reciprocal VPC
    • Security groups and/or NACLs may need reconfiguring to allow traffic
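  • The request/accept/route workflow above, as a hedged boto3 sketch (account, VPC and route table IDs are placeholders):

    import boto3

    ec2 = boto3.client('ec2')

    # Owner of VPC A requests the peering connection
    pcx = ec2.create_vpc_peering_connection(VpcId='vpc-aaaa1111',
                                            PeerVpcId='vpc-bbbb2222',
                                            PeerOwnerId='123456789012')
    pcx_id = pcx['VpcPeeringConnection']['VpcPeeringConnectionId']

    # Owner of VPC B accepts it (run with VPC B owner's credentials)
    ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

    # Both sides then add reciprocal routes pointing at the peering connection
    ec2.create_route(RouteTableId='rtb-aaaa1111',
                     DestinationCidrBlock='10.0.2.0/24',
                     VpcPeeringConnectionId=pcx_id)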

4.2 Demonstrate ability to design and implement connectivity features of AWS

  • Direct Connect is a dedicated permanent connection between your premises and AWS. This is brokered via third parties who are Direct Connect partners
  • Supports 802.1q VLANs and you can partition the connection into multiple virtual interfaces, or VIFs
  • 1Gbps or 10Gbps connections are available; sub-1Gbps connections can be bought from Direct Connect partners (AT&T, Colt, Equinix, etc)
  • Can help reduce costs when using lots of traffic
  • Increase reliability and bandwidth, no longer dependent on internet links
  • VPN connection more appropriate for quick setup and modest bandwidth requirements
  • Direct Connect uses a public virtual interface when accessing public AWS endpoints such as S3 and the EC2 APIs, and a private virtual interface when accessing VPC-based resources
  • Makes AWS a logical extension of your corporate network

Diagram showing Direct Connect linking the corporate network to AWS

  • One private VIF connection per VPC (one to one mapping)
  • Direct Connect is not inherently fault tolerant, this needs to be built in either by having a secondary Direct Connect or VPN, using BGP to fail over automatically to the backup connection
  • VPN has two endpoints, Customer Gateway (CGW) and AWS connection (Virtual Private Gateway or VPG) – these concepts are not used by Direct Connect
  • In the US, one Direct Connect connection will grant you access to all regions, traffic stays within the AWS internal network
  • Layer 2 network connections not supported
  • Prerequisites for Direct Connect include:-
    • Your network is co-located with an existing AWS Direct Connect location
    • You are working with an AWS Direct Connect partner who is a member of the AWS Partner Network (APN)
    • You are working with an independent service provider to connect to AWS Direct Connect.
    • Connections to AWS Direct Connect require single mode fiber, 1000BASE-LX (1310nm) for 1 gigabit Ethernet, or 10GBASE-LR (1310nm) for 10 gigabit Ethernet. Auto Negotiation for the port must be disabled. You must support 802.1Q VLANs across these connections
    • Your network must support Border Gateway Protocol (BGP) and BGP MD5 authentication. Optionally, you may configure Bidirectional Forwarding Detection (BFD)
  • To connect to Amazon Virtual Private Cloud (Amazon VPC), you must first do the following:-
    • Provide a private Autonomous System Number (ASN). Amazon allocates peer IP addresses for the BGP session from the 169.254.x.x link-local range
    • Create a virtual private gateway and attach it to your VPC
  • To connect to public AWS products such as Amazon EC2 and Amazon S3, you need to provide the following:-
    • A public ASN that you own (preferred) or a private ASN
    • Public IP addresses (/31) (that is, one for each end of the BGP session) for each BGP session. If you do not have public IP addresses to assign to this connection, log on to AWS and then open a ticket with AWS Support.
    • The public routes that you will advertise over BGP
  • AWS Direct Connect Limits:-
    • Virtual interfaces per AWS Direct Connect connection – 50 (soft limit)
    • Active AWS Direct Connect connections per region per account – 10 (soft limit)
    • Routes per Border Gateway Protocol (BGP) session – 100 (hard limit)
  • Sub 1Gbps connections only support a single VIF
  • HPC uses Lustre and NFS file protocols, which can often require Jumbo Frames. These are only supported by enhanced networking (10Gbps NIC). Also use Placement Groups to keep instances together for high performance and low latency in a single AZ
  • The following instances support enhanced networking:-
    • C3
    • C4
    • D2
    • I2
    • M4
    • R3
  • Enhanced Networking is made possible using SR-IOV (single root I/O virtualisation)
  • Must be done using HVMs and not PV instances
  • A Placement Group is a logical grouping of instances within a single AZ
    • Used for low latency connections between instances
  • For lowest latency and highest throughput, choose an instance that supports Enhanced Networking
  • The latest Amazon Linux HVM AMIs have the module required for enhanced networking installed and have the required attribute set. Therefore, if you launch an Amazon EBS–backed C3, C4, R3, or I2 instance using a current Amazon Linux HVM AMI, enhanced networking is already enabled for your instance
  • Older HVM instances can have enhanced networking enabled by updating to the latest kernel with sudo yum update
  • Use the modinfo ixgbevf command to check if enhanced networking has been enabled
  • If you lose connectivity while enabling enhanced networking, the ixgbevf module might be incompatible with the kernel. Try installing the version of the ixgbevf module included with the distribution of Linux for your instance
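  • A boto3 sketch of checking and enabling the enhanced networking attribute on an instance (the instance ID is a placeholder, and the instance must be stopped with the ixgbevf module installed before the attribute is modified):

    import boto3

    ec2 = boto3.client('ec2')

    # A value of 'simple' indicates SR-IOV enhanced networking is enabled
    attr = ec2.describe_instance_attribute(InstanceId='i-0123456789abcdef0',
                                           Attribute='sriovNetSupport')
    print(attr.get('SriovNetSupport'))

    # Enable it on a stopped HVM instance
    ec2.modify_instance_attribute(InstanceId='i-0123456789abcdef0',
                                  SriovNetSupport={'Value': 'simple'})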
  • A placement group resides in a single AZ; placement groups don't span AZs
  • Placement Groups can span subnets in the same VPC, but they must be in the same AZ
  • Existing instances cannot be moved into a placement group
  • It’s best practice to size the placement group for the peak load and launch all instances at once
  • Try to use the same instance types when creating a placement group
  • Elastic Load Balancer (ELB) distributes traffic amongst instances in the multiple AZs
  • You can use the operating systems and instance types supported by Amazon EC2. You can configure your EC2 instances to accept traffic only from your load balancer.
  • You can configure the load balancer to accept traffic using the following protocols: HTTP, HTTPS (secure HTTP), TCP, and SSL (secure TCP).
  • You can configure your load balancer to distribute requests to EC2 instances in multiple Availability Zones, minimizing the risk of overloading one single instance. If an entire Availability Zone goes offline, the load balancer routes traffic to instances in other Availability Zones.
  • There is no limit on the number of connections that your load balancer can attempt to make with your EC2 instances. The number of connections scales with the number of concurrent requests that the load balancer receives.
  • You can configure the health checks that Elastic Load Balancing uses to monitor the health of the EC2 instances registered with the load balancer so that it can send requests only to the healthy instances.
  • You can use end-to-end traffic encryption on those networks that use secure (HTTPS/SSL) connections.
  • [EC2-VPC] You can create an Internet-facing load balancer, which takes requests from clients over the Internet and routes them to your EC2 instances, or an internal-facing load balancer, which takes requests from clients in your VPC and routes them to EC2 instances in your private subnets. Load balancers in EC2-Classic are always Internet-facing.
  • [EC2-Classic] Load balancers for EC2-Classic support both IPv4 and IPv6 addresses. Load balancers for a VPC do not support IPv6 addresses.
  • You can monitor your load balancer using CloudWatch metrics, access logs, and AWS CloudTrail.
  • You can associate your Internet-facing load balancer with your domain name. Because the load balancer receives all requests from clients, you don’t need to create and manage public domain names for the EC2 instances to which the load balancer routes traffic. You can point the instance’s domain records at the load balancer instead and scale as needed (either adding or removing capacity) without having to update the records with each scaling activity.
  • Elastic Load Balancing supports the processing, storage, and transmission of credit card data by a merchant or service provider, and has been validated as being compliant with Payment Card Industry (PCI) Data Security Standard (DSS)
  • For Elastic Load Balancing, you pay for each hour or portion of an hour that the service is running, and you pay for each gigabyte of data that is transferred through your load balancer
  • ELB works in conjunction with:-
    • EC2
    • Auto Scaling
    • CloudWatch
    • Route 53
  • Load balancers can listen on the following ports:
    • [EC2-VPC] 1-65535
    • [EC2-Classic] 25, 80, 443, 465, 587, 1024-65535
  • The HTTP requests and HTTP responses use header fields to send information about HTTP messages. Elastic Load Balancing supports X-Forwarded-For headers. Because load balancers intercept traffic between clients and servers, your server access logs contain only the IP address of the load balancer. To see the IP address of the client, use the X-Forwarded-For request header
  • When you use HTTP/HTTPS, you can enable sticky sessions on your load balancer. A sticky session binds a user’s session to a specific back-end instance. This ensures that all requests coming from the user during the session are sent to the same back-end instance
  • For each request that a client makes through a load balancer, the load balancer maintains two connections. One connection is with the client and the other connection is to the back-end instance. For each connection, the load balancer manages an idle timeout that is triggered when no data is sent over the connection for a specified time period. After the idle timeout period has elapsed, if no data has been sent or received, the load balancer closes the connection
  • By default, Elastic Load Balancing sets the idle timeout to 60 seconds for both connections
  • If you use HTTP and HTTPS listeners, we recommend that you enable the keep-alive option for your EC2 instances. You can enable keep-alive in your web server settings or in the kernel settings for your EC2 instances. Keep-alive, when enabled, enables the load balancer to re-use connections to your back-end instance, which reduces the CPU utilization. To ensure that the load balancer is responsible for closing the connections to your back-end instance, make sure that the value you set for the keep-alive time is greater than the idle timeout setting on your load balancer.
  • By default, your load balancer distributes incoming requests evenly across its enabled Availability Zones. To ensure that your load balancer distributes incoming requests evenly across all back-end instances, regardless of the Availability Zone that they are in, enable cross-zone load balancing
  • To ensure that the load balancer stops sending requests to instances that are de-registering or unhealthy, while keeping the existing connections open, use connection draining. This enables the load balancer to complete in-flight requests made to instances that are de-registering or unhealthy. Default is 300 seconds (1-3600 seconds available)
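  • A boto3 sketch of setting the idle timeout, cross-zone load balancing and connection draining attributes discussed above on a classic ELB (the load balancer name is a placeholder):

    import boto3

    elb = boto3.client('elb')   # classic Elastic Load Balancing API

    elb.modify_load_balancer_attributes(
        LoadBalancerName='my-loadbalancer',
        LoadBalancerAttributes={
            'ConnectionSettings': {'IdleTimeout': 60},                 # default is 60 seconds
            'CrossZoneLoadBalancing': {'Enabled': True},               # spread evenly across AZs
            'ConnectionDraining': {'Enabled': True, 'Timeout': 300},   # finish in-flight requests
        },
    )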
  • Proxy Protocol is an Internet protocol used to carry connection information from the source requesting the connection to the destination for which the connection was requested. Elastic Load Balancing uses Proxy Protocol version 1, which uses a human-readable header format.
  • By default, when you use Transmission Control Protocol (TCP) or Secure Sockets Layer (SSL) for both front-end and back-end connections, your load balancer forwards requests to the back-end instances without modifying the request headers. If you enable Proxy Protocol, a human-readable header is added to the request header with connection information such as the source IP address, destination IP address, and port numbers. The header is then sent to the back-end instance as part of the request.
  • You can enable Proxy Protocol on ports that use either the SSL and TCP protocols. You can use Proxy Protocol to capture the source IP of your client when you are using a non-HTTP protocol, or when you are using HTTPS and not terminating the SSL connection on your load balancer
  • Prerequisites to using Proxy Protocol:-
    • Confirm that your load balancer is not behind a proxy server with Proxy Protocol enabled. If Proxy Protocol is enabled on both the proxy server and the load balancer, the load balancer adds another header to the request, which already has a header from the proxy server. Depending on how your back-end instance is configured, this duplication might result in errors.
    • Confirm that your back-end instances can process the Proxy Protocol information
  • You can’t assign an Elastic IP address to an ELB
  • IPv4 and v6 supported on an ELB
  • You can load balance a domain apex name, such as bbc.com (no www)
  • Enable access logging on an ELB to output logs to an S3 bucket (CloudTrail separately captures ELB API calls)
  • Multiple SSL certificates should mean multiple ELBs unless using a wildcard
  • Each load balancer receives a default Domain Name System (DNS) name. This DNS name includes the name of the AWS region in which the load balancer is created. For example, if you create a load balancer named my-loadbalancer in the US West (Oregon) region, your load balancer receives a DNS name such as my-loadbalancer-1234567890.us-west-2.elb.amazonaws.com
  • To use a friendly DNS name for your load balancer, such as www.example.com, instead of the default DNS name, create a custom domain name and associate it with the DNS name for your load balancer
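  • A hedged boto3 sketch of associating a friendly name with the load balancer via a Route 53 alias record (the zone IDs and names are illustrative placeholders; the ELB's canonical hosted zone ID comes from describe_load_balancers):

    import boto3

    route53 = boto3.client('route53')

    route53.change_resource_record_sets(
        HostedZoneId='Z1EXAMPLE',                       # your hosted zone
        ChangeBatch={'Changes': [{
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': 'www.example.com',
                'Type': 'A',
                # Alias to the ELB's DNS name; this also works for zone apex records
                'AliasTarget': {
                    'HostedZoneId': 'ZELBEXAMPLE',      # the ELB's canonical hosted zone ID
                    'DNSName': 'my-loadbalancer-1234567890.us-west-2.elb.amazonaws.com',
                    'EvaluateTargetHealth': False,
                },
            },
        }]},
    )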
  • To improve the performance of your NAT instance, you can:-
    • Scale up (choose a bigger instance)
    • Enable Enhanced Networking
    • Scale out (add more instances)
    • Create a new NAT instance and new subnet and route the new subnet's traffic through it. Subnets and NAT instances are associated on a one to one basis
    • HA for NAT is also possible, but remember it’s an active/passive configuration

27-06-16

AWS Certified Solutions Architect Professional – Study Guide – Domain 3.0: Deployment Management (10%)


3.1 Ability to manage the lifecycle of an application on AWS

  • CloudFormation is a way of scripting the deployment of infrastructure and can automatically take care of dependencies and introduce version management
  • CloudFormation supports the following services:-
    • Auto Scaling
    • CloudFront
    • CloudTrail
    • CloudWatch
    • DynamoDB
    • EC2
    • ElastiCache
    • Elastic Beanstalk
    • Elastic Load Balancer
    • Kinesis
    • IAM
    • OpsWorks
    • RDS
    • RedShift
    • Route53
    • S3
    • SimpleDB
    • SNS
    • SQS
    • VPC
    • CloudFormation is made up of a template and a stack
      • A template is an architectural “diagram” of what the deployment will look like
      • A stack is the actual deployment itself (and its constituent services)
      • You can create, update and delete stacks using templates
    • Templates are in JSON format
    • You don’t need to delete a stack in order to update individual components
    • Template has the following characteristics:-
      • File format and version number (mandatory)
      • List of AWS resources and their configuration values (mandatory)
      • Everything else is optional
        • Template parameters (applied at stack creation time, limit of 60)
        • Output values (public IP address, ELB address, limit of 60)
        • List of data tables (AMI types etc.)
        • Use Fn::GetAtt to output data. Fn:: denotes an intrinsic function; others include Fn::FindInMap and Fn::GetAZs, which return a value from a lookup (a minimal template skeleton is sketched below)
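    • A minimal template skeleton illustrating the mandatory and optional sections above, expressed as a Python dict so it can be dumped to JSON and passed to CloudFormation (the resource names and AMI ID are placeholders):

    import boto3, json

    template = {
        "AWSTemplateFormatVersion": "2010-09-09",          # file format and version
        "Parameters": {                                     # optional, applied at stack creation
            "InstanceType": {"Type": "String", "Default": "t2.micro"},
        },
        "Resources": {                                      # mandatory: resources and their config values
            "WebServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": "ami-12345678",              # placeholder AMI
                    "InstanceType": {"Ref": "InstanceType"},
                },
            },
        },
        "Outputs": {                                        # optional output values
            "PublicIp": {"Value": {"Fn::GetAtt": ["WebServer", "PublicIp"]}},
        },
    }

    boto3.client('cloudformation').create_stack(
        StackName='demo-stack',
        TemplateBody=json.dumps(template),
    )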
    • In order to successfully deploy/delete/update a stack, the user must have appropriate IAM permissions to all objects the stack contains (e.g. EC2 instances, S3 buckets etc.). If not, deployment will fail
    • Chef and Puppet are supported to provide a configuration down to the application layer
    • Bootstrap scripts also supported to allow installation of packages, files and applications by adding them to the template
    • Automatic rollback on error is enabled by default
    • You are still charged for provisioned resources, even if the deployment fails
    • CloudFormation is free
    • CloudFormation has a WaitCondition resource that can wait for an application or deployment response before continuing on
    • You can specify deletion policies:-
      • Take a snapshot of an EC2 instance, EBS volume or RDS instance before deletion
      • Preserve a resource when deleting a stack, such as an S3 bucket
    • CloudFormation can be used to create roles in IAM
      • Also grant EC2 instances access to roles
    • CloudFormation can create VPCs and their components:-
      • Subnets
      • Gateways
      • Route Tables
      • Network ACLs
      • Elastic IPs
      • EC2 instances
      • Auto Scaling Groups
      • Elastic Load Balancers
      • RDS Instances
      • RDS Security Groups
    • You can specify IP addresses as either specific individual addresses or CIDR ranges. You can also specify pre-existing elastic IP addresses
    • Can create multiple VPCs inside one template

 

  • Can enable VPC peering from CloudFormation but only within the same AWS account

 

  • Route 53 zones can be created or updated from a template
    • A Records, CNAME, Alias etc
  • Remember what is mandatory for a template (Format, version and resources)
  • Chef, Puppet and Bootstrap are the supported deployment tools
  • Cfn-init can be used when the instances are created to install packages, start services, define service states, etc
  • Elastic Beanstalk is a pre-built web application environment that developers can upload applications to for a quick deployment time
  • Elastic Beanstalk takes care of capacity, provisioning, auto scaling and monitoring
  • Provides a “portal” based access method for developers to upload their application
  • CloudFormation supports Elastic Beanstalk, but not vice versa
  • Elastic Beanstalk supports the following:-
    • Apache Tomcat for Java applications
    • Apache HTTP for PHP applications
    • Apache HTTP for Python applications
    • Nginx or Apache HTTP for Node.js applications
    • Passenger for Ruby applications
    • IIS 7.5 for .NET applications
  • Elastic Beanstalk provides access to CloudWatch for application status and monitoring
  • You can adjust application settings (such as JVM settings) and pass environment variables
  • Elastic Beanstalk has three components within applications:-
    • Environments (EC2, ELB, ASGs)
    • Application versions (stored in S3, highly available, roll back code, can also push from git)
    • Saved configurations (define how objects behave including auto scaling groups, e-mail notifications, instances, managed update settings, software configurations, update configuration etc. App can have many saved configurations)
  • Two types of Elastic Beanstalk deployments:-
    • Single instance
    • Load balancing, auto scaling
  • Two tiers of Elastic Beanstalk
    • Web server
    • Worker server (no web components, just runs binaries, listens to SQS queues for work)
  • Application environments can be set for test/dev, staging, production etc for blue/green deployments
  • Use an RDS instance created by EB for test and dev, as it is removed when the EB application is removed. Production requires a more permanent data store, so use an already provisioned RDS instance
  • You can deploy Docker containers to EB in one of three ways:-
    • Dockerfile (image built on instance)
    • Dockerrun.aws.json (manifest that describes how to use a Docker image – name of the image, port mappings – and is unique to EB)
    • Application archive (should include a Dockerfile or Dockerrun.aws.json)
  • Benefits of using Docker images include you can use any runtime, even ones not supported by EB (Scala, for example)
  • Dockerfile is basically a manifest file that defines how the Docker application is set up. Several instructions are used within the file, such as MAINTAINER (maintainer name), RUN (commands to run, e.g. fetch package updates), ADD (downloads content from a local store or git URL), ENV (environment variables), FROM (defines the base image and must be declared first in the Dockerfile) and VOLUME (mounts a host local directory inside the container)
  • The .ebextensions folder holds configuration files that customise the resources created by EB (IAM roles, instances, RDS etc)
  • Use Swap URLs feature for a quick cut over between prod and staging, with zero downtime – however this is all or nothing, if the app is broken it will break all instances
  • eb deploy command reads the .git folder for commits, if not present it uses the current folder for deploying to EB
  • When deploying apps in EB, note the following policies for deployment policies:-
  • Deployment policy – Choose from the following deployment options:
    • All at once – Deploy the new version to all instances simultaneously. All instances in your environment are out of service for a short time while the deployment occurs.
    • Rolling – Deploy the new version in batches. Each batch is taken out of service during the deployment phase, reducing your environment’s capacity by the number of instances in a batch.
    • Rolling with additional batch – Deploy the new version in batches, but first launch a new batch of instances to ensure full capacity during the deployment process.
    • Immutable – Deploy the new version to a fresh group of instances by performing an immutable update (alternative to rolling updates that ensure that configuration changes that require replacing instances are applied efficiently and safely. If an immutable environment update fails, the rollback process requires only terminating an Auto Scaling group. A failed rolling update, on the other hand, requires performing an additional rolling update to roll back the changes).
  • Batch type – Whether you want to allocate a percentage of the total number of EC2 instances in the Auto Scaling group or a fixed number to a batch of instances.
  • Batch size – The number or percentage of instances to deploy in each batch, up to 100 percent or the maximum instance count in your environment’s Auto Scaling configuration.
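  • A hedged boto3 sketch of applying a deployment policy and batch size to an existing environment through the aws:elasticbeanstalk:command option namespace (the environment name and values are illustrative):

    import boto3

    eb = boto3.client('elasticbeanstalk')

    eb.update_environment(
        EnvironmentName='my-app-prod',
        OptionSettings=[
            {'Namespace': 'aws:elasticbeanstalk:command',
             'OptionName': 'DeploymentPolicy', 'Value': 'RollingWithAdditionalBatch'},
            {'Namespace': 'aws:elasticbeanstalk:command',
             'OptionName': 'BatchSizeType', 'Value': 'Percentage'},
            {'Namespace': 'aws:elasticbeanstalk:command',
             'OptionName': 'BatchSize', 'Value': '25'},
        ],
    )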
  • EB logs can be sent to an S3 bucket
  • Can run other components (such as ElastiCache) side by side in EC2 instances
  • Access log files without logging into application servers
  • Push an application file (such as a WAR file) or Github repo to Elastic Beanstalk
  • AWS Toolkit for Visual Studio also available
  • Only modified git files are uploaded to Elastic Beanstalk
  • Elastic Beanstalk is designed to support multiple environments such as test/dev, staging and production
  • Each environment is configured separately and runs on its own AWS resources
  • Elastic Beanstalk stores and tracks application versions so apps can be rolled back to a prior state
  • Application files and optionally log files are stored in S3
  • If you are using the management console, git, or the AWS Toolkit for Visual Studio, an S3 bucket is created automatically and files are uploaded to this bucket
  • You can configure EB to upload log files to S3 every hour
  • S3 can also be used for application storage for items such as images, etc. Include the SDK as part of your application deployment
  • Elastic Beanstalk can automatically configure an RDS instance, environment variables are used to expose DB instance connection information to your application
  • Elastic Beanstalk is not fault tolerant amongst regions but can be configured to be multi AZ in a region for resilience
  • Your application is publicly available via app.elasticbeanstalk.com. As EB integrates with VPCs, you can configure security groups or NACLs to restrict access
  • Elastic Beanstalk supports IAM
  • You can SSH into EB instances if required
  • Amazon Linux AMI and Windows 2008/2012 are supported
  • OpsWorks is a service based on Chef that provides scripted and automated management of applications and their dependencies
  • Chef turns infrastructure into code
  • Chef provides scripting and automation for the building out of infrastructure
  • Infrastructure then becomes scriptable, testable and versionable like your applications
  • Chef has a client/server infrastructure
  • The Chef server stores all recipes and also configuration data
  • The Chef client is installed on all pieces of infrastructure you manage, such as servers and network devices
  • The client periodically polls the server for policy updates. If the client policy is out of date, remediation takes place
  • OpsWorks consists of two elements, stacks and layers
  • A stack is a group of EC2 instances and related objects such as ELBs that are grouped together for a common purpose
  • A layer exists within a stack and is represented by such things as database or application layers
  • When you create a layer, instead of configuring everything manually, OpsWorks does this for you
  • There are 1 or more layers in a stack
  • An instance must be assigned to at least one layer
  • Preconfigured layers include database layers, applications, load balancing and caching
  • Instances are always Linux in Chef 11, Windows as well with Chef 12
  • ELBs can either be pre-existing or you can create a new one from the EC2 console
  • Add an EC2 instance to your layer (so the resource that runs your layer), select the appropriate instance type
  • Choose storage type – depending on the type of instance, it may only be possible to choose EBS backed rather than Instance Store
  • OpsWorks security groups are not deleted by default
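  • A boto3 sketch of the stack/layer/instance hierarchy described above (the ARNs, names and instance type are placeholders):

    import boto3

    opsworks = boto3.client('opsworks')

    stack = opsworks.create_stack(
        Name='my-php-stack',
        Region='eu-west-1',
        ServiceRoleArn='arn:aws:iam::123456789012:role/aws-opsworks-service-role',
        DefaultInstanceProfileArn='arn:aws:iam::123456789012:instance-profile/aws-opsworks-ec2-role',
    )

    layer = opsworks.create_layer(StackId=stack['StackId'],
                                  Type='php-app',
                                  Name='PHP App Server',
                                  Shortname='php-app')

    # Instances must be assigned to at least one layer
    opsworks.create_instance(StackId=stack['StackId'],
                             LayerIds=[layer['LayerId']],
                             InstanceType='c3.large')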

3.2 Demonstrate ability to implement the right architecture for development, testing, and staging environments

  • CloudFormation can be used to provision an entire infrastructure such as VPCs, ELBs, EC2 instances, S3 buckets, Route53
  • Elastic Beanstalk provides a pre-built web application deployment environment so a developer can upload a web app and deploy it quickly without needing console access to the instances. Examples include Apache, PHP, .NET
  • Elastic Beanstalk is designed to support multiple concurrent environments such as test/dev, staging and production
  • Elastic Beanstalk is multi-AZ fault tolerant but not multi-region fault tolerant
  • Elastic Beanstalk can provision RDS instances, VPCs and also uses IAM
  • Elastic Beanstalk code is stored in S3, so can be replicated across regions and encrypted at rest. 11 9’s reliability
  • EB changes can be rolled back using version management
  • OpsWorks uses Chef to provision stacks and layers – provisions EC2 instances
  • Opsworks uses layers and stacks, layers go inside stacks and represent each dependency in the stack, such as ELB and application or RDS layer
  • OpsWorks can re-use existing EC2 instances, ELBs, VPCs
  • OpsWorks can perform auto-scaling on time of day or average CPU load

3.3 Position and select most appropriate AWS deployment mechanism based on scenario

  • Use case for CloudFormation is an entire infrastructure with many AWS components
  • CloudFormation itself is free, you just pay for EC2 instances, elastic IP addresses etc
  • Elastic Beanstalk provides quick pre-built application environments for developers to upload their code and can support concurrent test/dev, staging and production
  • Elastic Beanstalk is free, you pay for EC2 instances, S3 buckets etc
  • OpsWorks can leverage Chef to provision an application environment using EC2 instances and then pull updates from code repositories. It can auto scale and is free, you only pay for instances used
  • Chef, Puppet and bootstrap are supported deployment methods for CloudFormation, whereas OpsWorks uses only Chef
  • AWS Config is used to take a JSON snapshot of current services configuration and apply it as a configuration baseline (think of PowerShell DSC on Azure). Helps notify admins on configuration drift
    • Configure which resource types you want to record (not all are supported)
    • Select an S3 bucket in which to store the configuration snapshots (can be current account or linked account)
    • IAM is used to get read only access to resources and then output to S3
    • SNS can be used to send event notifications
    • Can see relationships with other resources such as EC2 instance, EBS volume, S3 bucket, etc.
    • Takes regular config snapshots that can be used for compliance, auditing and troubleshooting purposes
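    • A boto3 sketch of the AWS Config setup described above: a recorder using a read-only IAM role, a delivery channel pointing at an S3 bucket, and an optional SNS topic (names and ARNs are placeholders):

    import boto3

    config = boto3.client('config')

    config.put_configuration_recorder(ConfigurationRecorder={
        'name': 'default',
        'roleARN': 'arn:aws:iam::123456789012:role/config-role',   # read-only access to resources
        'recordingGroup': {'allSupported': True},                  # or list specific resource types
    })

    config.put_delivery_channel(DeliveryChannel={
        'name': 'default',
        's3BucketName': 'my-config-bucket',                        # snapshots land here
        'snsTopicARN': 'arn:aws:sns:eu-west-1:123456789012:config-events',
    })

    config.start_configuration_recorder(ConfigurationRecorderName='default')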

23-06-16

AWS Certified Solutions Architect Professional – Study Guide – Domain 2.0 Costing

2.1 Demonstrate ability to make architectural decisions that minimize and optimize infrastructure cost

  • Costs can be controlled by using Reserved Instances if the required compute resource is fairly static and predictable – any excess capacity may be resold on the Marketplace. Reserved Instances have three payment models:-
    • All up front (up to 75% discount)
    • Partial upfront (middle discount)
    • No upfront (no discount, but still cheaper than On Demand)
    • Contract length is 1 or 3 years
  • Reserved Instances can be cost effective when the steady state load is known – purchasing RIs to service this requirement and then using On Demand or Spot for bursting can be an effective cost strategy
  • Reserved instances can be modified
    • Can be moved to another AZ
    • Change the instance type within the same family
    • Each instance size has a normalisation factor, which is a unitary value from 0.5 (micro) to 80 (10xlarge)
    Instance size – normalization factor:
    • micro – 0.5
    • small – 1
    • medium – 2
    • large – 4
    • xlarge – 8
    • 2xlarge – 16
    • 4xlarge – 32
    • 8xlarge – 64
    • 10xlarge – 80
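  • A quick worked example of the normalisation factors (pure arithmetic, no AWS API involved): one 2xlarge reservation (16 units) carries the same footprint as eight medium instances (2 units each) or two xlarge instances (8 units each).

    factors = {'micro': 0.5, 'small': 1, 'medium': 2, 'large': 4,
               'xlarge': 8, '2xlarge': 16, '4xlarge': 32,
               '8xlarge': 64, '10xlarge': 80}

    # How many medium instances does one 2xlarge reservation cover?
    print(factors['2xlarge'] / factors['medium'])   # 8.0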
  • Spot instances can provide ad hoc compute power but are transitory and can be removed without notice if the spot price goes above your limit
  • On Demand instances run on a “pay as you go” model
  • Legacy apps that have burstable compute requirements are more efficiently run on instances with burstable CPU credits (T2 instances). Useful for legacy applications that do not support auto scaling
  • Instances use CPU credit balance which expires after 24 hours on a rolling basis. When not using full CPU, credits can accrue and be used to periodically burst
  • Instance types provide optimised instances for different types of workloads and may represent the best cost vs performance trade-off when designing a deployment. Types include:-
    • T2 – Burstable Performance Instances – useful for general purpose workloads where CPU usage may need to spike on occasion. Balanced compute/network/storage instance. Lowest cost. Use cases – test/dev, small workloads, code repos, micro instances
    • M4 – General Purpose – Xeon Haswell CPU, balanced compute/networking/storage, EBS optimised by default and has enhanced networking option. Use cases – Small and mid-size databases, data processing tasks that require additional memory, caching fleets, and for running back end servers for SAP, Microsoft SharePoint, cluster computing, and other enterprise applications
    • M3 – Largely as above, but with Ivy Bridge generation CPUs, so one generation back from M4. Use cases – Small and mid-size databases, data processing tasks that require additional memory, caching fleets, and for running backend servers for SAP, Microsoft SharePoint, cluster computing, and other enterprise applications
    • C4 – Compute Optimised – EC2 specific Haswell CPU, EBS optimised by default, support for enhanced networking and clustering . Use cases – High performance front-end fleets, web-servers, batch processing, distributed analytics, high performance science and engineering applications, ad serving, MMO gaming, and video-encoding.
    • C3 – Compute Optimised – as C4 above, but with Ivy Bridge generation CPUs, SSD instance storage and enhanced networking support. Use cases – high performance front-end fleets, web servers, batch processing, distributed analytics, high performance science and engineering applications, ad serving, MMO gaming, and video encoding.
    • G2 – GPU Optimised – for GPU and enhanced graphics applications. Sandy Bridge CPUs, NVIDIA GPU with 4GB RAM, designed to support up to eight real-time HD video streams (720p@30fps) or up to four real-time full HD video streams (1080p@30fps),  high-quality interactive streaming experiences. Use cases – 3D application streaming, machine learning, video encoding, and other server-side graphics or GPU compute workloads.
    • I2 – High I/O Instances – Ivy Bridge CPUs, SSD Storage with TRIM support, support for enhanced networking, high random I/O performance. Use cases – NoSQL databases like Cassandra and MongoDB, scale out transactional databases, data warehousing, Hadoop, and cluster file systems.
    • D2 – Dense Storage Instances – Haswell CPUs, HDD storage, consistent high performance at launch time, high disk throughput, support for enhanced networking. Lowest price per disk throughput performance on Amazon EC2. Up to 48 TB of HDD-based local storage. Use cases – Massively Parallel Processing (MPP) data warehousing, MapReduce and Hadoop distributed computing, distributed file systems, network file systems, log or data-processing applications
  • EBS volume types are:-
    • General Purpose SSD (3 IOPS per GB with burstable ability, 1 GB – 16 TB) – “Better”
    • Provisioned IOPS SSD (up to 20,000 IOPS per volume, 4 GB – 16 TB) – “Best”
    • Magnetic (around 100 IOPS on average, burstable to hundreds, 1 GB – 1 TB) – “Good”
  • Instance family feature support matrix:-

Type   VPC only   EBS only   SSD volumes   Placement group   HVM only   Enhanced networking
C3     -          -          Yes           Yes               -          Yes
C4     Yes        Yes        -             Yes               Yes        Yes
D2     -          -          -             Yes               Yes        Yes
G2     -          -          Yes           Yes               Yes        -
I2     -          -          Yes           Yes               Yes        Yes
M3     -          -          Yes           -                 -          -
M4     Yes        Yes        -             Yes               Yes        Yes
R3     -          -          Yes           Yes               Yes        Yes
T2     Yes        Yes        -             -                 Yes        -
X1     Yes        -          Yes           Yes               Yes        No
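
As a rough illustration of how the RI normalisation factors work, here is a minimal boto3 sketch – the RI ID, AZ and instance sizes are placeholders, not values from a real account:

import boto3

NORMALISATION = {"micro": 0.5, "small": 1, "medium": 2, "large": 4,
                 "xlarge": 8, "2xlarge": 16, "4xlarge": 32,
                 "8xlarge": 64, "10xlarge": 80}

def equivalent_count(from_size, to_size, count=1):
    """How many instances of to_size a count of from_size RIs converts to."""
    units = NORMALISATION[from_size] * count
    return units / NORMALISATION[to_size]

print(equivalent_count("xlarge", "large"))   # 2.0 – one xlarge covers two large

# Modifying an RI within the same family must keep the same total footprint
ec2 = boto3.client("ec2")
ec2.modify_reserved_instances(
    ReservedInstancesIds=["00000000-ri-id-example"],        # hypothetical RI ID
    TargetConfigurations=[{"AvailabilityZone": "eu-west-1a",
                           "InstanceType": "m4.large",
                           "InstanceCount": 2}],
)

Here one xlarge reservation (factor 8) is exchanged for two large instances (factor 4 each), so the total of 8 units is unchanged.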

2.2 Apply the appropriate AWS account and billing set-up options based on scenario

  • How do you need to handle account management and billing?
  • Consolidated billing provides a way for a “master” account, called the Paying Account, to be responsible for the bills of a number of other AWS accounts (Linked Accounts)
  • There is a soft limit of 20 linked accounts, but this can be raised on request
  • Advantages of consolidated billing
    • Single bill
    • Easy to track usage and payments
    • Volume pricing discounts across all your accounts combined
    • Reserved Instances not being used can be used to make On Demand instances cheaper. AWS will always apply the cheapest price.
  • If you acquire a company that already uses AWS, you can link the accounts together for billing
  • You may also want to use different accounts for security separation
  • You can use cross account access to provide permissions to resources in other accounts (see the sketch at the end of this section):-
    • If you need a custom policy (say to provide read/write access to a specific S3 bucket), create this first
    • Create a role with cross account access (role type) in the account that owns the resources
    • Apply the policy to that role and note the role’s ARN
    • Grant users in the other account permission to assume the role
    • Switch to the role (console “Switch Role” or an AssumeRole API call)
  • Configure MFA on the root user of your main billing (paying) account and use strong passwords
  • Resources should not be deployed in the paying account; it should really only be used for administration and billing purposes
  • Billing alerts can be enabled per account but when alerting is enabled on the paying account then all linked accounts are included
  • CloudTrail is enabled per region and works per AWS account
  • CloudTrail logs can be consolidated into an S3 bucket
    • Enable CloudTrail in the paying account
    • Create a bucket policy that allows cross account access
    • Enable CloudTrail in the other accounts and use the S3 bucket
  • The Budgets feature can be used to set a budget for one or more AWS accounts and to send alerts when cost reaches, or is forecast to reach, a set percentage of that budget (see the sketch at the end of this section)
  • Budgets works in conjunction with CloudWatch and SNS to send alerts when costs reach a pre-set level
  • Budgets can be set at a granular level (per service – EC2, S3, etc.) or as an aggregate value across all accounts and all resources
  • Notifications can be triggered by actual or forecasted costs
  • Once created, a budget provides a dashboard of total amount spent versus the budgeted amount
  • You can go over budget – these are alert thresholds, not hard caps
  • Redshift uses on demand and reserved instances
  • EMR uses On Demand and Spot instances (unused RI discounts can be leveraged by launching EMR On Demand instances that match the RI in the same AZ – AWS will automatically apply the discounted rate)
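
For the cross account access steps above, a minimal boto3 sketch of assuming a role in another account – the account ID, role name and bucket are hypothetical:

import boto3

sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::111111111111:role/CrossAccountS3Access",  # hypothetical role
    RoleSessionName="billing-admin-session",
)["Credentials"]

# Use the temporary credentials to act in the other account
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(s3.list_objects_v2(Bucket="example-shared-bucket")["KeyCount"])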
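
And a sketch of creating a budget that alerts at 80% of forecasted spend, assuming the boto3 Budgets client – the account ID, amount and email address are placeholders:

import boto3

budgets = boto3.client("budgets")
budgets.create_budget(
    AccountId="111111111111",                       # paying account ID (placeholder)
    Budget={
        "BudgetName": "monthly-compute",
        "BudgetLimit": {"Amount": "1000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[{
        "Notification": {
            "NotificationType": "FORECASTED",        # or ACTUAL
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": 80.0,                       # percentage of the budget
        },
        "Subscribers": [{"SubscriptionType": "EMAIL",
                         "Address": "billing@example.com"}],
    }],
)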

2.3 Ability to compare and contrast the cost implications of different architectures

  • Remember that different instance types exist so that appropriate workloads can be placed on instance types that provide the best performance at the best price point, so a G2 instance would be used for GPU workloads, for example
  • Unused Reserved Instances can be used to offset the cost of On Demand Instances
  • Linking several accounts can provide additional discounts for services such as S3, which is charged per GB
  • Use tags to provide granular billing per service. Tags work across multiple accounts that are linked using consolidated billing.
  • Resource groups are created using tags and values and show a view of all resources used grouped by tag, including costs.
  • If you need to make S3 content available to a single additional region, consider cross region replication rather than CloudFront. CRR is cheaper and less complex (see the sketch at the end of this section)
  • Bi-directional cross region replication can be used so Site A replicates a bucket to Site B and vice versa for global replication. Versioning (a prerequisite for CRR) also provides “Recycle Bin” type functionality – if two writes hit the same object, both are kept as separate versions, so nothing is lost
  • You can replicate across buckets in different accounts, but IAM must allow the source bucket to write to the destination bucket. Specify the IAM role you created with the replication policy when configuring CRR. PUTs and DELETEs are both replicated
  • Use CloudFront price classes to save on cost. By default, content is distributed to all edge locations; restricting to US and EU only is cheaper, but at the cost of higher potential latency for users in other regions
  • DNS queries for alias records are free of charge, CNAMEs are not
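
A minimal sketch of enabling cross region replication with boto3, assuming the destination bucket and the replication IAM role already exist – all names and the account ID are placeholders:

import boto3

s3 = boto3.client("s3")

# Versioning must be enabled on both the source and destination buckets
for bucket in ("example-source-bucket", "example-dest-bucket"):
    s3.put_bucket_versioning(Bucket=bucket,
                             VersioningConfiguration={"Status": "Enabled"})

s3.put_bucket_replication(
    Bucket="example-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111111111111:role/crr-replication-role",  # hypothetical
        "Rules": [{
            "Prefix": "",                      # replicate the whole bucket
            "Status": "Enabled",
            "Destination": {"Bucket": "arn:aws:s3:::example-dest-bucket"},
        }],
    },
)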

21-06-16

AWS Certified Solutions Architect Professional – Study Guide – Domain 1.0

Solutions-Architect-Professional

As I mentioned in my previous post, I made a lot of notes when I studied for my AWS SA Pro and I wanted to give something back by publishing them to the community for free. I’ve done this sort of thing before, and I find it very rewarding. The notes I made were taken from a variety of sources – I used some online training from acloud.guru and LinuxAcademy and supplemented it with QwikLabs hands on exercises and AWS Re:Invent videos on YouTube.

Please support the guys behind acloud.guru and LinuxAcademy by purchasing their courses. They’re both very good and very complementary to each other. They put a lot of time into developing the content and are priced very competitively.

This guide is not enough on its own to pass, and indeed the points I noted may not make much sense to you, but you’re welcome to them and I hope they help you. I will publish each domain as a separate post as I need to do a bit of cleaning up and formatting before I can post them.

Finally, if you’re sitting the exam soon, good luck and I hope you pass!

Domain 1.0 High Availability and Business Continuity (15%)

1.1 Demonstrate ability to architect the appropriate level of availability based on stakeholder requirements

  • Stakeholder requirements is the key phrase here – look at what the requirements are first before deciding the best way to architect the solution
  • What is availability? Basically up time. Does the customer need 99.99% up time or less? Which products may need to be used to meet this requirement?
  • Look at products which are single AZ, multi AZ and multi region. It may be the case that a couple of instances in a single AZ will suffice if cost is a factor
  • CloudWatch can be used to perform EC2 or Auto Scaling actions when status checks fail or metrics are exceeded (alarms, etc.) – see the sketch below
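
For example, a CloudWatch alarm that recovers an instance when the system status check fails – a boto3 sketch where the instance ID and region are placeholders:

import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="recover-web-01",
    Namespace="AWS/EC2",
    MetricName="StatusCheckFailed_System",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=2,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    # Built-in EC2 action: recover the instance onto healthy hardware
    AlarmActions=["arn:aws:automate:eu-west-1:ec2:recover"],
)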

1.2 Demonstrate ability to implement DR for systems based on RPO and RTO

  • What is DR? It is the recovery of systems, services and applications after an unplanned period of downtime.
  • What is RPO? Recovery Point Objective. At which point in time do we need to get back to when DR processes are invoked? This would come from a customer requirement – when systems are recovered, data is consistent from 30 minutes prior to the outage, or 1 hour, or 4 hours etc. What is acceptable to the stakeholder?
  • What is RTO? Recovery Time Objective. How quickly must systems and services be recovered after invoking DR processes? It may be that all critical systems must be back online within a maximum of four hours.
  • RTO and RPO are often paired together to provide an SLA to end users as to when services will be fully restored and how much data may be lost. For example, an RTO of 2 hours and an RPO of 15 minutes would mean all systems would be recovered in two hours or less and consistent to within 15 minutes of the failure.
  • How can a low RTO be achieved? This can be done by using elastic scaling, for example, or by using monitoring scripts to power up new instances via the AWS API. You may also use multi AZ services such as ELB and RDS to provide additional resilience
  • How can a low RPO be achieved? This can be done by using application aware, consistent backup tools – usually native ones such as VSS aware tools from Microsoft or RMAN for Oracle. Databases and real time systems may need to be quiesced to obtain a consistent backup; standard snapshot tools may not provide this. RMAN can back up to S3 or use point in time snapshots via RDS. RMAN is supported on EC2. Use Oracle Data Pump to move large databases.
  • AWS has multi AZ, multi region and services like S3 which has 11 nines of durability with cross region replication
  • Glacier – long term archive storage. Cheap but not appropriate for fast recovery (several hours retrieval SLA)
  • Storage Gateway is a software appliance that sits on premises that can operate in three modes – gateway cached (hot data kept locally but most data stored in S3), gateway stored (all data kept locally but also replicated to S3) and VTL-Tape Library (virtual disk tapes stored in S3, virtual tape shelf stored in Glacier)
  • You should use gateway cached when the requirement is for low cost primary storage with hot data stored locally
  • Gateway stored keeps all data locally but takes asynchronous snapshots to S3
  • Gateway cached volumes can each store up to 32 TB of data, and 32 volumes are supported (32 x 32 TB = 1 PB)
  • Gateway stored volumes are up to 16 TB in size, and 12 volumes are supported (12 x 16 TB = 192 TB)
  • Virtual tape library supports 1500 virtual tapes in S3 (150 TB total)
  • Virtual tape shelf is unlimited tapes (uses Glacier)
  • Storage Gateway can be on premises or EC2. Can also schedule snapshots, supports Direct Connect and also bandwidth throttling.
  • Storage Gateway supports ESXi or Hyper-V, 7.5GB RAM, 75GB storage, 4 or 8 vCPU for installation. To use the Marketplace appliance, you must choose xlarge instance or bigger and m3, i2, c3, c4, r3, d2, or m4 instance types
  • Gateway cached requires a separate volume as a buffer upload area and caching area
  • Gateway stored requires enough space to hold your full data set and also an upload buffer
  • VTL also requires an upload buffer and cache area
  • Ports required for Storage Gateway include 443 (HTTPS) to AWS, port 80 for initial activation only, port 3260 for iSCSI internally and port 53 for DNS (internal)
  • Gateway stored snapshots are stored in S3 and can be used to recover data quickly. EBS snapshots can also be used to create a volume to attach to new EC2 instances
  • Can also use gateway snapshots to create a new volume on the gateway itself
  • Snapshots can also be used to migrate cached volumes into stored volumes, stored volumes into cached volumes and also snapshot a volume to create a new EBS volume to attach to an instance
  • Use System Resource Check from the appliance menu to ensure the appliance has enough virtual resources to run (RAM, vCPU, etc.)
  • VTL virtual tape retrieval is instantaneous, whereas Tape Shelf (Glacier) can take up to 24 hours
  • VTL supports Backup Exec 2012-15, Veeam 7 and 8, NetBackup 7, System Center Data Protection 2012, Dell NetVault 10
  • Snapshots can either be scheduled or done ad hoc
  • Writes to S3 get throttled as the write buffer gets close to capacity – you can monitor this with CloudWatch
  • EBS – Elastic Block Store – block based storage replicated across hosts in a single AZ in a region
  • Direct Connect – a dedicated private connection into AWS via a Direct Connect location, typically provided through a trusted third party. This can be backed up with standby Direct Connect links or even a software VPN
  • Route 53 has a 100% availability SLA; Elastic Load Balancing and VPC can also provide a level of resilience if required
  • DynamoDB has three copies per region and also can perform multi-region replication
  • RDS also supports multi-AZ deployments and read only replicas of data – 5 read replicas for MySQL, MariaDB and PostgreSQL, 15 for Aurora
  • There are four DR models in the AWS white paper:-
    • Backup and restore (cheap but slow RPO and RTO, use S3 for quick restores and AWS Import/Export for large datasets)
    • Pilot Light (minimal replication of the live environment, like the pilot light in a gas heater, it’s used to bring services up with the smallest footprint running in DR. AMIs ready but powered off, brought up manually or by autoscaling. Data must be replicated to DR from the primary site for failover)
    • Warm Standby (again a smaller replication of the live environment but with some services always running to facilitate a quicker failover. It can also be the full complement of servers but running on smaller instances than live. Horizontal scaling is preferred to add more instances to a load balancer)
    • Multi-site (active/active configuration where DNS sends traffic to both sites simultaneously. Auto scaling can also add instances for load where required. DNS weighting can be used to route traffic accordingly.) Each weighted record is returned in proportion to its share of the total weight: two records weighted 10 each give a 50% chance of either being used (effectively round robin), while weights of 10 and 40 (total 50) give the weight-10 record a 1 in 5 chance of being returned – see the Route 53 sketch at the end of this section
  • Import/Export can import data sets into S3, EBS or Glacier. You can only export from S3
  • Import/Export makes sense for large datasets that cannot be moved or copied into AWS over the internet in an efficient manner (time, cost, etc)
  • AWS will export data back to you encrypted with TrueCrypt
  • AWS will wipe devices after import if specified
  • If exporting from an S3 bucket with versioning enabled, only the most recent version is exported
  • Encryption for imports is optional, mandatory for exports
  • Some services have automated backup:-
    • RDS
    • Redshift
    • Elasticache (Redis only)
  • EC2 does not have automated backup. You can either use EBS snapshots or create an AMI from a running or stopped instance. The latter option is especially useful if the instance has ephemeral instance store storage that will be lost when the instance is stopped (Bundle Instance). Creating an AMI effectively “copies” the host storage, and the AMI can then be copied to another region
  • To restore a file on a server, for example, take regular snapshots of the EBS volume, create a volume from the snapshot, attach and mount the volume to the instance, and browse and recover the files as necessary (see the sketch at the end of this section)
  • MySQL requires InnoDB for automated backups, if you delete an instance then all automated backups are deleted, manual DB snapshots stored in S3 are not deleted
  • All backups are stored in S3
  • When you do an RDS restore, you can change the engine type (SQL Standard to Enterprise, for example), assuming you have enough storage space.
  • Elasticache automated backups snapshot the whole cluster, so there will be performance degradation whilst this takes place. Backups are stored on S3.
  • Redshift backups are stored on S3 with a 1 day retention period by default, and only delta changes are backed up to keep storage consumption to a minimum
  • EBS snapshots are stored in S3 and are incremental, yet each snapshot retains all the information needed to restore the full volume. You are only charged for the incremental snapshot storage
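
A sketch of the weighted DNS records mentioned under the multi-site model, using boto3 – the zone ID, record name and IP addresses are placeholders. The 40/10 weights give an 80/20 traffic split:

import boto3

route53 = boto3.client("route53")
route53.change_resource_record_sets(
    HostedZoneId="Z0000000EXAMPLE",               # placeholder hosted zone ID
    ChangeBatch={"Changes": [
        {"Action": "UPSERT",
         "ResourceRecordSet": {
             "Name": "app.example.com.", "Type": "A",
             "SetIdentifier": "primary-site",
             "Weight": 40,                         # 40/50 = 80% of responses
             "TTL": 60,
             "ResourceRecords": [{"Value": "198.51.100.10"}]}},
        {"Action": "UPSERT",
         "ResourceRecordSet": {
             "Name": "app.example.com.", "Type": "A",
             "SetIdentifier": "dr-site",
             "Weight": 10,                         # 10/50 = 20% of responses
             "TTL": 60,
             "ResourceRecords": [{"Value": "203.0.113.10"}]}},
    ]},
)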
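
And a sketch of the EBS snapshot and restore flow for file recovery, again with placeholder IDs:

import boto3

ec2 = boto3.client("ec2")

# Snapshot the EBS volume holding the files
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0",
                           Description="nightly backup of /data")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# To recover a file later: build a new volume from the snapshot in the same
# AZ as the instance, then attach it and mount it to browse the data
vol = ec2.create_volume(SnapshotId=snap["SnapshotId"], AvailabilityZone="eu-west-1a")
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
ec2.attach_volume(VolumeId=vol["VolumeId"],
                  InstanceId="i-0123456789abcdef0",
                  Device="/dev/sdf")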

1.3 Determine appropriate use of multi-Availability Zones vs. multi-Region architectures

  • Multi-AZ service examples are S3, RDS and DynamoDB. Using multi-AZ can mitigate the loss of an AZ (a data centre – assuming the region has three; some regions only have two), and S3 in particular is designed to sustain the concurrent loss of data in two facilities. This can provide a good balance between cost, complexity and reliability
  • Multi-region architectures can mitigate failures of AZs or entire regions, but may cost more and introduce more infrastructure and complexity. Use Route 53 for multi-region failover and resilience, with ELB within each region and CloudFront for global content delivery
  • DynamoDB offers cross region replication, and RDS offers the ability to copy snapshots to another region and to run cross-region read replicas. Data Pipeline has a built in template for replicating DynamoDB to another region for DR
  • Redshift can snapshot within the same region and can also be configured to copy snapshots to another region (see the sketch below)
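
A one-call sketch of the Redshift cross region snapshot copy with boto3 – the cluster name, destination region and retention period are placeholders:

import boto3

redshift = boto3.client("redshift")
# Copy automated snapshots from the cluster's home region to a DR region
redshift.enable_snapshot_copy(
    ClusterIdentifier="analytics-cluster",        # placeholder cluster name
    DestinationRegion="eu-central-1",
    RetentionPeriod=7,                            # days to keep copies in the DR region
)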

1.4 Demonstrate ability to implement self-healing capabilities

  • HA available already for most popular databases:-
    • SQL Server Availability Groups, SQL Mirroring, log shipping. Read replicas in other AZs not supported
    • MySQL – asynchronous replication
    • Oracle – Data Guard, RAC (RAC not supported on AWS but can run on EC2 by using VPN and Placement Groups as multicast is not supported)
  • RDS has multi-AZ automatic failover to protect against
    • Loss of availability in primary AZ
    • Loss of connectivity to primary DB
    • Storage or host failure of primary DB
    • Software patching (done by AWS, remember)
    • Rebooting of primary DB
    • Uses master and slave model
  • MySQL, Oracle and Postgres use physical layer replication to keep data consistent on the standby instance
  • SQL Server uses application layer mirroring but achieves the same result
  • Multi-AZ uses synchronous replication (consistent read/write), asynchronous (potential data loss) is only used for read replicas
  • DB backups are taken from the secondary to reduce I/O load on the primary
  • DB restores are taken from the secondary to avoid I/O suspension on the primary
  • AZ failover can be forced by rebooting your instance with the failover option, either via the console or via the RebootDBInstance API call (see the sketch at the end of this section)
  • Multi-AZ databases are used for DR, not as a scaling solution. Read scaling is achieved by using read replicas, created via the AWS console or the CreateDBInstanceReadReplica API call
  • Amazon Aurora employs a highly durable, SSD-backed virtualized storage layer purpose-built for database workloads. Amazon Aurora automatically replicates your volume six ways, across three Availability Zones. Amazon Aurora storage is fault-tolerant, transparently handling the loss of up to two copies of data without affecting database write availability and up to three copies without affecting read availability. Amazon Aurora storage is also self-healing. Data blocks and disks are continuously scanned for errors and replaced automatically.
  • Creating a read replica takes a snapshot of your primary DB instance; in non multi-AZ deployments this may result in a pause of about a minute
  • Multi-AZ deployments take the snapshot from the secondary instead, so the primary is not interrupted
  • A new DNS endpoint address is given for the read only replica, you need to update the app
  • You can promote a read only replica to be a standalone, but this breaks replication
  • MySQL and Postgres can have up to 5 replicas
  • Read replicas in different regions for MySQL only
  • Replication is asynchronous only
  • Read replicas can be built off Multi-AZ databases
  • Read replicas are not multi-AZ
  • MySQL can have read replicas of read replicas, but this increases latency
  • DB Snapshots and automated backups cannot be taken of read replicas
  • Consider using DynamoDB instead of RDS if your database does not require:-
    • Transaction support
    • ACID compliance (atomicity, consistency, isolation and durability)
    • Joins
    • SQL
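
Finally, a boto3 sketch of forcing a Multi-AZ failover and adding a read replica, as described above – the instance identifiers and instance class are placeholders:

import boto3

rds = boto3.client("rds")

# Force a Multi-AZ failover by rebooting the primary with failover enabled
rds.reboot_db_instance(DBInstanceIdentifier="prod-mysql", ForceFailover=True)

# Scale reads by adding a replica (asynchronous replication, new endpoint)
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="prod-mysql-replica-1",
    SourceDBInstanceIdentifier="prod-mysql",
    DBInstanceClass="db.m4.large",
)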