Exam 70-740 : Installation, Storage, and Compute with Windows Server 2016 : Exam Tips and Feedback


I sat and passed the above exam today, and as there seems to be a total lack of information on this exam out there (aside from what’s in the exam blueprint), I thought I would pass on the benefit of my experience and offer a few tips for folks out there planning on taking it.

The 70-740 forms part of the three exams needed to fulfil the MCSA : Windows Server 2016 (the other two being 70-741 and 70-742) if you don’t already have a 2012 MCSA, which I don’t. This first exam concentrates on installation, storage and compute, as you may guess from the title of the post.

The exam itself is 47 questions and the time allotted is 120 minutes. It seems that gone are the days when you got really verbose scenarios and were then asked if the equally wordy solutions matched the requirements. The questions today were pretty concise, and that included the ones where a scenario was given and potential solutions offered.

If you’ve done any of the recent MCSA exams, such as Office 365, the questions here are even shorter than those, so perfect for someone like me with the attention span of a gnat.

In terms of the question formats, there are the usual drag and drops (3 from 6, say), drop-downs to complete PowerShell commands, selecting a couple of correct answers from 6 or 8, and picking the one right answer from 6 or 8. The focus of the questions is pretty faithful to the exam blueprint and you should study (amongst others) the following areas :-

  • NLB
  • Storage Spaces
  • Nano Server installation and customisation
  • Product activation models
  • Remote PowerShell sessions
  • Hyper-V (create VMs, create VHDs, limitations of nested virtualisation)
  • Failover Clustering (Shared VHDXs, monitoring, live migration requirements)
  • NTFS vs ReFS
  • iSCSI
  • Storage Replicas
  • Containers and Docker commands

The exam took me around 35-40 minutes and I managed to pass with a 796, which was a pleasant surprise as I’d been studying quite piecemeal and a lot of my answers in the exam itself were educated guesses. As usual, if you aren’t sure, rule out the ones you know can’t be right and then play the percentages after that. Also, a lot of PowerShell answers revolve around a “Get-Something” and “Set-Something” structure (Get-VM and Set-VM, for example), so that may help if you’re not sure.

On now to 70-741 and hopefully I can wrap up the remaining two MCSA exams fairly quickly. Good luck if you’re sitting this one soon. In terms of study resources, I used PluralSight, bought the MS Press Study Guide (use the code MCPEBOOK for 50% off the eBook version) and used a lot of Technet articles and also Hands On Labs to lab stuff I couldn’t quite get my head around.



AWS Specialty Beta Exams – Feedback and Tips




At the end of last week, I completed all three new AWS beta “specialty” exams. For those not aware, AWS are bringing in three new certifications to complement the existing five that have been around for a while. The new exams focus on three specific technology areas: Advanced Networking, Security and Big Data.

There was a special offer running during the beta: the exams were half the usual price, plus a free resit if you don’t get past them the first time. It’s difficult to say at what level these are pitched – in general, a lot of the content seemed “Pro” level to me; you certainly need to know a lot more than for the Associate exams.

The exams themselves were the Pro length of 170 minutes, with varying numbers of questions. The Networking exam had something like 130, the Security I think was 106 and the Big Data 100. The questions were the typical wordy AWS type with some of the usual favourite key words such as “resilient” and “cost optimal”. Certainly from a format perspective, there’s nothing really new here. Of the three, I think I did best on the Security exam, followed by a borderline Networking exam, with Big Data trailing in a very distant last. There were a lot of terms in that exam I’d never even heard of before!

Results are due at the end of March which is when the beta collation period ends. I have no expectation on the Networking and Big Data exams, but then again you never know how these things are going to be scored and evaluated. The Security exam I felt went quite well, but who knows?

With respect to the content, these were the key takeaway areas :-


Networking

  • Direct Connect – tons of questions on this.
  • VPNs
  • VPC, including peering – what is and isn’t possible (between accounts in the same region, etc.)
  • BGP – including prepending and MED
  • Routing – both static and dynamic
  • Routing tables
  • Route propagation
  • DHCP Option Sets
  • NAT (gateways and instances)
  • S3 Endpoints
  • CloudFront
  • Jumbo Frames
  • Network optimised instances


Security

  • IAM (users, groups, roles, policies)
  • Encryption (data in flight and at rest – disk encryption, VPN etc)
  • Database encryption (TDE, disk encryption)
  • KMS
  • CloudHSM
  • CloudTrail and CloudWatch
  • Federation with other sources, SAML, Cognito etc
  • AssumeRole and how that works
  • Tagging
  • S3 (versioning, MFA delete)
  • IAM Access keys

Big Data

  • EMR
  • RedShift (including loading and unloading data to S3, performance issues loading files, Avro, gzip etc.)
  • Pig
  • Hive
  • Hadoop
  • iPython
  • Zeppelin
  • DynamoDB (make sure you understand partitioning, performance issues and indexes – global and local)
  • QuickSight
  • Machine Learning (including models)
  • RDS
  • Lambda
  • S3 ETags
  • Kinesis
  • ElasticSearch
  • Kibana
  • IoT
  • API Gateway
  • Kafka
  • Encryption (TDE, CloudHSM, KMS)

As you can see from above, the focus may be relatively narrow, but you do need to understand things pretty well. I wouldn’t say you need to go right into deep depth in the exam questions, but you certainly need to know each of the topics listed above and really what they can and can’t do. From there, you should be able to work out what you think is the right answer.

So now we wait until the end of March. I expect, and am prepared, to sit all three again as we continue on the never-ending treadmill that is IT certification 😉

Study materials included acloud.guru as usual and also the AWS YouTube channel. The Re:Invent 300 and 400 level videos are really good preparation for the exams as they go into some decent depth.

Any comments or questions, please feel free to hit me up on Twitter.



AWS Certified DevOps Engineer Professional – Exam Experience & Tips


I managed to find the time yesterday to sit the above exam before the end of the year to reach my goal of holding all five current AWS certifications. There isn’t a lot out there about this exam, so as usual I thought I would try to pass on the benefit of my experiences for others planning to sit this one.

The exam is 80 questions over 170 minutes. I finished with about 20 minutes to spare and barely passed with a 66%, but as we always say – a pass is a pass! Looking back over the score report, there are four domains tested in the exam:-

  • Domain 1: Continuous Delivery and Process Automation
  • Domain 2: Monitoring, Metrics, and Logging
  • Domain 3: Security, Governance, and Validation
  • Domain 4: High Availability and Elasticity

I managed to score really well on domains 1, 3 and 4 (between 75% and 85%), but really bombed on domain 2, which really surprised me. This domain focuses mainly on CloudWatch, so it goes without saying that I didn’t know it as well as I thought I did!

Like all the other AWS exams, the questions are worded in a very specific way, and it can take time to read and re-read the questions to truly understand what is being asked. I wouldn’t worry too much about time running out, some of the questions are quite short but you need to look for key words in the questions – such as “cost-effective”, “fault tolerant” and “efficient”. This can help you rule out the obviously incorrect answers.

In terms of what you need to know, I’d say the following :-

  • Domain 1: CloudFormation (templates, custom resources), OpsWorks (lifecycles), Elastic Beanstalk (platform support, scaling, Docker), SQS, SNS, Data Pipeline (I was surprised to see this feature in the exam as I figured it was being phased out in favour of Lambda), SWF, bootstrapping
  • Domain 2: CloudWatch, CloudTrail (what it can and can’t do), CloudWatch Logs (Log streams, Log filters, Log agent), EMR
  • Domain 3: IAM (Roles, users, STS, AssumeRole(s))
  • Domain 4: Load balancing, auto scaling, EC2, S3, Glacier, EBS, RDS, DynamoDB, instance types

And for what I used for study: use your AWS account and the free tier entitlement to muck around with all the services. There are loads of walkthroughs in the documentation, and provided you don’t leave massive instances running 24/7, it should only cost you pennies to use.

The A Cloud Guru course is well worth the investment of time and money – Adrian and Nick do a great job of taking you through most of what you need to know for the exam. I did find that there wasn’t as much DynamoDB content on the exam as I was expecting, not that I’m complaining because a lot of how it works still really mashes my head!

There are lots of good videos on YouTube, from Re:Invent conferences from years gone by which go into a lot of depth. I can also recommend Ian Massingham’s CloudFormation Masterclass video as a good refresher/primer for CF.

Difficulty wise, it’s definitely a tough exam, don’t let anyone tell you otherwise. 80 questions is a lot and many of them are very verbose in both the question and the answers. I’d say it’s not as tough as the Solutions Architect Pro as it doesn’t cover as broad a range of topics, but you can’t really wing it.

I hope this article helps anyone doing this exam any time soon. I’m going to enjoy being part of the “All 5” club for as long as it lasts (the three “Specialty” exams are coming up early next year, I’ve registered to sit all the betas).



Linux Foundation Certified System Administrator  – Exam Experience & Tips


I’ve recently gone through the process of sitting the LFCS exam, and one thing I noticed while studying was the almost total lack of blogs and articles about this certification and the exam. There are some good courses from Pluralsight and Linux Academy, but not a great deal else. As such, I thought I would drop a few thoughts down in case it helps someone else.

Firstly, what is the LFCS and why should I sit it? Well, firstly it’s vendor agnostic (sort of), in that you get to choose which distro you’d like to certify on – SUSE, CentOS or Ubuntu. I chose CentOS as, in my experience, it’s the most common distro in use in the enterprise right now, if you leave out the paid Red Hat equivalent. If you didn’t know, CentOS is in essence the “free” version of Red Hat, so if you know one, you know the other, so to speak. This puts you in a really good place skills wise.

Secondly, now that Microsoft have hugged the penguin (and no, that’s not a euphemism!), if you pass both the 70-533 (Azure Operations) and the LFCS exam, you qualify for the MCSA : Linux on Azure certification. As far as I can tell, there don’t seem to be a whole lot of people around who have that at the moment.

On to the exam itself. It’s currently priced at $300 and you get a resit included if you fail the first time. I thought initially the exam cost was quite high for an entry level exam, but when you factor in the resit, it’s actually not bad value for money. Also, until the 22nd December, you can save 50% on exam vouchers by using the code HOLIDAY50. Vouchers last for a year, so well worth buying now, even if you don’t plan to sit the exam until next year sometime.

The exam is all on line proctored, so you don’t need to find a test centre in the back of beyond that looks like an office block in Pripyat.

A typical exam centre waiting room. In Pripyat. Possibly.

You perform all the registration via the Linux Foundation website and then click through to the exam delivery company (much the same as you do from Microsoft to Pearson). There is a short wait time while you are checked in, and much like the Microsoft process, you are asked to do a 360 degree view of the room with your webcam and desk area. All the testing requirements are the same for Microsoft online proctored exams, so nothing new here. In fact, if anything, it’s far less stringent. No turning out your pockets, rolling out your sleeves or reciting the Catalina Magdalena Lupensteiner Wallabeiner song.

From here, the exam kicks off and you’re monitored via webcam, as is typical for these things. The exam itself comprises 25 questions in 2 hours, so tight for time, even if you know your stuff. If you’ve sat VCAP/VCIX exams, you should know by now how to manage your time during these exams. You can go backwards and forwards between the questions and I can’t recall there being any dependencies between the answers, so if you don’t answer question 1, it doesn’t prevent you answering question 2, for example.

My tip? Scroll through the questions and answer the “easy” ones first. Some questions have a single objective, some have three or four sub-objectives. If you’re not massively confident, go for the low hanging fruit first. Remember this is a practical exam, so even if you only part answer a question, you will still get credit for it.

The exam screen is in two halves, the left pane has the question panel and the right pane has a terminal session. And no, you don’t have GUI access, so get that command line stuff learned!

In terms of content, I went through the Linux Academy material and found it matched the exam blueprint pretty well. I also dipped in and out of the PluralSight videos to plug the gaps in my knowledge. Also, labbing stuff and trying it out has no substitute (or as I have expressed it in the past, “labbing the shit out of it”). Either use your home lab or burn some AWS Free Tier or Azure MSDN credit. CentOS boxes should be extremely cheap to run as there’s no licence cost, you’re just paying compute and storage fees.

So what should you know? Well of course I’m restricted by NDA, but as I said, look at the exam blueprint and how the domains are weighted. There is a high value placed on “Essential Commands” and “Operation of Running Systems”, so take it from the blueprint to mean :-

  • Redirection of files
  • Core commands such as ls, echo, cp, rm, find, sed, sudo, etc.
  • Know how to use vim!
  • Creating, extracting and types of archives
  • Install and remove software
  • Write a basic shell script
  • Manipulation of users
  • Storage commands including LVM and mount
  • Creation, deletion and maintenance of user and group accounts
  • Firewalls, services and startup commands
  • KVM virtualisation commands (virsh, etc.)
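If you want a feel for the sort of tasks the blueprint implies, rehearse them at the terminal until they’re muscle memory. A quick practice sketch along those lines (the file names are made up, and this is just one way of doing it):

```shell
# Work in a throwaway directory so nothing important gets touched
work=$(mktemp -d)
cd "$work"

# Redirection: build a file from command output
echo "server01" > hosts.txt
echo "server02" >> hosts.txt

# sed: transform file contents non-interactively
sed 's/server/node/' hosts.txt > nodes.txt

# Archives: create a gzipped tarball, then extract it elsewhere
tar -czf hosts.tar.gz hosts.txt nodes.txt
mkdir extracted
tar -xzf hosts.tar.gz -C extracted

# find: locate files by name under the current tree
find . -type f -name 'nodes.txt'

cat extracted/nodes.txt
```

The same pattern works for the user and storage objectives – create a user, set a password, make a filesystem, mount it – but those need root, so practise them in a lab VM rather than on your workstation.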

Years ago, I sat and passed the old SUSE CLP entry level exam and I have to say, I thought the LFCS exam was pitched at a lower level than that. It’s not massively taxing, but that being said, I’m still awaiting my score report so I could have failed it! The SLA is 72-75 hours from the end of your exam for your results. The first time around, I broke the lab VM and so had to quit and therefore fail the exam! Oddly, the questions in the resit were more or less the same as the first time around, which makes me think the question pool is not all that large.

Also, don’t think you can wing it with man pages. You simply don’t have time to wade through all of that if you don’t know the fundamentals. One useful tip is to use tab complete. Not only does this save small amounts of time, but if you’re also not quite sure about what the command is, but you know the first couple of letters of it, tab complete will give you a list of matching commands. I found that quite useful a couple of times.

Hopefully this has proved useful. As I said previously, there’s not a lot of content out there about this certification or exam apart from the aforementioned training courses.

Good luck if you’re sitting this any time soon, fingers crossed I will get a positive result!

Update 18/12/16 – I just found out I passed with 85%. Well happy with that!


Azure VMs – New Auto-Shutdown Feature Sneaked In (almost!)

I saw the news the other day that Azure Backup has now been included on the VM management blade in the Azure Portal, which is great news as you don’t want to be jumping around in the portal to manage stuff where you don’t need to. However, one feature that appears to have sneaked into the VM management blade without any fanfare at all is the ability to auto schedule the shutdown of a virtual machine.

Many customers request the ability to shut down virtual machines during off hours in order to save cost, once any backups and scheduled maintenance tasks have completed. Previously this had to be done by using Azure Automation to execute a runbook that shuts the VMs down. That’s fine and a valid way of doing it, but on larger estates it ends up being a costed feature, as the time taken to run the runbooks exceeds the free tier allowances.

This typical requirement has obviously found its way back to the product management team at Microsoft, and to make it a lot easier to enable when spinning up VMs, it’s been added to the standard VM management blade, as shown below:-


As far as I can tell, this feature is either not in use yet or is only available in a small number of regions, ahead of a broader roll out. I tried it on VMs in UK South and North Europe, only to see this message :-


And trying to read between the lines of the error message, will this feature allow starting the VM too? You’d have to hope so! I did ping Azure Support on Twitter to see when this feature would be fully available in the UK/EU and got a very speedy response (thanks, chaps!):-



So stay tuned for this feature being enabled at some point in the near future. I’d also assume there will be some corresponding PowerShell command to go with it, so that you can add it to scripted methods of deploying multiple virtual machines.


Achievement Unlocked : MCSA Office 365


I’m pleased to say that after a couple of attempts at 70-347, I successfully passed my MCSA : Office 365 last night. For those looking at doing this certification in the near future, I just wanted to pass on the benefit of my experience. You may think, like me, that Office 365 is a pretty straightforward suite of software. In some respects, it is. It’s pretty much the same Exchange, Office, SharePoint, etc. that you’ve always been used to, but with the additions in this exam of knowing things like subscription plan differences, AD sync and much more.

Out of the two, I found the first exam 70-346 much easier. This in some ways lures you into a false sense of security in thinking the second will be much the same. This is really where I came unstuck. I got a little bit carried away and perhaps didn’t put quite as much effort as I should have done into my study and got a bit of a kicking in the end.

Once I dusted myself down and went back over the parts I didn’t know on the exam, I felt a lot more confident last night but I still took out the insurance policy of the Microsoft Booster Pack, which is an exam voucher plus 4 resits. Yes it’s more expensive, but it takes out the risk of running up large exam bills and takes the pressure off a bit too. The promotion runs until the end of this month, so if you want to take advantage, you’d better be quick.

Anyway, each exam was around 52 questions, a couple of case studies thrown in but most were the usual drag and drop, order a list, multiple choice type formats. If you’ve sat Microsoft exams before, there shouldn’t be anything in there about the format that should surprise you.

So then, what to study?

  • PowerShell, PowerShell, PowerShell. You’ll get battered on this. Know common switches for things like user manipulation, mailbox settings, mobile devices, Lync configuration etc
  • Make sure you know all of the different Exchange migration methods and when to use them, what their advantages and disadvantages are (cutover, staged, remote move, IMAP, etc.)
  • Know the permissions model of SharePoint well – how to give anonymous access, how to remove it and how to set up site collection hierarchies
  • Install and play with AD Connect and make sure you understand how it works and how you can use it in a hybrid environment, same goes for ADFS if you don’t know that well
  • Know what integrates with Skype for Business
  • Know the plan differences well, especially Enterprise and Small Business plans. Know what is included and what isn’t
  • Did I mention PowerShell?

Resources I used :-

  • Microsoft MVA training – Managing Office 365 Identities and Services. A little dated now but still very useful
  • CBT Nuggets – very concise course giving you most of the information you need to know
  • Pluralsight – A bigger deep dive into things like SharePoint sites and administration, which was a gap for me initially

Good luck if you’re sitting this any time soon, just don’t underestimate it or it will bite you on the arse!



Office 365 Features – Quick Reference Matrix

I’ve been doing quite a bit with Office 365 lately, and I always get confused as to what services come under which plan (typical Microsoft!). You’ll also be asked about this if you’re doing the Office 365 MCSA exams (70-346 and 70-347), so well worth knowing if just for that.

The gist of it :-

  • Exchange Online, SharePoint and Office Online (Office Web Apps) are available on every plan
  • Exchange Online, SharePoint, Skype for Business (Lync) and OneDrive for Business are available on every plan except K1 plans
  • Office ProPlus requires E3, E4 or E5 Enterprise plans
  • Yammer is included, but with caveats (see notes table)

The matrix below has been lifted from Microsoft’s site and is current as of the time of this post. Beware this can and probably will change!

1   Project Online is not included, but can be purchased as a separate add-on service or added for free to the Office 365 Education plan.
2   Yammer Enterprise is not a component of Office 365 Government, but may be acquired at no cost as a standalone offer for each user licensed for Office 365 Government Plan E1, E3, E4, and K1. This offer is currently limited to customers which purchase Office 365 Government under Enterprise Agreement and Enterprise Subscription Agreements.
3   Azure RMS is not included, but can be purchased as a separate add-on service or added for free to the Office 365 Education plan.
4    To learn more about which RMS features are included with Office 365 plans, see Comparison of Rights Management Services (RMS) Offerings .
5   Office 365 Enterprise E5 contains Cloud PBX, PSTN Conferencing, and PSTN Calling capability. To implement PSTN Calling requires an additional plan purchase (either Local or Local and International).

Hope this helps!


Azure VNet Peering Preview Now Available




One of the networking features where I preferred AWS over Azure was the ease of peering VPCs together. As a quick primer, an AWS VPC is basically your own private cloud within AWS, with subnets and instances and all that good stuff. Azure VNets are very similar in that they are a logical grouping of subnets, instances, address spaces, etc. Previously, to link VNets together, you had to use a VPN connection. That’s all well and good, but it’s a little bit clunky and in my opinion, not as elegant as VPC peering.

Anyway, Microsoft has recently announced that VNet peering within a region is now available as a preview feature. This means that it’s available for you to try out, but be warned it’s pre-release software (much like a beta programme) and it’s a bit warts and all. It’s not meant to be used for production purposes and it is not covered by any SLAs.

The benefits of VNet peering include:-

  • Eliminates need for VPN connections between VNets
  • Connect ASM and ARM networks together
  • High speed connectivity across the Azure backbone between VNets

Many of the same restrictions that govern the use of VPC peering in AWS apply here too to VNet peering, including:-

  • Peering must occur in the same region
  • There is no transitive peering between VNets (if VNet A is peered with VNet B, and VNet B is peered with VNet C, VNet A still has no connectivity to VNet C unless the two are peered directly)
  • There must be no overlap in the IP address space

While VNet peering is in preview, there is no charge for this service. Take a look at the documentation and give it a spin, in the test environment, obviously 😉



AWS : Keeping up with the changes


As we all know, working in the public cloud space means changes in the blink of an eye. Services are added, updated (and in some cases, removed) at short notice and it’s vital from not just a Solutions Architect’s perspective but from an end user or operational standpoint that we keep up to date with these announcements, as and when they happen.

In days of old, we’d keep an eye on a vendor’s annual conference when they’d reveal something cool in their keynote, with a release on that day or to follow shortly after. In the public cloud, innovation happens much quicker and it’s no longer a case of waiting for “Geek’s Christmas”.

To that end, today I was pointed towards the AWS “What’s New” blog, which in essence is a change log for AWS services. Yesterday alone lists 8 announcements or service updates.

It’s a site well worth bookmarking and reviewing on a regular basis, I’d suggest weekly if you have time. If you’re designing AWS infrastructures or running your business on AWS, you need to know what’s on the roadmap so you can plan accordingly.

You can visit the What’s New blog site here.



AWS Certified Solutions Architect Professional – Study Guide – Domain 8.0: Cloud Migration and Hybrid Architecture (10%)


The final part of the study guide is below – thanks to all those who have tuned in over the past few weeks and given some very positive feedback. I hope it helps (or has helped) you get into the Solutions Architect Pro club. It’s a tough exam to pass and the feeling of achievement is immense. Good luck!

8.1 Plan and execute for applications migrations

  • AWS Management Portal available to plug AWS infrastructure into vCenter. This uses a virtual appliance and can enable migration of vSphere workloads into AWS
  • Right click on VM and select “Migrate to EC2”
  • You then select region, environment, subnet, instance type, security group, private IP address
  • Use cases:-
    • Migrate VMs to EC2 (VM must be powered off and configured for DHCP)
    • Reach new regions from vCenter to use for DR etc
    • Self service AWS portal in vCenter
    • Create new EC2 instances using VM templates
  • The inventory view is presented as :-
    • Region
      • Environment (family of templates and subnets in AWS)
        • Template (prototype for EC2 instance)
          • Running instance
            • Folder for storing migrated VMs
  • Templates map to AMIs and can be used to let admins pick a type for their deployment
  • Storage Gateway can be used as a migration tool
    • Gateway cached volumes (block based iSCSI)
    • Gateway stored volumes (block based iSCSI)
    • Virtual tape library (iSCSI based VTL)
    • Takes snapshots of mounted iSCSI volumes and replicates them via HTTPS to AWS. From here they are stored in S3 as snapshots and then you can mount them as EBS volumes
    • It is recommended to get a consistent snapshot of the VM by powering it off, taking a VM snapshot and then replicating this
  • AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon Elastic MapReduce (EMR).
  • AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premises data silos
  • Pipeline has the following concepts:-
    • Pipeline (container node that is made up of the items below, can run on either EC2 instance or EMR node which are provisioned automatically by DP)
    • Datanode (end point destination, such as S3 bucket)
    • Activity (job kicked off by DP, such as database dump, command line script)
    • Precondition (readiness check optionally associated with a data source or activity. The activity will not run if the check fails. Standard and custom preconditions are available – DynamoDBTableExists, DynamoDBDataExists, S3KeyExists, S3PrefixExists, ShellCommandPrecondition)
    • Schedule
  • Pipelines can also be used with on-premises resources such as databases etc
  • The Task Runner package is installed on the on-premises resource to poll the Data Pipeline queue for work to do (database dump etc, copy to S3)
  • Much of the functionality has been replaced by Lambda
  • Set up logging to S3 so you can troubleshoot it
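To make those Data Pipeline concepts concrete, here’s a minimal, illustrative pipeline definition in the JSON format Data Pipeline uses. The ids, bucket name and command are all made up, and I’ve kept it to one schedule, one precondition, one datanode and one activity – treat it as a sketch and check the AWS documentation for exact field names before relying on it:

```json
{
  "objects": [
    { "id": "DailySchedule", "type": "Schedule",
      "period": "1 day", "startAt": "FIRST_ACTIVATION_DATE_TIME" },

    { "id": "InputReady", "type": "S3KeyExists",
      "s3Key": "s3://example-bucket/input/ready.flag" },

    { "id": "OutputBucket", "type": "S3DataNode",
      "directoryPath": "s3://example-bucket/output/" },

    { "id": "DumpDatabase", "type": "ShellCommandActivity",
      "command": "mysqldump exampledb > /tmp/exampledb.sql",
      "schedule":     { "ref": "DailySchedule" },
      "precondition": { "ref": "InputReady" },
      "output":       { "ref": "OutputBucket" } }
  ]
}
```

The structure mirrors the bullets above: a Schedule drives a ShellCommandActivity, which only runs if the S3KeyExists precondition passes, and whose results land in an S3DataNode.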

8.2 Demonstrate ability to design hybrid cloud architectures

  • The biggest CIDR block you can reserve for a VPC is a /16 and the smallest is a /28
  • The first four IP addresses and the last one in each subnet are reserved by AWS – always 5 reserved. For example, in the subnet 10.0.0.0/24:-
    • 10.0.0.0 – Network address
    • 10.0.0.1 – Reserved for the VPC router
    • 10.0.0.2 – Reserved by AWS for DNS services
    • 10.0.0.3 – Reserved by AWS for future use
    • 10.0.0.255 – Reserved for network broadcast. Network broadcast is not supported in a VPC, so this is reserved
  • When migrating to Direct Connect from a VPN, put the VPN connection and Direct Connect connection(s) in the same BGP peering arrangement, then make the VPN the less preferred path. AS path prepending will do this – BGP prefers the shortest AS path, so a route with a single ASN in the path is considered more preferable than one where the same ASN has been prepended three or four times
  • For applications that require multicast, you need to configure a VPN between the EC2 instances with in-instance software, so the underlying AWS infrastructure is not aware of it. Multicast is not supported by AWS
  • VPN network must be a different CIDR block than the underlying instances are using (for example 10.x address for EC2 instances and 172.16.x addresses for VPN connection to another VPC)
  • SQL Server can be migrated by exporting the database as flat files from SQL Server Management Studio; it can’t be replicated to another region or from on premises to AWS
  • CloudSearch can index documents stored in S3 and is powered by Apache Solr
    • Full text search
    • Drill down searching
    • Highlighting
    • Boolean search
    • Autocomplete
    • CSV, PDF, HTML, Office docs and text files supported
  • Can also search DynamoDB with CloudSearch
  • CloudSearch can automatically scale based on load or can be manually scaled ahead of expected load increase
  • Multi-AZ is supported, and it’s basically a service hosted on EC2 – this is how the costs are derived
  • EMR can be used to run batch processing jobs, such as filtering log files and putting results into S3
  • EMR uses Hadoop which uses HDFS, a distributed file system across all nodes in the cluster where there are multiple copies of the data, meaning resilience of the data and also enables parallel processing across multiple nodes
  • Hive is used to perform SQL like queries on the data in Hadoop, uses simple syntax to process large data sets
  • Pig is used to write MapReduce programs
  • EMR cluster has three components:-
    • Master node (manages data distribution)
    • Core node (stores data on HDFS from tasks run by task nodes and are managed by the master node)
    • Task nodes (managed by the master node and perform processing tasks only, do not form part of HDFS and pass processed data back to core nodes for storage)
  • EMRFS can be used to output data to S3 instead of HDFS
  • Can use spot, on demand or reserved instances for EMR cluster nodes
  • S3DistCp is an extension of DistCp that is optimized to work with AWS, particularly Amazon S3. You use S3DistCp by adding it as a step in a cluster or at the command line. Using S3DistCp, you can efficiently copy large amounts of data from Amazon S3 into HDFS where it can be processed by subsequent steps in your Amazon EMR cluster
  • Larger data files are more efficient than smaller ones in EMR
  • Storing data persistently on S3 may well be cheaper than leveraging HDFS, as large data sets will require large instance sizes in the EMR cluster
  • Smaller EMR cluster with larger nodes may be just as efficient but more cost effective
  • Try to complete jobs within 59 minutes to save money (EMR billed by hour)
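The 59-minute point is just hourly rounding: with per-instance-hour billing rounded up, the cost of a job is driven by how many whole hours each node is up. A quick back-of-envelope sketch (the rate is a made-up figure, not a real EMR price):

```shell
nodes=4
rate_cents=30   # assumed per-node hourly rate, illustrative only

# Hourly billing rounds each node's runtime up to the next whole hour
bill() {
  minutes=$1
  hours=$(( (minutes + 59) / 60 ))
  echo $(( hours * nodes * rate_cents ))
}

bill 59   # 1 billed hour per node -> prints 120
bill 61   # 2 billed hours per node -> prints 240
```

Creeping over the hour boundary doubles the bill for the same work, which is why trimming a job from 61 minutes down to 59 is worth real money at scale.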