AWS Certification – Changes To Resit Policies?



As I tweeted at the end of last week, I failed the AWS Advanced Networking exam on Friday and I was looking earlier to see when I could reschedule this and jump back on the horse. Originally when I first started sitting AWS exams back in the dark depths of December 2015, you could sit an exam three times before you had to wait 12 months to sit it again.

As you can imagine, sitting my SA Pro exam at the third time of asking was pressure enough but also to have that sword hanging over my head just made the situation practically unbearable. I’m pleased to note that when I logged into the Training and Certification portal this morning, the resit policy has been relaxed quite a bit. From three attempts in a single year, all exams now have the following terms :-

  • You can sit any AWS exam a total of 10 times (Initial sitting plus 9 retakes)
  • You must wait 14 days after any failed attempt before you can register for a resit
  • The maximum number of exam sittings in a 12 month period seems to have been removed

This is a much better approach for test sitters and takes some of the pressure off. It also makes sense from AWS’s point of view as they can generate more revenue from exams now. I’m not sure when this policy changed (I quickly Googled it and found nothing), but it’s well worth knowing if you’re sitting any exams soon.

As regards the maximum sittings in a single year, if you need more than 10 attempts, it’s probably safe to say you should consider something a bit different. 😉

Screen grab from the T&C portal showing the new resit policy for all exams




Event Review – Google Cloud Next London – Day Two

Seenit demo

The day 2 keynote started with an in depth discussion of Cloud Spanner, as mentioned previously. AWS and Azure provide highly scalable and highly tunable NoSQL services in the form of DynamoDB etc, but when it comes to more traditional “meat and potatoes” RDBMS solutions, they are constrained by the limitations of the products they use, such as MySQL, SQL Server, Postgres, etc.

Cloud Spanner is different as it is a fully scalable RDBMS solution in the cloud that offers all the same benefits as the NoSQL solutions in Azure and AWS. Much of the complexity of sharding the database and replicating it globally has been taken care of within Cloud Spanner. Automatic tuning is also done over time by background algorithms.

Cloud Spanner will be GA’d on May 16th and is well worth a look if you have ACID database requirements at scale.

A representative from The Telegraph was brought up to discuss how GC’s data solutions allow them to perform very precise consumer targeting using analytics. It was also worth noting that they are a multi-cloud environment, using best of breed tools depending on the use case. Rare and ballsy!

An example of the powerful Google APIs available was then demonstrated by a UK startup called Seenit. They use the Google Video Intelligence API to automatically tag videos that are uploaded to their service. Shazam then came up on stage to discuss their use of the Google Cloud platform and to share some of the numbers they have for their service.

Shazam by numbers

As you can see from the picture above, there have been over a billion downloads for the app and more than 300 million daily active users. Those numbers take some processing! One of the key takeaways for Shazam was that in some cases, traffic spikes can be predicted, such as at a major sporting event or during a Black Friday sale. This is less the case with Shazam, so they have to have an underlying platform that can be resilient to these spikes.

There was a demo of GPU usage in the cloud, around the use case of rendering video. The key benefit of cloud GPU is that you can harness massive scalability at a fraction of the cost it would take to provision your own kit. Not only that, but consumption based charging means that you only pay for what you use, making it a highly cost effective option.

For the final demo of the keynote, there was a show and tell around changes coming to G Suite. This includes Hangouts, which has had some major engineering done to it. It will support call in delegates, a Green Room to hold attendees before the meeting starts and also the support for a new device called the Jamboard. This is a touch screen whiteboard that can be shared with delegates in the Hangouts meeting who can also interact with the virtual whiteboard, making it a team interactive session. Jamboards are still not available, but expect them to cost a few thousand pounds/dollars on release.

One of the new aspects of G Suite that I liked was the addition of bots and natural language support. Bots are integrated with Hangouts so that you can assign a project task to a team member, or you can use the bot to find the next free meeting slot for all delegates, all of which takes time in the real world.

Hangouts improvements

Natural language support was demonstrated in Sheets, whereby a user wanted to apply a particular formula but didn’t know how. By expressing what they wanted to do in natural language, Sheets was able to construct a complex formula that achieved these results in a split second, again illustrating the value of the powerful Google APIs.

A final demo was given by another UK startup called Ravelin. They have a service that detects fraud in financial transactions using powerful Machine Learning techniques. They then draw heat maps of suspected fraud activity and this can at a glance show parts of the country where fraud is most likely.

The service sits in the workflow for online payments and can return positive or negative results in milliseconds, thus not delaying the checkout process for the end consumer. Really impressive stuff!

More security and compliance in the cloud

After the keynote, I went to the first breakout of the day which was about security and compliance. This did not just cover GCP but also mobile as well. A Google service called Safety Net makes 400 million checks a day against devices to help prevent attacks and data leaks. This is leveraged by Google Play, whose payment platform serves 1 billion users worldwide.

One stat that blew me away was that 99% of all exploits happen a year after the CVE was published. This is a bit of a damning statement and shows that security and patching is still not treated seriously enough. On the other side of the coin, Android still has a lot to do in this area, so in some respects I thought it was a bit rich of Google to point fingers.

Are you the weakest link?

Google has 15 regions and 100 POPs in 33 countries, with a global fibre network backbone that carries a third of all internet traffic daily. The Google Peering website has more information on the global network and is worth a visit. Google really emphasised their desire to be the securest cloud provider possible by noting that they have 700+ security researchers and have published 160 academic security white papers. Phishing is still the most common way of delivering malicious payloads.

DLP is now available for both Gmail and Drive, meaning the leak of data to unauthorised sources can now be prevented. There is also support for FIDO approved tokens, which are USB sticks with a fingerprint scanner on board. These are fairly cheap and provide an additional layer of security. The session wrapped with announcements around expiring access and IRM support for Drive, S/MIME support for Gmail and third party apps white listing for G Suite.

To mention GDPR – Google have stated that you are the data controller and Google are the data processor. Google has certified all infrastructure for FedRAMP and is the only provider to have done so. Although FedRAMP doesn’t apply outside of the US, there may be cases where this level of certification will be useful to show security compliance.

Cloud networking for Enterprises

My next breakout was on GC networking. I have to say that as a rule, the way GC does this is very similar to AWS with VPC and subnet constructs, along with load balancing capabilities. Load balancing comes in three main flavours – HTTP(S), SSL and TCP Proxy. You can also have both internal and external load balancing.

Load balancing can be globally distributed, to help enable high availability and good levels of resilience. It uses IP Anycast to achieve this. IPv6 is now supported on load balancers, but you can only have one address type per load balancer. In respect of CDN, there is a Google CDN, but you can also use third party CDN providers such as Akamai or Fastly.

Fastly took part in the breakout to explain how their solution works. It adds a layer of scalability and also performance on top of public cloud providers. It is custom code written by Fastly to determine optimal routes for network traffic globally. I’m sure it does a lot more than that, so feel free to check them out.

The Fastly network

Andromeda is the name of the SDN written by Google to control all networking functions. There is a 60Gbps link between VMs in the same region and live migration of VMs is available (unique to GC at the time of writing). GCP firewalls are stateful, accept ingress/egress rules and deny is the default unless overridden.

DDoS protection is provided at layers 3 and 4 with Cloud CDN and Load Balancer, with third party appliances supported (Checkpoint, Palo Alto, F5, etc.). Identity Aware Proxy can be used to create ACLs for access to external and internal sites using G Suite credentials. In respect of VPCs, you can have a single VPC that can be used globally and also shared with other organisations. VPCs have expandable IP address ranges, so you don’t need to decide up front how many addresses you will need; this can be changed later.

There is private access to Google services from VPCs including cloud storage, so think of S3 endpoints in AWS and you’ll get the idea. Traffic does not traverse the public internet, but uses Google’s network backbone. You can access any region from a single interconnect through Google’s network (think Direct Connect or ExpressRoute).
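To picture what an expandable IP address range means in practice, here is a small sketch using Python's standard `ipaddress` module. The starting range and the new prefix length are made-up values, and this only shows the address arithmetic, not how GCP performs the change.

```python
import ipaddress

# Hypothetical starting range for a subnet
subnet = ipaddress.ip_network("10.0.0.0/24")

# Widen the same range to a /20 without moving its starting address
expanded = subnet.supernet(new_prefix=20)

print(subnet, "has", subnet.num_addresses, "addresses")
print(expanded, "has", expanded.num_addresses, "addresses")

# Every address already handed out from the old range is still valid
print(subnet.subnet_of(expanded))
```

The original range sits entirely inside the expanded one, which is why existing addresses would not need to change.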

Like Azure and AWS, VPC network peering is available. VMs support multi NICs and you can have 10 NICs per VM. XPNs define cross project networking and you can have shareable central network admin, shared VPN, fine grained IAM controls and the Cloud Router supports BGP. Finally, in terms of high bandwidth/low latency connections, you can have a direct connection to Google with Partner interconnections also available.

To wrap up

To summarise, props to Google for a very good event. There was loads of technical deep dive content and if I had one criticism, it would be that the exhibition hall was a bit sparse, but I expect that will be addressed pretty quickly. In respect of functionality, I was pleasantly surprised with how much is currently available in GC.

Standouts for me include the VM charging model, VM sizing, live migration of VMs and added flexibility around the networking piece. It’s clear that Google want to position GC as having all the core stuff you’d expect, plus the massively powerful APIs that help run the consumer side of Google.




Event Review – Google Cloud Next London – Day One

I was fortunate enough to spend the last couple of days at the Google Cloud Next London event at the ExCel centre and I have a few thoughts about it I’d like to share. The main takeaway I got from the event is that while there may not be the breadth of services within Google Cloud (GCP) as there is in AWS or Azure, GCP is not a “me too” public cloud hyperscaler.

While some core services such as cloud storage, VPC networking, IaaS and databases are available, there are some key differences with GCP that are worth knowing about. My interpretation of what I saw over the couple of days was that Google have taken some of the core services they’ve been delivering for years, such as Machine Learning, Maps and Artificial Intelligence and presented them as APIs for customers to consume within their GCP account.

This is a massive difference from what I can see with AWS and Azure. Sure, there are components of the above available in those platforms, but these are services which have been at the heart of Google’s consumer services for over a decade and they have incredible power. In terms of market size, both AWS and Azure dwarf GCP, but don’t be fooled into thinking this is not a priority area for Google, because it is. They have ground to make up, but they have very big war chests of capital to spend and also have some of the smartest people on the planet working for them.

To start with, in the keynote, there was the usual run down of event numbers, but the one that was most interesting for me was that there were 4,500 delegates, which is up a whopping 300% on last year, and 67% of registered attendees described themselves as developers. Google Cloud is made up of GCP, G Suite (Gmail and the other consumer apps), Maps and APIs, Chrome and Android. Google Cloud provides services to 1 billion people worldwide per day. Incredible!

Gratuitous GC partner slide

There was the usual shout out of thanks to the event sponsors. One thing I did notice in contrast to other vendor events I’ve been to was the paucity of partners in the exhibition hall. There were several big names including Rackspace, Intel and Equinix but obviously building a strong partner ecosystem is still very much a work in progress.

We then had a short section with Diane Greene, who many industry veterans will know as one of the founders of VMware. She is now Senior VP for Google Cloud and it’s her job to get Google Cloud better recognition in the market. Something I found quite odd about this section is that she seemed quite ill prepared for her content and brought some paper notes with her on stage, which is very unusual these days. There were several quite long pauses and it seemed very under-rehearsed, which surprised me. Normally the keynote speakers are well versed and very slick.

GDPR and GC investment

Anyway, moving on to other factoids – Greene committed Google to be fully GDPR compliant by the time it becomes law next May. She also stated there has been $29.4 billion spent on Google Cloud in the last three years. The Google fibre backbone carries one third of all internet traffic. Let that sink in for a minute!

There is ongoing investment in the GC infrastructure and when complete in late 2017/early 2018, there will be 17 regions and 50 availability zones in the GC environment, which will be market leading.


GCP regions, planned and current

Google Cloud billing model

One aspect of the conference that was really interesting was the billing model for virtual machines. In the field, my experience with AWS and Azure has been one of pain when trying to determine the most cost effective way to provide compute services. It becomes a minefield of right sizing instances, purchasing reserved instances, deciding what you might need in three years’ time, looking at Microsoft enterprise agreements to try and leverage Hybrid Use Benefit. Painful!

The GCP billing model is one in which you can have custom VM sizes (much like we’ve always had with vSphere, Hyper-V and KVM), so there is less waste per VM. Also, the longer you use a VM, the cheaper the cost becomes (this is referred to as sustained usage discount). Billing is also done per minute, which is in contrast to AWS and Azure who bill per hour, so with them even if you only use a part hour, you still pay the full amount.

It is estimated that 45% of public cloud compute spend is wasted, and the GC billing model should help reduce this figure. You can also change VM sizes at any time and the sustained usage discount can result in “up to” 57% savings. Worth looking at, I think you’ll agree.

Lush from the UK were brought up to discuss their migration to GCP, which they performed in 22 days, and they calculate 40% savings on hosting charges per year. Not bad!
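The difference between per minute and per hour billing is easy to see with a few lines of Python. This is only a sketch: the hourly rate is a made-up number rather than a real list price, and it ignores any minimum charge or sustained usage discount.

```python
import math

HOURLY_RATE = 0.10  # hypothetical price per VM hour, not a real list price


def cost_per_hour_billing(minutes_used: int) -> float:
    """Bill in whole hours: any part hour is rounded up to a full one."""
    return math.ceil(minutes_used / 60) * HOURLY_RATE


def cost_per_minute_billing(minutes_used: int) -> float:
    """Bill only for the minutes actually used."""
    return minutes_used * HOURLY_RATE / 60


minutes = 61  # a VM that ran for an hour and a minute
print(round(cost_per_hour_billing(minutes), 4))
print(round(cost_per_minute_billing(minutes), 4))
```

With these assumed numbers, that one extra minute costs a whole second hour under hourly billing, but only one more minute under per minute billing.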

Co-existence and migration

There has also been a lot of work done within GCP to support Windows native tools such as PowerShell (there are GCP cmdlets) and Visual Studio. There are also migration tools that can live move VMs from vSphere, Hyper-V and KVM, as you’d probably expect. Worth mentioning too at this point that GCP has live migration for VMs as per vSphere and Hyper-V, which is unique to GCP right now, certainly to the best of my knowledge.

G Suite improvements

Lots of work has been done around G Suite, including improvements to Drive to allow for team sharing of documents and also using predictive algorithms to put documents at the top of the Drive page within one click, rather than having to search through folders for the document you’re looking for. Google claim a 40% hit rate from the suggested documents.

There are also add ons from the likes of QuickBooks, where you can raise an invoice from directly within Gmail and be able to reconcile it when you get back to QuickBooks. Nice!

Encryption in the cloud

Once the opening keynote wrapped, I went to my first breakout session which was about encryption within GC. I’m not going to pretend I’m an expert in this field, but Maya Kaczorowski clearly is, and she is a security PM at Google. The process of encrypting data within the GC environment can be summarised thus :-

  • Data uploaded to GC is “chunked” into small pieces (variable size)
  • Each chunk is encrypted and has its own key
  • Chunks are written randomly across the GC environment
  • Getting one chunk of data compromised is effectively useless as you will still need the other chunks
  • There is a strict hierarchy to the Key Management Service (shown below)

Google key hierarchy
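The chunk-and-key idea summarised in the list above can be sketched in Python. To be clear about the assumptions: this is a toy, not how Google does it. The XOR keystream below stands in for a real cipher and is not secure, chunks are a fixed size rather than variable, and a single wrapping key stands in for the whole key hierarchy. All the function names are invented for the example.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher for illustration only: XOR data with a SHA-256 derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_in_chunks(data: bytes, chunk_size: int, wrapping_key: bytes) -> list:
    """Split data into chunks, encrypt each with its own key, then wrap that key."""
    records = []
    for start in range(0, len(data), chunk_size):
        chunk_key = secrets.token_bytes(32)  # a fresh key for every chunk
        records.append({
            "ciphertext": keystream_xor(chunk_key, data[start:start + chunk_size]),
            "wrapped_key": keystream_xor(wrapping_key, chunk_key),
        })
    return records


def decrypt_chunks(records: list, wrapping_key: bytes) -> bytes:
    """Unwrap each chunk key with the wrapping key, decrypt and reassemble."""
    return b"".join(
        keystream_xor(keystream_xor(wrapping_key, r["wrapped_key"]), r["ciphertext"])
        for r in records
    )


wrapping_key = secrets.token_bytes(32)
records = encrypt_in_chunks(b"hello chunked world", chunk_size=8, wrapping_key=wrapping_key)
print(len(records), "chunks")
print(decrypt_chunks(records, wrapping_key))
```

The point to take away is the shape: one compromised chunk gives an attacker neither the other chunks nor their keys.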

A replay of this session is available on YouTube and is well worth a watch. Probably a couple of times so you actually understand it!

What’s new in Kubernetes and Google Container Engine

Next up was a Kubernetes session and how it works with Google Container Engine (GKE). I have to say, I’ve heard the name of Kubernetes thrown around a lot, but never really had the time or the inclination to see what all the fuss is about. As I understand it, Kubernetes is a wrapper over the top of container technologies such as Docker to provide more enterprise management and features such as clustering and scaling.

Kubernetes was written initially by Google before being open sourced and it’s rapidly becoming one of the biggest open source projects ever. One of the key drivers for using containers and Kubernetes is the ability to port your environment to any platform. Containers and Kubernetes can be run on Azure, AWS, GC or even on prem. Using this technology avoids vendor lock in, if this is a concern for you.

Kubernetes contributors and users

There is also a very high release cadence – a new version ships every three months and version 1.7 is due at the end of June (1.6 is the current version). The essence of containerisation is that you can start to use and develop microservices (services broken down into very small, fast moving parts rather than one huge bound up, inflexible monolithic stack). Containers also are stateless in the sense that data is stored elsewhere (cloud storage bucket, etc) and are disposable items.

In a Kubernetes cluster, you can now scale up to 5,000 nodes per cluster. A cluster is a collection of nodes (think VMs) and pods are container items running isolated from each other on a node. Clusters can be multi-zone and multi-region and now also have the concept of “taints” and “tolerations”. Think of taints as node characteristics such as having a GPU, or a certain RAM or CPU size. A toleration is a pod rule that allows the pod to be scheduled onto a node carrying a matching taint. For example, a toleration would allow a container to run on a node that is tainted as GPU only.
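The matching rule can be sketched in a few lines of Python. This is a deliberate simplification: real taints have a key, a value and an effect, whereas here they are just plain strings, and the node names are invented.

```python
def can_schedule(pod_tolerations: set, node_taints: set) -> bool:
    """A pod fits a node only if it tolerates every taint on that node."""
    return node_taints <= pod_tolerations


def eligible_nodes(pod_tolerations: set, nodes: dict) -> list:
    """Return the names of the nodes this pod could be placed on."""
    return sorted(
        name for name, taints in nodes.items()
        if can_schedule(pod_tolerations, taints)
    )


# Hypothetical cluster: one ordinary node and one tainted as GPU only
nodes = {
    "node-a": set(),
    "node-b": {"gpu"},
}

print(eligible_nodes(set(), nodes))    # a pod with no tolerations
print(eligible_nodes({"gpu"}, nodes))  # a pod that tolerates the gpu taint
```

Note that a toleration only permits a pod to land on the tainted node; it does not force it there, which is why the GPU tolerant pod is still eligible for the ordinary node.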

The final point of note here is that Google offer a managed Kubernetes service called Google Container Engine.

From Blobs to Relational Tables, where do I store my data?

My next breakout was to try and get a better view of the different storage options within GC. One of the first points made was really interesting in that Rolls Royce actually lease engines to airlines so they can collect telemetry data and have the ability to tune engines as well as perform pro-active maintenance based on data received back from the engines.

In summary, your storage options include:-

  • RDBMS – Cloud SQL
  • Data Warehousing – BigQuery
  • Hadoop – Cloud Storage
  • NoSQL – Cloud Bigtable
  • NoSQL Docs – Cloud Datastore
  • Scalable RDBMS – Cloud Spanner

Cloud Storage can have several different characteristics, including multi-region, regional, nearline and coldline. This is very similar to the options provided by AWS and Azure. Cloud Storage has an availability SLA of 99.95% and you use the same API to access all storage tiers.

Data lifecycle policies are available in a similar way to S3, moving data between the tiers when rules are triggered. Content delivery is performed using the Cloud CDN product and message queuing is performed using Cloud Pub/Sub. Cloud Storage for hybrid environments is also available in a similar way to StorSimple or the AWS Storage Gateway using partner solutions such as Panzura (cold storage, backup, tiering device, etc.).

Cloud SQL – 99.95% SLA, with failover replica and read replicas, which seemed very similar to how AWS RDS works. One interesting product was Cloud Spanner. This is a horizontally scalable RDBMS solution that offers typical SQL features such as ACID but with the scalability of typical cloud NoSQL solutions. This to me seemed a pretty unique feature of GC, I haven’t seen this elsewhere. Cloud Spanner also provides global consistency, 99.99% uptime SLA and a 99.999% multi-region availability SLA. Cool stuff!

Serverless Options on GCP

My next breakout was on serverless options on GCP. Serverless seems to be the latest trend in cloud computing that for some people is the answer to everything and nothing. Both AWS and Azure provide serverless products, and there are a lot of similarities with the Google Cloud Functions product.

To briefly deconstruct serverless tech, this is where you use an event driven process to perform a specific task. For example, a file gets uploaded to a storage bucket and this causes an event trigger where “stuff” is performed by a fleet of servers. Once this task is complete, the process goes back to sleep again.
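The event driven pattern itself is simple enough to sketch in Python. This is a generic illustration of the idea, not the Cloud Functions API: the event name, the decorator and the handler are all made up for the example.

```python
_handlers = {}


def on(event_type: str):
    """Register a function to run whenever an event of this type arrives."""
    def register(func):
        _handlers.setdefault(event_type, []).append(func)
        return func
    return register


@on("storage.object.uploaded")  # hypothetical event name
def make_thumbnail(event: dict) -> str:
    """The 'stuff' that happens when a file lands in the bucket."""
    return f"thumbnail created for {event['name']}"


def dispatch(event_type: str, event: dict) -> list:
    """Run every handler registered for the event and collect the results."""
    return [handler(event) for handler in _handlers.get(event_type, [])]


print(dispatch("storage.object.uploaded", {"name": "cat.png"}))
```

Nothing runs until an event arrives, and nothing keeps running once the handlers return, which is the whole appeal.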

The main benefit of serverless is cost and management. You aren’t spinning VMs up and down and you aren’t paying compute fees for idle VMs. Functions is charged per 100ms of usage and also how much RAM is assigned to the process. The back end also auto scales so you don’t have to worry about setting up your own auto scaling policies.
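As a rough sketch of how charging by 100ms slices and assigned RAM might add up, here is some Python. The price is a made-up figure, and rounding the run time up to the next 100ms is my assumption about how the slices are counted.

```python
PRICE_PER_GB_SECOND = 0.0000025  # hypothetical rate, not a published price


def billed_duration_ms(duration_ms: int) -> int:
    """Round the run time up to the next 100ms slice (assumed behaviour)."""
    return (duration_ms + 99) // 100 * 100


def invocation_cost(duration_ms: int, memory_mb: int) -> float:
    """Charge for the billed time in proportion to the RAM assigned."""
    gb_seconds = (memory_mb / 1024) * (billed_duration_ms(duration_ms) / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND


# A function that ran for 130ms with 256MB assigned is billed as 200ms
print(billed_duration_ms(130))
print(invocation_cost(130, 256))
```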

Cloud Functions is in its infancy right now, so only Node.js is supported but more language support will be added over time. Cloud storage, Pub/Sub channels and HTTP webhooks can be used to capture events for serverless processes.

Day Two wrap up to come in the next post!


Breaking Bad – Beating Analysis Paralysis

We’ve all been there, right? A large project or even a project that’s part of a bigger programme of works that seems to be stalled and stuck in an infinite loop because there is so much worry that something deployed may prove to be “wrong” further down the line. Meeting after meeting goes by, nips and tucks go into designs, more stakeholders are pulled into the process and the WITWIT cycle becomes a cycle you can’t seem to break.

Barack was unimpressed as the project meeting deferred yet another decision

WITWIT? Yep, it means “What if this? what if that?”. I made it up myself, as you can probably tell. There is also a further damaging introspection stage I like to call WITWOO (“What if this?”, “What Other Obstacles?”), but that is so silly and so contrived that I’m not going to mention it again. No, it’s not an April Fool’s Day post!

“WITWOO! Your project is dooooomed!”

So anyway, back to the project that is stuck in the thought process because everyone is terrified of making the wrong decision. I suppose in some ways, dealing with it depends on the deliverable of the project. If it’s something new, the TR (Terror Rating) is usually pretty high. This is because we’re dealing with a bit of a nebulous concept – we can’t see it, touch it, play with it. We don’t know what it can and can’t do, much less how it can help deliver value to our organisation. We’ve seen demos and it looks cool, we signed off on that bit, but now how can this technology help the business grow?

Let’s start breaking down the problem then into more digestible chunks. I’ve spent a lot of time recently looking at things like Lean, Agile, Kanban, Scrum and Lean Coffee (look that one up, it’s interesting!). All of those things are frameworks, much like ITIL and PRINCE2. That means they aren’t prescriptive and you need to pick and choose which parts of the framework suit the needs of the project deliverable.

Agile type frameworks don’t need to represent software code as such – we’re talking about a product with features we want to consume. This could be anything – Office 365, in house software, even a drinks vending machine for heaven’s sake. We have this big “thing” in the distance, how do we get there in the best way? Keep it in your sights, but deliver smaller pieces quickly and start delivering value much quicker.

I’ve talked before about the Minimum Viable Product process and I’ve had folks rebut this argument with models such as RAT (Riskiest Assumption Test). Either way, if you start arguing about stuff like this, you’re totally missing the point. The same goes if you embrace a full Agile/Kanban method with daily standups, camp fires, burning joss sticks and rounds of “Kum Ba Yah” without having anything tangible to show for it other than “Hey! We do DevOps!”.

The DevOpsiest DevOps team in the world. Singing Kum ba yah. Possibly.

Frameworks are buffets – this means we can pick a bit of this, a bit of that and leave the rest because we have no use for it. Keep it as simple as you can to start relieving the paralysis log jam so it doesn’t make the problem bigger.

Start delivering value

To start with, using the MVP analogy, ask yourself “what is the minimum set of features this product must have on day one to start delivering value to the business?”. Going back to the earlier product deliverable itself, this could be:-

  • Office 365 Product – Just SharePoint Online
  • In house software – A login GUI for end users, linked to LDAP and a dashboard with one chart on it
  • A drinks vending machine – Sell cans of diet cola (any brand will do)

Already we know that if we’ve done our requirements capture properly, these features are just the tip of the iceberg. That being said, by producing this MVP, we now have something our customers can consume. Users can start to add SharePoint sites or login to the new in house software and see a dashboard with a key chart on it or go to the vending machine and buy a can of diet cola.

At this point, we can then talk to our customers and find out if the MVP delivers the initial deliverable and if not, how it might be changed. For example, in the drinks machine scenario, customers like that the machine is there, but would prefer a branded diet cola rather than the own brand version from a large cash and carry warehouse.

Each time we make a change or add a feature, we go back to the customer to find out their thoughts. We also commit to delivering changes regularly, such as weekly or bi-weekly. This is a key Agile concept. We don’t wait until the whole thing is finished before we let people consume it.

Customers would like to be able to buy bottled water and ice tea from the vending machine. Great! However, in the next week, we can only commit to getting one of these drinks in. How do we know which one to add?

Any project board or design authority board will have people on it with strong opinions. In fact, you should want this. In my experience, passive “passengers” sit and say nothing and only complain once something has been delivered and it’s harder to make changes. Members will also be passionate about things that don’t matter to other board members. Brand of cola versus bottled water, for example. How do we break this cycle? This is most likely to be the main bottleneck to progress.

Defining value

We need to have a framework for defining the value to the business of what the product is delivering. On the project board, for the next release, Gomez wants to stock Diet Pepsi instead of the unbranded diet cola and Morticia doesn’t drink cola but thinks bottled water is essential in any vending machine. Oh, and Gomez is the CTO and Morticia is the CEO.

“Get some water in. What do we say? Now!”

We can’t make both people happy, but we still have to keep delivering value. How can we do this? The best measure of anything is analytical, empirical and measurable. Take out instinct, opinion and gut feeling. Everyone’s is different. By assigning a numeric value to proposed features, we can be dispassionate about the design decision and also demonstrate to stakeholders that we are delivering maximum value to the business.

Firstly, we need some criteria by which to score the proposed features. We know we want to add Diet Pepsi, bottled water and pre-mixed protein shakes (I forgot to mention the CFO is a gym rat). We have three “wants”, each requester is C-Level and we have to keep delivering maximum value to the business in the shortest amount of time.

“I need your clothes, your boots and your protein shake”

Keep the criteria definitions short and simple. Between five and eight, I would say; this way you have enough criteria to give a well defined score, but you also aren’t listing 20 criteria on which you have to decide. Using this principle, we arrive at the following criteria:-

  • Compatibility of product with vending machine slot
  • Delivery of product in 3-5 days from supplier
  • Availability of stock
  • Demand for product
  • Health benefits to staff

So we now have our criteria, but how do we score it? Remember, KISS! Not the glam rock band, but Keep It Simple, Stupid! One process that works well for me is a simple low, medium and high, scored as 25, 50 and 100. This way, the rating is easier to decide on, plus there is enough gap between the numbers to help decide ordering of the deliverable.

“Keep it simple, or we’ll tour again. Mmm-kay?”

Based on both the criteria and the scoring values, we can now rate features for the next iteration.

Criteria                  Diet Pepsi   Bottled Water   Pre-Mixed Protein
Machine Compatibility            100             100                  25
Item delivery lead time          100             100                  25
Availability of stock             50              50                  25
Product demand                    50             100                  25
Health benefits                   50             100                 100
TOTAL                            350             450                 200
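The totals and the resulting release order are easy to reproduce. Here is a minimal Python sketch using the scores from the table; the variable names are mine.

```python
# Scores per criterion, in the order: machine compatibility, delivery lead time,
# availability of stock, product demand, health benefits
scores = {
    "Diet Pepsi": [100, 100, 50, 50, 50],
    "Bottled Water": [100, 100, 50, 100, 100],
    "Pre-Mixed Protein": [25, 25, 25, 25, 100],
}

# Add up each feature's scores
totals = {feature: sum(values) for feature, values in scores.items()}

# Highest total goes into the earliest release
release_order = sorted(totals, key=totals.get, reverse=True)

for week, feature in enumerate(release_order, start=1):
    print(f"Week {week}: {feature} ({totals[feature]})")
```

Because the ordering falls straight out of the numbers, nobody on the board has to win an argument for it.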

Once scoring is done, we can quickly see a clear set of priorities defined for the next three releases, but how did we arrive at these scores? Well, the following happened during discussions:-

  • The protein mix is in a large bottle that doesn’t fit in a standard slot (so scores low for compatibility)
  • Cola and water are in stock with the supplier but the protein mix has to be ordered from the manufacturer
  • Cola and water are in stock but there are not enough bottles to fill the machine to capacity and will need to be back ordered
  • An internal poll on the company intranet showed all respondents wanted to buy water, 65% also wanted to buy cola and 10% also wanted to buy protein, so we round the values accordingly
  • Water and protein are healthy and nutritious, diet cola not so much (it can damage bones, apparently)
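One way to read the rounding of the poll results is to snap each percentage to the nearest of the three allowed scores. A quick Python sketch of that reading, where the nearest-score rule is an assumption rather than a hard and fast method:

```python
ALLOWED_SCORES = (25, 50, 100)  # low, medium, high


def to_score(percent: float) -> int:
    """Snap a poll percentage to the nearest allowed score."""
    return min(ALLOWED_SCORES, key=lambda score: abs(score - percent))


poll = {"Bottled Water": 100, "Diet Pepsi": 65, "Pre-Mixed Protein": 10}
print({item: to_score(percent) for item, percent in poll.items()})
```

This gives 100 for water, 50 for cola and 25 for protein, which matches the product demand scores used in the scoring.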

So there you have it – now we have dispassionately provided an ordered list of features, we commit to delivering bottled water, Diet Pepsi and pre-mixed protein into the vending machine during the next three releases. Customers are informed of this and are informed of the dates when they will be added (one item per week over three weeks).

What you should then find is that any arguments stop, because we have applied strict rules to our criteria and weighted them accordingly. Also, we know we want to stock all kinds of goodies in the vending machine, but that is a longer term goal. We have the machine, we want to stock it as quickly as possible and with products we know there is a clear demand for. A happy side effect of this process is we don’t waste time stocking a product nobody wants.

For example, we could add a new Marmite drink to the machine on the first day. It’s healthy, fits in a standard slot, is on special offer at the warehouse and can be with us tomorrow. It also tastes like an old sock, so nobody buys it and the business is left out of pocket. The value we add by scoring demand up front is that we make the best use of time and keep waste to a minimum.

What Marmite tastes like.

Remember that the scoring does not reflect importance, it reflects business value and speed of delivery. You may do this for an IT project and find that DR scores lower than most other features. This is not because DR is not important, but more likely because there are other delays or constraints caused by factors such as licensing, compatibility and physical infrastructure. This means DR, while important as a deliverable, will take longer than, for example, enabling a new line of business SaaS application.

Bringing it all together

I know that drinks vending machines have bugger all to do with IT projects, but the concepts and the constraints remain exactly the same. You have a big thing – Office 365 (think vending machine), you have things within it that people consume – Outlook, SharePoint, Yammer (think drinks types) and you need to deliver it to end customers as quickly as you can without second guessing which features they want. Let’s say you don’t add Teams because you already use Slack (you hipster, you!). We’d know this because Teams would have a low score.

In summary, the way to break out of analysis paralysis is to a) break the project down into small, deliverable chunks and b) use weightings, metrics and empirical data to define the decision making processes on what to deliver and when. This makes the whole process so much more visible.

Remember that most decisions are reversible, don’t be afraid to get it wrong once in a while and foster an open culture among all members of the project board and teams. Finally, don’t embrace frameworks like a cult – choose what works for that project and put away the guitars and joss sticks!



Avoiding vendor lock-in in the public cloud

A little while back, I had a pretty frank discussion with a customer about vendor lock-in in the public cloud and he left me under no illusions that he saw cloud more as a threat than an opportunity. I did wonder if there had been some incident in the past that had left him feeling this way, but didn’t really feel it appropriate to probe into that much further.

Instead of dwelling on the negatives of this situation, we decided to accentuate the positives and try to formulate some advice on how best this risk could be mitigated. This was especially important as there was already a significant investment made by the business into public cloud deployments. It is an important issue though – it’s easy enough to get in, but how do you get out? There are several strategies you could use; I’m just going to call out a couple of them as examples.

To start with, back in the days of all on premises deployments, generally you would try and go for a “best of breed” approach. You have a business problem that needs a technical solution so you look at all the potential solutions and choose the best fit based on a number of requirements. Typically these include cost, scalability, support, existing skill sets and strength of the vendor in the market (Gartner Magic Quadrant, etc.). This applies equally in the public cloud – it’s still a product set in a technical solution so the perspective needn’t change all that much.

One potential strategy is to use the best of breed approach to look at all public cloud vendors (for the purpose of this article, I really just mean the “big three” of AWS, Azure and Google Cloud Platform). As you might expect, the best cost, support and deployment options for say SQL Server on Windows would probably be from Microsoft. In that case, you deploy that part of the solution in Azure.

Conversely, you may have a need for a CDN solution and decide that AWS CloudFront represents the best solution, so you build that part of your solution around that product. This way you are mitigating risk by spreading services across two vendors while still retaining the best of breed approach.

However, “doing the splits” is not always preferable. It’s two sets of skills, two lots of billing to deal with and two vendors to punch if anything goes badly wrong.

Another more pragmatic approach is to make open source technologies a key plank of your strategy. Products such as MySQL, Postgres, Linux, Docker, Java, .NET, Chef and Puppet are widely available on public cloud platforms and mean that any effort put into these technologies can be moved elsewhere if need be (even back on premises if you need to). Not only this, but skills in the market place are pretty commoditised now and mean that bringing in new staff to help with the deployments (or even using outside parties) is made easier and more cost effective.

You could go down the road of deploying a typical web application on AWS using Postgres, Linux, Chef, Docker and Java and if for any reason later this approach becomes too expensive or other issues occur, it’s far easier to pick up the data you’ve generated in these environments, walk over to a competitor, drop it down and carry on.
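As a small illustration of why that portability holds, here is a minimal sketch in which the application builds its Postgres connection string purely from environment variables. The variable names (DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME) and the function name are hypothetical. The point is only that moving the database to another provider should change configuration, not code.

```python
import os
from urllib.parse import quote


def postgres_dsn(env=os.environ):
    """Build a postgresql:// connection URI from environment settings.

    The variable names are illustrative assumptions, not a standard.
    """
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")  # escape characters such as '@'
    host = env["DB_HOST"]
    port = env.get("DB_PORT", "5432")  # default Postgres port
    name = env["DB_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


# Same code, different hosting: only DB_HOST differs between the two.
# Both hostnames are placeholders.
settings = {"DB_USER": "app", "DB_PASSWORD": "p@ss", "DB_NAME": "shop"}
print(postgres_dsn({**settings, "DB_HOST": "db.provider-a.example"}))
print(postgres_dsn({**settings, "DB_HOST": "db.provider-b.example"}))
```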

Obviously this masks some of the complexities of how that move would actually take place, such as timelines, cost and skills required, but it presents a sensible approach to stakeholders that provider migration has been considered and has been accounted for in the technical solution.

The stark reality is that whatever you are doing with technology, there will always be an element of vendor lock-in. Obviously from a financial perspective there is a motive for vendors to do that, but it is also a by-product of innovation: each new technology adds new formats and data blobs to the landscape. The key to addressing this is taking a balanced view and being able to tell project stakeholders that you’re taking a best of breed approach based on requirements and you have built in safeguards in case issues occur in future that prompt a re-evaluation of the underlying provider.



What is the Cloud Shared Responsibility Model and why should I care?

When I have discussions with customers moving to a public cloud provider, one of the main topics of conversation (quite rightly) is security of services and servers in the cloud. Long discussions and whiteboarding take place where loads of boxes and arrows are drawn and in the end, the customer is confident about the long term security of their organisation’s assets when moving to Azure or AWS.

Almost as an aside, one of the questions I ask is how patching of VMs will be performed and a very common answer is “doesn’t Microsoft/AWS patch them for us?”. At this point I ask if they’ve heard of the Shared Responsibility Model and often the answer is “no”. So much so that I thought a quick blog post was in order to reinforce this point.

So then, what is the Shared Responsibility Model? Put simply, when you move services into a public cloud provider, you are responsible for some or most of the security and operational aspects of the server (tuning, anti-virus, backup, etc.) and your provider is responsible for services lower down the stack that you don’t have access to, such as the hypervisor host, physical racks, power and cooling.

That being said, there is a bit more to it than that, depending on whether or not we’re talking about IaaS, PaaS or SaaS. The ownership of responsibility can be thought of as a “sliding scale” depending on the service model. To illustrate what I mean, take a look at the diagram below, helpfully stolen from Microsoft (thanks, boys!).

Reading the diagram from left to right, you can see that in the left most column where all services are hosted on prem, it is entirely the responsibility of the customer to provide all of the security characteristics. There is no cloud provider involved and you are responsible for racking, stacking, cooling, patching, cabling, IAM and networking.

As we move right to the IaaS column, you can see subtle shades of grey emerging (quite literally) as with IaaS, you’re hosting virtual machines in a public cloud provider such as Azure or AWS. The provider is responsible for DC and rack security and some of the host infrastructure (for example, cloud providers patch the host on your behalf), but your responsibility is to ensure that workloads are effectively spread across hosts in appropriate fault and update domains for continuity of service.

Note however that in the IaaS model, as you the customer are pretty much responsible for everything from the guest upwards, it’s down to you to configure IAM, endpoint security and keep up to date with security patches. This is where a lot of the confusion creeps in. Your cloud provider is not on the hook if you fail to patch and properly secure your VMs (including network and external access). Every IaaS project requires a patching and security strategy to be baked in from day one and not retrofitted. This may mean extending on prem AD and WSUS for IAM and patching, to leverage existing processes. This is fine and will work, you don’t necessarily need to reinvent the wheel here. Plus if you re-use existing processes, it may shorten any formal on boarding of the project with Service Management.

Carrying on across the matrix to the next column on the right is the PaaS model. In this model, you are consuming pre-built features from a cloud provider. This is most commonly database services such as SQL Server or MySQL but also includes pre-built web environments such as Elastic Beanstalk in AWS. Because you are paying for a sliver of a larger, multi-tenant service, your provider will handle more layers of the lower stack, including the virtual machines the database engine is running on as well as the database engine itself. Typically in this example, the customer does not have any access to the underlying virtual machine via SSH or RDP, as they would with IaaS.

However, as the matrix shows, there is still a level of responsibility on the customer (though the operational burden is reduced). In the case of Database PaaS, the customer is still in charge of backing up and securing (i.e. encryption and identity and access management) the data. This is not the responsibility of the provider with the exception of logical isolation from other tenants and the physical security of the hardware involved.

Finally, in the far right column is the SaaS model. The goal of this model is for the customer to obtain a service with as little administrative/operational overhead as possible. As shown in the matrix, the provider is responsible for everything in the stack from the application down, including networking, backup, patching, availability and physical security. IAM functions are shared as most SaaS is multi-tenant, so the provider must enforce isolation (in the same way as PaaS) and the customer must ensure only authorised personnel can access the SaaS solution.

You will note that endpoint security is also classed as a shared responsibility. Taking Office 365 as an example, Microsoft provide security tools such as anti-virus scanning and data loss prevention controls, but it is up to the customer to configure these to suit their use case. Microsoft’s responsibility ends with providing the service and the customer’s starts with turning the knobs to make it work to their taste. You will also notice that as in all other cases, it is solely the customer’s responsibility to ensure the classification and accountability of the data. This is not the same as the reliability of the services beneath it (networking, storage and compute) as this is addressed in the lower layers of the model.
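To make the sliding scale concrete, here is a rough sketch of the matrix as a lookup table. It is my own simplification of the description above, not an official provider artefact. It only covers three layers where ownership is stated unambiguously and deliberately leaves out the shared layers (IAM, endpoint security). The layer names, model keys and the who_owns function are illustrative.

```python
# A simplified slice of the shared responsibility matrix, based only on
# the description in this post. Shared layers are intentionally omitted.
RESPONSIBILITY = {
    "physical security": {
        "on-prem": "customer", "iaas": "provider",
        "paas": "provider", "saas": "provider",
    },
    "guest OS patching": {
        "on-prem": "customer", "iaas": "customer",
        "paas": "provider", "saas": "provider",
    },
    "data classification and accountability": {
        "on-prem": "customer", "iaas": "customer",
        "paas": "customer", "saas": "customer",
    },
}


def who_owns(layer, model):
    """Return 'customer' or 'provider' for a layer under a service model."""
    return RESPONSIBILITY[layer][model.lower()]


# The question that prompted this post: who patches my VMs in IaaS?
print(who_owns("guest OS patching", "IaaS"))
```

Even in this cut-down form, the two rows worth memorising are visible: guest patching flips to the provider only once you move past IaaS, and data classification never leaves the customer.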

I hope this article provides a bit of clarity on what the Shared Responsibility Model is and why you should care. Please don’t assume that just because you’re “going cloud” that a lot of these issues will go away. Get yourself some sound and trusted advice and make sure this model is accounted for in your project plan.

For your further reading pleasure, I have included links below to documentation explaining providers’ stances and implementation of the model :-

As always, any questions or comments on this post can be left below or feel free to ping me on Twitter @ChrisBeckett


Exam 70-740 : Installation, Storage, and Compute with Windows Server 2016 : Exam Tips and Feedback


I sat and passed the above exam today, and as there seems to be a total lack of information on this exam out there (aside from what’s in the exam blueprint), I thought I would pass on the benefit of my experience and offer a few tips for folks out there planning on taking it.

The 70-740 forms part of the three exams needed to fulfil MCSA : Server 2016 (the other two being 70-741 and 70-742) if you don’t already have a 2012 MCSA, which I don’t. The first exam concentrates on installation, storage and compute as you may guess by the title of the post.

The exam itself is 47 questions and the time allotted is 120 minutes. It seems that gone are the days when you got really verbose scenarios and then asked if the equally wordy solutions matched the requirements. The questions today were pretty concise, and that included ones where a scenario was given and potential solutions offered.

If you’ve done any of the recent MCSA exams such as Office 365, I thought the questions were even shorter than that, so perfect for someone like me with the attention span of a gnat.

In terms of the question formats, there are the usual drag and drops (3 from 6, say), drop down complete the PowerShell commands, select a couple of correct answers from 6 or 8 and then the right answer from 6 or 8. The focus of the questions is pretty faithful to the exam blueprint and you should study (amongst others) on the following areas :-

  • NLB
  • Storage Spaces
  • Nano Server installation and customisation
  • Product activation models
  • Remote PowerShell sessions
  • Hyper-V (create VMs, create VHDs, limitations of nested virtualisation)
  • Failover Clustering (Shared VHDXs, monitoring, live migration requirements)
  • NTFS vs ReFS
  • iSCSI
  • Storage Replicas
  • Containers and Docker commands

The exam took me around 35-40 minutes and I managed to pass with a 796, which was a pleasant surprise as I’d been studying quite piecemeal for the exam and a lot of my answers in the exam itself were educated guesses. As usual, if you aren’t sure, rule out the ones you know can’t be right and then play the percentages after that. Also, a lot of PowerShell answers revolve around a “Get-Something” and “Set-Something” structure, so that may help if you’re not sure.

On now to 70-741 and hopefully I can wrap up the remaining two MCSA exams fairly quickly. Good luck if you’re sitting this one soon. In terms of study resources, I used PluralSight, bought the MS Press Study Guide (use the code MCPEBOOK for 50% off the eBook version) and used a lot of Technet articles and also Hands On Labs to lab stuff I couldn’t quite get my head around.