09-05-17

Event Review – Google Cloud Next London – Day Two

Seenit demo

The day 2 keynote started with an in-depth discussion of Cloud Spanner, as mentioned previously. AWS and Azure provide highly scalable and highly tunable NoSQL services in the form of DynamoDB and the like, but when it comes to more traditional “meat and potatoes” RDBMS solutions, they are constrained by the limitations of the products they use, such as MySQL, SQL Server and Postgres.

Cloud Spanner is different: it is a fully scalable RDBMS solution in the cloud that offers all the same benefits as the NoSQL solutions in Azure and AWS. Much of the complexity of sharding the database and replicating it globally is taken care of within Cloud Spanner, and automatic tuning is performed over time by background algorithms.

Cloud Spanner goes GA on May 16th and is well worth a look if you have ACID database requirements at scale.
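For a feel of what working with Spanner looks like from code, here is a minimal sketch using the Python client library; the instance, database, table and column names are all made up for illustration.

```python
# Minimal sketch: querying Cloud Spanner with the Python client library.
# The instance, database, table and column names are placeholders.
from google.cloud import spanner

client = spanner.Client()
instance = client.instance("demo-instance")
database = instance.database("demo-db")

# Reads run against a consistent snapshot; Spanner deals with the
# sharding and global replication behind the scenes.
with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT OrderId, Total FROM Orders WHERE CustomerId = @cid",
        params={"cid": "12345"},
        param_types={"cid": spanner.param_types.STRING},
    )
    for row in results:
        print(row)
```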

A representative from The Telegraph was brought up to discuss how GC’s data solutions allow them to perform very precise consumer targeting using analytics. It was worth noting that they run a multi-cloud environment, using best-of-breed tools depending on the use case. Rare and ballsy!

An example of the powerful Google APIs available was then demonstrated by a UK startup called Seenit. They use the Google Video Intelligence API to automatically tag videos that are uploaded to their service. Shazam then came up on stage to discuss their use of the Google Cloud platform and to share some of the numbers they have for their service.
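To give a flavour of the Video Intelligence API Seenit were describing, here is a minimal sketch of label detection using the current Python client library; the bucket and file names are placeholders, and the client library available in 2017 differed slightly.

```python
# Minimal sketch: automatic video tagging with the Video Intelligence API.
# Bucket and object names are placeholders.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

operation = client.annotate_video(
    request={
        "input_uri": "gs://my-demo-bucket/uploaded-video.mp4",
        "features": [videointelligence.Feature.LABEL_DETECTION],
    }
)
result = operation.result(timeout=300)  # annotation is a long-running operation

# Print the labels (tags) detected across the whole video.
for label in result.annotation_results[0].segment_label_annotations:
    print(label.entity.description)
```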

Shazam by numbers

As you can see from the picture above, there have been over a billion downloads of the app and more than 300 million daily active users. Those numbers take some processing! One of the key takeaways was that for some businesses traffic spikes can be predicted, such as a major sporting event or a Black Friday sale. That is less the case for Shazam, so they need an underlying platform that is resilient to unpredictable spikes.

There was a demo of GPU usage in the cloud, around the use case of rendering video. The key benefit of cloud GPU is that you can harness massive scalability at a fraction of the cost of provisioning your own kit. Not only that, but consumption-based charging means that you only pay for what you use, making it a highly cost-effective option.

For the final demo of the keynote, there was a show and tell around changes coming to G Suite. This includes Hangouts, which has had some major engineering done to it. It will support call-in delegates, a Green Room to hold attendees before the meeting starts, and a new device called the Jamboard. This is a touch-screen whiteboard that can be shared with delegates in the Hangouts meeting, who can also interact with the virtual whiteboard, making it an interactive team session. Jamboards are not yet available, but expect them to cost a few thousand pounds/dollars on release.

One of the new aspects of G Suite that I liked was the addition of bots and natural language support. Bots are integrated with Hangouts so that you can assign a project task to a team member, or use a bot to find the next free meeting slot for all delegates, both of which are tasks that take time in the real world.

Hangouts improvements

Natural language support was demonstrated in Sheets, whereby a user wanted to apply a particular formula but didn’t know how. By expressing what they wanted in natural language, Sheets was able to construct a complex formula that achieved the desired result in a split second, again illustrating the value of the powerful Google APIs.

A final demo was given by another UK startup called Ravelin. They have a service that detects fraud in financial transactions using powerful machine learning techniques. They then draw heat maps of suspected fraudulent activity, which show at a glance the parts of the country where fraud is most likely.

The service sits in the workflow for online payments and can return positive or negative results in milliseconds, thus not delaying the checkout process for the end consumer. Really impressive stuff!

More security and compliance in the cloud

After the keynote, I went to the first breakout of the day, which was about security and compliance. This did not just cover GCP but mobile as well. A Google service called SafetyNet makes 400 million checks a day against devices to help prevent attacks and data leaks. This is leveraged by Google Play, whose payment platform serves 1 billion users worldwide.

One stat that blew me away was that 99% of all exploits happen a year or more after the CVE was published. This is a fairly damning statistic and shows that security and patching are still not taken seriously enough. On the other side of the coin, Android still has a lot to do in this area, so in some respects I thought it was a bit rich of Google to point fingers.

Are you the weakest link?

Google has 15 regions and 100 POPs in 33 countries, with a global fibre network backbone that carries a third of all internet traffic daily. The Google Peering website has more information on the global network and is worth a visit. Google really emphasised their desire to be the most secure cloud provider possible, noting that they have 700+ security researchers and have published 160 academic security white papers. Phishing is still the most common way of delivering malicious payloads.

DLP is now available for both Gmail and Drive, meaning leaks of data to unauthorised parties can now be prevented. There is also support for FIDO-approved tokens, which are USB keys with a fingerprint scanner on board. These are fairly cheap and provide an additional layer of security. The session wrapped with announcements around expiring access and IRM support for Drive, S/MIME support for Gmail and third-party app whitelisting for G Suite.

On the subject of GDPR, Google have stated that you are the data controller and Google is the data processor. Google has certified all of its infrastructure for FedRAMP and is the only provider to have done so. Although FedRAMP doesn’t apply outside of the US, there may be cases where this level of certification will be useful to demonstrate security compliance.

Cloud networking for Enterprises

My next breakout was on GC networking. I have to say that, as a rule, the way GC does this is very similar to AWS, with VPC and subnet constructs along with load balancing capabilities. Load balancing comes in three main flavours: HTTP(S), SSL Proxy and TCP Proxy. You can also have both internal and external load balancing.

Load balancing can be globally distributed, using IP anycast, to help enable high availability and good levels of resilience. IPv6 is now supported on load balancers, but you can only have one address type per load balancer. In respect of CDN, there is a Google CDN, but you can also use third-party CDN providers such as Akamai or Fastly.

Fastly took part in the breakout to explain how their solution works. It adds a layer of scalability and performance on top of public cloud providers, using custom code written by Fastly to determine optimal routes for network traffic globally. I’m sure it does a lot more than that, so feel free to check them out.

The Fastly network

Andromeda is the name of the SDN written by Google to control all networking functions. There is a 60Gbps link between VMs in the same region, and live migration of VMs is available (unique to GC at the time of writing). GCP firewalls are stateful, accept both ingress and egress rules, and deny traffic by default unless overridden.
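As an illustration of those default-deny firewalls, here is a minimal sketch that creates an ingress allow rule through the Compute Engine API using the google-api-python-client; the project and rule names are placeholders.

```python
# Minimal sketch: adding a stateful ingress firewall rule via the
# Compute Engine API. Project and rule names are placeholders; anything
# not explicitly allowed remains denied by default.
from googleapiclient import discovery

compute = discovery.build("compute", "v1")

firewall_body = {
    "name": "allow-https-in",
    "network": "global/networks/default",
    "direction": "INGRESS",
    "allowed": [{"IPProtocol": "tcp", "ports": ["443"]}],
    "sourceRanges": ["0.0.0.0/0"],
}

operation = compute.firewalls().insert(
    project="my-demo-project", body=firewall_body
).execute()
print(operation["name"], operation.get("status"))
```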

DDoS protection is provided at layers 3 and 4 by Cloud CDN and the load balancers, with third-party appliances (Check Point, Palo Alto, F5, etc.) also supported. Identity-Aware Proxy can be used to create ACLs for access to external and internal sites using G Suite credentials. In respect of VPCs, you can have a single VPC that is used globally and also shared with other organisations. VPCs have expandable IP address ranges, so you don’t need to decide up front how many addresses you will need; the range can be grown later, as in the sketch below.
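A minimal sketch of growing a subnet in place through the Compute Engine API might look like this; the project, region, subnet name and CIDR are placeholders, and the new range has to be a superset of the old one.

```python
# Minimal sketch: widening a subnet's IP range in place with the
# Compute Engine expandIpCidrRange method. All names are placeholders.
from googleapiclient import discovery

compute = discovery.build("compute", "v1")

operation = compute.subnetworks().expandIpCidrRange(
    project="my-demo-project",
    region="europe-west2",
    subnetwork="my-demo-subnet",
    body={"ipCidrRange": "10.0.0.0/16"},  # must contain the existing range
).execute()
print(operation.get("status"))
```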

There is private access to Google services from VPCs, including Cloud Storage, so think of S3 endpoints in AWS and you’ll get the idea. Traffic does not traverse the public internet, but uses Google’s network backbone. You can access any region from a single interconnect through Google’s network (think Direct Connect or ExpressRoute).

Like Azure and AWS, VPC network peering is available. VMs support multiple NICs, up to 10 per VM. XPNs provide cross-project networking, giving you shareable central network administration, shared VPN and fine-grained IAM controls, and the Cloud Router supports BGP. Finally, in terms of high bandwidth/low latency connections, you can have a direct connection to Google, with partner interconnections also available.

To wrap up

To summarise, props to Google for a very good event. There was loads of technical deep-dive content, and if I had one criticism it would be that the exhibition hall was a bit sparse, but I expect that will be addressed pretty quickly. In respect of functionality, I was pleasantly surprised by how much is currently available in GC.

Standouts for me include the VM charging model, VM sizing, live migration of VMs and the added flexibility around the networking piece. It’s clear that Google want to position GC as having all the core stuff you’d expect, plus the massively powerful APIs that help run the consumer side of Google.

05-05-17

Event Review – Google Cloud Next London – Day One

I was fortunate enough to spend the last couple of days at the Google Cloud Next London event at the ExCeL centre, and I have a few thoughts about it I’d like to share. The main takeaway I got from the event is that while Google Cloud Platform (GCP) may not have the breadth of services that AWS or Azure do, GCP is not a “me too” public cloud hyperscaler.

While some core services such as cloud storage, VPC networking, IaaS and databases are available, there are some key differences with GCP that are worth knowing about. My interpretation of what I saw over the couple of days is that Google have taken some of the core services they’ve been delivering for years, such as machine learning, Maps and artificial intelligence, and presented them as APIs for customers to consume within their GCP account.

This is a massive difference from what I can see with AWS and Azure. Sure, there are components of the above available in those platforms, but these are services which have been at the heart of Google’s consumer services for over a decade and they have incredible power. In terms of market size, both AWS and Azure dwarf GCP, but don’t be fooled into thinking this is not a priority area for Google, because it is. They have ground to make up, but they have very big war chests of capital to spend and also have some of the smartest people on the planet working for them.

To start with, in the keynote there was the usual rundown of event numbers, but the one that was most interesting for me was that there were 4,500 delegates, up a whopping 300% on last year, and that 67% of registered attendees described themselves as developers. Google Cloud is made up of GCP, G Suite (Gmail and the other consumer apps), Maps and APIs, Chrome and Android. Google Cloud provides services to 1 billion people worldwide per day. Incredible!

Gratuitous GC partner slide

There was the usual shout out of thanks to the event sponsors. One thing I did notice in contrast to other vendor events I’ve been to was the paucity of partners in the exhibition hall. There were several big names including Rackspace, Intel and Equinix but obviously building a strong partner ecosystem is still very much a work in progress.

We then had a short section with Diane Greene, who many industry veterans will know as one of the founders of VMware. She is now Senior VP for Google Cloud, and it’s her job to get Google Cloud better recognition in the market. Something I found quite odd about this section was that she seemed quite ill-prepared and brought paper notes on stage with her, which is very unusual these days. There were several quite long pauses and it seemed under-rehearsed, which surprised me; normally keynote speakers are well versed and very slick.

GDPR and GC investment

Anyway, moving on to other factoids: Greene committed Google to being fully GDPR-compliant by the time the regulation comes into force next May. She also stated that $29.4 billion has been spent on Google Cloud in the last three years. The Google fibre backbone carries one third of all internet traffic. Let that sink in for a minute!

There is ongoing investment in the GC infrastructure, and when complete in late 2017/early 2018 there will be 17 regions and 50 availability zones in the GC environment, which will be market-leading.

GCP regions, planned and current

Google Cloud billing model

One aspect of the conference that was really interesting was the billing model for virtual machines. In the field, my experience with AWS and Azure has been one of pain when trying to determine the most cost-effective way to provide compute services. It becomes a minefield of right-sizing instances, purchasing reserved instances, deciding what you might need in three years’ time and looking at Microsoft enterprise agreements to try and leverage Hybrid Use Benefit. Painful!

The GCP billing model is one in which you can have custom VM sizes (much like we’ve always had with vSphere, Hyper-V and KVM), so there is less waste per VM. Also, the longer you use a VM, the cheaper it becomes (this is referred to as the sustained use discount). Billing is also done per minute, in contrast to AWS and Azure, who bill per hour; with hourly billing, even if you only use part of an hour you still pay the full amount.

It is estimated that 45% of public cloud compute spend is wasted; the GC billing model should help reduce this figure. You can also change VM sizes at any time, and the sustained use discount can result in “up to” 57% savings. Worth looking at, I think you’ll agree.
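To make the difference concrete, here is a small back-of-the-envelope calculation. The hourly price and the sustained-use tier rates below are assumptions for illustration only, not published GCP pricing, so the resulting percentages are indicative rather than exact.

```python
# Back-of-the-envelope look at per-minute billing and sustained-use
# discounts. The hourly rate and tier percentages are assumptions.
hourly_rate = 0.05  # assumed $/hour for some instance type

# Per-minute vs per-hour billing for a VM that runs for 95 minutes.
minutes_used = 95
per_minute_bill = (minutes_used / 60) * hourly_rate   # pay for 95 minutes
per_hour_bill = -(-minutes_used // 60) * hourly_rate  # rounded up to 2 full hours
print(f"per-minute billing: ${per_minute_bill:.4f}")
print(f"per-hour billing:   ${per_hour_bill:.4f}")

# Sustained-use discount: assume each successive quarter of the month a
# VM keeps running is billed at a lower rate (100%, 80%, 60%, 40%).
hours_in_month = 730
tier_rates = [1.0, 0.8, 0.6, 0.4]
discounted = sum((hours_in_month / 4) * r * hourly_rate for r in tier_rates)
undiscounted = hours_in_month * hourly_rate
print(f"effective discount for a full month: {1 - discounted / undiscounted:.0%}")
```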

Lush from the UK were brought up to discuss their migration to GCP, which they completed in 22 days, and they calculate a 40% saving on hosting charges per year. Not bad!

Co-existence and migration

There has also been a lot of work done within GCP to support Windows-native tools such as PowerShell (there are GCP cmdlets) and Visual Studio. There are also migration tools that can move live VMs from vSphere, Hyper-V and KVM, as you’d probably expect. It is worth mentioning at this point that GCP offers live migration for VMs, as per vSphere and Hyper-V, which is unique to GCP among the public cloud providers right now, certainly to the best of my knowledge.

G Suite improvements

Lots of work has been done around G Suite, including improvements to Drive to allow team sharing of documents and the use of predictive algorithms to surface documents at the top of the Drive page, one click away, rather than having to search through folders for the document you’re looking for. Google claim a 40% hit rate for the suggested documents.

There are also add-ons from the likes of QuickBooks, where you can raise an invoice directly from within Gmail and reconcile it when you get back to QuickBooks. Nice!

Encryption in the cloud

Once the opening keynote wrapped, I went to my first breakout session, which was about encryption within GC. I’m not going to pretend I’m an expert in this field, but Maya Kaczorowski, a security PM at Google, clearly is. The process of encrypting data within the GC environment can be summarised thus:

  • Data uploaded to GC is “chunked” into small pieces (of variable size)
  • Each chunk is encrypted and has its own key
  • Chunks are written randomly across the GC environment
  • Compromising a single chunk is effectively useless, as an attacker would still need all the other chunks
  • There is a strict hierarchy to the Key Management Service (shown below)

Google key hierarchy

A replay of this session is available on YouTube and is well worth a watch. Probably a couple of times so you actually understand it!
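To make the chunk-and-wrap idea a bit more concrete, here is a toy Python sketch of per-chunk envelope encryption. It is purely conceptual and is not how Google actually implements this; it just shows each chunk getting its own data key, with every data key wrapped by a higher-level key, mirroring the hierarchy above.

```python
# Toy sketch of per-chunk envelope encryption (NOT Google's real
# implementation). Each chunk gets its own data encryption key (DEK),
# and every DEK is wrapped by a key encryption key (KEK) held higher
# up the hierarchy. Requires the 'cryptography' package.
import os
from cryptography.fernet import Fernet

kek = Fernet(Fernet.generate_key())  # stands in for a KMS-held key

def encrypt_blob(data: bytes, chunk_size: int = 1024):
    """Chunk the data, encrypt each chunk under its own key, wrap each key."""
    encrypted_chunks = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        dek = Fernet.generate_key()             # one key per chunk
        ciphertext = Fernet(dek).encrypt(chunk)
        wrapped_dek = kek.encrypt(dek)          # the DEK is only stored wrapped
        encrypted_chunks.append((wrapped_dek, ciphertext))
    return encrypted_chunks

def decrypt_blob(encrypted_chunks) -> bytes:
    plaintext = b""
    for wrapped_dek, ciphertext in encrypted_chunks:
        dek = kek.decrypt(wrapped_dek)
        plaintext += Fernet(dek).decrypt(ciphertext)
    return plaintext

blob = os.urandom(5000)
assert decrypt_blob(encrypt_blob(blob)) == blob
```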

What’s new in Kubernetes and Google Container Engine

Next up was a Kubernetes session covering how it works with Google Container Engine (GKE). I have to say, I’ve heard the name Kubernetes thrown around a lot, but never really had the time or the inclination to see what all the fuss is about. As I understand it, Kubernetes is a wrapper over the top of container technologies such as Docker that provides more enterprise-grade management and features such as clustering and scaling.

Kubernetes was written initially by Google before being open sourced, and it’s rapidly becoming one of the biggest open source projects ever. One of the key drivers for using containers and Kubernetes is the ability to port your environment to any platform: containers and Kubernetes can be run on Azure, AWS, GC or even on-prem. Using this technology avoids vendor lock-in, if that is a concern for you.

Kubernetes contributors and users

There is also a very high release cadence: a new version ships every three months, and version 1.7 is due at the end of June (1.6 is the current version). The essence of containerisation is that you can start to use and develop microservices (services broken down into very small, fast-moving parts rather than one huge, bound-up, inflexible monolithic stack). Containers are also stateless, in the sense that data is stored elsewhere (a cloud storage bucket, etc.), and are disposable items.

In a Kubernetes cluster, you can now scale up to 5,000 pods per cluster. A cluster is a collection of nodes (think VMs), and pods are groups of one or more containers running isolated from each other on a node. Clusters can be multi-zone and multi-region and now also have the concept of “taints” and “tolerations”. A taint marks a node with a characteristic, such as having a GPU or a certain amount of RAM or CPU, and keeps pods off that node unless they declare a matching toleration. For example, a toleration would allow a rendering container to run on a node tainted as having a GPU, while ordinary workloads are kept away; there is a short sketch of this below.
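Here is a minimal sketch of that pairing using the official Kubernetes Python client; the taint key, image and pod names are made up for illustration.

```python
# Minimal sketch: a pod that tolerates a hypothetical "gpu=true:NoSchedule"
# taint, so it can land on GPU nodes that repel ordinary workloads.
# (The node would first be tainted, e.g. kubectl taint nodes <node> gpu=true:NoSchedule)
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="render-job"),
    spec=client.V1PodSpec(
        containers=[client.V1Container(name="render", image="example/render:latest")],
        tolerations=[
            client.V1Toleration(
                key="gpu", operator="Equal", value="true", effect="NoSchedule"
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```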

The final point of note here is that Google offer a managed Kubernetes service called Google Container Engine.

From Blobs to Relational Tables, where do I store my data?

My next breakout was to try and get a better view of the different storage options within GC. One of the first points made was really interesting: Rolls-Royce actually lease engines to airlines so they can collect telemetry data and have the ability to tune engines, as well as perform proactive maintenance based on data received back from the engines.

In summary, your storage options include:

  • RDBMS – Cloud SQL
  • Data Warehousing – BigQuery
  • Hadoop – Cloud Storage
  • NoSQL – Cloud Bigtable
  • NoSQL Docs – Cloud Datastore
  • Scalable RDBMS – Cloud Spanner

Cloud Storage can have several different characteristics, including multi-region, regional, nearline and coldline. This is very similar to the options provided by AWS and Azure. Cloud Storage has an availability SLA of 99.95% and you use the same API to access all storage tiers.

Data lifecycle policies are available in a similar way to S3, moving data between the tiers when rules are triggered. Content delivery is handled by the Cloud CDN product, and message queuing is performed using Cloud Pub/Sub. Cloud Storage for hybrid environments is also available, in a similar way to StorSimple or the AWS Storage Gateway, using partner solutions such as Panzura (cold storage, backup, tiering device, etc.).
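As a quick illustration of the storage classes and lifecycle rules mentioned above, here is a minimal sketch using the Cloud Storage Python client; the bucket name, location and age threshold are placeholders.

```python
# Minimal sketch: a Nearline bucket with a lifecycle rule that moves
# objects down to Coldline after 90 days. Names and ages are placeholders.
from google.cloud import storage

client = storage.Client()

bucket = client.bucket("my-demo-archive-bucket")
bucket.storage_class = "NEARLINE"
bucket = client.create_bucket(bucket, location="europe-west2")

# Lifecycle rules work much like S3 lifecycle policies.
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
bucket.patch()
```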

Cloud SQL offers a 99.95% SLA, with failover and read replicas, which seemed very similar to how AWS RDS works. One interesting product was Cloud Spanner. This is a horizontally scalable RDBMS solution that offers typical SQL features, such as ACID transactions, but with the scalability of typical cloud NoSQL solutions. This seemed to me a pretty unique feature of GC; I haven’t seen it elsewhere. Cloud Spanner also provides global consistency, a 99.99% uptime SLA and a 99.999% multi-region availability SLA. Cool stuff!

Serverless Options on GCP

My next breakout was on serverless options on GCP. Serverless seems to be the latest trend in cloud computing, which for some people is the answer to everything and nothing. Both AWS and Azure provide serverless products, and there are a lot of similarities with the Google Cloud Functions product.

To briefly deconstruct serverless tech, this is where an event-driven process performs a specific task. For example, a file gets uploaded to a storage bucket, which fires an event trigger, and “stuff” is performed by a fleet of managed servers. Once the task is complete, the process goes back to sleep again.
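Purely to illustrate the shape of that pattern (this is not Cloud Functions code; as noted below, Cloud Functions only supported Node.js at the time), a storage-triggered handler conceptually looks something like this, with the event payload fields assumed for the example.

```python
# Conceptual sketch of the event-driven serverless pattern: a handler
# wakes up when a file lands in a bucket, does its work, then goes back
# to sleep. The event payload fields here are assumptions.
def on_file_uploaded(event: dict) -> None:
    bucket = event["bucket"]
    name = event["name"]
    print(f"New object gs://{bucket}/{name}, doing some 'stuff' with it")
    # ... perform the task (transcoding, tagging, thumbnailing, etc.) ...

# In a real deployment the platform invokes the handler for you; there are
# no servers to manage and you only pay while the handler is running.
if __name__ == "__main__":
    on_file_uploaded({"bucket": "my-demo-bucket", "name": "upload.mp4"})
```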

The main benefits of serverless are cost and management. You aren’t spinning VMs up and down and you aren’t paying compute fees for idle VMs. Cloud Functions is charged per 100ms of usage and by how much RAM is assigned to the process. The back end also auto-scales, so you don’t have to worry about setting up your own auto-scaling policies.

Cloud Functions is in its infancy right now, so only Node.js is supported, but more language support will be added over time. Cloud Storage, Pub/Sub and HTTP webhooks can be used to trigger serverless processes.

Day Two wrap up to come in the next post!