Simplifying the Cloud Network

Monday, February 7, 2022

Introduction

Just like when we stare up at the sky and notice the gradually shifting, changing patterns in the clouds, so it is when we look at the technical offerings and options available from our public cloud providers. People often talk about “cloud migrations”, implying a shift from one static environment (usually on-premises) to another static environment (“the cloud”) - and indeed that is often how it pans out. But the public cloud providers (Amazon, Microsoft, Google etc.) are constantly honing and developing their products as well as introducing brand-new technologies, to provide customers with ever-increasing options on what their cloud environments look like.

For an organisation like the DVLA, which over the last few years has made great strides in increasing its technical agility to meet ever more diverse customer demands, the trick is to keep pace with these new offerings without being too close to what we call the “bleeding edge” – i.e. that potentially painful place where immature technology can cause more problems than it solves. And indeed, when we look back at the last few years of the DVLA’s cloud journey, there has been a gradual shifting and evolution of how and what we deploy into the cloud.

We’ve seen solutions being built on virtual machines, then ‘containers’ and now more and more ‘serverless’ deployments. Our container orchestration platform – Kubernetes – has shifted from being self-hosted and virtual machine-based to an Amazon Web Services (AWS) managed service.

We’ve seen databases built on virtual machines shift recently to SaaS (software as a service) deployments hosted in other clouds. And there’s also been a shift in the cloud network, by which I mean the technology that underpins all our cloud solutions and means that things that need to talk to each other can, and those that shouldn’t talk to each other can’t.

The rest of this blog focuses on how we have evolved (and continue to evolve) our cloud networking architecture to support the DVLA’s cloud journey. I make no apologies for the fact that the subject matter may be a little on the dry side for some. If you have no interest in network routing, encrypted connections, or firewalls then I can hardly blame you; there are plenty of posts covering a myriad of other topics to choose from and I shall bid you a good day. If, however, those terms stimulate the cerebrum and quicken the heart-rate, then read on!

In the beginning…

In the beginning, the DVLA deployed cloud components into a few virtual private clouds (VPCs). And then as our cloud journey continued, we created more VPCs…and then even more. VPCs are an AWS construct; each one is like a private, self-contained network zone in the cloud. So at the simplest level, you might imagine having a ‘Production’ VPC, a ‘Development’ VPC and a ‘Test’ VPC, each one containing all deployed systems but only for that environment.

So, the ‘Production’ VPC would host all Production systems, the ‘Test’ VPC would host all Test systems and so on. Even though this is greatly simplified, it isn’t a million miles from where we were at the beginning, i.e. something like this…
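To make that concrete, here is a minimal sketch of the one-VPC-per-environment pattern using boto3, AWS’s Python SDK. The CIDR ranges and tags are illustrative assumptions, not the DVLA’s real configuration.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# One self-contained network zone per environment.
# These address ranges are purely illustrative.
environments = {
    "Production": "10.0.0.0/16",
    "Test": "10.1.0.0/16",
    "Development": "10.2.0.0/16",
}

for name, cidr in environments.items():
    vpc = ec2.create_vpc(CidrBlock=cidr)
    ec2.create_tags(
        Resources=[vpc["Vpc"]["VpcId"]],
        Tags=[{"Key": "Name", "Value": f"{name}-vpc"}],
    )
```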

Even though this is a perfectly good pattern for basic deployments, as we deployed more and more solutions into the cloud the complexity of our cloud network inevitably grew. 

For example:

  • We deployed tooling to automate the deployment of code, virtual machines and containers. That tooling needed to be able to communicate with both Production and non-Production environments. 
  • We deployed systems that needed to talk to our on-premises environments, again both Production and non-Production. 
  • We split development of solutions across ‘squads’ each having their own AWS accounts and VPCs. 
  • We deployed a container orchestration platform, owned by a new Cloud Enablement squad, which needed to communicate with on-prem as well as some of the ‘original’ VPCs. 
  • Each time a new VPC was created we would have to decide which other VPCs it needed to communicate with and set up a dedicated network link between those VPCs (called VPC ‘peering’ in AWS parlance – see the sketch after this list).
  • We would also need to decide whether it needed to talk to our on-prem networks and if so, create dedicated virtual private network (VPN) connections – encrypted network connections that traverse the internet – from the VPC to our on-prem infrastructure. 
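To give a flavour of what that peering overhead looked like, here is a hedged sketch in boto3. All of the VPC IDs are placeholders; the point is that every new VPC needed its own set of dedicated links.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# Placeholder IDs. Each new VPC needed its own peering connection to
# every VPC it had to reach, so the mesh grew quadratically over time.
new_vpc = "vpc-0new0000000000000"
peers = ["vpc-0prod000000000000", "vpc-0tooling00000000", "vpc-0platform0000000"]

for peer in peers:
    pcx = ec2.create_vpc_peering_connection(VpcId=new_vpc, PeerVpcId=peer)
    pcx_id = pcx["VpcPeeringConnection"]["VpcPeeringConnectionId"]
    # The other side must accept the request, and both VPCs still need
    # routes to each other's CIDR range before any traffic can flow.
    ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)
```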

Over time, the result of all this growth was a more and more complicated mesh of cloud network connections, leading to an architecture that looked more like this:-

In fact, by now we have even more VPCs, and even though this network mesh worked, it was becoming harder and harder to support and to troubleshoot connectivity issues and the like. Which is where AWS’s Transit Gateway came in…

The Transit Gateway

AWS’s Transit Gateway was released in November 2018 and is essentially a central routing hub. Instead of connecting VPCs and on-prem networks directly to each other, leading to each VPC having multiple different connections as seen previously, each VPC now has a single, resilient connection to the Transit Gateway. Instead of dedicated VPN links between on-prem and each VPC, we now have a single, resilient connection from on-prem to the Transit Gateway. Routing tables within the Transit Gateway now control which VPCs can send packets to which others, and which VPCs can exchange traffic with our on-prem networks.
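In boto3 terms, the hub-and-spoke model boils down to a handful of calls. This is a hedged sketch with placeholder IDs, not our actual configuration:

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# Create the central routing hub.
tgw = ec2.create_transit_gateway(Description="Central routing hub")
tgw_id = tgw["TransitGateway"]["TransitGatewayId"]

# Each VPC gets a single, resilient attachment to the Transit Gateway
# (one subnet per Availability Zone). IDs below are placeholders.
ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw_id,
    VpcId="vpc-0prod000000000000",
    SubnetIds=["subnet-0aaa00000000000aa", "subnet-0bbb00000000000bb"],
)

# In the VPC's own route table, send private-range traffic to the
# Transit Gateway and let its route tables decide what is allowed.
ec2.create_route(
    RouteTableId="rtb-0ccc00000000000cc",
    DestinationCidrBlock="10.0.0.0/8",
    TransitGatewayId=tgw_id,
)
```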

So, from the previous diagram we have now evolved to a much simpler architecture where all communication to, from and within our cloud environments is managed in one place and can be represented like this:-

The Internet

Almost all systems, even the most secure, have to communicate with other systems on the internet at some point. This may be to give the public access to the system, to let it pull down security patches from online repositories, or to let it talk to a system hosted by a different organisation.

Naturally the Cloud Engineering squads within the DVLA strive to make these connections as secure as possible to allow the required traffic but prevent anything else. The challenge until recently with the DVLA’s architecture was that each VPC had its own internet connection, i.e. something like this:-


The difficulty here of course is that there are multiple points of connection to the internet and therefore multiple points where the security of that connection must be managed.

It was this consideration that drove the DVLA to route all traffic from its VPCs to the internet via a dedicated ‘egress’ VPC hosting an AWS firewall service, which inspects all outbound traffic and blocks anything to a domain that isn’t on the DVLA’s list of permitted destinations. 
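The description above matches the domain allow-lists that AWS Network Firewall provides; assuming that service (the blog itself only says “an AWS firewall service”), the rule might be sketched like this, with example domains rather than the DVLA’s actual list:

```python
import boto3

nfw = boto3.client("network-firewall", region_name="eu-west-2")

# A stateful domain-list rule group: permit outbound HTTP/TLS traffic
# only to listed destinations; everything else is blocked. The domains
# and capacity here are examples, not the DVLA's real configuration.
nfw.create_rule_group(
    RuleGroupName="egress-domain-allowlist",
    Type="STATEFUL",
    Capacity=100,
    RuleGroup={
        "RulesSource": {
            "RulesSourceList": {
                "Targets": [".ubuntu.com", ".github.com"],
                "TargetTypes": ["TLS_SNI", "HTTP_HOST"],
                "GeneratedRulesType": "ALLOWLIST",
            }
        }
    },
)
```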

Having already attached VPCs to the Transit Gateway meant that this change in outbound routing was relatively trivial. Each VPC now, by default, routes all of its traffic to the Transit Gateway and allows the Transit Gateway to ‘decide’ where to forward it. We simply change which VPC the Transit Gateway forwards internet traffic to, and this immediately takes effect for all of the VPCs attached to it.
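That single routing change can be sketched as one API call (again with placeholder IDs):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# One default route in the Transit Gateway route table redirects
# internet-bound traffic from every attached VPC to the egress VPC's
# attachment, where the firewall inspects it. IDs are placeholders.
ec2.create_transit_gateway_route(
    TransitGatewayRouteTableId="tgw-rtb-0ddd0000000000dd",
    DestinationCidrBlock="0.0.0.0/0",
    TransitGatewayAttachmentId="tgw-attach-0egress0000000",
)
```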

Consolidating our outbound internet connections gives us the following simplified architecture:-


PrivateLink

To support its own IT systems, the DVLA has a number of interfaces, or network connections, to the IT systems of other organisations.

For example, we use services provided by the Passport Office or DWP and provide services to the likes of the Home Office or DVSA – and that barely scratches the surface of these types of interfaces.

We establish these interfaces over a myriad of different network connection types; for now, we still have the PSN (Public Services Network), which is a closed, secure network between UK Government agencies and departments.

We have a large number of VPNs, especially with commercial motoring organisations that consume our services, and we also build these interfaces directly across the internet.

The obvious challenge when exposing services on the internet is security. Every aspect of the connection has to be carefully configured so that only the traffic we want is allowed to the service and that the traffic itself is resistant to eavesdropping. So, firewalls have to be carefully configured, encryption protocols have to be established and so on. Any errors in the configuration could lead to a security breach and damage the reputation and integrity of the DVLA.

AWS have introduced a new way of exposing services to other organisations called PrivateLink, which to some extent simplifies the challenge of providing services to other organisations, so long as they are also AWS customers.

PrivateLink allows you to expose just the service you want to another customer’s VPC and routes the traffic across AWS’s own network, so the traffic never traverses the internet. In this way, AWS takes care of the firewalling between the two different organisations and provides a degree of assurance in terms of knowing ‘where the traffic has been’. In essence, we go from a connection that looks like:-


…to one that looks like:-


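For the technically curious, both sides of a PrivateLink connection boil down to a few calls. This is a hedged sketch: all ARNs, account numbers and IDs are placeholders, and it assumes the provider’s service sits behind a Network Load Balancer, which PrivateLink requires.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# Provider side: publish the service as a VPC endpoint service and
# allow a named consumer account to request a connection.
svc = ec2.create_vpc_endpoint_service_configuration(
    NetworkLoadBalancerArns=[
        "arn:aws:elasticloadbalancing:eu-west-2:111111111111:"
        "loadbalancer/net/example-service/0123456789abcdef"
    ],
    AcceptanceRequired=True,
)
config = svc["ServiceConfiguration"]
ec2.modify_vpc_endpoint_service_permissions(
    ServiceId=config["ServiceId"],
    AddAllowedPrincipals=["arn:aws:iam::222222222222:root"],
)

# Consumer side (run in the other organisation's account): create an
# interface endpoint to that service. Traffic between the two VPCs
# stays on AWS's own network and never traverses the internet.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0consumer0000000",
    ServiceName=config["ServiceName"],
    SubnetIds=["subnet-0eee00000000000ee"],
)
```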
The Horizon

Hopefully this article has illustrated the way in which the DVLA is continually evolving its cloud network architecture to help simplify and consolidate its network links, which in turn helps us support the ever-growing diversity of cloud deployments. And of course, just as the clouds never stay still, our cloud networking journey will continue.

On the immediate horizon is a restructuring of our cloud network to best support our adoption of AWS Organizations. Further out, we envisage consolidating all our inbound as well as outbound internet traffic. Beyond that, the long-range forecast points to the adoption of AWS Direct Connect to link our on-prem networks with the cloud, providing much greater guarantees of network bandwidth, latency and the like.