October 10, 2018

EC2 in AWS' technological ocean

Let's take a look at the architectural changes which have made EC2 the fundamental part of most environments launched in AWS.

Karol Junde

EC2 is considered a basic AWS service. It has travelled a long, bumpy path and now differs greatly from its original form. Below, we walk through the architectural changes which have made it the fundamental part of most environments launched in AWS.

5m below sea level — PV

It is generally believed that AWS started with the Xen virtualization type. They began with PV (paravirtualization), which can be considered a lighter form of virtualization. The key factor is near-native speed in comparison to full virtualization. However, the guest system needs to be modified to be aware of the hypervisor and to make efficient hypercalls. In other words, these modifications allow the hypervisor to export a modified version of the underlying hardware to an instance. This is considered a drawback: imagine a situation where you’d like to recover or rebuild an EC2 instance in another AWS region. You’d need to find a matching kernel, which can be time-consuming and laborious.

Another thing worth keeping in mind is that in such a scenario the kernel makes hypercalls instead of the well-known privileged instructions, which in turn leads to significant overhead. Additionally, the storage and network drivers used by the system are paravirtualized.
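
As a quick illustration (my own sketch, not something taken from AWS documentation), the Python snippet below checks from inside a Linux guest whether it is running under Xen and whether the paravirtual frontend drivers are loaded. The paths and module names assume a typical Linux distribution; on PV images xen_netfront/xen_blkfront may be built into the kernel and therefore not appear in /proc/modules.

```python
#!/usr/bin/env python3
"""Rough check, from inside a Linux guest, of Xen presence and PV frontend drivers.

Assumptions: a typical Linux sysfs/procfs layout; the frontend drivers may be
compiled into the kernel, in which case they will not show up in /proc/modules.
"""
from pathlib import Path


def hypervisor_type() -> str:
    # /sys/hypervisor/type reads "xen" when running under the Xen hypervisor.
    p = Path("/sys/hypervisor/type")
    return p.read_text().strip() if p.exists() else "none/unknown"


def loaded_pv_frontends() -> list[str]:
    # The Xen paravirtual network/block frontend drivers.
    wanted = {"xen_netfront", "xen_blkfront"}
    try:
        modules = {line.split()[0] for line in Path("/proc/modules").read_text().splitlines()}
    except FileNotFoundError:
        modules = set()
    return sorted(wanted & modules)


if __name__ == "__main__":
    print(f"hypervisor: {hypervisor_type()}")
    print(f"PV frontend modules loaded: {loaded_pv_frontends() or 'none (or built-in)'}")
```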

10m below sea level — HVM

With PV on board, AWS started using another hypervisor configuration, called HVM (hardware virtual machine). In this scenario the guest system runs as if it were placed on a bare-metal platform and is not aware that it shares the underlying hardware with other customers.

At the end of the day, HVM can use hardware extensions which provide fast access to the underlying hardware without any guest modifications. If you need to take advantage of enhanced networking or GPU processing, HVM is your best choice.

12,5m below sea level — PVHVM

PVHVM brought another improvement to the situation. Firstly, as you’ve probably noticed, paravirtual guest systems tend to perform better on network and storage operations than their younger sibling, HVM. The higher performance is achieved through special paravirtual I/O drivers instead of emulating network and disk hardware (as in the case of plain HVM), which significantly reduces the overall overhead. After the PVHVM release, PV drivers became available for HVM guests. These drivers are also called PV-on-HVM: paravirtualized drivers that take advantage of HVM features.

After this change I was a little confused: AWS presented instances that could be run as PV or HVM, but as you dig deeper you realize that both options were available, because an HVM instance could boot and then run PV drivers, in other words run PVHVM (paravirt-on-HVM) drivers. The conclusion is, quite simply, that instances with the HVM label in AWS are HVM with PVHVM drivers.

The key argument that HVM is slower than PV because it doesn’t use as much paravirtualization became untrue, and some companies using AWS at a large scale began to see benefits in moving from PV to PVHVM for their workloads, mainly those bound by CPU and memory.
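
If you want to verify what your own instances report, a minimal boto3 sketch like the one below lists the virtualization type EC2 exposes for each instance. It assumes AWS credentials and a region are already configured; the output format is only illustrative.

```python
#!/usr/bin/env python3
"""List the virtualization type reported by EC2 for each instance in the account."""
import boto3

ec2 = boto3.client("ec2")

paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            # 'hvm' or 'paravirtual'; modern instance families always report 'hvm'.
            print(instance["InstanceId"], instance["VirtualizationType"])
```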

15m below sea level — towards enhanced networking

In 2013, after the journey with hypervisors, AWS started introducing instances with hardware virtualization support for network interfaces, called SR-IOV (Single Root I/O Virtualization). In the SR-IOV model each NIC exposes a physical function (PF), a full-featured PCIe function, along with multiple virtual functions (VFs), lightweight PCIe functions which share the PF’s resources. VFs were designed solely to move data in and out. The main outcome was the reduction of the I/O bottleneck, with a tradeoff in the number of virtual machines a physical server can realistically support, since each VM needs its own VF and the number of VFs per NIC is limited. Unlike a Xen paravirtualized driver, an SR-IOV driver running in a cloud instance can perform DMA (Direct Memory Access) to the NIC hardware to achieve better performance. What’s really important here is that the DMA operation from the device to virtual machine memory does not compromise the safety of the underlying hardware.

Initially, AWS implemented the Intel 82599 with speeds up to 10Gbps, and after the ENA (Elastic Network Adapter) driver announcement in 2016 they boosted it up to 25Gbps, reduced latency and increased the packet rate. After the first release in 2014 it was a stunning breakthrough, but the world was still waiting for hardware virtualization for volumes.

AWS’s long-term goal for the ENA driver was to let customers take advantage of higher bandwidth options in the future without the need to install newer drivers or make other configuration changes. One of the features which enabled the performance improvement was Receive Side Scaling (RSS).
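
To see which of these networking options an instance actually has enabled, a quick boto3 sketch is shown below; the instance ID is a placeholder and credentials/region are assumed to be configured in your environment.

```python
#!/usr/bin/env python3
"""Check enhanced-networking support flags on a single EC2 instance."""
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # placeholder instance ID

# ENA support is reported directly on the instance description.
desc = ec2.describe_instances(InstanceIds=[instance_id])
instance = desc["Reservations"][0]["Instances"][0]
print("ENA support:", instance.get("EnaSupport", False))

# Intel 82599 VF support is exposed as the 'sriovNetSupport' attribute
# (value 'simple' when enabled).
attr = ec2.describe_instance_attribute(InstanceId=instance_id, Attribute="sriovNetSupport")
print("sriovNetSupport:", attr.get("SriovNetSupport", {}).get("Value", "not set"))
```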

15,5m below sea level — receive side scaling

To start from the beginning, we should explain two basic terms: hardware interrupt and softirq. A hardware interrupt is a signal from a device sent to the CPU when the device needs an input or output operation to be handled. In other words, the device ‘interrupts’ the CPU to draw its attention while the CPU is doing something else. Then we have the softirq, which is similar to a hardware interrupt request but not as time-critical. So, when data arrives at the NIC and is copied into its buffer, an interrupt signal is generated for the CPU. After the interruption, the processor’s interrupt service routine reads the Interrupt Status Register to determine what type of interrupt occurred and what action needs to be taken. After that, an acknowledgement is sent to the NIC with the message “Hey, I’m ready to serve.” Basically, for this reason, the ‘interrupt’ work is split into two parts: the hardware interrupt itself, which does only the minimal, time-critical handling, and the softirq, which takes care of the heavier packet processing afterwards.

Bottleneck scenarios also happened with RPS (Receive Packet Steering, a software implementation of the same idea for single-queue NICs) enabled, where incoming packets are hashed and the load is distributed across multiple CPUs. RSS does this in hardware: the NIC itself hashes incoming packets into multiple receive queues, each with its own interrupt, so the work is spread across processors from the start. Therefore, EC2 instances equipped with ENA opened a new era of greater performance, achieved through improvements such as the described RSS.
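
To observe RSS from inside an instance, you can look at how the NIC’s queue interrupts are spread across CPUs. The Python sketch below (an illustration, not AWS tooling) parses /proc/interrupts; the interface name is a placeholder, and the exact interrupt naming (for example eth0-Tx-Rx-0 for ENA) varies by driver.

```python
#!/usr/bin/env python3
"""Show how a NIC's per-queue interrupts are spread across CPUs (RSS in action)."""
import sys
from collections import defaultdict

IFACE = sys.argv[1] if len(sys.argv) > 1 else "eth0"  # placeholder interface name

with open("/proc/interrupts") as f:
    header = f.readline().split()           # CPU0 CPU1 ...
    ncpus = len(header)
    per_cpu = defaultdict(int)
    for line in f:
        if IFACE not in line:
            continue                         # keep only this NIC's queue interrupts
        fields = line.split()
        counts = fields[1:1 + ncpus]         # per-CPU counters follow the IRQ number
        for cpu, count in enumerate(counts):
            per_cpu[cpu] += int(count)

for cpu in sorted(per_cpu):
    print(f"CPU{cpu}: {per_cpu[cpu]} interrupts for {IFACE} queues")
```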

20m below sea level — deeper EBS optimization

As I’ve mentioned before, ENA provided enhancements on the EC2 networking side, but the world expected new solutions for handling volumes. For this reason Amazon Web Services launched EBS-optimized instances, such as the i3 family, which give users a dedicated link from an instance to the EBS service instead of sharing it with other traffic. They use SR-IOV and the NVMe (Non-Volatile Memory Express) storage driver.
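
As an aside, the EBS-optimized flag itself is easy to inspect with boto3; the sketch below uses a placeholder instance ID, and note that many newer instance types are EBS-optimized by default.

```python
#!/usr/bin/env python3
"""Check (and optionally enable) the EBS-optimized flag on an instance."""
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # placeholder instance ID

instance = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"][0]["Instances"][0]
print("EbsOptimized:", instance.get("EbsOptimized", False))

# To enable the dedicated EBS link on instance types where it is optional
# (the instance usually has to be stopped first):
# ec2.modify_instance_attribute(InstanceId=instance_id, EbsOptimized={"Value": True})
```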

And what is NVMe? NVMe is an interface protocol for accessing flash storage over a PCIe bus. Unlike traditional storage interfaces, which are limited to a single command queue, NVMe supports tens of thousands of parallel queues, each able to hold tens of thousands of concurrent commands.
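
To get a feel for this multi-queue design on a running Linux instance, the sketch below counts the blk-mq hardware queues the kernel created for each NVMe block device; sysfs layout details can differ slightly between kernel versions.

```python
#!/usr/bin/env python3
"""List NVMe block devices and how many hardware queues the kernel created for them."""
from pathlib import Path

for dev in sorted(Path("/sys/block").glob("nvme*n*")):
    # /sys/block/<dev>/mq holds one directory per blk-mq hardware queue context.
    mq = dev / "mq"
    nqueues = sum(1 for entry in mq.iterdir() if entry.is_dir()) if mq.exists() else 0
    model = dev / "device" / "model"
    model_str = model.read_text().strip() if model.exists() else "unknown"
    print(f"{dev.name}: model='{model_str}', hardware queues={nqueues}")
```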

With the Xen hypervisor underneath, dom0 is involved in the I/O path. Dom0, as the management VM, uses the NVMe driver to access the EBS volume, whereas on a running EC2 instance the Xen paravirtual split-driver model for block devices is used to handle I/O. Dom0 takes each request from the instance over a shared ring, translates it into an NVMe request and, after completion, sends the response back to the instance.

The main goal of the Nitro card release was to provide performance close to bare metal.

22,5m below sea level — local NVMe storage Nitro

In 2017 AWS introduced local NVMe Nitro cards, which protect against unauthorized traffic between a local EC2 instance and the flash storage. Essentially, the key required to retrieve data from a flash device is stored only on the Nitro card, never on the flash itself. Therefore, whenever an instance is terminated, the key is destroyed and the data written to the flash device becomes irretrievable.

25m below sea level — 25Gbps enhanced networking

With the announcement of the elimination of the Intel chip, AWS moved into a new era of high-performance, scalable architecture. Only the Nitro card was left, and it is where all the processing required for communication with EBS volumes, as well as the cryptographic work, takes place. This brought a lot of benefits, among them more resources available on EC2 instances (around 12,4% more for the largest C4 instances). It was all possible because dedicated dom0 cores were no longer needed to take requests over the shared ring from EC2 instances. A good example was the C5 instance, which benefited from the Nitro card not only in terms of networking but also of EBS connectivity. It used a combination of EBS NVMe (custom silicon cards by Annapurna Labs) and ENA for enhanced networking, and eliminated the need for the dom0 management VM.
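
On a Nitro-based instance you can see this NVMe-everywhere approach directly from the guest: EBS volumes attached over NVMe typically report ‘Amazon Elastic Block Store’ as the model and the volume ID (without the dash) as the serial number. The sketch below uses that convention to map devices back to volume IDs; treat it as an illustration, since local instance-store NVMe devices report differently.

```python
#!/usr/bin/env python3
"""On a Nitro-based instance, map NVMe block devices back to their EBS volume IDs."""
from pathlib import Path

for dev in sorted(Path("/sys/block").glob("nvme*n*")):
    device = dev / "device"
    model = (device / "model").read_text().strip() if (device / "model").exists() else ""
    serial = (device / "serial").read_text().strip() if (device / "serial").exists() else ""
    if model == "Amazon Elastic Block Store" and serial.startswith("vol"):
        # Re-insert the dash to get the familiar vol-xxxxxxxx form.
        volume_id = "vol-" + serial[len("vol"):]
        print(f"/dev/{dev.name} -> {volume_id}")
    else:
        print(f"/dev/{dev.name} -> not an EBS NVMe volume (model='{model}')")
```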

50m below sea level — bare metal!

The most recent instances, announced by AWS during its last re:Invent conference, are bare metal (i3.metal), which add no virtualization overhead at all. They are a great choice for applications that need to run in non-virtualized environments, for example because of licensing requirements.

As you can see, Amazon has come a long way to get where it is right now, providing its customers with a wide variety of instances that meet different kinds of requirements. Time will tell what its next step will be, but I am pretty sure it hasn’t said its last word in this area.
