Compute virtualization (the technology behind EC2) works by dividing a physical machine (and thus its CPU) into many virtual machines, each of which is served a slice of the physical hardware (a high-performance CPU with 4 cores might be divided into 8 virtual machines, for example). To keep the available compute power fair between the virtual machines (while still allowing a virtual machine to make use of additional unused CPU cycles), something called CPU steal time was invented (by IBM, I believe, who also pioneered the whole concept of virtual machines).
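On a Linux guest, steal time is visible as a counter on the "cpu" lines of /proc/stat (it is the eighth field after the label, per the proc(5) man page). A minimal sketch of reading it, where the sample line in the comment is invented for illustration:

```python
def steal_percent(stat_line: str) -> float:
    """Steal time as a percentage of all CPU ticks so far.

    Expects a "cpu" line from /proc/stat; the fields after the label are:
    user nice system idle iowait irq softirq steal guest guest_nice.
    """
    fields = [int(x) for x in stat_line.split()[1:]]
    steal = fields[7]  # ticks the hypervisor ran someone else instead of us
    return 100.0 * steal / sum(fields)

# Hypothetical sample: 1000 total ticks, 50 of them stolen -> 5.0
print(steal_percent("cpu 100 0 50 800 0 0 0 50 0 0"))
```

This is the same number that top and mpstat report as "%st" / "%steal"; on a physical machine it stays at zero.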
Many web servers, for many startups and companies around the world, run on Amazon EC2. But over the past year (I believe; I have no clear evidence of when this started), Amazon EC2 changed how they prioritize CPU cycles for the virtualized machine instances they offer as part of their Elastic Compute Cloud (EC2).
Amazon started using a prioritization model geared more toward “batch workloads” for their instances (mainly micro and small, from what I have seen). A batch-style division of CPU resources is not suited for real-time tasks such as a web server. Why? Keep reading.
Batch workloads vs. Real-time workloads
The batch workload model looks at a task (or a multitude of tasks) that runs until completion. For a batch-style job, running fast for the first hour and at 50% of CPU capacity for the second hour is just as good as running at 75% for two hours: it does not affect the overall completion time, as long as the total allotted CPU time is the same. For most batch jobs it is even fine to burst CPU usage (above the allotted CPU time) during periods of heavy usage and then be penalized afterwards (running at perhaps 20% of the “normal” speed); again, the total amount of CPU available to process the jobs is what matters.
For real-time workloads it is quite different. Take a web server: it may run at adequate speed for 3 hours, then hit heavy usage and make use of a burst of additional CPU cycles. But if that burst comes at the cost of only 25% of the “adequate” performance level being available for the following hour, it will adversely affect how well your web server can serve pages. For most real-time (or semi-real-time) situations like web servers, it is better to have constant available CPU performance (more or less, with very little steal-time penalization), and to add more machines as needed.
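The difference can be put into numbers (using the illustrative figures from the paragraphs above; the 40% load figure is an assumption for the example):

```python
# Batch: only total CPU-seconds matter, not how they are spread out.
fast_then_slow = 1.00 * 3600 + 0.50 * 3600  # 1h at full speed, 1h at 50%
steady = 0.75 * 3600 * 2                    # 2h at a constant 75%
assert fast_then_slow == steady             # same amount of work completed

# Real-time: what matters is capacity vs. load at each instant.
# Suppose the server needs 40% of a core to keep up with traffic;
# an hour throttled to 25% drops requests no matter how fast the
# other hours were.
load = 0.40
for capacity in (1.00, 1.00, 1.00, 0.25):   # three good hours, one throttled
    shortfall = max(0.0, load - capacity)   # fraction of load left unserved
    if shortfall > 0:
        print(f"capacity {capacity:.0%}: cannot serve "
              f"{shortfall / load:.0%} of incoming requests")
```

Averaged over the four hours the throttled server still got plenty of CPU; the problem is that the shortage was concentrated exactly when the cycles were needed.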
Why does this matter, and how is the virtualized model different from a physical one? Because on a physical machine there is no “steal time”: no hypervisor takes cycles back to make up for earlier overage usage by your machine.
What I have seen with EC2 lately is exactly these kinds of hour-long penalizations after overage usage (or over-allotment), often to the point where the web servers almost stop responding.
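If you want to watch for this on your own instances, the /proc/stat counters are cumulative, so steal over a window comes from the delta between two samples. A sketch (the sample lines in the test data are invented; the live-sampling helper is Linux-only):

```python
import time

def cpu_totals(stat_line: str):
    """Return (total_ticks, steal_ticks) from a /proc/stat "cpu" line."""
    fields = [int(x) for x in stat_line.split()[1:]]
    return sum(fields), fields[7]

def steal_over_interval(before: str, after: str) -> float:
    """Steal percentage between two snapshots of the same "cpu" line."""
    t0, s0 = cpu_totals(before)
    t1, s1 = cpu_totals(after)
    return 100.0 * (s1 - s0) / (t1 - t0)

def current_steal(interval_s: float = 5.0) -> float:
    """Sample the aggregate "cpu" line twice and compare (Linux guests only)."""
    with open("/proc/stat") as f:
        before = f.readline()
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        after = f.readline()
    return steal_over_interval(before, after)
```

Run something like `current_steal()` in a loop and log it; the hour-long penalization windows I describe above show up as sustained stretches where the steal percentage stays pinned high.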
I wish Amazon would either step back to the old model of CPU prioritization (which still used the steal-time model, but never produced steal times of 90% or more for 2+ hours), or give me more control over how this is done.
Right now I am considering moving servers away from Amazon EC2 (in spite of all the benefits) for this reason alone.