This blog post explains the need for resource control in a system, and how developers can use the resctl-demo program to build an intuitive working knowledge of resource control that they can apply to improve server reliability, responsiveness and utilization.
Most modern systems, whether a server, a laptop, or even a phone, run multiple things at the same time. This could be a web browser and a compilation job, a web server and a host of monitoring and management jobs, or multiple database instances. Let’s say you’re running web servers and you’ve had a couple of outages where a malfunctioning monitoring application kept eating up memory, eventually driving the whole system into the ground through thrashing. You’d likely have been much happier if the system had killed and restarted the monitoring application instead of going down entirely. There’s got to be a way to do that, right?
One solution lies in protecting the web server’s memory against the rest of the system. Limiting the maximum memory allowed for the rest of the system is one way, but finding the right limit value is challenging given how dynamic everything is. The web server isn’t always at full load and it’d be better to use any available memory for the rest of the system.
Let’s say you decide to use memory.low to protect the web server’s memory, coupled with out-of-swap kill in systemd-oomd so that the malfunctioning application is killed after exhausting swap. However, after deploying this configuration, you find that while the machines stop going down, the web servers are still severely impacted when the monitoring application malfunctions. Why would that be?
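To make the memory.low part concrete, here is a minimal sketch of setting that protection by writing to the cgroup v2 filesystem. The cgroup name and the 4 GiB figure in the usage note are hypothetical, and on a systemd-managed machine you would more likely set the MemoryLow= unit property than write the file yourself. Writing to the real hierarchy requires root and the memory controller enabled for the target cgroup.

```python
from pathlib import Path

# Usual cgroup v2 mount point; passed as a parameter so the helper can be
# exercised against a scratch directory without root.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def set_memory_low(cgroup: str, size_bytes: int, root: Path = CGROUP_ROOT) -> Path:
    """Write a memory.low value, in bytes, for a cgroup given relative to root."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    knob = root / cgroup / "memory.low"
    knob.write_text(f"{size_bytes}\n")
    return knob
```

For example, set_memory_low("workload.slice", 4 * 1024**3) would ask the kernel to protect roughly 4 GiB for that (assumed) cgroup. This only covers the protection half; the out-of-swap kill is handled separately by systemd-oomd.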
The idea behind resource control is to optimize how a system’s resources are distributed across workloads. It lets different tasks share system resources without interfering with one another, so a single system can host multiple workloads without them stepping on each other’s toes. In Linux, cgroup (control groups) provides the fundamental resource control features. cgroup organizes the workloads on a system into a tree structure and configures how resources are distributed across that hierarchy.
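To get a feel for that tree, here is a small sketch that lists the hierarchy. It relies on the fact that in cgroup v2 every directory under the mount point is a cgroup; the mount path is the usual default and may differ on your system.

```python
import os
from pathlib import Path


def list_cgroups(root: Path = Path("/sys/fs/cgroup")) -> list[str]:
    """Return every cgroup under root as a path relative to it.

    The root cgroup itself is reported as ".". Children are visited in
    sorted order so the output is stable.
    """
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk descends in sorted order
        found.append(os.path.relpath(dirpath, root))
    return found
```

Printing the result of list_cgroups() on a systemd machine typically shows top-level slices with services nested under them, which is the hierarchy that resource distribution is configured against.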
However, the interactions between resources and configuration can be complex and counterintuitive. So, the kernel team at Meta developed resource control demo, or resctl-demo, a program that reproduces various resource contention scenarios using simulated workloads, then demonstrates and documents the solutions.
Let’s see whether we can simulate the above web server problem using resctl-demo:
In the above image, only the graph in the middle left pane and the command & documentation panel on the bottom right matter for our purpose. On the right, we can see that rd-hashd is enabled at Load level 100%. rd-hashd is a simulated workload that mimics the memory and IO requirements of a typical production web server at Meta. You can also see that only memory protection is turned on, which applies the appropriate memory.low protection. On the graph pane, the green line represents the throughput and the blue line represents the average per-request latency of our web server stand-in. You can see that the web server is ramped up, with load close to 100% and latency close to 75 milliseconds, which is the latency we’re targeting.
The cursor (green highlight) is already on the Cold memory hog trigger. Let’s press enter and see what happens:
Note that the time scale on the graph has been compressed by pressing t to show a wider range. The latency spiked, and the RPS dropped close to zero. The ~100 seconds of disruption is not as catastrophic as the whole machine going down, but it is still far from ideal. Let’s take a look at the graph view:
The two graphs on the right side show that the memory and IO pressures rose during the disruption. We protected rd-hashd’s memory but not its IO. rd-hashd has a baseline IO requirement, and the competition for memory, moderated by memory protection but still elevated, pushes that requirement higher. Because we aren’t controlling IO, the memory hog can still adversely affect rd-hashd. Let’s enable IO control and repeat the same scenario:
Other than enabling the IO protection, everything, including the timing, is similar to the previous run. However, there’s no drastic RPS drop. Let’s look at the same graph view to see what’s different:
The workload slice that contains rd-hashd experiences only minimal memory and IO pressure, while the system slice that hosts the memory hog experiences very high memory and IO pressure. We are protecting the main workload by throttling the auxiliary part of the system, as we originally intended.
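If you want to look at these pressure numbers outside of resctl-demo, the kernel exposes them per cgroup in memory.pressure and io.pressure files (pressure stall information, or PSI). Below is a minimal sketch of a parser for that format. The sample values are made up for illustration, and the cgroup names in the usage note are assumptions about how your system is laid out.

```python
from pathlib import Path

# Made-up sample in the PSI file format: one "some" line and one "full" line.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.50 avg60=0.25 avg300=0.05 total=65432\n"
)


def parse_pressure(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {key: float(value) for key, value in (f.split("=", 1) for f in fields)}
    return result


def read_pressure(cgroup: str, resource: str, root: Path = Path("/sys/fs/cgroup")):
    """Read and parse <root>/<cgroup>/<resource>.pressure, e.g. resource="io"."""
    return parse_pressure((root / cgroup / f"{resource}.pressure").read_text())
```

Comparing read_pressure("workload.slice", "io") against read_pressure("system.slice", "io") while a memory hog is running should show the same contrast as the graphs, assuming those slices exist on your machine.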
This example shows how memory and IO controls interact with each other. As you can see, it is not enough to just protect memory even when memory is the source of contention.
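As a rough illustration of configuring both resources together, the sketch below writes a memory.low value and io.weight values from a single plan. The slice names, the 4 GiB protection, and the weights are hypothetical and are not the settings resctl-demo applies. Keep in mind that io.weight only has an effect when a weight-based IO controller is active on the device, and that writing to the real hierarchy requires root.

```python
from pathlib import Path

# Hypothetical plan: cgroup -> {interface file: value}. Illustrative numbers only.
PLAN = {
    "workload.slice": {"memory.low": str(4 * 1024**3), "io.weight": "default 500"},
    "system.slice": {"io.weight": "default 50"},
}


def apply_plan(plan: dict, root: Path = Path("/sys/fs/cgroup")) -> list[Path]:
    """Write each value in the plan to its cgroup interface file under root."""
    written = []
    for cgroup, knobs in plan.items():
        for name, value in knobs.items():
            path = root / cgroup / name
            path.write_text(value + "\n")
            written.append(path)
    return written
```

The point of keeping both knobs in one plan is the lesson from the demo: the main workload gets a memory floor and a larger share of IO, so the auxiliary slice is the one that gets throttled under contention.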
We have worked on resource control in our fleet for many years, improving various cgroup features, developing powerful tools and refining best practices. We have also tried very hard to keep resource control methods as generic and robust as possible so that they can easily be applied to systems in any domain, not just specialized server setups.
The resctl-demo program contains comprehensive explanations with live demo scenarios on various aspects of resource control. To make testing easier, we’ve provided an Amazon Machine Image (AMI) that can be used to launch a fully configured instance on AWS, and a USB drive image that can be installed on a local machine. You can access the pre-made system images and the documentation of various demo scenarios at the Resource Control Demo site.
Watch a demo of the web server problem and its solution in the following video: