Talk about technical debt. About two decades ago, processors sold by AMD needed a workaround to help support a then-new configuration standard called ACPI, or the Advanced Configuration and Power Interface. And that workaround turned into a major headache in the modern day.
On Linux circa 2002, AMD hardware needed a little help to support ACPI, so a technical workaround called a “dummy wait op” was added: an otherwise useless extra read that stalls the CPU for a moment after it asks to go idle, giving the chipset time to actually put it to sleep before more instructions run.
That was what it took to make single-core Athlon processors work well with Linux in 2002. But we now live in a world where even low-end processors have a whole bunch of cores and can safely handle issues like this without the workaround. AMD just announced the seventh generation of Ryzen, and it, along with modern generations of EPYC and Threadripper, is still being held up by this dummy wait op, even though the underlying issue has not been a technical problem for these processor lines in years. While 2002-era Athlons had legitimate technical reasons to need it, we are not in 2002 anymore, and that “dummy wait op” is actually holding the chips back in certain cases.
As AMD engineer Prateek Nayak put it in a patch submission:
However, sampling certain workloads with IBS on AMD Zen3 system shows that a significant amount of time is spent in the dummy op, which incorrectly gets accounted as C-State residency. A large C-State residency value can prime the cpuidle governor to recommend a deeper C-State during the subsequent idle instances, starting a vicious cycle, leading to performance degradation on workloads that rapidly switch between busy and idle phases.
Put another way: depending on the workload, AMD chips have been getting slowed down significantly because of a presumption that was true for only a short time but has been treated as true in the kernel code for about two decades.
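To make that feedback loop a little more concrete, here is a toy simulation in Python. It is only a sketch, not kernel code: the C-state table, the averaging rule, and every name and number in it are hypothetical, chosen to show how time misattributed to idle can nudge a governor toward a deeper state than the real idle periods justify.

```python
# Toy model of the feedback loop described in the patch. NOT kernel code.
# All names, thresholds, and the averaging rule are hypothetical.

# (name, target_residency_us): a state is only worth entering if the CPU
# is expected to stay idle at least this long.
C_STATES = [("C1", 2), ("C2", 20), ("C3", 200)]


def pick_state(predicted_idle_us):
    """Return the deepest state whose target residency fits the prediction."""
    chosen = C_STATES[0][0]
    for name, target_residency_us in C_STATES:
        if target_residency_us <= predicted_idle_us:
            chosen = name
    return chosen


def simulate(true_idle_us, dummy_op_us, rounds=5):
    """Run a few idle periods and record which state the governor picks."""
    predicted = true_idle_us  # start with an accurate prediction
    picks = []
    for _ in range(rounds):
        picks.append(pick_state(predicted))
        # The accounting error: time burned in the dummy op is counted
        # as if the CPU had been resting in the C-state.
        measured = true_idle_us + dummy_op_us
        # Simple running average standing in for the governor's predictor.
        predicted = (predicted + measured) / 2
    return picks


if __name__ == "__main__":
    print("no dummy op:  ", simulate(true_idle_us=10, dummy_op_us=0))
    print("with dummy op:", simulate(true_idle_us=10, dummy_op_us=30))
```

In this sketch the real idle periods never change, yet once the dummy op's time is folded into the measurement, the pretend governor drifts from the shallow state to a deeper one and stays there. The real cpuidle governors are far more sophisticated, but the shape of the problem is the same.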
Dave Hansen, an Intel employee who works closely on the Linux kernel, noted in his patch addition to the kernel that most modern systems do not actually need this workaround—and that Intel hasn’t needed it for years.
“First and foremost, modern systems should not be using this code. Typical Intel systems have not used it in over a decade because it is horribly inferior to MWAIT-based idle,” he wrote. “Despite this, people do seem to be tripping over this workaround on AMD system today.”
Now, to be fair, Linux code is a complex beast, and it is often difficult to assess where legacy code may actually be hurting performance in certain cases. In this particular case, it took AMD a while to assess exactly where the holdup was coming from, and then what would be necessary to fix it.
Linux is great because of its broad support of platforms new and old, but the challenge is that fixes intended for very old systems can come out of the woodwork and create problems for new ones. This is a 30-year-old code base, and sometimes getting it into fighting shape in 2022 means managing the technical debt of two decades ago.
Anyway, for AMD owners: Enjoy your faster performance on version 6.0 of the Linux kernel.