First Gen AMD Ryzen Kernel Freezing Bug
Image by enriquelopezgarre from Pixabay
The Problem
I have been experiencing hard freezes on my first-gen Ryzen 7 1700X system, which has been very annoying. This is the kind of freeze where I can’t even drop into a virtual terminal or use Alt+SysRq.
This happens on both my Debian 10 and Ubuntu 19.04 installs, which run kernel versions 4.19 and 5.0.0 respectively. The freezes seemed almost random, but they happened most often when the CPU was idle and not doing much.
I assumed a bug in my motherboard’s firmware/BIOS or in the kernel was causing such a horrible lockup, so I did some digging online. I found this post on the Level1Techs forum describing more or less the same problem I was facing.
I also found posts on AMD’s website, AMD’s subreddit, and kernel.org.
It was clear that I was not alone in hitting this bug, which appears to be related to CPU power states.
Possible Fixes / Workarounds
Fix 1: Disable C-States With Kernel Parameters
On a distro that uses GRUB, edit /etc/default/grub and add the following options to the GRUB_CMDLINE_LINUX line (if it already contains options, append these inside the same quotes):
GRUB_CMDLINE_LINUX="processor.max_cstate=1 idle=nomwait rcu_nocbs=0-n"
Here “n” is the number of logical CPUs (threads) minus one. If you are unsure how many logical CPUs your system has, run nproc.
After that, run update-grub (or grub-mkconfig -o /boot/grub/grub.cfg) as root and reboot your system; the changes should then be in place.
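Because “n” depends on your CPU, here is a minimal sketch that builds the line for you. It assumes Linux (len(os.sched_getaffinity(0)) roughly mirrors what nproc reports) and only prints the line; it does not edit /etc/default/grub. The helper names rcu_nocbs_range and grub_cmdline are hypothetical.

```python
import os


def rcu_nocbs_range(cpu_count: int) -> str:
    """Return the CPU list "0-n", where n = cpu_count - 1."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    # A single CPU is written as just "0".
    return "0" if cpu_count == 1 else f"0-{cpu_count - 1}"


def grub_cmdline(cpu_count: int) -> str:
    """Build the GRUB_CMDLINE_LINUX line with the three workaround options."""
    params = [
        "processor.max_cstate=1",
        "idle=nomwait",
        f"rcu_nocbs={rcu_nocbs_range(cpu_count)}",
    ]
    return 'GRUB_CMDLINE_LINUX="' + " ".join(params) + '"'


if __name__ == "__main__":
    try:
        # CPUs this process may run on, which is what `nproc` counts on Linux.
        count = len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is unavailable on some platforms (e.g. macOS).
        count = os.cpu_count() or 1
    print(grub_cmdline(count))
```

On an 8-core/16-thread 1700X, nproc reports 16, so this prints GRUB_CMDLINE_LINUX="processor.max_cstate=1 idle=nomwait rcu_nocbs=0-15". If your existing line already has options, merge them by hand.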
Explanation
Adding ‘processor.max_cstate=1’ keeps your CPU out of its deeper idle (sleep) states, which seem to be the cause of the lockups on these early Ryzen CPUs. I have had success with this option. There are no guarantees it will work for you, but I recommend trying it.
Adding ‘rcu_nocbs=0-n’ offloads RCU callback processing for CPUs 0 through n from softirq (software interrupt) context to dedicated kernel threads. Note that this only works if your kernel was built with CONFIG_RCU_NOCB_CPU=y; I think this is the case with most distros. Read more about RCU here.
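To check whether your kernel was built with that option, you can look at its build config. A minimal sketch, assuming the config lives at /boot/config-$(uname -r) (as on Debian and Ubuntu) or at /proc/config.gz (only present when the kernel has CONFIG_IKCONFIG_PROC); the function names are hypothetical.

```python
import gzip
import os
import platform
from typing import Optional


def config_has_option(config_text: str, option: str) -> bool:
    """Return True if `option` is set to y in kernel .config text."""
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set", so they never match.
        if line.startswith(prefix):
            return line[len(prefix):] == "y"
    return False


def read_running_kernel_config() -> Optional[str]:
    """Read the running kernel's config, or return None if it cannot be found."""
    # platform.release() returns the same string as `uname -r`.
    boot_cfg = f"/boot/config-{platform.release()}"
    if os.path.exists(boot_cfg):
        with open(boot_cfg, encoding="utf-8") as f:
            return f.read()
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt", encoding="utf-8") as f:
            return f.read()
    return None


if __name__ == "__main__":
    config = read_running_kernel_config()
    if config is None:
        print("Kernel config not found; cannot tell.")
    elif config_has_option(config, "CONFIG_RCU_NOCB_CPU"):
        print("CONFIG_RCU_NOCB_CPU=y: rcu_nocbs should take effect")
    else:
        print("CONFIG_RCU_NOCB_CPU is not enabled: rcu_nocbs will be ignored")
```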
Adding ‘idle=nomwait’ will, in the kernel documentation’s words, “Disable mwait for CPU C-states”, which is another possible mitigation for the issue.
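After rebooting, the running kernel’s command line is exposed in /proc/cmdline, so you can confirm the options were actually picked up. A minimal sketch; the parser is a simplification (it ignores quoted values containing spaces), the function names are hypothetical, and the expected values below assume a 16-thread CPU.

```python
from typing import Dict, List, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {key: value}; bare flags map to None.

    If a key appears more than once, the last occurrence wins here.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(cmdline: str, wanted: Dict[str, str]) -> List[str]:
    """Return the keys in `wanted` that are absent or have a different value."""
    params = parse_cmdline(cmdline)
    return [key for key, value in wanted.items() if params.get(key) != value]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline", encoding="utf-8") as f:
            current: Optional[str] = f.read()
    except OSError:
        current = None
        print("Could not read /proc/cmdline (is this Linux?)")
    if current is not None:
        # Assumed values for a 16-thread CPU such as the 1700X.
        wanted = {"processor.max_cstate": "1", "idle": "nomwait", "rcu_nocbs": "0-15"}
        missing = missing_params(current, wanted)
        print("All parameters active" if not missing else f"Missing or different: {missing}")
```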
Fix 2: Disable C-States With A Python Script
GitHub user r4m0n made a handy Python script that lets you check or change the C-states on your CPU. It is a good option if you don’t want to disable C-states in the BIOS, and it is also useful for confirming that C-states are actually disabled.
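I won’t try to reproduce r4m0n’s script here. As a different, read-only check, the generic Linux cpuidle sysfs interface under /sys/devices/system/cpu/cpuN/cpuidle/ lists the idle states the kernel’s idle driver uses for each CPU, whether each one is disabled, and how many times it has been entered. This is a sketch under that assumption; it shows the kernel’s view only, not the BIOS settings, and read_idle_states is a hypothetical name. With processor.max_cstate=1 in effect you would expect to see only shallow states listed.

```python
from pathlib import Path
from typing import List, Tuple

CPU_ROOT = Path("/sys/devices/system/cpu")


def read_idle_states(cpu_dir: Path) -> List[Tuple[str, str, str]]:
    """Return (name, disable flag, usage count) for each cpuidle state of one CPU."""
    cpuidle = cpu_dir / "cpuidle"
    if not cpuidle.is_dir():
        # No cpuidle driver registered (or not running on Linux).
        return []
    states = []
    # Sort numerically so that state10 comes after state9.
    for state in sorted(cpuidle.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        disabled = (state / "disable").read_text().strip()
        usage = (state / "usage").read_text().strip()
        states.append((name, disabled, usage))
    return states


if __name__ == "__main__":
    for name, disabled, usage in read_idle_states(CPU_ROOT / "cpu0"):
        print(f"{name:8} disabled={disabled} usage={usage}")
```

Writing 1 to a state’s disable file as root turns that state off at runtime; the sketch deliberately only reads.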
Fix 3: Disable C-States and Idle Power Settings In BIOS/UEFI
Other suggested fixes involve BIOS/UEFI settings, such as turning off C-states (power states) for the CPU and changing the idle power setting for the PSU, if your board offers one.
These options should be under a menu called “AMD CBS”.
System Specs
For reference, here are the specs of my machine:
Kernel: 4.19.0-5-amd64 x86_64
Distro: Debian GNU/Linux 10 (buster)
Motherboard: ROG STRIX B350-F GAMING
CPU: AMD Ryzen 7 1700X
Memory: 32 GB
Graphics: AMD Radeon RX 480 (POLARIS10, DRM 3.27.0, 4.19.0-5-amd64, LLVM 7.0.1)
OpenGL: 4.5 Mesa 18.3.6
Driver: amdgpu
If none of these workarounds help, some users have reported success after RMAing their CPUs and receiving newer ones.
November 2020 Update
Looking at recent kernel bug reports, this seems to be an ongoing CPU firmware bug that still affects first-gen Ryzen CPUs on Linux.
I have been running Arch on the same hardware, with kernel version 5.9.4 as of this writing.
With processor.max_cstate=1 set, I haven’t had a lockup in a heck of a long time, but I have seen others say that it doesn’t work for them.
I can also confirm that the CPU has no such issue under Microsoft Windows 10. It seems like a bug somewhere in how Linux handles the microcode/firmware power management on AMD CPUs. AMD has also not given a definitive answer or fix yet, which is unfortunate.