I’ve had a very boring problem for the last couple months, that I could never find the time to diagnose, till two days ago it finally got over my head. My computer would, sometimes, with no apparent reason, refuse to suspend (or actually, it would begin and then, after twenty seconds, interrupt the suspend procedure, breaking all my internet connections, and making the CPU and fans overwork).
This has been going on for a while, and even though I thought that it was linked to a VM I was working on in VirtualBox, I had no clue how to diagnose. Actually, the problem came from defunct processes trying to read a SSHFS share that was a directory in the VM. When the VM would reboot or be shut down, the SSHFS share became invalid. Having a cp process (and probably others like Thunar and ls) trying to connect to it would suffice to trigger the bug. The process would hang up, and killing it would often result in it being stuck as defunct (don’t ask me why, I don’t have the faintest idea).
So, how did this prevent Suspend2RAM from suspending my computer? Well, Suspend2RAM asks applications doing I/O activities to freeze, and it will not suspend if one of the tasks did not answer within 20 seconds, writing this message in dmesg instead :
Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
This is because the defunct process, that is still considered as currently doing I/O, is of course not able to respond to a signal. Now, this bug is annoying from a layman’s point of view and it’s pretty hard to figure out why such a behaviour can not be avoided. As far as I’m concerned, two design mistakes made this possible: