##############################
# Blocked State Can't Reboot #
##############################

Processes in a blocked state (D state in “ps” output) are in uninterruptible sleep, usually waiting for IO from slow or unresponsive media (drives etc). There is no way to make a blocked process return and no way to kill it; you just have to wait it out (which can take a very long time, sometimes forever). So the only sure way to end a blocked state is a reboot. When the system comes back, avoid the commands that got you into the blocked state in the first place (or better, figure out why they blocked and fix that issue; a common cause of D states on BTRFS systems is fragmentation on the volume). However, sometimes even the reboot command itself hangs. In that case you can force a reboot through the kernel using sysrq-trigger.

First make sure it's safe to reboot and that it won't hurt data (if it will, make sure the data is backed up and the customer knows the consequences). Then try these simple reboots:

reboot
shutdown -r now
reboot -f

If the above fails, here is the fun stuff. First we analyze the system to see why it's hung and where it's hung:

ps aux # list all processes; D in the STAT column means blocked
dmesg > /root/dmesg.bak # backup dmesg
dmesg -c # print and clear dmesg
dmesg # make sure dmesg is cleared
echo w > /proc/sysrq-trigger # dump a call trace for every task in blocked (D) state
dmesg # look at the output
dmesg -c # print and clear dmesg
dmesg # confirm dmesg is cleared
echo l > /proc/sysrq-trigger # dump a backtrace for all active CPUs, showing what they are running right now
dmesg # look at the output

Analyze what's stuck. Most likely it will be BTRFS related. Sometimes the CPU backtrace doesn't show anything useful, because the CPU is just waiting on IO and is not executing any BTRFS code.
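
If the ps output is long, a small filter helps pull out just the blocked processes. This is only a minimal sketch: filter_d_state is a made-up helper name, and it assumes the STAT field is the second column, which holds for the ps -eo format in the usage comment (not for plain ps aux).

```shell
# Keep the header line plus every process whose STAT field starts with D.
# Reads ps-style output on stdin, so it also works on saved output.
# Assumption: STAT is the second column of the input.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage (wchan shows the kernel function the task is sleeping in):
#   ps -eo pid,stat,wchan:32,args | filter_d_state
```

The wchan column is often the quickest hint about where a task is stuck, before you even look at the sysrq call traces.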

Let's say you're trying to reboot but your reboot commands are not working.

Try this forced reboot (IGNORE THIS: don't forget to change the “default …” line in the boot-flash syslinux.cfg file to “default TechSupport” or to “default Normal” depending on what you want. TechSupport boots to TSM; Normal boots to Normal mode, which is the default behavior. Warning: do not leave an extra space at the end of the default line or else it will not work, and make sure you have no typos, or you might brick the box. You can confirm that a newline ‘\n’ character comes immediately after your Normal or TechSupport by running “od -c syslinux.cfg” and looking for something like this “d e f a u l t N o r m a l \n \n”)

echo b > /proc/sysrq-trigger # this forces a reboot without flushing/syncing or unmounting anything (I recommend using this only if filesystem errors are not going to matter, for example if you're going to do a factory default right afterwards)

To do a more conventional safe reboot you can do this:

r - unRaw (take control of the keyboard back from X11),
e - tErminate (send SIGTERM to all processes except init, allowing them to terminate gracefully),
i - kIll (send SIGKILL to all processes except init, forcing them to terminate immediately),
s - Sync (flush data to disk),
u - Unmount (remount all filesystems read-only),
b - reBoot.

On servers that don't use X and usually don't have a monitor attached (devices mainly controlled through the console or ethernet, such as NASes), I don't recommend running r, because there is no keyboard on these console devices. Also don't run e and i, because they will kill your processes (including, I assume, the SSH and telnet processes you are connected through).

Instead just run s, u, b:

echo s > /proc/sysrq-trigger # sync everything
echo u > /proc/sysrq-trigger # remount all filesystems read-only
echo b > /proc/sysrq-trigger # reboot forcefully
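
If you would rather not type the three echo lines by hand, the same s, u, b sequence can be wrapped in a small function. Treat this as a sketch only: sysrq_seq, SYSRQ_TRIGGER and SYSRQ_DELAY are made-up names, and the fixed pause is a crude stand-in for watching dmesg until the sync and remount are reported as done.

```shell
# Write each SysRq key in turn, pausing between keys so the sync and
# the read-only remount get some time before the final reboot.
# SYSRQ_TRIGGER and SYSRQ_DELAY are hypothetical overrides for dry runs.
sysrq_seq() {
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
    delay="${SYSRQ_DELAY:-5}"
    for key in "$@"; do
        echo "sysrq: $key"
        echo "$key" > "$trigger" || return 1
        sleep "$delay"
    done
}

# Dry run against a scratch file (nothing is rebooted):
#   SYSRQ_TRIGGER=/tmp/sysrq.test SYSRQ_DELAY=0 sysrq_seq s u b
# Real run, as root (the box reboots on the last key):
#   sysrq_seq s u b
```

Doing a dry run first is cheap insurance against a typo in the key list, since a stray b in the wrong position reboots the box before the sync has run.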

If all else fails, reboot using the power button or by pulling the power cables.

More docs:
https://www.kernel.org/doc/Documentation/sysrq.txt
http://www.thegeekstuff.com/2008/12/safe-reboot-of-linux-using-magic-sysrq-key/
https://en.wikipedia.org/wiki/Magic_SysRq_key
