Kernel Crash Dumps
This article explains how to capture kernel crash dumps (also known as kdumps). A kdump is produced when the kernel panics or locks up. For simplicity, a single kernel is used both for the ordinary system and for recovery. The described method is almost distribution independent.
This article is based on KDump on Gentoo by Richard Freeman (rich0); the first version was posted by the author.
Installation
Kernel
Activate the following kernel options:
CONFIG_KEXEC, CONFIG_CRASH_DUMP, CONFIG_RELOCATABLE
CONFIG_DEBUG_KERNEL, CONFIG_DEBUG_INFO
CONFIG_PROC_FS, CONFIG_PROC_KCORE, CONFIG_PROC_VMCORE
Processor type and features --->
    [*] kexec system call
    [*] kernel crash dumps
    [*] Build a relocatable kernel
Kernel hacking --->
    [*] Kernel debugging
    Compile-time checks and compiler options --->
        [*] Compile the kernel with debug info
File systems --->
    Pseudo filesystems --->
        -*- /proc file system support
        [*] /proc/kcore support
        [*] /proc/vmcore support
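Before rebuilding, it can help to check an existing configuration against the option list above. The following is a minimal sketch, not part of the original instructions: the function name check_kdump_config is hypothetical, and the path in the usage note assumes the kernel sources live in /usr/src/linux.

```shell
#!/bin/bash
# Print every option from the list above that is not set to "y" in the
# given kernel config file. Returns 0 if all are enabled, 1 otherwise.
# check_kdump_config is a hypothetical helper name.
check_kdump_config() {
    local config="$1"
    local missing=0
    local opt
    for opt in KEXEC CRASH_DUMP RELOCATABLE DEBUG_KERNEL DEBUG_INFO \
               PROC_FS PROC_KCORE PROC_VMCORE; do
        # Anchoring on "=y" avoids matching longer names such as
        # CONFIG_DEBUG_INFO_<something>.
        if ! grep -q "^CONFIG_${opt}=y" "$config"; then
            echo "missing: CONFIG_${opt}"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be called as check_kdump_config /usr/src/linux/.config; no output means every listed option is enabled.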
CONFIG_PHYSICAL_START might need to be set greater than 2 MB (0x200000) on some motherboards to offset the kernel's memory space enough to avoid being clobbered by the BIOS. Try setting 0x1000000 (16 MB) if the above kernel options are not working as expected.
USE flags
USE flags for sys-apps/kexec-tools (Load another kernel from the currently executing Linux kernel)
booke | Include support for Book-E memory management
lzma | Enables support for LZMA compressed kernel images
selinux | !!internal use only!! Security Enhanced Linux support, this must be set by the selinux profile or breakage will occur
xen | Enable extended xen support
zlib | Add support for zlib compression
Emerge
Merge:
root # emerge --ask sys-apps/kexec-tools
Configuration
local.d script
Create /etc/local.d/kdump.start containing:
#!/bin/bash
kexec -p /[path-to-kernel] --append="root=[root-device] single irqpoll maxcpus=1 reset_devices"
Your system may require core headers in ELF32 or ELF64 format for the crash kernel to boot. Check the kexec(8) manpage for details.
When using an initramfs, a reference to it needs to be passed as a parameter. For example:
#!/bin/bash
kexec -p /boot/kernel-genkernel-x86_64-3.16.1-gentoo \
--initrd=/boot/initramfs-genkernel-x86_64-3.16.1-gentoo \
--append="root=/dev/mapper/lvm-slash single irqpoll maxcpus=1 reset_devices dolvm softlevel=kdump"
Now make this file executable:
root # chmod u+x /etc/local.d/kdump.start
Note that the kernel image has to be readable when the script runs. A typical Gentoo configuration leaves /boot unmounted, so either remove noauto from the /boot entry in /etc/fstab or place a copy of the kernel in a location that is mounted at that point.
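The readability requirement can be checked inside the script before kexec is called, so that an unmounted /boot produces a clear message instead of a failed load. This is only a sketch; require_readable_kernel is a hypothetical helper name, and the path in the trailing comment is the same placeholder used in the script above.

```shell
#!/bin/bash
# Succeed only if the given kernel image can be read by the current user.
# require_readable_kernel is a hypothetical helper name.
require_readable_kernel() {
    if [ -r "$1" ]; then
        return 0
    fi
    echo "kdump: cannot read kernel image '$1' (is /boot mounted?)" >&2
    return 1
}

# Possible use in /etc/local.d/kdump.start (placeholder path as above):
# require_readable_kernel /[path-to-kernel] && kexec -p /[path-to-kernel] ...
```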
Bootloader
Add the crashkernel=64M nokaslr arguments to the kernel command line via the bootloader (most likely GRUB); 64M is meant for systems with up to around 12 GB of RAM. nokaslr disables the KASLR security feature. You can omit this option, but then you will have to manually load symbols from all kernel sections in gdb because the kernel location is randomized.
Usage
First, run the above script:
root # /etc/local.d/kdump.start
This loads the rescue kernel image, which is run after a kernel crash.
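Whether the memory reservation and the load actually took effect can be read back from sysfs. The sketch below assumes the kernel exposes /sys/kernel/kexec_crash_loaded and /sys/kernel/kexec_crash_size; the function name kdump_status and its optional directory argument are hypothetical and exist only to make the check easy to try out.

```shell
#!/bin/bash
# Print the size of the crashkernel reservation and whether a crash kernel
# is currently loaded. Returns 0 only if one is loaded.
# kdump_status is a hypothetical helper; $1 overrides the sysfs directory.
kdump_status() {
    local sysdir="${1:-/sys/kernel}"
    local loaded
    local size
    # Fall back to 0 if a file is missing (e.g. kernel built without kexec).
    loaded=$(cat "$sysdir/kexec_crash_loaded" 2>/dev/null || echo 0)
    size=$(cat "$sysdir/kexec_crash_size" 2>/dev/null || echo 0)
    echo "reserved bytes: $size"
    if [ "$loaded" = "1" ]; then
        echo "crash kernel: loaded"
        return 0
    fi
    echo "crash kernel: not loaded"
    return 1
}
```

A reserved size of 0 suggests the crashkernel= argument did not take effect; with crashkernel=64M the reported value would be expected to be 67108864.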
Whenever a kernel panic or lockup (hard or soft, if the kernel is set to detect them) occurs, kexec runs the kernel in crash mode, relocated to a reserved area of memory. The rest of RAM is left untouched. When the system boots up, log in and copy /proc/vmcore to a file; this is the crash dump. Then reboot the system to get back to a normal configuration; the system might not be stable and should not continue to operate in this state.
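The copy step could be wrapped in a small helper that gives each dump a timestamped name. This is a sketch under assumptions: save_vmcore is a hypothetical name, /var/crash is an arbitrary default destination, and the target filesystem must be mounted and have enough free space, since the dump can be as large as the system's RAM.

```shell
#!/bin/bash
# Copy a crash dump to a timestamped file and print the resulting path.
# save_vmcore is a hypothetical helper; /var/crash is an assumed default.
save_vmcore() {
    local src="${1:-/proc/vmcore}"
    local destdir="${2:-/var/crash}"
    local dest
    if [ ! -r "$src" ]; then
        echo "kdump: no readable dump at $src" >&2
        return 1
    fi
    mkdir -p "$destdir" || return 1
    dest="$destdir/vmcore-$(date +%Y%m%d-%H%M%S)"
    # Flush to disk before reporting success, since a reboot follows.
    cp "$src" "$dest" && sync && echo "$dest"
}
```

Run without arguments in the crash kernel, it would copy /proc/vmcore into /var/crash.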
A kernel panic can be forced on demand by executing the following command (do not forget to save all data, log out other users, and leave the filesystems in a clean state by invoking the sync command before doing this):
root # echo c | tee /proc/sysrq-trigger
Troubleshooting
Kernel is not loading
If the kernel is not loading when kexec is called, check to see whether kernel compression was set to xz (lzma) format.
If xz compression is used, the sys-apps/kexec-tools package will need to be re-emerged with the lzma USE flag enabled.
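The compression setting can be looked up in the kernel configuration. The sketch below assumes the relevant symbols are CONFIG_KERNEL_XZ and CONFIG_KERNEL_LZMA; kernel_needs_lzma_flag is a hypothetical helper name, and the package.use file named in the trailing comment is only one possible location.

```shell
#!/bin/bash
# Succeed if the given kernel config selects xz or lzma image compression.
# kernel_needs_lzma_flag is a hypothetical helper name.
kernel_needs_lzma_flag() {
    grep -Eq '^CONFIG_KERNEL_(XZ|LZMA)=y' "$1"
}

# Possible follow-up, assuming /etc/portage/package.use is a directory:
# kernel_needs_lzma_flag /usr/src/linux/.config &&
#     echo "sys-apps/kexec-tools lzma" >> /etc/portage/package.use/kexec-tools
# emerge --ask --oneshot sys-apps/kexec-tools
```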
VGA not resetting
After a crash kernel has been loaded with kexec and a kernel panic occurs, kexec does not appear to boot the crash kernel; the output on the display freezes.
This might be caused by the VGA port not being reset. The solution may be to tell kexec to reset the display output on the VGA port. Something like the following could work (the important options being --reset-vga --console-vga):
root # kexec -p /boot/kernel-gentoo --initrd=/boot/initramfs-gentoo --reset-vga --console-vga --command-line="root=/dev/sda3 maxcpus=1 irqpoll"