Linux Kernel Crash Dumps

Matt D. Robinson and Tom Morano, Silicon Graphics Computer Systems. Topics: objectives, LKCD components, kernel design considerations, initiating dumps, dumping hooks and execution, dump layout, dump tunables, and an introduction to LCRASH.

  1. Linux Kernel Crash Dumps
     Matt D. Robinson and Tom Morano, Silicon Graphics Computer Systems

  2. Contents
     • Objectives
     • LKCD Components
     • Kernel Design Considerations
     • Kernel Initiating Dumps
     • Kernel Dumping Hooks/Execution
     • Dump Initiation Code/Layout
     • Dump Tunables
     • Introduction to LCRASH

  3. Objectives
     • LKCD was created for Linux customers, support personnel, and Linux kernel engineers
     • LKCD helps increase MTBF (mean time between failures) and reduce MTTR (mean time to repair)
     • Kernel problems are resolved more quickly
     • As the Linux kernel becomes more complex, the need for LKCD increases

  4. LKCD Components
     • Kernel changes to configure dumping, catch kernel failures, and save crash dumps
     • User-level scripts to configure the system for crash dumps and to save system memory to a crash dump file
     • LCRASH, the kernel crash dump analyzer

  5. Kernel Design Considerations
     The biggest design considerations were:
     • Dump save mechanism
     • Raw I/O vs. buffer cache I/O
     • Kernel code location
     • Dump storage
     NOTE: Other crash dump products available for Linux may use different dumping methods than those described here.

  6. Kernel Design Considerations: Dump Save Mechanism
     [Figure: two ways to save a dump. Kernel method: after a crash, the kernel saves memory to swap space on disk, then the system is reset. PROM method: after a crash, the system is reset and the PROM/BIOS saves memory to swap space on disk.]

  7. Kernel Design Considerations
     The kernel save method was chosen because:
     • PROM/BIOS code is too architecture-specific
     • A reset or power-off may clear memory
     • Kernel disk driver restrictions
     • No disk-to-filesystem validation is available in the PROM
     • Code can be modified in the kernel; PROM code is difficult to change (backward-compatibility issues)

  8. Kernel Design Considerations: Raw I/O vs. Buffer Cache I/O
     • Because of buffer cache locking, a dump workaround cannot be implemented without a major performance hit on basic I/O
     • Interrupt re-entry locking problem
     • Raw I/O is not yet fully supported in the Linux kernel; the kiobuf code needs more work
     • IDE, RAID, and other drivers need raw I/O hooks (the current plan is to create a driver layer above them to avoid the locking they would otherwise require)

  9. Kernel Design Considerations: Kernel Code Location
     • Code changes are separated into generic and architecture-specific files:
       • kernel/vmdump.c
       • arch/<arch>/kernel/vmdump.c
     • Additional modifications were made to include/linux/sysctl.h, kernel/sysctl.c, and the kernel crash hook functions

  10. Kernel Design Considerations: Dump Storage
      • Memory dumps are saved to swap space
      • Swapping during boot-up is an issue: if swapping starts before the dump is saved, the dump in swap space can be overwritten
      • Disk partition tables are held in memory: could this cause a data corruption problem?
      • The filesystem layer cannot be assumed to be available during a crash

  11. Kernel Initiating Dumps: Initiating the Dump Process
      • Writing to /proc/sys/vmdump/dumpdev calls dump_open() in the kernel
      • dump_open() checks that the specified device:
        • is a block device
        • points to a valid swap partition
        • has a valid character device file_operations table (currently SCSI only, due to the lack of raw I/O capability for IDE disks)
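
The first two checks can be approximated from user space. The C sketch below is an illustration under stated assumptions, not LKCD's dump_open(): it treats a partition as valid swap if the last 10 bytes of its first page hold the standard Linux swap signature ("SWAPSPACE2", or the older "SWAP-SPACE"), it skips the character-device file_operations check, which only the kernel can perform, and the helper names is_block_device(), has_swap_signature(), and dump_device_usable() are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: 1 if path names a block device, 0 otherwise. */
int is_block_device(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;                          /* missing or unreadable: not usable */
    return S_ISBLK(st.st_mode) ? 1 : 0;
}

/* Hypothetical helper: 1 if the last 10 bytes of the first page of path
 * hold a Linux swap signature, 0 otherwise. */
int has_swap_signature(const char *path, long page_size)
{
    char sig[10];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    int ok = fseek(f, page_size - 10, SEEK_SET) == 0 &&
             fread(sig, 1, sizeof sig, f) == sizeof sig;
    fclose(f);
    if (!ok)
        return 0;
    return memcmp(sig, "SWAPSPACE2", 10) == 0 ||   /* version 1 swap */
           memcmp(sig, "SWAP-SPACE", 10) == 0;     /* version 0 swap */
}

/* Both checks, in the order listed on the slide. */
int dump_device_usable(const char *path, long page_size)
{
    return is_block_device(path) && has_swap_signature(path, page_size);
}
```

On a target system this might be called as dump_device_usable("/dev/sda3", 4096); /dev/sda3 is only an example path, and reading a raw device normally requires root.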

  12. Kernel Initiating Dumps
      • Errors in dump_open() are logged to the system log buffer
      • Changes are needed for 2.3 (without devfs) because the block and character major/minor pairs for the same disk device do not match
      • On success, dump_open() displays:
        dump_open(): dump device opened: 0x803 [sd(8,3)]
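
The number 0x803 in that message is a packed device number. In kernels of this era a device number was 16 bits wide, with the major number in the high byte and the minor number in the low byte, so 0x803 is major 8 (the SCSI disk driver, sd) and minor 3 (/dev/sda3, the third partition of the first SCSI disk). A minimal decoding sketch, assuming that 8-bit-minor layout (current kernels use a wider encoding); the helper names are hypothetical:

```c
#include <stdio.h>

/* Old-style device number: 8-bit major in the high byte, 8-bit minor in
 * the low byte (assumed to match the pre-2.6 kdev_t layout). */
#define OLD_MINORBITS 8u
#define OLD_MINORMASK ((1u << OLD_MINORBITS) - 1u)

unsigned old_major(unsigned dev) { return dev >> OLD_MINORBITS; }
unsigned old_minor(unsigned dev) { return dev & OLD_MINORMASK; }

/* Format a device number like the dump_open() message, e.g. "0x803 [sd(8,3)]".
 * The driver name is supplied by the caller. Returns snprintf's result. */
int format_dev(char *buf, size_t len, unsigned dev, const char *drv)
{
    return snprintf(buf, len, "0x%x [%s(%u,%u)]",
                    dev, drv, old_major(dev), old_minor(dev));
}
```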

  13. Kernel Dumping Hooks: Kernel Hooks for Executing a Crash Dump
      • panic() was modified to perform an SMP freeze and to call dump_execute()
      • die_if_kernel() or die() calls dump_execute() after KDB, GDB, and show_registers() are done
      • NMI (non-maskable interrupt) hooks are still needed for systems whose hardware supports NMIs

  14. Kernel Dumping Hooks: Kernel Hooks and Parameters
      • panic(): register state is not saved; the panic string is saved
      • die_if_kernel() or die(): registers are saved; the panic string is generic (for now)
      • Dumping from interrupt handlers must be distinguished from dumping when the I/O request lock is not held

  15. Kernel Dump Execution
      • dump_execute() checks whether dumping is turned on
      • If DUMP_NONE is set, it returns immediately
      • Otherwise, the architecture-specific __dump_execute() is called to save the dump
      • __dump_execute() saves the dump header values and the memory pages, then returns when complete
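
The control flow of dump_execute() can be modeled in a few lines of C. This is a hypothetical user-space sketch, not the LKCD source: the numeric dump levels follow the DUMP_LEVEL description later in this presentation (0 = DUMP_NONE, 1 = DUMP_HEADER, 4 = DUMP_ALL), the architecture-specific __dump_execute() is stood in for by a function pointer named arch_dump_execute, and the dump_ctx and regs structures are invented for illustration. As the hooks slides describe, the panic() path passes no register state and the die() path does.

```c
#include <stddef.h>
#include <stdio.h>

/* Dump levels as described on the DUMP_LEVEL slide. */
enum dump_level { DUMP_NONE = 0, DUMP_HEADER = 1, DUMP_ALL = 4 };

/* Hypothetical stand-in for saved register state. */
struct regs { unsigned long pc, sp; };

/* Hypothetical dump context. */
struct dump_ctx {
    enum dump_level level;     /* mirrors /proc/sys/vmdump/dump_level */
    const char *panic_str;     /* saved on both the panic() and die() paths */
    const struct regs *regs;   /* NULL on the panic() path */
    int dumps_taken;           /* how many calls reached the arch code */
};

/* Architecture-specific saver (header + pages), modeled as a pointer. */
typedef void (*arch_dump_fn)(struct dump_ctx *);

/* Demo saver: records the call instead of writing to a disk. */
static void demo_arch_dump(struct dump_ctx *ctx)
{
    ctx->dumps_taken++;
    printf("dump: level=%d regs=%s msg=%s\n", (int)ctx->level,
           ctx->regs ? "saved" : "none", ctx->panic_str);
}

/* Generic entry point: return at once if dumping is off, else delegate. */
void dump_execute(struct dump_ctx *ctx, arch_dump_fn arch_dump_execute,
                  const char *msg, const struct regs *regs)
{
    if (ctx->level == DUMP_NONE)
        return;
    ctx->panic_str = msg;
    ctx->regs = regs;
    arch_dump_execute(ctx);
}
```

Keeping the on/off check in generic code and the actual saving in the per-architecture routine mirrors the kernel/vmdump.c versus arch/<arch>/kernel/vmdump.c split described earlier.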

  16. Kernel Dump Layout
      [Figure: dump layout: Dump Header, followed by Dump Page Headers and Dump Pages]

  17. Kernel Dump Layout
      • The dump header is written first; it contains basic information about the dump
      • Memory pages are written next, each with a page header containing:
        • the virtual address of the page in memory
        • the size of the page (important if compressed)
        • page flags (compressed, raw, dump end)
      • Finally, the dump header is rewritten to update the total number of pages written
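
The three-step layout can be sketched with ordinary stdio calls, using a temporary file in place of the swap partition. The on-disk format here is hypothetical: the magic value, field widths, flag bits, and the separate dump-end record are invented for illustration and are not LKCD's actual dump header or page header definitions. Compression is omitted, so every page is written raw.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical page-header flag bits. */
#define DP_RAW        0x1u   /* page stored uncompressed */
#define DP_COMPRESSED 0x2u   /* page stored compressed (unused here) */
#define DP_END        0x4u   /* dump-end record: no data follows */

#define DH_MAGIC 0x4c4b4344u /* hypothetical magic value */

struct dump_header {
    uint32_t magic;
    uint32_t page_size;
    uint32_t num_pages;      /* 0 at first; filled in when rewritten */
};

struct dump_page_header {
    uint64_t vaddr;          /* virtual address of the page in memory */
    uint32_t size;           /* bytes of data that follow */
    uint32_t flags;          /* DP_RAW, DP_COMPRESSED, or DP_END */
};

/* Write npages pages from mem to f. Returns 0 on success, -1 on I/O error. */
int write_dump(FILE *f, const unsigned char *mem, uint64_t base_vaddr,
               uint32_t page_size, uint32_t npages)
{
    struct dump_header dh = { DH_MAGIC, page_size, 0 };
    if (fwrite(&dh, sizeof dh, 1, f) != 1)            /* 1: header first */
        return -1;

    for (uint32_t i = 0; i < npages; i++) {           /* 2: each page */
        struct dump_page_header ph = { base_vaddr + (uint64_t)i * page_size,
                                       page_size, DP_RAW };
        if (fwrite(&ph, sizeof ph, 1, f) != 1 ||
            fwrite(mem + (size_t)i * page_size, page_size, 1, f) != 1)
            return -1;
        dh.num_pages++;
    }

    struct dump_page_header end = { 0, 0, DP_END };   /* dump-end marker */
    if (fwrite(&end, sizeof end, 1, f) != 1)
        return -1;

    if (fseek(f, 0, SEEK_SET) != 0 ||                 /* 3: rewrite header */
        fwrite(&dh, sizeof dh, 1, f) != 1)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

Because the page count is only known at the end, a crash during the dump leaves a header with a stale count, which is one reason the limitations slide asks for the header to be written out more often.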

  18. Kernel Dump Limitations
      • Currently, crashes in interrupt context lock up on re-entry to the disk driver function
      • The dump header needs to be written out more often
      • Raw I/O capabilities need to be added to the kernel for more disk drivers (using kiobufs, scatter-gather lists, etc.)
      • Page typing is needed for ordered dumps
      • More architectures need to be supported

  19. Kernel Recovery of Crash Dump: Kernel Reboot After Crash
      • During early boot-up, the system runs the /etc/rc.d/rc.sysinit script, which in turn runs /sbin/vmdump
      • /sbin/vmdump runs with either the config or the save option:
        • config sets all dump tunables and attempts to open the dump device
        • save looks for a crash dump on the dump device and saves it to disk (if requested)

  20. Kernel /proc Tunables
      • /proc/sys/vmdump contains all LKCD kernel tunables
      • /proc/sys/kernel/panic is set so that the system reboots after LKCD creates a crash dump
      • dumpdev holds the name of the dump device
      • dump_compress_pages determines whether memory pages are compressed
      • dump_level indicates which pages to dump to disk (only three levels are currently supported)
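
Because each /proc/sys entry is a small text file, setting a tunable from user space comes down to writing its value as text. A minimal sketch; set_int_tunable() and get_int_tunable() are hypothetical helpers, and the /proc/sys/vmdump entries exist only on a kernel with the LKCD patches applied:

```c
#include <stdio.h>

/* Write an integer tunable (e.g. /proc/sys/vmdump/dump_level) as text.
 * Returns 0 on success, -1 if the entry cannot be opened or written. */
int set_int_tunable(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int rc = fprintf(f, "%ld\n", value) < 0 ? -1 : 0;
    if (fclose(f) != 0)    /* stdio buffers the write, so errors may only surface here */
        rc = -1;
    return rc;
}

/* Read an integer tunable back into *value. Returns 0 on success, -1 otherwise. */
int get_int_tunable(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int rc = fscanf(f, "%ld", value) == 1 ? 0 : -1;
    fclose(f);
    return rc;
}
```

For example, set_int_tunable("/proc/sys/kernel/panic", 5) would request a five-second panic timeout; writing under /proc/sys normally requires root.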

  21. Kernel Dump Tunables
      • /etc/sysconfig/vmdump holds all LKCD tunables (the /proc tunables are changed automatically):
        DUMP_ACTIVE=1
        DUMPDEV=/dev/vmdump
        DUMPDIR=/var/log/vmdump
        DUMP_SAVE=1
        DUMP_LEVEL=4
        DUMP_COMPRESS_PAGES=1
        PANIC_TIMEOUT=5
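
The file is a list of KEY=value lines. As an illustration only (this is not the /sbin/vmdump implementation), a C program could load it into a structure as follows; the struct, the function names, and the choice to skip '#' comments and unknown keys are assumptions:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory copy of /etc/sysconfig/vmdump. */
struct vmdump_config {
    int  dump_active;
    char dumpdev[256];
    char dumpdir[256];
    int  dump_save;
    int  dump_level;
    int  dump_compress_pages;
    int  panic_timeout;
};

/* Copy src into a fixed-size buffer, always NUL-terminating (truncates). */
static void copy_str(char *dst, size_t n, const char *src)
{
    snprintf(dst, n, "%s", src);
}

/* Parse KEY=VALUE lines; blank lines, '#' comments, and unknown keys are
 * skipped. Returns 0 on success, -1 if the file cannot be opened. */
int parse_vmdump_config(const char *path, struct vmdump_config *cfg)
{
    char line[512];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';          /* strip the newline */
        char *eq = strchr(line, '=');
        if (line[0] == '#' || eq == NULL)
            continue;
        *eq = '\0';                                  /* split key and value */
        const char *key = line, *val = eq + 1;
        if (strcmp(key, "DUMP_ACTIVE") == 0)              cfg->dump_active = atoi(val);
        else if (strcmp(key, "DUMPDEV") == 0)             copy_str(cfg->dumpdev, sizeof cfg->dumpdev, val);
        else if (strcmp(key, "DUMPDIR") == 0)             copy_str(cfg->dumpdir, sizeof cfg->dumpdir, val);
        else if (strcmp(key, "DUMP_SAVE") == 0)           cfg->dump_save = atoi(val);
        else if (strcmp(key, "DUMP_LEVEL") == 0)          cfg->dump_level = atoi(val);
        else if (strcmp(key, "DUMP_COMPRESS_PAGES") == 0) cfg->dump_compress_pages = atoi(val);
        else if (strcmp(key, "PANIC_TIMEOUT") == 0)       cfg->panic_timeout = atoi(val);
    }
    fclose(f);
    return 0;
}
```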

  22. Kernel Dump Tunables: DUMP_ACTIVE
      Determines whether the crash dump scripts perform any actions; the default value is 1 (active). Set it to 0 to neither save crash dumps nor configure the system for them.

  23. Kernel Dump Tunables: DUMPDEV
      The name of the dump device; typically /dev/vmdump.
      NOTE: Because /dev/vmdump is normally a symbolic link, it is recommended to change the device that /dev/vmdump points to rather than change this value directly.

  24. Kernel Dump Tunables: DUMPDIR and DUMP_SAVE
      DUMPDIR: The directory to save dumps to; typically /var/log/vmdump.
      DUMP_SAVE: Whether to save the crash dump to disk. The system is still configured to save crash dumps regardless of the value of DUMP_SAVE.

  25. Kernel Dump Tunables: DUMP_LEVEL
      Determines how much memory, if any, is saved in the crash dump. The default value is 4 (DUMP_ALL); other valid values include 0 (DUMP_NONE) and 1 (DUMP_HEADER). /sbin/vmdump config sets /proc/sys/vmdump/dump_level to the same value.

  26. Kernel Dump Tunables: DUMP_COMPRESS_PAGES
      Determines whether to compress memory pages when saving the memory image to disk. Defaults to 1 (compress). /sbin/vmdump config sets /proc/sys/vmdump/dump_compress_pages to the same value.

  27. Kernel Dump Tunables: PANIC_TIMEOUT
      The number of seconds to wait before resetting the system after a software failure. /sbin/vmdump config sets /proc/sys/kernel/panic to the same value.
      NOTE: This value should always be non-zero; if it is zero, the system will spin indefinitely until it is reset by hand.

  28. Kernel Dump Files
      • vmdump.N holds the crash dump data saved from DUMPDEV; it is a copy of the memory image at the time of the system crash
      • map.N is a copy of /boot/System.map
      • Both files are needed for crash analysis: addresses in map.N refer to data in vmdump.N, so if the files do not come from the same kernel build, the analysis may be inaccurate

  29. Introduction to LCRASH: Overview of LCRASH
      • Linux system crash dump analysis tool
      • Provides access to kernel data in LKCD crash dumps or in live system memory
      • Displays detailed information about a system crash
      • Can be used interactively or to generate system crash dump reports

  30. Introduction to LCRASH: LCRASH Crash Dump Report
      • General system information
      • Type of crash
      • Dump of the system log_buf
      • List of active tasks
      • Kernel stack trace showing the function calls leading up to the point of the crash

  31. Introduction to LCRASH: LCRASH Interactive Commands
      • Allow a more detailed examination of the elements of a crash
      • Display kernel data in a clear, easy-to-read manner
      • Are invoked through an ASCII command-line interface with command-line editing and command history
      • Command output can be piped to utilities such as more and grep

  32. Introduction to LCRASH: Examples of LCRASH Commands
      stat     Displays pertinent system information and the contents of the log_buf array
      vtop     Displays virtual-to-physical address mappings
      ptype    Displays arbitrary kernel structures from the crash dump
      symbol   Displays kernel symbol information
      dump     Displays the contents of system memory in a variety of bases and data sizes
      task     Displays key information for selected tasks or all tasks running on the system
      trace    Displays a kernel stack trace for one or more tasks
      dis      Disassembles one or more machine instructions

  33. Introduction to LCRASH: The libklib Library
      A library of low-level functions providing access to the system dump and the kernel symbol table:
      • Translate virtual addresses into physical addresses
      • Determine the addresses of kernel symbols
      • Access memory pages in the dump or in live system memory
      • Read in blocks of kernel data
      • Access kernel data type information

  34. Introduction to LCRASH: Accessing Kernel Symbol Information
      • The System.map file contains the virtual addresses of all kernel symbols (variables, functions, etc.)
      • LCRASH parses the System.map file at startup and builds an internal table of kernel symbols
      • Functions determine the address of a kernel symbol, or locate the symbol matching a particular address
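
Both lookups can be sketched in plain C: parse each "address type name" line of System.map, sort the table by address, then search by name, or binary-search for the last symbol at or below a given address. This is a generic illustration, not libklib's code; the fixed table size, the ksym structure, and the function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 4096                 /* hypothetical fixed table size */

struct ksym {
    unsigned long addr;
    char type;
    char name[64];
};

static struct ksym syms[MAX_SYMS];
static size_t nsyms;

static int cmp_addr(const void *a, const void *b)
{
    unsigned long x = ((const struct ksym *)a)->addr;
    unsigned long y = ((const struct ksym *)b)->addr;
    return (x > y) - (x < y);
}

/* Load "address type name" lines; returns the symbol count, or -1. */
long load_system_map(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    nsyms = 0;
    while (nsyms < MAX_SYMS &&
           fscanf(f, "%lx %c %63s", &syms[nsyms].addr, &syms[nsyms].type,
                  syms[nsyms].name) == 3)
        nsyms++;
    fclose(f);
    qsort(syms, nsyms, sizeof syms[0], cmp_addr);
    return (long)nsyms;
}

/* Address of a named symbol, or 0 if absent (linear scan for brevity). */
unsigned long symbol_addr(const char *name)
{
    for (size_t i = 0; i < nsyms; i++)
        if (strcmp(syms[i].name, name) == 0)
            return syms[i].addr;
    return 0;
}

/* Symbol containing addr (the last one at or below it); sets *offset. */
const struct ksym *addr_to_symbol(unsigned long addr, unsigned long *offset)
{
    size_t lo = 0, hi = nsyms;        /* first index with syms[i].addr > addr */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;                  /* addr lies below the first symbol */
    *offset = addr - syms[lo - 1].addr;
    return &syms[lo - 1];
}
```

The address-to-symbol direction is what turns a raw return address such as 0xc0100150 into a function-plus-offset form like start_kernel+0x50.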

  35. Introduction to LCRASH: Reading in Blocks of Data from a Dump
      • LCRASH cannot access data in a system dump directly
      • Functions read in blocks of data from a system dump or from live system memory
      • Kernel virtual addresses are translated into physical addresses
      • Memory pages in the dump are uncompressed automatically
      • The desired data is then copied into an LCRASH buffer
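
The read path can be sketched for the simplest case. On i386, kernel addresses in the directly mapped region translate to physical addresses by subtracting PAGE_OFFSET (0xC0000000 with the default 3G/1G split), and a read that crosses a page boundary is split into per-page copies. In this hypothetical sketch the "dump" is an in-memory array of already-uncompressed physical pages; decompression is omitted, addresses outside the direct map (vmalloc, highmem) would need a page-table walk that is also omitted, and the names kvtop(), read_kmem(), and fake_dump are illustrative.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE   4096UL
#define PAGE_OFFSET 0xC0000000UL   /* i386 default split (assumption) */

/* Hypothetical stand-in for a dump: physical page frames held in memory. */
struct fake_dump {
    unsigned char *frames;         /* nframes * PAGE_SIZE bytes */
    size_t nframes;
};

/* Direct-map translation only. Returns 0 and sets *paddr on success. */
int kvtop(unsigned long vaddr, unsigned long *paddr)
{
    if (vaddr < PAGE_OFFSET)
        return -1;                 /* not a direct-mapped kernel address */
    *paddr = vaddr - PAGE_OFFSET;
    return 0;
}

/* Copy len bytes at kernel virtual address vaddr into buf, one page at a
 * time. Returns 0 on success, -1 if any byte is outside the dump. */
int read_kmem(const struct fake_dump *d, unsigned long vaddr, void *buf, size_t len)
{
    unsigned char *out = buf;
    while (len > 0) {
        unsigned long paddr;
        if (kvtop(vaddr, &paddr) != 0)
            return -1;
        size_t pfn = paddr / PAGE_SIZE, off = paddr % PAGE_SIZE;
        if (pfn >= d->nframes)
            return -1;             /* page not present in the dump */
        size_t chunk = PAGE_SIZE - off;   /* bytes left in this page */
        if (chunk > len)
            chunk = len;
        memcpy(out, d->frames + pfn * PAGE_SIZE + off, chunk);
        out += chunk;
        vaddr += chunk;
        len -= chunk;
    }
    return 0;
}
```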

  36. Introduction to LCRASH: Accessing Kernel Type Information
      • Facilities are provided for accessing extended information in the kernel symbol table (when the kernel is built with the -gstabs compiler option):
        • Kernel data type definitions, including the type and size of kernel structure members
        • Data types of global variables
        • Function parameters
        • Source code line numbers of kernel functions
      • Most production systems are not built with the -gstabs flag

  37. Introduction to LCRASH: Generating Kernel Stack Traces
      • LCRASH can generate kernel stack traces without using frame pointers
      • Various heuristics are applied to each stack frame to determine what the PC, RA, SP, and frame pointer should be
      • Derived values are sanity-checked to ensure they are at least reasonable
      • The entire stack trace is constructed before it is displayed
      • Most x86 kernels do not use frame pointers
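
One common frame-pointer-free technique, shown here only as an illustration and not as LCRASH's actual heuristics, is to scan each word of the kernel stack and keep the values that fall inside the kernel text segment as candidate return addresses. A real unwinder then sanity-checks each candidate (for example, that the instruction before it is a call), which this sketch omits. The text bounds would typically come from the _stext and _etext entries in System.map; here they are parameters, and the candidate structure and scan_stack() are hypothetical.

```c
#include <stddef.h>

/* A stack word that looks like a return address. */
struct candidate {
    size_t slot;             /* index of the stack word that held it */
    unsigned long value;     /* the value, assumed to be a return address */
};

/* Scan nwords stack words, starting at the current stack pointer, and
 * record every value inside [text_start, text_end). Returns the number of
 * candidates stored (at most max). */
size_t scan_stack(const unsigned long *stack, size_t nwords,
                  unsigned long text_start, unsigned long text_end,
                  struct candidate *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < nwords && n < max; i++) {
        unsigned long v = stack[i];
        if (v >= text_start && v < text_end) {   /* points into kernel code */
            out[n].slot = i;
            out[n].value = v;
            n++;
        }
    }
    return n;
}
```

Each surviving candidate would then be resolved to a function name through the kernel symbol table; stale return addresses left on the stack by earlier calls also pass this test, which is why the sanity checks described on the slide are needed.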

  38. Introduction to LCRASH
      LCRASH commands for displaying kernel stack traces:
      trace    Displays a stack trace for one or more active tasks
      strace   Displays an arbitrary stack trace using a PC, RA, and SP provided on the command line, or finds all valid stack trace fragments in a stack
      mktrace  Manually constructs a stack trace, frame by frame, using PC, RA, and SP values supplied on the command line

  39. Introduction to LCRASH: Location of LCRASH Source
      The LCRASH source code was placed in the kernel source tree to ensure that:
      • LCRASH is built along with the kernel
      • LCRASH uses the same configuration options and header files as the kernel
      • an LCRASH exists that can analyze crash dumps from a newly built kernel
      • any changes to kernel header files that break the LCRASH build are resolved quickly

  40. Introduction to LCRASH: Support for Multiple System Architectures
      • LCRASH is affected by differences in system architecture
      • Functionality and source code are organized much like the Linux kernel
      • Both LCRASH and libklib have architecture-dependent and architecture-independent sections
      • At present, i386 is fully supported (alpha and ia64 are under development)

  41. Introduction to LCRASH: Adding New LCRASH Commands
      • LCRASH was designed to make adding new commands easy
      • Raw data in the crash dump is accessed through calls to libklib API functions
      • Provisions are made for both generic and architecture-specific commands
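
A table-driven dispatcher is one straightforward way to get that property: adding a command means writing one handler and adding one table row. The sketch below is hypothetical; the command names come from the slides, but the handler signature, the table layout, and the arch_specific flag are assumptions, not LCRASH's actual interface, and the handlers only print placeholders where real code would call into libklib.

```c
#include <stdio.h>
#include <string.h>

typedef int (*cmd_fn)(int argc, char **argv);

struct command {
    const char *name;
    cmd_fn      fn;
    int         arch_specific;   /* 0 = generic, 1 = per-architecture */
    const char *help;
};

static int cmd_stat(int argc, char **argv)
{
    (void)argc; (void)argv;
    puts("stat: system information and log_buf would be printed here");
    return 0;
}

static int cmd_trace(int argc, char **argv)
{
    (void)argv;
    printf("trace: %d task(s) requested\n", argc - 1);
    return 0;
}

/* Adding a command = writing a handler and adding a row here. */
static const struct command commands[] = {
    { "stat",  cmd_stat,  0, "display system information and log_buf" },
    { "trace", cmd_trace, 1, "display a kernel stack trace" },
};

/* Look up argv[0] and run its handler. Returns the handler's status,
 * or -1 if there is no command or it is unknown. */
int run_command(int argc, char **argv)
{
    if (argc < 1)
        return -1;
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(commands[i].name, argv[0]) == 0)
            return commands[i].fn(argc, argv);
    fprintf(stderr, "unknown command: %s\n", argv[0]);
    return -1;
}
```

Keeping generic and architecture-specific entries in separate tables, one per architecture, would be a natural extension of the same design.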

  42. For more information about LKCD, see the web site at: http://oss.sgi.com/projects/lkcd
