Linux command – crash

The following commands show when the system last booted, which helps determine whether an abnormal shutdown or reboot took place.

[root@localhost ~]# who -b
         system boot  2022-07-25 18:18
[root@localhost ~]# who -r
         run-level 5  2022-07-25 18:18
[root@localhost ~]# last reboot -3
reboot   system boot  2.6.32-696.30.1. Mon Jul 25 18:18 - 15:57 (2+21:39)
reboot   system boot  2.6.32-696.30.1. Mon Jul 25 07:40 - 15:57 (3+08:17)
reboot   system boot  2.6.32-696.30.1. Mon Feb  7 07:32 - 17:30 (165+09:58)

wtmp begins Tue Jan 21 10:14:55 2020
[root@localhost ~]# uptime
 15:56:26 up 2 days, 21:38,  3 users,  load average: 0.11, 0.28, 0.20
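
`last reboot` lists only boot records, so on its own it does not show whether a clean shutdown came before each boot. As a rough sketch, the filter below reads `last -x shutdown reboot` style output (newest record first) and prints every boot that has no shutdown record before the next boot. The function name `find_unclean_boots` is made up for illustration, and the heuristic assumes wtmp was not rotated or truncated in the middle of the history.

```shell
# Hypothetical helper: read `last -x shutdown reboot` output on stdin.
# Records arrive newest first, so a "shutdown" line seen before a "reboot"
# line is the shutdown that ended that boot.
find_unclean_boots() {
    awk '
        $1 == "shutdown" { clean = 1; next }
        $1 == "reboot" {
            # Skip the newest boot: it is the session that is still running.
            if (seen && !clean) print "no shutdown record:", $0
            seen = 1
            clean = 0
        }
    '
}
```

It would be used as `last -x shutdown reboot | find_unclean_boots`. A boot that is reported is only a candidate for a crash or power loss and still needs to be confirmed from the logs.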

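If the system crashed, kdump can be used to find out what went wrong.

kdump is a Linux kernel feature. When the kernel crashes, kdump saves the kernel's memory contents to /var/crash/timestamp/vmcore (timestamp stands for a directory named after the crash time, such as 127.0.0.1-2018-08-10-03:13:56 in the example below). The crash program can then analyze this image for debugging.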

For more about kdump, see https://benjr.tw/10764
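
kdump can only capture a dump if a crash kernel was reserved and loaded before the panic, so it can help to verify that up front. The sketch below checks two standard kernel interfaces: the `crashkernel=` boot parameter in /proc/cmdline and the flag in /sys/kernel/kexec_crash_loaded. The function name `kdump_ready` and its optional path arguments are illustrative and not part of kdump.

```shell
# Hypothetical helper: report whether a crash kernel is reserved and loaded.
# Both paths can be overridden, which makes the check easy to try on sample files.
kdump_ready() {
    cmdline_file=${1:-/proc/cmdline}
    loaded_file=${2:-/sys/kernel/kexec_crash_loaded}

    # Memory for the crash kernel is reserved with crashkernel= at boot.
    if ! grep -q 'crashkernel=' "$cmdline_file"; then
        echo "crashkernel= is missing from the kernel command line"
        return 1
    fi

    # The file holds 1 once a crash kernel has been loaded.
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "crash kernel is not loaded"
        return 1
    fi

    echo "crash kernel is reserved and loaded"
}
```

Calling `kdump_ready` with no arguments checks the running system.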

Test environment: CentOS 7 x86_64 (virtual machine)

Install the crash package with yum, then use debuginfo-install to install the kernel debuginfo, which provides the vmlinux file with debug symbols.

[root@localhost ~]# yum install crash
[root@localhost ~]# debuginfo-install kernel

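crash takes two arguments: the vmlinux of the kernel that crashed, and the vmcore file that kdump generated. The command below uses `uname -r` to build the vmlinux path, which is only correct when the currently running kernel is the same version as the one that crashed.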

[root@localhost ~]# crash /usr/lib/debug/lib/modules/`uname -r`/vmlinux /var/crash/127.0.0.1-2018-08-10-03\:13\:56/vmcore

crash 7.2.0-6.el7
Copyright (C) 2002-2017  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [706MB]: patching 82603 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-862.9.1.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2018-08-10-03:13:56/vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Fri Aug 10 03:13:52 2018
      UPTIME: 00:01:19
LOAD AVERAGE: 1.34, 0.59, 0.22
       TASKS: 349
    NODENAME: localhost.localdomain
     RELEASE: 3.10.0-862.9.1.el7.x86_64
     VERSION: #1 SMP Mon Jul 16 16:29:36 UTC 2018
     MACHINE: x86_64  (2294 Mhz)
      MEMORY: 1 GB
       PANIC: "SysRq : Trigger a crash"
         PID: 7593
     COMMAND: "bash"
        TASK: ffff8ef8c625cf10  [THREAD_INFO: ffff8ef8f7fe4000]
         CPU: 0
       STATE: TASK_RUNNING (SYSRQ)
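
The invocation above hard-codes one dump directory. As a small sketch, assuming kdump writes each dump to its own subdirectory of /var/crash as in this example, the helper below prints the most recently modified vmcore so it can be passed to crash. The name `newest_vmcore` is made up for illustration.

```shell
# Hypothetical helper: print the most recently modified vmcore under a crash directory.
newest_vmcore() {
    crash_dir=${1:-/var/crash}
    newest=
    for f in "$crash_dir"/*/vmcore; do
        [ -f "$f" ] || continue          # the glob stays literal when nothing matches
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] || return 1         # no dump found
    printf '%s\n' "$newest"
}
```

With it, the second argument to crash could be written as `"$(newest_vmcore)"`.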

The summary that crash prints at startup shows why the system crashed.

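  • KERNEL : the kernel image that was running at the time of the crash.
  • DUMPFILE : the name of the file holding the kernel memory dump.
  • CPUS : the number of CPUs in the system.
  • DATE : the time of the crash.
  • TASKS : the number of tasks in memory at the time of the crash.
  • NODENAME : the hostname of the crashed system.
  • RELEASE : the kernel version.
  • MACHINE : the CPU type.
  • MEMORY : the amount of memory in the crashed system.
  • PANIC : the type of crash.
    • SysRq (System Request) refers to Magic Keys, which allow you to send instructions directly to the kernel.
    • Oops is a deviation from the expected, correct behavior of the kernel.
  • PID : the ID of the process that caused the crash.
  • COMMAND : the command that caused the crash.
  • TASK : the address of the task structure (task_struct) of the process that caused the crash.
  • CPU : the number of the CPU that process was running on at the time of the crash.
  • STATE : the state of that process at the time of the crash.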
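
When several dumps need triaging, the same summary fields can be pulled out of saved crash output with a text filter. The sketch below assumes the startup summary was saved to a file or piped in, and that each field sits on its own `NAME: value` line as in the output above. `crash_field` is a hypothetical name.

```shell
# Hypothetical helper: print the value of one summary field (e.g. PANIC, COMMAND)
# from crash's startup summary read on stdin.
crash_field() {
    # Field names are right-aligned, so allow leading spaces before the name.
    sed -n "s/^ *$1: *//p" | head -n 1
}
```

For example, `crash_field PANIC < summary.txt` would print the panic string, where summary.txt is an assumed file holding the saved summary.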

log – dump system message buffer

For more detail, use the log command.

crash> log
.... (omitted)
[   80.154338] task: ffff8ef8c625cf10 ti: ffff8ef8f7fe4000 task.ti: ffff8ef8f7fe4000
[   80.154340] RIP: 0010:[<ffffffffad62f7f6>]  [<ffffffffad62f7f6>] sysrq_handle_crash+0x16/0x20
[   80.154346] RSP: 0018:ffff8ef8f7fe7e58  EFLAGS: 00010246
[   80.154349] RAX: ffffffffad62f7e0 RBX: ffffffffaded7aa0 RCX: 0000000000000000
[   80.154351] RDX: 0000000000000000 RSI: ffff8ef8fa613978 RDI: 0000000000000063
[   80.154353] RBP: ffff8ef8f7fe7e58 R08: ffffffffae1bf8bc R09: 6873617263206120
[   80.154356] R10: 0000000000000733 R11: 0000000000000732 R12: 0000000000000063
[   80.154358] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000000
[   80.154361] FS:  00007f4ccc90f740(0000) GS:ffff8ef8fa600000(0000) knlGS:0000000000000000
[   80.154364] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   80.154366] CR2: 0000000000000000 CR3: 000000000de08000 CR4: 00000000001607f0
[   80.154437] Call Trace:
[   80.154451]  [<ffffffffad63001d>] __handle_sysrq+0x10d/0x170
[   80.154457]  [<ffffffffad63048f>] write_sysrq_trigger+0x2f/0x40
[   80.154462]  [<ffffffffad490f50>] proc_reg_write+0x40/0x80
[   80.154467]  [<ffffffffad41b490>] vfs_write+0xc0/0x1f0
[   80.154471]  [<ffffffffad41c2bf>] SyS_write+0x7f/0xf0
[   80.154478]  [<ffffffffad920795>] system_call_fastpath+0x1c/0x21
[   80.154481] Code: eb 9b 45 01 f4 45 39 65 34 75 e5 4c 89 ef e8 e2 f7 ff ff eb db 0f 1f 44 00 00 55 48 89 e5 c7 05 61 3c 81 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 de 
[   80.154568] RIP  [<ffffffffad62f7f6>] sysrq_handle_crash+0x16/0x20
[   80.154574]  RSP <ffff8ef8f7fe7e58>
[   80.154576] CR2: 0000000000000000

kdump produces two files, vmcore and vmcore-dmesg.txt. The latter holds the crashed system's dmesg output, the same content that the log command shows inside crash.
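
Because vmcore-dmesg.txt is plain text, the call trace can be pulled out of it without starting crash. This is a minimal sketch, assuming the trace looks like the log excerpt above: a `Call Trace:` line followed by `[<address>] symbol+offset/size` entries. The function name `trace_functions` is illustrative.

```shell
# Hypothetical helper: print the function names listed under "Call Trace:".
# Reads the named files, or stdin when no file is given.
trace_functions() {
    awk '
        /Call Trace:/ { in_trace = 1; next }
        in_trace && /\[<[0-9a-f]+>\]/ {
            name = $NF                # last field, e.g. vfs_write+0xc0/0x1f0
            sub(/\+0x.*/, "", name)   # drop the +offset/size suffix
            print name
            next
        }
        { in_trace = 0 }              # any other line ends the trace
    ' "$@"
}
```

It could be run as `trace_functions vmcore-dmesg.txt` from inside the dump directory.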

bt – backtrace

The bt command shows what the process that caused the crash was doing.

crash> bt
PID: 7593   TASK: ffff8ef8c625cf10  CPU: 0   COMMAND: "bash"
 #0 [ffff8ef8f7fe7ad0] machine_kexec at ffffffffad26178a
 #1 [ffff8ef8f7fe7b30] __crash_kexec at ffffffffad313bf2
 #2 [ffff8ef8f7fe7c00] crash_kexec at ffffffffad313ce0
 #3 [ffff8ef8f7fe7c18] oops_end at ffffffffad918738
 #4 [ffff8ef8f7fe7c40] no_context at ffffffffad90807e
 #5 [ffff8ef8f7fe7c90] __bad_area_nosemaphore at ffffffffad908115
 #6 [ffff8ef8f7fe7ce0] bad_area at ffffffffad9084a5
 #7 [ffff8ef8f7fe7d08] __do_page_fault at ffffffffad91b89f
 #8 [ffff8ef8f7fe7d70] do_page_fault at ffffffffad91b8e5
 #9 [ffff8ef8f7fe7da0] page_fault at ffffffffad917758
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffffad62f7f6  RSP: ffff8ef8f7fe7e58  RFLAGS: 00010246
    RAX: ffffffffad62f7e0  RBX: ffffffffaded7aa0  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff8ef8fa613978  RDI: 0000000000000063
    RBP: ffff8ef8f7fe7e58   R8: ffffffffae1bf8bc   R9: 6873617263206120
    R10: 0000000000000733  R11: 0000000000000732  R12: 0000000000000063
    R13: 0000000000000000  R14: 0000000000000004  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff8ef8f7fe7e60] __handle_sysrq at ffffffffad63001d
#11 [ffff8ef8f7fe7e90] write_sysrq_trigger at ffffffffad63048f
#12 [ffff8ef8f7fe7ea8] proc_reg_write at ffffffffad490f50
#13 [ffff8ef8f7fe7ec8] vfs_write at ffffffffad41b490
#14 [ffff8ef8f7fe7f08] sys_write at ffffffffad41c2bf
#15 [ffff8ef8f7fe7f50] system_call_fastpath at ffffffffad920795
    RIP: 00007f4ccbffacd0  RSP: 00007ffd8eaa7990  RFLAGS: 00010246
    RAX: 0000000000000001  RBX: 0000000000000002  RCX: 0000000000000000
    RDX: 0000000000000002  RSI: 00007f4ccc91c000  RDI: 0000000000000001
    RBP: 00007f4ccc91c000   R8: 000000000000000a   R9: 00007f4ccc90f740
    R10: 00007f4ccc90f740  R11: 0000000000000246  R12: 00007f4ccc2d2400
    R13: 0000000000000002  R14: 0000000000000001  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

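The lines numbered with a hash sign (#) are the call trace: the kernel functions that were on the stack when the system crashed. Read from the bottom up, this trace shows bash writing to the SysRq trigger (write_sysrq_trigger, __handle_sysrq), sysrq_handle_crash faulting, and the page-fault path ending in crash_kexec, which hands control to the kdump kernel.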
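
To compare backtraces across dumps, it can be handy to reduce saved bt output to just the frame numbers and function names. This is a rough sketch, assuming frame lines have the `#N [stack address] function at address` shape shown above. `bt_frames` is a made-up name.

```shell
# Hypothetical helper: print "frame-number function" for each frame in saved bt output.
# Register dumps and the [exception RIP: ...] line are skipped because they
# do not start with a #N field.
bt_frames() {
    awk '$1 ~ /^#[0-9]+$/ && $4 == "at" { print substr($1, 2), $3 }' "$@"
}
```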

The other commands can be looked up with help.

crash> help

*              files          mach           repeat         timer          
alias          foreach        mod            runq           tree           
ascii          fuser          mount          search         union          
bt             gdb            net            set            vm             
btop           help           p              sig            vtop           
dev            ipcs           ps             struct         waitq          
dis            irq            pte            swap           whatis         
eval           kmem           ptob           sym            wr             
exit           list           ptov           sys            q              
extend         log            rd             task           

crash version: 7.2.0-6.el7   gdb version: 7.6
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
For help on output options, enter "help output".

To leave crash, use q or exit.

crash> q
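
For repeated analysis, the same commands can be run without typing them at the prompt. The sketch below writes the commands to a temporary file and passes it to crash with `-i` (run the commands in a file) and `-s` (skip the startup banner). The function name `crash_batch` and the `CRASH_BIN` override are assumptions made for illustration. The command list ends with `q` so that crash exits when it is done.

```shell
# Hypothetical wrapper: run a fixed list of crash commands against one dump.
# Usage: crash_batch VMLINUX VMCORE COMMAND...
crash_batch() {
    vmlinux=$1
    vmcore=$2
    shift 2

    cmd_file=$(mktemp) || return 1
    # One command per line, followed by q so the session ends by itself.
    printf '%s\n' "$@" q > "$cmd_file"

    # CRASH_BIN is an assumed override, handy for dry runs; it defaults to crash.
    "${CRASH_BIN:-crash}" -s -i "$cmd_file" "$vmlinux" "$vmcore"
    status=$?
    rm -f "$cmd_file"
    return "$status"
}
```

For example, `crash_batch "$vmlinux" "$vmcore" log bt > report.txt` would save the log and backtrace of one dump to a file, with both variables set to the paths used earlier.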