The following commands show when the system last booted, which helps determine whether it was shut down or restarted abnormally.
[root@localhost ~]# who -b
         system boot  2022-07-25 18:18
[root@localhost ~]# who -r
         run-level 5  2022-07-25 18:18
[root@localhost ~]# last reboot -3
reboot   system boot  2.6.32-696.30.1. Mon Jul 25 18:18 - 15:57 (2+21:39)
reboot   system boot  2.6.32-696.30.1. Mon Jul 25 07:40 - 15:57 (3+08:17)
reboot   system boot  2.6.32-696.30.1. Mon Feb  7 07:32 - 17:30 (165+09:58)

wtmp begins Tue Jan 21 10:14:55 2020
[root@localhost ~]# uptime
 15:56:26 up 2 days, 21:38,  3 users,  load average: 0.11, 0.28, 0.20
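The boot records above can also be checked mechanically. The following is a minimal sketch, not a definitive tool: it assumes that `last -x shutdown reboot` on your system lists both boot and shutdown records, newest first, and that wtmp has not been rotated in between. The function name `find_unclean_boots` is made up for this example.

```shell
# Hypothetical helper: reads `last -x shutdown reboot` style output on stdin
# (newest entry first) and prints every boot record that is NOT directly
# preceded in time by a shutdown record, i.e. the previous session most
# likely ended without a clean shutdown.
find_unclean_boots() {
    awk '
        # Two boot records in a row: the newer one (seen on the previous
        # line) had no shutdown recorded before it.
        $1 == "reboot" && prev_type == "reboot" { print prev_line }
        # Remember the most recent boot or shutdown record; other lines
        # (blank lines, "wtmp begins ...") are ignored.
        $1 == "reboot" || $1 == "shutdown" { prev_type = $1; prev_line = $0 }
    '
}
```

A typical call would be `last -x shutdown reboot | find_unclean_boots`. The oldest boot record in wtmp is never reported, because nothing is known about what came before it.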
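If the system crashed, kdump can be used to investigate what went wrong.
kdump is a Linux kernel feature: when the kernel crashes, kdump saves the contents of kernel memory to /var/crash/timestamp/vmcore. The crash program can then analyze this image for debugging.
For more about kdump, see https://benjr.tw/10764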
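Before relying on kdump it is worth confirming that it is actually armed. The sketch below is one way to do that, under two assumptions: memory for the capture kernel is reserved with a `crashkernel=` parameter on the kernel command line, and the kernel exposes `/sys/kernel/kexec_crash_loaded`. The function names are invented for this example.

```shell
# Hypothetical helper: succeeds if the given kernel command line string
# contains a crashkernel= parameter.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: checks the running system and prints a one-line verdict.
kdump_ready() {
    if ! has_crashkernel "$(cat /proc/cmdline)"; then
        echo "no crashkernel= on the kernel command line"
        return 1
    fi
    # 1 means a crash (capture) kernel is currently loaded.
    if [ "$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null)" != "1" ]; then
        echo "crash kernel is not loaded"
        return 1
    fi
    echo "kdump looks ready"
}
```

Run `kdump_ready` as root on the machine you want to check; `systemctl status kdump` gives the service-level view on CentOS 7.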
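The test environment is CentOS 7 x86_64 (a virtual machine).
crash requires two things: the crash package, installed with yum, and the kernel debuginfo package, installed with debuginfo-install, which provides the vmlinux image with debug symbols.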
[root@localhost ~]# yum install crash
[root@localhost ~]# debuginfo-install kernel
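crash takes two arguments: the debug vmlinux of the kernel that crashed, and the vmcore file that kdump produced. Note that `uname -r` in the command below only gives the right path when the running kernel is the same release as the one that crashed.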
[root@localhost ~]# crash /usr/lib/debug/lib/modules/`uname -r`/vmlinux /var/crash/127.0.0.1-2018-08-10-03\:13\:56/vmcore

crash 7.2.0-6.el7
Copyright (C) 2002-2017  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [706MB]: patching 82603 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-862.9.1.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2018-08-10-03:13:56/vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Fri Aug 10 03:13:52 2018
      UPTIME: 00:01:19
LOAD AVERAGE: 1.34, 0.59, 0.22
       TASKS: 349
    NODENAME: localhost.localdomain
     RELEASE: 3.10.0-862.9.1.el7.x86_64
     VERSION: #1 SMP Mon Jul 16 16:29:36 UTC 2018
     MACHINE: x86_64  (2294 Mhz)
      MEMORY: 1 GB
       PANIC: "SysRq : Trigger a crash"
         PID: 7593
     COMMAND: "bash"
        TASK: ffff8ef8c625cf10  [THREAD_INFO: ffff8ef8f7fe4000]
         CPU: 0
       STATE: TASK_RUNNING (SYSRQ)
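Typing the two paths by hand is error-prone, so here is a small sketch that builds them. It assumes the layout used in the command above (debug vmlinux under /usr/lib/debug/lib/modules/RELEASE/, dumps under /var/crash/DIRECTORY/vmcore) and a shell whose `test` supports `-nt`, such as bash. Both function names are made up for this example.

```shell
# Hypothetical helper: prints the debuginfo vmlinux path for a kernel release.
vmlinux_for() {
    printf '/usr/lib/debug/lib/modules/%s/vmlinux\n' "$1"
}

# Hypothetical helper: prints the most recently modified vmcore found one
# directory level below the given crash root (default: /var/crash).
latest_vmcore() {
    newest=""
    for f in "${1:-/var/crash}"/*/vmcore; do
        [ -f "$f" ] || continue          # skip the unexpanded glob
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

With these, the invocation becomes `crash "$(vmlinux_for "$(uname -r)")" "$(latest_vmcore)"`, which still assumes the running kernel is the release that crashed.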
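The summary at the end of this output tells us why the system crashed.
- KERNEL : the kernel image (debug vmlinux) of the kernel that was running when the crash occurred.
- DUMPFILE : the name of the file holding the saved kernel memory.
- CPUS : the number of CPUs in the system.
- DATE : the time of the crash.
- TASKS : the number of tasks in memory at the time of the crash.
- NODENAME : the host name of the crashed system.
- RELEASE : the kernel release.
- MACHINE : the CPU type (architecture and clock speed).
- MEMORY : the amount of memory in the crashed system.
- PANIC : the kind of crash that occurred, as reported in the panic message.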
  - SysRq (System Request) refers to Magic Keys, which allow you to send instructions directly to the kernel.
  - Oops is a deviation from the expected, correct behavior of the kernel.
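- PID : the ID of the process that was running when the crash was triggered.
- COMMAND : the command that process was running.
- TASK : the memory address of that process's task structure (task_struct) in the kernel.
- CPU : the number of the CPU that process was running on.
- STATE : the state of that process at the time of the crash.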
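When several dumps have to be compared, it helps to pull single fields out of this summary. The following sketch assumes the summary has been saved as text in the `FIELD: value` layout shown above; `crash_field` is a name invented for this example, not a crash feature.

```shell
# Hypothetical helper: reads crash's startup summary on stdin and prints the
# value of one field, e.g. PANIC or "LOAD AVERAGE". Only an exact field name
# at the start of a line matches, so "CPU" does not match the "CPUS" line.
crash_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)             # drop the right-alignment padding
            if (index(line, key ": ") == 1) {    # line begins with "KEY: "
                print substr(line, length(key) + 3)
                exit
            }
        }
    '
}
```

For example, `crash_field PANIC < summary.txt` would print the panic string, where summary.txt is a hypothetical file holding the saved summary.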
log – dump system message buffer
For more detail, use the log command.
crash> log
... (omitted)
[ 80.154338] task: ffff8ef8c625cf10 ti: ffff8ef8f7fe4000 task.ti: ffff8ef8f7fe4000
[ 80.154340] RIP: 0010:[<ffffffffad62f7f6>] [<ffffffffad62f7f6>] sysrq_handle_crash+0x16/0x20
[ 80.154346] RSP: 0018:ffff8ef8f7fe7e58 EFLAGS: 00010246
[ 80.154349] RAX: ffffffffad62f7e0 RBX: ffffffffaded7aa0 RCX: 0000000000000000
[ 80.154351] RDX: 0000000000000000 RSI: ffff8ef8fa613978 RDI: 0000000000000063
[ 80.154353] RBP: ffff8ef8f7fe7e58 R08: ffffffffae1bf8bc R09: 6873617263206120
[ 80.154356] R10: 0000000000000733 R11: 0000000000000732 R12: 0000000000000063
[ 80.154358] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000000
[ 80.154361] FS: 00007f4ccc90f740(0000) GS:ffff8ef8fa600000(0000) knlGS:0000000000000000
[ 80.154364] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 80.154366] CR2: 0000000000000000 CR3: 000000000de08000 CR4: 00000000001607f0
[ 80.154437] Call Trace:
[ 80.154451] [<ffffffffad63001d>] __handle_sysrq+0x10d/0x170
[ 80.154457] [<ffffffffad63048f>] write_sysrq_trigger+0x2f/0x40
[ 80.154462] [<ffffffffad490f50>] proc_reg_write+0x40/0x80
[ 80.154467] [<ffffffffad41b490>] vfs_write+0xc0/0x1f0
[ 80.154471] [<ffffffffad41c2bf>] SyS_write+0x7f/0xf0
[ 80.154478] [<ffffffffad920795>] system_call_fastpath+0x1c/0x21
[ 80.154481] Code: eb 9b 45 01 f4 45 39 65 34 75 e5 4c 89 ef e8 e2 f7 ff ff eb db 0f 1f 44 00 00 55 48 89 e5 c7 05 61 3c 81 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 0f 1f 44 00 00 55 31 c0 c7 05 de
[ 80.154568] RIP [<ffffffffad62f7f6>] sysrq_handle_crash+0x16/0x20
[ 80.154574] RSP <ffff8ef8f7fe7e58>
[ 80.154576] CR2: 0000000000000000
kdump produces two files: vmcore and vmcore-dmesg.txt. The latter holds the dmesg output of the crashed system, the same content that the log command shows inside crash.
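Because vmcore-dmesg.txt is plain text, a first look at the crash does not need crash at all. The sketch below prints the functions listed in the first "Call Trace:" section of such a file. It assumes the line format shown in the log output above (a timestamp, then `[<address>] function+offset/size`); kernels that print traces differently would need a different pattern. `dmesg_call_trace` is a made-up name.

```shell
# Hypothetical helper: prints the function+offset entries of the first
# "Call Trace:" section found in the given dmesg text file.
dmesg_call_trace() {
    awk '
        /Call Trace:/ { in_trace = 1; next }
        # The trace ends at the first line without an [<address>] entry.
        in_trace && !/\[<[0-9a-f]+>\]/ { exit }
        in_trace { print $NF }
    ' "$1"
}
```

Usage would look like `dmesg_call_trace /var/crash/127.0.0.1-2018-08-10-03:13:56/vmcore-dmesg.txt`, using the dump directory from the example above.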
bt – backtrace
The bt command shows what the process that caused the crash was doing.
crash> bt
PID: 7593   TASK: ffff8ef8c625cf10  CPU: 0   COMMAND: "bash"
 #0 [ffff8ef8f7fe7ad0] machine_kexec at ffffffffad26178a
 #1 [ffff8ef8f7fe7b30] __crash_kexec at ffffffffad313bf2
 #2 [ffff8ef8f7fe7c00] crash_kexec at ffffffffad313ce0
 #3 [ffff8ef8f7fe7c18] oops_end at ffffffffad918738
 #4 [ffff8ef8f7fe7c40] no_context at ffffffffad90807e
 #5 [ffff8ef8f7fe7c90] __bad_area_nosemaphore at ffffffffad908115
 #6 [ffff8ef8f7fe7ce0] bad_area at ffffffffad9084a5
 #7 [ffff8ef8f7fe7d08] __do_page_fault at ffffffffad91b89f
 #8 [ffff8ef8f7fe7d70] do_page_fault at ffffffffad91b8e5
 #9 [ffff8ef8f7fe7da0] page_fault at ffffffffad917758
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffffad62f7f6  RSP: ffff8ef8f7fe7e58  RFLAGS: 00010246
    RAX: ffffffffad62f7e0  RBX: ffffffffaded7aa0  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff8ef8fa613978  RDI: 0000000000000063
    RBP: ffff8ef8f7fe7e58   R8: ffffffffae1bf8bc   R9: 6873617263206120
    R10: 0000000000000733  R11: 0000000000000732  R12: 0000000000000063
    R13: 0000000000000000  R14: 0000000000000004  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff8ef8f7fe7e60] __handle_sysrq at ffffffffad63001d
#11 [ffff8ef8f7fe7e90] write_sysrq_trigger at ffffffffad63048f
#12 [ffff8ef8f7fe7ea8] proc_reg_write at ffffffffad490f50
#13 [ffff8ef8f7fe7ec8] vfs_write at ffffffffad41b490
#14 [ffff8ef8f7fe7f08] sys_write at ffffffffad41c2bf
#15 [ffff8ef8f7fe7f50] system_call_fastpath at ffffffffad920795
    RIP: 00007f4ccbffacd0  RSP: 00007ffd8eaa7990  RFLAGS: 00010246
    RAX: 0000000000000001  RBX: 0000000000000002  RCX: 0000000000000000
    RDX: 0000000000000002  RSI: 00007f4ccc91c000  RDI: 0000000000000001
    RBP: 00007f4ccc91c000   R8: 000000000000000a   R9: 00007f4ccc90f740
    R10: 00007f4ccc90f740  R11: 0000000000000246  R12: 00007f4ccc2d2400
    R13: 0000000000000002  R14: 0000000000000001  R15: 0000000000000000
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
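The lines numbered with a hash sign (#) are the call trace: the kernel functions that were on the stack when the system crashed. Read from the bottom up, they show what happened. Here, bash issued a write system call (sys_write) that reached write_sysrq_trigger and __handle_sysrq; sysrq_handle_crash then took a page fault, and the kernel went through oops_end into crash_kexec to capture the dump.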
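Since the trace is easiest to follow from the oldest frame to the newest, a small filter can print it in that order. This is only a sketch: it assumes bt output saved as text, with frame lines of the form `#N [stack address] function at address` as shown above, and `bt_call_chain` is a name invented here.

```shell
# Hypothetical helper: reads saved `bt` output on stdin and prints the
# function name of each numbered frame, oldest call first.
bt_call_chain() {
    awk '
        # Frame lines start with "#<number>"; the third field is the function.
        $1 ~ /^#[0-9]+$/ { frames[++n] = $3 }
        END { for (i = n; i >= 1; i--) print frames[i] }
    '
}
```

Fed the trace above, the first line printed is system_call_fastpath and the last is machine_kexec; register dump lines are skipped because they do not start with a frame number.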
The other commands can be looked up with help.
crash> help

*              files          mach           repeat         timer
alias          foreach        mod            runq           tree
ascii          fuser          mount          search         union
bt             gdb            net            set            vm
btop           help           p              sig            vtop
dev            ipcs           ps             struct         waitq
dis            irq            pte            swap           whatis
eval           kmem           ptob           sym            wr
exit           list           ptov           sys            q
extend         log            rd             task

crash version: 7.2.0-6.el7    gdb version: 7.6
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
For help on output options, enter "help output".
To leave crash, use q or exit.
crash> q
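The same commands can also be run without an interactive session, which is handy for collecting output from several dumps. The sketch below relies on two crash options as described in its man page: `-s` to suppress the startup banner and `-i FILE` to run the commands in FILE before the prompt; ending the file with q makes crash exit afterwards. Treat the exact behavior on your crash version as an assumption to verify. Both function names are invented for this example.

```shell
# Hypothetical helper: writes the given crash commands to a file, one per
# line, followed by "q" so that crash exits once they have run.
write_crash_commands() {
    cmd_file=$1
    shift
    printf '%s\n' "$@" q > "$cmd_file"
}

# Hypothetical helper: runs crash non-interactively.
# $1: debug vmlinux, $2: vmcore, $3: command file from write_crash_commands.
run_crash_batch() {
    crash -s -i "$3" "$1" "$2"
}
```

For example, `write_crash_commands /tmp/cmds log bt` followed by `run_crash_batch VMLINUX VMCORE /tmp/cmds > report.txt` would save the log and backtrace to report.txt, with VMLINUX and VMCORE standing for the two paths used earlier.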