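I replaced a traditional crontab (cron table) entry with a hand-written systemd unit file to start a program, but after running for a while the program died for no apparent reason.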
main process exited, code=killed, status=9/KILL
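We can use the Linux commands auditctl and ausearch to find out which process sent the kill.
Test environment: CentOS 8 x86_64 (virtual machine)
The program is shown below. It is a small C++ program that appends the current time to a file every 60 seconds.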
    [root@localhost ~]# vi test.cpp
    #include <fstream>
    #include <iostream>
    #include <chrono>
    #include <ctime>
    #include <unistd.h>
    using namespace std;

    int main() {
        while (true) {
            ofstream myFile_Handler;
            // File Open
            myFile_Handler.open("/tmp/1.txt", std::ios_base::app);
            auto timenow = chrono::system_clock::to_time_t(chrono::system_clock::now());
            // Write to the file
            myFile_Handler << ctime(&timenow) << endl;
            // File Close
            myFile_Handler.close();
            // Sleep
            sleep(60);
        }
    }
Compile it into an executable.
    [root@localhost ~]# g++ test.cpp -o test
    bash: g++: command not found...
    Install package 'gcc-c++' to provide command 'g++'? [N/y] y
Copy the program to /sbin/.
[root@localhost ~]# cp test /sbin/
Now write the systemd unit file (user-written service files go in /etc/systemd/system/, while the system's own live in /usr/lib/systemd/system).
    [root@localhost ~]# vi /etc/systemd/system/test.service
    [Unit]
    Description=Test Job

    [Service]
    Type=simple
    ExecStart=/sbin/test

    [Install]
    WantedBy=multi-user.target
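Only three sections are configured:
- [Unit]
  Description=Test Job
  Description – States the purpose of the unit file.
- [Service]
  Type=simple
  ExecStart=/sbin/test
  Type=simple – The process started by ExecStart is the service's main process. It is long-running and does not background (daemonize) itself.
  ExecStart – The program to run.
- [Install]
  WantedBy=multi-user.target
  WantedBy – The target (the systemd counterpart of a runlevel) that pulls in the service once it is enabled.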
Enable and start the service.
    [root@localhost ~]# systemctl enable test.service
    Created symlink /etc/systemd/system/multi-user.target.wants/test.service → /etc/systemd/system/test.service.
    [root@localhost ~]# systemctl start test.service
    [root@localhost ~]# systemctl status test.service
    ● test.service - Test Job
       Loaded: loaded (/etc/systemd/system/test.service; enabled; vendor preset: disabled)
       Active: active (running) since Fri 2022-11-25 11:02:40 CST; 7s ago
     Main PID: 89850 (test)
        Tasks: 1 (limit: 49322)
       Memory: 292.0K
       CGroup: /system.slice/test.service
               └─89850 /sbin/test

    Nov 25 11:02:40 localhost.localdomain systemd[1]: Started Test Job.
Check that /sbin/test (which appends the time to /tmp/1.txt every 60 seconds) is running properly.
    [root@localhost ~]# cat /tmp/1.txt
    Fri Nov 25 11:02:40 2022
    Fri Nov 25 11:03:40 2022
    Fri Nov 25 11:04:40 2022
Add an audit rule that monitors the kill syscall.
    [root@localhost ~]# auditctl -a exit,always -F arch=b64 -S kill -k kill_process
    [root@localhost ~]# auditctl -l
    -a always,exit -F arch=b64 -S kill -F key=kill_process
- -a [list,action|action,list]
  - exit – Add a rule to the syscall exit list.
  - always – Allocate an audit context, always fill it in at syscall entry time, and always write out a record at syscall exit time.
- -F arch – The CPU architecture of the syscall: 32-bit (b32) or 64-bit (b64).
- -S – Syscall name.
- -k key – Set a filter key on an audit rule.
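Because the failure being investigated is status=9/KILL, the rule above could be narrowed to one signal. Audit rules can filter on syscall arguments (a0 to a3), and for kill the second argument (a1) is the signal number. The sketch below only prints the rule arguments. The helper name kill_rule_args is hypothetical, and it assumes a 64-bit system and the same key as above.

```shell
#!/bin/sh
# Hypothetical helper: print auditctl arguments for a rule that matches only
# kill syscalls sending one specific signal number (a1 is the signal argument).
kill_rule_args() {
    sig=$1
    key=$2
    printf '%s\n' "-a always,exit -F arch=b64 -S kill -F a1=$sig -k $key"
}

# Print the rule for SIGKILL (9). To load it, run as root:
#   auditctl $(kill_rule_args 9 kill_process)
kill_rule_args 9 kill_process
```

This keeps the log smaller on a busy machine, at the cost of missing other signals such as SIGTERM.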
Test it by killing the program manually.
    [root@localhost ~]# ps -aux | grep -i test
    root       68586  0.0  0.0  13780  1832 ?        Ss   16:54   0:00 /sbin/test
    root       68858  0.0  0.0  12136  1156 pts/0    S+   17:21   0:00 grep --color=auto -i test
    [root@localhost ~]# kill 68586
    [root@localhost ~]# ps -aux | grep -i test
    root       68865  0.0  0.0  12136  1144 pts/0    S+   17:21   0:00 grep --color=auto -i test
All audit records are written to the following file.
[root@localhost ~]# cat /var/log/audit/audit.log
That file contains far too much information to read directly, so we usually search it with ausearch, for example by the key defined earlier.
    [root@localhost ~]# ausearch -k kill_process
    time->Fri Mar 31 17:21:11 2023
    type=PROCTITLE msg=audit(1680254471.502:45035): proctitle=617564697463746C002D6100657869742C616C77617973002D46006172636800623634002D53006B696C6C002D6B006B696C6C5F70726F63657373
    type=SYSCALL msg=audit(1680254471.502:45035): arch=c000003e syscall=44 success=yes exit=1068 a0=4 a1=7fff65ffb4d0 a2=42c a3=0 items=0 ppid=8029 pid=68855 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="auditctl" exe="/usr/sbin/auditctl" key=(null)
    type=CONFIG_CHANGE msg=audit(1680254471.502:45035): auid=0 ses=2 op=add_rule key="kill_process" list=4 res=1
    ----
    time->Fri Mar 31 17:21:25 2023
    type=PROCTITLE msg=audit(1680254485.165:45036): proctitle="-bash"
    type=OBJ_PID msg=audit(1680254485.165:45036): opid=68586 oauid=-1 ouid=0 oses=-1 ocomm="test"
    type=SYSCALL msg=audit(1680254485.165:45036): arch=c000003e syscall=62 success=yes exit=0 a0=10bea a1=f a2=0 a3=7f1216e0c280 items=0 ppid=8021 pid=8029 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="bash" exe="/usr/bin/bash" key="kill_process"
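The proctitle value in the first record is the command line encoded as hex, with NUL bytes (00) between the arguments. As a rough sketch, it can be decoded with plain POSIX shell. The function name decode_proctitle is hypothetical, and NUL bytes are simply shown as spaces.

```shell
#!/bin/sh
# Hypothetical helper: decode a hex-encoded audit proctitle value.
# Each pair of hex digits is one byte; NUL (00) separators are shown as spaces.
decode_proctitle() {
    out=""
    # Split the hex string into two-digit bytes.
    for byte in $(printf '%s\n' "$1" | sed 's/../& /g'); do
        if [ "$byte" = "00" ]; then
            out="$out "
        else
            # Convert the hex byte to octal, then let printf emit that character.
            out="$out$(printf "\\$(printf '%03o' "0x$byte")")"
        fi
    done
    printf '%s\n' "$out"
}

# Decode the proctitle from the first record in the ausearch output.
decode_proctitle 617564697463746C002D6100657869742C616C77617973002D46006172636800623634002D53006B696C6C002D6B006B696C6C5F70726F63657373
```

In practice, running ausearch with the -i (interpret) option should produce the same readable form without a helper.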
Alternatively, search by syscall name.
    [root@localhost ~]# ausearch -sc kill
    time->Fri Mar 31 17:21:25 2023
    type=PROCTITLE msg=audit(1680254485.165:45036): proctitle="-bash"
    type=OBJ_PID msg=audit(1680254485.165:45036): opid=68586 oauid=-1 ouid=0 oses=-1 ocomm="test"
    type=SYSCALL msg=audit(1680254485.165:45036): arch=c000003e syscall=62 success=yes exit=0 a0=10bea a1=f a2=0 a3=7f1216e0c280 items=0 ppid=8021 pid=8029 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="bash" exe="/usr/bin/bash" key="kill_process"
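In the SYSCALL record, a0 and a1 are the arguments of the kill syscall (target PID and signal number) printed in hexadecimal, while comm, exe and pid identify the sender. A minimal sketch for converting them follows. The function name hex_to_dec is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a hex value from an audit record to decimal.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

# a0 is the target PID and a1 is the signal number of the kill syscall.
echo "target pid: $(hex_to_dec 10bea)"
echo "signal:     $(hex_to_dec f)"
```

Here a0=10bea is PID 68586, which matches opid=68586 in the OBJ_PID record, and a1=f is signal 15 (SIGTERM), the default signal of the kill command. The sender is pid=8029, the bash shell on pts0. A record with a1=9 would correspond to the status=9/KILL case from the original problem.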
The monitoring rules can be deleted manually, or written to /etc/audit/rules.d/ so that the rule persists across reboots.
    [root@localhost ~]# auditctl -D
    No rules
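For the persistent variant, a rule file uses the same syntax as the auditctl arguments, one rule per line. The sketch below writes such a file. The function name write_kill_rule and the file name kill_process.rules are assumptions chosen for illustration, and the directory can be overridden so the function can be tried outside /etc.

```shell
#!/bin/sh
# Hypothetical helper: write the kill-monitoring rule as a persistent rule file.
# The target directory defaults to /etc/audit/rules.d; the file name is an assumption.
write_kill_rule() {
    rules_dir=${1:-/etc/audit/rules.d}
    printf '%s\n' '-a always,exit -F arch=b64 -S kill -k kill_process' \
        > "$rules_dir/kill_process.rules"
}

# As root, write the file and then reload the rules:
#   write_kill_rule && augenrules --load
```

After reloading, auditctl -l should list the rule again, and it should still be there after a reboot.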