一支簡單的 copy 程式花了很多的執行時間,透過 top 來檢視,卻發現程式大部分的時間花在 wa : iowait (CPU 花在等待 I/O 的百分比),完全不知道為什麼.
測試環境為 CentOS7 虛擬機
程式到底在執行哪一些呼叫,在 Linux 環境下可以透過 perf – Performance analysis tools for Linux 來檢視,核心已包含 perf 功能,只需要安裝工具及可以開始統計.
[root@localhost ~]# yum -y install perf
下面是常用的幾個指令
- perf stat
在 perf stat 指令後面接著要觀察的指令 stressapptest ,待程式執行完畢會回報執行統計數據.[root@localhost ~]# perf stat stressapptest -s 20 -M 256 -m 8 -C 8 -W 2018/04/02-22:15:38(EDT) Log: Commandline - stressapptest -s 20 -M 256 -m 8 -C 8 -W 2018/04/02-22:15:38(EDT) Stats: SAT revision 1.0.7_autoconf, 64 bit binary 2018/04/02-22:15:38(EDT) Log: root @ localhost.localdomain on Mon Apr 2 21:28:30 EDT 2018 from open source release 2018/04/02-22:15:38(EDT) Log: 1 nodes, 1 cpus. 2018/04/02-22:15:38(EDT) Log: Prefer plain malloc memory allocation. 2018/04/02-22:15:38(EDT) Log: Using mmap() allocation at 0x7f9a52534000. 2018/04/02-22:15:38(EDT) Stats: Starting SAT, 256M, 20 seconds 2018/04/02-22:15:41(EDT) Log: region number 1 exceeds region count 1 2018/04/02-22:15:41(EDT) Log: Region mask: 0x1 2018/04/02-22:15:51(EDT) Log: Seconds remaining: 10 2018/04/02-22:16:07(EDT) Stats: Found 0 hardware incidents 2018/04/02-22:16:07(EDT) Stats: Completed: 104172.00M in 26.21s 3974.44MB/s, with 0 hardware incidents, 0 errors 2018/04/02-22:16:07(EDT) Stats: Memory Copy: 104172.00M at 5223.94MB/s 2018/04/02-22:16:07(EDT) Stats: File Copy: 0.00M at 0.00MB/s 2018/04/02-22:16:07(EDT) Stats: Net Copy: 0.00M at 0.00MB/s 2018/04/02-22:16:07(EDT) Stats: Data Check: 0.00M at 0.00MB/s 2018/04/02-22:16:07(EDT) Stats: Invert Data: 0.00M at 0.00MB/s 2018/04/02-22:16:07(EDT) Stats: Disk: 0.00M at 0.00MB/s 2018/04/02-22:16:07(EDT) 2018/04/02-22:16:07(EDT) Status: PASS - please verify no corrected errors 2018/04/02-22:16:07(EDT) Performance counter stats for 'stressapptest -s 20 -M 256 -m 8 -C 8 -W': 25931.916079 task-clock (msec) # 0.894 CPUs utilized 53,688 context-switches # 0.002 M/sec 0 cpu-migrations # 0.000 K/sec 66,028 page-faults # 0.003 M/sec <not supported> cycles 0 stalled-cycles-frontend <not supported> instructions <not supported> branches <not supported> branch-misses 29.010419437 seconds time elapsed
其他參數請參考 #perf stat -help
- perf record
剛剛的 perf stat 並不會把執行統計數據儲存檔案,可以透過 perf record 儲存成 perf.data ,[root@localhost ~]# perf record stressapptest -s 20 -M 256 -m 8 -C 8 -W 2018/04/02-22:20:19(EDT) Log: Commandline - stressapptest -s 20 -M 256 -m 8 -C 8 -W 2018/04/02-22:20:19(EDT) Stats: SAT revision 1.0.7_autoconf, 64 bit binary 2018/04/02-22:20:19(EDT) Log: root @ localhost.localdomain on Mon Apr 2 21:28:30 EDT 2018 from open source release 2018/04/02-22:20:19(EDT) Log: 1 nodes, 1 cpus. 2018/04/02-22:20:19(EDT) Log: Prefer plain malloc memory allocation. 2018/04/02-22:20:19(EDT) Log: Using mmap() allocation at 0x7f6af220c000. 2018/04/02-22:20:19(EDT) Stats: Starting SAT, 256M, 20 seconds 2018/04/02-22:20:20(EDT) Log: region number 1 exceeds region count 1 2018/04/02-22:20:20(EDT) Log: Region mask: 0x1 2018/04/02-22:20:30(EDT) Log: Seconds remaining: 10 2018/04/02-22:20:40(EDT) Stats: Found 0 hardware incidents 2018/04/02-22:20:40(EDT) Stats: Completed: 98114.00M in 20.13s 4873.30MB/s, with 0 hardware incidents, 0 errors 2018/04/02-22:20:40(EDT) Stats: Memory Copy: 98114.00M at 4920.03MB/s 2018/04/02-22:20:40(EDT) Stats: File Copy: 0.00M at 0.00MB/s 2018/04/02-22:20:40(EDT) Stats: Net Copy: 0.00M at 0.00MB/s 2018/04/02-22:20:40(EDT) Stats: Data Check: 0.00M at 0.00MB/s 2018/04/02-22:20:40(EDT) Stats: Invert Data: 0.00M at 0.00MB/s 2018/04/02-22:20:40(EDT) Stats: Disk: 0.00M at 0.00MB/s 2018/04/02-22:20:40(EDT) 2018/04/02-22:20:40(EDT) Status: PASS - please verify no corrected errors 2018/04/02-22:20:40(EDT) [ perf record: Woken up 12 times to write data ] [ perf record: Captured and wrote 2.955 MB perf.data (76965 samples) ]
- perf report
如果之前有用過 perf record 產生了 perf.data , 就可以透過 perf report 來讀取執行統計數據. - perf top
執行統計數據的顯示會類似 top 指令的方式,後面可以直接 -p,只觀察單一程式所產生的數據.其他參數請參考 #perf top -help 剛剛的程式會花很多時間在執行,用 #perf stat 可能比較不適合,可以利用即時監控的方式來做,只需要知道該程式的 PID (Process ID) , 可以透過 # ps -aux | grep -i tool_name 來查詢.
[root@localhost ~]# perf top -p 2603 Samples: 445 of event 'cycles', Event count (approx.): 12225437 Overhead Shared Object Symbol 9.21% [kernel] [k] selinux_task_wait 8.61% [kernel] [k] avtab_search_node 5.70% [kernel] [k] copy_page_rep
- perf list
[root@localhost ~]# perf list -help Usage: perf list [<options>] [hw|sw|cache|tracepoint|pmu|sdt|event_glob] -d, --desc Print extra event descriptions. --no-desc to not print. -v, --long-desc Print longer event descriptions.
其他可用參數:
- annotate
Read perf.data (created by perf record) and display annotated code - archive
Create archive with object files with build-ids found in perf.data file - bench
General framework for benchmark suites - buildid-cache
Manage build-id cache. - buildid-list
List the buildids in a perf.data file - config
Get and set variables in a configuration file. - data
Data file related processing - diff
Read perf.data files and display the differential profile - evlist
List the event names in a perf.data file - ftrace
simple wrapper for kernel’s ftrace functionality - inject
Filter to augment the events stream with additional information - kallsyms
Searches running kernel for symbols - kmem
Tool to trace/measure kernel memory properties - kvm
Tool to trace/measure kvm guest os - lock
Analyze lock events - mem
Profile memory accesses - sched
Tool to trace/measure scheduler properties (latencies) - script
Read perf.data (created by perf record) and display trace output - test
Runs sanity tests. - timechart
Tool to visualize total system behavior during a workload - probe
Define new dynamic tracepoints - trace
strace inspired tool
相關訊息
我們會在 /var/log/messages 裡面看到關於 perf 的訊息,這是代表什麼意思?
kernel: perf: interrupt took too long (19841 > 19753), lowering kernel.perf_event_max_sample_rate to 10000
透過 perf 監視會消耗 CPU 的使用率,預設 perf_cpu_time_max_percent 為 25% 為上限.
[root@localhost ~]# cat /proc/sys/kernel/perf_cpu_time_max_percent 25
還有就是採樣率,預設 perf_event_max_sample_rate 為 10000 (單位 ? 秒)
[root@localhost ~]# cat /proc/sys/kernel/perf_event_max_sample_rate 10000
超過這兩個限制就會出現以上訊息,核心也會自動調整 kernel.perf_event_max_sample_rate
沒有解決問題,試試搜尋本站其他內容