Linux command – perf

一支簡單的 copy 程式花了很多的執行時間,透過 top 來檢視,卻發現程式大部分的時間花在 wa : iowait (CPU 花在等待 I/O 的百分比),完全不知道為什麼.

測試環境為 CentOS7 虛擬機

程式到底在執行哪一些呼叫,在 Linux 環境下可以透過 perf – Performance analysis tools for Linux 來檢視,核心已包含 perf 功能,只需要安裝工具及可以開始統計.

[root@localhost ~]# yum -y install perf

下面是常用的幾個指令

  • perf stat
    在 perf stat 指令後面接著要觀察的指令 stressapptest ,待程式執行完畢會回報執行統計數據.

    [root@localhost ~]# perf stat stressapptest -s 20 -M 256 -m 8 -C 8 -W 
    2018/04/02-22:15:38(EDT) Log: Commandline - stressapptest -s 20 -M 256 -m 8 -C 8 -W
    2018/04/02-22:15:38(EDT) Stats: SAT revision 1.0.7_autoconf, 64 bit binary
    2018/04/02-22:15:38(EDT) Log: root @ localhost.localdomain on Mon Apr  2 21:28:30 EDT 2018 from open source release
    2018/04/02-22:15:38(EDT) Log: 1 nodes, 1 cpus.
    2018/04/02-22:15:38(EDT) Log: Prefer plain malloc memory allocation.
    2018/04/02-22:15:38(EDT) Log: Using mmap() allocation at 0x7f9a52534000.
    2018/04/02-22:15:38(EDT) Stats: Starting SAT, 256M, 20 seconds
    2018/04/02-22:15:41(EDT) Log: region number 1 exceeds region count 1
    2018/04/02-22:15:41(EDT) Log: Region mask: 0x1
    2018/04/02-22:15:51(EDT) Log: Seconds remaining: 10
    2018/04/02-22:16:07(EDT) Stats: Found 0 hardware incidents
    2018/04/02-22:16:07(EDT) Stats: Completed: 104172.00M in 26.21s 3974.44MB/s, with 0 hardware incidents, 0 errors
    2018/04/02-22:16:07(EDT) Stats: Memory Copy: 104172.00M at 5223.94MB/s
    2018/04/02-22:16:07(EDT) Stats: File Copy: 0.00M at 0.00MB/s
    2018/04/02-22:16:07(EDT) Stats: Net Copy: 0.00M at 0.00MB/s
    2018/04/02-22:16:07(EDT) Stats: Data Check: 0.00M at 0.00MB/s
    2018/04/02-22:16:07(EDT) Stats: Invert Data: 0.00M at 0.00MB/s
    2018/04/02-22:16:07(EDT) Stats: Disk: 0.00M at 0.00MB/s
    2018/04/02-22:16:07(EDT) 
    2018/04/02-22:16:07(EDT) Status: PASS - please verify no corrected errors
    2018/04/02-22:16:07(EDT) 
    
     Performance counter stats for 'stressapptest -s 20 -M 256 -m 8 -C 8 -W':
    
          25931.916079      task-clock (msec)         #    0.894 CPUs utilized          
                53,688      context-switches          #    0.002 M/sec                  
                     0      cpu-migrations            #    0.000 K/sec                  
                66,028      page-faults               #    0.003 M/sec                  
       <not supported>      cycles                                                      
                     0      stalled-cycles-frontend                                     
       <not supported>      instructions                                                
       <not supported>      branches                                                    
       <not supported>      branch-misses                                               
    
          29.010419437 seconds time elapsed
    

    其他參數請參考 #perf stat -help

  • perf record
    剛剛的 perf stat 並不會把執行統計數據儲存檔案,可以透過 perf record 儲存成 perf.data ,

    [root@localhost ~]# perf record stressapptest -s 20 -M 256 -m 8 -C 8 -W
    2018/04/02-22:20:19(EDT) Log: Commandline - stressapptest -s 20 -M 256 -m 8 -C 8 -W
    2018/04/02-22:20:19(EDT) Stats: SAT revision 1.0.7_autoconf, 64 bit binary
    2018/04/02-22:20:19(EDT) Log: root @ localhost.localdomain on Mon Apr  2 21:28:30 EDT 2018 from open source release
    2018/04/02-22:20:19(EDT) Log: 1 nodes, 1 cpus.
    2018/04/02-22:20:19(EDT) Log: Prefer plain malloc memory allocation.
    2018/04/02-22:20:19(EDT) Log: Using mmap() allocation at 0x7f6af220c000.
    2018/04/02-22:20:19(EDT) Stats: Starting SAT, 256M, 20 seconds
    2018/04/02-22:20:20(EDT) Log: region number 1 exceeds region count 1
    2018/04/02-22:20:20(EDT) Log: Region mask: 0x1
    2018/04/02-22:20:30(EDT) Log: Seconds remaining: 10
    2018/04/02-22:20:40(EDT) Stats: Found 0 hardware incidents
    2018/04/02-22:20:40(EDT) Stats: Completed: 98114.00M in 20.13s 4873.30MB/s, with 0 hardware incidents, 0 errors
    2018/04/02-22:20:40(EDT) Stats: Memory Copy: 98114.00M at 4920.03MB/s
    2018/04/02-22:20:40(EDT) Stats: File Copy: 0.00M at 0.00MB/s
    2018/04/02-22:20:40(EDT) Stats: Net Copy: 0.00M at 0.00MB/s
    2018/04/02-22:20:40(EDT) Stats: Data Check: 0.00M at 0.00MB/s
    2018/04/02-22:20:40(EDT) Stats: Invert Data: 0.00M at 0.00MB/s
    2018/04/02-22:20:40(EDT) Stats: Disk: 0.00M at 0.00MB/s
    2018/04/02-22:20:40(EDT) 
    2018/04/02-22:20:40(EDT) Status: PASS - please verify no corrected errors
    2018/04/02-22:20:40(EDT) 
    [ perf record: Woken up 12 times to write data ]
    [ perf record: Captured and wrote 2.955 MB perf.data (76965 samples) ]
    
  • perf report
    如果之前有用過 perf record 產生了 perf.data , 就可以透過 perf report 來讀取執行統計數據.
  • perf top
    執行統計數據的顯示會類似 top 指令的方式,後面可以直接 -p ,只觀察單一程式所產生的數據.其他參數請參考 #perf top -help

    剛剛的程式會花很多時間在執行,用 #perf stat 可能比較不適合,可以利用即時監控的方式來做,只需要知道該程式的 PID (Process ID) , 可以透過 # ps -aux | grep -i tool_name 來查詢.

    [root@localhost ~]# perf top -p 2603
    Samples: 445  of event 'cycles', Event count (approx.): 12225437
    Overhead  Shared Object     Symbol
       9.21%  [kernel]          [k] selinux_task_wait
       8.61%  [kernel]          [k] avtab_search_node
       5.70%  [kernel]          [k] copy_page_rep
    
  • perf list
    [root@localhost ~]# perf list -help
    
     Usage: perf list [<options>] [hw|sw|cache|tracepoint|pmu|sdt|event_glob]
    
        -d, --desc            Print extra event descriptions. --no-desc to not print.
        -v, --long-desc       Print longer event descriptions.
    

其他可用參數:

  • annotate
    Read perf.data (created by perf record) and display annotated code
  • archive
    Create archive with object files with build-ids found in perf.data file
  • bench
    General framework for benchmark suites
  • buildid-cache
    Manage build-id cache.
  • buildid-list
    List the buildids in a perf.data file
  • config
    Get and set variables in a configuration file.
  • data
    Data file related processing
  • diff
    Read perf.data files and display the differential profile
  • evlist
    List the event names in a perf.data file
  • ftrace
    simple wrapper for kernel’s ftrace functionality
  • inject
    Filter to augment the events stream with additional information
  • kallsyms
    Searches running kernel for symbols
  • kmem
    Tool to trace/measure kernel memory properties
  • kvm
    Tool to trace/measure kvm guest os
  • lock
    Analyze lock events
  • mem
    Profile memory accesses
  • sched
    Tool to trace/measure scheduler properties (latencies)
  • script
    Read perf.data (created by perf record) and display trace output
  • test
    Runs sanity tests.
  • timechart
    Tool to visualize total system behavior during a workload
  • probe
    Define new dynamic tracepoints
  • trace
    strace inspired tool

發表迴響

你的電子郵件位址並不會被公開。 必要欄位標記為 *

這個網站採用 Akismet 服務減少垃圾留言。進一步瞭解 Akismet 如何處理網站訪客的留言資料