Linux command – nvidia-smi

For GPUs in general, you can inspect the hardware with TechPowerUp GPU-Z https://www.techpowerup.com/download/techpowerup-gpu-z/ ; for NVIDIA GPUs, you can use the nvidia-smi command.

Tested with CUDA Toolkit 8.0. For installation on Linux, see http://benjr.tw/98666

To confirm that your NVIDIA GPU is working properly, first use lspci to check that the device is listed correctly.

root@benben:~# lspci | grep -i nvidia
86:00.0 3D controller: NVIDIA Corporation Device 15f8 (rev a1)
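
If you want to run this check from a script, here is a minimal Python sketch that picks NVIDIA devices out of lspci output. The function name find_nvidia_devices and the regular expression are my own illustration, not part of lspci, and the sample line is the one shown above.

```python
import re

# Matches lines such as:
#   86:00.0 3D controller: NVIDIA Corporation Device 15f8 (rev a1)
LSPCI_LINE = re.compile(r"^(?P<slot>\S+)\s+(?P<cls>[^:]+):\s+(?P<desc>.*)$")


def find_nvidia_devices(lspci_output):
    """Return (slot, class, description) tuples for NVIDIA devices.

    Hypothetical helper: it only parses text, it does not run lspci.
    """
    devices = []
    for line in lspci_output.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match and "nvidia" in match.group("desc").lower():
            devices.append(
                (match.group("slot"), match.group("cls"), match.group("desc"))
            )
    return devices


sample = "86:00.0 3D controller: NVIDIA Corporation Device 15f8 (rev a1)"
print(find_nvidia_devices(sample))
```

An empty list means lspci did not report any NVIDIA device, which usually points to a hardware or slot problem rather than a driver problem.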

Next, use nvidia-smi (NVIDIA System Management Interface), the tool provided by NVIDIA, to check the driver and GPU information.

root@benben:~# nvidia-smi 
Wed Nov  8 09:44:55 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   37C    P0    33W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Common options

  • -L, --list-gpus
    List all NVIDIA GPUs in the system.
  • -q, --query
    Display all available information for every GPU or unit in the system.
  • -d, --display
    Used together with -q to show only selected sections (MEMORY, UTILIZATION, ECC, TEMPERATURE, POWER, CLOCK, COMPUTE, PIDS, PERFORMANCE, SUPPORTED_CLOCKS, PAGE_RETIREMENT, ACCOUNTING, ENCODER).
  • --query-gpu
    For finer-grained details, use --query-gpu. To see which fields it can report, first run nvidia-smi --help-query-gpu.
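
As a rough sketch of how these options combine in a script, the Python snippet below assembles a --query-gpu command line. build_query_command and run_query are hypothetical helper names of mine. The field names are ones used in this post, and run_query only works on a machine with the NVIDIA driver installed.

```python
import subprocess


def build_query_command(fields, interval_sec=None, units=True):
    """Build the argument list for an nvidia-smi --query-gpu call.

    Hypothetical helper; it only assembles flags that nvidia-smi documents
    (--query-gpu, --format=csv[,nounits], -l).
    """
    if not fields:
        raise ValueError("at least one field is required")
    fmt = "csv" if units else "csv,nounits"
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(fields), "--format=" + fmt]
    if interval_sec is not None:
        cmd += ["-l", str(int(interval_sec))]
    return cmd


def run_query(fields):
    """Run the query once and return its raw CSV text (needs a real GPU)."""
    result = subprocess.run(
        build_query_command(fields), capture_output=True, text=True, check=True
    )
    return result.stdout


# Only the command is printed here; nothing is executed.
print(build_query_command(["utilization.gpu", "power.draw", "temperature.gpu"]))
```

Passing the arguments as a list avoids shell quoting issues when field names are built dynamically.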

-d CLOCK

root@benben:~# nvidia-smi -q -d CLOCK
GPU 0000:63:00.0
Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 2505 MHz
    Applications Clocks
        Graphics                    : 875 MHz
        Memory                      : 2505 MHz
    Default Applications Clocks
        Graphics                    : 562 MHz
        Memory                      : 2505 MHz
    Max Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 2505 MHz
    SM Clock Samples
        Duration                    : 3730.56 sec
        Number of Samples           : 8
        Max                         : 875 MHz
        Min                         : 324 MHz
        Avg                         : 873 MHz
    Memory Clock Samples
        Duration                    : 344.77 sec
        Number of Samples           : 12
        Max                         : 715 MHz
        Min                         : 715 MHz
        Avg                         : 715 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
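
The report above is easy to read by eye but awkward to use in a script. Here is a minimal Python sketch that groups the "key : value" lines under their section titles. parse_sections is a hypothetical helper of mine; it assumes every value line contains " : " and every other non-empty line is a section title, which holds for the excerpt shown here but may not hold for every driver version.

```python
def parse_sections(text):
    """Group 'key : value' lines under the most recent section title."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(" : ")
        if sep:
            # A value line; ignore it if no section title has been seen yet.
            if current is not None:
                sections[current][key.strip()] = value.strip()
        else:
            # No " : " separator, so treat the line as a section title.
            current = line
            sections[current] = {}
    return sections


def to_mhz(value):
    """Turn a string such as '875 MHz' into the integer 875."""
    return int(value.split()[0])


# Excerpt of the output shown above.
sample = """GPU 0000:63:00.0
Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 2505 MHz
    Default Applications Clocks
        Graphics                    : 562 MHz
        Memory                      : 2505 MHz
"""

clocks = parse_sections(sample)
print(to_mhz(clocks["Clocks"]["Graphics"]))
print(to_mhz(clocks["Default Applications Clocks"]["Graphics"]))
```

Note that the GPU line itself ends up as an empty section, because it carries no " : " separator.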

Other clock-related options.

  • -ac, --applications-clocks=<memory,graphics> – Specifies the memory and graphics clocks as a pair (e.g. 2000,800), in MHz, that the GPU uses while running applications.
  • nvidia-smi -q -d SUPPORTED_CLOCKS – List the clock combinations the GPU supports; -ac must be given one of these
  • nvidia-smi --auto-boost-default=ENABLED -i 0 – Enable boosting GPU clocks (K80 and later)
  • nvidia-smi -rac – Reset application clocks back to their defaults
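
The order of the -ac pair is easy to get wrong: nvidia-smi's help lists it as memory first, then graphics. The Python sketch below builds the command and optionally checks the pair against a list you supply. applications_clocks_command is a hypothetical helper, and the supported_pairs set is only an illustration built from the clock values printed earlier in this post, not real SUPPORTED_CLOCKS output.

```python
def applications_clocks_command(memory_mhz, graphics_mhz, gpu_index=0, supported=None):
    """Build an 'nvidia-smi -ac' command; the pair order is memory,graphics.

    supported: optional set of (memory, graphics) tuples, for example taken
    from 'nvidia-smi -q -d SUPPORTED_CLOCKS' (parsing that is not shown here).
    """
    pair = (int(memory_mhz), int(graphics_mhz))
    if supported is not None and pair not in supported:
        raise ValueError(f"{pair} is not a supported (memory, graphics) pair")
    return ["nvidia-smi", "-i", str(gpu_index), "-ac", f"{pair[0]},{pair[1]}"]


# Illustrative values only, taken from the clock report in this post.
supported_pairs = {(2505, 875), (2505, 562)}
print(" ".join(applications_clocks_command(2505, 875, supported=supported_pairs)))
```

Changing application clocks normally needs root privileges, so the sketch only prints the command instead of running it.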

-d POWER

root@benben:~# nvidia-smi -q -d POWER

Power-related options.

  • nvidia-smi -pl N – Set the power cap (maximum wattage the GPU will use) to N watts
  • nvidia-smi -pm 1 – Enable persistence mode
  • nvidia-smi stats -i <device#> -d pwrDraw – Continuously monitor detailed stats such as power draw

--query-gpu

root@benben:~# nvidia-smi --format=csv --query-gpu=utilization.gpu,power.draw,temperature.gpu 
utilization.gpu [%], power.draw [W], temperature.gpu
0 %, 33.17 W, 37
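
The CSV format is the easiest one to consume from a script. Below is a minimal Python sketch that turns the output above into dictionaries with numeric values. parse_query_csv and to_number are hypothetical helpers of mine; the sketch assumes the unit follows the number after a space (as in "33.17 W") and maps anything non-numeric to None.

```python
import csv
import io


def to_number(cell):
    """Convert '33.17 W' or '37' to a float; return None if not numeric."""
    parts = cell.split()
    if not parts:
        return None
    try:
        return float(parts[0])
    except ValueError:
        return None


def parse_query_csv(text):
    """Parse 'nvidia-smi --format=csv --query-gpu=...' output into dicts."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [[cell.strip() for cell in row] for row in reader if row]
    if not rows:
        return []
    # Drop the unit suffix from header names: 'power.draw [W]' -> 'power.draw'
    header = [name.split(" [")[0] for name in rows[0]]
    return [dict(zip(header, map(to_number, row))) for row in rows[1:]]


# The output shown above.
sample = """utilization.gpu [%], power.draw [W], temperature.gpu
0 %, 33.17 W, 37
"""
print(parse_query_csv(sample))
```

Each GPU in the system produces one row, so the returned list has one dictionary per GPU.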

To monitor over a longer period, add -l SEC (print a new reading every SEC seconds).

root@benben:~# nvidia-smi -l 5 --format=csv --query-gpu=utilization.gpu,power.draw,temperature.gpu 

-l SEC
Print a new reading every SEC seconds.
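
To consume the repeating output from a script, one option is to read it line by line. The Python sketch below is only an outline: iter_samples and watch are hypothetical helpers, and I am assuming the CSV header is printed once at the start (a repeated header line is skipped just in case). The split on commas is naive and would break on a field value that itself contains a comma.

```python
import subprocess


def iter_samples(lines):
    """Yield one dict per data row from streamed nvidia-smi CSV lines."""
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        cells = [cell.strip() for cell in line.split(",")]
        if header is None:
            header = cells  # first non-empty line is the header
        elif cells != header:  # skip the header if it shows up again
            yield dict(zip(header, cells))


def watch(fields, interval_sec=5):
    """Stream readings from a live nvidia-smi process (needs a real GPU)."""
    cmd = ["nvidia-smi", "-l", str(interval_sec), "--format=csv",
           "--query-gpu=" + ",".join(fields)]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        yield from iter_samples(proc.stdout)


# Offline demonstration; the data row from this post is repeated purely
# for illustration.
captured = [
    "utilization.gpu [%], power.draw [W], temperature.gpu",
    "0 %, 33.17 W, 37",
    "0 %, 33.17 W, 37",
]
for reading in iter_samples(captured):
    print(reading["power.draw [W]"])
```

On a machine with a GPU, something like `for reading in watch(["power.draw"]): ...` would loop until nvidia-smi is interrupted.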
