I have covered Stressful Application Test (Stressapptest) before ( https://benjr.tw/96740 ); this post looks specifically at its NUMA memory test options.
The test environment is Ubuntu 16.04 64-bit.
NUMA (Non-uniform memory access)
NUMA (Non-uniform memory access) divides CPUs and memory into separate nodes (each CPU has its own local memory); the CPU nodes communicate with each other over QPI (Intel QuickPath Interconnect).
For more about NUMA, see https://benjr.tw/96788
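Before testing, the node layout can be inspected with numactl --hardware, which lists each node's CPUs, memory size, and the inter-node distances. The output below is only illustrative for a two-node, 16-CPU machine like the one used here; the actual CPU lists and sizes depend on your hardware:

root@ubuntu:~# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 16060 MB
node 0 free: 15221 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 16126 MB
node 1 free: 15466 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10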
Parameters related to NUMA
- --local_numa :
choose memory regions associated with each CPU to be tested by that CPU. Uses memory on the CPU's own node.
- --remote_numa :
choose memory regions not associated with each CPU to be tested by that CPU. Uses memory on a different node.
numastat shows that my system has two nodes (node0 and node1):
root@ubuntu:~# numastat
                           node0           node1
numa_hit                  684516          565302
numa_miss                      0               0
numa_foreign                   0               0
interleave_hit              4908           15701
local_node                682033          546987
other_node                  2483           18315
The values above mean:
- numa_hit
Memory successfully allocated on this node as intended.
- numa_miss
Memory allocated on this node despite the process preferring some different node. The node the process originally intended ran out of memory, so the allocation landed here instead; each numa_miss has a matching numa_foreign on another node.
- numa_foreign
Memory intended for this node, but actually allocated on some different node. Each numa_foreign has a matching numa_miss on another node.
- interleave_hit
Interleaved memory successfully allocated on this node as intended; that is, the number of interleave-policy allocations that were intended for this node and succeeded here.
- local_node
Memory allocated on this node while a process was running on it.
- other_node
Memory allocated on this node while the process was running on some other node (the sketch after this list shows this counter in action).
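To see local_node and other_node move, one quick experiment (a sketch, assuming the memhog tool that ships with the numactl package is installed) is to run a process on node0's CPUs while forcing its memory onto node1; node1's other_node counter should then go up:

root@ubuntu:~# numastat | grep other_node
root@ubuntu:~# numactl --cpunodebind=0 --membind=1 memhog 512M
root@ubuntu:~# numastat | grep other_node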
Testing local_numa
root@ubuntu:~# stressapptest -s 30 -M 500 --local_numa
Log: Commandline - stressapptest -s 30 -M 500 --local_numa
Stats: SAT revision 1.0.6_autoconf, 64 bit binary
Log: buildd @ kapok on Wed Jan 21 17:09:35 UTC 2015 from open source release
Log: 1 nodes, 16 cpus.
Log: Defaulting to 16 copy threads
Log: Prefer plain malloc memory allocation.
Log: Using memaligned allocation at 0x7fc80ed5d000.
Stats: Starting SAT, 500M, 30 seconds
Log: Region mask: 0x1
Log: Seconds remaining: 20
Log: Seconds remaining: 10
Stats: Found 0 hardware incidents
Stats: Completed: 409174.00M in 30.00s 13637.74MB/s, with 0 hardware incidents, 0 errors
Stats: Memory Copy: 409174.00M at 13638.94MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 0.00M at 0.00MB/s
Stats: Disk: 0.00M at 0.00MB/s
Status: PASS - please verify no corrected errors
local_node clearly increased: node0 – 682033 (before) vs. 701210 (after), node1 – 546987 (before) vs. 571896 (after); other_node did not increase.
root@ubuntu:~# numastat
                           node0           node1
numa_hit                  703693          590211
numa_miss                      0               0
numa_foreign                   0               0
interleave_hit              4908           15701
local_node                701210          571896
other_node                  2483           18315
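Reading the before/after counters off two separate numastat runs by eye is tedious; the comparison can be scripted with standard tools. A minimal sketch (the file names are arbitrary):

root@ubuntu:~# numastat > before.txt
root@ubuntu:~# stressapptest -s 30 -M 500 --local_numa
root@ubuntu:~# numastat > after.txt
root@ubuntu:~# awk 'FNR==1 {next} NR==FNR {n0[$1]=$2; n1[$1]=$3; next} {printf "%-16s %10d %10d\n", $1, $2-n0[$1], $3-n1[$1]}' before.txt after.txt

The awk line prints, for each counter, how much node0 and node1 changed during the run.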
Testing remote_numa
root@ubuntu:~# stressapptest -s 30 -M 500 --remote_numa
Log: Commandline - stressapptest -s 30 -M 500 --remote_numa
Stats: SAT revision 1.0.6_autoconf, 64 bit binary
Log: buildd @ kapok on Wed Jan 21 17:09:35 UTC 2015 from open source release
Log: 1 nodes, 16 cpus.
Log: Defaulting to 16 copy threads
Log: Prefer plain malloc memory allocation.
Log: Using memaligned allocation at 0x7fe0490a9000.
Stats: Starting SAT, 500M, 30 seconds
Log: Region mask: 0x1
Log: Seconds remaining: 20
Log: Seconds remaining: 10
Stats: Found 0 hardware incidents
Stats: Completed: 419376.00M in 30.01s 13976.86MB/s, with 0 hardware incidents, 0 errors
Stats: Memory Copy: 419376.00M at 13977.69MB/s
Stats: File Copy: 0.00M at 0.00MB/s
Stats: Net Copy: 0.00M at 0.00MB/s
Stats: Data Check: 0.00M at 0.00MB/s
Stats: Invert Data: 0.00M at 0.00MB/s
Stats: Disk: 0.00M at 0.00MB/s
Status: PASS - please verify no corrected errors
The remote NUMA result (Memory Copy: 419376.00M at 13977.69MB/s) is not significantly different from the local NUMA result (Memory Copy: 409174.00M at 13638.94MB/s).
local_node again clearly increased: node0 – 701210 (before) vs. 728102 (after), node1 – 571896 (before) vs. 598015 (after); oddly, other_node did not increase.
root@ubuntu:~# numastat
                           node0           node1
numa_hit                  730585          616330
numa_miss                      0               0
numa_foreign                   0               0
interleave_hit              4908           15701
local_node                728102          598015
other_node                  2483           18315
Watching the overall NUMA memory usage (MemTotal, MemFree, MemUsed) with numastat -m shows that no matter whether local_numa or remote_numa is used, memory is allocated evenly across both nodes. I am not sure whether this is a program bug or a system limitation, but stressapptest's local_numa / remote_numa do not behave as expected. Note that the test logs above report "Log: 1 nodes, 16 cpus.", i.e. stressapptest itself only detected one node, which may explain why the NUMA options had no effect.
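One way to watch the per-node totals while a test is running (a sketch; numastat -m and watch are both standard on Ubuntu):

root@ubuntu:~# watch -n 1 "numastat -m | egrep 'MemTotal|MemFree|MemUsed'"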
For NUMA testing I suggest using the numactl command instead, which can pin a test to a specific CPU node and a specific memory node.
root@ubuntu:~# numactl --interleave=0,1 ./stressapptest -s 180 -M 32000
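The interleave example above spreads allocations across both nodes. For a true local vs. remote comparison (a sketch; the node numbers depend on your topology), bind CPU and memory explicitly:

root@ubuntu:~# numactl --cpunodebind=0 --membind=0 ./stressapptest -s 30 -M 500
root@ubuntu:~# numactl --cpunodebind=0 --membind=1 ./stressapptest -s 30 -M 500

The first command keeps the memory local to node0's CPUs; the second forces every access across the interconnect.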
For more about the numactl command, see https://benjr.tw/96788