I have two ConnectX®-3 EN Single/Dual-Port 10 and 40 Gigabit Ethernet Adapters w/ PCI Express 3.0 on hand.
Details: http://www.mellanox.com/page/products_dyn?product_family=127
Image source: http://www.mellanox.com/uploads/product_families/cat_71/gfx_00975.jpg
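On Ubuntu 14.04 the card is visible, but the system does not pick it up as a usable device, so the mlx4_en module has to be compiled manually first. See https://benjr.tw/58956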
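A quick way to tell "the card is there" apart from "the driver is loaded" is to check the PCI bus and the module list separately. The sketch below assumes Mellanox's PCI vendor ID 15b3 (so `lspci -d 15b3:` lists the card) and reads loaded modules from /proc/modules; the helper name `module_loaded` is hypothetical.

```bash
#!/usr/bin/env bash
# module_loaded NAME [FILE]
# Hypothetical helper: succeeds if NAME appears as a loaded module in a
# /proc/modules-style file (default: /proc/modules).
module_loaded() {
    local name=$1 file=${2:-/proc/modules}
    # Field 1 of every /proc/modules line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# Example on a real host:
#   lspci -d 15b3:     # is any Mellanox (vendor 15b3) device present?
#   module_loaded mlx4_en && echo "mlx4_en is loaded" || echo "mlx4_en is not loaded"
```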
However, the device that mlx4_en creates is a regular Ethernet device. To get InfiniBand device names such as ib0 and ib1, you need to install the Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED).
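One way to see which kind of device a driver created is the interface's link type in sysfs: the kernel reports 1 (ARPHRD_ETHER) for Ethernet interfaces and 32 (ARPHRD_INFINIBAND) for IPoIB interfaces. A minimal sketch; the function name is hypothetical, and the sysfs root is a parameter only so it can be tried against a fake directory.

```bash
#!/usr/bin/env bash
# link_kind IFACE [SYSFS_NET_ROOT]
# Hypothetical helper: classify an interface by /sys/class/net/IFACE/type.
link_kind() {
    local dev=$1 root=${2:-/sys/class/net} link_type
    link_type=$(cat "$root/$dev/type" 2>/dev/null) || return 1
    case $link_type in
        1)  echo ethernet ;;      # ARPHRD_ETHER
        32) echo infiniband ;;    # ARPHRD_INFINIBAND (IPoIB)
        *)  echo "other ($link_type)" ;;
    esac
}

# Example: for d in /sys/class/net/*; do echo "${d##*/}: $(link_kind "${d##*/}")"; done
```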
For an introduction to IPoIB (IP over InfiniBand), please consult other references.
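Mellanox's documentation says that a previously installed MLNX_EN has dependency conflicts with OFED. (I take this to mean that if you want to use MLNX_EN on its own, OFED has to be removed first.) The mlnx_en install script removes all OFED packages for you when it runs: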
MLNX_EN driver cannot coexist with OFED software on the same machine. Hence when installing MLNX_EN all OFED packages should be removed (done by the mlnx_en install script)
For details, see the MLNX_EN user manual: http://www.mellanox.com/related-docs/user_manuals/ConnectX-3_Ethernet_Single_and_Dual_QSFP+_Port_Adapter_Card_User_Manual.pdf
For installing and configuring MLNX_OFED, see its user manual: http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_User_Manual_v2.2-1.0.1.pdf
So my earlier MLNX_EN installation should not prevent OFED from being installed.
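Before installing MLNX_OFED it can help to see which MLNX_EN packages are still present. A sketch assuming Debian/Ubuntu packaging, where `dpkg-query -W -f='${Package}\n'` prints one installed package name per line; the filter name is hypothetical, and the mlnx-en-* names are the ones that show up in the installer error further down.

```bash
#!/usr/bin/env bash
# mlnx_en_packages
# Hypothetical filter: read package names on stdin and print those that
# belong to MLNX_EN (names equal to "mlnx-en" or starting with "mlnx-en-").
mlnx_en_packages() {
    grep -E '^mlnx-en(-|$)' || true   # no match is not an error here
}

# Example on Ubuntu:
#   dpkg-query -W -f='${Package}\n' | mlnx_en_packages
```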
MLNX_OFED downloads (Linux drivers page): http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
Newer versions may be listed there; I used MLNX_OFED_LINUX v2.2-1.0.1.
root@benjr:~# wget http://www.mellanox.com/downloads/ofed/MLNX_OFED-2.2-1.0.1/MLNX_OFED_LINUX-2.2-1.0.1-ubuntu14.04-x86_64.tgz
root@benjr:~# tar xzf MLNX_OFED_LINUX-2.2-1.0.1-ubuntu14.04-x86_64.tgz
root@benjr:~# cd MLNX_OFED_LINUX-2.2-1.0.1-ubuntu14.04-x86_64
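The tar and cd steps can be wrapped in a small function. This sketch assumes, as with the MLNX_OFED tarball above, that the archive unpacks into a directory named after the file minus its .tgz suffix; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# unpack_and_enter ARCHIVE.tgz
# Hypothetical helper: extract a gzip'd tarball into the current directory
# and cd into the top-level directory it is assumed to contain.
unpack_and_enter() {
    local tgz=$1 dir
    dir=$(basename "$tgz" .tgz)   # e.g. MLNX_OFED_LINUX-2.2-1.0.1-ubuntu14.04-x86_64
    tar xzf "$tgz" || return 1
    cd "$dir" || return 1
}

# Example:
#   unpack_and_enter MLNX_OFED_LINUX-2.2-1.0.1-ubuntu14.04-x86_64.tgz
```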
I first removed mlnx_en (sbin/mlnx_en_uninstall.sh) and then ran the OFED installer directly.
root@benjr:~/MLNX_OFED_LINUX-2.2-1.0.1-ubuntu14.04-x86_64# ./mlnxofedinstall
..........
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, or Distribution IB packages will be removed.
Do you want to continue?[y/N]:y
..........
Error: One or more packages depends on MLNX_OFED.
Those packages should be removed before uninstalling MLNX_OFED:
mlnx-en-utils mlnx-en-dkms fio libopensm5 libibumad3 libibmad5 libibnetdisc5
To force uninstallation use '--force' flag.
It still failed, though, and the installer suggested the --force flag. With --force the installation went through without problems.
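The error message itself names the packages that block the installer, so instead of --force one could remove those first. If the installer's output has been saved to a file (install.log below is a hypothetical name), a sketch like this pulls out that list; it only assumes the wording shown in the error above.

```bash
#!/usr/bin/env bash
# blocking_packages
# Hypothetical parser: read mlnxofedinstall output on stdin and print, one per
# line, the packages listed between "before uninstalling MLNX_OFED:" and
# "To force".
blocking_packages() {
    tr '\n' ' ' |
        sed -n 's/.*before uninstalling MLNX_OFED:\(.*\)To force.*/\1/p' |
        tr -s ' ' '\n' |
        sed '/^$/d'
}

# Example:
#   blocking_packages < install.log
```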
root@benjr:~/MLNX_OFED_LINUX-2.2-1.0.1-ubuntu14.04-x86_64# ./mlnxofedinstall --force
Log: /tmp/ofed.build.log
Logs dir: /tmp/OFED.29507.logs
Below is the list of OFED packages that you have chosen
(some may have been added by the installer due to package dependencies):
ofed-scripts mlnx-ofed-kernel-utils mlnx-ofed-kernel-dkms iser-dkms srp-dkms libibverbs1 ibverbs-utils
libibverbs-dev libibverbs1-dbg libmlx4-1 libmlx4-dev libmlx4-1-dbg libmlx5-1 libmlx5-dev libmlx5-1-dbg
libibumad libibumad-static libibumad-devel ibacm ibacm-dev librdmacm1 librdmacm-utils librdmacm-dev
libibmad mstflint libibmad-static libibmad-devel libopensm opensm opensm-doc libopensm-devel
infiniband-diags infiniband-diags-compat mft kernel-mft-dkms libibcm1 libibcm-dev perftest ibutils2
libibdm1 ibutils cc-mgr ar-mgr dump-pr ibsim ibsim-doc knem-dkms knem mxm fca openmpi mpitests
ummunotify ummunotify-dkms rds-tools libdapl2 dapl2-utils libdapl-dev srptools
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, or Distribution IB packages will be removed.
Checking SW Requirements...
Removing old packages...
Installing new packages
Installing ofed-scripts-2.2...
Installing mlnx-ofed-kernel-utils-2.2...
Installing mlnx-ofed-kernel-dkms-2.2...
Installing iser-dkms-1.2...
Installing srp-dkms-1.3.2...
Installing libibverbs1-1.1.7mlnx1...
Installing ibverbs-utils-1.1.7mlnx1...
Installing libibverbs-dev-1.1.7mlnx1...
Installing libibverbs1-dbg-1.1.7mlnx1...
Installing libmlx4-1-1.0.5mlnx1...
Installing libmlx4-dev-1.0.5mlnx1...
Installing libmlx4-1-dbg-1.0.5mlnx1...
Installing libmlx5-1-1.0.1mlnx1...
Installing libmlx5-dev-1.0.1mlnx1...
Installing libmlx5-1-dbg-1.0.1mlnx1...
Installing libibumad-1.3.8.MLNX20130522.da65ddf...
Installing libibumad-static-1.3.8.MLNX20130522.da65ddf...
Installing libibumad-devel-1.3.8.MLNX20130522.da65ddf...
Installing ibacm-1.0.8mlnx4...
Installing ibacm-dev-1.0.8mlnx4...
Installing librdmacm1-1.0.17.2mlnx3...
Installing librdmacm-utils-1.0.17.2mlnx3...
Installing librdmacm-dev-1.0.17.2mlnx3...
Installing libibmad-1.3.9.MLNX20130522.1e79ec6...
Installing mstflint-3.6.0...
Installing libibmad-static-1.3.9.MLNX20130522.1e79ec6...
Installing libibmad-devel-1.3.9.MLNX20130522.1e79ec6...
Installing libopensm-...
Installing opensm-...
Installing opensm-doc-...
Installing libopensm-devel-...
Installing infiniband-diags-1.6.2.MLNX20131223.744ec44...
Installing infiniband-diags-compat-1.6.2.MLNX20131223.744ec44...
Installing mft-...
Installing kernel-mft-dkms-...
Installing libibcm1-1.0.5mlnx1...
Installing libibcm-dev-1.0.5mlnx1...
Installing perftest-2.2...
Installing ibutils2-...
Installing libibdm1-1.5.7.1...
Installing ibutils-1.5.7.1...
Installing cc-mgr-...
Installing ar-mgr-...
Installing dump-pr-...
Installing ibsim-0.5...
Installing ibsim-doc-0.5...
Installing knem-dkms-1.1.1.90mlnx...
Installing knem-1.1.1.90mlnx...
Installing mxm-...
Installing fca-...
Installing openmpi-1.6.5...
Installing mpitests-3.2.9...
Installing ummunotify-1.0...
Installing ummunotify-dkms-1.0...
Installing rds-tools-2.0.7...
Installing libdapl2-2.0.40mlnx1...
Installing dapl2-utils-2.0.40mlnx1...
Installing libdapl-dev-2.0.40mlnx1...
Installing srptools-1.0.1...
Attempting to perform Firmware update...
Querying Mellanox devices firmware ...
Device #1:
----------
  Device Type:      ConnectX3
  Part Number:      MCX354A-FCB_A2-A5
  Description:      ConnectX-3 VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE; PCIe3.0 x8 8GT/s; RoHS R6
  PSID:             MT_1090120019
  PCI Device Name:  0000:03:00.0
  Versions:         Current        Available
     FW             2.11.0500      2.31.5050
     PXE            N/A            3.4.0225
  Status:           Update required
---------
Found 1 device(s) requiring firmware update...
Device #1: Updating FW ... Done
Restart needed for updates to take effect.
Log File: /tmp/OFED.29507.logs/fw_update.log
Please reboot your system for the changes to take effect.
Configuring /etc/security/limits.conf.
Device (03:00.0):
    03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
    Link Width: 8x
    PCI Link Speed: 5Gb/s
Installation passed successfully
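The last lines of the log report the PCIe link the card negotiated: x8 at 5Gb/s, which appears to match the PCIe 2.0 per-lane rate of 5 GT/s, while the card's description lists PCIe3.0 x8 8GT/s. As a rough reference point, raw per-direction PCIe bandwidth is lanes × transfer rate × encoding efficiency (8b/10b for 2.5 and 5 GT/s, 128b/130b for 8 GT/s). A sketch of that arithmetic; it ignores packet and protocol overhead, so real throughput is lower, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# pcie_raw_mbps LANES GT_PER_S
# Hypothetical helper: raw per-direction PCIe bandwidth in MB/s (10^6 bytes/s),
# before TLP/DLLP protocol overhead.
pcie_raw_mbps() {
    awk -v lanes="$1" -v gts="$2" 'BEGIN {
        # Gen1/Gen2 (2.5, 5 GT/s) use 8b/10b; Gen3 (8 GT/s) uses 128b/130b.
        eff = (gts >= 8) ? 128 / 130 : 8 / 10
        printf "%.0f\n", lanes * gts * 1000 * eff / 8
    }'
}

# pcie_raw_mbps 8 5   -> 4000  (the link the installer reported)
# pcie_raw_mbps 8 8   -> 7877  (the card's rated PCIe 3.0 x8 link)
```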
The installation is complete; a reboot is required for it to take effect.
root@benjr:~# reboot
After rebooting, you will find the new ib0 and ib1 devices.
root@benjr:~# ifconfig
ib0       Link encap:UNSPEC  HWaddr A0-00-01-00-FE-80-00-00-00-00-00-00-00-00-00-00
          UP BROADCAST MULTICAST  MTU:4092  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ib1       Link encap:UNSPEC  HWaddr A0-00-01-20-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:192.9.1.2  Bcast:192.9.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:a5:cd22/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:167358 errors:0 dropped:0 overruns:0 frame:0
          TX packets:178096 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:6824386 (6.8 MB)  TX bytes:3595776588 (3.5 GB)
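The two MTUs here (Datagram mode) line up with the InfiniBand link MTU: IPoIB adds a 4-byte encapsulation header (RFC 4391), so the IP MTU is the IB MTU minus 4, giving 4092 for a 4096-byte IB MTU and 2044 for 2048. A tiny sketch of that arithmetic; the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# ipoib_ud_mtu IB_MTU
# Hypothetical helper: IPoIB datagram-mode IP MTU for a given IB link MTU,
# i.e. the IB MTU minus the 4-byte IPoIB encapsulation header.
ipoib_ud_mtu() {
    echo $(( $1 - 4 ))
}

# The IB link MTUs defined by the InfiniBand specification.
for ib_mtu in 256 512 1024 2048 4096; do
    printf 'IB MTU %4d -> IPoIB MTU %4d\n' "$ib_mtu" "$(ipoib_ud_mtu "$ib_mtu")"
done
```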
You can also inspect the device with ibstat.
root@benjr:~# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.31.5050
    Hardware version: 1
    Node GUID: 0x0002c90300a5cd20
    System image GUID: 0x0002c90300a5cd23
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x0251486a
        Port GUID: 0x0002c90300a5cd21
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: 0x0002c90001a5cd22
        Link layer: Ethernet
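With two ports it is easy to lose track of which one is up and which link layer it uses. A sketch that condenses ibstat's output into one line per port (port, state, link layer, port GUID); it assumes the field layout shown above, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# ibstat_ports
# Hypothetical summarizer: read `ibstat` output on stdin and print
# "PORT STATE LINK_LAYER PORT_GUID" for each port.
ibstat_ports() {
    awk '
        $1 == "Port" && $2 ~ /^[0-9]+:$/ { port = $2; sub(/:$/, "", port) }
        $1 == "State:"                   { state = $2 }
        $1 == "Port" && $2 == "GUID:"    { guid = $3 }
        # "Link layer:" is the last field ibstat prints for a port.
        $1 == "Link" && $2 == "layer:"   { print port, state, $3, guid }
    '
}

# Example: ibstat | ibstat_ports
```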
To actually use the link, OpenSM (the subnet manager) has to be started. The value after -g is the port GUID, which is shown in the ibstat output above.
-B, --daemon
    Run in daemon mode - OpenSM will run in the background.
-g, --guid
    This option specifies the local port GUID value with which OpenSM should bind.
root@benjr:~# opensm -g 0002c90300a5c2e2 --daemon
-------------------------------------------------
OpenSM 4.1.5.MLNX20140424.25abcb5
Command Line Arguments:
 Guid <0x2c90300a5c2e2>
 Daemon mode
 Log File: /var/log/opensm.log
-------------------------------------------------
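Rather than copying the GUID by hand, it can be pulled out of ibstat and handed to opensm. A sketch assuming the ibstat layout shown earlier; like the command above, it drops the 0x prefix. The function name is hypothetical, and binding to port 1 in the example is only an illustrative choice.

```bash
#!/usr/bin/env bash
# port_guid PORT
# Hypothetical helper: read `ibstat` output on stdin and print the Port GUID
# of port PORT without its 0x prefix.
port_guid() {
    awk -v want="$1" '
        $1 == "Port" && $2 ~ /^[0-9]+:$/ { port = $2; sub(/:$/, "", port) }
        $1 == "Port" && $2 == "GUID:" && port == want {
            g = $3; sub(/^0x/, "", g); print g
        }
    '
}

# Example (bind OpenSM to port 1 and run it in the background):
#   opensm -g "$(ibstat | port_guid 1)" --daemon
```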
However, the measured performance looks off.
root@benjr:~# qperf 192.9.1.1 tcp_bw tcp_lat conf
tcp_bw:
    bw  =  598 MB/sec
tcp_lat:
    latency  =  24.7 us
conf:
    loc_node   =  benjr
    loc_cpu    =  16 Cores: Intel Xeon E5520 @ 2.27GHz
    loc_os     =  Linux 3.13.0-24-generic
    loc_qperf  =  0.4.9
    rem_node   =  benjr
    rem_cpu    =  16 Cores: Mixed CPUs
    rem_os     =  Linux 3.13.0-24-generic
    rem_qperf  =  0.4.9
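To compare this with the link rate it helps to convert the result to Gb/s. A sketch that reads qperf output and converts each bw line; it assumes qperf's MB and GB are decimal units (10^6 and 10^9 bytes), and the function name is hypothetical. For 598 MB/sec it prints 4.78 Gb/s.

```bash
#!/usr/bin/env bash
# qperf_gbps
# Hypothetical converter: read qperf output on stdin and print every
# "bw = VALUE UNIT" result in Gb/s (decimal units assumed).
qperf_gbps() {
    awk '$1 == "bw" && $2 == "=" {
        v = $3; u = $4
        if (u == "GB/sec") v *= 1000        # -> MB/sec
        else if (u == "KB/sec") v /= 1000   # -> MB/sec
        printf "%.2f Gb/s\n", v * 8 / 1000
    }'
}

# Example: qperf 192.9.1.1 tcp_bw | qperf_gbps
```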
IPoIB can run in one of two transport modes, Connected or Datagram (the default), so let's see whether Connected mode does any better.
The opensmd service has to be stopped before the openibd service can be restarted.
root@benjr:/etc/infiniband# /etc/init.d/opensmd stop
Shutting down opensm: * done
Next, edit /etc/infiniband/openib.conf and set SET_IPOIB_CM to yes (the default is no):
root@benjr: vi /etc/infiniband/openib.conf
# Enable IPoIB Connected Mode
SET_IPOIB_CM=yes
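The same change can be made without an editor. A sketch using GNU sed's in-place editing; it assumes the file path above, keeps a .bak copy, and appends the setting if the line is missing. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# enable_ipoib_cm [CONF]
# Hypothetical helper: set SET_IPOIB_CM=yes in openib.conf (default path
# /etc/infiniband/openib.conf), keeping a backup as CONF.bak.
enable_ipoib_cm() {
    local conf=${1:-/etc/infiniband/openib.conf}
    if grep -q '^SET_IPOIB_CM=' "$conf"; then
        # Rewrite the existing line, whatever its current value.
        sed -i.bak 's/^SET_IPOIB_CM=.*/SET_IPOIB_CM=yes/' "$conf"
    else
        # No such line yet: back up, then append the setting.
        cp "$conf" "$conf.bak" && echo 'SET_IPOIB_CM=yes' >> "$conf"
    fi
}
```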
Once that is set, start the openibd and opensmd services again.
root@benjr:/etc/infiniband# /etc/init.d/openibd restart
Unloading HCA driver:                                      [  OK  ]
Loading HCA driver and Access Layer:                       [  OK  ]
Setting up InfiniBand network interfaces:
No configuration found for ib0
No configuration found for ib1
Setting up service network . . .                           [ done ]
root@benjr:/etc/infiniband# /etc/init.d/opensmd start
Starting opensm: * done
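The stop, restart, start order can be captured in one function so that opensmd is not running while openibd reloads the driver. A sketch using the init scripts shown above; INIT_DIR is a hypothetical override that exists only so the sequence can be tried against stub scripts.

```bash
#!/usr/bin/env bash
# restart_ib_stack
# Hypothetical wrapper: stop opensmd, restart openibd, then start opensmd,
# stopping at the first step that fails. INIT_DIR defaults to /etc/init.d.
restart_ib_stack() {
    local d=${INIT_DIR:-/etc/init.d}
    "$d/opensmd" stop &&
        "$d/openibd" restart &&
        "$d/opensmd" start
}
```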
Check whether the mode has changed to Connected:
root@benjr:/etc/infiniband# cat /sys/class/net/ib0/mode
connected
root@benjr:/etc/infiniband# cat /sys/class/net/ib1/mode
connected
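With more interfaces it is easier to read every IPoIB mode in one go. A sketch that walks /sys/class/net/ib*/mode; the root is a parameter only for trying it on a fake tree, and the function name is hypothetical. The same sysfs file is also writable, for example `echo connected > /sys/class/net/ib0/mode` as root, which switches the mode at runtime but does not persist across a reboot or driver reload.

```bash
#!/usr/bin/env bash
# ipoib_modes [SYSFS_NET_ROOT]
# Hypothetical helper: print "IFACE MODE" for every ib* interface that
# exposes a mode file (connected or datagram).
ipoib_modes() {
    local root=${1:-/sys/class/net} f iface
    for f in "$root"/ib*/mode; do
        [ -r "$f" ] || continue      # glob did not match, or file unreadable
        iface=${f%/mode}
        printf '%s %s\n' "${iface##*/}" "$(cat "$f")"
    done
}
```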
In Connected mode the MTU of ib0 and ib1 defaults to 65520:
root@benjr:/etc/infiniband# ifconfig
......
ib0       Link encap:UNSPEC  HWaddr A0-00-01-00-FE-80-00-00-00-00-00-00-00-00-00-00
          UP BROADCAST MULTICAST  MTU:65520  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ib1       Link encap:UNSPEC  HWaddr A0-00-01-20-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:192.9.1.2  Bcast:192.9.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:a5:cd22/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:80732 errors:0 dropped:0 overruns:0 frame:0
          TX packets:55753 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:2982743851 (2.9 GB)  TX bytes:2499953 (2.4 MB)
Now let's test the performance again.
root@benjr:/etc/infiniband# qperf 192.9.1.2 tcp_bw tcp_lat conf
tcp_bw:
    bw  =  1.49 GB/sec
tcp_lat:
    latency  =  28.4 us
conf:
    loc_node   =  benjr
    loc_cpu    =  16 Cores: Mixed CPUs
    loc_os     =  Linux 3.13.0-24-generic
    loc_qperf  =  0.4.9
    rem_node   =  benjr
    rem_cpu    =  16 Cores: Intel Xeon E5520 @ 2.27GHz
    rem_os     =  Linux 3.13.0-24-generic
    rem_qperf  =  0.4.9
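Putting the two runs side by side against the rate of 40 that ibstat reports for the active port (taken here as a nominal 40 Gb/s), and assuming decimal MB: 598 MB/s is about 12% of that rate and 1.49 GB/s about 30%, roughly a 2.5x improvement. A sketch of that arithmetic; the function name and argument order are hypothetical.

```bash
#!/usr/bin/env bash
# compare_runs BEFORE_MBPS AFTER_MBPS LINK_GBPS
# Hypothetical helper: speed-up between two bandwidth results (MB/s, decimal)
# and the share of a nominal link rate (Gb/s) that each one reaches.
compare_runs() {
    awk -v a="$1" -v b="$2" -v r="$3" 'BEGIN {
        # MB/s * 8 / 1000 = Gb/s; divided by r and times 100 = percent.
        printf "speedup %.2fx, %.1f%% -> %.1f%% of %s Gb/s\n", b / a, a * 8 / 10 / r, b * 8 / 10 / r, r
    }'
}

compare_runs 598 1490 40
# speedup 2.49x, 12.0% -> 29.8% of 40 Gb/s
```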
Hmm... it is still poor. Why is that?!