
ConnectX®-3 EN – IP over IB

I have two ConnectX®-3 EN Single/Dual-Port 10 and 40 Gigabit Ethernet Adapters w/ PCI Express 3.0 on hand.
Details: http://www.mellanox.com/page/products_dyn?product_family=127

Image source: http://www.mellanox.com/uploads/product_families/cat_71/gfx_00975.jpg

On Ubuntu 14.04 the card is visible, but no driver picks it up, so the mlx4_en module has to be compiled first; see http://benjr.tw/58956

However, mlx4_en presents the card as an ordinary Ethernet device. To get interface names such as ib0 and ib1 (InfiniBand devices), you need to install the Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED).
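One quick way to see which personality the driver gave each port is the link type the kernel reports in sysfs. The sketch below assumes the standard Linux ARPHRD values from <linux/if_arp.h> (1 for Ethernet, 32 for InfiniBand); classify_interfaces is a hypothetical helper, not part of any Mellanox tool.

```python
import os

# Link types from Linux <linux/if_arp.h>
ARPHRD_ETHER = 1
ARPHRD_INFINIBAND = 32


def classify_interfaces(sysfs_net="/sys/class/net"):
    """Return {interface: "ethernet" | "infiniband" | "other"} based on sysfs 'type'."""
    result = {}
    for name in sorted(os.listdir(sysfs_net)):
        type_path = os.path.join(sysfs_net, name, "type")
        try:
            with open(type_path) as f:
                arp_type = int(f.read().strip())
        except (OSError, ValueError):
            continue  # entry without a readable 'type' attribute; skip it
        if arp_type == ARPHRD_INFINIBAND:
            result[name] = "infiniband"
        elif arp_type == ARPHRD_ETHER:
            result[name] = "ethernet"
        else:
            result[name] = "other"
    return result
```

With only mlx4_en loaded, the ConnectX-3 ports should show up as ethernet; once MLNX_OFED brings up ib0 and ib1, those should report infiniband. Calling classify_interfaces() with no argument reads the real /sys/class/net.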

For IPoIB (IP over InfiniBand) itself, please refer to a separate article.

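The official documentation says that a previously installed MLNX_EN has dependency conflicts with OFED (I take this to mean that if you want to use MLNX_EN on its own, OFED must be removed first), and the mlnx_en installer removes all OFED packages for us when it runs.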
MLNX_EN driver cannot coexist with OFED software on the same machine. Hence when installing MLNX_EN all OFED packages should be removed (done by the mlnx_en install script)

For details, see the MLNX_EN user manual: http://www.mellanox.com/related-docs/user_manuals/ConnectX-3_Ethernet_Single_and_Dual_QSFP+_Port_Adapter_Card_User_Manual.pdf

For installing and configuring MLNX_OFED, see the user manual: http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_User_Manual_v2.2-1.0.1.pdf

So the MLNX_EN I had installed earlier should not get in the way of installing OFED.
MLNX_OFED downloads: http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
Check that page for newer releases; I used MLNX_OFED_LINUX v2.2-1.0.1.

root@benjr:~# wget http://www.mellanox.com/downloads/ofed/MLNX_OFED-2.2-1.0.1/MLNX_OFED_LINUX-2.2-1.0.1-ubuntu14.04-x86_64.tgz
root@benjr:~# tar -xzf MLNX_OFED_LINUX-2.2-1.0.1-ubuntu14.04-x86_64.tgz
root@benjr:~# cd MLNX_OFED_LINUX-2.2-1.0.1-ubuntu14.04-x86_64
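The download URL and the extracted directory follow one naming pattern. If you script the download for another release, a helper like the one below can build both names. It is only a sketch that assumes newer releases keep the MLNX_OFED-<version>/MLNX_OFED_LINUX-<version>-<distro>-<arch>.tgz layout seen above, which Mellanox does not guarantee; ofed_names is a hypothetical helper.

```python
# Hypothetical helper: assumes other releases keep the naming scheme of the
# 2.2-1.0.1 link above, which Mellanox does not promise.
BASE_URL = "http://www.mellanox.com/downloads/ofed"


def ofed_names(version, distro, arch="x86_64"):
    """Return (download URL, extracted directory name) for one MLNX_OFED bundle."""
    dirname = f"MLNX_OFED_LINUX-{version}-{distro}-{arch}"
    url = f"{BASE_URL}/MLNX_OFED-{version}/{dirname}.tgz"
    return url, dirname


url, dirname = ofed_names("2.2-1.0.1", "ubuntu14.04")
print(url)
print(dirname)
```

For 2.2-1.0.1 on ubuntu14.04 it reproduces the URL and directory name used in the commands above.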

I first removed mlnx_en (sbin/mlnx_en_uninstall.sh) and then installed OFED directly:

root@benjr:~/MLNX_OFED_LINUX-2.2-1.0.1-ubuntu14.04-x86_64# ./mlnxofedinstall
..........
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, or Distribution IB packages will be removed.
Do you want to continue?[y/N]:y
..........
Error: One or more packages depends on MLNX_OFED.
Those packages should be removed before uninstalling MLNX_OFED:

mlnx-en-utils mlnx-en-dkms fio libopensm5 libibumad3 libibmad5 libibnetdisc5

To force uninstallation use '--force' flag.

It still failed with an error, and the installer suggested the --force flag. With --force the installation went through without problems.

root@benjr:~/MLNX_OFED_LINUX-2.2-1.0.1-ubuntu14.04-x86_64# ./mlnxofedinstall --force
Log: /tmp/ofed.build.log
Logs dir: /tmp/OFED.29507.logs

Below is the list of OFED packages that you have chosen
(some may have been added by the installer due to package dependencies):

ofed-scripts
mlnx-ofed-kernel-utils
mlnx-ofed-kernel-dkms
iser-dkms
srp-dkms
libibverbs1
ibverbs-utils
libibverbs-dev
libibverbs1-dbg
libmlx4-1
libmlx4-dev
libmlx4-1-dbg
libmlx5-1
libmlx5-dev
libmlx5-1-dbg
libibumad
libibumad-static
libibumad-devel
ibacm
ibacm-dev
librdmacm1
librdmacm-utils
librdmacm-dev
libibmad
mstflint
libibmad-static
libibmad-devel
libopensm
opensm
opensm-doc
libopensm-devel
infiniband-diags
infiniband-diags-compat
mft
kernel-mft-dkms
libibcm1
libibcm-dev
perftest
ibutils2
libibdm1
ibutils
cc-mgr
ar-mgr
dump-pr
ibsim
ibsim-doc
knem-dkms
knem
mxm
fca
openmpi
mpitests
ummunotify
ummunotify-dkms
rds-tools
libdapl2
dapl2-utils
libdapl-dev
srptools

This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, or Distribution IB packages will be removed.
Checking SW Requirements...

Removing old packages...

Installing new packages

Installing ofed-scripts-2.2...
Installing mlnx-ofed-kernel-utils-2.2...
Installing mlnx-ofed-kernel-dkms-2.2...
Installing iser-dkms-1.2...
Installing srp-dkms-1.3.2...
Installing libibverbs1-1.1.7mlnx1...
Installing ibverbs-utils-1.1.7mlnx1...
Installing libibverbs-dev-1.1.7mlnx1...
Installing libibverbs1-dbg-1.1.7mlnx1...
Installing libmlx4-1-1.0.5mlnx1...
Installing libmlx4-dev-1.0.5mlnx1...
Installing libmlx4-1-dbg-1.0.5mlnx1...
Installing libmlx5-1-1.0.1mlnx1...
Installing libmlx5-dev-1.0.1mlnx1...
Installing libmlx5-1-dbg-1.0.1mlnx1...
Installing libibumad-1.3.8.MLNX20130522.da65ddf...
Installing libibumad-static-1.3.8.MLNX20130522.da65ddf...
Installing libibumad-devel-1.3.8.MLNX20130522.da65ddf...
Installing ibacm-1.0.8mlnx4...
Installing ibacm-dev-1.0.8mlnx4...
Installing librdmacm1-1.0.17.2mlnx3...
Installing librdmacm-utils-1.0.17.2mlnx3...
Installing librdmacm-dev-1.0.17.2mlnx3...
Installing libibmad-1.3.9.MLNX20130522.1e79ec6...
Installing mstflint-3.6.0...
Installing libibmad-static-1.3.9.MLNX20130522.1e79ec6...
Installing libibmad-devel-1.3.9.MLNX20130522.1e79ec6...
Installing libopensm-...
Installing opensm-...
Installing opensm-doc-...
Installing libopensm-devel-...
Installing infiniband-diags-1.6.2.MLNX20131223.744ec44...
Installing infiniband-diags-compat-1.6.2.MLNX20131223.744ec44...
Installing mft-...
Installing kernel-mft-dkms-...
Installing libibcm1-1.0.5mlnx1...
Installing libibcm-dev-1.0.5mlnx1...
Installing perftest-2.2...
Installing ibutils2-...
Installing libibdm1-1.5.7.1...
Installing ibutils-1.5.7.1...
Installing cc-mgr-...
Installing ar-mgr-...
Installing dump-pr-...
Installing ibsim-0.5...
Installing ibsim-doc-0.5...
Installing knem-dkms-1.1.1.90mlnx...
Installing knem-1.1.1.90mlnx...
Installing mxm-...
Installing fca-...
Installing openmpi-1.6.5...
Installing mpitests-3.2.9...
Installing ummunotify-1.0...
Installing ummunotify-dkms-1.0...
Installing rds-tools-2.0.7...
Installing libdapl2-2.0.40mlnx1...
Installing dapl2-utils-2.0.40mlnx1...
Installing libdapl-dev-2.0.40mlnx1...
Installing srptools-1.0.1...
Attempting to perform Firmware update...
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX3
  Part Number:      MCX354A-FCB_A2-A5
  Description:      ConnectX-3 VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE; PCIe3.0 x8 8GT/s; RoHS R6
  PSID:             MT_1090120019
  PCI Device Name:  0000:03:00.0
  Versions:         Current        Available
     FW             2.11.0500      2.31.5050
     PXE            N/A            3.4.0225

  Status:           Update required

---------
Found 1 device(s) requiring firmware update...

Device #1: Updating FW ... Done

Restart needed for updates to take effect.
Log File: /tmp/OFED.29507.logs/fw_update.log
Please reboot your system for the changes to take effect.
Configuring /etc/security/limits.conf.
Device (03:00.0):
        03:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
        Link Width: 8x
        PCI Link Speed: 5Gb/s


Installation passed successfully

The installation is complete; a reboot is required for the changes (including the firmware update) to take effect.
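The installer compared the current and available firmware and flashed the card on its own. If you want to check the same condition from a script, compare the versions numerically rather than as strings, since a string comparison would rank 2.9.x above 2.10.x. A minimal sketch, assuming the Current/Available table layout shown in the output above; the helper names are hypothetical.

```python
def parse_fw_versions(query_output):
    """Return (current, available) from the 'FW' row of the firmware query."""
    for line in query_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "FW":
            return fields[1], fields[2]
    return None


def version_key(version):
    """'2.11.0500' -> (2, 11, 500) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_update(current, available):
    """True when the available firmware is newer than the running one."""
    return version_key(current) < version_key(available)
```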

root@benjr:~# reboot

After rebooting, you will see the new ib0 and ib1 devices.

root@benjr:~# ifconfig
ib0       Link encap:UNSPEC  HWaddr A0-00-01-00-FE-80-00-00-00-00-00-00-00-00-00-00
          UP BROADCAST MULTICAST  MTU:4092  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ib1       Link encap:UNSPEC  HWaddr A0-00-01-20-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:192.9.1.2  Bcast:192.9.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:a5:cd22/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:167358 errors:0 dropped:0 overruns:0 frame:0
          TX packets:178096 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:6824386 (6.8 MB)  TX bytes:3595776588 (3.5 GB)

You can also inspect the device with ibstat.

root@benjr:~# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.31.5050
        Hardware version: 1
        Node GUID: 0x0002c90300a5cd20
        System image GUID: 0x0002c90300a5cd23
        Port 1:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x0251486a
                Port GUID: 0x0002c90300a5cd21
                Link layer: InfiniBand
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00010000
                Port GUID: 0x0002c90001a5cd22
                Link layer: Ethernet
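If you want to check port state from a script (for example before starting opensm), ibstat's indented "key: value" layout is easy to parse. A minimal sketch, assuming the output format shown above; parse_ibstat and infiniband_ports are hypothetical names. In this particular output only port 1 reports Link layer: InfiniBand, and that port is Down.

```python
import re


def parse_ibstat(text):
    """Parse ibstat output into {port_number: {field: value}}."""
    ports = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        m = re.fullmatch(r"Port (\d+):", stripped)
        if m:
            current = ports.setdefault(int(m.group(1)), {})
            continue
        # Fields before the first "Port N:" header describe the CA; ignore them.
        if current is not None and ":" in stripped:
            key, value = stripped.split(":", 1)
            current[key.strip()] = value.strip()
    return ports


def infiniband_ports(ports):
    """Port numbers whose link layer is InfiniBand (not Ethernet)."""
    return sorted(n for n, f in ports.items() if f.get("Link layer") == "InfiniBand")
```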

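To actually use the InfiniBand port you need to run opensm (the subnet manager) with -g followed by a port GUID; the ibstat output above shows the GUIDs of the current device.
-B, --daemon
Run in daemon mode - OpenSM will run in the background.
-g, --guid
This option specifies the local port GUID value with which OpenSM should bind.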
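ibstat prints port GUIDs with a 0x prefix, while the command below passes the GUID as bare hex. A small sketch that normalizes a GUID copied from ibstat and builds the opensm argument list; it assumes only the -g and --daemon options quoted above, and opensm_args is a hypothetical helper.

```python
def opensm_args(port_guid, daemon=True):
    """Build an opensm argv that binds to one port GUID as printed by ibstat."""
    guid = port_guid.strip().lower()
    if guid.startswith("0x"):
        guid = guid[2:]
    int(guid, 16)  # raises ValueError if the GUID is not hexadecimal
    if len(guid) > 16:
        raise ValueError("a port GUID is 64 bits (at most 16 hex digits)")
    args = ["opensm", "-g", guid.zfill(16)]
    if daemon:
        args.append("--daemon")
    return args
```

You could then start it as root with subprocess.run(opensm_args("0x0002c90300a5cd21"), check=True); that GUID is port 1 from the ibstat output above, used here only as an example.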

root@benjr:~# opensm -g 0002c90300a5c2e2 --daemon
-------------------------------------------------
OpenSM 4.1.5.MLNX20140424.25abcb5
Command Line Arguments:
 Guid <0x2c90300a5c2e2>
 Daemon mode
 Log File: /var/log/opensm.log
-------------------------------------------------

However, the measured performance looks off.

root@benjr:~# qperf 192.9.1.1 tcp_bw tcp_lat conf
tcp_bw:
    bw  =  598 MB/sec
tcp_lat:
    latency  =  24.7 us
conf:
    loc_node   =  benjr
    loc_cpu    =  16 Cores: Intel Xeon  E5520 @ 2.27GHz
    loc_os     =  Linux 3.13.0-24-generic
    loc_qperf  =  0.4.9
    rem_node   =  benjr
    rem_cpu    =  16 Cores: Mixed CPUs
    rem_os     =  Linux 3.13.0-24-generic
    rem_qperf  =  0.4.9
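To compare qperf's byte-based figure with link rates quoted in Gbit/s, convert the units. This sketch assumes qperf's MB and GB are decimal (10^6 and 10^9 bytes); qperf_bw_gbit is a hypothetical helper.

```python
# Assumes qperf's KB/MB/GB are decimal (10**3 / 10**6 / 10**9 bytes).
BYTES_PER_UNIT = {"KB/sec": 10**3, "MB/sec": 10**6, "GB/sec": 10**9}


def qperf_bw_gbit(line):
    """Convert a qperf result line like 'bw  =  598 MB/sec' to Gbit/s."""
    value, unit = line.split("=", 1)[1].split()
    return float(value) * BYTES_PER_UNIT[unit] * 8 / 10**9


print(f"598 MB/sec = {qperf_bw_gbit('bw  =  598 MB/sec'):.2f} Gbit/s")
```

Under that assumption, 598 MB/sec is roughly 4.8 Gbit/s, far below a nominal 40 Gbit/s link.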

IPoIB can run in one of two transport modes, Connected or Datagram (the default), so let's see whether Connected mode does any better.

The opensmd service has to be stopped before openibd can be restarted.

root@benjr:/etc/infiniband# /etc/init.d/opensmd stop
Shutting down opensm:  * done

Next, edit /etc/infiniband/openib.conf and set SET_IPOIB_CM to yes (the default is no).

root@benjr:/etc/infiniband# vi /etc/infiniband/openib.conf

# Enable IPoIB Connected Mode
SET_IPOIB_CM=yes
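If you'd rather not edit openib.conf by hand (for example on both machines), the change can be scripted. A minimal sketch, assuming the file uses plain KEY=value lines as shown above; enable_ipoib_cm is a hypothetical helper that only returns the new text, so writing it back (as root) and restarting the services are left to you.

```python
import re

# Matches an uncommented SET_IPOIB_CM=... line; [ \t] avoids spanning newlines.
_CM_LINE = re.compile(r"^[ \t]*SET_IPOIB_CM[ \t]*=.*$", re.MULTILINE)


def enable_ipoib_cm(conf_text):
    """Return openib.conf text with SET_IPOIB_CM=yes (appended if missing)."""
    if _CM_LINE.search(conf_text):
        return _CM_LINE.sub("SET_IPOIB_CM=yes", conf_text, count=1)
    # Keep the file newline-terminated when appending a new setting.
    sep = "" if not conf_text or conf_text.endswith("\n") else "\n"
    return conf_text + sep + "SET_IPOIB_CM=yes\n"
```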

Once that is set, restart openibd and start opensmd again.

root@benjr:/etc/infiniband# /etc/init.d/openibd restart
Unloading HCA driver:                                      [  OK  ]
Loading HCA driver and Access Layer:                       [  OK  ]
Setting up InfiniBand network interfaces:
No configuration found for ib0
No configuration found for ib1
Setting up service network . . .                           [  done  ]
root@benjr:/etc/infiniband# /etc/init.d/opensmd start
Starting opensm:  * done

Check that the mode has changed to Connected:

root@benjr:/etc/infiniband# cat /sys/class/net/ib0/mode
connected
root@benjr:/etc/infiniband# cat /sys/class/net/ib1/mode
connected
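The same check can be done for every IPoIB interface at once by reading sysfs. A sketch, assuming IPoIB interfaces report type 32 (ARPHRD_INFINIBAND) and expose the mode and mtu files that the cat commands above read; ipoib_status is a hypothetical helper.

```python
import os

ARPHRD_INFINIBAND = 32  # link type from <linux/if_arp.h>


def _read(path):
    with open(path) as f:
        return f.read().strip()


def ipoib_status(sysfs_net="/sys/class/net"):
    """Return {iface: {"mode": ..., "mtu": ...}} for every IPoIB interface."""
    status = {}
    for name in sorted(os.listdir(sysfs_net)):
        base = os.path.join(sysfs_net, name)
        try:
            if int(_read(os.path.join(base, "type"))) != ARPHRD_INFINIBAND:
                continue
            status[name] = {
                "mode": _read(os.path.join(base, "mode")),  # "datagram" or "connected"
                "mtu": int(_read(os.path.join(base, "mtu"))),
            }
        except (OSError, ValueError):
            continue  # attribute missing or unreadable; skip this interface
    return status
```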

In connected mode, the default MTU of ib0 and ib1 becomes 65520:

root@benjr:/etc/infiniband# ifconfig
......
ib0       Link encap:UNSPEC  HWaddr A0-00-01-00-FE-80-00-00-00-00-00-00-00-00-00-00
          UP BROADCAST MULTICAST  MTU:65520  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ib1       Link encap:UNSPEC  HWaddr A0-00-01-20-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:192.9.1.2  Bcast:192.9.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:a5:cd22/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:80732 errors:0 dropped:0 overruns:0 frame:0
          TX packets:55753 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:2982743851 (2.9 GB)  TX bytes:2499953 (2.4 MB)
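The MTU values in this article are consistent with how, as far as I know, the Linux IPoIB driver sizes packets: datagram mode subtracts a 4-byte IPoIB header from the InfiniBand link MTU (2048 - 4 = 2044 for ib1 and 4096 - 4 = 4092 for ib0 in the first ifconfig output), while connected mode allows up to 65520 bytes (64 KiB minus 16). The constants below mirror that arithmetic; treat them as my reading of the driver, not an official API.

```python
IPOIB_ENCAP_LEN = 4                # IPoIB encapsulation header, in bytes
IPOIB_CM_MTU = 0x10000 - 0x10      # 65520: connected-mode maximum
IB_LINK_MTUS = (256, 512, 1024, 2048, 4096)


def ipoib_datagram_mtu(ib_link_mtu):
    """Largest IP MTU in datagram mode for a given InfiniBand link MTU."""
    if ib_link_mtu not in IB_LINK_MTUS:
        raise ValueError(f"not an InfiniBand link MTU: {ib_link_mtu}")
    return ib_link_mtu - IPOIB_ENCAP_LEN


for link_mtu in (2048, 4096):
    print(f"IB MTU {link_mtu} -> datagram IPoIB MTU {ipoib_datagram_mtu(link_mtu)}")
print(f"connected mode maximum -> {IPOIB_CM_MTU}")
```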

Now test performance again:

root@benjr:/etc/infiniband# qperf 192.9.1.2 tcp_bw tcp_lat conf
tcp_bw:
    bw  =  1.49 GB/sec
tcp_lat:
    latency  =  28.4 us
conf:
    loc_node   =  benjr
    loc_cpu    =  16 Cores: Mixed CPUs
    loc_os     =  Linux 3.13.0-24-generic
    loc_qperf  =  0.4.9
    rem_node   =  benjr
    rem_cpu    =  16 Cores: Intel Xeon  E5520 @ 2.27GHz
    rem_os     =  Linux 3.13.0-24-generic
    rem_qperf  =  0.4.9

Hmm, it's still poor. Why is that?!
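The installer output reported PCI Link Speed: 5Gb/s and Link Width: 8x, while the card's description says PCIe3.0 x8 8GT/s, which suggests the card linked at PCIe 2.0 speed. A back-of-the-envelope calculation of line-coding ceilings puts the measured 1.49 GB/sec in context. This is only a sketch: it ignores PCIe packet overhead and TCP/IPoIB costs, so real ceilings are lower, and it does not by itself answer the question above. The helper names are hypothetical.

```python
from fractions import Fraction

# Payload bits per transmitted bit for the line codes involved.
LINE_CODE = {
    "8b/10b": Fraction(8, 10),        # PCIe 1.x/2.x, SDR/DDR/QDR InfiniBand
    "128b/130b": Fraction(128, 130),  # PCIe 3.0
}


def link_ceiling_gbit(gt_per_lane, lanes, line_code):
    """Data rate in Gbit/s after line coding only (no packet/protocol overhead)."""
    return float(gt_per_lane * lanes * LINE_CODE[line_code])


def fraction_of_ceiling(measured_gbyte_per_s, ceiling_gbit):
    """Share of a ceiling used by a measured GB/s figure (decimal units)."""
    return measured_gbyte_per_s * 8 / ceiling_gbit


gen2_x8 = link_ceiling_gbit(5, 8, "8b/10b")     # what the installer reported
gen3_x8 = link_ceiling_gbit(8, 8, "128b/130b")  # what the card is rated for
print(f"PCIe 2.0 x8: {gen2_x8:.1f} Gbit/s, PCIe 3.0 x8: {gen3_x8:.1f} Gbit/s")
print(f"1.49 GB/s uses {fraction_of_ceiling(1.49, gen2_x8):.0%} of the Gen2 x8 ceiling")
```

The Gen2 x8 ceiling works out to 32 Gbit/s, so the measured 1.49 GB/sec (about 11.9 Gbit/s) uses roughly 37% of it; the slot speed alone does not cap the result at that level.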
