Mellanox: Difference between revisions

From CLONWiki
Boiarino (talk | contribs)
Revision as of 12:43, 7 February 2024

RHEL9.2

For a ConnectX-5 card, the default driver seems to work. For ConnectX-6, do the following:

yum install perl-sigtrap kernel-rpm-macros
cd /root
cp /usr/downloads/MLNX_OFED_LINUX-23.10-1.1.9.0-rhel9.2-x86_64.tgz .
tar xvf MLNX_OFED_LINUX-23.10-1.1.9.0-rhel9.2-x86_64.tgz
rm MLNX_OFED_LINUX-23.10-1.1.9.0-rhel9.2-x86_64.tgz
cd MLNX_OFED_LINUX-23.10-1.1.9.0-rhel9.2-x86_64
./mlnxofedinstall --add-kernel-support --skip-repo

Follow any instructions the installer prints (for example: "You may need to update your initramfs before next boot. To do that, run 'dracut -f'").


/etc/init.d/openibd restart
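
The unpack-and-install steps are the same for every release on this page; only the tarball name changes. The following is a minimal sketch of a helper that derives the unpacked directory name from the tarball path. It assumes, as in the examples here, that the archive unpacks into a directory named after the file without its .tgz suffix. The function name ofed_dirname is hypothetical.

```shell
# Hypothetical helper: print the directory an OFED tarball is expected to
# unpack into, assuming it matches the file name minus the .tgz suffix.
ofed_dirname() {
    basename "$1" .tgz
}

# Example (commented out; needs root and the real tarball):
# tarball=/root/MLNX_OFED_LINUX-23.10-1.1.9.0-rhel9.2-x86_64.tgz
# tar xvf "$tarball" && cd "$(ofed_dirname "$tarball")"
```

The helper only manipulates the name; the install command itself stays exactly as listed above.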

RHEL7.9

For 100G Mellanox ConnectX-5:

Configuring to ethernet:

a. Start MFT.

  mst start

Starting MST (Mellanox Software Tools) driver set

Loading MST PCI module - Success

Loading MST PCI configuration module - Success

Create devices

Unloading MST PCI module (unused) - Success

b. Extract the vendor_part_id parameter. Note: ConnectX-5's ID is 4119.

  ibv_devinfo | grep vendor_part_id

vendor_part_id: 4119

vendor_part_id: 4119
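
The device path used in the next steps contains the part ID found here. A minimal sketch that builds the path from the ibv_devinfo output is shown below. The mt<ID>_pciconf0 naming pattern is an assumption taken from the single example on this page, and the function name is hypothetical; check the actual names under /dev/mst after running mst start.

```shell
# Read `ibv_devinfo` output on stdin and print the MST device path for the
# first vendor_part_id found. The mt<ID>_pciconf0 pattern is an assumption
# based on the one example on this page.
mst_device_from_devinfo() {
    awk '/vendor_part_id/ { print "/dev/mst/mt" $2 "_pciconf0"; exit }'
}

# Usage on a real host:
# ibv_devinfo | mst_device_from_devinfo
```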

c. Query the host about ConnectX-5 adapters:

  mlxconfig -d /dev/mst/mt4119_pciconf0 q

Device #1:


Device type: ConnectX5

PCI device: /dev/mst/mt4119_pciconf0

Configurations: Current

...

LINK_TYPE_P1 1

LINK_TYPE_P2 1

....

Note that LINK_TYPE_P1 and LINK_TYPE_P2 equal 1 (InfiniBand) by default.
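
To make the numeric values easier to read, the query output can be filtered. This is a minimal sketch that assumes the lines look like "LINK_TYPE_P1 1", as shown above; other mlxconfig versions may print the value in a different form, in which case it is reported as unknown. The function name is hypothetical.

```shell
# Read `mlxconfig ... q` output on stdin and print each port's link type by
# name. Assumes lines of the form "LINK_TYPE_P1 1", as shown on this page.
show_link_types() {
    awk '$1 ~ /^LINK_TYPE_P[0-9]+$/ {
        port = substr($1, 11)                      # "LINK_TYPE_P1" -> "P1"
        if ($2 == 1)      name = "InfiniBand"
        else if ($2 == 2) name = "Ethernet"
        else              name = "unknown(" $2 ")"
        print port, name
    }'
}

# Usage on a real host:
# mlxconfig -d /dev/mst/mt4119_pciconf0 q | show_link_types
```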

d. Change the port type to Ethernet (LINK_TYPE = 2):

  mlxconfig -d /dev/mst/mt4119_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

Device #1:


Device type: ConnectX5

PCI device: /dev/mst/mt4119_pciconf0

Configurations: Current New

LINK_TYPE_P1 1 2

LINK_TYPE_P2 1 2

Apply new Configuration? (y/n) [n] : y

Applying... Done!

-I- Please reboot machine to load new configurations.

e. Reboot the server.


RHEL7.8

cd /root
cp /usr/downloads/MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64.tgz .
tar xvf MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64.tgz
rm MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64.tgz
cd MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64
./mlnxofedinstall --add-kernel-support
/etc/init.d/openibd restart

If the last command complains about some modules, unload them or simply reboot the machine.

Run command

/sbin/connectx_port_config --show

to display the port settings; change them with

/sbin/connectx_port_config 

setting all ports to 'eth'.

Proceed to the following section, RHEL7.5.

RHEL7.5

The driver is already included in the OS, so the Mellanox card is identified and configured by default. Still, install numactl so that the nmtui command, which is needed below, is available.

yum install numactl


Check (and modify if necessary) the following files:

/etc/hostname: contains something like

clonfarm3.jlab.org

/etc/sysconfig/network:

NISDOMAIN=CCCHP
NETWORKDELAY=30
GATEWAY=129.57.167.99

/etc/resolv.conf:

search jlab.org acc.jlab.org
nameserver 129.57.90.255
nameserver 129.57.32.101

Run nmtui and set the port name (for example p3p1). Set IPv4 to 'Manual' and fill in only the IP address with mask (like 129.57.167.103/24) and the gateway (like 129.57.167.99); ignore the rest. The file /etc/sysconfig/network-scripts/ifcfg-p3p1 should then look like the following:

HWADDR=F4:52:14:41:07:71
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=129.57.167.103
PREFIX=24
GATEWAY=129.57.167.99
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=p3p1
UUID=c9f979fb-c7ca-3237-bacc-4118389d03af
ONBOOT=yes
AUTOCONNECT_PRIORITY=-999

NOTE: make sure 'ONBOOT=yes'.
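
A quick way to check a single setting without opening the file is sketched below. It assumes plain KEY=value lines without quoting, as in the listing above; the function name ifcfg_get is hypothetical.

```shell
# Print the value of KEY from an ifcfg-style KEY=value file.
# Assumes unquoted values, as in the listing on this page.
ifcfg_get() {
    key=$1 file=$2
    sed -n "s/^${key}=//p" "$file"
}

# Usage on a real host:
# ifcfg_get ONBOOT /etc/sysconfig/network-scripts/ifcfg-p3p1
```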

DO NOT do the following: in file 'ifcfg-p6p1' set 'ONBOOT=yes', in file 'ifcfg-em1' set 'ONBOOT=no'.

Reboot the machine. When it starts to come back up, unplug the copper cable and plug the Mellanox link in.

After the machine has booted, check /etc/resolv.conf; it sometimes loses its contents. Restore it if needed.
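
Keeping a copy of the file before the reboot makes the restore step mechanical. The sketch below handles only the case where the file comes back missing or empty, which is an assumption about how the contents are lost. The backup path /root/resolv.conf.saved and the function name are hypothetical.

```shell
# Restore FILE from BACKUP when FILE is missing or empty; otherwise leave it.
restore_if_empty() {
    file=$1 backup=$2
    if [ ! -s "$file" ] && [ -s "$backup" ]; then
        cp "$backup" "$file"
        echo "restored $file from $backup"
    fi
}

# Before rebooting (as root): cp /etc/resolv.conf /root/resolv.conf.saved
# After rebooting (as root):  restore_if_empty /etc/resolv.conf /root/resolv.conf.saved
```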

Useful command: netstat -rn shows the routing table.
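
To confirm that the default route goes through the expected gateway and port, the netstat output can be filtered. This is a minimal sketch that assumes the usual Linux column layout, with the destination first, the gateway second, and the interface last; the function name is hypothetical.

```shell
# Read `netstat -rn` output on stdin and print "gateway interface" for the
# default route (destination 0.0.0.0).
default_route() {
    awk '$1 == "0.0.0.0" { print $2, $NF }'
}

# Usage on a real host:
# netstat -rn | default_route
```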


NOTE: you may want to install the Mellanox driver anyway, for example to get the tool for switching ports from ib to eth:

cd /root
cp /usr/downloads/MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tgz .
tar xvf MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tgz
rm MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tgz
cd MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64

Install the driver using the following command (the '--force' option may sometimes be needed):

./mlnxofedinstall --add-kernel-support

Restart it (if it complains about some modules, unload them or simply reboot the machine):

/etc/init.d/openibd restart

Run command

/sbin/connectx_port_config --show

to display the port settings; change them with

/sbin/connectx_port_config 

setting all ports to 'eth'.

RHEL7.4

Run

yum install numactl
cd /root
cp /usr/downloads/MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.4-x86_64.tgz .
gunzip MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.4-x86_64.tgz
tar xvf MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.4-x86_64.tar
rm MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.4-x86_64.tar
cd MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.4-x86_64
./mlnxofedinstall --force

It will install everything and print the following:

Please reboot your system for the changes to take effect.
To load the new driver, run:
/etc/init.d/openibd restart

Change the following files:

/etc/hostname: contains something like

clondaq3.jlab.org

/etc/sysconfig/network:

NISDOMAIN=CCCHP
GATEWAY=129.57.167.99

/etc/resolv.conf:

search jlab.org acc.jlab.org
nameserver 129.57.90.255
nameserver 129.57.32.101

Run nmtui and select p6p1. Set 'Manual' and fill in only the IP address with mask (like 129.57.167.226/24) and the gateway (like 129.57.167.99); ignore the rest. The file /etc/sysconfig/network-scripts/ifcfg-p6p1 should then look like the following:

TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=p6p1
UUID=ec3cd5dc-8f9e-4e13-aa0f-a9c1ec1996c2
DEVICE=p6p1
ONBOOT=no
PROXY_METHOD=none
BROWSER_ONLY=no
IPADDR=129.57.167.226
PREFIX=24
GATEWAY=129.57.167.99
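
The address with mask entered in nmtui ends up as separate IPADDR and PREFIX lines in the file. The following sketch shows that mapping; the function name cidr_to_ifcfg is hypothetical, and the input is not validated.

```shell
# Split an "address/prefix" string into the IPADDR and PREFIX lines used in
# ifcfg files. No validation is done on the input.
cidr_to_ifcfg() {
    printf 'IPADDR=%s\nPREFIX=%s\n' "${1%/*}" "${1#*/}"
}

# Example:
# cidr_to_ifcfg 129.57.167.226/24
```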

In file 'ifcfg-p6p1' set 'ONBOOT=yes', in file 'ifcfg-em1' set 'ONBOOT=no'.
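
The ONBOOT change can also be made from the command line instead of an editor. This is a minimal sketch assuming plain ONBOOT=value lines; the function name set_onboot is hypothetical. It rewrites the file in place, so keep a copy of the original first.

```shell
# Set ONBOOT to the given value in an ifcfg file, appending the line if the
# key is not present. The file is rewritten in place.
set_onboot() {
    file=$1 value=$2
    if grep -q '^ONBOOT=' "$file"; then
        tmp=$(mktemp) || return 1
        sed "s/^ONBOOT=.*/ONBOOT=${value}/" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        echo "ONBOOT=${value}" >> "$file"
    fi
}

# Usage on a real host (as root):
# set_onboot /etc/sysconfig/network-scripts/ifcfg-p6p1 yes
# set_onboot /etc/sysconfig/network-scripts/ifcfg-em1 no
```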

Reboot the machine. When it starts to come back up, unplug the copper cable and plug the Mellanox link in.

After the machine has booted, check /etc/resolv.conf and restore it if needed.

Useful command: netstat -rn shows routing.

RHEL7

Run

yum install numactl
cd /root
cp /usr/downloads/mlnx-en-3.1-1.0.4.tgz .
gunzip mlnx-en-3.1-1.0.4.tgz
tar xvf mlnx-en-3.1-1.0.4.tar
rm mlnx-en-3.1-1.0.4.tar
cd mlnx-en-3.1-1.0.4
./install.sh


#Run command
# /sbin/connectx_port_config --show
#to display port settings, change it with
# /sbin/connectx_port_config 
#setting all ports to 'eth'.

Configure the new 'eth' port as desired (the left port now seems to be p6p2 ...). Use, for example, 'nmtui', the text interface to NetworkManager. Set 'Manual' and fill in only the IP address with mask (like 129.57.167.41/24) and the gateway (like 129.57.167.99); ignore the rest. Reboot the machine and unplug the copper Ethernet cable when it starts booting; it should come back up using the fiber.

RHEL6

Run

yum install numactl
cd /root
cp /usr/downloads/MLNX_OFED_LINUX-3.1-1.0.3-rhel6.7-x86_64.iso .
mkdir tmp
mount -o loop MLNX_OFED_LINUX-3.1-1.0.3-rhel6.7-x86_64.iso tmp
cd tmp
./mlnxofedinstall

Run command (as instructed by installation process):

/etc/init.d/openibd restart

Run command

/sbin/connectx_port_config --show

to display the port settings; change them with

/sbin/connectx_port_config 

setting all ports to 'eth'.

After installation, run /usr/bin/system-config-network and add a new device eth0 as a manual network with the appropriate settings (do NOT specify DNS servers!). Shut down the computer, unplug the copper cable, and start the computer; the 10G fiber link should become the default one. Alternatively, if the fiber is already plugged in, set ONBOOT=no for the copper port currently in use, set ONBOOT=yes for the fiber port, and reboot.