Mellanox

From CLONWiki
Boiarino (talk | contribs)

Revision as of 11:08, 29 October 2024

RHEL9.4

For 'all' cards:

yum install mstflint
lspci | grep Mellanox
-> a1:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
-> a1:00.1 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
mstconfig -d a1:00.0 set LINK_TYPE_P1=ETH
mstconfig -d a1:00.0 set LINK_TYPE_P2=ETH
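
The lookup and the per-port commands above can be scripted. The following is a minimal sketch, not part of the original procedure: the function names mlx_pci_addrs and print_link_type_cmds are made up for illustration, and the mstconfig commands are only printed, not executed, so they can be reviewed before being run as root.

```shell
# Sketch: list the PCI addresses of Mellanox devices from `lspci` output (stdin).
# Function names are illustrative; they are not part of mstflint.
mlx_pci_addrs() {
    awk '/Mellanox/ { print $1 }'
}

# Print (do not run) the mstconfig commands for one device with N ports.
print_link_type_cmds() {
    dev=$1
    nports=$2
    port=1
    while [ "$port" -le "$nports" ]; do
        echo "mstconfig -d $dev set LINK_TYPE_P$port=ETH"
        port=$((port + 1))
    done
}

# Example with the dual-port card shown above:
print_link_type_cmds a1:00.0 2
```

On a real machine the addresses would come from 'lspci | mlx_pci_addrs'.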

Reboot machine

NOTE: Network Manager will overwrite /etc/resolv.conf according to the port settings, so ALL settings have to be specified. Run nmtui and configure the port, specifying ALL of the following settings:

IPv4 CONFIGURATION <Manual>
Addresses (like 129.57.167.109/24)
Gateway (like 129.57.167.99)
DNS Servers (129.57.90.255, 129.57.32.101)
Search domains (jlab.org, acc.jlab.org)
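
The same settings can probably be applied without the text interface, using nmcli. This is a sketch under assumptions: the connection name 'mlx40g' is hypothetical (the real names are listed by 'nmcli connection show'), the addresses are the examples from this page, and the command is only printed so it can be checked before being run as root.

```shell
# Sketch: build the nmcli command matching the nmtui settings listed above.
# The connection name passed in is a placeholder, not a name from this page.
print_nmcli_static() {
    con=$1; addr=$2; gw=$3; dns=$4; search=$5
    printf 'nmcli connection modify %s ipv4.method manual ipv4.addresses %s ipv4.gateway %s ipv4.dns "%s" ipv4.dns-search "%s"\n' \
        "$con" "$addr" "$gw" "$dns" "$search"
}

print_nmcli_static mlx40g 129.57.167.109/24 129.57.167.99 \
    "129.57.90.255 129.57.32.101" "jlab.org acc.jlab.org"
```

After the printed command has been run, 'nmcli connection up <name>' re-activates the connection with the new settings.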

NOTE??? If more than one port is configured (for example 1G DHCP and 40G manual) and all network connections are off/unplugged, Network Manager may keep overwriting /etc/resolv.conf, swinging between the settings of those two ports. When the 40G port is plugged in, /etc/resolv.conf will stay in accordance with that port's settings. Probably need to specify DNS servers and search domains for ALL ports?

RHEL9.2

For a ConnectX-5 card, the default driver seems to work. For ConnectX-6, do the following:

yum install perl-sigtrap kernel-rpm-macros
cd /root
cp /usr/downloads/MLNX_OFED_LINUX-23.10-1.1.9.0-rhel9.2-x86_64.tgz .
tar xvf MLNX_OFED_LINUX-23.10-1.1.9.0-rhel9.2-x86_64.tgz
rm MLNX_OFED_LINUX-23.10-1.1.9.0-rhel9.2-x86_64.tgz
cd MLNX_OFED_LINUX-23.10-1.1.9.0-rhel9.2-x86_64
./mlnxofedinstall --add-kernel-support --skip-repo

DO NOT follow the installer's follow-up instructions, if any (for example 'You may need to update your initramfs before next boot. To do that, run 'dracut -f'').

/etc/init.d/openibd restart
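
The copy, unpack and install sequence above follows the same pattern in several sections of this page, only the tarball name and installer options differ. A minimal sketch of that pattern follows; the function names ofed_dir_name and print_ofed_steps are hypothetical, and the steps are printed rather than executed.

```shell
# Sketch: derive the unpacked directory name from an MLNX_OFED .tgz path.
ofed_dir_name() {
    basename "$1" .tgz
}

# Print (do not run) the install steps for a tarball; extra arguments
# are passed through as mlnxofedinstall options.
print_ofed_steps() {
    tgz=$1
    shift
    dir=$(ofed_dir_name "$tgz")
    echo "cd /root"
    echo "cp $tgz ."
    echo "tar xvf $dir.tgz"
    echo "rm $dir.tgz"
    echo "cd $dir"
    echo "./mlnxofedinstall $*"
}

print_ofed_steps /usr/downloads/MLNX_OFED_LINUX-23.10-1.1.9.0-rhel9.2-x86_64.tgz \
    --add-kernel-support --skip-repo
```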

To see installed cards, run

mst status

For every device shown, the settings can be retrieved with 'mlxconfig -d <device> query', for example:

mlxconfig -d /dev/mst/mt4123_pciconf0 query

To convert a device from InfiniBand to Ethernet, do the following:

mlxconfig -d /dev/mst/mt4123_pciconf0 set LINK_TYPE_P1=2

where _P1 is the port number (for a dual-port card there will be _P1 and _P2), and '=2' means Ethernet. After all settings are made, reboot the machine.
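
To see which ports still need converting, the query output can be filtered. This is a sketch only: it assumes that 'mlxconfig ... query' prints each setting as a name followed by its value on one line (for example 'LINK_TYPE_P1 ETH(2)'), which should be checked against the real output. The function names are made up for illustration.

```shell
# Sketch: extract the LINK_TYPE_Pn settings from `mlxconfig ... query` output (stdin).
# Assumes each setting is printed as "<NAME> <VALUE>" on its own line.
link_types() {
    awk '$1 ~ /^LINK_TYPE_P[0-9]+$/ { print $1 "=" $2 }'
}

# Succeeds only if at least one port is reported and every reported
# value contains "ETH".
all_ports_eth() {
    awk '$1 ~ /^LINK_TYPE_P[0-9]+$/ { n++; if ($2 !~ /ETH/) bad++ }
         END { exit (n > 0 && !bad) ? 0 : 1 }'
}
```

Typical use would be 'mlxconfig -d /dev/mst/mt4123_pciconf0 query | link_types'.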


RHEL7.9

Depending on card version, use instructions below or above.


RHEL7.8

cd /root
cp /usr/downloads/MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64.tgz .
tar xvf MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64.tgz
rm MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64.tgz
cd MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.8-x86_64
./mlnxofedinstall --add-kernel-support
/etc/init.d/openibd restart

If the last command complains about some modules, unload those or just reboot the machine.

Run command

/sbin/connectx_port_config --show

to display the port settings; change them with

/sbin/connectx_port_config 

setting all ports to 'eth'.

Proceed to the following section, RHEL7.5.

RHEL7.5

The driver is already included in the OS, so the Mellanox card will be identified and configured by default. Still, numactl has to be installed to have the nmtui command, which will be needed below.

yum install numactl


Check (and modify if necessary) the following files:

/etc/hostname: contains something like

clonfarm3.jlab.org

/etc/sysconfig/network:

NISDOMAIN=CCCHP
NETWORKDELAY=30
GATEWAY=129.57.167.99

/etc/resolv.conf:

search jlab.org acc.jlab.org
nameserver 129.57.90.255
nameserver 129.57.32.101

Run nmtui and set the port name, like p3p1. Set IPv4 to 'Manual', and set the IP address with mask (like 129.57.167.103/24) and gateway (like 129.57.167.99) only; ignore the rest. The file /etc/sysconfig/network-scripts/ifcfg-p3p1 should look like the following:

HWADDR=F4:52:14:41:07:71
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=129.57.167.103
PREFIX=24
GATEWAY=129.57.167.99
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=p3p1
UUID=c9f979fb-c7ca-3237-bacc-4118389d03af
ONBOOT=yes
AUTOCONNECT_PRIORITY=-999

NOTE: make sure 'ONBOOT=yes'.
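
The key fields of the file can be checked before rebooting. A minimal sketch, assuming the values are written unquoted as in the listing above; the function name check_ifcfg is hypothetical.

```shell
# Sketch: check an ifcfg file for the static settings described above.
# Prints each problem found and returns non-zero if any check fails.
# Assumes unquoted values (ONBOOT=yes, not ONBOOT="yes").
check_ifcfg() {
    file=$1
    rc=0
    grep -q '^ONBOOT=yes$' "$file"     || { echo "ONBOOT is not yes"; rc=1; }
    grep -q '^BOOTPROTO=none$' "$file" || { echo "BOOTPROTO is not none"; rc=1; }
    for key in IPADDR PREFIX GATEWAY; do
        grep -q "^$key=." "$file" || { echo "$key is missing"; rc=1; }
    done
    return "$rc"
}

# Example:
#   check_ifcfg /etc/sysconfig/network-scripts/ifcfg-p3p1
```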

DO NOT DO THIS: in file 'ifcfg-p6p1' set 'ONBOOT=yes', in file 'ifcfg-em1' set 'ONBOOT=no'.

Reboot the machine. When it starts coming back up, unplug the copper cable and plug the Mellanox link in.

After the machine has booted, check /etc/resolv.conf; it sometimes loses its contents. Restore it if needed.
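
The check of /etc/resolv.conf can be done with a small helper. This is only a sketch; the function name check_resolv is made up, and the expected name servers are passed in as arguments rather than being hard-coded.

```shell
# Sketch: verify that a resolv.conf-style file still has the expected entries.
# Usage: check_resolv FILE NAMESERVER...
# Prints each missing item and returns non-zero if anything is missing.
check_resolv() {
    file=$1
    shift
    rc=0
    awk '$1 == "search" && NF > 1 { found = 1 } END { exit !found }' "$file" \
        || { echo "no search line"; rc=1; }
    for ns in "$@"; do
        awk -v ns="$ns" '$1 == "nameserver" && $2 == ns { found = 1 } END { exit !found }' "$file" \
            || { echo "missing nameserver $ns"; rc=1; }
    done
    return "$rc"
}

# Example:
#   check_resolv /etc/resolv.conf 129.57.90.255 129.57.32.101
```

If the check fails, restore the file contents listed in the /etc/resolv.conf entry above.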

Useful command: netstat -rn shows routing.
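
To confirm that the default route now goes through the new port, the routing table can be filtered. A minimal sketch, assuming the usual 'netstat -rn' column layout (Destination, Gateway, Genmask, Flags, ..., Iface); the function name default_route is hypothetical.

```shell
# Sketch: print "gateway interface" for each default route found in
# `netstat -rn` output read from stdin.
default_route() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2, $NF }'
}

# Example:
#   netstat -rn | default_route
```

With the example settings on this page, the expected result would be the gateway 129.57.167.99 on port p3p1.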


NOTE: you may want to install the Mellanox driver anyway to have its tools, for example to switch a port from ib to eth:

cd /root
cp /usr/downloads/MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tgz .
tar xvf MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tgz
rm MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64.tgz
cd MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.5-x86_64

Install the driver using the following command (the '--force' option may sometimes be needed):

./mlnxofedinstall --add-kernel-support

Restart it (if it complains about some modules, unload those or just reboot the machine):

/etc/init.d/openibd restart

Run command

/sbin/connectx_port_config --show

to display the port settings; change them with

/sbin/connectx_port_config 

setting all ports to 'eth'.

RHEL7.4

Run

yum install numactl
cd /root
cp /usr/downloads/MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.4-x86_64.tgz .
gunzip MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.4-x86_64.tgz
tar xvf MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.4-x86_64.tar
rm MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.4-x86_64.tar
cd MLNX_OFED_LINUX-4.1-1.0.2.0-rhel7.4-x86_64
./mlnxofedinstall --force

It will install everything and print the following:

Please reboot your system for the changes to take effect.
To load the new driver, run:
/etc/init.d/openibd restart

Change the following files:

/etc/hostname: contains something like

clondaq3.jlab.org

/etc/sysconfig/network:

NISDOMAIN=CCCHP
GATEWAY=129.57.167.99

/etc/resolv.conf:

search jlab.org acc.jlab.org
nameserver 129.57.90.255
nameserver 129.57.32.101

Run nmtui and set p6p1. Set 'Manual', and set the IP address with mask (like 129.57.167.226/24) and gateway (like 129.57.167.99) only; ignore the rest. The file /etc/sysconfig/network-scripts/ifcfg-p6p1 should look like the following:

TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=p6p1
UUID=ec3cd5dc-8f9e-4e13-aa0f-a9c1ec1996c2
DEVICE=p6p1
ONBOOT=no
PROXY_METHOD=none
BROWSER_ONLY=no
IPADDR=129.57.167.226
PREFIX=24
GATEWAY=129.57.167.99

In file 'ifcfg-p6p1' set 'ONBOOT=yes', in file 'ifcfg-em1' set 'ONBOOT=no'.
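
The ONBOOT change can also be made from the command line instead of an editor. A minimal sketch, assuming GNU sed (for the -i option) as found on RHEL; the function name set_onboot is made up for illustration.

```shell
# Sketch: set ONBOOT in an ifcfg file, appending the key if it is absent.
# Relies on GNU sed's -i (in-place) option.
set_onboot() {
    file=$1
    value=$2   # yes or no
    if grep -q '^ONBOOT=' "$file"; then
        sed -i "s/^ONBOOT=.*/ONBOOT=$value/" "$file"
    else
        echo "ONBOOT=$value" >> "$file"
    fi
}

# Example (as root):
#   set_onboot /etc/sysconfig/network-scripts/ifcfg-p6p1 yes
#   set_onboot /etc/sysconfig/network-scripts/ifcfg-em1 no
```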

Reboot the machine. When it starts coming back up, unplug the copper cable and plug the Mellanox link in.

After the machine has booted, check /etc/resolv.conf and restore it if needed.

Useful command: netstat -rn shows routing.

RHEL7

Run

yum install numactl
cd /root
cp /usr/downloads/mlnx-en-3.1-1.0.4.tgz .
gunzip mlnx-en-3.1-1.0.4.tgz
tar xvf mlnx-en-3.1-1.0.4.tar
rm mlnx-en-3.1-1.0.4.tar
cd mlnx-en-3.1-1.0.4
./install.sh


#Run command
# /sbin/connectx_port_config --show
#to display port settings, change it with
# /sbin/connectx_port_config 
#setting all ports to 'eth'.

Configure the new 'eth' port as desired (the left port seems to be p6p2 now ...). Use, for example, the text interface to NetworkManager called 'nmtui'. Set 'Manual', and set the IP address with mask (like 129.57.167.41/24) and gateway (like 129.57.167.99) only; ignore the rest. Reboot the machine and unplug the copper Ethernet cable when it starts booting; it should come back using fiber.

RHEL6

Run

yum install numactl
cd /root
cp /usr/downloads/MLNX_OFED_LINUX-3.1-1.0.3-rhel6.7-x86_64.iso .
mkdir tmp
mount -o loop MLNX_OFED_LINUX-3.1-1.0.3-rhel6.7-x86_64.iso tmp
cd tmp
./mlnxofedinstall

Run command (as instructed by installation process):

/etc/init.d/openibd restart

Run command

/sbin/connectx_port_config --show

to display the port settings; change them with

/sbin/connectx_port_config 

setting all ports to 'eth'.

After it is installed, run /usr/bin/system-config-network and add a new device eth0 as a manual network with appropriate settings (do NOT specify DNS servers!). Shut down the computer, unplug the copper cable and start the computer; the 10G fiber link should become the default one. OR, if the fiber is already plugged in, set ONBOOT=no for the copper port currently in use, set ONBOOT=yes for the fiber port, and reboot.