LACP team interface creation - some success and some problems

Brief description of the problem

OK - further to my first post about the error when creating an LACP team interface through the webui - i have found the following

using nmcli in opensuse to create a team interface does not work because NetworkManager does not have the plugin to manage team interfaces. RedHat has an install candidate in its repos (NetworkManager-team) but there is no such candidate in the opensuse repos

that being said - i decided to try to install team interface using the teamd daemon

Detailed step by step instructions to reproduce the problem

so here are the steps i took

i’m going from a clean install of rockstor here

Install libteam-tools and ethtool

sudo zypper install libteam-tools ethtool

then need to create a directory for some files

sudo mkdir /etc/teamd

create a config file for the team0 interface - this is a JSON file that tells teamd how to create the interface, in this case creating an LACP interface - the file is /etc/teamd/team0.conf - use whatever editor you are comfortable with

{
    "device": "team0",
    "runner": {"name": "lacp"},
    "link_watch": {"name": "ethtool"},
    "ports": {
        "eth0": {},
        "eth1": {}
    }
}
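
Copying a config through a forum post or word processor can turn straight quotes into typographic ones, and a file with those will not parse as JSON. A minimal sketch of a pre-flight check, assuming python3 is installed (it is used only for its json.tool module); check_team_conf is a hypothetical helper name, not part of teamd:

```shell
# check_team_conf FILE
# Returns 0 if FILE is readable, contains no typographic double quotes and
# (when python3 is available) parses as JSON. Returns 1 otherwise.
check_team_conf() {
    conf="$1"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    # Curly quotes are not valid JSON string delimiters.
    if grep -q -e '“' -e '”' "$conf"; then
        echo "typographic quotes found in $conf" >&2
        return 1
    fi
    # Assumption: python3 is present. The syntax check is skipped if it is not.
    if command -v python3 >/dev/null 2>&1; then
        python3 -m json.tool "$conf" >/dev/null 2>&1 || {
            echo "$conf is not valid JSON" >&2
            return 1
        }
    fi
    return 0
}
```

For example, `check_team_conf /etc/teamd/team0.conf && echo ok` before starting teamd.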

that takes care of how teamd will bring the interface up
then need to create a startup script that runs at boot - the teamd daemon will not start and attach the ethernet ports if they are already up so need to run the script to shut them down then initialise the teamd daemon to create the interface.

then need to assign an ip address to the team0 interface and bring it up this is what this script does - create it in /etc/teamd/team0.sh

#!/bin/bash
ip link set down eth0
ip link set down eth1

teamd -g -f /etc/teamd/team0.conf -d

ip addr add 192.168.1.2/24 dev team0
ip link set dev team0 up

so the script shuts down the two ethernet ports then initialises teamd interface from the config file and makes it a daemon. then it assigns an ip address from the current network range and brings the team0 interface up.

make the script executable

sudo chmod +x /etc/teamd/team0.sh

and copy the script into /usr/bin/

sudo cp /etc/teamd/team0.sh /usr/bin/team0.sh

next - need to make this a service so it starts up every time we boot the system

in directory /etc/systemd/system/ create a file teamd.service and make it like this

[Unit]
Description=teamd service
Requires=network-online.target
After=network-online.target
Before=rockstor-pre.target

[Service]
Type=forking
ExecStart=/usr/bin/team0.sh

[Install]
WantedBy=multi-user.target
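
The steps above place the unit file but do not show it being enabled, and a unit with an [Install] section normally only starts at boot once it has been enabled. A minimal sketch, assuming the file was saved as /etc/systemd/system/teamd.service as described; enable_unit and the SYSTEMCTL override are hypothetical names used only for illustration:

```shell
# enable_unit UNIT
# Reloads systemd's view of the unit files and enables UNIT for boot.
# SYSTEMCTL can be overridden (for example SYSTEMCTL=echo) to preview the
# commands without touching the system.
enable_unit() {
    unit="$1"
    ${SYSTEMCTL:-systemctl} daemon-reload &&
    ${SYSTEMCTL:-systemctl} enable "$unit"
}
```

Run as root, `enable_unit teamd.service` would be called once before the reboot; `SYSTEMCTL=echo enable_unit teamd.service` only prints what would be run.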

ok - then reboot and check the interfaces

teamdctl team0 state

and you should see
setup:
  runner: lacp
ports:
  eth0
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
    runner:
      aggregator ID: 3, Selected
      selected: yes
      state: current
  eth1
    link watches:
      link summary: up
      instance[link_watch_0]:
        name: ethtool
        link: up
        down count: 0
    runner:
      aggregator ID: 3, Selected
      selected: yes
      state: current
runner:
  active: yes
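
If this check needs to happen in a script rather than by eye, the same output can be counted. A minimal sketch, assuming the state text has the layout shown above; lacp_selected_ports is a hypothetical helper name:

```shell
# lacp_selected_ports
# Reads `teamdctl <dev> state` output on stdin and prints how many ports
# report "selected: yes" in their runner section.
lacp_selected_ports() {
    awk '/selected: yes/ { n++ } END { print n + 0 }'
}
```

Used as `teamdctl team0 state | lacp_selected_ports`, which for the two-port team in this post should print the number of ports that joined the aggregator.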

the myip command will probably return the first available ip address from your router above xxx.xxx.xxx.1

so this all goes swimmingly except that rockstor will then not start as it is waiting for DEPENDS - i assume that the rockstor-pre is looking for a network and is not finding one - any ideas?

i got the team LACP stuff working - i can ssh into the box at the team0 ip address so i know that’s working.

what is rockstor looking for during initialisation in the network space?

@Greg_Simpson Thanks for the detailed documentation you provided above!

The NetworkManager dependency was introduced via this github issue:

and subsequently addressed with this PR:

I believe, the thinking was that the network in a NAS is a critical component that should be up, since the UI depends on it as well as other connectivity.

You could at least test by temporarily removing that dependency in the service and see whether that works for you. I myself have experienced some intermittent issues with NetworkManager running into trouble when a network adapter takes too long to come up, so since the bonding process takes a while (too long for NetworkManager), that might be your option …
I believe there probably is a need for a longer-term solution in this space, though, as there has not been much, if any, development in the bonding space since moving to Rockstor on openSUSE.

I assume, you also had a peek at this thread?

Maybe @Dahita has some additional insights from their pursuits …

Maybe take a look at this one, related to issues with the bonded interface coming up (yes, it’s on the Fedora discussion board, but the github issues mentioned, along with the workaround, seem to provide some improvement on different distros, though the issue is still open, too):

thanks @Hooverdan - great lines of investigation - i will jump on those straight away

some of the redhat doco was helpful

Hi @Greg_Simpson,

Thanks for looking into that and sharing your success so far!

Out of curiosity, what does systemctl status -l NetworkManager report exactly?

hold the press - i’m just experimenting with the systemd stuff to try and get my service to start after the network is up and before rockstor-pre starts - will update the above steps accordingly - if it works

so i have updated the teamd.service to include some dependencies

[Unit]
Description=teamd service
Requires=network-online.target
After=network-online.target
Before=rockstor-pre.target

[Service]
Type=forking
ExecStart=/usr/bin/team0.sh

[Install]
WantedBy=multi-user.target

team network ends up starting and i can see the team interface both in ip link show and externally through the ip address.

however - Rockstor still fails to start with the following messages during startup

[FAILED] Failed to start Network Manager Wait Online
[DEPEND] Dependency failed for Build Rockstor
[DEPEND] Dependency failed for Tasks required prior to starting Rockstor
[DEPEND] Dependency failed for Rockstor startup script
[DEPEND] Dependency failed for Rockstor bootstrapping tasks

and then finally NetworkManager does start a little later on

anyone have any ideas about where to look next (in the rockstor code) so i can track down what’s happening?

ok so changed tack and thought - ok - let rockstor start and get all comfortable before i make my network changes, so changed the line in /etc/systemd/system/teamd.service

Before=rockstor-pre.service

to

After=rockstor-bootstrap.service

but didn’t change the outcome

ok further question on this - if i let rockstor come up - then implement the team interface and then delete the “Wired connection x” - is there then a way to tell rockstor to use the new interface through a script?

if i let rockstor come up and then execute the following

nmcli connection delete "Wired connection 1"
nmcli connection delete "Wired connection 2"

then create /etc/teamd/team0.conf

{
    "device": "team0",
    "runner": {
        "name": "lacp",
        "active": true,
        "fast_rate": true,
        "tx_hash": ["layer2", "layer3", "layer4"]
    },
    "link_watch": {"name": "ethtool"},
    "ports": {
        "eth0": {},
        "eth1": {}
    }
}
and then execute

ip link set down eth0

ip link set down eth1

teamd -g -d -f /etc/teamd/team0.conf

ip addr add 192.168.1.2/24 dev team0
ip link set up team0
ip link set mtu 9000 dev team0

then the team0 interface comes up just fine and i can access the Rockstor management GUI on 192.168.1.2 - all good

but the network management screen in rockstor shows up like this - it shows the team0 link as unmanaged but still seems to use it

do i even need to attempt to make the team0 interface “managed”?
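
To see from the command line what NetworkManager itself thinks of the device, the terse device listing can be filtered. A minimal sketch, assuming the `DEVICE:STATE` output format of `nmcli -t -f DEVICE,STATE device status`; nm_state_of is a hypothetical helper name:

```shell
# nm_state_of DEVICE
# Reads "DEVICE:STATE" lines on stdin (the terse nmcli device listing) and
# prints the state recorded for DEVICE, or nothing if it is not listed.
nm_state_of() {
    awk -F: -v dev="$1" '$1 == dev { print $2 }'
}
```

Used as `nmcli -t -f DEVICE,STATE device status | nm_state_of team0`. A result of `unmanaged` would match what the Rockstor network screen reports for an interface created outside NetworkManager.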

ok so i can report that the team0 interface is up and working
if i run this script after rockstor is up then the team0 interface comes up nice and smooth

#!/bin/bash

echo "teamd: setup starting..."
echo "teamd: stopping NetworkManager connections..."
nmcli connection delete "Wired connection 1"
nmcli connection delete "Wired connection 2"

echo "teamd: bringing down interfaces"
ip link set down eth0
ip link set down eth1
sleep 10

echo "teamd: starting teamd daemon"
teamd -g -d -f /etc/teamd/team0.conf
sleep 10

echo "teamd: assigning IP address"
ip address add 192.168.1.2/24 dev team0
sleep 10

echo "teamd: bringing up team0"
ip link set up dev team0
sleep 10

echo "teamd: set MTU"
ip link set mtu 9000 dev team0

echo "team0 setup completed."
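
The fixed `sleep 10` pauses could be replaced by polling until the kernel actually exposes the device. A minimal sketch, assuming a Linux sysfs mounted at /sys/class/net; wait_for_netdev and its optional third argument are hypothetical and exist only to make the helper easy to try out:

```shell
# wait_for_netdev DEV [TIMEOUT_SECONDS] [SYSFS_ROOT]
# Polls once per second until SYSFS_ROOT/DEV exists.
# Returns 0 when the device appears, 1 if the timeout is reached first.
wait_for_netdev() {
    dev="$1"
    timeout="${2:-30}"
    root="${3:-/sys/class/net}"   # overridable only for testing
    waited=0
    while [ ! -e "$root/$dev" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the script this would sit after the teamd line, for example `wait_for_netdev team0 30 || exit 1`, so that the address is only assigned once team0 is present and the script fails visibly if it never appears.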

this script needs the following config file /etc/teamd/team0.conf

{
    "device": "team0",
    "runner": {
        "name": "lacp",
        "active": true,
        "fast_rate": true,
        "tx_hash": ["layer2", "layer3", "layer4"]
    },
    "link_watch": {"name": "ethtool"},
    "ports": {
        "eth0": {},
        "eth1": {}
    }
}
this brings up the interface which i can then ssh into and the management gui runs just fine

the problem - when i run the script from a service - it does bugger all - seems to complete ok but no interface

service looks like this

[Unit]
Description=Setup team0 using teamd
After=rockstor-bootstrap.service
Requires=rockstor-bootstrap.service

[Service]
Type=simple
ExecStart=/usr/local/bin/teamd/setup_teamd.sh
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.targe

what am i missing here? it must be something simple

I assume the missing t on target is just a miss in the copypasta?

also, the script prepares some stuff, starts teamd and then exits; wouldn’t that then qualify as Type=oneshot as opposed to simple (or exec)? My thinking might be flawed though.

Also, maybe also with

Requires=network-online.target instead of the rockstor-bootstrap.service
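
The two suggestions above can be combined into one unit. One possible explanation for the symptom, not confirmed in this thread: with Type=simple the service counts as finished as soon as the script exits, and systemd then cleans up whatever is still running in the service's control group, which would include the teamd daemon the script started in the background. A minimal sketch of a oneshot variant, where RemainAfterExit=yes and the After= line are additions not taken from the thread, and print_team0_unit is a hypothetical helper name:

```shell
# print_team0_unit SCRIPT_PATH
# Prints a oneshot unit that runs SCRIPT_PATH once and stays "active"
# afterwards, so processes the script started are left alone.
print_team0_unit() {
    cat <<EOF
[Unit]
Description=Setup team0 using teamd
Requires=network-online.target
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=$1

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, as root: `print_team0_unit /usr/local/bin/teamd/setup_teamd.sh > /etc/systemd/system/teamd.service`, followed by `systemctl daemon-reload`.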

Howdy! Welcome! Liking your input!

By the way, maybe your setup needs a special part to improve cp-you’s and enhance LAN Flux capacitor performance!

This group has case badges available either FREE or on ebay for dirt cheap that would surely increase system power!

PM “Me” if you want a couple!

:sunglasses:

ok - so after leaving this to percolate on the back burner for a while - i decided to try to track down why the opensuse implementation of NetworkManager doesn’t play nice with team interfaces and found a post that shows the teamd daemon running and the runner setup pretty much like mine but not working.

the suggested fix (which worked) was as follows

“NetworkManger service runs with restricted capabilities and lacks CAP_CHOWN. As workaround you can create drop-in that adds CAP_CHOWN to CapabilityBoundingSet”

going to try this and report back
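
For reference, a drop-in of the kind the quoted post describes could look like the following. This is a minimal sketch, not taken from the linked post: the file name 10-cap-chown.conf and the write_nm_dropin helper are hypothetical, and the directory assumes the service is named NetworkManager.service. As far as the systemd.exec documentation describes it, repeated CapabilityBoundingSet= lines are merged, so this should add CAP_CHOWN to the set the packaged unit already defines rather than replace it.

```shell
# write_nm_dropin [DROPIN_DIR]
# Writes a drop-in that adds CAP_CHOWN to NetworkManager's capability
# bounding set. DROPIN_DIR defaults to the standard location for
# NetworkManager.service overrides.
write_nm_dropin() {
    dir="${1:-/etc/systemd/system/NetworkManager.service.d}"
    mkdir -p "$dir" || return 1
    # The file name is arbitrary (hypothetical); it only needs to end in .conf.
    cat > "$dir/10-cap-chown.conf" <<'EOF'
[Service]
CapabilityBoundingSet=CAP_CHOWN
EOF
}
```

Run as root, `write_nm_dropin` would be followed by `systemctl daemon-reload` and `systemctl restart NetworkManager` for the change to take effect.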

update - this method sucked bum as well

once i get the teamd interface up - ssh works and i can log into the gui - but doing anything like installing a rockon fails miserably - it’s looking for a network connection thru networkmanager that doesn’t exist because networkmanager doesn’t know about it - i’ve got the interface up but not the connection

seems the only way to solve this is to get networkmanager-team available for opensuse or i will try compiling rockstor source on another distro that has it natively and see if that works.

no use altering the rockstor network handling - it’s all based around nmcli so using the teamd stuff won’t work - there should only be one standardised method of handling any interface in any one piece of software

anyone got any pointers to start me in right direction for NetworkManager-team?

groan - if i really have to drop back into the cesspool of C again - it might as well be for a good cause

On the CAP_CHOWN, are you referring to this and the corresponding bug that was reported?

https://forums.opensuse.org/t/network-teaming-problem/140684/4