I finally figured it out and thought I would share in case someone else
stumbles upon this problem.

 

After doing a lot of research I found that I had to add the following
line to my /etc/modules:

bonding mode=1 miimon=100 downdelay=200 updelay=200
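To double-check that the options actually take effect after a reboot,
something like this should do it (a quick sketch; the exact wording of
the /proc output varies a bit between kernel versions):

# cat /sys/class/net/bond0/bonding/mode
# cat /proc/net/bonding/bond0

The first command prints the active bonding mode, and the /proc file
lists the MII polling interval, the up/down delays, and the link status
of each slave.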

 

It seems to be working perfectly now.

 

Chris Stackpole

 

 

________________________________

From: Stackpole, Chris [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 30, 2008 1:01 PM
To: debian-user@lists.debian.org
Subject: Question regarding bonding of multiple eth's

 

I seem to be having a problem with bonding under Debian Lenny, but I am
not sure exactly what the problem is.

 

I have two servers and each server has two gigabit network cards. We
have two gigabit switches that we use so that we have failover should
one die. I matched both eth0's to switch0 and both eth1's to switch1.
I then bonded the eth's together on both servers. I posted how I did it
below just in case I screwed something up. Once I did the bonding,
everything looks to be OK. I can ping out and I can ping the hosts from
other systems. I pulled the network plug from one of the cards and
watched that the failover worked as it should. Then I plugged it back in
and removed the other. Everything worked as I thought it should; I am
not an expert at bonding but I have used the same method a few times now
without problem.
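An easy way to watch the failover happen during that kind of test is to
keep an eye on the bonding status file while pulling the cables; a rough
sketch, assuming the bond is up as bond0 and MII monitoring (miimon) is
enabled:

# watch -n1 cat /proc/net/bonding/bond0

The per-slave "MII Status" lines flip from "up" to "down" when a cable
is pulled, so you can see exactly when the bond notices the failure.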

 

Well I went on about my business, and soon complaints began to come in
that one server was much slower than the other. :-/

 

I began investigating and, sure enough, one system is slower.
Transferring a 1GB file across the network, I easily maintain ~38-40MB/s
on the first host but usually top out around 15-18MB/s on the other.
Ifconfig shows both cards configured as expected (txqueuelen:1000), but
the bond isn't behaving the way it should. Worse, when I run watch or
htop or anything else that refreshes regularly, I can see the lag. For
example, I have ssh'd into the system and have htop running right now;
it is supposed to update every 2 seconds. It works as it should for a
short while, but every once in a while the screen freezes for about 10
seconds, then everything updates all at once and it resumes its 2-second
update interval.
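If you want to see the negotiated link speed and duplex directly rather
than inferring it from ifconfig, ethtool reports both; a minimal check,
assuming the tg3 cards are eth0 and eth1:

# ethtool eth0 | grep -E 'Speed|Duplex'
# ethtool eth1 | grep -E 'Speed|Duplex'

On a healthy gigabit link these should report "Speed: 1000Mb/s" and
"Duplex: Full" for both cards.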

 

I thought it was the network cards, so I disabled the bonding and tested
each of them individually; I get gigabit speeds on each. Re-bonding the
cards put me right back at the slow speeds. I shut the system down to
check for physical damage or anything obvious (found nothing), and when
I brought it back up I saw this in the logs:

 

Oct 30 11:53:04 Hostname kernel: [   10.167568] bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
Oct 30 11:53:04 Hostname kernel: [   10.167568] bonding: bond0: enslaving eth0 as an active interface with an up link.
Oct 30 11:53:04 Hostname kernel: [   10.264691] bonding: bond0: Warning: failed to get speed and duplex from eth1, assumed to be 100Mb/sec and Full.
Oct 30 11:53:04 Hostname kernel: [   10.264691] bonding: bond0: enslaving eth1 as an active interface with an up link.
Oct 30 11:53:04 Hostname kernel: [   10.578052] NET: Registered protocol family 10
Oct 30 11:53:04 Hostname kernel: [   10.579606] lo: Disabled Privacy Extensions
Oct 30 11:53:05 Hostname kernel: [   12.884391] tg3: eth0: Link is up at 1000 Mbps, full duplex.
Oct 30 11:53:05 Hostname kernel: [   12.884391] tg3: eth0: Flow control is off for TX and off for RX.
Oct 30 11:53:06 Hostname kernel: [   13.012292] tg3: eth1: Link is up at 1000 Mbps, full duplex.
Oct 30 11:53:06 Hostname kernel: [   13.012292] tg3: eth1: Flow control is off for TX and off for RX.

 

I see the tg3 messages on the first server, but I don't see the bonding
warnings there. My guess is that the bonding is somehow stuck at
100Mb/sec and doesn't update when the cards come up at 1000Mb/sec. I
tried to find an answer via Google but didn't turn up anything useful.
I see others have had this problem, but I found no solution that helped
me.
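One thing worth trying before digging further is loading the bonding
module by hand with MII monitoring turned on, since without miimon the
driver doesn't watch the links at all (which would also explain the
"failed to get speed and duplex" warnings at enslave time, before the
tg3 links had finished negotiating). A sketch, assuming bond0 can be
taken down for a moment:

# ifdown bond0
# rmmod bonding
# modprobe bonding mode=1 miimon=100 downdelay=200 updelay=200
# ifup bond0
# grep -i 'mii polling' /proc/net/bonding/bond0

These are the same options as the /etc/modules line above; the grep just
confirms that the polling interval was picked up.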

 

I don't know why one works and the other doesn't. They should be pretty
similar in setup and configuration, since I didn't do anything
drastically different when I built them.

 

Any help would be appreciated.

 

Thanks!

Chris Stackpole

 

 

How I did the bonding:

# apt-get install ifenslave

# vi /etc/network/interfaces

auto lo
iface lo inet loopback

auto bond0
iface bond0 inet static
        address 10.3.45.3
        netmask 255.255.255.0
        network 10.3.45.0
        broadcast 10.3.45.255
        gateway 10.3.45.251
        dns-nameservers 10.1.1.5 10.1.1.6
        dns-search mydomain.com
        up /sbin/ifenslave bond0 eth0 eth1
        down /sbin/ifenslave -d bond0 eth0 eth1
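
As an aside, depending on the version of ifenslave, the bonding
parameters can also go straight into /etc/network/interfaces instead of
/etc/modules, so the module gets its options when the interface comes
up. A sketch only; the exact stanza names (bond-slaves, bond-mode,
bond-miimon, and so on) may differ on Lenny's ifenslave, so check the
examples under /usr/share/doc/ifenslave-2.6/ first:

auto bond0
iface bond0 inet static
        address 10.3.45.3
        netmask 255.255.255.0
        gateway 10.3.45.251
        bond-slaves eth0 eth1
        bond-mode active-backup
        bond-miimon 100
        bond-downdelay 200
        bond-updelay 200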

 

Then I restarted (yeah, I know I could have just restarted the network,
but I rebooted).

When it came back up, ifconfig showed bond0, eth0, eth1, and lo, all
configured correctly.
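
For the record, resetting just the network instead of rebooting would be
something along these lines (a sketch, not re-tested on this exact box):

# /etc/init.d/networking restart
# ifconfig bond0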
