Hi,

I posted this to the gentoo-cluster list and was told to look in the archives here. After searching the beowulf archives for bonding and "balance-alb", I found no really clear answers regarding balance-alb and multiple TCP connections. Furthermore, bonding.txt is really confusing and uses terms interchangeably, so I'll post my question here as well.

I've got the following system:

    Linux server 2.6.26 #5 SMP Fri Feb 6 12:18:54 CST 2009 ppc64 PPC970FX, altivec supported RackMac3,1 GNU/Linux

It is connected to a Cisco gigabit switch, with both gigabit NICs bonded as follows:

    % cat /proc/net/bonding/bond0
    Ethernet Channel Bonding Driver: v3.2.5 (March 21, 2008)

    Bonding Mode: adaptive load balancing
    Primary Slave: None
    Currently Active Slave: eth0
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 0
    Down Delay (ms): 0

    Slave Interface: eth0
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:0d:93:9e:2b:ca

    Slave Interface: eth1
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:0d:93:9e:2b:cb

Both eth0 and eth1 are running at 1000Mb/s full duplex according to ethtool.

I start netserver (from netperf) on the server and then run netperf on four other clients, each with a single gigabit connection to the same switch:

    netperf -l 45 -t TCP_STREAM -H server

but I get the following outputs:

client 1:

    Recv   Send   Send
    Socket Socket Message  Elapsed
    Size   Size   Size     Time     Throughput
    bytes  bytes  bytes    secs.    10^6bits/sec

    87380  16384  16384    45.04    405.68

client 2:

    Recv   Send   Send
    Socket Socket Message  Elapsed
    Size   Size   Size     Time     Throughput
    bytes  bytes  bytes    secs.    10^6bits/sec

    87380  16384  16384    45.01    216.07

client 3:

    Recv   Send   Send
    Socket Socket Message  Elapsed
    Size   Size   Size     Time     Throughput
    bytes  bytes  bytes    secs.    10^6bits/sec

    87380  16384  16384    45.87    143.93

client 4:

    Recv   Send   Send
    Socket Socket Message  Elapsed
    Size   Size   Size     Time     Throughput
    bytes  bytes  bytes    secs.    10^6bits/sec

    87380  16384  16384    45.05    164.05

Adding those up gives me:

    $ echo "405.68+216.07+143.93+164.05" | bc -l
    929.73

That is no more than 1 Gbps in total.

bonding.txt from the kernel docs says:

"balance-rr: This mode is the only mode that will permit a single TCP/IP connection to stripe traffic across multiple interfaces. It is therefore the only mode that will allow a single TCP/IP stream to utilize more than one interface's worth of throughput."

That much is understood: for a *single* TCP connection (or stream) I can't get more than a single NIC's bandwidth unless I have balance-rr enabled and LACP turned on in the switch. However:

"balance-tlb: The balance-tlb mode balances outgoing traffic by peer. Since the balancing is done according to MAC address, in a "gatewayed" configuration (as described above), this mode will send all traffic across a single device. However, in a "local" network configuration, this mode balances multiple local network peers across devices in a vaguely intelligent manner (not a simple XOR as in balance-xor or 802.3ad mode), so that mathematically unlucky MAC addresses (i.e., ones that XOR to the same value) will not all "bunch up" on a single interface."

What I have is a "local" network configuration with a single switch and multiple clients. Furthermore:

"balance-alb: This mode is everything that balance-tlb is, and more. It has all of the features (and restrictions) of balance-tlb, and will also balance incoming traffic from local network peers (as described in the Bonding Module Options section, above)."

So what that says to me is that with balance-alb the bonding driver should "balance multiple local network peers across devices [NICs] in a vaguely intelligent manner (not a simple XOR as in balance-xor or 802.3ad mode), so that mathematically unlucky MAC addresses (i.e., ones that XOR to the same value) will not all "bunch up" on a single interface", similar to balance-tlb but for "incoming traffic from local network peers" as well as outgoing.

But this is not what I'm seeing in the test above. Shouldn't I be able to get well over 1 Gbps in balance-alb mode from multiple TCP streams? It looks like all the connections are bunching up on one interface. If, for whatever reason, what I'm trying to do isn't possible unless I can turn on LACP in the switch, then what's the point of balance-alb or balance-tlb?

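To make the "mathematically unlucky MAC addresses" case from the quoted text concrete, here is a minimal Python sketch of a simple XOR placement of the kind bonding.txt contrasts balance-tlb against. It is only an illustration: the function name `xor_slave` is made up, the client MAC addresses are invented, and hashing on the last byte of each address is an assumption for readability, not a claim about what the driver in this kernel actually does. It also says nothing about how balance-alb itself assigns peers.

```python
def xor_slave(src_mac: str, dst_mac: str, n_slaves: int) -> int:
    """Pick a slave index as (src XOR dst) modulo slave count.

    Assumption: only the last byte of each MAC is hashed, purely
    to keep the illustration small.
    """
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % n_slaves


# eth0's permanent hardware address from /proc/net/bonding/bond0 above
bond_mac = "00:0d:93:9e:2b:ca"

# Hypothetical client MACs, invented for this example only
clients = [
    "00:11:22:33:44:10",
    "00:11:22:33:44:12",
    "00:11:22:33:44:14",
    "00:11:22:33:44:16",
]

# Map each client to the slave a simple XOR hash would choose
placement = {mac: xor_slave(bond_mac, mac, 2) for mac in clients}

for mac, slave in placement.items():
    print(f"{mac} -> slave {slave}")
```

With two slaves the result depends only on the lowest bit of the XOR, so these four made-up clients, whose last bytes are all even, land on the same slave. That is the "bunching up" that balance-tlb and balance-alb are described as avoiding.
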
What I'm seeing looks like active-backup to me, except that both NICs are enabled and sharing the load, but at half capacity or something.

Thanks,
Sabuj Pattanayek