Joe Landman wrote:

(responses embedded)
Jeff Johnson wrote:
Joe,

   I think you may be dealing with a PCIe fifo issue.

Hi Jeff:

  Possibly.  I had thought about that.  I was thinking more along the
lines of "it is a motherboard NIC, so we don't need no steenkeen high
performance things like 64 bit buffers ..."

The controller in question is a LAN-on-motherboard (LOM) component, but not quite a "desktop grade" chip; it is a server class controller. IMHO, Intel and the other silicon spinners haven't quite grasped the difference between "enterprise grade" and "HPC grade". I think it is highly likely that this chip can cut your storage cluster mustard, but the driver and I/O options may not run well for your application without some Kentucky windage.
   I have seen issues with the Intel PCIe gigabit ethernet onboard parts
when compared to PCIe slot cards and PCIX cards like the ones you are
testing. Specifically, issues with the partitioning of the controller's
buffers between rcv and xmit operations (internal to the controller chip
itself), and with the controller's relationship to the PCIe buffer on
the northbridge. PCIe, being serial, has different challenges when
reaching the top end of a device's performance capabilities. In this
case you are suffering some buffer throttling.

I played with some (OS/NIC) buffer settings, txqueuelen, and a few other
tunables.  Nothing seemed to make a difference.
The way the Intel controller and the e1000 driver interact is that the driver sets up the rcv buffer at initialization time and the *remainder* is left for xmit. This is not something that can be adjusted with ethtool or a module load option. You have to get into the e1000 driver source, find the rcv buffer size definition, and change it to suit your evil needs. Recompile and enjoy. Here is where the Kentucky windage comes in, as you may have to try a few values. Lather, rinse, repeat until you get it right.
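
For what it's worth, here is a minimal sketch of the kind of edit I mean, assuming the e1000_reset() path in e1000_main.c where the driver programs the controller's PBA (Packet Buffer Allocation) register. The constant names, stock values, and exact location vary by driver version and MAC type, so treat this as illustrative rather than gospel:

    /* Sketch of the relevant spot in e1000_main.c -- names, values,
     * and the exact location vary by driver version and MAC type. */
    u32 pba;                        /* Packet Buffer Allocation, in KB */

    switch (hw->mac_type) {
    /* ... other MAC types elided ... */
    case e1000_82573:
        pba = E1000_PBA_12K;        /* stock rcv share; check your
                                       driver's table for the real value */
        break;
    }

    /* PBA is the slice of the on-chip packet buffer handed to the rcv
     * FIFO; the controller gives whatever is left to xmit.  Raising it
     * favors receive-heavy traffic, lowering it favors transmit.  A
     * hypothetical retune favoring rcv: pba = E1000_PBA_16K; */
    E1000_WRITE_REG(hw, PBA, pba);  /* program the partition */

Bump it one step at a time and rerun your benchmark between builds; big jumps make it hard to tell which direction is actually helping.
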
   By default the buffers are partitioned for a "one size fits most"
scenario. If you know your I/O profile you can use ethtool (or modify
the e1000 driver source) to repartition the controller's FIFO to favor
rcv or xmit operations. This results in better performance in
situations where you know you will have heavier writes than reads, or
vice versa.

Yeah ...

Of course, without knowing your workload in advance you can't really
tune this.

Aside from that, I can't say I have seen many people tune their storage
clusters for workloads of one particular type.  You basically never know
what users will throw your way, and you really don't want one "corner
case" test driving down overall performance.
Unless you are building a generic-use resource, it is possible to figure out whether the environment favors reads over writes, etc. You don't have to be exact. Right now you are dealing with a 50/50 split of your ethernet and PCIe rcv/xmit buffer resources. Moving to 60/40 in favor of one direction can be enough to stop you from exhausting your buffer resources and hitting the slowdown.
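
To put rough numbers on it (the 32 KB total below is a figure I picked for illustration, not this controller's actual spec), the repartition math is just:

    /* Back-of-the-envelope repartition math.  The 32 KB total is a
     * number made up for illustration, not any controller's spec. */
    #include <stdio.h>

    int main(void)
    {
        unsigned total_kb = 32;                /* assumed on-chip buffer */
        unsigned rx_kb = total_kb * 60 / 100;  /* 60% to rcv  -> 19 KB */
        unsigned tx_kb = total_kb - rx_kb;     /* rest to xmit -> 13 KB */

        printf("rcv %u KB / xmit %u KB\n", rx_kb, tx_kb);
        return 0;
    }

The absolute sizes matter less than where the headroom goes: the direction that is dropping packets is the one that needs the extra few KB.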

I could bury an 8 node mpich run of Pallas on the 82573 (first gen Intel gigabit PCIe LOM) until I monkeyed with the buffer settings. Running Pallas, even at very small message sizes, the buffers were getting buried so badly that it wasn't a matter of slowdown but of rapidly incrementing dropped packet counts.

Any chance you are running jumbo frames? If so, turn them off and retest. A single jumbo frame eats a much bigger chunk of a small rcv partition than a standard frame does, so jumbo frames will exhaust the buffers that much faster.

Also, use the e1000.sourceforge.net driver. If you are using a driver from Intel or a distro, ditch it.

One of the comments in your original message is key: PCIX works, PCIe is slower. With PCIe being serial, you have both the ethernet buffering and the PCIe buffering to contend with. A first generation x1 PCIe link carries 2.5 Gb/s per direction, about 250 MB/s after 8b/10b encoding, so a gigabit NIC running flat out in both directions is using a healthy fraction of that once you add descriptor and protocol overhead.

   *OR* it is because you are using a Supermicro motherboard..  =)

Owie ... that left a mark ...
Try deploying a 256 node cluster with a motherboard defect that the vendor wouldn't acknowledge. That leaves a mark too, along with some hefty bar tabs.
I thought it was that I hadn't given the appropriate HPC deity their
burnt (processor) offering ...
The gods prefer FBDIMMs these days ...


--
Best Regards,

Jeff Johnson
Vice President
Engineering/Technology
Western Scientific, Inc
[EMAIL PROTECTED]
http://www.wsm.com

5444 Napa Street - San Diego, CA 92110
Tel 800.443.6699  +001.619.220.6580
Fax +001.619.220.6590

"Braccae tuae aperiuntur"
