Hi folks

Trying to track down something annoying, and to see whether we are running into something that is already known.

OFED 1.5 on a 2.6.30.10 kernel. Running a file system atop IPoIB (many reasons, none I care to get into here at the moment). Under light load, the file system gradually grabs memory. Possibly a leak, not entirely sure. Could be the OFED stack underneath. The backing file system is xfs, which has been rock-solid stable on this hardware in other situations. Here, xfs and OFED/IPoIB all toss their cookies (and fail allocations) under moderate to heavy load.

Working with the file system vendor on this. I am not sure we have the answer nailed, so I wanted to see who out there is running a big (>512 nodes) cluster, doing large data transfers (preferably over IPoIB) for data storage, and running a late-model OFED. If you fall into this category, please let me know, as I'd like to ask a few questions offline about any OFED/IPoIB failure modes you have observed. I am not convinced it is OFED/IPoIB, but I'd like to see what other people have run into... if anything.

  Thanks!

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
