Joe Landman wrote:
> Hi folks
>
>   Trying to trace something annoying down, and see if we are running
> into something that is known.
>
>   OFED 1.5 on a 2.6.30.10 kernel.  Running a file system atop IPoIB
> (many reasons, none I care to get into here at the moment).  Under
> light load, the file system gradually grabs memory.  Possibly a leak,
> not entirely sure.  Could be the OFED stack underneath.  Backing file
> system is xfs.  That has been (on this hardware in other
> situations) rock solid stable.  Here, xfs, OFED/IPoIB all toss their
> cookies (and fail allocations) under moderate to heavy load.
>
>   Working with the file system vendor on this.  I am not sure we have
> the answer nailed, so I wanted to see who out there is running a big
> (>512 nodes) cluster, doing large data transfers (preferably over
> IPoIB), for data storage, and running a late model OFED.  If you fall
> into this category, please let me know, as I'd like to ask a few
> questions offline about any observed OFED/IPoIB failure modes.  I am
> not convinced it is OFED/IPoIB, but I'd like to see what other people
> have run into ... if anything.
>
>   Thanks!

We're running OFED 1.4 for our GPFS cluster, with RDMA used for data
and IPoIB used for metadata and backups. We're looking at an upgrade to
1.5, so if you do find anything out, I'd be very interested in knowing.

-- 
-- Skylar Thompson (sky...@cs.earlham.edu)
-- http://www.cs.earlham.edu/~skylar/



_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
