On Monday 08 January 2007 04:49, Joe Landman wrote:
> I found a neat ... feature ... of Linux while getting g03 running in SMP
> on cluster nodes. Long story, but the folks I am doing this for don't
> have/want to use Linda. They asked us to help them get g03 operational
> in SMP parallel. This wasn't painful. Have it integrated into SGE and
> our SICE interface now as well.
>
> Basic idea is that we are getting a kernel exception in the VFS layer
> only when running with 2 or more CPUs on an SMP node. Shows up only on
> SuSE 9.3 nodes. The other nodes are RHEL 3 based (2.4 kernel, but hey,
> its really stable).
>
> I don't want to post a nasty-looking trap here.
>
> The problem occurs with both xfs and jfs. Haven't had the chance to try
> ext3 yet, though if the issue is in the vfs layer, I can't see how
> changing the underlying block device is going to alter the layers (VFS)
> above it.
>
> The net effect of this is that it runs great on the 2.4 based machines,
> but gets SIGKILLs when running on the 2.6 based SuSE 9.3 machines.
> Looks like the app is tickling the OS bug. I can repeatably cause this
> trap, though it seems to occur at "random" places, well, not really.
> The way Gaussian runs, it has "links" which are binary modules which
> execute a particular portion of the calculation (its pretty neat
> really). Each link is read in from the disk. This VFS bug gets
> triggered regardless of local or remote FS.
>
> Any Gaussian users out there see that? Does a kernel upgrade fix it?
> Inquiring minds want to know ...

I don't know whether it's thread-related, but sometimes setting
LD_ASSUME_KERNEL to 2.4.1 in the environment solves this kind of
problem. There are other possible values; have a look at:

http://people.redhat.com/drepper/assumekernel.html

Best regards,

Rafael

-- 
Dr. Rafael R. Pappalardo
Dept. Physical Chemistry, Univ. de Sevilla (Spain)
e-mail: [EMAIL PROTECTED]
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
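
For anyone who wants to try the LD_ASSUME_KERNEL suggestion on a single job without changing the login environment, here is a minimal Python sketch. It only sets the variable for the child process it launches. The helper names and the command line in the comment are hypothetical; substitute however g03 is actually started on your nodes. Whether the setting has any effect depends on the glibc installed on the node, so treat this as an experiment rather than a known fix.

```python
import os
import subprocess


def env_with_assumed_kernel(version="2.4.1", base=None):
    """Return a copy of the environment with LD_ASSUME_KERNEL set.

    `base` defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["LD_ASSUME_KERNEL"] = version
    return env


def run_with_assumed_kernel(cmd, version="2.4.1"):
    """Run `cmd` (a list of arguments) with LD_ASSUME_KERNEL set for that
    child process only, and return its exit status."""
    return subprocess.run(cmd, env=env_with_assumed_kernel(version)).returncode


# Hypothetical usage; the g03 command line and file name are assumptions:
#   status = run_with_assumed_kernel(["g03", "job.com"])
```

Because the variable is passed only to the child, the same node can run one job with the setting and one without, which makes it easy to check whether it changes anything.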