Dear folks,

I am trying to establish a Clustermatic 5 setup on a custom-built 2.6.9
kernel backported to a stock Mandriva 2006 build (with all of the latest
patches applied as of Saturday).

No problem on the headnode kernel or the CM5 host utils booting.

However, the slaves *intermittently* fail to copy the libs over properly,
and I get either

vmadump: mmap failed: /lib/tls/libc.so.6

or

vmadump: mmap failed: /lib/tls/libc-2.3.5.so

Now, the first is a symlink to the other. 
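
For anyone wanting to confirm the same thing on their own head node, the
link relationship can be checked with a short script. This is only a
sketch: the helper name symlink_chain is made up for illustration, and the
path is simply the one from the error messages above.

```python
import os

def symlink_chain(path, max_hops=16):
    """Return every path visited while following symlinks from `path`."""
    chain = [path]
    while os.path.islink(path) and len(chain) <= max_hops:
        target = os.readlink(path)
        # A relative target is resolved against the link's own directory;
        # an absolute target replaces it entirely.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain

if __name__ == "__main__":
    # Path taken from the vmadump error; adjust for the system at hand.
    for hop in symlink_chain("/lib/tls/libc.so.6"):
        print(hop)
```

If the last path printed is not the versioned libc file, the link is not
pointing where the loader expects.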

Also, strace on a simple binary (e.g. mkdir) shows that it is indeed
trying to load *that* version of the C lib first.

I've messed around with taking that out of the path and linking
libc.so.6 to various other libc*so*'s in /usr/lib or /lib, with the same
results. It will sometimes boot, sometimes not.

This looks like a random library ordering issue.

Or, perhaps a timing issue where something that is being called in the C
lib is causing vmadump to burp.

When it does happen, though, it happens in the node_up stage.

*Sometimes* the nodes will boot OK.
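
One way to tell a bad copy apart from a load-order problem would be to
compare the library on the head node against whatever ended up on a slave.
The following is a minimal sketch, not part of CM5: it assumes the slave's
copy has been pulled back to (or is otherwise readable from) the head node,
and the function names are hypothetical.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large libraries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_contents(path_a, path_b):
    """True if both files have identical contents (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)

# Example call (the second path is a placeholder for the slave's copy):
# same_contents("/lib/tls/libc-2.3.5.so", "/tmp/slave-libc-2.3.5.so")
```

A mismatch on a failing boot would point at the copy itself; a match would
suggest looking at ordering or timing instead.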

----

Note: I have a happily running CM5 setup on several other machines with
FC4 as the core OS and basically the same custom CM5 kernel on top -
it's something funky with the M2006 C libraries AFAICS. Threading
perhaps? Not sure. 

I have other reasons for going with M2006.

I didn't fancy porting the basic bproc code to a 2.6.12* or 2.6.15
kernel, so I simply used the 2.6.9 kernel from CM5, custom rebuilt the
same way as on the FC4 clusters.

Do let me know if you have any ideas.

Thanks!

Kind regards

Derek Jones.



_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
