Re: [Beowulf] PVM on wireless...

Robert G. Brown Fri, 08 Feb 2008 12:28:05 -0800

On Fri, 8 Feb 2008, [EMAIL PROTECTED] wrote:

I'm already getting lots of practice explaining how to get this stuff
to work for 3 separate PVM users...  :)


Just record the results as you do so for reworking into the HOWTO...;-)

 > At some point at your convenience in the future when
 > you have all kinds of time to metaphorically sit down and REALLY work
 > over PVM...

Ahhh...  Lemme picture that moment...  :-D

:-)

 > I have about 800 specific suggestions for bringing it up to
 > current and modern and everything.  Just a wee list.  You know:

 >   * Purge aimk for all time, die die die

Ha ha ha...  You don't like "aimk"...?  :-)

Yeah, PVM was originally pre-autoconf...  Too bad, eh...?  :)


Who, me?  I love aimk.  Well, I loved aimk.  Back in oh, 1994 or
thereabouts.  I was at that time managing a unix network with creeping
inhomogeneity and a mix of SysV and BSD related Unices.  I took aimk,
cut out its system-identifying heart, and incorporated it into the most
complex set of .files for my shell that you ever saw.  When I logged in
on any system that had a copy of my home directory, they would
automagically determine the architecture, set all paths, and alias the
hell out of everything so that Unix worked -- for me -- pretty much the
same on AIX, NeXT, SGI, Sun, Ultrix...

However, the world has changed.  It has shrunk to pretty much Linux,
FreeBSD, and Solaris as surviving Unices, MacOSX as a newbie Unix, and
Windoze.  All the Unices share a large fraction of their code base
at this point.  People have invented POSIX and (gasp) standards.  And
frankly, the end result of all of this is that most of the complexity in
aimk is totally, totally obsolete and merely obstructs the ease of
working on or building the application.

One of SGE's worst features is that it is built on top of #!*@ aimk.
PVM has an excuse (it preceded the GBT, the Gnu Build Tools).  What is
SGE's?

aimk die.  Standard Unix build (patched/ifdef'd as needed for WinXX, the
only remaining real maverick) live.

 >   * Actually use the FSH so e.g. apropos pvm works.

I'm assuming you don't mean FSH="Follicle Stimulating Hormone";
did you mean "SSH", or am I clueless...?


Sigh.  Sorry, I'm the brainless one, transposed the H and S:

  http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard

There are basically a couple of ways one can think about installing pvm
in an FHS-compliant way.  One is just like it is now, but in /opt, as
that is what it is for.  The other (better) solution is to install the
binaries, include files, libraries and man pages native on their correct
paths, which would be /usr/bin, /usr/include, /usr/lib, and
/usr/share/man respectively IIRC.  Documentation goes in
/usr/share/doc/pvmwhatever.  Any shared DATA (e.g. shared configuration
data) goes into either /etc/pvm* if it is likely to vary per system or
/usr/share/pvm if it is really crossmountable in an architecture
independent way.  Putting architecture specific libraries and binaries
in /usr/share is just wrong.
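
A minimal sketch of that second layout, in Python purely for
illustration.  The category names and both helper functions are
hypothetical (nothing in PVM's build provides them), and the doc and
config directory names are assumed stand-ins for "pvmwhatever" and
"/etc/pvm*".

```python
from pathlib import PurePosixPath

# Hypothetical category -> directory table, following the paths named above.
FHS_DIRS = {
    "binary": "/usr/bin",
    "include": "/usr/include",
    "library": "/usr/lib",
    "man": "/usr/share/man",
    "doc": "/usr/share/doc/pvm",      # assumed name for the doc directory
    "config": "/etc/pvm",             # data likely to vary per system
    "shared_data": "/usr/share/pvm",  # architecture independent data only
}

# Categories that are architecture specific and so must stay out of /usr/share.
ARCH_SPECIFIC = {"binary", "library"}


def install_destination(category: str, filename: str) -> str:
    """Return the full install path for a file of the given category."""
    if category not in FHS_DIRS:
        raise ValueError(f"unknown category: {category}")
    base = PurePosixPath(FHS_DIRS[category])
    if category == "man":
        # Man pages live in per-section subdirectories: foo.1 -> man1/foo.1
        if "." not in filename:
            raise ValueError(f"man page without a section suffix: {filename}")
        section = filename.rsplit(".", 1)[-1]
        base = base / f"man{section}"
    return str(base / filename)


def violates_fhs(category: str, path: str) -> bool:
    """True if an architecture specific file would land under /usr/share."""
    return category in ARCH_SPECIFIC and path.startswith("/usr/share/")
```

Calling install_destination("binary", "pvmd3") gives a path under
/usr/bin, while violates_fhs() flags a library placed anywhere under
/usr/share.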

The Gnu Build Tools will manage all of this mostly for you if you make
the effort to port the build process over to them (which I freely admit
is likely to be a daunting chore and might require rethinking aspects of
PVM itself and how to run it in a heterogeneous environment).

 >   * Document the hell out of everything

Yes!  :D


<smile>

 >   * Rewrite the network back end in a way that openly encourages high
 > end network vendors to contribute reusable non-IP native drivers

Ha ha ha...  Tried to cater to vendors many times.  See all those funny
arch subdirs in pvm3/src...?  Yeah, been there, done that...

(Though I agree that building on top of some generic "standardized"
networking layer would be "nice" - there are so many to choose from... :)


Right, but note that MPI (at least some MPIs) support Real Cluster
Networks.  PVM supports them only using IP.  That statement means that
perhaps 1/4 to 1/3 of all possible parallel programmers reject PVM right
out of the box.  Another 1/3 are gone because embarrassingly parallel
programs usually don't quite justify even the complexity of PVM (even
though it is "perfect" for master-slave and EP projects in a lot of
ways).  The 1/3 that are left MIGHT use PVM, but if they think that
their code EVER might run on a high-end cluster, or if they want to pad
their resume with experience that might one day let them code something
new on a high-end cluster, they have to think hard about the PVM vs MPI
choice.  The result is that I'm guessing that there are 5 MPI
programmers for every 1 PVM programmer (and a lot of the latter, like
me, learned PVM back in the early 90's when it was the only game in
town).

So one possible solution is to build PVM in such a way that it can share
drivers with e.g. MPICH right out of the box, so to speak.  But this, of
course, also means altering other things, and agreeing on something of
an ABI for each hardware network device you have available.

On the good side, it would let you add a native ethernet channel that
didn't use (even) UDP/IP, good only on completely non-routed flat
switched networks.

 >   * Add a (possibly macro-driven) middle layer that makes PVM into MPI
 > as well -- one set of actual message-passing functions, two conformally
 > mapped call interfaces.

You mean like "PVMPI"...?

 http://www.netlib.org/utk/papers/pvmpi/paper.html

Or its offspring "MPI-Glue"...?

 http://www.scientific-computing.de/people/rabenseifner/projects/mpi_glue.html

Or do you mean something completely different...?  :)


No, like that, sure -- or the even older papers on the PVM website
(unless these are they).  But actually done and distributed in PVM, so
one doesn't actually need PVM -- and -- LAM on a system, especially
given that LAM is a lot like PVM except where it's not.  Possibly not as
good, I'd even say, but I'm not enough of an MPI user to be able to
fairly judge.
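
A toy sketch of the "one set of message-passing functions, two call
interfaces" idea, in Python for brevity.  Everything here is
hypothetical: the function names only mimic the flavor of PVM and MPI
calls and do not match the real signatures of either library, and the
"network" is just an in-memory mailbox.

```python
from collections import defaultdict, deque

# The single shared core: one FIFO mailbox per (destination, tag) pair.
_mailboxes = defaultdict(deque)


def core_send(dest, tag, payload):
    """Queue a payload for the given destination and tag."""
    _mailboxes[(dest, tag)].append(payload)


def core_recv(dest, tag):
    """Pop the oldest payload queued for the given destination and tag."""
    return _mailboxes[(dest, tag)].popleft()


# Interface 1: PVM-flavored argument order (task id first, then tag).
def pvm_style_send(tid, msgtag, payload):
    core_send(tid, msgtag, payload)


def pvm_style_recv(tid, msgtag):
    return core_recv(tid, msgtag)


# Interface 2: MPI-flavored argument order (buffer first, then rank and tag).
def mpi_style_send(buf, dest, tag):
    core_send(dest, tag, buf)


def mpi_style_recv(rank, tag):
    return core_recv(rank, tag)
```

Because both interfaces are thin wrappers over the same core, a
message sent through one can be received through the other.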

 >   * Make Ctrl-C work so one can break out of the annoying timeout on add
 > hosts when things don't work.

Yeah, bummer eh?  :)  Where did Bob Manchek go to anyway...?

(He's the real culprit behind the majority of PVM code, btw,
I merely "inherited" the maintenance job... :)


I know how that goes.  And it is always a tradeoff, too.  For just ME,
it only wastes time in three or four minute chunkies, every now and
then.  It would take days, weeks, to recover the time required to fix
it.  But then you multiply "me" by an actual user base, and you come to
realize that stuff like this costs a huge amount of distributed
productivity and it's insane not to fix it.  Except that (naturally) you
aren't getting PAID to fix it so it's hours of YOUR time for minutes of
benefit to save person-weeks of everybody ELSE'S time.

Still, it is harmless to suggest it so that you MIGHT add it to that
eternally optimistic opportunity cost labor queue against the day you
finish a three month project you're being paid for in three days and
need to pretend to be busy for 87 days...;-)
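
To put illustrative numbers on that tradeoff (every figure below is
made up for the sake of the arithmetic, not measured), a small Python
sketch:

```python
def aggregate_hours_lost(users: int, incidents_per_user: int,
                         minutes_per_incident: float) -> float:
    """Total hours per year lost across all users to a recurring hang."""
    return users * incidents_per_user * minutes_per_incident / 60.0


def break_even_years(fix_hours: float, hours_lost_per_year: float) -> float:
    """Years until the time spent on a fix is repaid by the time it saves."""
    return fix_hours / hours_lost_per_year


# Assumed: 20 hangs a year at 4 minutes each, and a 40 hour fix.
single_user = aggregate_hours_lost(1, 20, 4)    # one person's yearly loss
user_base = aggregate_hours_lost(500, 20, 4)    # an assumed 500 users
fix_hours = 40.0
```

With these assumed inputs the fix never pays for itself for a single
user on any sensible timescale, but pays for itself within weeks
across the whole user base.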

 >   * Make the console capable of cleaning up after a crash or
 > interruption.

We talked about things we could do there, e.g. to clean up old
leftover /tmp/pvmd.* files, etc, but it was always easier to
just remove the files by hand...!  ;)


Well, or not.  It depends on how often you have to do it.  Same
computation as above -- for any single person yeah, the hassle of coding
a robust solution isn't worth it, but distribute that hassle over a
user base of even a hundred people and suddenly it is a lot of aggregate
time, especially for novice users and support.

Remember, NO NOVICE USER is going to understand that the reason that PVM
isn't working is because they somehow exited or killed or rebooted the
master host/process and left behind tag zombie pvmd's (or worse, just
the lockfiles) on all the nodes.  I at one time wrote scripts I could
run to clean up just because if there are more than a very few nodes,
this can get really painful!  If the nodes are widely distributed on an
enterprise LAN (one thing PVM is very good for) doubly so.

So again, you lose some fraction of the novices because they get
frustrated and (correctly) view such behavior as "broken", and you at
least annoy even the tried and true PVM programmer because nobody LIKES
having to go kill a whole bunch of processes and remove all those
lockfiles by hand, only to learn that they missed one.  It isn't fun
work, and it could be automated SO easily.
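
A minimal sketch of the per-node half of such a cleanup, in Python.
It only looks for leftover files matching the /tmp/pvmd.* pattern
mentioned above that belong to the current user; the function names
are hypothetical, and it deliberately does NOT check whether a live
pvmd still owns a file or kill any processes, both of which a real
tool would have to handle.

```python
import glob
import os
from typing import List


def find_leftover_files(tmpdir: str = "/tmp",
                        pattern: str = "pvmd.*") -> List[str]:
    """List regular files matching the pattern that the current user owns."""
    uid = os.getuid()
    matches = []
    for path in sorted(glob.glob(os.path.join(tmpdir, pattern))):
        if os.path.isfile(path) and os.stat(path).st_uid == uid:
            matches.append(path)
    return matches


def clean_leftovers(tmpdir: str = "/tmp", dry_run: bool = True) -> List[str]:
    """Remove leftover files; with dry_run=True only report them."""
    leftovers = find_leftover_files(tmpdir)
    if not dry_run:
        for path in leftovers:
            os.remove(path)
    return leftovers
```

Defaulting to dry_run=True means a first invocation only lists what
would be deleted, which is the safer behavior for novices.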

If I were going to write the PVM console over myself from scratch, I
would actually parallelize it to really facilitate stateful control.  By
that I mean I would separate out the interpreter loop as an absolutely
trivial, impossible to block object, and fork off one or more slave
tasks to do the actual things you are trying to do, OR I'd make all
tasks rigorously interruptible with minimal loss of state information
(or really, both).  That way you can always get to the console, and if
you can get to the console you can always execute a reset for whatever
VM you've defined.  Right now the only way to SIMULATE this behavior is
by breaking out of a hang to the originating shell with Ctrl-Z and then
performing all sorts of violence by hand without access even to the list
of currently configured hosts.  Ug-ly...
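
A minimal sketch of that design in Python, using threads rather than
forked tasks for brevity.  The Console class and the slow_add_host
stand-in are hypothetical and have nothing to do with the actual PVM
console code; the point is only that the command loop hands work off
and stays responsive, and that each task waits interruptibly.

```python
import queue
import threading
import time


class Console:
    """Toy console: commands run in worker threads so the loop never blocks."""

    def __init__(self):
        self.results = queue.Queue()     # finished (name, outcome) pairs
        self.cancel = threading.Event()  # set to interrupt running tasks
        self.workers = []

    def submit(self, name, func, *args):
        """Start func in a background thread and return immediately."""
        def run():
            outcome = func(self.cancel, *args)
            self.results.put((name, outcome))
        worker = threading.Thread(target=run, daemon=True)
        worker.start()
        self.workers.append(worker)

    def reset(self):
        """Ask every running task to stop, then wait for them to finish."""
        self.cancel.set()
        for worker in self.workers:
            worker.join()
        self.workers.clear()
        self.cancel.clear()


def slow_add_host(cancel, host, timeout=60.0):
    """Stand-in for an 'add host' that would otherwise hang until timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if cancel.wait(0.05):            # interruptible wait, not a bare sleep
            return f"{host}: interrupted"
    return f"{host}: timed out"
```

Here submit() returns at once even if the add would hang for a
minute, and reset() gets control back without resorting to Ctrl-Z.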

I'd probably also leave systems in the VM (and conf display) even if
they actually failed to add or added and then died, and just mark them
down.  Add a command to restart the downed ones (or even a way of
polling and doing it automatically), along with suitable signals
returnable to a master process.
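
A minimal sketch of such a host table in Python.  The HostTable class
and the start callback (which stands in for whatever actually launches
a daemon on a host) are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HostTable:
    """Keeps every configured host, marking failures 'down' instead of dropping them."""
    status: Dict[str, str] = field(default_factory=dict)

    def add(self, host: str, start: Callable[[str], bool]) -> None:
        """Try to start a host; record it as up or down either way."""
        self.status[host] = "up" if start(host) else "down"

    def mark_down(self, host: str) -> None:
        """Record that a previously added host has died."""
        self.status[host] = "down"

    def down_hosts(self) -> List[str]:
        return sorted(h for h, s in self.status.items() if s == "down")

    def restart_down(self, start: Callable[[str], bool]) -> List[str]:
        """Retry every down host and return the ones that came back."""
        revived = []
        for host in self.down_hosts():
            if start(host):
                self.status[host] = "up"
                revived.append(host)
        return revived
```

A polling loop could simply call restart_down() on a timer and report
the returned list to the master process.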

There are a zillion things one could do with such a console and
signalling system.  Gather statistics from real-time console calls (e.g.
total number of messages, total number of bytes sent, per communications
pair).  Reset an entire cluster.  Take over a running cluster and
computation from a different master so that one can reboot the master
safely.  "Stop" the computation and migrate a node task ditto.

If the console were really NICELY written, with most of the console
functions actually tied up in a library, you'd make it (relatively)
trivial to write gpvm, the ultimate gnome PVM console.

The console is one of the nicest things about PVM, and it and the
ancient but still lovely xpvm sort-of-GUI are one thing that keeps it
alive as a teaching tool if nothing else.  It is just fabulous to be
able to watch a PVM computation develop as lots of little lines and
icons and so on.  But it could be a lot better, especially more robust
and easier on novices.  And with network support that could once again
compete with MPI on the high end, I think it would experience a bit of a
resurgence because it IS a good match for many kinds of tasks.

   rgb


Good suggestions, though.  I'll add them to my "to do" list,
along with any others that may come up...?  :-)


Thanks, Man!

        Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)

 > that kind of thing...;-)

 >    rgb

 >>
 >>       Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
 >>
 >>  On Fri, Feb 08, 2008 at 05:35:31AM -0500, Robert G. Brown wrote:
 >>  > On Thu, 7 Feb 2008, [EMAIL PROTECTED] wrote:
 >>
 >>  >> I admit this may be an antiquated cynical mentality, and I
 >>  >> further concur that PVMNETSOCKPORT is an obvious omission
 >>  >> in the basic documentation/faq...
 >>
 >>  > As they say, you can't RTFM if there ain't no FM... (or if the solution
 >>  > exists but isn't there).
 >>
 >>  > One is reminded of Dr. Strangelove, where the president (Peter Sellers)
 >>  > has just learned that if the maverick B52 piloted by Slim Pickens gets
 >>  > through, a doomsday device that is supposed to deter first nuclear
 >>  > strikes will go off that will destroy the world.  Unfortunately, the
 >>  > Soviet Union didn't actually tell us that it was built.  Dr.
 >>  > Strangelove (Peter Sellers), after musing for a moment on the
 >>  > brilliance of the concept, turns and says in an increasingly shrill
 >>  > voice:
 >>
 >>  >   But...the whole point of the Doomsday Machine...is lost...if you keep
 >>  >   it a SECRET. Why didn't you tell the world, eh?
 >>
 >>  > Hmmm...;-)
 >>
 >>  >    rgb
 >>
 >>  >> Thanks for your suggested text!  (And the suggestion to
 >>  >> enhance our coverage of rsh/ssh usage... :-)
 >>
 >>  > Ya, well.  Just now finished telling the umptieth would-be PVM user how
 >>  > to go about it in an email message, augmenting further online docs such
 >>  > as this one:
 >>
 >>  >   http://www.uow.edu.au/~suresh/web/cfamily/pvm.html
 >>
 >>  > which is actually pretty decent, although I generally use the ssh
 >>  > default dsa instead of rsa since on linux boxes it invariably works.
 >>  > But better than forcing each user to employ google to snarf out
 >>  > solutions to each problem they encounter, how much better to write a
 >>  > really nice "Getting Started with PVM" or perhaps better still, a "PVM
 >>  > HOWTO" on tldp.org.  Publish there, and be sure to include a copy in
 >>  > plain sight in /usr/share/pvm3/PVM_HOWTO.
 >>
 >>  > Truthfully, good documentation, especially a walkthrough tutorial on
 >>  > getting started (including sample code or links to sample code) that
 >>  > takes a would-be user from "yum install pvm\*" to executing a Real
 >>  > Parallel Program (however trivial) on a two node cluster would really
 >>  > encourage the use of the library.  Adding a bit more (such as a PVM
 >>  > program development template) would be only icing on the cake, so to
 >>  > speak.
 >>
 >>  > If I had the time I'd write it myself.  I've already got a project_pvm
 >>  > program template up on the web, but it is sadly underdocumented through
 >>  > the setup of PVM itself.
 >>
 >>  >    rgb
 >>
 >>  >>
 >>  >> All the Best,
 >>  >>
 >>  >>     Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
 >>  >>
 >>  >>  On Thu, Feb 07, 2008 at 04:42:21PM -0500, Robert G. Brown wrote:
 >>  >>  >>  > It would really, really help if man pvm (or man pvmd or man
 >>  >>  >>  > pvm_intro) documented a suitable firewall setting that will
 >>  >>  >>  > let PVM function without just turning off the firewall
 >>  >>  >>  > altogether.  There is no pvm setup in /etc/services, for
 >>  >>  >>  > example, no pvm checkbox in the panels managed by
 >>  >>  >>  > system-config-firewall in the latest Fedoras, no suggestion
 >>  >>  >>  > as to what protected port(s) or ranges one has to enable
 >>  >>  >>  > explicitly.  In fact for once even google is failing me --
 >>  >>  >>  > I'm not finding a lot of documentation or remarks by ANYONE
 >>  >>  >>  > on what ports pvm needs open (besides ssh, which obviously is
 >>  >>  >>  > open and works).  Usually as long as the spawning of a
 >>  >>  >>  > network application itself works using an enabled protected
 >>  >>  >>  > port (in this case, I would have expected ssh), the secondary
 >>  >>  >>  > ports opened in unprotected space just work.  Am I wrong in
 >>  >>  >>  > this?  Do I need to explicitly open more ports somewhere?
 >>  >>  >>
 >>  >>  >> Ah Yes.  O.K., so I wish it was that simple, but alas PVM can use
 >>  >>  >> as many ports as you have machines in your cluster, or could use
 >>  >>  >> just 1.  :-}
 >>  >>  >>
 >>  >>  >> Normally, the master pvmd creates/accepts connections over a small
 >>  >>  >> set of ports, possibly 1, but if PvmRouteDirect is enabled in a
 >>  >>  >> PVM application, then a myriad of direct-connection socket links
 >>  >>  >> are created, to link whichever machines the local PVM application
 >>  >>  >> tasks communicate with, on a demand-driven basis...
 >>  >>  >>
 >>  >>  >> So it's not generally possible to specify an explicit "range" of
 >>  >>  >> ports.  However, it _is_ possible to set the "starting" port for
 >>  >>  >> this collection, using the aforementioned "$PVMNETSOCKPORT"
 >>  >>  >> environment variable.
 >>  >>
 >>  >>  > OK, I'm giving this a try.  Although I'd have to ask why pvmd
 >>  >>  > doesn't do the fork thing and clone a single open port on which it
 >>  >>  > listens into a dynamically allocated port that inherits from the
 >>  >>  > open one.  In principle one only needs a single port to be open to
 >>  >>  > connect to pretty much any network based application, or so I had
 >>  >>  > thought.  At least, I do that in xmlsysd and never have to punch
 >>  >>  > more than one porthole through a firewall.
 >>  >>
 >>  >>  > Hmmm, it's working sort of -- looks like I need to open UDP ports,
 >>  >>  > right, not TCP?  Having trouble on one host where I've punched the
 >>  >>  > hole but didn't >>locally<< set PVMNETSOCKPORT to match, so I'm
 >>  >>  > trying again with the local environment variable set.
 >>  >>
 >>  >>  > Yup, that works.
 >>  >>
 >>  >>  > So I'm guessing that pvmd reads it as it starts up wherever.  Why
 >>  >>  > does it need to do this on a client?  Can't the port(s) be passed
 >>  >>  > from the master when it starts up pvmd?
 >>  >>
 >>  >>  >> This sets the first port that PVM will try to use, and all
 >>  >>  >> subsequent ports will usually be consecutive positive increments
 >>  >>  >> of that starting port (i.e. PVMNETSOCKPORT++... :-).
 >>  >>  >>
 >>  >>  >> So in most cases, you could probably plan on opening up 100 or
 >>  >>  >> 1000 ports _somewhere_ in your firewall, depending on your needs,
 >>  >>  >> and then just tell PVM where to start, using $PVMNETSOCKPORT...
 >>  >>  >>
 >>  >>  >> I've always considered this solution a bit of a kludge, which is
 >>  >>  >> why it doesn't show up in the man pages, but if it works well
 >>  >>  >> enough to save people lots of hassle, then I can add some
 >>  >>  >> commentary on it...?
 >>  >>
 >>  >>  > Kludge or not, how can you have an environment variable in an
 >>  >>  > application and not provide knowledge of it or instructions on its
 >>  >>  > use in the man page?  Something like:
 >>  >>
 >>  >>  >  PVM requires open ports on target hosts to function.  Many hosts
 >>  >>  >  are installed with strong firewall rules by default.  If you
 >>  >>  >  install pvm on a slave and pvm appears to hang when you attempt
 >>  >>  >  to add it, eventually timing out without success, consider adding
 >>  >>  >  the following to your local personal or system environment (in,
 >>  >>  >  for example, ~/.bash_profile on all hosts):
 >>  >>
 >>  >>  >    PVMNETSOCKPORT=10000
 >>  >>  >    export PVMNETSOCKPORT
 >>  >>
 >>  >>  >  Then configure your firewall(s) to open a range of udp ports
 >>  >>  >  starting at this value, such as 10000-11024 (which need not be
 >>  >>  >  any larger than the largest number of machines you expect to have
 >>  >>  >  in your virtual machine).
 >>  >>
 >>  >>  > However a better solution still is to have the daemon fork on a
 >>  >>  > single "permanent" port address > 1024, e.g. 10000, and get a
 >>  >>  > negotiated connection in the upper (non-protected) port space that
 >>  >>  > way.
 >>  >>
 >>  >>  >> It may depend on the firewall settings, but a nice "Connection
 >>  >>  >> Refused" would usually go a long way toward diagnosing things,
 >>  >>  >> whereas the more secure firewall alternative of simply
 >>  >>  >> "no response" would only result in a "timed out" PVM message...
 >>  >>  >>
 >>  >>  >> I'm open to suggestions on ways to identify or diagnose the
 >>  >>  >> problem...!
 >>  >>
 >>  >>  > As I said, document EVERYTHING in the man page(s).  It is what
 >>  >>  > they are for.  Lots of users do, in fact, RTFM but get frustrated
 >>  >>  > and give up when they try something and it just doesn't work and
 >>  >>  > they can't see why.
 >>  >>
 >>  >>  > On the same line, a perennial problem with PVM is getting it to
 >>  >>  > work with rsh and ssh.  In fact, half the problems I help people
 >>  >>  > with who randomly write me are getting it to work with one or the
 >>  >>  > other.  The internal diagnostics are certainly very helpful, at
 >>  >>  > this point, but it would also be worth adding a new man page like
 >>  >>  > pvm_rsh that does nothing but walk users through the ritual of
 >>  >>  > setting PVM_RSH and establishing appropriate e.g. ssh keys.
 >>  >>
 >>  >>  > Just a thought or two.
 >>  >>
 >>  >>  >    rgb
 >>  >>
 >>  >>  >>
 >>  >>  >> Thanks Much for your interest and feedback!
 >>  >>  >>
 >>  >>  >> All the Best,
 >>  >>  >>
 >>  >>  >>   Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
 >>  >>  >>
 >>  >>  >>  > I actually help a lot of people get started with PVM (they
 >>  >>  >>  > write me offline because I have a template PVM tarball up on my
 >>  >>  >>  > personal website) and the more I know, the better I can help
 >>  >>  >>  > them...;-)
 >>  >>  >>
 >>  >>  >>  >    rgb
 >>  >>  >>
 >>  >>  >>  > --
 >>  >>  >>  > Robert G. Brown                            Phone(cell): 1-919-280-8443
 >>  >>  >>  > Duke University Physics Dept, Box 90305
 >>  >>  >>  > Durham, N.C. 27708-0305
 >>  >>  >>  > Web: http://www.phy.duke.edu/~rgb
 >>  >>  >>  > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
 >>  >>  >>  > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
 >>  >>  >>
 >>  >>  >>
 >>  >>  >> (:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:
 >>  >>  >>
 >>  >>  >>   James Arthur "Jeeembo" Kohl, Ph.D.     "Da Blooos Brathas?! They
 >>  >>  >>   Oak Ridge National Laboratory              still owe you money, Fool!"
 >>  >>  >>   [EMAIL PROTECTED]
 >>  >>  >>   http://www.csm.ornl.gov/~kohl/          Long Live Curtis Blues!!!
 >>  >>  >>
 >>  >>  >>
 >>  >>  >> :):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):)
 >>  >>  >>
 >>  >>
 >>  >>
 >>
 >>
 >> _______________________________________________
 >> Beowulf mailing list, Beowulf@beowulf.org
 >> To change your subscription (digest mode or unsubscribe) visit
 >> http://www.beowulf.org/mailman/listinfo/beowulf
 >>


--
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
