On Fri, 8 Feb 2008, [EMAIL PROTECTED] wrote:
I'm already getting lots of practice explaining how to get this stuff to work for 3 separate PVM users... :)
Just record the results as you do so for reworking into the HOWTO...;-)
> At some point at your convenience in the future when you have all kinds of time to metaphorically sit down and REALLY work over PVM...
Ahhh... Lemme picture that moment... :-D
:-)
> I have about 800 specific suggestions for bringing it up to current and modern and everything. Just a wee list. You know:
> * Purge aimk for all time, die die die
Ha ha ha... You don't like "aimk"...? :-) Yeah, PVM was originally pre-autoconf... Too bad, eh...? :)
Who, me? I love aimk. Well, I loved aimk. Back in oh, 1994 or thereabouts. I was at that time managing a unix network with creeping inhomogeneity and a mix of SysV and BSD related Unices. I took aimk, cut out its system-identifying heart, and incorporated it into the most complex set of .files for my shell that you ever saw, that would automagically look for things on any system I happened to have a copy of my home directory copied to when I logged in, determine the architecture, set all paths, and alias the hell out of everything so that Unix worked -- for me -- pretty much the same on AIX, NeXT, SGI, Sun, Ultrix...
However, the world has changed. It has shrunk to pretty much Linux, FreeBSD, and Solaris as surviving Unices, Windoze, and MacOSX as a newbie Unix. All the Unices share a large fraction of their code base at this point. People have invented POSIX and (gasp) standards. And frankly, the end result of all of this is that most of the complexity in aimk is totally, totally obsolete and merely obstructs the ease of working on or building the application.
One of SGE's worst features is that it is built on top of #!*@ aimk. PVM has an excuse (it preceded the GBT). What is SGE's? aimk die. Standard Unix build (patched/ifdef'd as needed for WinXX, the only remaining real maverick) live.
> * Actually use the FSH so e.g. apropos pvm works.
I'm assuming you don't mean FSH="Follicle Stimulating Hormone"; did you mean "SSH", or am I clueless...?
Sigh. Sorry, I'm the brainless one, transposed the H and S:
http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
There are basically a couple of ways one can think about installing pvm in an FHS-compliant way. One is just like it is now, but in /opt, as that is what it is for. The other (better) solution is to install the binaries, includes, libraries and man pages natively on their correct paths, which would be /usr/bin, /usr/include, /usr/lib, and /usr/share/man respectively IIRC. Documentation in /usr/share/doc/pvmwhatever. Any shared DATA (e.g. shared configuration data) into either /etc/pvm* if it is likely to vary per system or /usr/share/pvm if it is really crossmountable in an architecture independent way. Putting architecture specific libraries and binaries in /usr/share is just wrong.
The GNU Build Tools will manage all of this mostly for you if you make the effort to port the build process over to them (which I freely admit is likely to be a daunting chore and might require rethinking aspects of PVM itself and how to run it in a heterogeneous environment).
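To make the layout described above a bit more concrete, here is a minimal Python sketch that maps each kind of file to the FHS directory named in the text. The function name, the category labels, the `pvm3` package name and the per-section `man1`-style subdirectory are assumptions made for illustration; nothing here is part of PVM's actual build.

```python
from pathlib import PurePosixPath

# Destination directories follow the discussion above. The category names
# on the left are hypothetical labels invented for this sketch.
FHS_DESTINATIONS = {
    "binary": "/usr/bin",
    "header": "/usr/include",
    "library": "/usr/lib",
    "manpage": "/usr/share/man",
    "doc": "/usr/share/doc/{package}",
    "host_config": "/etc/{package}",        # varies per system
    "shared_data": "/usr/share/{package}",  # architecture independent
}


def fhs_destination(kind: str, filename: str, package: str = "pvm3") -> PurePosixPath:
    """Return the install path for one file of the given kind."""
    try:
        template = FHS_DESTINATIONS[kind]
    except KeyError:
        raise ValueError(f"unknown file kind: {kind!r}") from None
    directory = PurePosixPath(template.format(package=package))
    if kind == "manpage":
        # Assumes a name such as "pvm.1"; the suffix selects the section
        # directory (man1, man3, ...).
        section = filename.rsplit(".", 1)[-1]
        directory = directory / f"man{section}"
    return directory / filename


if __name__ == "__main__":
    for kind, name in [("binary", "pvmd3"), ("manpage", "pvm.1"), ("doc", "PVM_HOWTO")]:
        print(kind, "->", fhs_destination(kind, name))
```

In a real port the GNU Build Tools would supply these directories through their standard installation variables, so a table like this only documents the intent.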
> * Document the hell out of everything
Yes! :D
<smile>
> * Rewrite the network back end in a way that openly encourages high end network vendors to contribute reusable non-IP native drivers
Ha ha ha... Tried to cater to vendors many times. See all those funny arch subdirs in pvm3/src...? Yeah, been there, done that... (Though I agree that building on top of some generic "standardized" networking layer would be "nice" - there are so many to choose from... :)
Right, but note that MPI (at least some MPIs) support Real Cluster Networks. PVM supports them only using IP. That statement means that perhaps 1/4 to 1/3 of all possible parallel programmers reject PVM right out of the box. Another 1/3 are gone because embarrassingly parallel programs usually don't quite justify even the complexity of PVM (even though it is "perfect" for master-slave and EP projects in a lot of ways). The 1/3 that are left MIGHT use PVM, but if they think that their code EVER might run on a high-end cluster, or if they want to pad their resume with experience that might one day let them code something new on a high-end cluster, they have to think hard about the PVM vs MPI choice. The result is that I'm guessing that there are 5 MPI programmers for every 1 PVM programmer (and a lot of the latter, like me, learned PVM back in the early 90's when it was the only game in town). So one possible solution is to build PVM in such a way that it can share drivers with e.g. MPICH right out of the box, so to speak. But this, of course, also means altering other things, and agreeing on something of an ABI for each hardware network device you have available. On the good side, it would let you add a native ethernet channel that didn't use (even) UDP/IP, good only on completely non-routed flat switched networks.
> * Add a (possibly macro-driven) middle layer that makes PVM into MPI as well -- one set of actual message-passing functions, two conformally mapped call interfaces.
You mean like "PVMPI"...? http://www.netlib.org/utk/papers/pvmpi/paper.html
Or its offspring "MPI-Glue"...? http://www.scientific-computing.de/people/rabenseifner/projects/mpi_glue.html
Or do you mean something completely different...? :)
No, like that, sure -- or the even older papers on the PVM website (unless these are they). But actually done and distributed in PVM, so one doesn't actually need PVM -- and -- LAM on a system, especially given that LAM is a lot like PVM except where it's not. Possibly not as good, I'd even say, but I'm not enough of an MPI user to be able to fairly judge.
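As a rough illustration of "one set of actual message-passing functions, two call interfaces", the Python sketch below puts two thin facades over a single in-memory transport. Every class and method name is hypothetical, and the facades deliberately do not reproduce the real PVM or MPI signatures; they only mimic the general shape (pack-then-send to a task id versus a single send carrying buffer, destination and tag).

```python
import queue
from collections import defaultdict


class Transport:
    """The single message-passing core shared by both facades."""

    def __init__(self) -> None:
        # One mailbox per (destination, tag) pair.
        self._mailboxes = defaultdict(queue.Queue)

    def send(self, dest: int, tag: int, payload: bytes) -> None:
        self._mailboxes[(dest, tag)].put(payload)

    def recv(self, me: int, tag: int, timeout: float = 1.0) -> bytes:
        # Raises queue.Empty if nothing arrives within the timeout.
        return self._mailboxes[(me, tag)].get(timeout=timeout)


class PvmStyle:
    """PVM-flavoured facade: fill a send buffer, then send it to a task id."""

    def __init__(self, transport: Transport, my_tid: int) -> None:
        self._transport = transport
        self._tid = my_tid
        self._buffer = bytearray()

    def initsend(self) -> None:
        self._buffer = bytearray()

    def pack(self, data: bytes) -> None:
        self._buffer += data

    def send(self, tid: int, msgtag: int) -> None:
        self._transport.send(tid, msgtag, bytes(self._buffer))

    def recv(self, msgtag: int) -> bytes:
        return self._transport.recv(self._tid, msgtag)


class MpiStyle:
    """MPI-flavoured facade: one call carries buffer, destination and tag."""

    def __init__(self, transport: Transport, my_rank: int) -> None:
        self._transport = transport
        self._rank = my_rank

    def send(self, buf: bytes, dest: int, tag: int) -> None:
        self._transport.send(dest, tag, buf)

    def recv(self, tag: int) -> bytes:
        return self._transport.recv(self._rank, tag)
```

Because both facades share the same `Transport`, a message packed through the PVM-style calls can be received through the MPI-style ones, which is the point of distributing the two interfaces together rather than installing two separate systems.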
> * Make Ctrl-C work so one can break out of the annoying timeout on add hosts when things don't work.
Yeah, bummer eh? :) Where did Bob Manchek go to anyway...? (He's the real culprit behind the majority of PVM code, btw, I merely "inherited" the maintenance job... :)
I know how that goes. And it is always a tradeoff, too. For just ME, it only wastes time in three or four minute chunkies, every now and then. It would take days, weeks, to recover the time required to fix it. But then you multiply "me" by an actual user base, and you come to realize that stuff like this costs a huge amount of distributed productivity and it's insane not to fix it. Except that (naturally) you aren't getting PAID to fix it so it's hours of YOUR time for minutes of benefit to save person-weeks of everybody ELSE'S time. Still, it is harmless to suggest it so that you MIGHT add it to that eternally optimistic opportunity cost labor queue against the day you finish a three month project you're being paid for in three days and need to pretend to be busy for 87 days...;-)
> * Make the console capable of cleaning up after a crash or interruption.
We talked about things we could do there, e.g. to clean up old leftover /tmp/pvmd.* files, etc, but it was always easier to just remove the files by hand...! ;)
Well, or not. It depends on how often you have to do it. Same computation as above -- for any single person yeah, the hassle of coding a robust solution isn't worth it, but distribute that hassle over a user base of even a hundred people and suddenly it is a lot of aggregate time, especially for novice users and support. Remember, NO NOVICE USER is going to understand that the reason that PVM isn't working is because they somehow exited or killed or rebooted the master host/process and left behind tag zombie pvmd's (or worse, just the lockfiles) on all the nodes. I at one time wrote scripts I could run to clean up just because if there are more than a very few nodes, this can get really painful! If the nodes are widely distributed on an enterprise LAN (one thing PVM is very good for) doubly so.
So again, you lose some fraction of the novices because they get frustrated and (correctly) view such behavior as "broken", and you at least annoy even the tried and true PVM programmer because nobody LIKES having to go kill a whole bunch of processes and remove all those lockfiles by hand, only to learn that they missed one. It isn't fun work, and it could be automated SO easily.
If I were going to write the PVM console over myself from scratch, I would actually parallelize it to really facilitate stateful control. By that I mean I would separate out the interpreter loop as an absolutely trivial, impossible to block object, and fork off one or more slave tasks to do the actual things you are trying to do, OR I'd make all tasks rigorously interruptible with minimal loss of state information (or really, both). That way you can always get to the console, and if you can get to the console you can always execute a reset for whatever VM you've defined. Right now the only way to SIMULATE this behavior is by breaking out of a hang to the originating shell with Ctrl-Z and then performing all sorts of violence by hand without access even to the list of currently configured hosts.
Ug-ly...
I'd probably also leave systems in the VM (and conf display) even if they actually failed to add or added and then died, and just mark them down. Add a command to restart the downed ones (or even a way of polling and doing it automatically, along with suitable signals returnable to a master process). There are a zillion things one could do with such a console and signalling system. Gather statistics from real-time console calls (e.g. total number of messages, total number of bytes sent, per communications pair). Reset an entire cluster. Take over a running cluster and computation from a different master so that one can reboot the master safely. "Stop" the computation and migrate a node task ditto. If the console were really NICELY written, with most of the console functions actually tied up in a library, you'd make it (relatively) trivial to write gpvm, the ultimate gnome PVM console.
The console is one of the nicest things about PVM, and it and the ancient but still lovely xpvm sort-of-GUI are one thing that keeps it alive as a teaching tool if nothing else. It is just fabulous to be able to watch a PVM computation develop as lots of little lines and icons and so on. But it could be a lot better, especially more robust and easier on novices. And with network support that could once again compete with MPI on the high end, I think it would experience a bit of a resurgence because it IS a good match for many kinds of tasks.
rgb
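The console design described above (a command layer that never blocks, interruptible worker tasks, and hosts that stay listed but marked down when they fail) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `Console` class, its method names, and the `connect` hook that stands in for actually starting a remote daemon are all hypothetical and are not PVM calls.

```python
import threading
from typing import Callable, Dict, Tuple

# connect(host, cancel) is a hypothetical hook standing in for whatever
# really starts a remote daemon. It returns True on success and should
# poll `cancel` so that it can give up early when interrupted.
Connect = Callable[[str, threading.Event], bool]


class Console:
    def __init__(self, connect: Connect) -> None:
        self._connect = connect
        self._lock = threading.Lock()
        self._hosts: Dict[str, str] = {}  # host -> "adding" | "up" | "down"
        self._jobs: Dict[str, Tuple[threading.Thread, threading.Event]] = {}

    def add(self, host: str) -> None:
        """Start adding a host in the background and return immediately."""
        cancel = threading.Event()
        worker = threading.Thread(
            target=self._add_worker, args=(host, cancel), daemon=True
        )
        with self._lock:
            self._hosts[host] = "adding"
            self._jobs[host] = (worker, cancel)
            worker.start()

    def _add_worker(self, host: str, cancel: threading.Event) -> None:
        try:
            ok = self._connect(host, cancel)
        except Exception:
            ok = False
        with self._lock:
            # A failed or interrupted host stays in the table, marked down.
            self._hosts[host] = "up" if ok and not cancel.is_set() else "down"
            self._jobs.pop(host, None)

    def interrupt(self) -> None:
        """What Ctrl-C would trigger: ask every pending add to stop."""
        with self._lock:
            jobs = list(self._jobs.values())
        for _, cancel in jobs:
            cancel.set()

    def wait(self, timeout: float = 5.0) -> None:
        """Wait for pending adds to finish (handy in scripts and tests)."""
        with self._lock:
            jobs = list(self._jobs.values())
        for worker, _ in jobs:
            worker.join(timeout)

    def restart_down(self) -> None:
        """Retry every host currently marked down."""
        with self._lock:
            down = [h for h, state in self._hosts.items() if state == "down"]
        for host in down:
            self.add(host)

    def conf(self) -> Dict[str, str]:
        """Return a snapshot of the host table, including down hosts."""
        with self._lock:
            return dict(self._hosts)
```

Since `add` returns at once, the command loop can always accept the next command, including an interrupt or a reset. A real console would presumably call `interrupt()` from its SIGINT handling and keep these functions in a library so that a GUI front end could share them.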
Good suggestions, though. I'll add them to my "to do" list, along with any others that may come up...? :-)
Thanks, Man!
Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
> that kind of thing...;-)
> rgb
>> Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
>> On Fri, Feb 08, 2008 at 05:35:31AM -0500, Robert G. Brown wrote:
>> > On Thu, 7 Feb 2008, [EMAIL PROTECTED] wrote:
>> >> I admit this may be an antiquated cynical mentality, and I further concur that PVMNETSOCKPORT is an obvious omission in the basic documentation/faq...
>> > As they say, you can't RTFM if there ain't no FM... (or if the solution exists but isn't there).
>> > One is reminded of Dr. Strangelove, where the president (Peter Sellers) has just learned that if the maverick B52 piloted by Slim Pickens gets through, a doomsday device that is supposed to deter first nuclear strikes will go off that will destroy the world. Unfortunately, the Soviet Union didn't actually tell us that it was built. Dr. Strangelove (Peter Sellers), after musing for a moment on the brilliance of the concept, turns and says in an increasingly shrill voice:
>> > But...the whole point of the Doomsday Machine...is lost...if you keep it a SECRET. Why didn't you tell the world, eh?
>> > Hmmm...;-)
>> > rgb
>> >> Thanks for your suggested text! (And the suggestion to enhance our coverage of rsh/ssh usage... :-)
>> > Ya, well. Just now finished telling the umptieth would-be PVM user how to go about it in an email message, augmenting further online docs such as this one:
>> > http://www.uow.edu.au/~suresh/web/cfamily/pvm.html
>> > which is actually pretty decent, although I generally use the ssh default dsa instead of rsa since on linux boxes it invariably works. But better than forcing each user to employ google to snarf out solutions to each problem they encounter, how much better to write a really nice "Getting Started with PVM" or perhaps better still, a "PVM HOWTO" on tldp.org.
>> > Publish there, and be sure to include a copy in plain sight in /usr/share/pvm3/PVM_HOWTO.
>> > Truthfully, good documentation, especially a walkthrough tutorial on getting started (including sample code or links to sample code) that takes a would-be user from "yum install pvm\*" to executing a Real Parallel Program (however trivial) on a two node cluster would really encourage the use of the library. Adding a bit more (such as a PVM program development template) would be only icing on the cake, so to speak.
>> > If I had the time I'd write it myself. I've already got a project_pvm program template up on the web, but it is sadly underdocumented through the setup of PVM itself.
>> > rgb
>> >> All the Best,
>> >> Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
>> >> On Thu, Feb 07, 2008 at 04:42:21PM -0500, Robert G. Brown wrote:
>> >> >> > It would really, really help if man pvm (or man pvmd or man pvm_intro) documented a suitable firewall setting that will let PVM function without just turning off the firewall altogether. There is no pvm setup in /etc/services, for example, no pvm checkbox in the panels managed by system-config-firewall in the latest Fedoras, no suggestion as to what protected port(s) or ranges one has to enable explicitly. In fact for once even google is failing me -- I'm not finding a lot of documentation or remarks by ANYONE on what ports pvm needs open (besides ssh, which obviously is open and works). Usually as long as the spawning of a network application itself works using an enabled protected port (in this case, I would have expected ssh), the secondary ports opened in unprotected space just work. Am I wrong in this? Do I need to explicitly open more ports somewhere?
>> >> >> Ah Yes. O.K., so I wish it was that simple, but alas PVM can use as many ports as you have machines in your cluster, or could use just 1. :-}
>> >> >> Normally, the master pvmd creates/accepts connections over a small set of ports, possibly 1, but if PvmRouteDirect is enabled in a PVM application, then a myriad of direct-connection socket links are created, to link whichever machines the local PVM application tasks communicate with, on a demand-driven basis...
>> >> >> So it's not generally possible to specify an explicit "range" of ports. However, it _is_ possible to set the "starting" port for this collection, using the aforementioned "$PVMNETSOCKPORT" environment variable.
>> >> > OK, I'm giving this a try. Although I'd have to ask why pvmd doesn't do the fork thing and clone a single open port on which it listens into a dynamically allocated port that inherits from the open one. In principle one only needs a single port to be open to connect to pretty much any network based application, or so I had thought. At least, I do that in xmlsysd and never have to punch more than one porthole through a firewall.
>> >> > Hmmm, it's working sort of -- looks like I need to open UDP ports, right, not TCP? Having trouble on one host where I've punched the hole but didn't >>locally<< set PVMNETSOCKPORT to match, so I'm trying again with the local environment variable set.
>> >> > Yup, that works.
>> >> > So I'm guessing that pvmd reads it as it starts up wherever. Why does it need to do this on a client? Can't the port(s) be passed from the master when it starts up pvmd?
>> >> >> This sets the first port that PVM will try to use, and all subsequent ports will usually be consecutive positive increments of that starting port (i.e. PVMNETSOCKPORT++... :-).
>> >> >> So in most cases, you could probably plan on opening up a 100 or 1000 ports _somewhere_ in your firewall, depending on your needs, and then just tell PVM where to start, using $PVMNETSOCKPORT...
>> >> >> I've always considered this solution a bit of a kludge, which is why it doesn't show up in the man pages, but if it works well enough to save people lots of hassle, then I can add some commentary on it...?
>> >> > Kludge or not, how can you have an environment variable in an application and not provide knowledge of it or instructions on its use in the man page? Something like:
>> >> > PVM requires open ports on target hosts to function. Many hosts are installed with strong firewall rules by default. If you install pvm on a slave and pvm appears to hang when you attempt to add it, eventually timing out without success, consider adding the following to your local personal or system environment (in, for example, ~/.bash_profile on all hosts):
>> >> > PVMNETSOCKPORT=10000
>> >> > export PVMNETSOCKPORT
>> >> > Then configure your firewall(s) to open a range of udp ports starting at this value, such as 10000-11024 (which need not be any larger than the largest number of machines you expect to have in your virtual machine).
>> >> > However a better solution still is to have the daemon fork on a single "permanent" port address > 1024, e.g. 10000, and get a negotiated connection in the upper (non-protected) port space that way.
>> >> >> It may depend on the firewall settings, but a nice "Connection Refused" would usually go a long way toward diagnosing things, whereas the more secure firewall alternative of simply "no response" would only result in a "timed out" PVM message...
>> >> >> I'm open to suggestions on ways to identify or diagnose the problem...!
>> >> > As I said, document EVERYTHING in the man page(s). It is what it is for. Lots of users do, in fact, RTFM but get frustrated and give up when they try something and it just doesn't work and they can't see why.
>> >> > On the same line, a perennial problem with PVM is getting it to work with rsh and ssh. In fact, half the problems I help people with who randomly write me is getting it to work with one or the other. The internal diagnostics are certainly very helpful, at this point, but it would also be worth adding a new man page like pvm_rsh that does nothing but walk users through the ritual of setting PVM_RSH and establishing appropriate e.g. ssh keys.
>> >> > Just a thought or two.
>> >> > rgb
>> >> >> Thanks Much for your interest and feedback!
>> >> >> All the Best,
>> >> >> Jeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeem ;)
>> >> >> > I actually help a lot of people get started with PVM (they write me offline because I have a template PVM tarball up on my personal website) and the more I know, the better I can help them...;-)
>> >> >> > rgb
>> >> >> > --
>> >> >> > Robert G. Brown Phone(cell): 1-919-280-8443
>> >> >> > Duke University Physics Dept, Box 90305
>> >> >> > Durham, N.C. 27708-0305
>> >> >> > Web: http://www.phy.duke.edu/~rgb
>> >> >> > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
>> >> >> > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
>> >> >> (:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:(:
>> >> >> James Arthur "Jeeembo" Kohl, Ph.D.
>> >> >> Oak Ridge National Laboratory
>> >> >> [EMAIL PROTECTED]
>> >> >> http://www.csm.ornl.gov/~kohl/
>> >> >> "Da Blooos Brathas?! They still owe you money, Fool!"
>> >> >> Long Live Curtis Blues!!!
>> >> >> :):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):):)
--
Robert G. Brown Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf