On Fri, 31 Jul 2009, Marian Marinov wrote:

> Hello list,
> do you know if this project is still alive ? Or replaced/renamed ?
>
> http://bproc.sourceforge.net/

That's actually a long-dead branch of BProc. Even when it was current, it had significant flaws, frequently changed interfaces, and never worked reliably with x86_64 clusters.

It was started by a former employee who made a complete copy of our internal development servers to his home machine in the hours before he quit without notice, and then used the unpublished development tree and build system to compete against us. He used the justification that we released almost all of our production code under Open Source licenses soon after releasing our commercial product. While this was a self-serving rationalization on his part, it was the major reason that we stopped doing open publication of our source code as soon as the commercial version was released.

Alas, that means you won't be able to download the current BProc from a public web page. Nor many of the other innovative tools that we developed and used to publish. Now most of our published contributions are part of other projects.

Of course we continue to use and improve BProc. Or more accurately a BProc kernel interface, as the code has been re-written several times to match newer kernels and add features. Over time we have made it more scalable, and have added features such as multiple and fail-over masters. Our customers still have access to the current source code, but over several web site re-writes even the old web pages for BProc (as well as our other innovative subsystems) have been moved and become unreachable.

I really, really wish the situation were different. The people that have worked on BProc in the nine years since have done a great job in keeping it working in the face of kernel re-writes, using new kernel facilities to simplify its code, and making it reliable with large-scale installations. All while keeping the interface the same so that the much larger infrastructure around it would continue to work.
They have done the hard work, even while much more attention and money was spent on LANL's failed-and-abandoned attempt to build clusters around the stolen source code.

To end on a positive and technical note, while BProc was a cornerstone in our efficient, single-system-image cluster system, it's not the only way to do things. You can get many of its benefits without re-implementing it.

BProc is based around directed process migration -- a more efficient technique than the common transparent process migration. You can do many cool things with process migration, but with experience we found that the costly parts weren't really the valuable ones.

What you really want is the guarantee that running a program *over there* returns the expected results -- the same results as running it *here*. That means more than knowing the command line. You want the same executable, linked with exactly the same library versions in the same order, with the same environment and parameters.

You can get that consistency without implementing transparent migration. And if you are willing to give up single-process-space monitoring and control, you can get it without doing migration at all, and thus without depending on kernel features. You just need to send the right information when you start a remote job.

That means finding the current executable on the host system, looking at the link information (essentially running 'ldd' but occasionally doing a partial link) to find the initial libraries, and making sure that those exact versions are installed or cached on the remote system. When you start up the process on the remote system, using the copied environment and command line, you get most of the consistency that BProc offers.

People often give "BProc" the credit for light-weight, quick-booting nodes. In reality BProc has little to do with that -- its role is only process creation, monitoring and control.
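
To make the remote-start recipe above concrete (find the executable, read its link information, pin the exact library versions, copy the environment and command line), here is a minimal Python sketch. It is not how BProc or the Scyld tools are implemented. The function names and the manifest layout are hypothetical, SHA-256 digests stand in for "exact version", partial linking is not covered, and it assumes a Linux host with 'ldd' on the PATH.

```python
import hashlib
import os
import shutil
import subprocess


def parse_ldd_output(text):
    """Return {library name: resolved path} from ldd-style output, in load order.

    Entries with no on-disk path (the vDSO, 'not found') are skipped.
    """
    libs = {}
    for line in text.splitlines():
        fields = line.split()
        if "=>" in fields:
            i = fields.index("=>")
            target = fields[i + 1] if len(fields) > i + 1 else ""
            if target.startswith("/"):
                libs[fields[0]] = target
        elif fields and fields[0].startswith("/"):
            # The dynamic loader itself is listed as a bare absolute path.
            libs[os.path.basename(fields[0])] = fields[0]
    return libs


def file_digest(path):
    """SHA-256 of a file; an assumed stand-in for 'exactly the same version'."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(argv, env=None):
    """Describe what a remote node would need to reproduce this run (hypothetical layout)."""
    exe = shutil.which(argv[0])
    if exe is None:
        raise FileNotFoundError(argv[0])
    exe = os.path.realpath(exe)
    # ldd reports the libraries the dynamic linker would load for this binary.
    # Only run it on binaries you trust; raises FileNotFoundError if ldd is absent.
    out = subprocess.run(["ldd", exe], capture_output=True, text=True).stdout
    libs = parse_ldd_output(out)
    return {
        "argv": list(argv),
        "env": dict(os.environ if env is None else env),
        "executable": {"path": exe, "sha256": file_digest(exe)},
        "libraries": {
            name: {"path": path, "sha256": file_digest(path)}
            for name, path in libs.items()
        },
    }
```

A launcher on the remote side would compare each digest against what it already has, fetch anything missing or stale from the master, and only then start the program with the recorded command line and environment.
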
The real innovation was the ability to dynamically cache, and update when needed, just the elements needed to run a process. (You also need services such as BeoNSS and access to a reference master... the devil is in the details.) That lets you start with almost nothing and incrementally build an environment to support the programs that are to be run.

As you can extrapolate from above, "a cornerstone" doesn't mean "the only way to do it". There is much more I could write about benefits, trade-offs, and implementation details. Is there a specific area that you wanted to know about?

-- 
Donald Becker                bec...@scyld.com
Penguin Computing / Scyld Software
www.penguincomputing.com     www.scyld.com
Annapolis MD and San Francisco CA