[[ Hmmmm, OK, I seem to have moderation-approved pretty much a repeat of a widespread posting. So I'll answer with the response I was planning a few days ago. ]]
On Tue, 4 Dec 2007, Ehsan Mousavi wrote:
> C-Sharifi Cluster Engine: The Second Success Story on "Kernel-Level
> Paradigm" for Distributed Computing Support
>
> Contrary to two school of thoughts in providing system software support
> for (like MPI, Kerrighed and Mosix), Dr. Mohsen Sharifi hypothesized
> another school of thought as his thesis in 1986 that believes all
> distributed systems software requirements and supports can be and must
> be built at the Kernel Level of existing operating systems;

In 1986 I had been working for a few years on shared memory systems with
a hefty proportion of custom-designed hardware. I learned from that
experience. That's why I now work on distributed memory systems based on
off-the-shelf commodity hardware.

I also think that there are some important aspects of cluster
infrastructure that (at present) can only be implemented by tweaking the
kernel. But most of the features that make a cluster easy to use don't
need special kernel support, and indeed can't be implemented inside the
kernel at all.

You might initially think "you can put any program inside the kernel,
therefore you can do everything inside the kernel". But as a
counter-example, consider name services. Essentially all programs use
the standard library interface to name services, which in turn uses the
Name Service Switch. You can add a bunch of really powerful features by
using a cluster-specific name service. And this can only be done by
working with the existing user-level library code. (Well, unless you
build a new library within your kernel.) A sketch of how such a service
plugs in is in the P.S. below.

This argument almost misses the main point: cluster systems exist to
simplify the system for the end users. When you think in terms of kernel
modifications, most of the changes end up being tricks to prove to other
developers how clever you are, not features that make the system easier
to use (example: Plan 9). And most of the clever tricks end up getting
in the way of the developer, rather than speeding up the application or
really simplifying the programming model.

DSM / Distributed Shared Memory (which I prefer to call NVM, Network
Virtual Memory) is a perfect example of this. It certainly doesn't help
the end user. The only aspect an end user or system administrator sees
is that NVM causes cascading system failures when one machine drops out
of the cluster.

The programmer doesn't benefit either. They initially think that NVM
gives them an easy-to-use shared memory model. They quickly find that it
only appears to be normal memory. To get even barely acceptable
performance they have to treat the shared memory very differently than
regular memory. Variables written by different processes have to be
segregated into different pages. Writes have to be grouped. You have to
think about when to manually cache structures to avoid a re-read that
might trigger a network page fault, but refresh that structure when you
need potentially updated values (the P.P.S. below sketches what that
looks like). Many independent attempts have concluded that most
application ports take a long time to tune for NVM, and almost all end
up using NVM as a stylized message passing mechanism.

-- 
Donald Becker                                [EMAIL PROTECTED]
Penguin Computing / Scyld Software
www.penguincomputing.com                     www.scyld.com
Annapolis MD and San Francisco CA
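P.S. To make the name service point concrete, here is a minimal sketch
of why the user-level path matters. The "cluster" entry in nsswitch.conf
and the node name "node0" are hypothetical; the lookup itself is the
ordinary getaddrinfo() call that essentially every program already
makes, so a cluster-aware NSS backend would benefit all of them without
a single kernel change:

    /* Hypothetical /etc/nsswitch.conf entry, consulted by the C library:
     *     hosts: files cluster dns
     * A "cluster" NSS module would map logical node names to addresses. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <netdb.h>

    int main(int argc, char **argv)
    {
        const char *name = (argc > 1) ? argv[1] : "node0"; /* hypothetical node name */
        struct addrinfo hints, *res, *rp;
        char buf[INET6_ADDRSTRLEN];

        memset(&hints, 0, sizeof(hints));
        hints.ai_family   = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;

        /* getaddrinfo() resolves through the Name Service Switch; the
         * caller neither knows nor cares which backend answered. */
        int err = getaddrinfo(name, NULL, &hints, &res);
        if (err != 0) {
            fprintf(stderr, "%s: %s\n", name, gai_strerror(err));
            return EXIT_FAILURE;
        }
        for (rp = res; rp != NULL; rp = rp->ai_next) {
            const void *addr = (rp->ai_family == AF_INET)
                ? (const void *)&((struct sockaddr_in *)rp->ai_addr)->sin_addr
                : (const void *)&((struct sockaddr_in6 *)rp->ai_addr)->sin6_addr;
            printf("%s -> %s\n", name, inet_ntop(rp->ai_family, addr, buf, sizeof buf));
        }
        freeaddrinfo(res);
        return EXIT_SUCCESS;
    }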
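P.P.S. And a minimal sketch of the NVM tuning described above, assuming
4 KiB pages and a page-aligned shared region; no particular DSM system
or API is implied, and the names are made up for illustration. The point
is purely the page segregation and the manual caching:

    #include <stdint.h>

    #define PAGE_SIZE 4096
    #define NPROCS    64

    /* Naive layout: counters from all processes share pages, so every
     * write by one process invalidates a page cached on other nodes. */
    struct naive_counters {
        uint64_t count[NPROCS];
    };

    /* NVM-friendly layout: each writer gets a page of its own (assuming
     * the shared region itself is page-aligned), so writes from
     * different processes never land in the same page. */
    struct padded_counter {
        uint64_t count;
        char     pad[PAGE_SIZE - sizeof(uint64_t)];
    };

    struct segregated_counters {
        struct padded_counter c[NPROCS];
    };

    /* Manual caching: copy remote values into private memory once per
     * pass (at most one network page fault per page), compute against
     * the copy, and re-read only when fresh values are actually needed. */
    uint64_t sum_counters(const struct segregated_counters *shared)
    {
        uint64_t local[NPROCS];                /* private cache */
        for (int i = 0; i < NPROCS; i++)
            local[i] = shared->c[i].count;     /* one touch per page */

        uint64_t total = 0;
        for (int i = 0; i < NPROCS; i++)
            total += local[i];
        return total;
    }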