On Tuesday 02 May 2006 14:02, Bill Broadley wrote:
> Mark Hahn said:
> > moving it, stripped them out as I didn't need them. (I _do_ always
> > require net-IPMI on anything newly purchased.) I've added more nodes
> > to the cluster
>
> Net-IPMI on all hardware? Why? Running a second (or 3rd) network isn't
> a trivial amount of additional complexity, cables, or cost.
On Fri, Apr 28, 2006 at 09:27:19AM +0200, Jaime Perea wrote:
> From my point of view, the big problem there is the I/O. We installed
> PVFS2 on our small cluster and it works well, using Myrinet GM as the
> message-passing mechanism. PVFS2 is only a solution for parallel I/O,
> since MPI can
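To make the parallel-I/O point concrete, here is a minimal sketch of a
collective MPI-IO write, the kind of traffic PVFS2 is built to serve. It
assumes mpi4py and NumPy are installed; /mnt/pvfs2/output.dat is a
hypothetical path on a PVFS2 mount. Run it under mpiexec.

    # Every rank writes its own contiguous block of one shared file.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    data = np.full(1024, rank, dtype=np.int32)
    fh = MPI.File.Open(comm, "/mnt/pvfs2/output.dat",  # hypothetical mount
                       MPI.MODE_CREATE | MPI.MODE_WRONLY)
    fh.Write_at_all(rank * data.nbytes, data)  # collective write, rank offset
    fh.Close()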
On Tuesday 02 May 2006 17:49, mg wrote:
> I use MPICH-1.2.5.2 to generate and run an FEM parallel application.
>
> During a parallel run, one process can crash, leaving the other
> processes running, and OS commands have to be used to kill these
> zombies. So, does someone have a solution to avoid zombies after a
> failed parallel run
I think IPMI sounds pretty worthwhile, although I don't have any first
hand experience with it yet. The ability to reboot a hung system or
get decent statistics about this, that, and the other thing seems worth
the cost in many cases, and my management has decided to require it on
all of our new
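On the statistics side, here is a minimal sketch of pulling a node's
sensor readings over the network with ipmitool, assuming ipmitool is
installed and the BMC is reachable; the hostname and credentials below
are placeholders, not a real setup.

    # Query a BMC over the LAN and parse ipmitool's "|"-separated table.
    import subprocess

    def read_sensors(host, user="admin", password="changeme"):
        out = subprocess.run(
            ["ipmitool", "-I", "lanplus", "-H", host,
             "-U", user, "-P", password, "sensor"],
            capture_output=True, text=True, check=True).stdout
        readings = {}
        for line in out.splitlines():
            fields = [f.strip() for f in line.split("|")]
            if len(fields) >= 3 and fields[1] not in ("", "na"):
                readings[fields[0]] = (fields[1], fields[2])  # value, unit
        return readings

    # node01-ipmi.example.org is a hypothetical BMC hostname.
    for name, (value, unit) in read_sensors("node01-ipmi.example.org").items():
        print(name, value, unit)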
At 01:17 PM 5/2/2006, Bill Broadley wrote:
> blower, rather than a bunch of 40mm axial/muffin fans. A much larger
> cluster I'm working on now (768 nodes) has 14 40mm muffin fans in each
> node! While I know I can rely on the vendor (HP) to replace failures
> promptly and without complaint,
Vincent,
So, I just got back from vacation today and found this post in my huge
mailbox. Reason would tell me not to waste time and to ignore it, but I
can't resist such a treat.
Diepeveen wrote:
With so many nodes I'd go for either InfiniBand or Quadrics, assuming
the largest partition also gets
mg <[EMAIL PROTECTED]> wrote:
> I use MPICH-1.2.5.2 to generate and run an FEM parallel application.
>
> During a parallel run, one process can crash, leaving the other
> processes running, and OS commands have to be used to kill these
> zombies. So, does someone have a solution to avoid zombies after a
> failed parallel run
> > moving it, stripped them out as I didn't need them. (I _do_ always require
> > net-IPMI on anything newly purchased.) I've added more nodes to the cluster
>
> Net-IPMI on all hardware? Why? Running a second (or 3rd) network isn't
> a trivial amount of additional complexity, cables, or cost.
IPMI nowadays comes for free on the mainboard, and if you don't want to
run a separate infrastructure for the lightweight control traffic, then
you don't even need to add ports/cables/switches. In the case of a Scyld
Beowulf cluster the compute nodes are on their own private network
switch anyway, so
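As an illustration of driving that control traffic over the existing
network, a minimal sketch of remote power control through a node's BMC,
assuming ipmitool and a configured BMC; hostname and credentials are
placeholders.

    # Remote power control of a hung node via its BMC ("net-IPMI").
    import subprocess

    def ipmi_power(host, action, user="admin", password="changeme"):
        # action is one of: "status", "on", "off", "cycle", "reset"
        subprocess.run(
            ["ipmitool", "-I", "lanplus", "-H", host,
             "-U", user, "-P", password, "chassis", "power", action],
            check=True)

    ipmi_power("node01-ipmi.example.org", "status")   # prints power state
    # ipmi_power("node01-ipmi.example.org", "cycle")  # hard power-cycle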
> > in the cluster above, I chose a chassis (AIC) which has a large
> > centrifugal
>
> Which? I noticed some of their designs redirect all heat from the power
> supply into the side of the rack.
>
> > blower, rather than a bunch of 40mm axial/muffin fans. A much larger
> > cluster I'm working on now (768 nodes)
Mark Hahn said:
> moving it, stripped them out as I didn't need them. (I _do_ always require
> net-IPMI on anything newly purchased.) I've added more nodes to the cluster
Net-IPMI on all hardware? Why? Running a second (or 3rd) network isn't
a trivial amount of additional complexity, cables, or cost.
> in the cluster above, I chose a chassis (AIC) which has a large centrifugal
Which? I noticed some of their designs redirect all heat from the power
supply into the side of the rack.
> blower, rather than a bunch of 40mm axial/muffin fans. A much larger cluster
> I'm working on now (768 nodes)
I don't have a solution for your case, but here's an idea: MPICH-GM (MPICH
for the Myrinet GM protocol) has an option to mpirun.ch_gm that would do
what you want, if you were running Myrinet/GM:
--gm-kill <n>    Kill all processes <n> seconds after the first exits.
Other than that, a resource manager
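For those not on GM, the same behaviour is easy to approximate in a
launcher of your own: start the ranks, and once the first one exits,
give the rest a grace period before killing them. A minimal sketch; the
./fem_solver command line is a placeholder for the real job.

    # Emulate --gm-kill <n>: once the first process exits, kill the rest
    # after GRACE seconds.
    import subprocess, time

    GRACE = 10  # seconds, analogous to the <n> in --gm-kill <n>
    procs = [subprocess.Popen(["./fem_solver", "--rank", str(r)])
             for r in range(4)]

    while all(p.poll() is None for p in procs):
        time.sleep(1)                  # block until the first exit

    deadline = time.time() + GRACE
    while time.time() < deadline and any(p.poll() is None for p in procs):
        time.sleep(1)                  # grace period for the others

    for p in procs:
        if p.poll() is None:
            p.kill()                   # sweep up the stragglers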
You are invited to participate in our first webcast discussing Crosswalk
Inc.'s innovative approach to storage for grid & HPC environments.
Specifically, iGrid, an Intelligent Storage Grid System, which provides a
scalable architectural fabric enabling any application server to reach any
On Mon, 2006-05-01 at 17:04 -0400, Mark Hahn wrote:
> (also, I agree that the clumsiest level is shoulder-ish high...)
In which case, one might use a small platform to elevate oneself and
one's shoulders.
Hi all,
I use MPICH-1.2.5.2 to generate and run an FEM parallel application.
During a parallel run, one process can crash, leaving the other
processes running, and OS commands have to be used to kill these
zombies. So, does someone have a solution to avoid zombies after a
failed parallel run: can
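One common workaround, independent of the MPI implementation: launch the
job in its own process group and, if it exits non-zero, kill the whole
group rather than hunting orphans by hand. A minimal sketch for the
launching node (remote ranks need the same trick in a per-node wrapper);
the mpirun command line is a placeholder.

    # Run the job in its own process group; on failure, kill the group.
    # This sweeps stragglers on the launching node only.
    import os, signal, subprocess

    job = subprocess.Popen(["mpirun", "-np", "8", "./fem_solver"],
                           preexec_fn=os.setsid)  # new session/group
    pgid = os.getpgid(job.pid)

    if job.wait() != 0:
        try:
            os.killpg(pgid, signal.SIGKILL)   # reap everything left
        except ProcessLookupError:
            pass                              # group already gone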