I put these machines into production in Aug '08. Within a month we had
the first machine go bad. They hang with an amber LED, and the
logging module clearly logs an error of the sort: "Voltage sensor
(VCORE) critical error. State asserted CPU2". The machine needs a
physical power-cycle from the back-plane to restart.
what's the term of the warranty?
well, I think it's worth asking whether you're sure your power feed
is in good shape.
Do others face similar vendor issues? If 6 out of 23 machines go bad
within 8 months of an order can I expect the vendor to exchange the
rest too?
IMO, no. not without some indication that the fault is reproducible
and is actually theirs...
And a single bad machine causes larger problems, since it usually
disrupts jobs that span a bunch of nodes too.
well, if you bought it as a cluster, not just some nodes,
then you might have a case that the cluster is not working.
the problem with replicability is that it permits fingerpointing.
Just wanting to hear more about how I can best resolve this issue. For
our future purchases, would changing vendors help?
buying an extended warranty might help. buying a shrink-wrapped cluster
might help too.
Is there any trend behind the quality of service from different
vendors? I have only been exposed to Dell and its frustrating customer
service so far; are HP, IBM, or any others better, worse, or
uncorrelated? Of course, I
my organization has been an HP shop, more or less, since inception in 2001,
for reasons I won't go into. I believe they've done well by us - I could
criticize prices, some hardware design issues, etc, but they're quite
responsible and responsive to problems.
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing