"Vincent Diepeveen" <[EMAIL PROTECTED]> writes:
> I wouldn't rule out that the Linux kernel simply has bugs there. The testing of
> those kernels is totally amateurish.
No, the testing is totally open.
Just because you can't see how a process works doesn't make it better.
Eric
___
On Tue, 5 Dec 2006 at 7:03am, Joshua Baker-LePain wrote
Sure, there are many less than good Tier 1s out there, so caveat
^^
*sigh* That should read 'Tier 2s', of course. That'll teach me to post
before coffee.
beowulfer. But you can some who, IMH
On Mon, 4 Dec 2006 at 7:01pm, Robert G. Brown wrote
This is really the basic difference between tier 1 and tier 2. You can
save short term money with the latter, but have to do things like just
plain throw out hardware -- after sweating over it for a long time,
nagging your tier 2 vendor, getti
were doing the running. Any toplevel AMD exec would do anything to
crank up quality control rather than have to endure the sight of an old
AMD and our system vendor (HP) did a good job replacing >1500 opteron 252's
in my cluster this year. this was the result of a "test escape", which
did ev
On Mon, 4 Dec 2006, Jim Lux wrote:
At 10:59 AM 12/4/2006, Robert G. Brown wrote:
On Mon, 4 Dec 2006, Jim Lux wrote:
Processors are a high dollar item for something quite compact, they're
sort of commodity (at least as far as the end user is concerned), so
they're ripe for all the fiddles tha
At 10:59 AM 12/4/2006, Robert G. Brown wrote:
On Mon, 4 Dec 2006, Jim Lux wrote:
Processors are a high dollar item for something quite compact,
they're sort of commodity (at least as far as the end user is
concerned), so they're ripe for all the fiddles that have been used
on such items for m
On Mon, 4 Dec 2006, Jim Lux wrote:
Processors are a high dollar item for something quite compact, they're sort
of commodity (at least as far as the end user is concerned), so they're ripe
for all the fiddles that have been used on such items for millennia. Hey,
didn't Archimedes get famous for
At 07:15 AM 12/4/2006, Tim Moore wrote:
Update to node drop-off:
The AMD engineer with whom I talked was amazed that such CPUs made
it beyond quality control. He also suggested that the vendor may
have inadvertently mixed returned (previously determined to be
flawed processors) with the new
----- Original Message -----
From: "Tim Moore" <[EMAIL PROTECTED]>
To:
Sent: Monday, December 04, 2006 4:15 PM
Subject: Re: [Beowulf] Node Drop-Off
Update to node drop-off:
I wrote a few weeks ago to ask about node drop-off. A quick note...I
had a cluster run for 3 years without failure and
Tony
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Tim Moore
Sent: Monday, December 04, 2006 10:16 AM
To: beowulf@beowulf.org
Subject: Re: [Beowulf] Node Drop-Off
Update to node drop-off:
I wrote a few weeks ago to ask about node drop-off. A quick note...I
Update to node drop-off:
I wrote a few weeks ago to ask about node drop-off. A quick note...I
had a cluster run for 3 years without failure and I upgraded the Opteron
240 CPUs to 250s. The upgrade required a BIOS upgrade and while I was
at it, upgraded the OS and security. Some readers prov
On 11/12/06, Tim Moore <[EMAIL PROTECTED]> wrote:
Hello All -
I have a compute node that has started dropping off. When I say drop
off, I mean the node (while running a job) will lose all connectivity
and the machine does not respond. I have viewed the logs and can find
no reason for the node
On Sunday 12 November 2006 16:13, Tim Moore wrote:
> Has anyone ever seen such behavior?
Others have mentioned attaching consoles, etc., but it's also worth
trawling through any logs in /var/log to see if anything is showing up there
too.
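A rough sketch of that kind of log trawl, not taken from the original post: the function below scans lines of log text for a few strings that often accompany hardware or kernel trouble. The pattern list and the name `find_suspect_lines` are assumptions made for illustration; adjust the patterns for your own kernel and hardware.

```python
import re

# Hypothetical pattern list (an assumption, not from the thread);
# extend or trim it to match what your kernel actually logs.
SUSPECT_PATTERNS = [
    r"machine check",
    r"\boops\b",
    r"kernel panic",
    r"out of memory",
    r"\bEDAC\b",
    r"I/O error",
]
_SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines matching any suspect pattern.

    `lines` is any iterable of strings, e.g. an open log file.
    Line numbers are 1-based.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if _SUSPECT_RE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it on a node, open a log file (for example one under /var/log) and pass the file object to `find_suspect_lines`. An empty result only means none of the listed patterns appeared, not that the node is healthy.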
Check dmesg whilst the node is under load, if you'
Mark Hahn wrote:
we (and the vendor) regard this as grounds for repair (usually
the power supply).
I back up what Mark says.
a) attach a console to the machine, either a serial line or a
monitor/keyboard
b) run memtest on it, followed by CPUburn or some other
compute-intensive task for a da
I have a compute node that has started dropping off. When I say drop off, I
mean the node (while running a job) will lose all connectivity and the
machine does not respond. I have viewed the logs and can find no reason for
the node to cease functioning.
if you connect a console to such a nod
Hello All -
I have a compute node that has started dropping off. When I say drop
off, I mean the node (while running a job) will lose all connectivity
and the machine does not respond. I have viewed the logs and can find
no reason for the node to cease functioning. Let me state that this
b