Mark Hahn wrote:
> there's a semi-recent kernel feature which allows the kernel to avoid
> user-space by putting console traffic onto the net directly
> see Documentation/networking/netconsole.txt
Now that looks very interesting. Thanks for the pointer!
Cheers
Carsten
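For reference, a minimal netconsole setup might look like the following; the
addresses, interface name and port are made-up placeholders, not values from
the thread:

  # on each node: send kernel console output straight to the log host via UDP
  # format: netconsole=<local-port>@<local-ip>/<dev>,<remote-port>@<remote-ip>/<remote-mac>
  modprobe netconsole netconsole=6666@10.0.0.42/eth0,6666@10.0.0.1/00:11:22:33:44:55

  # on the log host: anything listening on that UDP port will do, e.g. a
  # traditional netcat
  nc -u -l -p 6666 >> /var/log/netconsole.log

The same netconsole= parameter can go on the kernel command line if the
feature is built in rather than modular.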
Robert G. Brown wrote:
>
> "putting a cheap monitor on a suspect or crashed node"
>
One monitor for more than 1300 1U servers is not practical :)
> Or even after a crash. If the primary graphics card is being used as a
> console, the frame buffer will probably retain the last kernel oops
> written to it.
Hi all
Lawrence Stewart wrote:
> [...]
> A month or two later, the department calls in to inquire "Where's the
> numbers report?" After some confusion back and forth, it seems that the
> department had been dutifully filing the abend dumps in a row of file
> cabinets, and wanted to know why th
On Sep 9, 2008, at 7:41 PM, Robert G. Brown wrote:
On Tue, 9 Sep 2008, David Mathog wrote:
word. In the old days some of those crash events spewed garbage to the
printer, and that resulted in a ream of nonsense on the floor, and more
often than not, the paper mashed into an accordion behind a pinfeed jam.
On Tue, 9 Sep 2008, David Mathog wrote:
word. In the old days some of those crash events spewed garbage to the
printer, and that resulted in a ream of nonsense on the floor, and more
often than not, the paper mashed into an accordion behind a pinfeed jam.
Nobody said it was EASY back then, ri
"Robert G. Brown" <[EMAIL PROTECTED]> wrote:
>
> One last method (from back in the dark ages):
>
> "putting a tty-output printer on as a console printer"
Better yet, set up the serial port as a console, then attach another
machine via a serial line, and just have the 2nd machine log everything.
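A rough sketch of that arrangement, assuming a null-modem cable between the
two machines and /dev/ttyS0 on both ends (device names, baud rate and log
path are placeholders):

  # monitored node: add to the kernel command line in the bootloader config
  console=tty0 console=ttyS0,115200n8

  # logging machine: set up the port and append everything it sees to a file
  stty -F /dev/ttyS0 115200 raw -echo
  cat /dev/ttyS0 >> /var/log/consoles/node42.log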
could be we don't know how to ask; I'm not aware of HP actually
offering such a kit. or how much we'd be willing to pay.
it is an interesting question: not just how much does downtime cost you,
but what are the kinds of failures you see and expect? our clusters
have been remarkably robust,
On Tue, Sep 09, 2008 at 06:41:01PM -0400, Mark Hahn wrote:
>> You don't have your own spares kit? For big clusters like yours, it
>> doesn't cost much.
>
> could be we don't know how to ask; I'm not aware of HP actually offering
> such a kit. or how much we'd be willing to pay.
Well, I always b
I _do_ wish it was a bit more common to have onsite spares. not sure
why vendors (HP at least) don't like to do this. maybe just that it
might get kicked around or otherwise abused...
You don't have your own spares kit? For big clusters like yours, it
doesn't cost much.
could be we don't kno
On Tue, 9 Sep 2008, Mark Hahn wrote:
for small sites or individuals, it makes a lot of sense (for the vendor)
to try to filter out some of the randomness of support calls before
committing a person. of course, a good CRM system would help this - perhaps
that's why RGB gets satisfaction from Dell
> Again, I'm not picking on Dell specifically. I've seen this behavior
> with other large vendors. My point is that "on-site support" usually
> isn't always, so don't believe the hype.
I think highly of HP service and HP hardware in general. we always spec
onsite/NBD support. at first, we spe
On Tue, Sep 09, 2008 at 05:46:50PM -0400, Mark Hahn wrote:
> I _do_ wish it was a bit more common to have onsite spares. not sure
> why vendors (HP at least) don't like to do this. maybe just that it
> might get kicked around or otherwise abused...
You don't have your own spares kit? For b
Again, I'm not picking on Dell specifically. I've seen this behavior
with other large vendors. My point is that "on-site support" usually
isn't always, so don't believe the hype.
I think highly of HP service and HP hardware in general. we always spec
onsite/NBD support. at first, we spent a l
Robert G. Brown wrote:
> On Mon, 8 Sep 2008, Greg Lindahl wrote:
>
>> On Mon, Sep 08, 2008 at 02:58:36PM -0400, Prentice Bisbal wrote:
>>
>>> I think these trends have more to do with the cheap cost of Dell
>>> Hardware and Dell's sales force and marketing to upper management than
>>> they do w
On Tue, 9 Sep 2008, Carsten Aulbert wrote:
We did get a few messages, albeit not from the kernel when an error
happened. I'll have another look today, maybe I did something wrong.
If your kernel is out and out crashing, you might not get anything at
all. In that case, let me add:
"putting a
On Tue, Sep 09, 2008 at 02:12:02PM -0400, Robert G. Brown wrote:
> If I buy e.g. a Dell
> laptop (as I have for six or seven years now) I pay a single, easily
> budgeted price and if it breaks (as it has six or seven times now over
> the years -- I USE my laptop, run hard and put up wet), a nice m
On Tue, 9 Sep 2008, Carsten Aulbert wrote:
My question now is: is there a cute little way to gather all the console
outputs of > 1000 nodes? The nodes don't have physical serial cables
attached to them - nor do we want to use many concentrators to achieve
this - but the off-the-shelf Supermicro box
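One cable-free option is the nodes' BMCs and serial-over-LAN. A hedged sketch
for a single node follows (hostname, user and password file are invented);
at more than 1000 nodes, something like conserver, which comes up further
down in the thread, is what keeps the sessions and logs manageable:

  # open an IPMI serial-over-LAN session to one node's BMC and log it
  ipmitool -I lanplus -H node0042-ipmi -U admin -f /etc/ipmitool.pass \
      sol activate | tee -a /var/log/consoles/node0042.log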
On Mon, 8 Sep 2008, Greg Lindahl wrote:
On Mon, Sep 08, 2008 at 02:58:36PM -0400, Prentice Bisbal wrote:
I think these trends have more to do with the cheap cost of Dell
Hardware and Dell's sales force and marketing to upper management than
they do with any technical advantages Dell has over
Jeff Johnson wrote:
>
>> A Xeon is a Xeon is a Xeon.
>>
> This is a very true statement.
>
> Unfortunately for many, the commonality ends where the processor and
> socket meet. There is a great deal of deviation in motherboard designs.
> Some are much better than others and it is not always ba
We did get a few messages, albeit not from the kernel when an error
happened. I'll have another look today, maybe I did something wrong.
there's a semi-recent kernel feature which allows the kernel to
avoid user-space by putting console traffic onto the net directly
see Documentation/networking/netconsole.txt
Carsten Aulbert wrote:
[server console management for many servers with conserver]
>
We use conserver to get serial console access to almost all our machines.
Below is the forwarded answer to your messages from my coworker who's in
charge of this.
The tools he created for interfacing IPMI and conserver
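The exact tooling isn't shown in the thread, but a conserver.cf entry driving
ipmitool's SoL could look roughly like this (hostnames and paths are
placeholders):

  console node0042 {
      master localhost;
      type exec;
      exec /usr/bin/ipmitool -I lanplus -H node0042-ipmi -U admin -f /etc/ipmitool.pass sol activate;
      logfile /var/log/consoles/node0042.log;
  }

One such block per node, generated from the node list, gives a per-node log
file plus a single attach point via the console client ("console node0042").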
Carsten Aulbert wrote:
> Hi all,
>
> I would tend to guess this problem is fairly common and many solutions
> are already in place, so I would like to enquire about your solutions
> to the problem:
>
> In our large cluster we have certain nodes going down with I/O hard disk
> errors. We have some
Carsten Aulbert <[EMAIL PROTECTED]> writes:
> For the time being we are experimenting with using "script" in many
> "screen" environment which should be able to monitor ipmitool's SoL
> output, but somehow that strikes me as inefficient as well.
First, you should probably never want script+screen
Hi,
Geoff Galitz wrote:
> You can also configure any standard (distribution shipped) syslog to log
> remotely to your head node or even a separate logging master. Anything that
> gets reported to the syslog facility can be reported/archived in this
> manner, you just need to dig into the documentation.
>Does this capture (almost) everything that happens to a machine? We have
>not yet looked into syslog-ng, but a look at your config files would
>be very nice.
You can also configure any standard (distribution shipped) syslog to log
remotely to your head node or even a separate logging master.
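With the stock sysklogd of that era the whole thing can be as small as this
(the loghost name is a placeholder; rsyslog accepts the same selector line):

  # on every node, /etc/syslog.conf: forward everything to the logging master
  *.*      @loghost.example.org

  # on the logging master: start syslogd with -r so it accepts remote messages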
Hi
thanks for the reply
Reuti wrote:
> I setup syslog-ng on the nodes to log to the headnode. There each node
> will have a distinct file e.g. "/var/log/nodes/node42.messages". If you
> are interested, I could post my configuration files for headnode and
> clients.
Does this capture (almost) eve
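Reuti's actual files aren't in the thread, but a syslog-ng configuration in
that spirit might be sketched as follows (hostnames, port and paths are
assumptions):

  # node side (syslog-ng.conf): ship everything to the head node over UDP
  source s_local { unix-stream("/dev/log"); internal(); file("/proc/kmsg"); };
  destination d_headnode { udp("headnode.example.org" port(514)); };
  log { source(s_local); destination(d_headnode); };

  # head node: one file per sending host, via the $HOST macro
  source s_nodes { udp(ip(0.0.0.0) port(514)); };
  destination d_pernode { file("/var/log/nodes/$HOST.messages"); };
  log { source(s_nodes); destination(d_pernode); };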
Hi,
On 09.09.2008, at 09:53, Carsten Aulbert wrote:
Hi all,
I would tend to guess this problem is fairly common and many solutions
are already in place, so I would like to enquire about your solutions
to the problem:
In our large cluster we have certain nodes going down with I/O hard disk
Hi all,
I would tend to guess this problem is fairly common and many solutions
are already in place, so I would like to enquire about your solutions
to the problem:
In our large cluster we have certain nodes going down with I/O hard disk
errors. We have some suspicion about the causes but would