On Oct 26, 2009, at 10:55 AM, beowulf-requ...@beowulf.org wrote:

Message: 6
Date: Mon, 26 Oct 2009 10:50:26 -0500
From: Rahul Nabar <rpna...@gmail.com>
Subject: Re: [Beowulf] any creative ways to crash Linux?: does a
        shared NIC      IMPI always remain responsive?
To: Bogdan Costescu <bcoste...@gmail.com>
Cc: Beowulf Mailing List <beowulf@beowulf.org>
Message-ID:
        <c4d69730910260850w5daf7de0ue26340adf8589...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Oct 26, 2009 at 8:11 AM, Bogdan Costescu <bcoste...@gmail.com> wrote:
On Sat, Oct 24, 2009 at 11:13 PM, Rahul Nabar <rpna...@gmail.com> wrote:
What surprised me was that even if I take down my eth interface with
ifdown, IPMI still works. How does it do that?

The IPMI traffic is IP (UDP) based, and by inspecting the IP header one
can distinguish between packets that have the same MAC but different
IPs.

Actually, the MAC is different too. I have one NIC but it responds to
two MACs. I guess one is transparent to the OS and the other is
handled by the BMC.
Correct. Some blades used to share the MAC, but I don't think anyone does that anymore. The BMC's MAC/IP is hidden from the OS and stays functional regardless of the OS state. IPMI drivers can talk to the chip through the OS if need be, by starting the appropriate service or loading the kernel modules, but that's usually only useful for configuring the card, since you'll use the network interface in most situations.
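To make the shared-NIC picture concrete, here is a minimal Python sketch of the steering decision. It is an illustration under stated assumptions, not any vendor's firmware: the MAC and IP addresses are hypothetical placeholders, frames are assumed to be untagged IPv4 (no VLANs), and ARP and other traffic a real BMC must also answer are ignored. With a dedicated BMC MAC the destination MAC alone decides; in the older shared-MAC case the decision falls back to the IP/UDP headers, where IPMI over LAN (RMCP) uses UDP port 623.

```python
import struct

# Hypothetical addresses, for illustration only.
BMC_MAC = bytes.fromhex("001122334455")
BMC_IP = bytes([192, 168, 1, 50])
RMCP_PORT = 623  # IPMI over LAN (RMCP/RMCP+) uses UDP port 623

ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17


def classify_frame(frame: bytes, shared_mac: bool = False) -> str:
    """Return "bmc" or "host" for a raw, untagged Ethernet frame."""
    dst_mac = frame[0:6]
    if not shared_mac:
        # Dedicated BMC MAC: the destination MAC alone decides.
        return "bmc" if dst_mac == BMC_MAC else "host"

    # Shared MAC: the MAC is the same for both, so look at IPv4/UDP.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return "host"
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    proto = ip[9]
    dst_ip = ip[16:20]
    if proto != IPPROTO_UDP or dst_ip != BMC_IP:
        return "host"
    (dst_port,) = struct.unpack("!H", ip[ihl + 2:ihl + 4])
    return "bmc" if dst_port == RMCP_PORT else "host"
```

In a real NIC this kind of filtering happens in hardware or firmware before the host driver sees the frame, which fits the observation that taking the interface down in the OS does not by itself stop the BMC from receiving its packets.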

taken down, it's the Linux networking stack that doesn't see any
packet coming in, however the BMC's network stack will still be
active. That's the whole point of the BMC being a separate entity from
the main system, so that its functionality remains undisturbed when
something bad happens to the main system.

I see. So I assume the BMC's network stack is implemented in hardware
or firmware. It's funny that in spite of this IPMI still hangs
sometimes (like Gerry says in his reply). I guess I can just attribute
that to bad firmware coding in the BMC.
"A Rich feature set" includes these issues :)

Another mysterious observation was this: whenever I took eth down via the OS, there was a latent period when IPMI stopped responding, but then it somehow magically resurrected itself and started working again.

Without claiming that this is the best explanation: it's possible that
the Linux driver talks to the hardware and takes down the link at the
physical level. The BMC driver then detects this and brings the link
back up so that it can continue to receive the IPMI packets.
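One way to turn the "latent period" into a number rather than an impression is to poll the BMC while taking the interface down and time the gap. A minimal sketch, assuming a hypothetical `probe()` callable that returns True when the BMC answers (for example your own wrapper around an `ipmitool ... mc info` call, run from another machine); the clock and sleep functions are injectable only so the loop can be tested without real waiting.

```python
import time


def measure_outage(probe, interval=0.5, max_wait=120.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it fails and then recovers.

    Returns the approximate outage length in seconds (resolution is
    limited by `interval`), or None if no complete outage was seen
    within max_wait seconds.
    """
    start = clock()
    down_at = None  # time of the first failed probe, if any
    while clock() - start < max_wait:
        ok = probe()
        now = clock()
        if not ok and down_at is None:
            down_at = now  # outage begins
        elif ok and down_at is not None:
            return now - down_at  # outage ended
        sleep(interval)
    return None
```

Start it from a second machine, then run ifdown on the node; repeating the measurement a few times shows whether the gap is consistent, which would fit the idea of the BMC detecting the link drop and bringing it back up.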

You are probably right. The explanation sounds reasonable to me. I
see something similar when accessing the BIOS. The BMC stack is not
responsive right from power-up. It does become responsive for a bit,
but then the system drags it down (maybe when the BIOS hands over to
PXE). If I manage to run "ipmitool sol activate" within that window,
I am able to see the BIOS. But that's pretty much trial and error.
You will probably also notice that the BMC only brings the link up at 100 Mb/s, while the OS brings it up at 1 Gb/s. Switches can add some lag here too if Spanning Tree is enabled. Turning off Spanning Tree or turning on "Port Fast" will help. Otherwise there is a period of up to about 40 seconds during which the link is "up" but the switch hasn't started passing traffic (while it checks that there's no Ethernet loop). This has caused many cluster deployments hours of head banging.
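Rather than hitting the SOL window by trial and error, a small wrapper can poll the BMC and attach as soon as it answers. A minimal Python sketch, assuming `ipmitool` with the `lanplus` interface; the host, user and password below are hypothetical placeholders. It uses `mc info` as a cheap "are you there?" query and then runs `sol activate`. The window may close again right after the probe succeeds (and Spanning Tree delays can stretch things out), so this narrows the guesswork rather than eliminating it.

```python
import subprocess
import time

# Hypothetical BMC address and credentials; replace with your own.
BASE_CMD = ["ipmitool", "-I", "lanplus", "-H", "10.0.0.50",
            "-U", "admin", "-P", "changeme"]


def bmc_responds(run=subprocess.run, timeout=5.0):
    """Return True if the BMC answers a cheap query ("mc info") in time."""
    try:
        result = run(BASE_CMD + ["mc", "info"],
                     capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def wait_then_sol(run=subprocess.run, sleep=time.sleep,
                  interval=1.0, max_tries=300):
    """Poll until the BMC answers, then attach to the serial console.

    Returns the attempt number on which SOL was started, or None if the
    BMC never answered within max_tries polls.
    """
    for attempt in range(1, max_tries + 1):
        if bmc_responds(run):
            # No output capture: the SOL session inherits the terminal.
            run(BASE_CMD + ["sol", "activate"])
            return attempt
        sleep(interval)
    return None
```

Call `wait_then_sol()` just before power-cycling the node; the `run` and `sleep` parameters exist only so the loop can be exercised without a real BMC.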

Cheers!
Greg

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
