This is a feature. ;) If you have mounted a volume on two or more nodes, the expectation is that the private interconnect will always remain up. If you shut down the network on a node, the cluster stack has to kill a node. It does so in order to prevent hangs in cluster operations.
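The safe way to take a node down by hand is to unmount the OCFS2 volumes and stop the cluster stack while the network is still up. A minimal sketch, assuming the stock EL5 init scripts shipped with ocfs2-tools and the mount points from your mail below (adjust to your setup):

  # unmount the shared OCFS2 volumes on the node going down
  umount /usr/local /home /apps

  # stop the ocfs2 service (unmounts anything left), then the o2cb stack
  /etc/init.d/ocfs2 stop
  /etc/init.d/o2cb stop

  # now it is safe to drop to single user mode
  telinit 1

With the stack stopped cleanly, the surviving node has no reason to fence when the other node's network disappears.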
In a 2 node setup, the higher node number will fence. I would imagine Node A is the higher number. But I am not sure why Node B fenced on restart. The "eeeeeee" message does not ring a bell. If you want to get to the bottom of this, set up a netconsole server to capture the logs (a minimal sketch follows the quoted message below). Or, remember to shut down the cluster before switching to single user mode, as sketched above.

Sunil

John McNulty wrote:
> Hello,
>
> I've got a 2 node HP DL580 cluster supported by a Fibre Channel SAN
> with dual FC cards, dual switches and an HP EVA on the back end. All
> SAN disks are multipathed. Installed software is:
>
> Red Hat 5.3
> ocfs2-2.6.18-128.1.14.el5-1.4.2-1.el5
> ocfs2-tools-1.4.2-1.el5
> ocfs2console-1.4.2-1.el5
> Oracle RAC 11g ASM
> Oracle RAC 11g Clusterware
> Oracle RAC 10g databases
>
> OCFS2 isn't being used by RAC, we're using ASM for that, but OCFS2 is
> used to provide a shared /usr/local, /home and /apps.
>
> Yesterday I discovered something very unexpected. I shut down node B
> to single user mode, and immediately node A crashed. The only message
> on the console was SysRq Resetting. Node A then rebooted normally.
> I then exited single user mode on node B to jump back up to run level 3.
> The system started up ok, but no sooner had I got the login prompt on
> the console than it too crashed with SysRq Resetting.
>
> I repeated the steps a second time and it did exactly the same thing
> all over again. It appears to be repeatable.
>
> The only thing that jumped out at me watching the consoles while this
> was going on was that node B fails to stop the OCFS2 service on
> shutdown, even going so far as to tell me after the fact with an
> "eeeeeee" message. I assume that's bad!
>
> There were no other console messages to give me a clue, so this is my
> starting point. Anyone got any ideas?
>
> Oh, there's one other thing that may or may not be relevant. On this
> cluster, and another identical cluster, mounted.ocfs2 -f always shows
> the node B cluster member as "Unknown" instead of the system name. As
> far as I'm aware I've followed the OCFS2 setup to the letter (it's not
> complicated), and "o2cb_ctl -It node" on either node shows both systems
> with all the correct details. Both nodes mount the cluster filesystems
> ok and work just fine.
>
> I've not had a chance to try my single user test on the other
> identical cluster yet as I've not been able to get a downtime window
> for it. If I do, then I will.
>
> Rgds,
>
> John
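As for the netconsole capture mentioned above, a minimal sketch; the interface, IP address, MAC address and port below are placeholders for your own log host, not values from this thread:

  # on the node that fences: raise the console log level so the
  # oops/fence messages are actually emitted to the console
  dmesg -n 8

  # stream console output over UDP (eth0, 192.168.1.50 and the MAC
  # are placeholders for your interface and log host)
  modprobe netconsole netconsole=@/eth0,[email protected]/00:11:22:33:44:55

  # on the log host: capture whatever arrives on UDP port 6666
  # (older netcats want "nc -u -l -p 6666" instead)
  nc -u -l 6666 > node_console.log

The MAC has to be that of the log host itself, or of the gateway if the log host sits on another subnet. Whatever the kernel prints as it fences will then survive the reset on the log host.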
