Package: ocfs2-tools
Version: 1.4.1-1
Severity: important

When unmounting an OCFS2 filesystem, the unmount completes but ocfs2_hb_ctl 
segfaults before stopping the heartbeat. As a result, the OCFS2 cluster 
cannot be stopped before rebooting.

Steps to reproduce this situation:

1. Boot the system.
2. Test the /etc/init.d/ocfs2 and /etc/init.d/o2cb scripts by starting and 
stopping them. They stop successfully.
This is what happens when no OCFS2 filesystem has been mounted since the 
system was booted:

server3:~# /etc/init.d/ocfs2 stop
Stopping Oracle Cluster File System (OCFS2) OK

server3:~# /etc/init.d/o2cb stop
Stopping O2CB cluster ocfs2-www: OK
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unloading module "ocfs2_stack_o2cb": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK

3. Start both ocfs2 and o2cb again and mount an OCFS2 filesystem.

server3:~# mount /usr/ocfs2
Feb 24 11:36:50 server3 kernel: [  529.764322] o2net: accepted connection from 
node server4 (num 3) at 192.168.1.4:7777
Feb 24 11:36:53 server3 kernel: [  534.669832] OCFS2 1.5.0
Feb 24 11:36:53 server3 kernel: [  534.693645] ocfs2_dlm: Nodes in domain 
("7F6283D7CA09400B9ACB687AC1B70088"): 2 
Feb 24 11:36:53 server3 kernel: [  534.711619] kjournald starting.  Commit 
interval 5 seconds
Feb 24 11:36:53 server3 kernel: [  534.714368] ocfs2: Mounting device (254,8) 
on (node 2, slot 0) with ordered data mode.

4. Unmount the filesystem.

server3:~# umount /usr/ocfs2

Feb 24 11:37:10 server3 kernel: [  556.882125] ocfs2_hb_ctl[4552]: segfault at 
0 ip 7fd38e033a90 sp 7fff96729328 error 4 in libc-2.7.so[7fd38dfb9000+14a000]
Feb 24 11:37:10 server3 kernel: [  556.884886] ocfs2: Unmounting device (254,8) 
on (node 2)
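
The segfault line above can be decoded a little further. The following is a 
minimal sketch, not a diagnosis: it assumes the standard x86 kernel message 
layout (fault address, ip, sp, page-fault error code in hex, then the mapped 
object with its base address and size). The function name decode_segfault is 
made up for illustration.

```python
import re

# Matches the x86 kernel message for a user-space segfault, as in the log above.
SEGV_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return a dict describing the fault, or None if the line does not match."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "object": m.group("obj"),
        # Offset of the faulting instruction inside the mapped object.
        "offset": ip - base,
        # x86 page-fault error code bits: 1 = protection violation
        # (0 means page not present), 2 = write access, 4 = user mode.
        "protection": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


# The wrapped log line from step 4, joined back into one line.
line = ("Feb 24 11:37:10 server3 kernel: [  556.882125] ocfs2_hb_ctl[4552]: "
        "segfault at 0 ip 7fd38e033a90 sp 7fff96729328 error 4 "
        "in libc-2.7.so[7fd38dfb9000+14a000]")
info = decode_segfault(line)
print(info["object"], hex(info["offset"]))  # libc-2.7.so 0x7aa90
```

Read this way, the message describes a user-mode read of address 0 (a NULL 
pointer) at offset 0x7aa90 inside libc-2.7.so. That is consistent with either 
a fault in libc itself or a caller handing a NULL pointer to a libc function; 
the log line alone does not distinguish the two. With debug symbols installed, 
the offset could be resolved to a function name using a tool such as addr2line.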

While the filesystem is unmounted correctly, the heartbeat is not stopped, 
so a subsequent attempt to stop the cluster fails:

server3:~# /etc/init.d/ocfs2 stop
Stopping Oracle Cluster File System (OCFS2) OK

server3:~# /etc/init.d/o2cb stop
Stopping O2CB cluster ocfs2-www: Failed
Unable to stop cluster as heartbeat region still active
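
For convenience, steps 3 and 4 and the failing stop can be collected into a 
small script. This is only a sketch under assumptions: the mount point 
/usr/ocfs2 is the one used above, o2cb is started before ocfs2, and the helper 
names repro_commands and run_repro are hypothetical. It defaults to a dry run 
that only prints the commands; running it for real requires root on a 
configured cluster node.

```python
import subprocess


def repro_commands(mountpoint="/usr/ocfs2"):
    """Return the commands for steps 3 and 4 plus the stop attempt, in order."""
    return [
        ["/etc/init.d/o2cb", "start"],
        ["/etc/init.d/ocfs2", "start"],
        ["mount", mountpoint],
        ["umount", mountpoint],   # ocfs2_hb_ctl segfaults during this step
        ["/etc/init.d/ocfs2", "stop"],
        ["/etc/init.d/o2cb", "stop"],  # reports "Failed" in the transcript above
    ]


def run_repro(mountpoint="/usr/ocfs2", dry_run=True):
    """Run (or, in a dry run, just print) each command; return (cmd, returncode)."""
    results = []
    for cmd in repro_commands(mountpoint):
        if dry_run:
            print("would run:", " ".join(cmd))
            results.append((cmd, None))
            continue
        proc = subprocess.run(cmd, capture_output=True, text=True)
        print(" ".join(cmd), "->", proc.returncode)
        results.append((cmd, proc.returncode))
    return results


results = run_repro(dry_run=True)
```

Whether the o2cb init script exits non-zero when it prints "Failed" has not 
been checked here, so the printed output should be inspected rather than 
relying on the return code alone.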

Because the heartbeat could not be stopped, when the server is rebooted the 
other nodes print error messages until it is back up.
An example from server4 is below:

Feb 24 11:26:49 server4 kernel: [119051.392388] o2net: connection to node 
server3 (num 2) at 192.168.1.3:7777 has been idle for 10.0 seconds, shutting it 
down.
Feb 24 11:26:49 server4 kernel: [119051.392448] (0,4):o2net_idle_timer:1468 
here are some times that might help debug the situation: (tmr 1235467599.500913 
now 1235467609.500099 dr 1235467599.500901 adv 
1235467599.500918:1235467599.500918 func (dbc4763b:502) 
1235467367.658695:1235467367.658701)
Feb 24 11:26:49 server4 kernel: [119051.392573] o2net: no longer connected to 
node server3 (num 2) at 192.168.1.3:7777
Feb 24 11:26:59 server4 kernel: [119067.797445] 
(3802,4):o2net_connect_expired:1629 ERROR: no connection established with node 
2 after 10.0 seconds, giving up and returning errors.
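
The timestamps in the o2net_idle_timer message are epoch seconds, so the idle 
interval can be checked by hand. A minimal sketch, assuming that "tmr" is the 
time the idle timer was last armed and "now" is the time it fired (the helper 
name idle_seconds is made up for illustration):

```python
from decimal import Decimal


def idle_seconds(tmr, now):
    """Difference between two epoch timestamps given as decimal strings."""
    # Decimal avoids float rounding on microsecond-resolution epoch values.
    return Decimal(now) - Decimal(tmr)


# Values copied from the o2net_idle_timer line above.
tmr = "1235467599.500913"
now = "1235467609.500099"
elapsed = idle_seconds(tmr, now)
print(elapsed)  # 9.999186
```

The result, roughly 10 seconds, matches the "has been idle for 10.0 seconds" 
text in the first message.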

I think the problem is in libc-2.7.so, because I see a similar segfault when I 
stop the multipath daemon (it prints a similar message that also mentions 
libc-2.7.so). In any case, OCFS2 does not operate correctly on lenny.

-- System Information:
Debian Release: 5.0
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-1-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages ocfs2-tools depends on:
ii  debconf [debconf-2.0]     1.5.24         Debian configuration management sy
ii  libc6                     2.7-18         GNU C Library: Shared libraries
ii  libcomerr2                1.41.3-1       common error description library
ii  libglib2.0-0              2.16.6-1       The GLib library of C routines
ii  libncurses5               5.7+20081213-1 shared libraries for terminal hand
ii  libreadline5              5.2-3.1        GNU readline and history libraries
ii  libuuid1                  1.41.3-1       universally unique id library

ocfs2-tools recommends no packages.

Versions of packages ocfs2-tools suggests:
pn  ocfs2console                  <none>     (no description available)

-- debconf information:
  ocfs2-tools/heartbeat_threshold: 31
  ocfs2-tools/reconnect_delay: 2000
  ocfs2-tools/init: false
  ocfs2-tools/keepalive_delay: 2000
  ocfs2-tools/clustername: ocfs2
  ocfs2-tools/idle_timeout: 30000


