Package: ocfs2-tools
Version: 1.4.1-1
Severity: critical
Justification: causes high load and a reboot of both nodes (2-node cluster)


Log details on node1:

--
Jan 19 09:49:33 kernel: [5665461.514795] (6894,14):dlm_drop_lockres_ref:2224 ERROR: while dropping ref on A35DE40B6A044A4A873B96E2F2DE42B2:M000000000000000112401200000000 (master=0) got -22.
Jan 19 09:49:33 kernel: [5665461.805602] lockres: M00000000000000011240120000000, owner=0, state=64
Jan 19 09:49:33 kernel: [5665461.932077]   last used: 5332038594, refcnt: 3, on purge list: yes
Jan 19 09:49:33 kernel: [5665462.148475]   on dirty list: no, on reco list: no, migrating pending: no
Jan 19 09:49:33 kernel: [5665462.274649]   inflight locks: 0, asts reserved: 0
Jan 19 09:49:33 kernel: [5665462.274649]   refmap nodes: [ ], inflight=0
Jan 19 09:49:33 kernel: [5665462.274649]   granted queue:
Jan 19 09:49:33 kernel: [5665462.274649]   converting queue:
Jan 19 09:49:33 kernel: [5665462.274649]   blocked queue:
Jan 19 09:49:33 kernel: [5665462.274649] ------------[ cut here ]------------
Jan 19 09:49:33 kernel: [5665462.274649] kernel BUG at fs/ocfs2/dlm/dlmmaster.c:2226!
Jan 19 09:49:33 kernel: [5665462.274649] invalid opcode: 0000 [1] SMP
Jan 19 09:49:33 kernel: [5665462.274649] CPU 14
Jan 19 09:49:33 kernel: [5665462.274649] Modules linked in: nls_utf8 cifs nls_base ip_vs_rr xt_connlimit nfs ocfs2 ip_vs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ipt_LOG xt_limit nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables dm_rdac qla2xxx bnx2 firmware_class usbhid uhci_hcd thermal sr_mod snd_pcm snd_timer snd_page_alloc snd soundcore shpchp sg sd_mod scsi_transport_fc scsi_tgt processor pcspkr pci_hotplug megaraid_sas loop ipv6 ide_pci_generic ide_core i2c_i801 i2c_core hid ff_memless fan thermal_sys ext3 jbd mbcache evdev ehci_hcd dm_round_robin dm_multipath dm_mod cdrom cdc_ether usbnet mii button ata_piix ata_generic libata scsi_mod dock
Jan 19 09:49:33 kernel: [5665462.274649] Pid: 6894, comm: dlm_thread Not tainted 2.6.26-2-amd64 #1
Jan 19 09:49:33 kernel: [5665462.274649] RIP: 0010:[<ffffffffa038c381>]  [<ffffffffa038c381>] :ocfs2_dlm:dlm_drop_lockres_ref+0x1dd/0x1f0
Jan 19 09:49:33 kernel: [5665462.274649] RSP: 0018:ffff810875ceddd0  EFLAGS: 00010202
Jan 19 09:49:33 kernel: [5665462.274649] RAX: ffff8105364e8888 RBX: 0000000000000000 RCX: 00000000031a9f89
Jan 19 09:49:33 kernel: [5665462.274649] RDX: 0000000000000000 RSI: 0000000000000034 RDI: 0000000000000282
Jan 19 09:49:33 kernel: [5665462.274649] RBP: 000000000000001f R08: 0000000000000000 R09: ffff810875ced900
Jan 19 09:49:33 kernel: [5665462.274649] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8105364e8840
Jan 19 09:49:33 kernel: [5665462.274649] R13: ffff81086ddd7800 R14: ffff81070616bb80 R15: 00000000000000b5
Jan 19 09:49:33 kernel: [5665462.274649] FS:  0000000000000000(0000) GS:ffff81107cf981c0(0000) knlGS:0000000000000000
Jan 19 09:49:33 kernel: [5665462.274649] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jan 19 09:49:33 kernel: [5665462.274649] CR2: 0000000002694000 CR3: 0000000000201000 CR4: 00000000000006e0
Jan 19 09:49:33 kernel: [5665462.274649] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 19 09:49:33 kernel: [5665462.274649] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 19 09:49:33 kernel: [5665462.274649] Process dlm_thread (pid: 6894, threadinfo ffff810875cec000, task ffff81086e521770)
Jan 19 09:49:33 kernel: [5665462.274649] Stack:  000000000000001f ffff81070616bb80 ffff810500000000 00000000ffffffea
Jan 19 09:49:33 kernel: [5665462.274649]  1f01000000000000 303030303030304d 3030303030303030 3032313034323131
Jan 19 09:49:33 kernel: [5665462.274649]  0030303030303030 0000000000000000 0000000000000000 0000000000000000
Jan 19 09:49:33 kernel: [5665462.274649] Call Trace:
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffffa0381860>] ? :ocfs2_dlm:dlm_thread+0x237/0x1107
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff802461a5>] ? autoremove_wake_function+0x0/0x2e
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffffa0381629>] ? :ocfs2_dlm:dlm_thread+0x0/0x1107
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff8024607f>] ? kthread+0x47/0x74
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff802300ed>] ? schedule_tail+0x27/0x5c
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff8020cf38>] ? child_rip+0xa/0x12
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff8021a866>] ? lapic_next_event+0xf/0x13
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff80246038>] ? kthread+0x0/0x74
Jan 19 09:49:33 kernel: [5665462.274649]  [<ffffffff8020cf2e>] ? child_rip+0x0/0x12
Jan 19 09:49:33 kernel: [5665462.274649]
Jan 19 09:49:33 kernel: [5665462.274649]
Jan 19 09:49:33 kernel: [5665462.274649] Code: 8b 14 25 24 00 00 00 48 c7 c1 e0 89 39 a0 89 d2 4c 89 74 24 08 89 44 24 10 31 c0 89 2c 24 e8 2c 90 ea df 4c 89 e7 e8 32 43 ff ff <0f> 0b eb fe 48 83 c4 70 89 d8 5b 5d 41 5c 41 5d 41 5e c3 41 54
Jan 19 09:49:33 kernel: [5665462.274649] RIP [<ffffffffa038c381>] :ocfs2_dlm:dlm_drop_lockres_ref+0x1dd/0x1f0
Jan 19 09:49:33 kernel: [5665462.274649]  RSP <ffff810875ceddd0>
Jan 19 09:49:33 kernel: [5665462.422453] ---[ end trace ee1657d875d4e1f1 ]---
--

--
Jan 19 09:54:05 kernel: [5665830.740248] o2net: connection to node XXX (num 0) at x.x.x.x:xxxx has been idle for 30.0 seconds, shutting it down.
Jan 19 09:54:05 kernel: [5665831.043225] (0,12):o2net_idle_timer:1468 here are some times that might help debug the situation: (tmr 1295427215.500604 now 1295427245.497577 dr 1295427215.497446 adv 1295427215.500636:1295427215.500637 func (8737b25e:500) 1295427215.500605:1295427215.500635)
Jan 19 09:54:05 kernel: [5665831.247064] o2net: no longer connected to node XXX (num 0) at x.x.x.x:xxxx
Jan 19 09:54:25 kernel: [5665831.427022] (22635,0):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (6482,12):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.423470] (4102,7):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (21432,1):dlm_send_remote_unlock_request:359 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.423471] (5686,4):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.423470] (7690,8):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (21552,15):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (6810,14):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (6910,9):dlm_drop_lockres_ref:2219 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (7049,11):dlm_drop_lockres_ref:2219 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.651713] (8202,3):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427022] (7005,10):dlm_drop_lockres_ref:2219 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.651713] (7932,2):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.651713] (7770,13):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.423470] (6159,5):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (6482,12):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.423470] (4102,7):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.423471] (5686,4):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.423470] (7690,8):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (21552,15):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (7295,1):dlm_drop_lockres_ref:2219 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.651713] (24522,6):dlm_do_master_request:1342 ERROR: link to 0 went down!
Jan 19 09:54:25 kernel: [5665831.427378] (6810,14):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427432] (6910,9):dlm_purge_lockres:190 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (7049,11):dlm_purge_lockres:190 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.651713] (8202,3):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427022] (7005,10):dlm_purge_lockres:190 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.651713] (7932,2):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.651713] (7770,13):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.423470] (6159,5):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:25 kernel: [5665831.427378] (4172,12):dlm_do_master_request:1342 ERROR: link to 0 went down!
--


Corresponding messages on node2:

--
Jan 19 09:54:36 kernel: [5414618.046127] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046133] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046138] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046165] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046170] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046180] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046184] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046208] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046213] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046255] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046259] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046277] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046281] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046317] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046322] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046363] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046367] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046374] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046379] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046394] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046399] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046405] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.046410] (20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Jan 19 09:54:36 kernel: [5414618.147644] o2net: accepted connection from node XXX (num 1) at x.x.x.x:xxxx
Jan 19 09:54:36 kernel: [5414620.867740] (31712,1):dlm_do_master_request:1342 ERROR: link to 1 went down!
Jan 19 09:54:36 kernel: [5414620.867740] (31712,1):dlm_get_lock_resource:919 ERROR: status = -112
Jan 19 09:54:36 kernel: [5414620.871982] (31124,9):dlm_do_master_request:1342 ERROR: link to 1 went down!
Jan 19 09:54:36 kernel: [5414620.871982] (31124,9):dlm_get_lock_resource:919 ERROR: status = -112
--

Our technical investigation showed that the problem was not caused by a network fault.
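
For reference, the negative status values in the logs above are kernel return codes of the form -errno. The sketch below decodes the three values that appear here, using a hand-written table with Linux x86_64 numbering; the helper name `decode_status` and the regular expression are hypothetical, not part of ocfs2-tools. Decoded this way, -22 (EINVAL) is what dlm_drop_lockres_ref reported just before the BUG at dlmmaster.c:2226 (09:49:33), while -112 (EHOSTDOWN) and -107 (ENOTCONN) only appear after o2net dropped the connection (09:54:05 onward).

```python
# Minimal sketch: map "status = -N" / "got -N" values from the o2dlm log
# lines to Linux errno names. The table is hard-coded (assumption: Linux
# x86_64 numbering) so the result does not depend on the host platform.
import re

LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    112: ("EHOSTDOWN", "Host is down"),
}

# Hypothetical pattern covering the two message shapes seen in this report.
STATUS_RE = re.compile(r"(?:status = |got )(-\d+)")


def decode_status(line):
    """Return (errno_name, description) for the first status in a log line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = -int(match.group(1))  # kernel functions return -errno
    return LINUX_ERRNO.get(code, ("unknown", f"errno {code}"))


if __name__ == "__main__":
    samples = [
        "(6894,14):dlm_drop_lockres_ref:2224 ERROR: while dropping ref on "
        "A35DE40B6A044A4A873B96E2F2DE42B2:M000000000000000112401200000000 "
        "(master=0) got -22.",
        "(6910,9):dlm_drop_lockres_ref:2219 ERROR: status = -112",
        "(20385,2):dlm_send_remote_unlock_request:359 ERROR: status = -107",
    ]
    for sample in samples:
        print(decode_status(sample))
```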


-- System Information:
Debian Release: 5.0.7
 APT prefers stable
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26-26lenny1
CPU: Intel(R) Xeon(R) CPU E5620

Versions of packages ocfs2-tools depends on:
ii  libc6          2.7-18lenny6
ii  libcomerr2     1.41.3-1
ii  libglib2.0-0   2.16.6-3
ii  libncurses5    5.7+20081213-1
ii  libreadline5   5.2-3.1
ii  libuuid1       1.41.3-1

Versions of packages ocfs2-tools suggests:
ii ocfs2console 1.4.1-1
/etc/default/o2cb values:
O2CB_HEARTBEAT_THRESHOLD=31
O2CB_IDLE_TIMEOUT_MS=30000
O2CB_KEEPALIVE_DELAY_MS=2000
O2CB_RECONNECT_DELAY_MS=2000
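
The /etc/default/o2cb values above translate into effective cluster timeouts. This is a minimal sketch with hypothetical helper names; it assumes the formula from the OCFS2 documentation that a node is declared dead after (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds without a disk heartbeat, and it also computes the difference between the `tmr` and `now` timestamps printed by o2net_idle_timer in the node1 log.

```python
# Minimal sketch: derive effective o2cb timeouts (in seconds) from the
# /etc/default/o2cb values in this report. Names are illustrative only.

O2CB_SETTINGS = {
    "O2CB_HEARTBEAT_THRESHOLD": 31,
    "O2CB_IDLE_TIMEOUT_MS": 30000,
    "O2CB_KEEPALIVE_DELAY_MS": 2000,
    "O2CB_RECONNECT_DELAY_MS": 2000,
}


def effective_timeouts(cfg):
    """Return the o2cb timeouts in seconds."""
    return {
        # Assumption (OCFS2 docs): disk heartbeat timeout = (threshold - 1) * 2 s.
        "disk_heartbeat_dead_s": (cfg["O2CB_HEARTBEAT_THRESHOLD"] - 1) * 2,
        # o2net shuts down a connection that has been idle this long.
        "network_idle_s": cfg["O2CB_IDLE_TIMEOUT_MS"] / 1000,
        "keepalive_s": cfg["O2CB_KEEPALIVE_DELAY_MS"] / 1000,
        "reconnect_s": cfg["O2CB_RECONNECT_DELAY_MS"] / 1000,
    }


def idle_gap(tmr, now):
    """Seconds between the 'tmr' and 'now' timestamps of o2net_idle_timer."""
    return now - tmr


if __name__ == "__main__":
    print(effective_timeouts(O2CB_SETTINGS))
    # tmr/now values copied from the o2net_idle_timer line in the node1 log.
    print(round(idle_gap(1295427215.500604, 1295427245.497577), 3))
```

With this configuration the disk heartbeat timeout is 60 seconds and the network idle timeout is 30 seconds; the roughly 30-second gap between `tmr` and `now` matches the "idle for 30.0 seconds" message from node1.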


Regards,
Szabolcs JANOSI