Bug#698225: linux-image-2.6.32-5-686-bigmem: split-brain when running "drbdadm primary $DEV" with dual primary setup in Sec/Sec state

Bastian Blank Tue, 15 Jan 2013 07:03:24 -0800

Control: tags -1 moreinfo

On Tue, Jan 15, 2013 at 03:18:11PM +0100, b...@bc-bd.org wrote:
> Issue
>       drbdadm primary $DEV
> On both nodes at the same time (either via cluster resource manager, or mssh) 
> will lead to a split brain:


This does not match the kernel log, as far as I understand it.

> [ 5067.503912] block drbd0: Split-Brain detected, dropping connection!

This is correct according to the log.

> [ 5034.677693] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams 
> -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) 
> [ 5034.690089] block drbd0: conn( WFBitMapT -> WFSyncUUID ) 
> [ 5034.691786] block drbd0: helper command: /sbin/drbdadm 
> before-resync-target minor-0
> [ 5034.692927] block drbd0: helper command: /sbin/drbdadm 
> before-resync-target minor-0 exit code 0 (0x0)
> [ 5034.692931] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate 
> -> Inconsistent ) 
> [ 5034.692934] block drbd0: Began resync as SyncTarget (will sync 0 KB [0 
> bits set]).
> [ 5035.242347] block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
> [ 5035.242355] block drbd0: conn( SyncTarget -> Connected ) disk( 
> Inconsistent -> UpToDate ) 
> [ 5035.242362] block drbd0: helper command: /sbin/drbdadm after-resync-target 
> minor-0
> [ 5035.243575] block drbd0: helper command: /sbin/drbdadm after-resync-target 
> minor-0 exit code 0 (0x0)

A resync from the remote disk, with nothing to transfer (0 KB).

> [ 5046.639518] block drbd0: peer( Secondary -> Unknown ) conn( Connected -> 
> TearDown ) pdsk( UpToDate -> DUnknown ) 
> [ 5046.639942] block drbd0: meta connection shut down by peer.

Remote was shut down.

> [ 5046.639993] block drbd0: asender terminated
> [ 5046.639994] block drbd0: Terminating drbd0_asender
> [ 5046.641094] block drbd0: conn( TearDown -> Disconnecting ) 
> [ 5046.659176] block drbd0: Connection closed
> [ 5046.659182] block drbd0: conn( Disconnecting -> StandAlone ) 
> [ 5046.659217] block drbd0: receiver terminated
> [ 5046.659218] block drbd0: Terminating drbd0_receiver
> [ 5046.659221] block drbd0: disk( UpToDate -> Diskless ) 
> [ 5046.659296] block drbd0: drbd_bm_resize called with capacity == 0
> [ 5046.659305] block drbd0: worker terminated
> [ 5046.659307] block drbd0: Terminating drbd0_worker

Device is gone.

> [ 5067.155466] block drbd0: Starting worker thread (from cqueue [2337])
> [ 5067.155541] block drbd0: disk( Diskless -> Attaching ) 
> [ 5067.207081] block drbd0: conn( Unconnected -> WFConnection ) 

Device enabled again and trying to connect.

> [ 5067.208501] block drbd0: role( Secondary -> Primary ) 
> [ 5067.212759] block drbd0: Creating new current UUID

The local node is set to primary while it is still waiting for a connection.

> [ 5067.503518] block drbd0: Handshake successful: Agreed network protocol 
> version 91
> [ 5067.503525] block drbd0: conn( WFConnection -> WFReportParams ) 

Connection established _after_ it was promoted to primary.

> [ 5067.503888] block drbd0: drbd_sync_handshake:
> [ 5067.503894] block drbd0: self 
> D88E7AD12FFEA493:49D971C9C18FC2FE:167E069D45704F1A:F1C0D4200B9792F4 bits:0 
> flags:0
> [ 5067.503899] block drbd0: peer 
> DD932456670DF62F:49D971C9C18FC2FE:167E069D45704F1A:F1C0D4200B9792F4 bits:0 
> flags:0

The remote device was also promoted to primary before the connection was
established.
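
To make the reading of the two UUID lines explicit, here is a rough sketch
of the comparison. It assumes the field order is
current:bitmap:history:history and it is only a simplification for
illustration, not DRBD's actual comparison code. The function name
compare_uuids and its output strings are made up for this example.

```shell
# Hypothetical helper: classify two UUID sets as printed by drbd_sync_handshake.
# Assumes the colon-separated fields are current:bitmap:history1:history2.
compare_uuids() {
    self="$1"
    peer="$2"
    # First field: current UUID
    self_cur="${self%%:*}"
    peer_cur="${peer%%:*}"
    # Second field: bitmap UUID
    self_rest="${self#*:}"
    peer_rest="${peer#*:}"
    self_bm="${self_rest%%:*}"
    peer_bm="${peer_rest%%:*}"

    if [ "$self_cur" = "$peer_cur" ]; then
        echo "same current UUID"
    elif [ "$self_bm" = "$peer_bm" ]; then
        # Different current UUIDs on top of the same previous UUID:
        # both sides created a new current UUID independently.
        echo "diverged from common ancestor"
    else
        echo "undetermined"
    fi
}
```

Fed with the self and peer values from the log above, this reports that
both sides diverged from a common ancestor: the first field differs while
the second (49D971C9C18FC2FE) is identical on both nodes.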

You have to wait until both machines are connected before promoting them
to primary. The init script does this.
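
If you promote by hand or from your own scripts, something along these
lines should avoid the race. This is a minimal sketch, not what the init
script literally does: the function name wait_and_promote, the DRBDADM
variable and the 60 second default timeout are assumptions for
illustration. It relies on "drbdadm cstate" printing a single connection
state for the resource.

```shell
# Hypothetical helper: promote a DRBD resource only once it reports Connected.
# DRBDADM can be overridden; it defaults to the drbdadm found in PATH.
DRBDADM="${DRBDADM:-drbdadm}"

wait_and_promote() {
    res="$1"
    timeout="${2:-60}"   # seconds to wait for the peer (assumed default)
    elapsed=0
    state=""
    while [ "$elapsed" -lt "$timeout" ]; do
        # "drbdadm cstate" prints the connection state,
        # for example WFConnection or Connected.
        state="$($DRBDADM cstate "$res" 2>/dev/null)"
        if [ "$state" = "Connected" ]; then
            $DRBDADM primary "$res"
            return $?
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timeout: $res still in state '${state:-unknown}', not promoting" >&2
    return 1
}
```

Run on both nodes as "wait_and_promote $DEV", each side stays Secondary
until the handshake has completed, so neither creates a new current UUID
while disconnected.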

Bastian

-- 
Behind every great man, there is a woman -- urging him on.
                -- Harry Mudd, "I, Mudd", stardate 4513.3

