Dag Wanvik <[email protected]> writes:

> On 11.06.2013 18:50, benrahman wrote:
>
>> /Master derby.log/
>>
>> ---- BEGIN REPLICATION ERROR MESSAGE (6/5/13 3:35 PM) ----
>> Exception occurred during log shipping.
>> java.net.SocketException: Connection reset by peer: socket write error
>>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>
> Looks like the socket the master uses to ship records to slave stopped
> working; hard to say what's the issue here. Do you see anything
> in the slave's log file at this time instant?
>
> Later replication error messages in the master's log file show that the
> buffer grows full (since it can't send):
>
>> ---- BEGIN REPLICATION ERROR MESSAGE (6/6/13 5:46 PM) ----
>> Exception occurred during log shipping.
>> org.apache.derby.impl.store.replication.buffer.LogBufferFullException
>>     at org.apache.derby.impl.store.replication.buffer.ReplicationLogBuffer.switchDirtyBuffer(Unknown
>
> Not sure why the slave doesn't fail over; maybe the master process needs to
> be stopped (crash) before it will happen..
> It is probably right that it doesn't happen when you first see the socket
> write error; it could be due to a intermittent network error.

That's right. It is supposed to try to reconnect until there's no more
space in the replication log buffers, according to
http://db.apache.org/derby/docs/10.10/adminguide/cadminreplicfailures.html.

> But I believe the slave and master have a keep-alive protocol to enable the
> slave to fail over when the master is not longer seen to be
> alive.

I think the slave never fails over automatically, even if it detects
that it has lost contact with the master. It has to be told to do so.
See
http://db.apache.org/derby/docs/10.10/adminguide/cadminreplicfailover.html,
which says:

  There is no automatic failover or restart of replication after one of
  the instances has failed.

-- 
Knut Anders
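
P.S. For reference, "telling" the slave to fail over means making a
connection attempt with the failover=true attribute. Below is a minimal
sketch of that, not a tested recipe. The class name FailoverSketch and
the database name "myDB" are made up for illustration, and it assumes the
Derby embedded driver is on the classpath and that the code runs inside
the Derby instance hosting the slave database. With a network server the
URL would use the client form, jdbc:derby://host:port/myDB;failover=true,
instead. Derby may report the outcome of the request through an
SQLException, so the sketch only prints the SQLState and message rather
than assuming what they will be.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class FailoverSketch {
    // Builds an embedded-driver URL carrying Derby's failover=true attribute.
    static String failoverUrl(String dbName) {
        return "jdbc:derby:" + dbName + ";failover=true";
    }

    public static void main(String[] args) {
        // "myDB" is a hypothetical database name; pass the real one as an argument.
        String dbName = args.length > 0 ? args[0] : "myDB";
        try (Connection c = DriverManager.getConnection(failoverUrl(dbName))) {
            System.out.println("Connected to " + dbName);
        } catch (SQLException e) {
            // Print whatever Derby (or the driver manager) reports for the request.
            System.out.println("SQLState " + e.getSQLState() + ": " + e.getMessage());
        }
    }
}
```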
