Peter Rossbach wrote:
Hi Filip,
OK, but the second one is a real problem, and the first one you can just fix ;-)
Can you fix it by calling checkExpire from the RecoveryThread?
I don't know about this one. I could call checkExpire, but if the
datagram socket is down, is the expiration real?
I guess it should be done anyway, so that we still guarantee correct
notifications according to how membership works.
In a situation like this, your cluster will be out of sync, since once
the network card is back up, no state transfer is initiated again.
What are your thoughts?
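A minimal sketch of the checkExpire idea, assuming a hypothetical `RecoverableMembership` interface (not the real McastServiceImpl API): while restart attempts fail, the loop keeps calling checkExpire so listeners still get "member disappeared" notifications, and on success it runs an `onRecovered` hook where, for example, a fresh state transfer could be requested. The attempt limit, sleep interval and hook are all assumptions for illustration.

```java
/** Hypothetical view of the membership service; not the actual Tomcat API. */
interface RecoverableMembership {
    /** One stop -> init -> start attempt; true if the socket is back up. */
    boolean tryRestart();

    /** Expire members whose heartbeats timed out and notify listeners. */
    void checkExpire();
}

/** Sketch: keep expiring members while the socket is down, then signal recovery. */
final class ExpiringRecovery {
    private ExpiringRecovery() {}

    /**
     * Returns the attempt number that succeeded, or -1 if every attempt
     * failed or the thread was interrupted.
     */
    static int recover(RecoverableMembership svc, Runnable onRecovered,
                       int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (svc.tryRestart()) {
                // Hypothetical hook, e.g. to request a new state transfer.
                onRecovered.run();
                return attempt;
            }
            // Socket still down: keep running expiration so listeners are
            // notified. Note that with the local socket down, every member
            // will eventually time out.
            svc.checkExpire();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }
}
```

Whether the expirations triggered this way are "real" is exactly the open question above; the sketch only shows where such a call would sit.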
Filip
Peter
On 17.08.2007 at 21:11, Filip Hanik - Dev Lists wrote:
There are a few drawbacks to my current implementation that I need to
think about:
1. I also reset the membership map; this should probably not be done
at all.
2. During a failure, since I invoke stop() to reset the thread, I no
longer send out "member disappeared" messages, as the service is
not running.
Filip
Filip Hanik - Dev Lists wrote:
Hi Peter,
here is the SVN link:
http://svn.apache.org/viewvc?view=rev&revision=567104
Basically, this is what I do: in the receiver/sender thread, if an
error happens, I increment a counter.
This counter also gets decremented upon success.
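A minimal sketch of such an error counter, assuming Java 8+; `FailureCounter` and its `threshold` parameter (the "X" mentioned next) are hypothetical names, not taken from the commit. It follows the description literally: failures increment, successes decrement without going below zero.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical failure counter shared by the sender and receiver threads. */
final class FailureCounter {
    private final AtomicInteger errors = new AtomicInteger();
    private final int threshold;

    FailureCounter(int threshold) {
        this.threshold = threshold;
    }

    /** Record a failed send/receive; true once the threshold is reached. */
    boolean onFailure() {
        return errors.incrementAndGet() >= threshold;
    }

    /** Record a success; the counter never drops below zero. */
    void onSuccess() {
        errors.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    int current() {
        return errors.get();
    }
}
```

When `onFailure()` returns true, the caller would launch the recovery thread.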
After X consecutive failures, I launch a new thread,
called a RecoveryThread.
This thread simply invokes stop->init->start until it succeeds.
The recovery thread is set up as a singleton, i.e., only one can run at
any point in time.
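The singleton recovery loop could look roughly like this. It is only a sketch: `MembershipLifecycle` and `RecoveryThreadSketch` are hypothetical names, the retry interval is a parameter, and guarding the "only one at a time" rule with a static `AtomicBoolean` is an assumption, not necessarily how r567104 does it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical lifecycle view of the membership service (not Tomcat's API). */
interface MembershipLifecycle {
    void stop() throws Exception;
    void init() throws Exception;
    void start() throws Exception;
}

/** Sketch of a singleton recovery thread that loops stop -> init -> start. */
final class RecoveryThreadSketch extends Thread {
    // Ensures at most one recovery thread runs at any point in time.
    private static final AtomicBoolean RUNNING = new AtomicBoolean(false);

    private final MembershipLifecycle service;
    private final long retryMillis;

    private RecoveryThreadSketch(MembershipLifecycle service, long retryMillis) {
        super("Membership-Recovery");
        this.service = service;
        this.retryMillis = retryMillis;
        setDaemon(true);
    }

    /** Starts recovery unless one is already running; returns null in that case. */
    static RecoveryThreadSketch launch(MembershipLifecycle service, long retryMillis) {
        if (!RUNNING.compareAndSet(false, true)) {
            return null;
        }
        RecoveryThreadSketch t = new RecoveryThreadSketch(service, retryMillis);
        t.start();
        return t;
    }

    @Override
    public void run() {
        try {
            while (true) {
                try {
                    service.stop(); // best effort; the service may already be broken
                } catch (Exception ignore) {
                    // keep going: init/start decide whether we recovered
                }
                try {
                    service.init();
                    service.start();
                    return; // recovered
                } catch (Exception x) {
                    Thread.sleep(retryMillis); // wait before the next attempt
                }
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        } finally {
            RUNNING.set(false); // allow a future recovery to be launched
        }
    }
}
```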
I think you'll find that the solution in 6 is much simpler, as I
don't have to change any code in the existing membership stuff.
I had to pull some initialization out of the constructor and into the
init() method, but after that I could use stop/init/start
without changing the sender or receiver threads.
I also changed the logging a little bit: the error is only logged once
(after that, at debug level) to avoid filling up the logs.
The recovery thread will log every 5 seconds.
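One way to express this logging policy, as a sketch: `FailureLogPolicy` is a hypothetical helper, java.util.logging levels WARNING and FINE stand in for warn/debug, and the 5-second recovery interval is passed in (with the current time supplied by the caller) rather than hard-coded.

```java
import java.util.logging.Level;

/** Hypothetical helper deciding how loudly to log membership failures. */
final class FailureLogPolicy {
    private final long recoveryIntervalMillis;
    private boolean errorLogged = false;
    private long lastRecoveryLog = Long.MIN_VALUE; // "never logged yet"

    FailureLogPolicy(long recoveryIntervalMillis) {
        this.recoveryIntervalMillis = recoveryIntervalMillis;
    }

    /** WARNING for the first failure in a row, FINE (debug) afterwards. */
    Level levelForFailure() {
        if (!errorLogged) {
            errorLogged = true;
            return Level.WARNING;
        }
        return Level.FINE;
    }

    /** A success re-arms the one-time warning. */
    void onSuccess() {
        errorLogged = false;
    }

    /** True if the recovery thread may log at time nowMillis. */
    boolean shouldLogRecovery(long nowMillis) {
        if (lastRecoveryLog == Long.MIN_VALUE
                || nowMillis - lastRecoveryLog >= recoveryIntervalMillis) {
            lastRecoveryLog = nowMillis;
            return true;
        }
        return false;
    }
}
```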
So, to really answer your question after all my bla bla:
yes, the only option is to shut down the socket and start a new one.
But to get it done right, I rely on McastServiceImpl to do the
right thing during stop() and start(),
instead of recoding that into a new method.
Peter Rossbach wrote:
Hi Filip,
can you explain your 6.0.x fix
(http://issues.apache.org/bugzilla/show_bug.cgi?id=40042) a
little bit, please?
I think our only chance to recover membership after a cluster
membership send failure is to reopen the socket.
Here is my current cluster 5.5 fix:
==
public class SenderThread extends Thread {
    long time;
    McastServiceImpl service ;

    public SenderThread(long time, McastServiceImpl service) {
        this.time = time;
        this.service = service ;
        setName("Cluster-MembershipSender");
    }

    public void run() {
        long retry = 0 ;
        while ( doRun ) {
            try {
                send();
                retry = 0;
            } catch ( Exception x ) {
                // FIXME: Only increment if the network is really down:
                // NoRouteToHostException or BindException
                retry++ ;
                log.warn("Unable to send mcast message.",x);
            }
            if ( retry == 0 ) {
                // Normal case: wait one heartbeat interval before the next send
                try {
                    Thread.sleep(time);
                } catch ( InterruptedException ignore ) {}
            } else if ( retry * time < timeToExpiration ) {
                // Still inside the expiration window: reopen the socket after every failure
                try {
                    Thread.sleep(time);
                } catch ( InterruptedException ignore ) {}
                restartHeartbeat(retry);
            } else {
                // Past the expiration window: back off (1x..10x the interval)
                // and reopen the socket only on every 10th failure
                long recover = retry % 10 ;
                try {
                    Thread.sleep((recover+1)*time);
                } catch ( InterruptedException ignore ) {}
                if ( recover == 0 ) {
                    restartHeartbeat(retry) ;
                }
            }
        }
    }

    // Leave the multicast group, recreate the socket and rejoin the group
    private void restartHeartbeat(long retry) {
        try {
            socket.leaveGroup(address);
        } catch (IOException ignore) {}
        try {
            log.warn("Restarting membership heartbeat after send failure (recovery attempt " + retry + ")");
            service.setupSocket();
            socket.joinGroup(address);
        } catch (IOException ignore) {}
    }
}//class SenderThread
===
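The retry schedule in the SenderThread above can be factored into two pure functions, which makes it easier to check. This is a sketch: `HeartbeatBackoff` is a hypothetical name, and it assumes that after a successful send (retry == 0) the thread sleeps one normal heartbeat interval without reopening the socket.

```java
/** Hypothetical pure-function view of the SenderThread retry schedule. */
final class HeartbeatBackoff {
    private HeartbeatBackoff() {}

    /** Milliseconds to sleep before the next send attempt. */
    static long sleepMillis(long retry, long time, long timeToExpiration) {
        if (retry == 0 || retry * time < timeToExpiration) {
            return time; // normal interval, or still inside the expiration window
        }
        return (retry % 10 + 1) * time; // back off: 1x..10x the interval
    }

    /** Whether the multicast socket should be reopened after this attempt. */
    static boolean shouldRestart(long retry, long time, long timeToExpiration) {
        if (retry == 0) {
            return false; // last send succeeded
        }
        if (retry * time < timeToExpiration) {
            return true; // reopen after every failure inside the window
        }
        return retry % 10 == 0; // afterwards only on every 10th failure
    }
}
```

With time = 500 ms and timeToExpiration = 5000 ms, failures 1 to 9 reopen the socket every 500 ms; from failure 10 on, the socket is reopened only on every 10th failure, and the sleep grows from 1x to 10x the interval in between.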
Peter
On 17.08.2007 at 19:56, Filip Hanik - Dev Lists wrote:
Rainer Jung wrote:
Looks like an active weekend then ;)
I'm sorry, I just reread Friday. Friday next week is totally fine.
No one should have to work on a weekend.
Also, for the mcast problem, I'm implementing a fix in 6.0 and
6.x; you should be able to copy that one.
Filip
I think that will suffice.
Regards,
Rainer
Filip Hanik - Dev Lists wrote:
Sounds good, let's shoot for Tue or Wed next week then.
Filip
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
------------------------------------------------------------------------
No virus found in this incoming message.
Checked by AVG Free Edition. Version: 7.5.484 / Virus Database:
269.12.0/957 - Release Date: 8/16/2007 1:46 PM