Re: Rolling 5.5.25?

Peter Rossbach Fri, 17 Aug 2007 12:21:13 -0700

Hi Filip,

OK, but the second one is a real problem, and the first one you can fix ;-)
Could you fix it by calling checkExpire from the RecoveryThread?


Peter


On 17.08.2007 at 21:11, Filip Hanik - Dev Lists wrote:

There are a few drawbacks to my current implementation that I need to think about. These are:
1. I also reset the membership map; this should probably not be done at all.
2. During a failure, since I invoked stop to reset the thread, I am no longer sending out "member disappeared" messages, as the service is not running.
Filip

Filip Hanik - Dev Lists wrote:
hi Peter,
here is the SVN link
http://svn.apache.org/viewvc?view=rev&revision=567104
Basically what I do: in the receiver/sender thread, if an error happens, I increment a counter.
This counter also gets decremented upon success.
After X number of consecutive failures, I launch a new thread, called a RecoveryThread.
This thread simply invokes stop->init->start until it succeeds.
The recovery thread is set up as a singleton, i.e., only one can run at any point in time.
I think you'll find that the solution in 6 is much simpler, as I don't have to change any code in the existing membership stuff.
I had to pull out some initialization from the constructor into the init() method, but after that I could use stop/init/start without changing the sender or receiver threads.
I also changed the logging a little bit, only logging the error once (after that, logging at debug) to avoid filling up the logs.
The recovery thread will log every 5 seconds.
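
A minimal sketch of the counter-plus-recovery-thread idea described above. This is not the code from revision 567104: the names Restartable and RecoveryGuard, the threshold parameter (the "X" above), and the retry delay are all hypothetical and only illustrate the pattern.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a service that supports stop/init/start. */
interface Restartable {
    void stop() throws Exception;
    void init() throws Exception;
    void start() throws Exception;
}

/** Counts failures and launches at most one recovery thread at a time. */
final class RecoveryGuard {
    private final Restartable service;
    private final int threshold;      // assumed value for "X" failures
    private final long retryDelayMs;  // assumed pause between recovery attempts
    private final AtomicInteger failures = new AtomicInteger();
    private final AtomicBoolean recovering = new AtomicBoolean(false);

    RecoveryGuard(Restartable service, int threshold, long retryDelayMs) {
        this.service = service;
        this.threshold = threshold;
        this.retryDelayMs = retryDelayMs;
    }

    /** Sender/receiver loop calls this after a success: decrement, never below zero. */
    void onSuccess() {
        failures.updateAndGet(n -> Math.max(0, n - 1));
    }

    /** Sender/receiver loop calls this after an error. Returns the started thread, or null. */
    Thread onFailure() {
        if (failures.incrementAndGet() < threshold) {
            return null;
        }
        // compareAndSet makes the recovery thread a singleton.
        if (!recovering.compareAndSet(false, true)) {
            return null;
        }
        Thread t = new Thread(this::recover, "RecoveryThread-sketch");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Invokes stop -> init -> start until one full cycle succeeds. */
    private void recover() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    service.stop();
                    service.init();
                    service.start();
                    failures.set(0);
                    return;
                } catch (Exception e) {
                    try {
                        Thread.sleep(retryDelayMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        } finally {
            recovering.set(false);
        }
    }

    int failureCount() {
        return failures.get();
    }
}
```

In this sketch the sender and receiver loops only need to call onSuccess() and onFailure(); everything else stays inside the guard, which mirrors the point that the existing threads do not have to change.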

So to really answer your question after all my bla bla,
Yes, the only option is to shut down the socket and start a new one. But to get it done right, I rely on the McastServiceImpl to do the right thing during stop() and start(), instead of recoding that into a new method.

Filip

Peter Rossbach wrote:
Hi Filip,
can you explain your 6.0.x fix (http://issues.apache.org/bugzilla/show_bug.cgi?id=40042) a little bit, please?
I think our only chance to recover membership after a cluster membership send failure is to reopen the socket.
Here is my current cluster 5.5 fix:

==
    public class SenderThread extends Thread {
        long time;
        McastServiceImpl service ;
        public SenderThread(long time, McastServiceImpl service) {
            this.time = time;
            this.service = service ;
            setName("Cluster-MembershipSender");

        }
        public void run() {
            long retry = 0 ;
            while ( doRun ) {
                try {
                    send();
                    retry = 0;
                } catch ( Exception x ) {
                    // FIXME: Only increment when the network is really down: NoRouteToHostException or BindException
                    retry++ ;
                    log.warn("Unable to send mcast message.",x);
                }

                if(retry > 0) {
                    if(retry * time < timeToExpiration ) {
                        try {
                            Thread.sleep(time);
                        } catch ( Exception ignore ) {}
                        restartHeartbeat(retry);
                    } else {
                        long recover = retry % 10 ;
                        try {
                            Thread.sleep((recover+1)*time);
                        } catch ( Exception ignore ) {}
                        if( recover == 0) {
                            restartHeartbeat(retry) ;
                        }
                    }
                }
            }
        }

        private void restartHeartbeat(long retry) {
            try {
                socket.leaveGroup(address);
            } catch (IOException ignore) {}
            try {
                log.warn("Restarting membership heartbeat after send failure (number of recovery " + retry + ")");
                service.setupSocket();
                socket.joinGroup(address);
            } catch (IOException ignore) {}
        }

    }//class SenderThread
===
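
The retry schedule in the loop above can be read in isolation as a small pure function. The sketch below only restates that arithmetic. RetrySchedule and Step are hypothetical names, and the 500 ms / 3000 ms values in the usage note are made-up inputs, not Tomcat defaults.

```java
/** Restates the sleep/restart decision from the SenderThread loop above. */
final class RetrySchedule {

    /** One decision: how long to sleep and whether to restart the heartbeat afterwards. */
    static final class Step {
        final long sleepMs;
        final boolean restart;

        Step(long sleepMs, boolean restart) {
            this.sleepMs = sleepMs;
            this.restart = restart;
        }
    }

    /**
     * @param retry            number of consecutive send failures (greater than 0)
     * @param time             heartbeat interval in milliseconds
     * @param timeToExpiration member expiration time in milliseconds
     */
    static Step next(long retry, long time, long timeToExpiration) {
        if (retry * time < timeToExpiration) {
            // Early phase: sleep one interval and restart after every failure.
            return new Step(time, true);
        }
        // Late phase: sleep grows with (retry % 10); restart only on every 10th failure.
        long recover = retry % 10;
        return new Step((recover + 1) * time, recover == 0);
    }
}
```

With time = 500 and timeToExpiration = 3000, for example, failures 1 to 5 each sleep 500 ms and restart the heartbeat. Failure 6 sleeps 3500 ms without a restart, and failure 10 sleeps 500 ms and restarts again.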
peter



On 17.08.2007 at 19:56, Filip Hanik - Dev Lists wrote:
Rainer Jung wrote:
Looks like an active weekend then ;)
I'm sorry, I just reread Friday. Friday next week is totally fine. No one should have to work on a weekend.
Also, for the mcast problem, I'm implementing a fix in 6.0 and 6.x; you should be able to copy that one.
Filip
I think that will suffice.

Regards,

Rainer

Filip Hanik - Dev Lists wrote:
Sounds good, let's shoot for Tue or Wed next week then.

Filip
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
