Vadim:

The next time you see this, is it possible to check that the replicas
showing different index versions have the same documents? Actually, it
should be sufficient to verify that they have the same segments in
their data/index directory, and they should match the segments on the
leader _assuming_ you're not actively indexing and you stopped
indexing more than the polling interval ago.
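
A minimal sketch of that comparison, assuming you have already collected
the file names from each core's data/index directory (for example with ls
on each host). The function name and the sample file names are made up for
illustration; write.lock is skipped because it is not part of a segment.

```python
def compare_index_listings(leader_files, replica_files, ignore=("write.lock",)):
    """Return (missing_on_replica, extra_on_replica) as sorted lists of file names."""
    leader = {f for f in leader_files if f not in ignore}
    replica = {f for f in replica_files if f not in ignore}
    return sorted(leader - replica), sorted(replica - leader)


# Hypothetical listings; real ones would come from each core's data/index.
leader = ["segments_1bc", "_a1.cfs", "_a1.cfe", "_a1.si", "write.lock"]
replica = ["segments_xs", "_9z.cfs", "_9z.cfe", "_9z.si", "write.lock"]

missing, extra = compare_index_listings(leader, replica)
print("missing on replica:", missing)
print("extra on replica:", extra)
```

Two empty lists would mean the replica holds the same files as the leader.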

If you are actively indexing, it should be sufficient to check that
the questionable replica's index files are changing over time; that
would mean that replication is happening.
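
One way to check for change over time, as a rough sketch: take a snapshot
of the index directory, wait longer than the polling interval, take another
and compare. The helper names are hypothetical, and the demo uses a
temporary directory in place of a real data/index path.

```python
import os
import tempfile


def snapshot_index_dir(path):
    """Map each file name in `path` to its (size, mtime_ns)."""
    snap = {}
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                st = entry.stat()
                snap[entry.name] = (st.st_size, st.st_mtime_ns)
    return snap


def index_changed(before, after):
    """True if any file was added, removed, or modified between snapshots."""
    return before != after


# Self-contained demo: a temporary directory stands in for data/index.
with tempfile.TemporaryDirectory() as index_dir:
    before = snapshot_index_dir(index_dir)
    with open(os.path.join(index_dir, "_0.si"), "w") as f:
        f.write("x")  # simulate a new segment file arriving
    after = snapshot_index_dir(index_dir)
    print("index changed:", index_changed(before, after))
```

On a real replica you would call snapshot_index_dir on the core's
data/index directory twice, separated by more than one polling interval.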

And what's your commit interval? The polling interval on the followers is:
1> 1/2 the hard commit interval if defined to be > -1. If not
2> 1/2 the soft commit interval if defined to be > -1. If not
3> 3000ms
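
Spelled out as a small sketch, following only the rule as listed above (this
is not Solr's own code; the function and parameter names are made up, and -1
stands for "not defined"):

```python
def follower_poll_interval_ms(hard_commit_ms=-1, soft_commit_ms=-1):
    """Polling interval per the rule above: half the hard commit interval,
    else half the soft commit interval, else 3000 ms."""
    if hard_commit_ms > -1:
        return hard_commit_ms // 2
    if soft_commit_ms > -1:
        return soft_commit_ms // 2
    return 3000


print(follower_poll_interval_ms(hard_commit_ms=15000))  # hard commit defined
print(follower_poll_interval_ms(soft_commit_ms=1000))   # only soft commit defined
print(follower_poll_interval_ms())                      # neither defined
```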

There are two possibilities here as I see it.
1> this is just a reporting error, which we should still address but
doesn't worry me much.
2> the TLOG/PULL replication process has some bug and the indexes are
indeed different.
2a> when you reloaded the collection, it's possible that the startup
process kicked off a replication, and if there's really a bug,
reloading just masked it.

Best,
Erick
On Sun, Nov 11, 2018 at 2:34 AM Vadim Ivanov
<vadim.iva...@spb.ntk-intourist.ru> wrote:
>
> Reloading the collection helps!
> After reloading the collection, the generation and indexversion returned by
> ReplicationHandler catch up with the leader.
>
>
> > -----Original Message-----
> > From: Vadim Ivanov [mailto:vadim.iva...@spb.ntk-intourist.ru]
> > Sent: Sunday, November 11, 2018 1:09 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: Replicationhandler with TLOG replicas
> >
> > Thanks, Shawn
> > I have anticipated the answer about information returned by
> > ReplicationHandler.
> > What baffled me is that on most replicas the indexversion and generation
> > returned by ReplicationHandler are right and increase with commits.
> > But on some replicas they are not: they stopped changing at some moment
> > in the past and never changed again.
> > For example, I have 5 TLOG replicas.
> > For the leader (and all 3 good replicas),
> > http://host_n:8983/solr/core_n/replication?command=indexversion returns
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":0},
> >   "indexversion":1541885907200,
> >   "generation":1704}
> >
> > But for one replica:
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":0},
> >   "indexversion":1540842454653,
> >   "generation":1216}
> >
> > Could it be a sign of some hidden issue? Where is that information stored,
> > and why does it stop changing at some moment?
> > No indexing was going on in that collection at the moment of the request.
> > I'm "deltaimporting" that collection once per hour and only if needed,
> > so usually there are only 5-10 commits per day.
> > It's not a crucial issue for my use case, as I get adequate indexversion
> > and generation information from mbeans; I'm just curious about that
> > strange behavior.
> >
> > > -----Original Message-----
> > > From: Shawn Heisey [mailto:apa...@elyograg.org]
> > > Sent: Saturday, November 10, 2018 6:46 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Replicationhandler with TLOG replicas
> > >
> > > On 11/10/2018 8:05 AM, Vadim Ivanov wrote:
> > > > It seems the latter gets some wrong information, as its indexversion
> > > > and generation are far behind the leader's.
> > > > But the core index seems up to date and healthy.
> > > > Why could such things happen on some replicas? (Most of the replicas
> > > > returned the same information for both commands.)
> > > > Is the information returned by ReplicationHandler not applicable to
> > > > tlog/pull replicas, and is it not reliable?
> > >
> > > SolrCloud does not use the replication handler in the same way that
> > > master/slave replication does.  It "manually" initiates any replication
> > > that takes place -- the replication handler is not in charge.  You
> > > cannot be sure that the indexes the replication handler thinks are
> > > master and slave are in fact the indexes that will be replicated next.
> > > Just ignore anything that the replication handler tells you.  It may
> > > have absolutely no bearing on what's happening.
> > >
> > > Was indexing happening when you looked, or was it entirely stopped?  If
> > > indexing is ongoing, you may have seen the difference in the index
> > > versions in between data being indexed on the leader and the time that
> > > the replication is initiated.
> > >
> > > Thanks,
> > > Shawn
>
