Hmmm ... I was following this discussion but then got confused when Lisheng said to change Solr to "compromise consistency in order to increase availability" when your concern is "how long replica is behind leader". Seems you want more consistency, not less, in this case? One of the reasons behind Solr's leader election approach is to achieve low-latency eventual consistency (Mark's term from the linked discussion).
Uncommitted docs are only visible if you use real-time get, in which case the request is served by the shard leader (or replica) from its update log. I suppose there's a chance of a few millis between the leader having the request in its tlog and the replica having the doc in its tlog, but that seems like the nature of the beast. Meaning that Solr never promised to be 100% consistent at millisecond granularity in a distributed model: any small time window between what a leader has and what a replica has is probably network latency, which you should solve outside of Solr. I suspect you could direct all your real-time get requests to leaders only using some smart client like CloudSolrServer if it mattered that much.

Otherwise, all other queries require the document to be committed to be visible. I suppose there is a very small window when a new searcher is open on the leader and the new searcher is not yet open on the replica. However, with soft commits, that too seems like a milli or two based on network latency.

@Shawn - yes, I've actually seen this work in my cluster. We lose replicas from time to time and indexing keeps on trucking.

On Thu, Apr 11, 2013 at 4:51 PM, Zhang, Lisheng <lisheng.zh...@broadvision.com> wrote:

> Hi Otis,
>
> Thanks very much for helps, your explanation is very clear.
>
> My main concern is not the return status for indexing calls (although
> which is also important), my main concern is how long replica is
> behind the leader (or putting in your way, how consistent search
> picture is to client A and B).
>
> Our application requires clients see same result whether he hits
> leader or replica, so it seems we do have a problem here. If no better
> solution I may consider to change solr4 a little (I have not read
> solr4x fully yet) to compromise consistency (C) in order to increase
> availability (A), on a high level do you see serious problems in this
> approach (I am familiar with lucene/solr code to some extent)?
>
> Thanks and best regards, Lisheng
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> Sent: Thursday, April 11, 2013 2:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud leader to replica
>
> But note that I misspoke, which I realized after re-reading the thread
> I pointed you to. Mark explains it nicely there:
> * the index call returns only when (and IF!) indexing to all replicas
> succeeds
>
> BUT, that should not be mixed with what search clients see!
> Just because the indexing client sees the all or nothing situation
> depending on whether indexing was successful on all replicas does NOT
> mean that search clients will always see a 100% consistent picture.
> Client A could hit the leader and see a newly indexed document, while
> client B could query the replica and not see that same document simply
> because the doc hasn't gotten there yet, or because soft commit hasn't
> happened just yet.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Thu, Apr 11, 2013 at 4:39 PM, Zhang, Lisheng
> <lisheng.zh...@broadvision.com> wrote:
> > Thanks very much for your helps!
> >
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> > Sent: Thursday, April 11, 2013 1:23 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: SolrCloud leader to replica
> >
> > Yes, I *think* that is the case. Some distributed systems have the
> > option to return success to caller only after data has been
> > added/indexed to N other nodes, but I think Solr doesn't have this
> > yet. Somebody please correct me if I'm wrong.
> >
> > See: http://search-lucene.com/?q=eventually+consistent&fc_project=Solr
> >
> > Otis
> >
> > On Thu, Apr 11, 2013 at 12:51 PM, Zhang, Lisheng
> > <lisheng.zh...@broadvision.com> wrote:
> >> Hi Otis,
> >>
> >> Thanks very much for the quick help! We are considering to upgrade
> >> from solr 3.6 to 4x and use solrCloud, but we are concerned about
> >> performance related to replica? In this scenario it seems that the
> >> replica would be a few seconds beyond leader because replica would
> >> start indexing only after leader finishes his?
> >>
> >> Thanks and best regards, Lisheng
> >>
> >> -----Original Message-----
> >> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> >> Sent: Thursday, April 11, 2013 8:11 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: SolrCloud leader to replica
> >>
> >> I believe it indexes locally on leader first. Otherwise one could end
> >> up with a situation where indexing to replica(s) succeeds and indexing
> >> to leader fails, which I suspect might create a mess.
> >>
> >> Otis
> >>
> >> On Thu, Apr 11, 2013 at 2:53 AM, Zhang, Lisheng
> >> <lisheng.zh...@broadvision.com> wrote:
> >>> Hi,
> >>>
> >>> In solr 4x solrCloud, suppose we have only one shard and
> >>> two replica, when leader receives the indexing request,
> >>> does it immediately forward request to two replicas or
> >>> it first indexes request itself, then sends request to its
> >>> two replica?
> >>>
> >>> Thanks very much for helps, Lisheng
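
For what it's worth, here is a rough sketch of what a leader-only real-time get request could look like if you build the URL yourself instead of going through a smart client. Treat it as an illustration under assumptions, not a recipe: the host name, the collection name and the helper `realtime_get_url` are made up for the example, it assumes the real-time get handler is registered at `/get` on the core, and it assumes `distrib=false` keeps the request on the core that receives it, which you should verify against your Solr version. Finding out which node is currently the leader is not shown.

```python
from urllib.parse import urlencode


def realtime_get_url(core_url, doc_id, local_only=True):
    """Build a real-time get URL aimed at one specific core.

    core_url   -- base URL of the core you believe is the shard leader
                  (hypothetical value below; leader lookup is not shown)
    doc_id     -- unique key of the document to fetch
    local_only -- if True, add distrib=false (assumed here to keep the
                  request on the receiving core; verify for your version)
    """
    params = [("id", doc_id), ("wt", "json")]
    if local_only:
        params.append(("distrib", "false"))
    return core_url.rstrip("/") + "/get?" + urlencode(params)


# Hypothetical leader core URL, for illustration only.
leader_core = "http://leader-host:8983/solr/collection1"
print(realtime_get_url(leader_core, "doc-42"))
```

Sending that URL with any HTTP client and comparing the result with the same request against the replica's core would give you a direct look at how large the window between leader and replica really is in your setup.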