My user interface shows some boxes to describe result categories. After
half a day of small updates and deletes I noticed, with various queries,
that the boxes started swapping while browsing.
I certainly relied too much on getting the same results on each call; now
I'm keeping the category order in the request parameters to avoid the
blink effect while browsing.

The optimize process is really slow, and I can't use it. Since I have many
other parameters that must be carried along with the request to keep the
navigation consistent, I would like to understand whether there is a
setting that can limit the IDF drift and keep it low enough.
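
To get a feeling for the size of the effect, here is a toy calculation
(my assumption, based on Lucene's classic idf = 1 + ln(maxDoc /
(docFreq + 1)), where deleted documents still count until their segments
are merged away): replica A has maxDoc = 100000, replica B still carries
500 deleted copies so maxDoc = 100500; with docFreq = 2000 on both,
idf_A = 1 + ln(100000/2001) ≈ 4.912 while idf_B = 1 + ln(100500/2001) ≈
4.917: small, but enough to swap closely scored categories.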

I tried with

<indexConfig>
  <mergeFactor>5</mergeFactor>
</indexConfig>
in solrconfig.xml, but this morning /solr/admin/cores?action=STATUS still
reports more than ten segments for every core of the shard. (I'm sure I
reloaded each core after changing the value.)
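
If I've read the TieredMergePolicy documentation correctly, mergeFactor
is nowadays just shorthand for its maxMergeAtOnce and segmentsPerTier
settings, and segmentsPerTier limits the segments per tier, not the
total, so a count above ten may still be expected. The explicit
equivalent should be something like:

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">5</int>
    <int name="segmentsPerTier">5</int>
  </mergePolicy>
</indexConfig>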

Now I'm trying expungeDeletes called from SolrJ, but I still don't see
the segment count decrease:

UpdateRequest commitRequest = new UpdateRequest();
// (action, waitFlush, waitSearcher, maxSegments, softCommit, expungeDeletes)
commitRequest.setAction(ACTION.COMMIT, true, true, 10, false, true);
commitRequest.process(solrServer);
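
If I read the SolrJ javadoc correctly, expungeDeletes on a COMMIT only
merges away segments that actually contain deletions, and maxSegments is
only honored by an OPTIMIZE; so, to really force the segment count down,
the call would have to be something like this (a sketch, with my own
variable name, against the same solrServer):

UpdateRequest optimizeRequest = new UpdateRequest();
// (action, waitFlush, waitSearcher, maxSegments)
optimizeRequest.setAction(ACTION.OPTIMIZE, true, true, 10);
optimizeRequest.process(solrServer);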



2014-10-22 15:48 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:

> I would rather ask whether such small differences matter enough to
> do this. Is this something users will _ever_ notice? Optimization
> is quite a heavyweight operation and is generally not recommended
> on indexes that change often, and optimizing every 5 minutes is
> certainly far more frequent than recommended.
>
> There is/has been work done on "distributed IDF" that should
> address this (I think), but I don't quite know its current status.
>
> But other than in a test setup, is it worth the effort?
>
> Best,
> Erick
>
> On Wed, Oct 22, 2014 at 3:54 AM, Giovanni Bricconi
> <giovanni.bricc...@banzai.it> wrote:
> > I have made some small patches to the application to make this problem
> > less visible, and I'm trying to perform the optimize once per hour;
> > yesterday it took 5 minutes, this morning 15 minutes. Today I will
> > collect some statistics, but the publication process sends documents
> > every 5 minutes, and I think the optimize is taking too much time.
> >
> > I have no default mergeFactor configured for this collection; do you
> > think that setting it to a small value could improve the situation? If
> > I have understood correctly, having to merge segments will keep similar
> > stats on all nodes. It's OK if the indexing process gets a little
> > slower.
> >
> >
> > 2014-10-21 18:44 GMT+02:00 Erick Erickson <erickerick...@gmail.com>:
> >
> >> Giovanni:
> >>
> >> To see how this happens, consider a shard with a leader and two
> >> followers. Assume your autocommit interval is 60 seconds on each.
> >>
> >> This interval can expire at slightly different "wall clock" times.
> >> Even if the servers started perfectly in sync, they can get slightly
> >> out of sync. So, you index a bunch of docs and these replicas close
> >> the current segment and open a new segment with slightly different
> >> contents.
> >>
> >> Now docs come in that replace older docs. The tf/idf statistics
> >> _include_ deleted-document data (which is purged on optimize). Given
> >> that doc X can be in different segments (or, more accurately, in
> >> segments that get merged at different times on different machines),
> >> replica 1 may have slightly different stats than replica 2, thus
> >> computing slightly different scores.
> >>
> >> Optimizing purges all data related to deleted documents, so it all
> >> regularizes itself on optimize.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Oct 21, 2014 at 11:08 AM, Giovanni Bricconi
> >> <giovanni.bricc...@banzai.it> wrote:
> >> > I noticed the problem again, and this time I was able to collect
> >> > some data. In my paste http://pastebin.com/nVwf327c you can see the
> >> > result of the same query issued twice; the 2nd and 3rd groups are
> >> > swapped.
> >> >
> >> > I pasted also the clusterstate and the core state for each core.
> >> >
> >> > The logs didn't show any problem related to indexing, only some
> >> > malformed queries.
> >> >
> >> > After doing an optimize the problem disappeared.
> >> >
> >> > So, is the problem related to documents that were deleted from the
> >> > index?
> >> >
> >> > The optimization took 5 minutes to complete.
> >> >
> >> > 2014-10-21 11:41 GMT+02:00 Giovanni Bricconi <giovanni.bricc...@banzai.it>:
> >> >
> >> >> Nice!
> >> >> I will monitor the index and try this if the problem comes back.
> >> >> Actually the problem was due to small differences in score, so I
> >> >> think the problem has the same origin.
> >> >>
> >> >> 2014-10-21 8:10 GMT+02:00 lboutros <boutr...@gmail.com>:
> >> >>
> >> >>> Hi Giovanni,
> >> >>>
> >> >>> We had this problem as well.
> >> >>> The cause was that the different nodes had slightly different idf
> >> >>> values.
> >> >>>
> >> >>> We solved it by doing an optimize operation, which really removes
> >> >>> the deleted data.
> >> >>>
> >> >>> Ludovic.
> >> >>>
> >> >>>
> >> >>>
> >> >>> -----
> >> >>> Jouve
> >> >>> France.
> >> >>>
> >> >>
> >> >>
> >>
>
