Re: Issue paging when sorting on a Date field

Shawn Heisey Tue, 20 May 2014 11:21:12 -0700

On 5/19/2014 2:05 PM, Bryan Bende wrote:
> Using Solr 4.6.1 and in my schema I have a date field storing the time a
> document was added to Solr.
> 
> I have a utility program which:
> - queries for all of the documents in the previous day sorted by create date
> - pages through the results keeping track of the unique document ids
> - compares the total number of unique doc ids to numFound to see if
> they match
> 
> I've noticed that if I use a page size larger than the number of documents
> for the given day (aka get everything in one query), then everything works
> as expected (results sorted correctly, unique doc ids size == numFound).
> 
> However, when I use a smaller page size, say 10 rows per page, I randomly
> see cases where the last document of a page will be duplicated as the first
> document of the next page, even though the "start" and "rows" parameters
> increased correctly. So I might see something like numFound=100 but unique
> doc ids is 97, and then I see three occurrences where the last doc id on a
> page was also the first on the next page.


This *sounds* like a situation where you have a sharded index that has
the same uniqueKey value in more than one shard.  This situation will
cause Solr to behave in a way that looks completely unpredictable.

There is no way for Solr to deal with this problem in a way that would
not consume large amounts of real time, CPU time, and RAM ... so the only
thing Solr does about it is remove duplicates from the actual results
returned -- which is actually how the discrepancies occur.
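
As a rough illustration of how that discrepancy can arise, here is a toy
model in Python. It is NOT Solr's actual merge code; the shard contents,
the merged_view function, and the assumption that the total is a plain sum
of per-shard counts are all made up for the sketch.

```python
# Toy model only: an illustration of the idea, not Solr's real merge logic.
def merged_view(shards):
    """shards: list of lists of (doc_id, sort_value) tuples (hypothetical data)."""
    # Assumption: the reported total is the sum of per-shard hit counts,
    # so a key present in two shards is counted twice.
    num_found = sum(len(shard) for shard in shards)

    # Merge all shard hits by sort value, then drop repeated keys
    # from the results that are actually returned.
    all_hits = sorted(
        (hit for shard in shards for hit in shard),
        key=lambda hit: (hit[1], hit[0]),
    )
    seen = set()
    returned_ids = []
    for doc_id, _sort_value in all_hits:
        if doc_id not in seen:
            seen.add(doc_id)
            returned_ids.append(doc_id)
    return num_found, returned_ids


shard1 = [("a", 1), ("b", 2)]
shard2 = [("b", 2), ("c", 3)]  # "b" exists in both shards
num_found, returned_ids = merged_view([shard1, shard2])
```

In this sketch the total counts "b" twice while the returned list contains
it once, so the count and the number of unique ids disagree.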

If you are absolutely sure that you are not running into the duplicate
document problem I described, then I am not sure what's going on.  It
might be related to the sort, and if that's true, adding a second sort
parameter using your uniqueKey field might be a solution.
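
A minimal sketch of what that second sort clause could look like when
building the request parameters in Python. The field names create_date and
id, the q value, and the page_params helper are assumptions for
illustration; substitute your own date field, uniqueKey field, and query.

```python
from urllib.parse import urlencode


def page_params(start, rows, date_field="create_date", key_field="id"):
    """Build query parameters for one page (hypothetical helper).

    The uniqueKey field is added as a secondary sort so that documents
    with identical date values always come back in the same order.
    """
    return {
        "q": "*:*",  # placeholder query; use your real date-range query here
        "sort": f"{date_field} asc, {key_field} asc",
        "start": start,
        "rows": rows,
    }


# Parameters for the second page of 10 rows.
query_string = urlencode(page_params(start=10, rows=10))
```

With only the date in the sort, documents that share the same timestamp
have no defined relative order, so one of them could plausibly land on
either side of a page boundary from one request to the next.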

Thanks,
Shawn
