I worked on something similar a couple of years ago, but didn’t continue work on it in the end.
I've included the text of my original mail.
If you're interested, I could try to find the sources I was working on at the time.

Luc

In Solr 4.7 an exciting new feature was added that allows one to page through a complete result set without having to worry about missing or duplicate results at page boundaries, while keeping resource utilization low.

I have a common use case with similar performance and consistency problems that could be solved by extending the way CursorMarks work:

A. The user executes a search and obtains thousands of results, of which he sees the first 'page'. Apart from scrolling through the list, he also has a scrollbar (or paging controls) to jump to anywhere in the list.
B. The user uses the scrollbar to jump to an arbitrary place in the list.
C. The user scrolls down a bit (but past the current 'page') to find what he's looking for.
D. The user realizes he's too far down and scrolls up a bit again (but before the current 'page' again...)

(Yes, I know that users should be educated to refine their search, but unfortunately, if the client for which the application is developed specifies that it should be possible to use it this way...)

For the moment this is implemented by using the start/rows parameters to get the appropriate 'page', and this has the disadvantages that cursorMark solves:

- Solr (actually I use Lucene directly, but that doesn't matter here) needs to store *all* documents up to document (start+rows) to be able to return just the rows requested. Except for step A (where start==0), this may be a huge performance hit.
- If the index is modified concurrently (especially when using NRT), jumping to the next/previous page can cause documents to be repeated or skipped at page boundaries (as explained in https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results)

Here's the way an extension to the cursorMark system could solve the problem:

A. start=0, rows=n, cursorMark=*: Solr/Lucene executes the search and returns the total number of hits and the requested number of top documents.
B. start=x, rows=n, cursorMark=*: Here Solr should allow combining both start!=0 and cursorMark=*. It should execute a normal request using start=x and rows=n and add two cursorMarks: one corresponding to the sort values of the first document and one corresponding to the sort values of the last document.
C. Use cursorMark to get the 'next' pages: This is the same way cursorMark works for the moment: the user passes the cursorMark corresponding to the sort values of the last document.
D. Use the cursorMark corresponding to the sort values of the first document to get the 'previous' pages.

In terms of implementing these changes, I've been looking at the source code and already did the easy ones :)

- If a cursorMark is passed (either cursorMark=* or a 'real' value), Solr should return two cursorMarks in the result: nextCursorMark as before and prevCursorMark corresponding to the sort values of the first document. Done.
- start!=0 and cursorMark=* should no longer be mutually exclusive (but start!=0 and cursorMark!=* should). Done.
- When returning a result using a cursorMark, the start value returned should correspond to the actual position of the first document in the full result set. For the next page, this is equal to the number of documents skipped during processing, but unfortunately I didn't see a way (yet) to pass that information along everywhere. This start value, together with the (possibly changed) numFound value, can be used in the GUI to adjust the position of the scrollbar or the paging controls accordingly without having to estimate it.
- Implementing reverse paging could actually be easier than it sounds: internally reverse the sort order (really reversing, not just swapping ASC/DESC!), use the cursor as in the normal case, and afterwards reverse the obtained list of documents. I've updated PagingFieldCollector in TopFieldCollector.java by negating the values in reverseMul and overriding topDocs(start, howMany), but I still have to check everywhere partial results are merged as well...
- Implement as many test cases for the paging-up case as exist for the paging-down case (help! :)

While working on the code, I thought of another use case as well: refreshing the current page. Instead of passing the same start value again, the prevCursorMark could be passed, but with a hint that the documents on or after this cursorMark should be returned.

Which brings me to the question of how to specify the new behavior to Solr without affecting the current behavior. I propose that prevCursorMark and nextCursorMark simply encode the sort values for the first and last document (as nextCursorMark does now) and that a simple prefix is used when cursorMark should be used differently:

- ">": documents after the cursor position: use with nextCursorMark to get the next page of results
- ">=": documents after or on the cursor position: use with prevCursorMark to refresh the same page, keeping the same sort position for the first document
- "<": documents before the cursor position: use with prevCursorMark to get the previous page of results
- "<=": documents before or on the cursor position: use with nextCursorMark to get the same page, keeping the same sort position for the last document (for completeness, useful?)

So if prevCursorMark was "ABC" and nextCursorMark was "DEF",

- "<ABC" would return the previous page
- ">DEF" or "DEF" would return the next page
- ">=ABC" would return the same page (but with 'fresh' values/documents), keeping the 'visual' position the same

I'd appreciate any comments on this, or to hear if anyone else has already started work on similar changes.
In the meantime I'll continue working on what I have and check how I can make my changes available (through a patch attached to a new issue in Jira?)
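
To make the proposed prefix semantics concrete, here is a minimal Python sketch. It is not Solr code: `parse_cursor_mark` and `page` are hypothetical names, the result set is modelled as an in-memory sorted list, and the cursor token is the plain sort value rather than Solr's encoded cursorMark string. It also assumes a token never starts with "<" or ">" itself.

```python
import bisect

# Two-character prefixes must be tested before their one-character forms.
PREFIXES = (">=", "<=", ">", "<")


def parse_cursor_mark(mark):
    """Split a prefixed mark into (operator, token).

    A mark without a prefix behaves like ">" (the current cursorMark behavior).
    """
    for prefix in PREFIXES:
        if mark.startswith(prefix):
            return prefix, mark[len(prefix):]
    return ">", mark


def page(sorted_keys, mark, rows):
    """Return (start, keys) for the page selected by a prefixed mark.

    sorted_keys stands in for the full result set in sort order.
    start is the position of the first returned key in that result set.
    """
    op, token = parse_cursor_mark(mark)
    if op in (">", ">="):
        # Forward paging: begin strictly after the token, or on it.
        if op == ">":
            start = bisect.bisect_right(sorted_keys, token)
        else:
            start = bisect.bisect_left(sorted_keys, token)
        return start, sorted_keys[start:start + rows]
    # Backward paging: end strictly before the token, or on it,
    # then keep the last `rows` keys in their normal sort order.
    if op == "<":
        end = bisect.bisect_left(sorted_keys, token)
    else:
        end = bisect.bisect_right(sorted_keys, token)
    start = max(0, end - rows)
    return start, sorted_keys[start:end]


# Hypothetical result set; the current page is ["c", "d", "e"],
# so prevCursorMark would be "c" and nextCursorMark would be "e".
keys = ["a", "b", "c", "d", "e", "f", "g", "h"]
next_page = page(keys, ">e", 3)
prev_page = page(keys, "<c", 3)
same_page = page(keys, ">=c", 3)
```

The returned start value is the position that a GUI could use to place the scrollbar, which is the information I could not yet pass along in the real implementation.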
Luc Vanlerberghe

-----Original Message-----
From: Steve Rowe [mailto:sar...@gmail.com]
Sent: Tuesday, 22 March 2016 16:37
To: solr-user@lucene.apache.org
Subject: [Possibly spoofed] Re: Paging and cursorMark

Hi Tom,

There is an outstanding JIRA issue to directly support what you want (with a patch even!) but no work on it recently: <https://issues.apache.org/jira/browse/SOLR-6635>. If you’re so inclined, please pitch in: bring the patch up-to-date, test it, contribute improvements, etc.

--
Steve
www.lucidworks.com

> On Mar 22, 2016, at 10:27 AM, Tom Evans <tevans...@googlemail.com> wrote:
>
> Hi all
>
> With Solr 5.5.0, we're trying to improve our paging performance. When
> we are delivering results using infinite scrolling, cursorMark is
> perfectly fine - one page is followed by the next. However, we also
> offer traditional paging of results, and this is where it gets a
> little tricky.
>
> Say we have 10 results per page, and a user wants to jump from page 1
> to page 20, and then wants to view page 21, there doesn't seem to be a
> simple way to get the nextCursorMark. We can make an inefficient
> request for page 20 (start=190, rows=10), but we cannot give that
> request a cursorMark=* as it contains start=190.
>
> Consequently, if the user clicks to page 21, we have to continue along
> using start=200, as we have no cursorMark. The only way I can see to
> get a cursorMark at that point is to omit the start=200, and instead
> say rows=210, and ignore the first 200 results on the client side.
> Obviously, this gets more and more inefficient the deeper we page - I
> know that internally to Solr, using start=200&rows=10 has to do the
> same work as rows=210, but less data is sent over the wire to the
> client.
>
> As I understand it, the cursorMark is a hash of the sort values of the
> last document returned, so I don't really see why it is forbidden to
> specify start=190&rows=10&cursorMark=* - why is it not possible to
> calculate the nextCursorMark from the last document returned?
>
> I was also thinking a possible temporary workaround would be to
> request start=190&rows=10, note the last document returned, and then
> make a subsequent query for q=id:"<last doc id>"&rows=1&cursorMark=*.
> This seems to work, but means an extra Solr query for no real reason.
> Is there any other problem to doing this?
>
> Is there some other simple trick I am missing that we can use to get
> both the page of results we want and a nextCursorMark for the
> subsequent page?
>
> Cheers
>
> Tom