Re: Navigation/Paging

Sebastian Riemer Wed, 14 Mar 2018 00:56:45 -0700

Dear Shawn,

thank you so much for taking the time for this detailed answer! It helps me 
very much and I'm very grateful.

1) As you've suggested, we already load the data for detail pages from our 
relational db, just using the documentId from Solr to look it up. 
2) Our index size won't ever reach the millions of records that are common in 
other users' scenarios. Currently, 60000 documents is the largest search result 
a single client can ever get when not specifying _any_ filter criteria. 

-> I'll have to think about whether to prevent the user from deep paging into 
big search results, or just take a possible performance hit (as you've pointed 
out, usually a typical user won't page further than a couple of pages).  The 
same goes for jumping to the very end of a search result. Currently I kind of 
like this feature so I'll try to keep it in.

For retrieving the previous/next documentId when I'm at the start/end of the 
current page, I'll use the approach you (and Rick) suggested - thanks!

Best wishes,

Sebastian

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Wednesday, 14 March 2018 00:19
To: solr-user@lucene.apache.org
Subject: Re: Navigation/Paging

On 3/13/2018 10:26 AM, Sebastian Riemer wrote:
> However, now we want to introduce a similar navigation in our detail views, 
> where only ever one document is displayed. Again, the navigation bar looks 
> like this:
>
> << First   < Prev   1 - 15 of 62181   Next >   Last >>
>
> But now, Prev / Next shall open up the previous / next _document_ instead of 
> the next page. The same goes for First and Last, it shall open the first / 
> last _document_ not the page.
>
> Our first approach to this was to simply add the param "fl=id" so we only get 
> the IDs of documents and set page size to ALL (i.e. no restriction on param 
> "rows"). That way, it was easy to extract the current document id from the 
> result list, and check which id was preceding and succeeding the current id, 
> as well as getting the very first id and the very last id, in order to render 
> the navigation bar.
>
> This led to Solr being under heavy load, since it must load 62181 documents 
> (in this example) in order to return the ids. I somehow thought this would be 
> easy for Solr to do, but it isn't.

This will indeed be very slow.  And you only have 62181 documents in your 
result set, which is pretty easy for Solr to handle.  For a search that has 100 
million results, this approach is *impossible*.  I do have searches like this 
on my index, and my index is not all that big compared to some of the indexes 
that the community has built.

> Our second approach was to simply keep the same values for the params "start" 
> and "rows", since the user is always selecting a document from the list and 
> the selected document is therefore already within the page. However, the edge 
> cases are when the selected document is the very first or the very last one 
> on the page, so the previous or next document id is not within the page 
> result from Solr. I guess we could handle this by simply checking and sending 
> a second query where the param "start" would be adjusted accordingly.

Detail pages often include information that you do not want to store in Solr.  
A well-tuned Solr install will have responses that contain everything that the 
application needs to build a search result grid, but for really detailed 
information, the application should probably be using the id information 
received from Solr to go to the main data repository and retrieve full details.

Additionally, you should not allow the user to navigate to the last page or to 
navigate to the last document, or even a page/document anywhere near the end of 
the resultset.  The reason for this is that really high start values are a 
serious performance killer.  61K is definitely a start value high enough to see 
performance drops.  If the user tries to page too deeply into results, your 
application should simply refuse to go any further.  For comparison purposes -- 
the last time I checked how deeply Google would let me go into a search result, 
I could get to page 39, but no further.  The number of results for my search 
was MILLIONS, but Google wouldn't let me view them all.  The performance issues 
for deep paging are universal for search engines, especially when it is 
possible to jump to an arbitrary page number.

I recommend limiting how many results a user can page through to about 5000 or 
10000.  If there are 50 results per page, this allows them to get to at least 
page 99.  In general, most users of search engines will never go deeper 
than about page 3.  There are some kinds of applications where a typical user 
might visit the first few dozen pages ... but anything deeper is NOT common.  
If you have an atypical user, they are probably prepared for large page numbers 
to take a lot longer to load. The main reason you should be limiting how deep 
users can go is that when one user is going thousands of documents into a 
result set, performance of the other queries on the system CAN drop 
dramatically.
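
As a rough illustration of such a limit, the application could translate a requested page number into "start"/"rows" values and refuse anything past a configured depth. This is only a sketch: the name `clamp_page` and the default cap of 5000 are assumptions made for the example, not anything Solr provides.

```python
MAX_DEPTH = 5000  # assumed cap on how many results a user may page through


def clamp_page(page, rows, num_found, max_depth=MAX_DEPTH):
    """Map a 1-based page number to (start, rows) for Solr.

    Returns None when the page lies beyond the allowed depth or beyond
    the end of the result set, so the caller can refuse the request.
    """
    if page < 1 or rows < 1:
        raise ValueError("page and rows must be positive")
    reachable = min(num_found, max_depth)  # results the user may see
    start = (page - 1) * rows
    if start >= reachable:
        return None
    # Shorten the final page so it never reaches past the cap.
    return start, min(rows, reachable - start)
```

With 50 results per page and the cap at 5000, page 100 maps to start=4950 and page 101 is refused.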

> However I would not know how to retrieve the id of the very first 
> document and the very last document (except for executing separate 
> queries with I guess start=0, rows=1 and start=62181 and rows=1)

When you display a page of results, your application already has the N document 
IDs that Solr returned for that page.  Using that information, you can navigate 
through the documents one at a time. Then if you reach the end of 
what you have on that page, you can issue another query for the next page or 
the previous page.  If you are restricting how deep a user can go, the 
performance of this approach should be pretty good.
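
A minimal sketch of that navigation logic, assuming the application keeps the ID list of the current page together with its "start" offset and the total hit count (numFound). The function name `neighbours` and the callback `fetch_ids(start, rows)` are hypothetical; the callback stands in for whatever code sends the query to Solr with fl=id and returns the list of IDs. For brevity it fetches a single ID at the page edge rather than a whole adjacent page.

```python
def neighbours(page_ids, page_start, num_found, current_id, fetch_ids):
    """Return (previous_id, next_id) for current_id; None where there is none.

    page_ids   -- IDs of the page currently held by the application
    page_start -- the "start" value that was used to fetch that page
    num_found  -- total number of hits reported by Solr
    fetch_ids  -- hypothetical callback: fetch_ids(start, rows) -> list of IDs
    """
    i = page_ids.index(current_id)
    pos = page_start + i  # absolute 0-based position in the result set

    if i > 0:
        prev_id = page_ids[i - 1]           # still on the current page
    elif pos > 0:
        prev_id = fetch_ids(pos - 1, 1)[0]  # edge case: one extra query
    else:
        prev_id = None                      # very first document

    if i < len(page_ids) - 1:
        next_id = page_ids[i + 1]
    elif pos + 1 < num_found:
        next_id = fetch_ids(pos + 1, 1)[0]
    else:
        next_id = None                      # very last document

    return prev_id, next_id
```

Only the first and last document of a page cost an extra query; everything else is answered from the IDs already in hand. Note that positions are 0-based, so the last document of a 62181-hit result sits at start=62180.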

> For any query and a documentId (of which it is known it is within the query 
> result), what is a simple and efficient enough way, to get the following 
> navigational information:
>
> -          Previous document Id
>
> -          Next document id
>
> -          First document id
>
> -          Last document id

Having this information available ahead of time for every document is nearly 
impossible.

The values for each document will depend on the sort you use.  Change the sort, 
and all the values will be wrong.  And if you delete documents or add 
documents, those values will likely change, and the values for an individual 
document could change several times per second.  Solr cannot automatically 
provide this information, and it is pretty much impossible to have accurate and 
up to date information if you calculate it at index time and add it yourself.

Side note:  When sorting by relevance score, which is the default sort order, 
changing the query also changes the sort.

----

Note that there *is* a Solr solution for the performance problems of deep 
paging ... but cursorMark (the name of the feature) does not support jumping 
directly to an arbitrary page number.  If you want page 25000 when using 
cursorMark, you have to retrieve the first 24999 pages before you will have the 
cursor value for page 25000.  But once you HAVE that value, 
retrieving page 25000 will be just as fast as page 1, which is definitely not 
the case when using start/rows to get pages.

https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
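
For illustration, a cursorMark loop might look roughly like the sketch below. The first request sends cursorMark=*, each response carries a nextCursorMark to send with the next request, and the iteration is finished when Solr returns the same cursor value that was sent. The sort has to include the uniqueKey field as a tie-breaker, and "start" must not be used. The names `iter_pages` and `search` are assumptions made for the example: `search(params)` stands in for whatever client code sends the parameters to the select handler and returns the parsed JSON response.

```python
def iter_pages(query, sort, rows, search):
    """Yield one list of documents per page, walking the result set by cursor.

    search -- hypothetical callback: search(params) -> parsed JSON response
    sort   -- must include the uniqueKey field, e.g. "score desc, id asc"
    """
    cursor = "*"  # special value that starts a new cursor
    while True:
        params = {"q": query, "sort": sort, "rows": rows, "cursorMark": cursor}
        resp = search(params)
        docs = resp["response"]["docs"]
        if docs:
            yield docs
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:
            break  # same cursor returned: nothing left to fetch
        cursor = next_cursor
```

Because each page needs the cursor from the one before it, this suits "Next" style navigation or full exports, not a jump straight to the last page.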

Newer versions of Solr also have things like the export handler and streaming 
expressions, which are designed to provide REALLY large result sets without 
putting major load on the server.  Very large result sets do still take a lot 
of TIME, so they're only usable for offline activities like research and data 
mining, not live usage in an application.  But they won't kill the server when 
they are used.  I do not know how to use these features, but information is 
available in the Solr Reference Guide.

Thanks,
Shawn
