Have you read Hossman's blog here? https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#referrer=solr.pl
And how to use it here?
http://wiki.apache.org/solr/CommonQueryParameters#Deep_paging_with_cursorMark

Because if you're trying this and _still_ getting bad performance, we need to know.

Bram: One minor pedantic clarification: the first round trip only returns the id and sort criteria (score by default), not the whole document. The effect is the same, though: as you page N deep into the corpus, the default implementation returns N * (pageNum + 1) entries. Even worse, each node itself has to _sort_ that many entries. Then a second call is made to fetch the page's worth of docs.

About telling your users not to page past N: up to you, especially if the deep paging stuff works as advertised (and I have no reason to believe it doesn't). That said, it's pretty easy to argue that the 500th page is pretty useless; nobody will ever hit the "next page" button 499 times. The different use case, though, is when people want to return the entire corpus for whatever reason and _must_ page through to the end.

Best,
Erick

On Mon, Dec 22, 2014 at 5:03 AM, Bram Van Dam <bram.van...@intix.eu> wrote:
> On 12/22/2014 12:47 PM, heaven wrote:
>>
>> I have a very bad experience with pagination on collections larger than
>> a few millions of documents. Pagination becomes very, very slow. Just
>> tried to switch to page 76662 and it took almost 30 seconds.
>
> Yeah, that's pretty much my experience, and I think SolrCloud would only
> exacerbate the problem (due to the increased complexity of sorting). If
> there's no silver bullet to be found, I guess I'll just have to disable
> paging on large data sets -- which is fine, really: who the hell browses
> through 50 billion documents anyway? That's what search is for, right?
>
> Thx,
>
>  - Bram
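P.S. For anyone following along, here is a minimal sketch of the cursorMark iteration pattern the wiki page describes. The `solr_query` function below is a stand-in that simulates Solr's cursor contract over an in-memory list (sort on a unique field, pass `cursorMark=*` first, then the returned `nextCursorMark`, and stop when the mark stops changing); against a real server you would send the same parameters through your HTTP client of choice.

```python
# Simulated corpus: 95 documents with a unique, sortable id field.
CORPUS = [{"id": f"doc-{i:04d}"} for i in range(95)]

def solr_query(rows, cursor_mark):
    """Stand-in for one Solr request: returns one page past the cursor.

    Mirrors the cursorMark contract: results are sorted on a unique
    field (id here), and the response carries a nextCursorMark that the
    caller passes back on the next request.
    """
    docs = sorted(CORPUS, key=lambda d: d["id"])
    if cursor_mark != "*":
        docs = [d for d in docs if d["id"] > cursor_mark]
    page = docs[:rows]
    next_mark = page[-1]["id"] if page else cursor_mark
    return page, next_mark

def iterate_all(rows=10):
    """Walk the whole result set one page at a time; each request only
    handles ~rows entries, unlike deep start/rows paging."""
    cursor, results = "*", []
    while True:
        page, next_cursor = solr_query(rows, cursor)
        results.extend(page)
        if next_cursor == cursor:  # an unchanged mark means we're done
            break
        cursor = next_cursor
    return results

print(len(iterate_all()))  # → 95, every document visited exactly once
```

The key point for the performance discussion above: each simulated request sorts and slices only a page's worth of work past the cursor, instead of accumulating N * (pageNum + 1) entries per node the way start/rows paging does.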