OK, that makes sense then.

I don't think we've mentioned streaming as an alternative. It has some
restrictions (it can only export fields that have docValues), and frankly
I don't really remember how much of it was in 5.5, so you'll have to check.

Streaming is designed exactly to, well, stream the entire result set
out. There's some setup cost, so in your use case, where most queries
don't have all that many hits, the setup may be too onerous, but I
thought I'd mention it.
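For reference, an /export-style streaming query looks roughly like this
(the collection and field names here are made up, and every field in fl
and sort must have docValues enabled):

```
search(myCollection,
       q="*:*",
       fl="id,title",
       sort="id asc",
       qt="/export")
```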

Best,
Erick

On Mon, Jul 2, 2018 at 5:14 AM, David Frese <david.fr...@active-group.de> wrote:
> Am 29.06.18 um 17:42 schrieb Erick Erickson:
>>
>> bq. It basically cuts down the search time in half in the usual case
>> for us, so it's an important 'feature'.
>>
>> Wait. You mean that the "extra" call to get back 0 rows doubles your
>> query time? That's surprising, tell us more.
>>
>> How many successive calls does your "usual" use case make with
>> CursorMark? My off-the-cuff explanation would be that
>> you usually get all the rows in the first call.
>>
>> CursorMark is intended to help with the "deep paging" problem, i.e.
>> where start=some_big_number to allow
>> returning large result sets in chunks, say through 10s of K rows.
>> Part of our puzzlement is that in that
>> case the overhead of the last call is minuscule compared to the rest.
>>
>> There's no reason that it can't be used for small result sets, those
>> are just usually handled by setting the
>> start parameter. Up through, say, 1,000 or so the extra overhead is
>> pretty unnoticeable. So my head was
>> in the "what's the problem with 1 extra call after making the first 50?".
>>
>> OTOH, if you make 100 successive calls to search with the CursorMark
>> and call 101 takes as long as
>> the previous 100, something's horribly wrong.
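(To make that concrete: the client can't know it's done until
nextCursorMark stops changing, so the trailing empty request is how the
end is detected. Below is a minimal Python sketch of that loop; fetch()
is faked over an in-memory list rather than being real SolrJ/HTTP calls,
but the loop shape is the same:)

```python
# Minimal sketch of the client-side cursorMark loop. fetch() stands in
# for a real Solr request; here it is faked over an in-memory list.
# Solr's contract: when a request returns no more results, the
# nextCursorMark it hands back equals the cursorMark you sent -- and
# that is the only way the client can detect the end, hence the one
# extra, empty request.

def paginate(fetch, start_cursor="*"):
    """Drain all pages; return (all_docs, number_of_requests)."""
    cursor = start_cursor
    docs, requests = [], 0
    while True:
        page, next_cursor = fetch(cursor)
        requests += 1
        docs.extend(page)
        if next_cursor == cursor:   # cursor unchanged -> no more results
            return docs, requests
        cursor = next_cursor

def make_fake_fetch(all_docs, page_size):
    """Fake Solr: the cursor is just an integer offset as a string."""
    def fetch(cursor):
        offset = 0 if cursor == "*" else int(cursor)
        page = all_docs[offset:offset + page_size]
        if not page:
            return page, cursor   # end reached: echo the cursor back
        return page, str(offset + len(page))
    return fetch
```

With 250 docs and a page size of 100, this makes 4 requests: three with
data and one empty one at the end, which is the "extra" call being
discussed here.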
>
>
> Hi,
>
> I use it in a server application where I need to process all results in
> every case, which can range from zero to hundreds of thousands. We use
> pagination to bound the memory required on "our" side by processing
> page after page.
>
> Most cases will fit into one page though - a few hundred results. Our Solr
> cluster takes about 5 to 10 seconds (*) for the first 'filled' page _and_
> about the _same time_ again for the second empty page. So if I have the
> guarantee that the second page is always empty, that helps a lot.
>
> Solr 5.5 that is, btw.
>
> (*) Whether it could be faster than 5 seconds is a separate issue. But the query
> is quite complex with a lot of AND/OR and BlockJoins too, and I have no idea
> if memory is large enough to hold the indices and things like that. Not
> really optimized yet.
>
>
> David.
>
> --
> David Frese
> +49 7071 70896 75
>
> Active Group GmbH
> Hechinger Str. 12/1, 72072 Tübingen
> Registergericht: Amtsgericht Stuttgart, HRB 224404
> Geschäftsführer: Dr. Michael Sperber
