Every cursorMark request walks the full result set; documents before the cursor position just bypass the scoring heap. So reducing the number of such requests should reasonably reduce the wall-clock time of exporting all results.
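A back-of-the-envelope sketch of why batch size matters (the corpus size here is a made-up assumption, not a number from Markus's collection): if every request walks all matching documents, total work grows with the number of round trips, so a 100x larger batch means roughly 100x fewer full walks.

```python
import math

# Rough cost model for cursorMark paging (illustrative assumptions,
# not measurements): each request walks all N matching documents,
# so total work is roughly proportional to N * number_of_requests.

def num_requests(total_docs: int, batch_size: int) -> int:
    """Number of cursorMark round trips needed to export total_docs."""
    return math.ceil(total_docs / batch_size)

total = 10_000_000  # hypothetical corpus of small log documents

small_batch = num_requests(total, 200)     # few-hundred batch size
large_batch = num_requests(total, 20_000)  # the "absurd" batch size

print(small_batch, large_batch, small_batch // large_batch)
# -> 50000 500 100
```

This matches the observed jump from ~150 docs/s to ~2.5k docs/s only loosely, of course; it ignores per-document costs that are the same either way.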
On Wed, Jun 12, 2019 at 11:59 PM Markus Jelsma <markus.jel...@openindex.io> wrote:
> Hello,
>
> One of our collections hates CursorMark, it really does. When under very
> heavy load, the nodes can occasionally consume GBs of additional heap for no
> clear reason immediately after downloading the entire corpus.
>
> Although the additional heap consumption is a separate problem that I hope
> someone can shed some light on, there is another strange behaviour I would
> like to see explained.
>
> When under little load and with a batch size of just a few hundred, the
> download speed creeps along at at most 150 docs/s. But when I increase the
> batch size to absurd numbers such as 20k, the speed jumps to 2.5k docs/s,
> changing the total time from days to just a few hours.
>
> We really see the heap and speed differences only with one big collection
> of millions of small documents. They are just query, click and view logs
> with additional metadata fields such as time, digests, ranks, dates, uids,
> view time, etc.
>
> Can someone here shed some light on these vague subjects?
>
> Many thanks,
> Markus

--
Sincerely yours
Mikhail Khludnev
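For anyone following along, the cursorMark loop under discussion looks like this. This is a hedged sketch: `fetch_page` is a hypothetical stand-in for a real request to Solr's /select handler (which would need q, a sort ending on the uniqueKey, rows, and cursorMark parameters), paging over a fabricated in-memory list so the loop logic can run anywhere. The protocol itself is as documented: start with cursorMark=*, pass back nextCursorMark each time, and stop when the cursor stops changing.

```python
# Sketch of the cursorMark paging protocol over a toy in-memory "index".
DOCS = [{"id": i} for i in range(25)]  # fabricated data, not a real corpus

def fetch_page(cursor: str, rows: int):
    """Hypothetical stand-in for one Solr request.
    Returns (docs, nextCursorMark)."""
    start = 0 if cursor == "*" else int(cursor)
    page = DOCS[start:start + rows]
    return page, str(start + len(page))

def export_all(rows: int):
    """Export every document; larger `rows` means fewer round trips."""
    cursor, out = "*", []
    while True:
        docs, next_cursor = fetch_page(cursor, rows)
        out.extend(docs)
        if next_cursor == cursor:  # cursor unchanged => no more results
            break
        cursor = next_cursor
    return out

print(len(export_all(rows=10)))  # -> 25
```

The stop condition (cursor unchanged) rather than an empty page is the part people most often get wrong with real cursorMark requests.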