Every cursorMark request walks the full result set; documents before the cursor position just bypass the scoring heap. So reducing the number of such requests should reasonably reduce the wall-clock time of exporting all results.
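A back-of-the-envelope sketch of why batch size matters (the corpus size here is a made-up assumption, not a number from Markus's collection): if every request walks all matching documents, total work grows with the number of round trips, so a 100x larger batch means roughly 100x fewer full walks.

```python
import math

# Rough cost model for cursorMark paging (illustrative assumptions,
# not measurements): each request walks all N matching documents,
# so total work is roughly proportional to N * number_of_requests.

def num_requests(total_docs: int, batch_size: int) -> int:
    """Number of cursorMark round trips needed to export total_docs."""
    return math.ceil(total_docs / batch_size)

total = 10_000_000  # hypothetical corpus of small log documents

small_batch = num_requests(total, 200)     # few-hundred batch size
large_batch = num_requests(total, 20_000)  # the "absurd" batch size

print(small_batch, large_batch, small_batch // large_batch)
# -> 50000 500 100
```

This matches the observed jump from ~150 docs/s to ~2.5k docs/s only loosely, of course; it ignores per-document costs that are the same either way.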
On Wed, Jun 12, 2019 at 11:59 PM Markus Jelsma <markus.jel...@openindex.io> wrote:
> Hello,
>
> One of our collections hates CursorMark, it really does. When under very
> heavy load, the nodes can occasionally consume GBs of additional heap for no
> clear reason immediately after downloading the entire corpus.
>
> Although the additional heap consumption is a separate problem that I hope
> someone can shed some light on, there is another strange behaviour I would
> like to see explained.
>
> When under little load and with a batch size of just a few hundred, the
> download speed creeps along at at most 150 docs/s. But when I increase the
> batch size to absurd numbers such as 20k, the speed jumps to 2.5k docs/s,
> changing the total time from days to just a few hours.
>
> We really see the heap and speed differences only with one big collection
> of millions of small documents. They are just query, click and view logs
> with additional metadata fields such as time, digests, ranks, dates, uids,
> view time, etc.
>
> Can someone here shed some light on these vague subjects?
>
> Many thanks,
> Markus

--
Sincerely yours
Mikhail Khludnev
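For anyone following along, the cursorMark loop under discussion looks like this. This is a hedged sketch: `fetch_page` is a hypothetical stand-in for a real request to Solr's /select handler (which would need q, a sort ending on the uniqueKey, rows, and cursorMark parameters), paging over a fabricated in-memory list so the loop logic can run anywhere. The protocol itself is as documented: start with cursorMark=*, pass back nextCursorMark each time, and stop when the cursor stops changing.

```python
# Sketch of the cursorMark paging protocol over a toy in-memory "index".
DOCS = [{"id": i} for i in range(25)]  # fabricated data, not a real corpus

def fetch_page(cursor: str, rows: int):
    """Hypothetical stand-in for one Solr request.
    Returns (docs, nextCursorMark)."""
    start = 0 if cursor == "*" else int(cursor)
    page = DOCS[start:start + rows]
    return page, str(start + len(page))

def export_all(rows: int):
    """Export every document; larger `rows` means fewer round trips."""
    cursor, out = "*", []
    while True:
        docs, next_cursor = fetch_page(cursor, rows)
        out.extend(docs)
        if next_cursor == cursor:  # cursor unchanged => no more results
            break
        cursor = next_cursor
    return out

print(len(export_all(rows=10)))  # -> 25
```

The stop condition (cursor unchanged) rather than an empty page is the part people most often get wrong with real cursorMark requests.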