Chetas Joshi <chetas.jo...@gmail.com> wrote:
> Thanks for the insights into the memory requirements. Looks like cursor
> approach is going to require a lot of memory for millions of documents.

Sorry, that is a premature conclusion from your observations.

> If I run a query that returns only 500K documents still keeping 100K docs
> per page, I don't see long GC pauses.

500K total hits is far below your worst case of 80*100K = 8M: with only 500K matches spread 
across the shards, the merger never receives more than 500K entries per page. So you are not 
keeping the effective page size constant across your tests, and you need to do that in order to 
conclude that it is the result set size that is the problem.

> So it is not really the number of rows per page but the overall number of
> docs.

It is the effective maximum number of document results handled at any point during the request 
(really at the merger). If your page size is 100K and you have 80 shards, the merger handles up 
to 80*100K = 8M entries per page request (as you indirectly calculated earlier). If you match 
800M documents, the maximum is _still_ 8M.

(Note: it is not only the maximum number of merged results; the internal structures used for 
determining the result sets on the individual nodes are also allocated according to the page 
size. That does not affect the merging process, however.)
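
To make the arithmetic concrete (the ~40 bytes per merged entry is only an assumed ballpark for 
a small id+score object plus overhead, not something I have measured):

  80 shards * 100,000 rows/page = 8,000,000 entries per page request ≈ 320 MB of short-lived objects
  80 shards *   1,000 rows/page =    80,000 entries per page request ≈   3 MB

So the per-page pressure scales with shards*rows, not with the total number of hits.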

That high number, 8M, might be the reason for your high GC activity. Effectively 2 or 3 times 
that many tiny objects need to be allocated, be alive at the same time, and then be de-allocated. 
A very short time after de-allocation, a new batch needs to be allocated, so my guess is that the 
garbage collector has a hard time keeping up with this pattern. One coping strategy for the 
collector is to allocate more memory and hope for the barrage to end, which would explain your 
jump in heap usage. But I'm in guess-land here.
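
If you want to check that guess instead of trusting it, GC logging should show it directly: 
large young-generation collections lining up with each page fetch, followed by the jump in heap. 
On a Java 8 JVM the standard flags are something like

  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/path/to/gc.log

(adjust for your JVM version and log path).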


Hopefully it is simple for you to turn the page size way down, to 10K or even 1K. Why don't you 
try that and see how it affects speed and memory requirements?
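
For reference, lowering the page size is just a matter of changing rows in the cursor loop. 
Below is a minimal SolrJ sketch of cursorMark paging with rows=1000; the URL, collection name, 
sort field and field list are placeholders for whatever your setup uses:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.params.CursorMarkParams;

  public class CursorPager {
    public static void main(String[] args) throws Exception {
      try (HttpSolrClient client =
               new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(1000);                           // page size way down from 100K
        query.setFields("id");                         // only fetch what you need
        query.setSort(SolrQuery.SortClause.asc("id")); // cursors require a sort ending on the uniqueKey

        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
          query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
          QueryResponse rsp = client.query(query);
          // process rsp.getResults() here
          String nextCursorMark = rsp.getNextCursorMark();
          done = cursorMark.equals(nextCursorMark);    // unchanged mark means the last page was reached
          cursorMark = nextCursorMark;
        }
      }
    }
  }

Run it against one of your collections and compare speed and GC behaviour at rows=1000 versus 
rows=100000.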

- Toke
