Chetas Joshi <chetas.jo...@gmail.com> wrote:
> Thanks for the insights into the memory requirements. Looks like cursor
> approach is going to require a lot of memory for millions of documents.
Sorry, that is a premature conclusion from your observations.

> If I run a query that returns only 500K documents still keeping 100K docs
> per page, I don't see long GC pauses.

500K docs is far less than your worst-case 80*100K, so you are not keeping
the effective page size constant across your tests. You need to do that in
order to conclude that it is the result set size that is the problem.

> So it is not really the number of rows per page but the overall number of
> docs.

What matters is the effective maximum number of document results handled at
any one point during the transaction (at the merger, really). If your page
size is 100K and you match 8M documents, then the maximum is 8M (as you
indirectly calculated earlier). If you match 800M documents, the maximum is
_still_ 8M.

(Note: Okay, it is not just the maximum number of results, as the internal
structures for determining the result sets at the individual nodes are
allocated from the page size. However, that does not affect the merging
process.)

The high number 8M might be the reason for your high GC activity.
Effectively 2 or 3 times that many tiny objects need to be allocated, stay
alive at the same time, and then be de-allocated. A very short time after
de-allocation, a new bunch needs to be allocated, so my guess is that the
garbage collector has a hard time keeping up with this pattern. One strategy
for coping is to allocate more memory and hope for the barrage to end, which
would explain your jump in heap. But I'm in guess-land here.

Hopefully it is simple for you to turn the page size way down - to 10K or
even 1K. Why don't you try that, then see how it affects speed and memory
requirements?

- Toke
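
PS: In case it helps, a cursorMark loop with a small page size looks roughly
like the sketch below (SolrJ). The URL, collection name, sort field and query
are placeholders, so adjust them to your own setup:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorDump {
      public static void main(String[] args) throws Exception {
        // Placeholder URL/collection - point this at your own cluster
        try (HttpSolrClient client = new HttpSolrClient.Builder(
                 "http://localhost:8983/solr/mycollection").build()) {
          SolrQuery query = new SolrQuery("*:*");
          query.setRows(1000);  // page size: try 1K-10K instead of 100K
          // cursors require a sort that ends on the uniqueKey field
          query.setSort(SolrQuery.SortClause.asc("id"));

          String cursorMark = CursorMarkParams.CURSOR_MARK_START;
          boolean done = false;
          while (!done) {
            query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
            QueryResponse rsp = client.query(query);
            for (SolrDocument doc : rsp.getResults()) {
              // process doc here
            }
            String nextCursorMark = rsp.getNextCursorMark();
            // an unchanged cursorMark means there are no more pages
            done = cursorMark.equals(nextCursorMark);
            cursorMark = nextCursorMark;
          }
        }
      }
    }

With rows at 1K across your 80 shards, the merger only has to juggle on the
order of 80*1K = 80K entries per page instead of 8M.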