On Tue, 2018-07-31 at 11:12 +0200, Fette, Georg wrote:
> I agree that receiving too much data in one request is bad. But I
> was surprised that the query works with a lower but still very large
> rows parameter and that there is a threshold at which it crashes the
> server.
> Furthermore, it seems that the reason for the crash is not the size
> of the actual results because those are only 581.
Under the hood, a priority queue is initialized with room for min(#docs_in_index, rows) document markers. Furthermore, that queue is pre-filled with placeholder objects (called sentinels). This structure becomes heavy when rows enters the millions, both in terms of raw memory and in terms of GC overhead from all the objects. You could have 1 hit and it would still hit OOM.

It is possible to optimize that part of the Solr code for larger requests (see https://issues.apache.org/jira/browse/LUCENE-2127 and https://issues.apache.org/jira/browse/LUCENE-6828), but that would just be a temporary fix until even larger indexes are queried. The deep paging or streaming exports that Andrea suggests scale indefinitely in terms of both documents in the index and documents in the result set.

I would argue that your OOM with small result sets and huge rows is a good thing: You encounter the problem immediately, instead of hitting it at some random time when a match-a-lot query is issued by a user.

- Toke Eskildsen, Royal Danish Library
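For illustration, a minimal, self-contained sketch of the allocation pattern described above. This is not the actual Solr/Lucene code (Lucene's top-docs collectors use a sentinel-filled HitQueue internally); the class names, sizes and the memory estimate here are simplified assumptions.

// Sketch only: mimics a priority queue pre-filled with sentinel
// markers sized by the rows parameter. Not Solr/Lucene source code.
public class SentinelQueueSketch {

    // Stand-in for a per-hit marker (Lucene's ScoreDoc holds roughly
    // an int doc id and a float score).
    static final class DocMarker {
        int doc = Integer.MAX_VALUE;            // sentinel: "no real hit yet"
        float score = Float.NEGATIVE_INFINITY;
    }

    public static void main(String[] args) {
        long rows = 5_000_000;                  // a huge rows parameter
        long docsInIndex = 10_000_000;          // documents in the index
        long queueSize = Math.min(docsInIndex, rows);

        // The queue is filled with sentinel objects up front,
        // regardless of how many documents actually match the query.
        DocMarker[] heap = new DocMarker[(int) queueSize];
        for (int i = 0; i < queueSize; i++) {
            heap[i] = new DocMarker();
        }

        // Very rough estimate: object header + fields + array slot per entry.
        long approxBytes = queueSize * (16 + 4 + 4 + 8);
        System.out.printf(
            "Queue of %,d sentinels ~ %,d MB allocated before a single hit is collected%n",
            queueSize, approxBytes / (1024 * 1024));
    }
}

The point of the sketch is only that the allocation is driven by rows (capped by the index size), not by the number of matching documents, which is why a result set of 581 documents can still exhaust the heap.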