[ http://jira.codehaus.org/browse/MINDEXER-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=261444#action_261444 ]
Tamás Cservenák commented on MINDEXER-14:
-----------------------------------------

Agreed with the proposal, but a bit of explanation: the (old) "flat" and "grouped" search results are implemented in a somewhat naive way. They store all the elements (hits) in memory, so on large result sets they would "eat up" a large amount of memory (and probably OOM).

This is why IteratorSearchRequest/IteratorSearchResponse was introduced. It relies on Lucene being smart, fetching Lucene documents "as needed" (while iterating over the result) and not keeping more than a few ArtifactInfo instances in memory. This not only lessens memory consumption but lessens IO too (disk bashing). With non-iterator searches, that IO happens _after_ the Lucene search has returned, when the sequential hit fetches and ArtifactInfo record construction take place.

But alas, reviewing the code shows that iterator searches probably suffer from the same problem as flat and grouped searches: the Lucene result is limited to the "top 1000", it seems.

> FlatSearchResponse.totalHits = 1000 when there are in fact more
> ---------------------------------------------------------------
>
>                 Key: MINDEXER-14
>                 URL: http://jira.codehaus.org/browse/MINDEXER-14
>             Project: Maven Indexer
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>         Environment: Ubuntu, JDK 6; cause of: https://netbeans.org/bugzilla/show_bug.cgi?id=197036
>            Reporter: Jesse Glick
>
> I am running {{SearchEngine.searchFlatPaged}}. When there happen to be more
> than 1000 hits in the result, it silently returns just 1000 instead.
> Surprising behavior since I did not specify any hit limit. But this is
> {{AbstractSearchRequest.UNDEFINED_HIT_LIMIT}}, OK.
>
> Where it gets weirder is that if you set {{resultHitLimit}} to
> {{UNDEFINED_HIT_LIMIT}}, you still get 1000 results, contradicting the
> apparent meaning of "undefined". Further, if you set it to 999 or 1001, and
> there are a few thousand results, you get an empty result and {{totalHits}}
> of -1 or {{AbstractSearchResponse.LIMIT_EXCEEDED}} (which by the way looks
> like a constant but is not final!), which is completely different than the
> behavior for 1000.
>
> And passing in {{Integer.MAX_VALUE}} to begin with does not work, since then
> Lucene gets an {{OutOfMemoryError}} trying to allocate a ridiculously large
> array or similar.
>
> Expected behavior: by default, on an otherwise unconfigured search request,
> the indexer would return all the hits, however many that is (allocating only
> a proportional amount of memory). If I set {{resultHitLimit}} to some value,
> then that will be used - I will either get a complete set of results, or
> {{LIMIT_EXCEEDED}}.
>
> Workaround: set {{resultHitLimit}} to 1001, then go into a loop retrying the
> search; if -1 returned for {{totalHits}}, double the {{resultHitLimit}} and
> try again.
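
The workaround described in the issue (start at 1001 and double the limit whenever -1 comes back) can be sketched roughly as below. This is only an illustration under stated assumptions, not Maven Indexer code: HitLimitRetry, Response, fakeSearch, searchAll and the maxLimit cap are all hypothetical names, and fakeSearch merely imitates the reported behavior for limits other than 1000 (an empty result with totalHits of -1 when the hits do not fit).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class HitLimitRetry {
    // Mirrors the -1 value that the issue reports for totalHits when the limit is exceeded.
    public static final int LIMIT_EXCEEDED = -1;

    // Hypothetical, simplified response type; not the real Maven Indexer class.
    public static final class Response {
        public final int totalHits;
        public final List<String> hits;

        public Response(int totalHits, List<String> hits) {
            this.totalHits = totalHits;
            this.hits = hits;
        }
    }

    // Stand-in for the real search (assumption): returns every hit if they fit
    // within hitLimit, otherwise an empty result flagged with LIMIT_EXCEEDED.
    public static Response fakeSearch(int indexSize, int hitLimit) {
        if (indexSize > hitLimit) {
            return new Response(LIMIT_EXCEEDED, new ArrayList<>());
        }
        List<String> hits = new ArrayList<>(indexSize);
        for (int i = 0; i < indexSize; i++) {
            hits.add("artifact-" + i);
        }
        return new Response(indexSize, hits);
    }

    // Retries the search, doubling the hit limit each time LIMIT_EXCEEDED comes back.
    // maxLimit is a hypothetical safety cap so the loop cannot grow without bound.
    public static Response searchAll(IntFunction<Response> search, int initialLimit, int maxLimit) {
        int limit = initialLimit;
        while (true) {
            Response response = search.apply(limit);
            if (response.totalHits != LIMIT_EXCEEDED) {
                return response;
            }
            if (limit >= maxLimit) {
                throw new IllegalStateException("hit limit cap of " + maxLimit + " reached");
            }
            // Double in long arithmetic to avoid int overflow, then clamp to the cap.
            limit = (int) Math.min((long) limit * 2, (long) maxLimit);
        }
    }

    public static void main(String[] args) {
        // 3500 simulated hits: limits 1001 and 2002 are exceeded, 4004 succeeds.
        Response response = searchAll(limit -> fakeSearch(3500, limit), 1001, 1 << 20);
        System.out.println(response.totalHits); // prints 3500
    }
}
```

Against the real indexer, the lambda would wrap the actual search call with resultHitLimit set to the given value. The cap is a design choice of this sketch, added because the issue notes that a very large limit such as Integer.MAX_VALUE leads to an OutOfMemoryError inside Lucene.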