[
https://issues.apache.org/jira/browse/SOLR-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080401#comment-17080401
]
Shalin Shekhar Mangar commented on SOLR-14365:
----------------------------------------------
I just saw this test failure on master which seems related and is reproducible:
{code}
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestRandomCollapseQParserPlugin -Dtests.method=testRandomCollpaseWithSort -Dtests.seed=20C0F4D7CBA81876 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=lv -Dtests.timezone=America/St_Johns -Dtests.asserts=true -Dtests.file.encoding=ANSI_X3.4-1968
[junit4] FAILURE 7.30s J4 | TestRandomCollapseQParserPlugin.testRandomCollpaseWithSort <<<
[junit4] > Throwable #1: java.lang.AssertionError: collapseKey too big -- need to grow array?
[junit4] > at __randomizedtesting.SeedInfo.seed([20C0F4D7CBA81876:257D871EE0002B85]:0)
[junit4] > at org.apache.solr.search.CollapsingQParserPlugin$SortFieldsCompare.setGroupValues(CollapsingQParserPlugin.java:2702)
[junit4] > at org.apache.solr.search.CollapsingQParserPlugin$IntSortSpecStrategy.collapse(CollapsingQParserPlugin.java:2544)
[junit4] > at org.apache.solr.search.CollapsingQParserPlugin$IntFieldValueCollector.collect(CollapsingQParserPlugin.java:1223)
[junit4] > at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:254)
[junit4] > at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:205)
[junit4] > at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
[junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:739)
[junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:526)
[junit4] > at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:202)
[junit4] > at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1651)
[junit4] > at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1469)
[junit4] > at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:584)
[junit4] > at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1487)
[junit4] > at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:399)
[junit4] > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
[junit4] > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:209)
[junit4] > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2565)
[junit4] > at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:227)
[junit4] > at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:207)
[junit4] > at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1003)
[junit4] > at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1018)
[junit4] > at org.apache.solr.search.TestRandomCollapseQParserPlugin.testRandomCollpaseWithSort(TestRandomCollapseQParserPlugin.java:158)
[junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
{code}
> CollapsingQParser - Avoiding always allocate int[] and float[] with size
> equals to number of unique values
> ----------------------------------------------------------------------------------------------------------
>
> Key: SOLR-14365
> URL: https://issues.apache.org/jira/browse/SOLR-14365
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 8.4.1
> Reporter: Cao Manh Dat
> Assignee: Cao Manh Dat
> Priority: Major
> Attachments: SOLR-14365.patch
>
> Time Spent: 8h 10m
> Remaining Estimate: 0h
>
> Since Collapsing is a PostFilter, documents that reach Collapsing must match
> all filters and queries, so the number of documents Collapsing needs to
> collect and score is a small fraction of the total number of documents in
> the index. So why do we always need to consume memory (for the int[] and
> float[] arrays) for all unique values of the collapsed field? If only 300
> unique values of the collapsed field occur in the documents that match the
> queries and filters, then we only need int[] and float[] arrays of size 300,
> not 1.2 million. However, we don't know in advance which values of the
> collapsed field will show up in the results, so we cannot simply use a
> smaller array.
> The easy fix for this problem is to use only as much memory as we need, via
> IntIntMap and IntFloatMap, which hold primitives and are much more space
> efficient than the Java HashMap. These maps can be slower (10x or 20x) than
> plain int[] and float[] arrays if the number of matched documents is large
> (almost all documents match the queries and other filters). But we believe
> that does not happen often (how frequently do we run collapsing on the
> entire index?).
> For this issue I propose adding two methods for collapsing:
> * array: the current implementation
> * hash: the new approach, which will be the default method
> Later we can add another method, {{smart}}, which automatically picks a
> method based on a comparison between {{number of docs matched queries and
> filters}} and {{number of unique values of the field}}.
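
The trade-off in the quoted description can be illustrated with a minimal Java sketch. It is not the code from SOLR-14365.patch: the names CollapseStores, ScoreStore, ArrayStore, HashStore, and choose are hypothetical, the 0.1 threshold is an arbitrary assumption, and java.util.HashMap stands in for the primitive IntFloatMap mentioned in the description only to keep the example free of dependencies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Solr or the patch.
final class CollapseStores {

    /** Tracks the best score seen so far for each collapse-field ordinal. */
    interface ScoreStore {
        void offer(int ord, float score);
        float best(int ord); // Float.NEGATIVE_INFINITY if the ordinal was never offered
    }

    /** "array" method: one slot per unique value, allocated up front. */
    static final class ArrayStore implements ScoreStore {
        private final float[] best;

        ArrayStore(int uniqueValues) {
            best = new float[uniqueValues];
            Arrays.fill(best, Float.NEGATIVE_INFINITY);
        }

        @Override
        public void offer(int ord, float score) {
            if (score > best[ord]) {
                best[ord] = score;
            }
        }

        @Override
        public float best(int ord) {
            return best[ord];
        }
    }

    /** "hash" method: memory grows only with the ordinals actually seen. */
    static final class HashStore implements ScoreStore {
        // Assumption: a boxed HashMap stands in for a primitive int-to-float map.
        private final Map<Integer, Float> best = new HashMap<>();

        @Override
        public void offer(int ord, float score) {
            best.merge(ord, score, Float::max);
        }

        @Override
        public float best(int ord) {
            return best.getOrDefault(ord, Float.NEGATIVE_INFINITY);
        }
    }

    /**
     * One possible reading of the "smart" method: use the hash store when the
     * matched documents are few relative to the unique values. The threshold
     * is an illustrative parameter, not a value taken from the issue.
     */
    static ScoreStore choose(int matchedDocs, int uniqueValues, double threshold) {
        return matchedDocs < uniqueValues * threshold
                ? new HashStore()
                : new ArrayStore(uniqueValues);
    }
}
```

With the figures from the description, choose(300, 1_200_000, 0.1) would return the hash-backed store, while a query matching most of the index would fall back to the array.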