[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099437#comment-17099437 ]
Chris M. Hostetter commented on SOLR-13132:
-------------------------------------------

[~mgibney] - beefing up the randomized testing of the code paths involved with MultiAcc has uncovered 2 bugs. I committed some test changes showing these, but they can also be reproduced fairly easily with {{bin/solr -e techproducts}} ...

What both cases have in common is:
* limit==-1 to trigger single pass collection
* 1 or more "non-sweepable" stats are being collected in addition to relatedness (so MultiAcc can't be completely optimized away)

----

Way to trigger First bug...

{noformat}
curl -sS -X POST http://localhost:8983/solr/techproducts/query -d 'rows=0&q=inStock:true
&back=*:*
&fore=popularity:[10 TO *]
&json.facet={
  hobby : {
    type : terms,
    field : cat,
    limit : -1,
    facet : {
      min : "min(price)",
      skg : { type : func,
              func : "relatedness($fore,$back)",
              sweep_collection: false,
      }
    }
  }
}'
{noformat}

...if sweeping is explicitly disabled, then the "skg" stat completely drops out of the results, probably related to {{MultiAcc}} making some assumptions about the {{SweepableAcc}} even when the call to {{foo.registerSweepingAccs(...)}} returned a non-null result?

{noformat}
{
  "responseHeader":{
    "status":0,
    "QTime":88,
    "params":{
      "q":"inStock:true\n",
      "json.facet":"{\n hobby : {\n type : terms,\n field : cat,\n limit : -1,\n facet : {\n min : \"min(price)\",\n skg : { type : func,\n func : \"relatedness($fore,$back)\",\n sweep_collection: false,\n }\n } \n }\n}",
      "back":"*:* \n",
      "rows":"0",
      "fore":"popularity:[10 TO *]\n"}},
  "response":{"numFound":17,"start":0,"docs":[]
  },
  "facets":{
    "count":17,
    "hobby":{
      "buckets":[{
          "val":"electronics",
          "count":8,
          "min":74.98999786376953},
        {
          "val":"currency",
          "count":4},
        {
          "val":"memory",
          "count":3,
          "min":74.98999786376953},
        {
          "val":"hard drive",
          "count":2,
          "min":92.0},
        ...
{noformat}

----

Way to trigger Second bug...

{noformat}
curl -sS -X POST http://localhost:8983/solr/techproducts/query -d 'rows=0&q=inStock:true
&back=*:*
&fore=popularity:[10 TO *]
&json.facet={
  hobby : {
    type : terms,
    field : cat,
    limit : -1,
    facet : {
      skg : { type : func,
              func : "relatedness($fore,$back)",
              sweep_collection: true,
      },
      max : "max(price)"
      min : "min(price)"
    }
  }
}'
{noformat}

...when there are multiple non-sweeping stats in the MultiAcc, we get an AIOOBE (it's possible the order of the stats in the input matters, I didn't dig very deep)...

{noformat}
2020-05-05 00:25:21.371 ERROR (qtp1839168128-22) [ x:techproducts] o.a.s.s.HttpSolrCall null:java.lang.ArrayIndexOutOfBoundsException: arraycopy: last destination index 3 out of bounds for object array[2]
	at java.base/java.lang.System.arraycopy(Native Method)
	at org.apache.lucene.util.ArrayUtil.growExact(ArrayUtil.java:221)
	at org.apache.solr.search.facet.FacetFieldProcessor$MultiAcc.registerSweepingAccs(FacetFieldProcessor.java:777)
	at org.apache.solr.search.facet.FacetFieldProcessor.registerSweepingAccIfSupportedByCollectAcc(FacetFieldProcessor.java:797)
	at org.apache.solr.search.facet.FacetFieldProcessorByArrayUIF.collectDocs(FacetFieldProcessorByArrayUIF.java:68)
	at org.apache.solr.search.facet.FacetFieldProcessorByArray.calcFacets(FacetFieldProcessorByArray.java:112)
	at org.apache.solr.search.facet.FacetFieldProcessorByArray.process(FacetFieldProcessorByArray.java:62)
	at org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:416)
	at org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:474)
	at org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:431)
	at org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)
	at org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:416)
	at org.apache.solr.search.facet.FacetModule.process(FacetModule.java:147)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:209)
{noformat}

> Improve JSON "terms" facet performance when sorted by relatedness
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch,
> SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate
> {{relatedness}} for every term.
> The current implementation uses a standard uninverted approach (either
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain
> base docSet, and then uses that initial pass as a pre-filter for a
> second-pass, inverted approach of fetching docSets for each relevant term
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and
> set intersection operations increases request latency to the point where
> relatedness sort may not be usable in practice (for my use case, even after
> applying the patch for SOLR-13108, for a field with ~220k unique terms per
> core, QTimes for high-cardinality domain docSets were, e.g.: cardinality
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable
> ~300ms and ~250ms respectively. The approach calculates uninverted facet
> counts over domain base, foreground, and background docSets in parallel in a
> single pass.
> This allows us to take advantage of the efficiencies built into
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids
> the per-term docSet creation and set intersection overhead.
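
For readers following along, the single-pass idea in the issue summary can be sketched roughly as below. This is only an illustration under stated assumptions, not the patch itself: the class {{SweepCountSketch}}, the {{sweep}} method, the {{docToOrd}} array (one term ordinal per document, -1 for no value) and the use of {{java.util.BitSet}} for the doc sets are all hypothetical stand-ins, and the real {{FacetFieldProcessorByArray}} implementations work against docValues or {{UnInvertedField}} structures instead.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch of single-pass counting; none of these names are Solr API. */
public class SweepCountSketch {

    /** Per-term counts for each of the three doc sets, indexed by term ordinal. */
    public static final class Counts {
        public final int[] base;
        public final int[] fore;
        public final int[] back;

        Counts(int numTerms) {
            base = new int[numTerms];
            fore = new int[numTerms];
            back = new int[numTerms];
        }
    }

    /**
     * Walks the documents once and increments the count of each document's term
     * in every doc set that contains the document. No per-term doc set is built
     * and no set intersection is computed.
     *
     * @param docToOrd assumed single-valued mapping: term ordinal per doc, -1 if none
     */
    public static Counts sweep(int[] docToOrd, int numTerms,
                               BitSet base, BitSet fore, BitSet back) {
        Counts counts = new Counts(numTerms);
        for (int doc = 0; doc < docToOrd.length; doc++) {
            int ord = docToOrd[doc];
            if (ord < 0) {
                continue; // document has no value for the field
            }
            if (base.get(doc)) counts.base[ord]++;
            if (fore.get(doc)) counts.fore[ord]++;
            if (back.get(doc)) counts.back[ord]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] docToOrd = {0, 1, 0, -1, 1, 0};
        BitSet base = new BitSet();
        base.set(0, 3);            // docs 0, 1, 2
        BitSet fore = new BitSet();
        fore.set(0);
        fore.set(2);
        fore.set(5);
        BitSet back = new BitSet();
        back.set(0, 6);            // all docs
        Counts c = sweep(docToOrd, 2, base, fore, back);
        System.out.println("base=" + Arrays.toString(c.base)
                + " fore=" + Arrays.toString(c.fore)
                + " back=" + Arrays.toString(c.back));
    }
}
```

The per-term foreground and background counts needed for {{relatedness}} fall out of the same loop that produces the ordinary facet counts, which is where the savings over per-term docSet fetching would come from. How the actual patch combines these counts with the domain is not shown here.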