[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17127197#comment-17127197 ]
Chris M. Hostetter commented on SOLR-13132: ------------------------------------------- Hey Michael - a couple of questions/thoughts... ---- Re: GIT:61befab60696dc4267ab9c96e36bc266c93a2fc3 This is definitely awkward and tricky, and not at all something i'd really considered in SOLR-14467 when advocating for the introduction of {{SlotContext.isAllBucket()}}. I want to think about this some more... at the moment 2 straw man ideas occur to me, but they feel like baby-bear/papa-bear solutions (too little / too much)... * (baby-bear) - disable sweeping if allBuckets is used. * (papa-bear) - add a SlotContext param to {{CountSlotAcc.incrementCount()}} and rework {{SKGSlotAcc.registerSweepingAccs()}} use custom {{CountSlotAcc}} instances that ~ "skip & track" the allBuckets slot in some way that SweepSKGSlotAcc can later ignore? maybe? As much as a dislike the sort of hackish solution you put in place with the nocommits, it's simple and by comparison both of those "cleaner" ideas seem too heavy handed, in either direction. Although ... An idea that just occured to me as i type this: would it be viable to have {{SKGSlotAcc.registerSweepingAccs()}} actually return the new {{SweepSKGSlotAcc}} instance it registers, and modify the {{collect}} methods of that class to be No-Ops except for noting & tracking the allBucketsSlot? (although as i say that i, again, feel like it may be a heavy handed change to what you have for such a niche special circumstance) ---- Re: a6b1c60e61563535d7ba67c17d74f2bada6f80a2 What motivated this change? I don't think this is a good idea, largely because of the answer to the question you asked in your commit message... {quote}what was the point of the "if countAcc == null"? {quote} ...this is explained in the javadocs of the methods you modified (and in the class level javadocs) {quote}By default, this class assumes subclasses can support sweeping collection unless subclasses initialize <code>countAcc</code> directly in their constructors. {quote} {quote}This impl first initializes <code>countAcc</code> as a \{@link SweepingCountSlotAcc} if null. {quote} {{Although on apache/master it's un-explained in this particular class's/method's javadocs, the general convention of why processors use the " if (null == countAcc)}}{{{ countAcc = .. } }}" pattern is touched on in a comment in FacetFieldProcessor: {code:java} // allow a custom count acc to be used if (countAcc == null) { countAcc = new SlotAcc.CountSlotArrAcc(fcontext, slotCount); {code} ...the fact that FacetFieldProcessor doesn't hard code an assumption about how countAcc works is one of the reasons the current SweepingCountSlotAcc approach works in FacetFieldProcessorByArray. I don't know if it's a good idea to *force* (future) subclasses to use sweeping by wrapping whatever "original" {{countAcc}} they init in a SweepingCountSlotAcc ... if you think there's a reason why it might make sense for a future subclass to use a SweepingCountSlotAcc that wraps an arbitrary CountSlotAcc then i'm all ears, but I'd rather the design _allow_ subclasses to override FacetFieldProcessorByArray's behavior by letting them init a SweepingCountSlotAcc that wraps a CountSlotAcc directly, rather then by _forcing_ the subclass to provide a CountSlotAcc that "gets wrapped" by their superclass. ---- I pushed a small change to replace TestCloudJSONFacetSKGSweep with equivilent improvements to the TestCloudJSONFacetSKGEquiv from upstream in the hopes of doing some heavy test beasting of all the json.facet tests on your branch – but almost immediately ran into a failure from TestCloudJSONFacetSKG... {noformat} [junit4] 2> 5919 ERROR (qtp1172942992-67) [n:127.0.0.1:41317_solr c:org.apache.solr.search.facet.TestCloudJSONFacetSKG_collection s:shard2 r:core_node4 x:org.apache.solr.search.facet.TestCloudJSONFacetSKG_collection_shard2_replica_n2 ] o.a.s.s.HttpSolrCall null:java.lang.ArrayIndexOutOfBoundsException: Index 8 out of bounds for length 1 [junit4] 2> at org.apache.solr.search.facet.SlotAcc$CountSlotArrAcc.incrementCount(SlotAcc.java:841) [junit4] 2> at org.apache.solr.search.facet.UnInvertedField.collectDocsGeneric(UnInvertedField.java:462) [junit4] 2> at org.apache.solr.search.facet.UnInvertedField.collectDocs(UnInvertedField.java:437) [junit4] 2> at org.apache.solr.search.facet.FacetFieldProcessorByArrayUIF.collectDocs(FacetFieldProcessorByArrayUIF.java:70) [junit4] 2> at org.apache.solr.search.facet.FacetFieldProcessorByArray.calcFacets(FacetFieldProcessorByArray.java:121) [junit4] 2> at org.apache.solr.search.facet.FacetFieldProcessorByArray.process(FacetFieldProcessorByArray.java:89) [junit4] 2> at org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:410) [junit4] 2> at org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:477) [junit4] 2> at org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:433) [junit4] 2> at org.apache.solr.search.facet.FacetFieldProcessor.refineBucket(FacetFieldProcessor.java:1001) [junit4] 2> at org.apache.solr.search.facet.FacetFieldProcessor.refineFacets(FacetFieldProcessor.java:966) [junit4] 2> at org.apache.solr.search.facet.FacetFieldProcessorByArray.calcFacets(FacetFieldProcessorByArray.java:97) [junit4] 2> at org.apache.solr.search.facet.FacetFieldProcessorByArray.process(FacetFieldProcessorByArray.java:89) [junit4] 2> at org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:410) [junit4] 2> at org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:477) [junit4] 2> at org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:433) [junit4] 2> at org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:65) [junit4] 2> at org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:410) [junit4] 2> at org.apache.solr.search.facet.FacetModule.process(FacetModule.java:150) [junit4] 2> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:331) [junit4] 2> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:210) ... [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestCloudJSONFacetSKG -Dtests.method=testRandom -Dtests.seed=AF0A69795F7EC65 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=fi-FI -Dtests.timezone=America/Atikokan -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.43s | TestCloudJSONFacetSKG.testRandom <<< [junit4] > Throwable #1: java.lang.RuntimeException: init query failed: {main(q=(field_12_multi_ss:58+OR+field_9_multi_ss:57+OR+field_4_multi_sds:29+OR+field_12_multi_ss:60+OR+field_2_multi_sdsS:49+OR+field_13_multi_sds:47)&json.facet={"processEmpty":true,"facet_1":{"facet":{"processEmpty":true,"skg":"relatedness($fore,$back)","facet_2":{"facet":{"processEmpty":true,"skg":"relatedness($fore,$back)"},"limit":25,"overrequest":0,"type":"terms","field":"field_14_multi_idsS","refine":true,"domain":{"query":"*:*"}},"facet_3":{"facet":{"processEmpty":true,"skg":"relatedness($fore,$back)"},"limit":27,"overrequest":-1,"allBuckets":true,"perSeg":true,"type":"terms","field":"field_6_multi_ss","refine":true,"domain":{"query":"*:*"}}},"sort":"index+desc","limit":62,"overrequest":919339441,"allBuckets":true,"type":"terms","field":"field_11_multi_sdsS","refine":true,"domain":{"query":"*:*"}}}),extra(rows=0&fore=(field_14_multi_sdsS:64+OR+field_0_multi_ss:12+OR+field_2_multi_sdsS:50+OR+field_2_multi_sdsS:20+OR+field_5_multi_sdsS:42+OR+field_2_multi_sdsS:36+OR+field_9_multi_ss:19)&back=(field_2_multi_sdsS:49+OR+field_9_multi_ss:16+OR+field_10_multi_sds:28+OR+field_0_multi_ss:49+OR+field_11_multi_sdsS:29))}: Error from server at https://127.0.0.1:41317/solr/org.apache.solr.search.facet.TestCloudJSONFacetSKG_collection: Error from server at null: Index 8 out of bounds for length 1 [junit4] > at __randomizedtesting.SeedInfo.seed([AF0A69795F7EC65:78BC839824975A16]:0) [junit4] > at org.apache.solr.search.facet.TestCloudJSONFacetSKG.assertFacetSKGsAreCorrect(TestCloudJSONFacetSKG.java:382) [junit4] > at org.apache.solr.search.facet.TestCloudJSONFacetSKG.testRandom(TestCloudJSONFacetSKG.java:319) {noformat} ...I haven't dug into this yet, but it's the same "refining partial sub-facet when using allBuckets" code path that you fixed in SOLR-14520, and this seed does *not* reproduce on upstream master (nor did i see anything like it when beasting json.faceting tests before committing SOLR-14520) suggesting that this is something related to sweeping. Allthough ... even when i tried manually hacking {{DEFAULT_SWEEP_COLLECTION = false;}} locally it still reproduced? ... so it seems like it must be something else on the branch? Ohhh... actual: i think this is related of the a6b1c60e61563535d7ba67c17d74f2bada6f80a2 change i asked about above? ... HA! ... yeah: the refinement code you fixed in SOLR-14520 now looks like... {code:java} // count is irrelevant, but hardcoded in collect(...), so intercept/mask normal counts. // Set here to prevent createAccs(...) from creating a 1-slot countAcc that will fail with AIOOBE countAcc = SlotAcc.DEV_NULL_SLOT_ACC; createAccs(nDocs, 1); {code} ...but on this branch you modified {{createAccs()}} to always wrap the {{countAcc}} in a {{SweepingCountSlotAcc}} – which does not ignore all the slots it's asked to collect. .... I briefly experimented with revering a6b1c60e61563535d7ba67c17d74f2bada6f80a2 locally, but that just changed the nature of the failure to a class cast exception since methods like {{FacetFieldProcessorByArrayUIF.collectDocs()}} have hard coded casts like: {{registerSweepingAccIfSupportedByCollectAcc((SweepingCountSlotAcc) countAcc);}} ... which makes me think either we need {{SweepingCountSlotAcc}} to be smarter about the possibility it's wrapping something like {{DEV_NULL_SLOT_ACC}} or we need to change the API of {{registerSweepingAccIfSupportedByCollectAcc(...)}} to take in {{CountSlotAcc}} and do an instanceof check internally (i had initially assumed we wouldn't need it since only subclasses that support sweeping would need/want to call that method, but here's an example of a subclass that normally wants to use sweeping, except for the code paths where it's just refining some specific buckets. WDYT? > Improve JSON "terms" facet performance when sorted by relatedness > ------------------------------------------------------------------ > > Key: SOLR-13132 > URL: https://issues.apache.org/jira/browse/SOLR-13132 > Project: Solr > Issue Type: Improvement > Components: Facet Module > Affects Versions: 7.4, master (9.0) > Reporter: Michael Gibney > Priority: Major > Attachments: SOLR-13132-with-cache-01.patch, > SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate > {{relatedness}} for every term. > The current implementation uses a standard uninverted approach (either > {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain > base docSet, and then uses that initial pass as a pre-filter for a > second-pass, inverted approach of fetching docSets for each relevant term > (i.e., {{count > minCount}}?) and calculating intersection size of those sets > with the domain base docSet. > Over high-cardinality fields, the overhead of per-term docSet creation and > set intersection operations increases request latency to the point where > relatedness sort may not be usable in practice (for my use case, even after > applying the patch for SOLR-13108, for a field with ~220k unique terms per > core, QTime for high-cardinality domain docSets were, e.g.: cardinality > 1816684=9000ms, cardinality 5032902=18000ms). > The attached patch brings the above example QTimes down to a manageable > ~300ms and ~250ms respectively. The approach calculates uninverted facet > counts over domain base, foreground, and background docSets in parallel in a > single pass. This allows us to take advantage of the efficiencies built into > the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}), and avoids > the per-term docSet creation and set intersection overhead. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org