[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144550#comment-17144550 ]
Chris M. Hostetter commented on SOLR-13132:
-------------------------------------------

I pushed a few commits to your branch containing some small cleanups/tweaks I noticed while reviewing the code, and making the changes I suggested in my last comment regarding registerSweepingAccIfSupportedByCollectAcc. (Please let me know what you think of these and if you have any concerns.)

In general I feel pretty good about the branch in its current state; I think there are really just 2 outstanding questions:

* the "what is the allBucketSlotNum when sweeping" problem
** I polished up your existing approach to make it a little more robust and future-proof
** this approach has grown on me, and I think the trade-off of how "hackish" it feels is appropriate given the esoteric-ness of the situation and the "cost" of revamping various APIs to solve it any differently
*** if relatedness() was more meaningful in the context of the allBuckets bucket it might be a different story
* filterCaching of relatedness() TermQueries in the "non-sweep" situation
** as I mentioned before, I really don't think we should be lumping this change in with adding sweeping...
*** https://issues.apache.org/jira/browse/SOLR-13132?focusedCommentId=17105821&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17105821
** I still believe we should remove these changes from this branch, and revisit this as an independent change in SOLR-13108.
*** particularly as a hedge against the risk that the sweeping changes introduce some bug we haven't thought of: people can always work around by setting {{sweep_collection: false}} to bypass and get the existing behavior, but if we _also_ break the existing behavior via a caching change... ugh.
*** the fact that you still have concerns about the approach being taken, and questions about whether using DocsEnumState here would work (I haven't thought about it), just solidifies that opinion – let's not let the sweeping changes get held up / bogged down any further by questions of caching in the non-sweeping code paths.

> Improve JSON "terms" facet performance when sorted by relatedness
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, SOLR-13132-with-cache.patch, SOLR-13132.patch, SOLR-13132_testSweep.patch
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate {{relatedness}} for every term.
> The current implementation uses a standard uninverted approach (either {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain base docSet, and then uses that initial pass as a pre-filter for a second-pass, inverted approach of fetching docSets for each relevant term (i.e., {{count > minCount}}?) and calculating intersection size of those sets with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and set intersection operations increases request latency to the point where relatedness sort may not be usable in practice (for my use case, even after applying the patch for SOLR-13108, for a field with ~220k unique terms per core, QTimes for high-cardinality domain docSets were, e.g.: cardinality 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable ~300ms and ~250ms respectively.
> The approach calculates uninverted facet counts over domain base, foreground, and background docSets in parallel in a single pass. This allows us to take advantage of the efficiencies built into the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids the per-term docSet creation and set intersection overhead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
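
For readers skimming the archive, the difference between the two strategies in the issue description can be illustrated with a toy sketch. This is not Solr code: the function names (invert, per_term_counts, sweep_counts), the dict-based "uninverted" mapping, and the sample data are all hypothetical, and plain Python sets stand in for docSets. It only shows that a single pass over documents, counting into base/foreground/background tallies at once, yields the same per-term counts as building one docSet per term and intersecting it with each domain.

```python
from collections import Counter


def invert(doc_terms):
    """Build term -> set(doc ids) from a doc -> terms mapping (toy postings)."""
    postings = {}
    for doc, terms in doc_terms.items():
        for term in terms:
            postings.setdefault(term, set()).add(doc)
    return postings


def per_term_counts(postings, fg, bg):
    """Inverted style: one docSet per term, intersected with each domain set."""
    return {
        term: (len(docs & fg), len(docs & bg))
        for term, docs in postings.items()
    }


def sweep_counts(doc_terms, base, fg, bg):
    """Single pass: visit each doc once and count into all three tallies."""
    base_counts, fg_counts, bg_counts = Counter(), Counter(), Counter()
    for doc in sorted(base | fg | bg):
        in_base, in_fg, in_bg = doc in base, doc in fg, doc in bg
        for term in doc_terms.get(doc, ()):
            if in_base:
                base_counts[term] += 1
            if in_fg:
                fg_counts[term] += 1
            if in_bg:
                bg_counts[term] += 1
    return base_counts, fg_counts, bg_counts


# Hypothetical sample data: five documents, three terms.
doc_terms = {0: ["a", "b"], 1: ["a"], 2: ["b", "c"], 3: ["c"], 4: ["a", "c"]}
base, fg, bg = {0, 1, 2}, {0, 1, 4}, {0, 1, 2, 3, 4}

inverted = per_term_counts(invert(doc_terms), fg, bg)
base_counts, fg_counts, bg_counts = sweep_counts(doc_terms, base, fg, bg)
swept = {term: (fg_counts[term], bg_counts[term]) for term in inverted}
```

In this sketch the per-term variant does work proportional to the number of terms times the cost of a set intersection, while the sweep does one increment per (document, term) pair. That is the intuition behind the latency numbers quoted in the description, though the real implementation and its costs differ.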