[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073878#comment-17073878 ]
Michael Gibney commented on SOLR-13132:
---------------------------------------

Thanks, this is all quite clear and helpful! It's going to take me a bit more time to respond in full, but I think it makes sense to respond initially to the question about the "{{ // XXXXXX temporary!}}" comment (and associated code). Upon further reflection, that comment is a little bit of all-of-the-above: definitely old, possibly a mistake, misleading, and in any case warrants further discussion.

{{SweepSKGSlotAcc}} is indeed not a universal replacement for {{SKGSlotAcc}}. The choice to use {{SKGSlotAcc}} for slot cardinality of 1 is really just an indirect way of determining whether the SlotAcc may be used to process a refinement request (for which sweep accumulation is not applicable -- maybe there's a better/more direct way to determine this?). Given that we need to continue to maintain {{SKGSlotAcc}} to support refinement requests, it's reasonable to continue to support it also as a configurable option for initial-pass faceting (with sufficient filterCache capacity, it could be more efficient for low-cardinality fields -- analogous to {{enum}} method terms faceting).

The "with sufficient filterCache capacity" here is crucial, though: if {{SKGSlotAcc}} is supported for initial-pass faceting, the potential for filterCache thrashing is significant (esp. over high-cardinality fields). The {{cacheDf}} logic was initially introduced to mitigate filterCache thrashing over high-cardinality fields, prior to the implementation of sweep collection. A question that occurs to me now is whether it makes sense to remove the {{cacheDf}} logic, and always consult the filterCache (this would require users to take care to configure a sufficiently large filterCache); or leave that logic in place, analogous to {{facet.enum.cache.minDf}}/{{cacheDf}} as used for enum method term faceting _per se_.
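To make the {{cacheDf}} trade-off concrete, here is a minimal, self-contained sketch of the general idea: consult a cache only for terms whose document frequency meets a threshold, and compute everything else directly so that rare terms never evict useful entries. This is not the patch code. The class name, the {{HashMap}} standing in for the filterCache, and the {{int[]}} standing in for a docSet are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical illustration only; none of these names come from Solr.
class MinDfCacheSketch {
    // Stand-in for the filterCache: term id -> docSet (here just an int[]).
    private final Map<Integer, int[]> cache = new HashMap<>();
    // Stand-in for cacheDf: minimum document frequency for using the cache.
    private final int minDf;

    MinDfCacheSketch(int minDf) {
        this.minDf = minDf;
    }

    // Returns the docSet for a term, caching it only if the term is frequent enough.
    int[] docSetFor(int termId, int docFreq, IntFunction<int[]> compute) {
        if (docFreq >= minDf) {
            // Frequent term: look it up, computing and storing it on a miss.
            return cache.computeIfAbsent(termId, id -> compute.apply(id));
        }
        // Rare term: compute directly and leave the cache untouched.
        return compute.apply(termId);
    }

    int cacheSize() {
        return cache.size();
    }

    public static void main(String[] args) {
        MinDfCacheSketch sketch = new MinDfCacheSketch(3);
        IntFunction<int[]> compute = id -> new int[] {id};
        sketch.docSetFor(1, 5, compute); // docFreq 5 >= 3: cached
        sketch.docSetFor(2, 1, compute); // docFreq 1 < 3: bypasses the cache
        System.out.println("cached entries: " + sketch.cacheSize());
    }
}
```

With the threshold removed (that is, always consulting the cache), every term over a high-cardinality field would occupy a cache slot, which is the thrashing scenario described above.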
(As I write this, I think I'm convincing myself in favor of leaving it all in, and removing the "{{ // XXXXXX temporary!}}" comment).

> Improve JSON "terms" facet performance when sorted by relatedness
> ------------------------------------------------------------------
>
> Key: SOLR-13132
> URL: https://issues.apache.org/jira/browse/SOLR-13132
> Project: Solr
> Issue Type: Improvement
> Components: Facet Module
> Affects Versions: 7.4, master (9.0)
> Reporter: Michael Gibney
> Priority: Major
> Attachments: SOLR-13132-with-cache-01.patch, SOLR-13132-with-cache.patch, SOLR-13132.patch
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate
> {{relatedness}} for every term.
> The current implementation uses a standard uninverted approach (either
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain
> base docSet, and then uses that initial pass as a pre-filter for a
> second-pass, inverted approach of fetching docSets for each relevant term
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and
> set intersection operations increases request latency to the point where
> relatedness sort may not be usable in practice (for my use case, even after
> applying the patch for SOLR-13108, for a field with ~220k unique terms per
> core, QTimes for high-cardinality domain docSets were, e.g.: cardinality
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable
> ~300ms and ~250ms respectively. The approach calculates uninverted facet
> counts over domain base, foreground, and background docSets in parallel in a
> single pass. This allows us to take advantage of the efficiencies built into
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids
> the per-term docSet creation and set intersection overhead.
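
For readers who want a picture of the two strategies in the issue description, the following is a simplified sketch, not the patch itself. It assumes a single-valued field represented as a hypothetical {{termOrdOfDoc}} array (document id to term ordinal) and uses {{java.util.BitSet}} as a stand-in for docSets. All class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration only; none of these names come from Solr.
class SweepCountSketch {
    // Single pass: visit each relevant document once and bump the per-term
    // counters for every domain set (base, foreground, background) containing it.
    static int[][] sweepCounts(int[] termOrdOfDoc, int numTerms,
                               BitSet base, BitSet fg, BitSet bg) {
        int[][] counts = new int[3][numTerms];
        BitSet union = (BitSet) base.clone();
        union.or(fg);
        union.or(bg);
        for (int doc = union.nextSetBit(0); doc >= 0; doc = union.nextSetBit(doc + 1)) {
            int ord = termOrdOfDoc[doc];
            if (base.get(doc)) counts[0][ord]++;
            if (fg.get(doc)) counts[1][ord]++;
            if (bg.get(doc)) counts[2][ord]++;
        }
        return counts;
    }

    // Per-term alternative: build a docSet for each term, then intersect it
    // with each domain set. Cost grows with the number of distinct terms.
    static int[][] perTermCounts(int[] termOrdOfDoc, int numTerms,
                                 BitSet base, BitSet fg, BitSet bg) {
        int[][] counts = new int[3][numTerms];
        for (int ord = 0; ord < numTerms; ord++) {
            BitSet termDocs = new BitSet(termOrdOfDoc.length);
            for (int doc = 0; doc < termOrdOfDoc.length; doc++) {
                if (termOrdOfDoc[doc] == ord) termDocs.set(doc);
            }
            counts[0][ord] = intersectionSize(termDocs, base);
            counts[1][ord] = intersectionSize(termDocs, fg);
            counts[2][ord] = intersectionSize(termDocs, bg);
        }
        return counts;
    }

    static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    public static void main(String[] args) {
        int[] termOrdOfDoc = {0, 1, 0, 2, 1, 0};
        BitSet base = new BitSet();
        base.set(0, 4);
        BitSet fg = new BitSet();
        fg.set(0);
        fg.set(2);
        fg.set(4);
        BitSet bg = new BitSet();
        bg.set(0, 6);
        int[][] sweep = sweepCounts(termOrdOfDoc, 3, base, fg, bg);
        int[][] perTerm = perTermCounts(termOrdOfDoc, 3, base, fg, bg);
        System.out.println("same counts: " + Arrays.deepEquals(sweep, perTerm));
    }
}
```

Both methods produce the same per-term counts. The difference is that the single-pass version does work proportional to the number of documents, while the per-term version repeats docSet construction and intersection for every term, which is where the latency over high-cardinality fields comes from.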