[ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051692#comment-17051692 ]
Chris M. Hostetter commented on SOLR-13132:
-------------------------------------------

{quote}I will separate out the facet cache as an independent PR associated with SOLR-13807. ... it might be reasonable to treat it as a dependency of this issue.
{quote}
Awesome ... i really think (hope) having distinct PRs/patches will make it easier for folks to review & digest.

Whichever dependency ordering you think makes sense from an "understanding the code" and "building on existing work" perspective is fine – SOLR-13132 can depend on SOLR-13807, or vice versa if you think that makes the change clearer.

{quote}Among the points I hope to revisit/clarify with testing: regarding QueryResultKey, ... I think (queryResultsCache should always have a sort specified? ...
{quote}
Generally speaking, in Solr code a null "Sort" means "use the default sort of 'score desc'" ... you can actually see that logic applied inside QueryResultKey to ensure the two are treated equivalently regardless of which Sort was passed to the constructor.

It's possible that by the time QueryResultKeys are constructed all nulls have already been replaced with that default, but if that's provably the case then i would argue that (independent of adding a facet cache) we should harden/simplify QueryResultKey to remove that null equivalence logic and throw an NPE if someone tries to specify a null Sort – which brings us back to the broader topic of "it would make more sense to refactor the bits you need and not directly compose a QueryResultKey inside of the TermFacetCache key".

{quote}I have some questions about exactly how to present the facet cache PR ... but I'll ask those in a more deliberate way over at SOLR-13807.
{quote}
Good plan ... i have a "stub testing" patch you might find useful as a starting point that i'll attach over there as well.
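To make the two options concrete, here is a minimal sketch of "null means default sort" versus "reject null outright". It is only an illustration: {{SketchCacheKey}}, its string-based sort representation, and the {{lenient}}/{{strict}} factory names are all hypothetical stand-ins, not Solr's QueryResultKey or Lucene's Sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for a cache key; NOT Solr's QueryResultKey. */
final class SketchCacheKey {
    /** Assumed default, mirroring the "score desc" convention described above. */
    static final List<String> DEFAULT_SORT = Collections.singletonList("score desc");

    private final String query;
    private final List<String> sort;

    private SketchCacheKey(String query, List<String> sort) {
        this.query = Objects.requireNonNull(query, "query");
        // Defensive copy so later changes to the caller's list cannot alter the key.
        this.sort = Collections.unmodifiableList(new ArrayList<>(sort));
    }

    /** Lenient variant: a null sort is normalized to the default before it is stored. */
    static SketchCacheKey lenient(String query, List<String> sort) {
        return new SketchCacheKey(query, sort == null ? DEFAULT_SORT : sort);
    }

    /** Hardened variant: a null sort is a caller bug and fails fast with an NPE. */
    static SketchCacheKey strict(String query, List<String> sort) {
        return new SketchCacheKey(query, Objects.requireNonNull(sort, "sort must not be null"));
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchCacheKey)) return false;
        SketchCacheKey other = (SketchCacheKey) o;
        return query.equals(other.query) && sort.equals(other.sort);
    }

    @Override
    public int hashCode() {
        return Objects.hash(query, sort);
    }

    public static void main(String[] args) {
        SketchCacheKey fromNull = lenient("foo:bar", null);
        SketchCacheKey fromDefault = lenient("foo:bar", DEFAULT_SORT);
        System.out.println("null and default sort equal: " + fromNull.equals(fromDefault));
    }
}
```

In the lenient variant the equivalence is established once, at construction, so equals/hashCode need no special null handling. The strict variant only makes sense if every caller is known to have substituted the default already.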
> Improve JSON "terms" facet performance when sorted by relatedness
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>   Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, SOLR-13132-with-cache.patch, SOLR-13132.patch
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate
> {{relatedness}} for every term.
> The current implementation uses a standard uninverted approach (either
> {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain
> base docSet, and then uses that initial pass as a pre-filter for a
> second-pass, inverted approach of fetching docSets for each relevant term
> (i.e., {{count > minCount}}?) and calculating intersection size of those sets
> with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and
> set intersection operations increases request latency to the point where
> relatedness sort may not be usable in practice (for my use case, even after
> applying the patch for SOLR-13108, for a field with ~220k unique terms per
> core, QTime for high-cardinality domain docSets was, e.g.: cardinality
> 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable
> ~300ms and ~250ms respectively. The approach calculates uninverted facet
> counts over domain base, foreground, and background docSets in parallel in a
> single pass. This allows us to take advantage of the efficiencies built into
> the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}, and avoids
> the per-term docSet creation and set intersection overhead.
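The single-pass idea in the description above can be sketched roughly as follows. This is not the attached patch: {{SinglePassFacetSketch}}, the {{docToOrds}} array (a toy stand-in for an uninverted doc-to-term-ordinal view) and the use of {{java.util.BitSet}} for docSets are all assumptions made for illustration, and the step that turns the counts into a relatedness score is deliberately left out.

```java
import java.util.BitSet;

/** Hypothetical illustration only; not code from the SOLR-13132 patch. */
final class SinglePassFacetSketch {
    static final int BASE = 0;
    static final int FOREGROUND = 1;
    static final int BACKGROUND = 2;

    /**
     * Walks every document once and, for each term ordinal on that document,
     * increments a counter for each docSet the document belongs to.
     *
     * @param docToOrds docToOrds[doc] lists the term ordinals present on doc
     *                  (assumed stand-in for docValues / UnInvertedField)
     * @return counts[set][ord], where set is BASE, FOREGROUND or BACKGROUND
     */
    static int[][] countAll(int[][] docToOrds, int numTerms,
                            BitSet base, BitSet foreground, BitSet background) {
        int[][] counts = new int[3][numTerms];
        for (int doc = 0; doc < docToOrds.length; doc++) {
            boolean inBase = base.get(doc);
            boolean inFg = foreground.get(doc);
            boolean inBg = background.get(doc);
            if (!inBase && !inFg && !inBg) {
                continue; // document contributes to none of the three sets
            }
            for (int ord : docToOrds[doc]) {
                if (inBase) counts[BASE][ord]++;
                if (inFg) counts[FOREGROUND][ord]++;
                if (inBg) counts[BACKGROUND][ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docToOrds = { {0, 1}, {1}, {0}, {} };
        BitSet base = new BitSet();
        base.set(0, 2); // docs 0 and 1
        BitSet fg = new BitSet();
        fg.set(1, 3);   // docs 1 and 2
        BitSet bg = new BitSet();
        bg.set(0, 4);   // docs 0 through 3
        int[][] counts = countAll(docToOrds, 2, base, fg, bg);
        System.out.println("base count for ord 1: " + counts[BASE][1]);
    }
}
```

The point of the sketch is the cost shape: work is proportional to the number of (document, term) pairs, with no per-term docSet lookup or set intersection, which is the overhead the description says dominates on high-cardinality fields.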
--
This message was sent by Atlassian Jira (v8.3.4#803005)