[ https://issues.apache.org/jira/browse/SOLR-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942117#comment-16942117 ]
Michael Gibney commented on SOLR-13807:
---------------------------------------
I initially proposed this idea, along with an implementation over
{{SimpleFacets}}, in a comment on
[SOLR-8096|https://issues.apache.org/jira/browse/SOLR-8096?focusedCommentId=15960982#comment-15960982],
an issue to which it is only (very) tangentially related.
As a natural consequence of working to address some performance issues with
full-domain SKG/relatedness (SOLR-13132), I updated the initial facet cache
implementation to be compatible with JSON facets (while maintaining
cross-compatibility with {{SimpleFacets}}).
[PR #751|https://github.com/apache/lucene-solr/pull/751] (associated with
SOLR-13132) incorporates a facet cache that I believe realizes all of the
potential described in the proposal/description (quoted below), including being
NRT-friendly/segment-aware, with the exception of point 5: the PR does not
leverage the facet cache for distributed refinement. The facet cache itself was
a prerequisite for the SKG/relatedness work, but distributed refinement would
definitely have been out of scope.
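The sketch below shows what "segment-aware" can mean for a facet count cache. It is a minimal model, not Solr's API and not the implementation in PR #751. It assumes each immutable segment carries a stable identifier, its own sorted term dictionary, and per-document term ordinals. The names `Segment`, `SegmentAwareFacetCache`, and `facetCounts` are hypothetical.

Count arrays are cached per (segment, field, domain), so after an NRT reopen only newly added segments need to be counted. The per-segment arrays are then merged by term, because ordinals are not comparable across segments.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of an immutable segment: a stable id, a sorted term
// dictionary, and, for each doc, the ordinals of its terms in one facet field.
record Segment(String id, String[] terms, int[][] docOrds) {}

// Hypothetical sketch, not Solr's API. This toy model ignores deletions, which
// would also change a segment's counts in a real index.
class SegmentAwareFacetCache {
  // Keying on segment identity lets unchanged segments keep their entries across reopens.
  private record Key(String segmentId, String field, String domainKey) {}

  private final Map<Key, int[]> perSegment = new HashMap<>();
  int misses = 0; // number of per-segment count arrays actually computed

  // Count array indexed by the segment's own term ordinals. For brevity, the
  // domain here is every doc in the segment (i.e. a match-all domain).
  private int[] segmentCounts(Segment seg, String field, String domainKey) {
    return perSegment.computeIfAbsent(new Key(seg.id(), field, domainKey), k -> {
      misses++;
      int[] counts = new int[seg.terms().length];
      for (int[] ords : seg.docOrds()) {
        for (int ord : ords) {
          counts[ord]++;
        }
      }
      return counts;
    });
  }

  // Merge per-segment arrays by term text, since ordinals differ between segments.
  Map<String, Integer> facetCounts(List<Segment> segments, String field, String domainKey) {
    Map<String, Integer> merged = new TreeMap<>();
    for (Segment seg : segments) {
      int[] counts = segmentCounts(seg, field, domainKey);
      for (int ord = 0; ord < counts.length; ord++) {
        if (counts[ord] > 0) {
          merged.merge(seg.terms()[ord], counts[ord], Integer::sum);
        }
      }
    }
    return merged;
  }
}
```

For example, suppose counts for segment `s1` are computed once, and a reopen then adds segment `s2`. The next request reuses the cached array for `s1` and counts only `s2`.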
In retrospect, I would have preferred to submit a separate PR for only the facet
cache; I did not go that route only because the facet cache implementation
grew organically out of (and was a prerequisite for) the work on SKG/relatedness.
Would people be comfortable (at least initially) evaluating the facet cache
implementation in the context of SOLR-13132? Whether or not I end up having to
extract the facet cache work into a separate PR, I thought it would be worth
opening this separate Jira issue for the facet cache, since its use could
potentially be much more general (beyond SKG/relatedness).
> Caching for term facet counts
> -----------------------------
>
> Key: SOLR-13807
> URL: https://issues.apache.org/jira/browse/SOLR-13807
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Facet Module
> Affects Versions: master (9.0), 8.2
> Reporter: Michael Gibney
> Priority: Minor
>
> Solr does not have a facet count cache, so for _every_ request, term facets
> are recalculated for _every_ (facet) field by iterating over _every_ field
> value for _every_ doc in the result domain and incrementing the associated
> count.
> This redoes a lot of work, including all associated object allocation, GC,
> etc., and could benefit greatly from integrated caching.
> Because of the domain-based, serial/iterative nature of term facet
> calculation, latency is proportional to the size of the result domain.
> Consequently, one common/clear manifestation of this issue is high latency
> for faceting over an unrestricted domain (e.g., {{*:*}}), as might be
> observed on a top-level landing page that exposes facets. This type of
> "static" case is often mitigated by external (to Solr) caching, either with a
> caching layer between Solr and a front-end application, or within a front-end
> application, or even with a caching layer between the end user and a
> front-end application.
> But in addition to the overhead of handling this caching elsewhere in the
> stack (or, for a new user, even being aware of this as a potential issue to
> mitigate), any external caching mitigation is really only appropriate for
> relatively static cases like the "landing page" example described above. A
> Solr-internal facet count cache (analogous to the {{filterCache}}) would
> provide the following additional benefits:
> # ease of use/out-of-the-box configuration to address a common performance
> concern
> # compact (specifically caching count arrays, without the extra baggage that
> accompanies a naive external caching approach)
> # NRT-friendly (could be implemented to be segment-aware)
> # modular, capable of reusing the same cached values in conjunction with
> variant requests over the same result domain (this would support common use
> cases like paging, but also potentially more interesting direct uses of
> facets).
> # could be used for distributed refinement (i.e., if facet counts over a
> given domain are cached, a refinement request could simply look up the
> ordinal value for each enumerated term and directly grab the count out of the
> count array that was cached during the first phase of facet calculation)
> # composable (e.g., in aggregate functions that calculate values based on
> facet counts across different domains, like SKG/relatedness – see SOLR-13132)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)