[jira] [Commented] (SOLR-13807) Caching for term facet counts

Chris M. Hostetter (Jira) Thu, 05 Mar 2020 09:16:20 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052350#comment-17052350
 ]


Chris M. Hostetter commented on SOLR-13807:
-------------------------------------------

bq. my understanding of CacheHelper.getKey() is that the returned keys ... that 
the types of modifications you mention (deletes, in-place DV updates, etc.) 
should result in the creation of a new cache key. Is that not true?

I don't know ... it's not something I've looked into in depth. If so, then false 
alarm (but we should double check, and ideally prove it w/a defensive white box 
test of the regenerator after doing some deletes/in-place updates).

bq. countCacheDf is defined wrt the main domain DocSet.size(), and only affects 
whether the termFacetCache is consulted for a given domain-request combination 
...

Oh, oh OH ! ... ok .... that explains so much about what I was seeing in cache 
stats after various requests. For some reason I thought it controlled whether 
individual term=counts were being cached -- which reminds me: we need ref-guide 
updates in the PR : )
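
For readers following along, here is a minimal sketch of the semantics described in the quoted text: the threshold is compared once against the size of the main domain, and the result only decides whether the cache is consulted at all for that domain-request combination. The class and method names are hypothetical, and the direction of the comparison (consult the cache when the domain is at least {{countCacheDf}} docs) is an assumption, not something confirmed by the patch.

```java
// Hypothetical illustration only; not the code from the SOLR-13807 patch.
class CountCacheDfSketch {

    /**
     * Decides, once per domain-request combination, whether the term facet
     * cache should be consulted. The threshold applies to the size of the
     * main domain, not to the doc frequency of individual terms.
     *
     * Assumption: ">=" is the comparison used; the actual patch may differ.
     */
    static boolean consultTermFacetCache(int domainSize, int countCacheDf) {
        return domainSize >= countCacheDf;
    }

    public static void main(String[] args) {
        // A large domain (e.g. a match-all query) meets the threshold.
        System.out.println(consultTermFacetCache(1_000_000, 10_000));
        // A small, highly filtered domain does not.
        System.out.println(consultTermFacetCache(50, 10_000));
    }
}
```

Under this reading, changing {{countCacheDf}} between requests changes which requests touch the cache, not what an existing entry contains.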

bq. ...As far as the temporarily tabled concerns about concurrent mutation...

Those concerns were largely related to my mistaken impression that different 
requests w/different {{countCacheDf}} params were causing the original segment 
level cache values to be mutated in place (w/o doing a new "insert" back into 
the cache). That's what I had convinced myself was happening, to explain the 
cache stats I was seeing and my vague (misguided) assumptions, from skimming 
the code, about how/why {{CacheState.PARTIALLY_CACHED}} existed.

Your point about doing a defensive copy of the segment level counts & atomic 
re-insert of the top level entry after updating the counts for the new segments 
makes perfect sense.
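
A rough sketch of the copy-then-reinsert pattern, under simplifying assumptions: a {{ConcurrentHashMap}} stands in for the real Solr cache, counts are plain {{int[]}} arrays, and the new segment's counts are assumed to be already mapped to the same ordinal space as the cached array. All names here are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; a plain map stands in for the Solr cache.
class TopLevelCountsSketch {

    // Top-level cache: key -> accumulated counts, indexed by term ordinal.
    static final ConcurrentHashMap<String, int[]> CACHE = new ConcurrentHashMap<>();

    /**
     * Folds the counts for a new segment into the top-level entry without
     * mutating the array that is currently cached.
     *
     * Assumption: newSegmentCounts already uses the same ordinal space as
     * the cached array (real code would need an ordinal mapping step).
     */
    static int[] addSegmentCounts(String key, int[] newSegmentCounts) {
        // compute() runs atomically for this key, so the read-copy-insert
        // sequence cannot interleave with another update of the same entry.
        return CACHE.compute(key, (k, cached) -> {
            int n = newSegmentCounts.length;
            // Defensive copy: readers holding the old array never see it change.
            int[] copy = (cached == null)
                    ? new int[n]
                    : Arrays.copyOf(cached, Math.max(cached.length, n));
            for (int i = 0; i < n; i++) {
                copy[i] += newSegmentCounts[i];
            }
            return copy; // re-inserted as the new top-level entry
        });
    }
}
```

A request that fetched the old array before the update keeps a consistent snapshot, while later requests see the re-inserted entry.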

> Caching for term facet counts
> -----------------------------
>
>                 Key: SOLR-13807
>                 URL: https://issues.apache.org/jira/browse/SOLR-13807
>             Project: Solr
>          Issue Type: New Feature
>          Components: Facet Module
>    Affects Versions: master (9.0), 8.2
>            Reporter: Michael Gibney
>            Priority: Minor
>         Attachments: SOLR-13807__SOLR-13132_test_stub.patch
>
>
> Solr does not have a facet count cache; so for _every_ request, term facets 
> are recalculated for _every_ (facet) field, by iterating over _every_ field 
> value for _every_ doc in the result domain, and incrementing the associated 
> count.
> As a result, subsequent requests end up redoing a lot of the same work, 
> including all associated object allocation, GC, etc. This situation could 
> benefit from integrated caching.
> Because of the domain-based, serial/iterative nature of term facet 
> calculation, latency is proportional to the size of the result domain. 
> Consequently, one common/clear manifestation of this issue is high latency 
> for faceting over an unrestricted domain (e.g., {{\*:\*}}), as might be 
> observed on a top-level landing page that exposes facets. This type of 
> "static" case is often mitigated by external (to Solr) caching, either with a 
> caching layer between Solr and a front-end application, or within a front-end 
> application, or even with a caching layer between the end user and a 
> front-end application.
> But in addition to the overhead of handling this caching elsewhere in the 
> stack (or, for a new user, even being aware of this as a potential issue to 
> mitigate), any external caching mitigation is really only appropriate for 
> relatively static cases like the "landing page" example described above. A 
> Solr-internal facet count cache (analogous to the {{filterCache}}) would 
> provide the following additional benefits:
>  # ease of use/out-of-the-box configuration to address a common performance 
> concern
>  # compact (specifically caching count arrays, without the extra baggage that 
> accompanies a naive external caching approach)
>  # NRT-friendly (could be implemented to be segment-aware)
>  # modular, capable of reusing the same cached values in conjunction with 
> variant requests over the same result domain (this would support common use 
> cases like paging, but also potentially more interesting direct uses of 
> facets). 
>  # could be used for distributed refinement (i.e., if facet counts over a 
> given domain are cached, a refinement request could simply look up the 
> ordinal value for each enumerated term and directly grab the count out of the 
> count array that was cached during the first phase of facet calculation)
>  # composable (e.g., in aggregate functions that calculate values based on 
> facet counts across different domains, like SKG/relatedness – see SOLR-13132)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
