Happy to help! If I'm reading the block of code linked above correctly, "dvhash" is silently ignored for multi-valued fields, so there's probably not much performance difference there ;-)
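The dvhash rules discussed in this thread can be summarized as a small decision function. The following is a hypothetical Python sketch that encodes only what the messages quoted below state. dvhash has to be requested explicitly for string-like fields. Single-valued numeric fields default to it. It is not used with a prefix, with mincount==0, or on multi-valued fields, where an explicit request is silently ignored. This is not Solr's FacetField code. The function name, its parameters, and the handling of other explicit methods are assumptions made for illustration.

```python
from typing import Optional


def picks_dvhash(method: str, numeric: bool, multi_valued: bool,
                 prefix: Optional[str] = None, mincount: int = 1) -> bool:
    """Hypothetical summary of the dvhash rules as described in this thread.

    Returns True if the terms facet would be counted with dvhash.
    """
    # Restrictions from the thread: dvhash is not used with a prefix, with
    # mincount == 0, or on multi-valued fields (an explicit request for it is
    # silently ignored there).
    if multi_valued or prefix is not None or mincount == 0:
        return False
    if method == "dvhash":
        # Explicit request on a single-valued field (string-like or numeric).
        return True
    if method == "smart":
        # Default method: per the correction in the thread, it picks dvhash
        # for single-valued numeric fields but not for string-like fields.
        return numeric
    # Assumption: other explicit methods ("dv", "uif", ...) are not modeled
    # here and are simply reported as not using dvhash.
    return False


# Field T from the question, assuming it is a single-valued string field.
print(picks_dvhash("smart", numeric=False, multi_valued=False))   # False
print(picks_dvhash("dvhash", numeric=False, multi_valued=False))  # True
print(picks_dvhash("dvhash", numeric=False, multi_valued=True))   # False
```

Under these rules, the high-cardinality string field from the question only benefits if "method": "dvhash" is set explicitly on the facet.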
On Fri, Feb 5, 2021 at 2:12 PM ufuk yılmaz <uyil...@vivaldi.net.invalid> wrote:
> This is a huge help, Mr. Gibney, thank you!
>
> One thing I can add: I tried dvhash with a multi-valued string field. It
> worked and didn’t throw any error, but I don’t know whether it was silently
> ignored or actually used.
>
> Sent from Mail for Windows 10
>
> From: Michael Gibney
> Sent: 05 February 2021 20:52
> To: solr-user@lucene.apache.org
> Subject: Re: Clarification on term facet method dvhash
>
> Correction!: wrt "dvhash" and numeric types, it looks like I had it exactly
> backwards! Single-valued numeric types _do_ use (and even default to)
> "dvhash" ... sorry about that! I stand by the rest of the previous message,
> though, which applies at a minimum to string-like fields.
>
> On Fri, Feb 5, 2021 at 12:49 PM Michael Gibney <mich...@michaelgibney.net>
> wrote:
> >
> > > Performance and resource usage are still affected by the 30M unique
> > > values of T, right?
> >
> > Yes. The main performance issue would be the per-request allocation of a
> > 30M-element `long[]` for the "dv" or "uif" methods (which are by far the
> > most common methods in practice). With low enough request volume and a
> > large enough heap you might not actually perceive a difference in
> > performance, but if you encounter problems for the use case you describe,
> > this array allocation would likely be the cause. (Also note that the
> > relevant field cardinality is the _per-shard_ cardinality, so in a
> > multi-shard collection the allocated arrays might have somewhat fewer
> > elements than the overall field cardinality.)
> >
> > I'm reasonably sure that "dvhash" is _not_ auto-picked by "smart" at the
> > moment, but rather must be specified explicitly:
> >
> > https://github.com/apache/lucene-solr/blob/6ff4a9b395a68d9b0d9e259537e3f5daf0278d51/solr/core/src/java/org/apache/solr/search/facet/FacetField.java#L124-L128
> >
> > The code snippet above indicates some other restrictions that you're
> > probably already aware of (it doesn't work with prefixes, with
> > mincount==0, or for multi-valued or numeric types); otherwise, though (for
> > a non-numeric, single-valued field), I think the situation you describe
> > (high-cardinality field, known low cardinality within the particular
> > domain) sounds like a perfect use case for dvhash.
> >
> > Michael
> >
> > On Fri, Feb 5, 2021 at 11:56 AM ufuk yılmaz <uyil...@vivaldi.net.invalid>
> > wrote:
> >
> >> Hello,
> >>
> >> I’m using Solr 8.4. Very excited about the performance improvements in 8.8:
> >> http://joelsolr.blogspot.com/2021/01/optimizations-coming-to-solr.html
> >>
> >> As I understand it, the main determinant of the performance and RAM usage
> >> of a terms facet is the cardinality of the field in the whole collection,
> >> not its cardinality in the query result.
> >>
> >> I have a collection with 100M docs, and field T has 30M unique values in
> >> the entire collection. But my query returns only docs with 2 different T
> >> values:
> >>
> >> {
> >>   "q": "some query", // whose result has only 2 different T values
> >>   "facet": {
> >>     "type": "terms",
> >>     "field": "T",
> >>     "limit": 15
> >>   }
> >> }
> >>
> >> Performance and resource usage are still affected by the 30M unique
> >> values of T, right?
> >>
> >> If this is correct, can “method”: “dvhash” help in this case, and if so,
> >> how?
> >>
> >> If yes, does the default method “smart” take this into account and use
> >> dvhash, so that I don’t have to set it explicitly?
> >>
> >> Have a nice weekend
> >> ~ufuk
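To make the allocation argument in the thread concrete, here is a simplified Python sketch contrasting two counting strategies. This is not Solr's implementation. The first is an array-based counter with one slot per term ordinal in the shard, which is how the thread describes "dv"/"uif". The second is a hash-based counter that grows only with the number of distinct values actually present in the matching documents, which is the idea behind dvhash. Ordinals are modeled as plain ints, and the function names are hypothetical. At 8 bytes per Java `long`, a 30M-element `long[]` is about 240 MB per request, whereas the hash map for the question's result set holds only 2 entries.

```python
from collections import Counter
from typing import Dict, Iterable, List

BYTES_PER_LONG = 8  # a Java long is 8 bytes


def count_with_array(ords: Iterable[int], cardinality: int) -> List[int]:
    """Array-based counting: one slot per possible term ordinal in the shard."""
    counts = [0] * cardinality  # allocated up front, regardless of the result set
    for o in ords:
        counts[o] += 1
    return counts


def count_with_hash(ords: Iterable[int]) -> Dict[int, int]:
    """Hash-based counting: one entry per distinct ordinal actually seen."""
    return dict(Counter(ords))


def array_bytes(cardinality: int) -> int:
    """Approximate payload size of a long[] with one slot per term ordinal."""
    return cardinality * BYTES_PER_LONG


# Hypothetical ordinals of field T in the matching docs: only 2 distinct values.
matching_ords = [17, 42, 17, 17, 42]

print(count_with_hash(matching_ords))  # {17: 3, 42: 2}
print(array_bytes(30_000_000))         # 240000000 (~240 MB per request)

# Small cardinality here so the example stays cheap to run.
small = count_with_array(matching_ords, cardinality=100)
print(small[17], small[42])            # 3 2
```

Both strategies produce the same counts. The difference is that the array's size is fixed by the per-shard cardinality, while the hash map's size tracks the result set, which is why dvhash suits a high-cardinality field queried over a low-cardinality domain.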