dsmiley commented on a change in pull request #1151: SOLR-13890: Add "top-level" DVTQ implementation
URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r364299857
##########
File path: solr/solr-ref-guide/src/other-parsers.adoc
##########
@@ -1031,13 +1031,17 @@ The field on which to search. This parameter is required.
 Separator to use when parsing the input. If set to " " (a single blank space), will trim additional white space from the input terms. Defaults to a comma (`,`).
 `method`::
-An optional parameter used to determine which of several query implementations should be used by Solr. Options are restricted to: `termsFilter`, `booleanQuery`, `automaton`, or `docValuesTermsFilter`. If unspecified, the default value is `termsFilter`. Each implementation has its own performance characteristics, and users are encouraged to experiment to determine which implementation is most performant for their use-case. Heuristics are given below.
+An optional parameter used to determine which of several query implementations should be used by Solr. Options are restricted to: `termsFilter`, `booleanQuery`, `automaton`, `docValuesTermsFilterPerSegment`, `docValuesTermsFilterTopLevel` or `docValuesTermsFilter`. If unspecified, the default value is `termsFilter`. Each implementation has its own performance characteristics, and users are encouraged to experiment to determine which implementation is most performant for their use-case. Heuristics are given below.
 +
 `booleanQuery` creates a `BooleanQuery` representing the request. Scales well with index size, but poorly with the number of terms being searched for.
 +
 `termsFilter` the default `method`. Uses a `BooleanQuery` or a `TermInSetQuery` depending on the number of terms. Scales well with index size, but only moderately with the number of query terms.
 +
-`docValuesTermsFilter` uses doc values data structures to process the request. This method scales well to a large numbers of query terms. It encompasses two implementations or submethods. Solr uses heuristics to choose between these at runtime, but users can also pick explicitly by providing a `submethod` parameter with either `toplevel` or `persegment` as a value. The `persegment` implementation is more general purpose, while `toplevel` is geared for anyone with particularly high numbers of query terms (several hundred to several thousand). The `toplevel` submethod relies on data structures which are lazily populated after each commit. If you use this submethod and commit frequently, you may benefit from adding a static warming query to `solrconfig.xml` so that this is done as a part of the commit, and doesn't slow down user requests.
+`docValuesTermsFilter` chooses between the `docValuesTermsFilterTopLevel` and `docValuesTermsFilterPerSegment` methods (see below) using the number of query terms as a rough heuristic. Users should typically use this method instead of using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` directly, unless they've done performance testing to validate one or the other methods on queries of all sizes. Depending on the implementation picked, this method may rely on expensive data structures which are lazily populated after each commit. If you commit frequently, you may benefit from adding a static warming query to `solrconfig.xml` so that this is done as a part of the commit itself and not attached directly to user requests.

Review comment:
Maybe right up front here first declare that this method is only appropriate when the field has docValues? That's more important and differentiates from terms index based methods.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
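
For readers following the thread, here is a rough sketch of how a client might select one of the `method` values discussed in the diff above. It only builds the `{!terms}` local-params string and is not taken from the pull request. The helper names `build_terms_query` and `build_select_params` and the field name `category` are hypothetical, and the set of method names mirrors the proposed documentation text, which may still change during review. Per the review comment, the `docValues*` methods are assumed to be used only on a field that has docValues enabled.

```python
from urllib.parse import urlencode

# Method names as listed in the proposed ref-guide text quoted above
# (assumption: the final merged list may differ).
KNOWN_METHODS = {
    "termsFilter",
    "booleanQuery",
    "automaton",
    "docValuesTermsFilter",
    "docValuesTermsFilterPerSegment",
    "docValuesTermsFilterTopLevel",
}


def build_terms_query(field, terms, method="termsFilter"):
    """Return a {!terms} query string using the default comma separator."""
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown method: {method}")
    # Doubled braces are literal braces inside an f-string.
    return f"{{!terms f={field} method={method}}}{','.join(terms)}"


def build_select_params(field, terms, method="termsFilter"):
    """URL-encode the terms query as the fq parameter of a select request."""
    return urlencode({"q": "*:*", "fq": build_terms_query(field, terms, method)})


if __name__ == "__main__":
    # 'category' is a hypothetical field assumed to have docValues enabled.
    print(build_terms_query("category", ["books", "music"], "docValuesTermsFilter"))
```

Leaving `method` at `docValuesTermsFilter` lets Solr choose between the top-level and per-segment variants, which is what the proposed text recommends unless performance testing suggests otherwise.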