dsmiley commented on a change in pull request #1151: SOLR-13890: Add 
"top-level" DVTQ implementation
URL: https://github.com/apache/lucene-solr/pull/1151#discussion_r364298797
 
 

 ##########
 File path: solr/solr-ref-guide/src/other-parsers.adoc
 ##########
 @@ -1031,13 +1031,17 @@ The field on which to search. This parameter is 
required.
 Separator to use when parsing the input. If set to " " (a single blank space), 
will trim additional white space from the input terms. Defaults to  a comma 
(`,`).
 
 `method`::
-An optional parameter used to determine which of several query implementations 
should be used by Solr.  Options are restricted to: `termsFilter`, 
`booleanQuery`, `automaton`, or `docValuesTermsFilter`.  If unspecified, the 
default value is `termsFilter`.  Each implementation has its own performance 
characteristics, and users are encouraged to experiment to determine which 
implementation is most performant for their use-case.  Heuristics are given 
below.
+An optional parameter used to determine which of several query implementations 
should be used by Solr.  Options are restricted to: `termsFilter`, 
`booleanQuery`, `automaton`, `docValuesTermsFilterPerSegment`, 
`docValuesTermsFilterTopLevel` or `docValuesTermsFilter`.  If unspecified, the 
default value is `termsFilter`.  Each implementation has its own performance 
characteristics, and users are encouraged to experiment to determine which 
implementation is most performant for their use-case.  Heuristics are given 
below.
 +
 `booleanQuery` creates a `BooleanQuery` representing the request.  Scales well 
with index size, but poorly with the number of terms being searched for.
 +
 `termsFilter` the default `method`.  Uses a `BooleanQuery` or a 
`TermInSetQuery` depending on the number of terms.  Scales well with index 
size, but only moderately with the number of query terms.
 +
-`docValuesTermsFilter` uses doc values data structures to process the request. 
 This method scales well to a large numbers of query terms.  It encompasses two 
implementations or submethods.  Solr uses heuristics to choose between these at 
runtime, but users can also pick explicitly by providing a `submethod` 
parameter with either `toplevel` or `persegment` as a value.  The `persegment` 
implementation is more general purpose, while `toplevel` is geared for anyone 
with particularly high numbers of query terms (several hundred to several 
thousand).  The `toplevel` submethod relies on data structures which are lazily 
populated after each commit.  If you use this submethod and commit frequently, 
you may benefit from adding a static warming query to `solrconfig.xml` so that 
this is done as a part of the commit, and doesn't slow down user requests.
+`docValuesTermsFilter` chooses between the `docValuesTermsFilterTopLevel` and 
`docValuesTermsFilterPerSegment` methods (see below) using the number of query 
terms as a rough heuristic.  Users should typically use this method instead of 
using `docValuesTermsFilterTopLevel` or `docValuesTermsFilterPerSegment` 
directly, unless they've done performance testing to validate one or the other 
methods on queries of all sizes. Depending on the implementation picked, this 
method may rely on expensive data structures which are lazily populated after 
each commit.  If you commit frequently, you may benefit from adding a static 
warming query to `solrconfig.xml` so that this is done as a part of the commit 
itself and not attached directly to user requests.
++
+`docValuesTermsFilterTopLevel` uses top-level doc values data structures to 
find results.  These data structures are more efficient as the number of query 
terms grows high (over several hundred). But they are also expensive to build 
and need to be populated lazily after each commit, causing a 
sometimes-noticeable slowdown on the first query after each commit.  If you 
commit frequently, you may benefit from adding a static warming query to your 
`solrconfig.xml` so that this is done as a part of the commit itself and not 
attached directly to user requests.
 
 Review comment:
   The advice to add a static warming query if you commit frequently is 
very debatable.  "Commit frequently" ~= NRT, and NRT users want to dial down 
caching, or even eliminate it.  But maybe they can tolerate a longer delay 
before fresh data becomes visible, provided query response times don't incur 
costs post-commit.  There's no universal answer.  Yet you did say "you *may* 
benefit", so I guess what you wrote is fine.
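
   For reference, a static warming query of the kind the patch text describes 
might look roughly like the sketch below in `solrconfig.xml`.  This is only an 
illustration under assumptions: the field name `id` and the term values are 
placeholders, the field is assumed to have docValues enabled, and the `method` 
value is the one proposed in this patch.

```xml
<!-- Hypothetical sketch: runs one terms query against each new searcher so
     the top-level doc values structures are built as part of the commit
     instead of on the first user request. Field and terms are placeholders. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">{!terms f=id method=docValuesTermsFilterTopLevel}1,2,3</str>
    </lst>
  </arr>
</listener>
```

   Whether this helps depends on the trade-off above: it moves the cost of 
building the structures from the first query onto each commit.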

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
