:
: I am curious as to why the query parser does any tokenizing. I would think
: you would want to control/configure this with your analyzers?
:
: Does anyone know the answer to this? Is there a performance gain or something?
It's not about performance, it's about the query parser syntax.
Whitespace is "markup" as far as the query parser is concerned -- just
like +, -, etc. Whitespace characters are instructions for the query
parser.
Essentially: unquoted whitespace is the markup that tells the query parser
to create an "OR" query out of the "chunks" of input on either side of the
space (+ signifies MUST, - signifies PROHIBITED, but there is no markup to
signify "SHOULD").
Also: if the query parser didn't chunk on whitespace queries like this...
aWord aField:anotherWord
...wouldn't work in the standard query parser.
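
To make the chunking concrete, here is a toy sketch in Python. It is not the
actual Lucene/Solr parser: the names CHUNK and parse, and the default field
"text", are made up for illustration. It splits on unquoted whitespace first,
then reads the +/- and field: markup off each chunk:

```python
import re

# Toy illustration only, NOT the real Lucene/Solr query parser.
# A chunk is a run of non-space characters and/or "quoted phrases",
# so whitespace inside quotes does not split the chunk.
CHUNK = re.compile(r'(?:[^\s"]|"[^"]*")+')


def parse(query, default_field="text"):
    """Return a list of (occur, field, text) clauses for a query string."""
    clauses = []
    for chunk in CHUNK.findall(query):
        # No leading markup means the clause is optional (SHOULD).
        occur = "SHOULD"
        if chunk[0] == "+":
            occur, chunk = "MUST", chunk[1:]
        elif chunk[0] == "-":
            occur, chunk = "PROHIBITED", chunk[1:]

        # field:text markup; a colon inside a quoted phrase is not a field.
        field, sep, text = chunk.partition(":")
        if not sep or field.startswith('"'):
            field, text = default_field, chunk

        # Only at this point would the text be handed to the field's analyzer.
        clauses.append((occur, field, text.strip('"')))
    return clauses
```

With the example above, parse('aWord aField:anotherWord') yields two SHOULD
clauses, one on the default field and one on aField. Without the whitespace
split, the parser would have nothing to tell it where one clause ends and the
next begins.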
You may think "but I'm using dismax, why does dismax need to worry about
that?" but the key to remember there is that if dismax didn't split on
whitespace prior to analysis, it wouldn't be able to build the
DisjunctionMaxQuery objects that it uses to find the max scoring field per
"word" (which is the whole point of the parser).
-Hoss
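
The dismax point above can be sketched the same way. This is a rough Python
illustration, not Solr code: field_score is a made-up scoring function (boost
times term count), lowercasing stands in for real per-field analysis, and
real dismax has more to it (the tie parameter, for one) that is ignored
here. The thing to notice is that the max over fields is taken once per
whitespace-separated word, which is only possible because the split happens
first:

```python
# Toy illustration only, NOT the real dismax parser or Lucene scoring.

def field_score(term, text, boost):
    """Hypothetical score: field boost times how often the term occurs."""
    return boost * text.lower().split().count(term)


def dismax(query, doc, boosts):
    """Return ([(word, best_field, best_score), ...], total_score).

    doc maps field name -> field text; boosts maps field name -> boost.
    """
    per_word = []
    for word in query.split():        # split on whitespace BEFORE analysis
        term = word.lower()           # stand-in for the field's analyzer
        # One score per field for this word (the "disjunction") ...
        scores = {f: field_score(term, doc.get(f, ""), b)
                  for f, b in boosts.items()}
        # ... of which only the best-scoring field counts (the "max").
        best = max(scores, key=scores.get)
        per_word.append((word, best, scores[best]))
    total = sum(score for _, _, score in per_word)
    return per_word, total
```

If the whole input were analyzed as one string, there would be no per-word
units to take that max over.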