Re: keyword query tokenizer

Chris Hostetter Fri, 26 Mar 2010 22:36:03 -0700

: 
: I am curious as to why the query parser does any tokenizing.  I would think
: you would want to control/configure this with your analyzers?
: 
: Does anyone know the answer to this? Is there a performance gain or something?


It's not about performance; it's about the query parser syntax.

Whitespace is "markup" as far as the query parser is concerned, just 
like +, -, etc.: whitespace characters are instructions for the query 
parser.

Essentially, unquoted whitespace is the markup that tells the query parser 
to create an "OR" query out of the "chunks" of input on either side of the 
space (+ signifies MUST and - signifies PROHIBITED, but there is no markup 
to signify "SHOULD").
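
To make the "whitespace is markup" point concrete, here is a toy sketch in 
Java. It is hypothetical and is NOT Lucene's QueryParser: it ignores 
escaping, parentheses, field prefixes and so on. It splits on unquoted 
whitespace first and only then looks at each chunk's +/- prefix; the Occur 
names are modeled on Lucene's BooleanClause.Occur, but the enum here is a 
local stand-in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy parser (not Lucene's QueryParser): unquoted whitespace
// separates clauses, and a leading + or - picks the clause's occur flag.
public class ChunkSketch {
    // Local stand-in modeled on Lucene's BooleanClause.Occur names.
    enum Occur { MUST, MUST_NOT, SHOULD }

    record Clause(Occur occur, String text) {}

    static List<Clause> parse(String input) {
        // Step 1: chunk on whitespace, except inside double quotes.
        List<String> chunks = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char ch : input.toCharArray()) {
            if (ch == '"') {
                inQuotes = !inQuotes;
                cur.append(ch);
            } else if (Character.isWhitespace(ch) && !inQuotes) {
                if (cur.length() > 0) {
                    chunks.add(cur.toString());
                    cur.setLength(0);
                }
            } else {
                cur.append(ch);
            }
        }
        if (cur.length() > 0) chunks.add(cur.toString());

        // Step 2: the prefix markup decides MUST / MUST_NOT; no prefix means SHOULD.
        List<Clause> clauses = new ArrayList<>();
        for (String chunk : chunks) {
            if (chunk.startsWith("+")) clauses.add(new Clause(Occur.MUST, chunk.substring(1)));
            else if (chunk.startsWith("-")) clauses.add(new Clause(Occur.MUST_NOT, chunk.substring(1)));
            else clauses.add(new Clause(Occur.SHOULD, chunk));
        }
        return clauses;
    }

    public static void main(String[] args) {
        for (Clause c : parse("+solr -lucene \"query parser\" syntax")) {
            System.out.println(c.occur() + " " + c.text());
        }
    }
}
```

Note that the quoted phrase stays one chunk: only *unquoted* whitespace acts 
as a clause separator.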

Also: if the query parser didn't chunk on whitespace, queries like this...

        aWord aField:anotherWord

...wouldn't work in the standard query parser.  
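
A second toy sketch shows why the field prefix has to be peeled off per 
chunk before analysis. It is again hypothetical: the lowercase-and-split 
"analyzer" is made up, and the default field name "text" is an assumption. 
Analyzing the whole string first throws away the aField: structure, while 
splitting first keeps it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: why "aWord aField:anotherWord" needs whitespace
// chunking before analysis. Not Lucene code.
public class FieldChunkSketch {
    record FieldTerm(String field, String term) {}

    // Made-up stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static List<String> toyAnalyze(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!tok.isEmpty()) out.add(tok);
        }
        return out;
    }

    // Split on whitespace FIRST, peel off any "field:" prefix, THEN analyze.
    static List<FieldTerm> parse(String input, String defaultField) {
        List<FieldTerm> result = new ArrayList<>();
        for (String chunk : input.trim().split("\\s+")) {
            String field = defaultField;
            String text = chunk;
            int colon = chunk.indexOf(':');
            if (colon > 0) {
                field = chunk.substring(0, colon);
                text = chunk.substring(colon + 1);
            }
            for (String term : toyAnalyze(text)) result.add(new FieldTerm(field, term));
        }
        return result;
    }

    public static void main(String[] args) {
        String q = "aWord aField:anotherWord";
        // Analyze-first: the field prefix becomes just another token.
        System.out.println(toyAnalyze(q));   // prints [aword, afield, anotherword]
        // Split-first: anotherword stays bound to aField.
        System.out.println(parse(q, "text"));
    }
}
```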

You may think, "but I'm using dismax, why does dismax need to worry about 
that?"  The key thing to remember there is that if dismax didn't split on 
whitespace prior to analysis, it wouldn't be able to build the 
DisjunctionMaxQuery objects that it uses to find the max-scoring field per 
"word" (which is the whole point of the parser).
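
And a rough sketch of the dismax idea, assuming made-up per-field scores in 
place of real Lucene scoring. Each whitespace chunk gets its own 
disjunction-max: the best field score plus a tie factor times the other 
matching fields' scores, following DisjunctionMaxQuery's tie-breaker rule. 
The per-word results are then combined with a plain sum here; real 
BooleanQuery scoring has more to it. Without the whitespace split there 
would be no per-word unit to take the max over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-word disjunction-max scoring; the score table
// is invented for illustration and stands in for real per-field scores.
public class DisMaxSketch {
    // Best field score plus tie * (sum of the other matching fields' scores).
    static double disMax(List<Double> fieldScores, double tie) {
        double max = 0.0, sum = 0.0;
        for (double s : fieldScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max);
    }

    // scores.get(field).get(word) is a made-up per-field score; absent = no match.
    static double score(String userQuery, List<String> fields,
                        Map<String, Map<String, Double>> scores, double tie) {
        double total = 0.0;
        // The whitespace split happens BEFORE any per-field analysis.
        for (String word : userQuery.trim().split("\\s+")) {
            List<Double> perField = new ArrayList<>();
            for (String f : fields) {
                Double s = scores.getOrDefault(f, Map.of()).get(word);
                if (s != null) perField.add(s);
            }
            total += disMax(perField, tie);  // one dismax per "word", then summed
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> scores = Map.of(
            "title", Map.of("solr", 2.0),
            "body", Map.of("solr", 0.5, "parser", 1.0));
        // solr -> max(2.0, 0.5) = 2.0; parser -> 1.0
        System.out.println(score("solr parser", List.of("title", "body"), scores, 0.0)); // prints 3.0
    }
}
```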



-Hoss
