: I am curious as to why the query parser does any tokenizing? I would think
: you would want to control/configure this with your analyzers?
:
: Does anyone know the answer to this? Is there a performance gain or something?

It's not about performance, it's about the query parser syntax. Whitespace is "markup" as far as the query parser is concerned, just like +, -, etc. Whitespace characters are instructions for the query parser.

Essentially: unquoted whitespace is the markup that tells the query parser to create an "OR" query out of the "chunks" of input on either side of the space (+ signifies MUST, - signifies PROHIBITED, but there is no markup to signify "SHOULD").

Also: if the query parser didn't chunk on whitespace, queries like this...

   aWord aField:anotherWord

...wouldn't work in the standard query parser.

You may think "but I'm using dismax, why does dismax need to worry about that?" but the key thing to remember there is that if dismax didn't split on whitespace prior to analysis, it wouldn't be able to build the DisjunctionMaxQuery objects that it uses to find the max scoring field per "word" (which is the whole point of the parser).

-Hoss
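
For illustration only, here is a toy Python sketch of the chunking described above. It is not the Lucene/Solr implementation and uses none of their APIs: the function names, the "text" default field, and the tuple output are all made up for this example, and the real parsers handle far more syntax (escaping, ranges, boosts, and so on).

```python
import re

def split_clauses(query):
    """Split on unquoted whitespace; a quoted phrase stays in one chunk."""
    return re.findall(r'\S*"[^"]*"|\S+', query)

def parse_clause(chunk, default_field="text"):
    """Read the markup on one chunk: leading +/- and an optional field: prefix.

    "text" is a hypothetical default field chosen for this sketch.
    """
    occur = "SHOULD"  # no markup: the clause is optional ("OR" behaviour)
    if chunk.startswith("+"):
        occur, chunk = "MUST", chunk[1:]
    elif chunk.startswith("-"):
        occur, chunk = "PROHIBITED", chunk[1:]
    field = default_field
    match = re.match(r'(\w+):(.+)', chunk)
    if match:
        field, chunk = match.group(1), match.group(2)
    # Only at this point would the field's analyzer see the remaining text.
    return occur, field, chunk.strip('"')

def parse(query):
    """Standard-parser style: one clause per whitespace-separated chunk."""
    return [parse_clause(c) for c in split_clauses(query)]

def dismax_sketch(query, fields):
    """Dismax style: per chunk, one (field, chunk) alternative for every field.

    Each inner list stands in for one DisjunctionMaxQuery, which would
    score the chunk by its best-matching field.
    """
    return [[(f, chunk) for f in fields] for chunk in split_clauses(query)]

print(parse('aWord aField:anotherWord'))
print(dismax_sketch('ipod nano', ['title', 'body']))
```

The point of the sketch is the ordering: the whitespace split happens first, so the field prefix and +/- markers can be read per chunk, and dismax can group its per-field alternatives per "word". If the whole string went to an analyzer first, neither step would have chunk boundaries to work with.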