RE: keyword query tokenizer

Jason Chaffee Mon, 29 Mar 2010 10:51:11 -0700

Ahh, but that is exactly what I don't want the DisjunctionMaxQuery to
do.  I do not want the max scoring field per "word".  Instead, I want it
per "phrase", which may be a single word or multiple words.

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Friday, March 26, 2010 10:35 PM
To: solr-user@lucene.apache.org
Subject: Re: keyword query tokenizer

: 
: I am curious as to why the query parser does any tokenizing?  I would think
: you would want control/configure this with your analyzers?
: 
: Does anyone know the answer to this. Is there a performance gain or something?

it's not about performance, it's about the query parser syntax.

whitespace is "markup" as far as the query parser is concerned -- just 
like +, -, etc. whitespace characters are instructions for the query 
parser.

Essentially: unquoted whitespace is the markup that tells the query parser 
to create an "OR" query out of the "chunks" of input on either side of the 
space (+ signifies MUST, - signifies PROHIBITED, but there is no markup to 
signify "SHOULD")

Also: if the query parser didn't chunk on whitespace, queries like this...

        aWord aField:anotherWord

...wouldn't work in the standard query parser.  
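
To make the "whitespace is markup" idea concrete, here is a minimal toy
sketch in Python. It is not Lucene's QueryParser and uses none of its API;
the function name parse, the default field name "text", and the tuple
layout are all assumptions made up for illustration. It only shows how
unquoted whitespace separates clauses and how + and - modify them:

```python
import re

# Toy model only, NOT Lucene's QueryParser. A chunk is a run of
# non-whitespace characters and/or double-quoted strings, so quoted
# whitespace stays inside one chunk.
CHUNK = re.compile(r'(?:[^\s"]+|"[^"]*")+')

def parse(query, default_field="text"):
    """Split on unquoted whitespace; return (occur, field, text) clauses."""
    clauses = []
    for chunk in CHUNK.findall(query):
        occur = "SHOULD"              # no markup means an optional clause
        if chunk[0] == "+":
            occur, chunk = "MUST", chunk[1:]
        elif chunk[0] == "-":
            occur, chunk = "PROHIBITED", chunk[1:]
        field, sep, text = chunk.partition(":")
        if not sep or chunk.startswith('"'):
            # No field prefix (or a bare quoted phrase): use the default.
            field, text = default_field, chunk
        clauses.append((occur, field, text.strip('"')))
    return clauses

print(parse('aWord aField:anotherWord'))
```

Under these assumptions the example query yields two optional clauses,
one against the default field and one against aField, which is only
possible because the input was chunked on the space first.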

You may think "but i'm using dismax, why does dismax need to worry about 
that?" but the key to remember there is that if dismax didn't split on 
whitespace prior to analysis, it wouldn't be able to build the 
DisjunctionMaxQuery instances that it uses to find the max scoring field 
per "word" (which is the whole point of the parser).
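
A rough sketch of the "max scoring field per word" idea, again as a toy
Python model rather than Solr code. The function dismax_score, the
field_scores table, and the numbers in it are hypothetical, and the real
DisjunctionMaxQuery also supports a tie-breaker, which is left out here:

```python
def dismax_score(chunks, field_scores):
    """Sum, over chunks, of the best score any single field gives that chunk.

    chunks: list of strings (for example, the query split on whitespace).
    field_scores: {field: {chunk: score}}, made-up per-field match scores.
    """
    total = 0.0
    for chunk in chunks:
        # One disjunction-max per chunk: only the best field counts.
        total += max(
            (scores.get(chunk, 0.0) for scores in field_scores.values()),
            default=0.0,
        )
    return total

# Hypothetical scores for two fields.
field_scores = {
    "title": {"solr": 2.0, "query": 0.5},
    "body": {"solr": 1.0, "query": 1.5},
}

# Per-word: "solr" takes title (2.0), "query" takes body (1.5).
print(dismax_score("solr query".split(), field_scores))
```

If the input were not split first, the whole string would be a single
chunk and the max would be taken once for the entire phrase instead of
once per word.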

-Hoss
