: Thank you for the explanation.
: 
: Let's say product_name_un is not untokenized, but it is tokenized with:
: <tokenizer class="solr.PatternTokenizerFactory" pattern=", *" />
: and the user enters "blue car, big wheels".

Which tokenizer are we talking about: the index-time one or the query-time one?

: with greater boost factor for product_name_un. So that if there are products
: "big wheels chair" and "big wheels" in the index, the second one is higher
: in the results when user enters  "blue car, big wheels".

Ok, so I'm assuming you mean you want to use the pattern tokenizer above 
at query time -- the thing you have to remember is that before the query 
time analysis is done, the query parser has to inspect the raw input and 
decide what is "markup" and what is "input" ... both dismax and the 
standard query parser consider un-escaped/un-quoted whitespace to be 
markup, so the text is divided up that way before your analyzer is ever 
used -- it has to be, so that the dismax parser has discrete chunks to 
correlate in the DisjunctionMaxQueries.
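To make that order of operations concrete, here's a rough sketch in plain 
Python (not actual Solr code) of what happens to the example input, 
assuming default whitespace handling in the parser and the ", *" pattern 
from the fieldType above:

```python
import re

raw = "blue car, big wheels"

# Stage 1 -- query parser: un-escaped whitespace is treated as markup,
# so the raw input is divided into discrete chunks first.
chunks = raw.split()
# -> ['blue', 'car,', 'big', 'wheels']

# Stage 2 -- analysis: the PatternTokenizer (pattern ", *") runs on each
# chunk individually, so it can never see "blue car" or "big wheels"
# as a single comma-delimited token.
pattern = re.compile(r", *")
tokens = [t for c in chunks for t in pattern.split(c) if t]
# -> ['blue', 'car', 'big', 'wheels']
```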

If you just want the full input string passed to the analyzer of each qf 
field, then you just need to quote the entire string (or escape every 
whitespace character in the string with a backslash) so that the entire 
input is considered one chunk -- but then you don't get to use +/-, mm is 
meaningless, real quote characters specified by your users are meaningless, 
etc...
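A quick sketch of that preprocessing (again plain Python standing in for 
what your client code would do before sending the query; the escaping 
helper is illustrative, not a Solr API):

```python
import re

raw = "blue car, big wheels"

# Escape every whitespace character with a backslash so the query
# parser treats the whole input as a single chunk:
escaped = re.sub(r"(\s)", r"\\\1", raw)
# -> blue\ car,\ big\ wheels

# Or quote the entire string instead:
quoted = '"' + raw + '"'

# Either way the analyzer now receives the full input, and the
# PatternTokenizer (pattern ", *") splits it on the comma:
tokens = re.split(r", *", raw)
# -> ['blue car', 'big wheels']
```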

You have to pick your trade-offs.



-Hoss
