: I confirmed this behavior in trunk with the following query:
: http://localhost:8983/solr/select?qt=dismax&q=6'2"&debugQuery=on&qf=cat&pf=cat
: 
: The result is that the double quote is dropped:
: +DisjunctionMaxQuery((cat:6'2)~0.01) DisjunctionMaxQuery((cat:6'2)~0.01)
: 
: This seems like it's a bug (rather than by design), but I could be
: wrong... Hoss?

It was by design ... but it could be handled better.  the idea is that if 
the input has balanced quotes (ie: an even number) then leave them alone 
so they are dealt with as phrase delimiters.  If there is an odd number, 
strip them out since we don't know whether they are a mistake (ie: unclosed 
phrase) or intended to be literal.
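
To make the rule concrete, here is a minimal sketch of it as described above. It is not the actual Solr source; the class and method names are made up for illustration.

```java
class QuoteStripSketch {
    // Hypothetical stand-in for the rule described above; not the Solr code.
    static String stripIfUnbalanced(CharSequence s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                count++;
            }
        }
        // Even count: assume the quotes are balanced phrase delimiters.
        if (count % 2 == 0) {
            return s.toString();
        }
        // Odd count: we can't tell mistake from literal, so drop them all.
        return s.toString().replace("\"", "");
    }

    public static void main(String[] args) {
        System.out.println(stripIfUnbalanced("6'2\""));          // prints 6'2
        System.out.println(stripIfUnbalanced("\"solr rocks\"")); // prints "solr rocks"
    }
}
```

With the query from the original report, the single double quote makes the count odd, so it is removed before the parser ever sees it.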

auto-escaping them probably would have been a better way to go (ie: let 
the analyzer decide whether or not to strip them) ... i'm not sure why i 
didn't do that in the first place (I think at the time the lucene 
QueryParser didn't deal with escaped quotes very well)

the thing to keep in mind, is that even if it did escape them, this still 
wouldn't work if the user input were...

             the 6'2" man dating the 5'3" woman

...because it would assume the even number of double-quote characters means 
that   " man dating the 5'3"  is a phrase.  i remember spending a day 
going over query logs trying to figure out a good set of heuristic rules 
for guessing when quote characters in user input should be interpreted as 
phrase delims vs "inch" markers before a coworker smacked me and made me 
realize it was a fairly intractable problem and simple rules would be 
easier to understand anyway.
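
A rough sketch of why escaping alone wouldn't save that input, assuming an escape-when-odd rule in place of the strip. All names here are hypothetical, and firstQuotedSpan only approximates what a phrase-aware parser would pick up.

```java
class QuoteEscapeSketch {
    static int countQuotes(CharSequence s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                count++;
            }
        }
        return count;
    }

    // Hypothetical alternative: backslash-escape the quotes when the count is odd.
    static String escapeIfUnbalanced(CharSequence s) {
        if (countQuotes(s) % 2 == 0) {
            return s.toString(); // looks balanced, so nothing is escaped
        }
        return s.toString().replace("\"", "\\\"");
    }

    // Text between the first two double quotes, or null if there is no pair.
    static String firstQuotedSpan(String s) {
        int open = s.indexOf('"');
        int close = s.indexOf('"', open + 1);
        return (open < 0 || close < 0) ? null : s.substring(open + 1, close);
    }

    public static void main(String[] args) {
        String input = "the 6'2\" man dating the 5'3\" woman";
        System.out.println(escapeIfUnbalanced("6'2\""));             // prints 6'2\"
        System.out.println(escapeIfUnbalanced(input).equals(input)); // prints true
        System.out.println("[" + firstQuotedSpan(input) + "]");      // prints [ man dating the 5'3]
    }
}
```

The single-measurement query gets its quote escaped, but the two-measurement input has an even count, passes through unchanged, and the text between the two inch marks ends up looking like a phrase.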

FYI: this is all happening in 
SolrPluginUtils.stripUnbalancedQuotes(CharSequence) which 
DisMax(RequestHandler) calls before passing the string to 
DisjunctionMaxQueryParser.



-Hoss
