Mark,

AFAIK the flexible query parser core, http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html , is a convenient framework for this kind of query juggling. Please also note its standard implementation, which is a good starting point: http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html
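To show why doing this after analysis matters for the scenario quoted below, here is a toy sketch in plain Python. It is not the Lucene or Solr API: `analyze`, `require_fuzzy`, and the stop list are hypothetical names made up for illustration, and the "analyzer" is just lowercasing plus a split on non-word characters. The only point is the ordering: stop words are dropped first, and only the surviving terms get the + prefix and the ~N suffix.

```python
import re

# Hypothetical stop list, for illustration only.
# In a real setup this comes from the field's analyzer configuration.
STOP_WORDS = {"the", "a", "an", "of"}


def analyze(text, stop_words=STOP_WORDS):
    """Toy stand-in for an analyzer: lowercase, split on non-word
    characters (which also discards punctuation), drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop_words]


def require_fuzzy(text, max_edits=2):
    """Mark every term that survives analysis as required (+) and fuzzy (~N)."""
    if max_edits not in (1, 2):
        raise ValueError("max_edits must be 1 or 2")
    return " ".join(f"+{term}~{max_edits}" for term in analyze(text))


print(require_fuzzy("the chair"))  # prints: +chair~2
```

Because the decoration happens after the stop word is gone, "the chair" never turns into "+the~2 +chair~2". In this toy version any extra token filter would still have to be copied into `analyze` by hand, which is exactly the duplication that working on the parser's own query tree is meant to avoid.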
On Sun, Feb 24, 2013 at 11:33 AM, Mark Bennett <mbenn...@ideaeng.com> wrote:

> Scenario:
>
> You're submitting a block of text as a query.
>
> You're content to let Solr / Lucene handle query parsing, tokenization,
> etc.
>
> But you'd like ALL eventually produced leaf nodes in the parse tree
> to have:
> * Boolean .MUST (effectively a + prefix)
> * Fuzzy match of ~1 or ~2
>
> In a simple application, and if there were no punctuation, you could
> preprocess the query, effectively:
> * split on whitespace
> * for t in tokens: t = "+" + t + "~2"
>
> But this is ugly, and even then I think things like stop words would be
> messed up:
> * OK in Solr: the chair (it can properly remove "the")
> * But if this: +the~2 +chair~2 (I'm not sure this would work)
>
> Sure, at the application level you could also remove the stop words in the
> "for t in tokens" loop, but then some other weird case would come up.
> Maybe one of the field's analyzers has some other token filter you forgot
> about, so you'd have to bring that logic forward as well.
>
> (Long story of why I'd want to do all this... and I know people think
> adding ~2 to all tokens will give bad results anyway, trying to fix
> inherited code that can't be scrapped, etc.)
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>