Mark,

AFAIK the flexible query parser
http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html
is a convenient framework for such juggling.
The standard implementation built on it is a good starting point:
http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html
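
To make the idea concrete, here is a rough Python sketch of the general
technique: rewrite the leaves of the query tree after parsing and analysis,
instead of decorating the raw query string. It is not Lucene code. Term,
Group, require_fuzzy_leaves and render are hypothetical names invented for
this illustration, and the example assumes analysis has already dropped the
stop word "the". In the flexible parser I would expect the same rewrite to
live in a custom processor step in the pipeline, but treat that as an
assumption to check against the docs above.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Term:
    """Leaf node: one analyzed token (hypothetical, not a Lucene class)."""
    text: str
    required: bool = False   # True behaves like a "+" prefix (MUST)
    max_edits: int = 0       # 0 means exact match, 1 or 2 means fuzzy


@dataclass
class Group:
    """Inner node that just holds child clauses (hypothetical)."""
    children: List["Node"]


Node = Union[Term, Group]


def require_fuzzy_leaves(node: Node, max_edits: int = 2) -> Node:
    """Return a copy of the tree where every leaf is MUST and fuzzy."""
    if isinstance(node, Term):
        return Term(node.text, required=True, max_edits=max_edits)
    return Group([require_fuzzy_leaves(c, max_edits) for c in node.children])


def render(node: Node) -> str:
    """Print the tree in a query-syntax-like form, for inspection only."""
    if isinstance(node, Term):
        prefix = "+" if node.required else ""
        suffix = f"~{node.max_edits}" if node.max_edits else ""
        return f"{prefix}{node.text}{suffix}"
    return "(" + " ".join(render(c) for c in node.children) + ")"


# Assumed result of parsing "the chair": the stop word is already gone,
# so only real tokens get the + and ~2 treatment.
parsed = Group([Term("chair")])
print(render(require_fuzzy_leaves(parsed)))
```

Because the rewrite runs on the tree and not on the text, whatever the
field's analyzer did to the tokens is already reflected in the leaves.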



On Sun, Feb 24, 2013 at 11:33 AM, Mark Bennett <mbenn...@ideaeng.com> wrote:

> Scenario:
>
> You're submitting a block of text as a query.
>
> You're content to let solr / lucene handle query parsing and tokenization,
> etc.
>
> But you'd like to have ALL eventually produced leaf-nodes in the parse tree
> to have:
> * Boolean .MUST (effectively a + prefix)
> * Fuzzy match of ~1 or ~2
>
> In a simple application, and if there were no punctuation, you could
> preprocess the query, effectively:
> * split on whitespace
> * for t in tokens: t = "+" + t + "~2"
>
> But this is ugly, and even then I think things like stop words would be
> messed up:
> * OK in Solr:   the chair    (it can properly remove "the")
> * But if this:    +the~2  +chair~2   (I'm not sure this would work)
>
> Sure, at the application level you could also remove the stop words in the
> "for t in tokens" loop, but then some other weird case would come up.
> Maybe one of the field's analyzers has some other token filter you forgot
> about, so you'd have to bring that logic forward as well.
>
> (Long story of why I'd want to do all this... and I know people think
> adding ~2 to all tokens will give bad results anyway, trying to fix
> inherited code that can't be scrapped, etc)
>
> --
> Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
