Re: Check my thinking on this, wildcard matching in phrases.

Alexandre Rafalovitch Thu, 13 Mar 2014 18:45:36 -0700

Different but (conceptually) similar?
http://robotlibrarian.billdueber.com/2012/03/boosting-on-exactish-anchored-phrase-matching-in-solr-sst-4/index.html


Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Mar 14, 2014 at 8:38 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> or "why haven't I thought of this before"?
>
> I'm once again being faced with the recurring problem of phrase
> searches with wildcards. It'll lead to index bloat, but that's
> acceptable in this situation, at least until proved not so.
>
> The surround query parser can deal with wildcards and proximity, but
> it doesn't accept anything less than three leading characters, which
> is another problem in this case.
>
> I know the complex phrase query parser is out there, but it's not part
> of the code base.
>
> So I'm thinking of modifying the EdgeNGramFilter. I've coded up a
> prototype that seems to work. Basically, it just appends $ to all the
> grams _except_ the last one. I set maxGramSize to 1000, so we'll
> assume the final gram is the original term.
>
> So, indexing "my dog has fleas" I get
> pos 1   pos 2   pos 3   pos 4
> m$      d$      h$      f$
> my      do$     ha$     fl$
>         dog     has     fle$
>                         flea$
>                         fleas
>
>
> Now, when users want to search for "m* fleas" within 5 words, they can
> search for:
> "m$ fleas"~5
> or
> "m$ fle$"~5
> or even
> "m$ do$ fle$"~3
>
>
> and they won't get false matches on something like
> "do ha"
>
> You have to accept some simplifications here, of course. This doesn't
> handle things like "fle*s" and the like.
>
> I'm also not sure this is general-purpose enough to make an option for
> EdgeNGramFilterFactory; the use-case is somewhat restricted. But
> that's a relatively natural fit, a new param like
> 'subGramAppendChar="$" '
>
> Thoughts?
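
For anyone who wants to try the quoted scheme without touching Lucene, here is a
rough Python sketch of it. It is not the prototype described above and uses no
Solr or Lucene API. The names edge_grams, index_positions and sloppy_match are
made up for illustration, and the matcher is a simplified in-order stand-in for
phrase slop (it counts skipped positions between consecutive terms), not
Lucene's actual sloppy-phrase scoring.

```python
def edge_grams(term, marker="$"):
    """Edge n-grams of term; every gram except the full term gets the marker."""
    grams = [term[:n] + marker for n in range(1, len(term))]
    grams.append(term)  # the final gram is the original term, left unmarked
    return grams


def index_positions(text, marker="$"):
    """One set of grams per token position, as in the 'my dog has fleas' table."""
    return [set(edge_grams(tok, marker)) for tok in text.lower().split()]


def sloppy_match(positions, query_terms, slop=0):
    """Simplified in-order phrase match (an assumption, not Lucene's algorithm).

    Terms must appear in query order; the number of positions skipped
    between consecutive matched terms must not exceed slop in total.
    """
    def search(term_idx, prev_pos, used):
        if term_idx == len(query_terms):
            return True
        start = 0 if prev_pos is None else prev_pos + 1
        for pos in range(start, len(positions)):
            gap = 0 if prev_pos is None else pos - prev_pos - 1
            if used + gap > slop:
                break  # gaps only grow from here on
            if query_terms[term_idx] in positions[pos]:
                if search(term_idx + 1, pos, used + gap):
                    return True
        return False

    return search(0, None, 0)


doc = index_positions("my dog has fleas")
print(sloppy_match(doc, ["m$", "fleas"], slop=5))
print(sloppy_match(doc, ["m$", "do$", "fle$"], slop=3))
print(sloppy_match(doc, ["do", "ha"], slop=0))
```

The last query finds nothing because the bare prefixes "do" and "ha" are never
indexed; only "do$" and "ha$" are, which is the point of the marker character.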
