Different but (conceptually) similar? http://robotlibrarian.billdueber.com/2012/03/boosting-on-exactish-anchored-phrase-matching-in-solr-sst-4/index.html
Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
book)

On Fri, Mar 14, 2014 at 8:38 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> or "why haven't I thought of this before"?
>
> I'm once again being faced with the recurring problem of phrase
> searches with wildcards. It'll lead to index bloat, but that's
> acceptable in this situation, at least until proved not so.
>
> The surround query parser can deal with wildcards and proximity, but
> it doesn't accept anything less than three leading characters, which
> is another problem in this case.
>
> I know the complex phrase query parser is out there, but it's not part
> of the code base.
>
> So I'm thinking of modifying the EdgeNGramFilter. I've coded up a
> prototype that seems to work. Basically, it just appends $ to all the
> grams _except_ the last one. I set maxGramSize to 1000, so we'll
> assume the final gram is the original term.
>
> So, indexing "my dog has fleas" I get
>
>      pos 1    pos 2    pos 3    pos 4
>      m$       d$       h$       f$
>      my       do$      ha$      fl$
>               dog      has      fle$
>                                 flea$
>                                 fleas
>
> Now, when users want to search for "m* fleas" within 5 words, they can
> search for:
> "m$ fleas"~5
> or
> "m$ fle$"~5
> or even
> "m$ do$ fle$"~3
>
> and they won't get false matches on something like
> "do ha"
>
> You have to accept some simplifications here, of course. This doesn't
> handle things like "fle*s" and the like.
>
> I'm also not sure this is general-purpose enough to make an option for
> EdgeNGramFilterFactory; the use case is somewhat restricted. But
> that's a relatively natural fit, a new param like
> 'subGramAppendChar="$"'
>
> Thoughts?
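
For anyone who wants to play with the idea outside Solr, the gram scheme in the quoted message can be sketched in a few lines of Python. This is only an illustration, not Erick's prototype and not Lucene code: the names marked_edge_ngrams, analyze and matches_in_order are made up here, tokenization is plain whitespace splitting, and the matcher is a simplified in-order proximity check rather than Lucene's actual sloppy-phrase scoring.

```python
def marked_edge_ngrams(term, marker="$", min_gram=1):
    """Edge n-grams of `term`; every gram except the full term gets `marker`.

    Hypothetical helper mirroring the proposal: the last gram is assumed
    to be the original term and is emitted unmarked.
    """
    grams = [term[:n] + marker for n in range(min_gram, len(term))]
    grams.append(term)
    return grams


def analyze(text, marker="$"):
    """Map 1-based token position -> grams stacked at that position."""
    tokens = text.lower().split()  # assumption: whitespace tokenizer
    return {pos: marked_edge_ngrams(tok, marker)
            for pos, tok in enumerate(tokens, start=1)}


def matches_in_order(index, query_terms, slop):
    """Simplified proximity check (NOT Lucene's slop algorithm).

    True if the query terms occur in order and the number of extra
    positions between the first and last matched term is <= slop.
    """
    if not query_terms:
        return False
    positions = [[p for p, grams in index.items() if t in grams]
                 for t in query_terms]

    def search(i, first, prev):
        if i == len(query_terms):
            return (prev - first) - (len(query_terms) - 1) <= slop
        return any(search(i + 1, p if first is None else first, p)
                   for p in positions[i]
                   if prev is None or p > prev)

    return search(0, None, None)


index = analyze("my dog has fleas")
for pos in sorted(index):
    print(pos, index[pos])

print(matches_in_order(index, ["m$", "fleas"], 5))
print(matches_in_order(index, ["m$", "do$", "fle$"], 3))
print(matches_in_order(index, ["do", "ha"], 5))
```

The last query finds nothing because "do" and "ha" are only indexed as the marked grams "do$" and "ha$", which is the point of appending the marker: an unmarked query term can only hit a complete original term.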