or "why haven't I thought of this before"? I'm once again faced with the recurring problem of phrase searches with wildcards. What I have in mind will lead to index bloat, but that's acceptable in this situation, at least until proven otherwise.
The surround query parser can deal with wildcards and proximity, but it doesn't accept anything less than three leading characters, which is another problem in this case. I know the complex phrase query parser is out there, but it's not part of the code base.

So I'm thinking of modifying the EdgeNGramFilter, and I've coded up a prototype that seems to work. Basically, it just appends $ to all the grams _except_ the last one. I set maxGramSize to 1000, so we'll assume the final gram is the original term.

So, indexing "my dog has fleas" I get

    pos 1   pos 2   pos 3   pos 4
    m$      d$      h$      f$
    my      do$     ha$     fl$
            dog     has     fle$
                            flea$
                            fleas

Now, when users want to search for "m* fleas" within 5 words, they can search for:

    "m$ fleas"~5
    "m$ fle$"~5
    "m$ do$ fle$"~3

and they won't get false matches on something like "do ha", because the bare prefixes "do" and "ha" are never indexed, only "do$" and "ha$".

You have to accept some simplifications here, of course. This doesn't handle things like "fle*s" and the like.

I'm also not sure this is general-purpose enough to make an option for EdgeNGramFilterFactory, since the use-case is somewhat restricted. But that's a relatively natural fit: a new param like 'subGramAppendChar="$"'.

Thoughts?
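
For reference, the gram generation described above can be sketched outside Lucene. This is a rough illustration in plain Python, not the prototype itself and not a TokenFilter. The names edge_grams and build_positions are made up for this example. It assumes whitespace tokenization and 1-based positions, and it marks every gram that is shorter than the full term, which matches the "all except the last one" rule only as long as maxGramSize is larger than the longest term.

```python
from collections import defaultdict


def edge_grams(term, marker="$", min_gram=1, max_gram=1000):
    """Return the leading-edge grams of `term`.

    Every gram that is a proper prefix of the term gets `marker` appended;
    the full term is emitted unchanged. (Hypothetical helper, for illustration.)
    """
    grams = []
    longest = min(len(term), max_gram)
    for size in range(min_gram, longest + 1):
        gram = term[:size]
        if size < len(term):
            # Proper prefix: tag it so it cannot collide with a real term.
            gram += marker
        grams.append(gram)
    return grams


def build_positions(text):
    """Map each gram to the list of 1-based positions where it occurs.

    All grams of a term share that term's position, as in the table above.
    Tokenization is a plain whitespace split (an assumption of this sketch).
    """
    index = defaultdict(list)
    for pos, term in enumerate(text.split(), start=1):
        for gram in edge_grams(term):
            index[gram].append(pos)
    return dict(index)
```

Calling build_positions("my dog has fleas") puts "m$" and "my" at position 1 and "fle$" and "fleas" at position 4, while the unmarked prefixes "do" and "ha" never appear as keys. That absence is what keeps a phrase like "do ha" from matching.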