I am interested in a new filter type that would combine edgengram and ngram. The idea is that it would create all ngrams within the specified min/max size, but the ngrams that are also edgengrams (specifically, those anchored at the left edge of a token) would get an index-time boost. Optionally, the boost would be higher if the ngram came from the first token.
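To make the idea concrete, here is a minimal Python sketch of the token stream such a filter might produce. It is only an illustration of the concept, not Lucene or Solr code: the function name `boosted_ngrams`, the parameters `edge_boost` and `first_token_boost`, the whitespace tokenization, and the multiplicative boost values are all assumptions made up for this example. How a per-ngram boost would actually be stored in the index is left open.

```python
def boosted_ngrams(text, min_size=2, max_size=4,
                   edge_boost=2.0, first_token_boost=2.0):
    """Return (gram, boost) pairs for every n-gram of each whitespace token.

    Hypothetical illustration only: grams that start at the left edge of a
    token get edge_boost, and left-edge grams of the first token are
    additionally multiplied by first_token_boost.
    """
    pairs = []
    for token_index, token in enumerate(text.lower().split()):
        for size in range(min_size, max_size + 1):
            # Slide a window of the current size across the token.
            for start in range(0, len(token) - size + 1):
                gram = token[start:start + size]
                boost = 1.0
                if start == 0:  # this gram is also an edge n-gram
                    boost *= edge_boost
                    if token_index == 0:  # and it begins the whole phrase
                        boost *= first_token_boost
                pairs.append((gram, boost))
    return pairs


if __name__ == "__main__":
    for gram, boost in boosted_ngrams("solr search", min_size=2, max_size=3):
        print(gram, boost)
```

With these assumed values, a gram at the start of the phrase carries the largest boost, a gram at the start of any later word carries a smaller one, and interior grams carry none.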

The use case: an autosuggest dropdown that populates as a user types into a search box. The index would have a single field, built from a manually produced list of suggested search phrases. The boosts described above would ensure that matches at the beginning of a word, and especially at the beginning of the entire suggested phrase, are returned first.

I could get a similar effect by using a copyField, analyzing one field with ngrams and the other with edgengrams, then using edismax to put a boost on the edge version. I will start with this method, but using copyField makes the index bigger, and using edismax makes the final parsed queries more complicated.
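The two-field workaround can be modeled roughly as follows. This is a toy Python model, not Solr's actual scoring: the two sets stand in for the ngram field and the edgengram field, and the names `score`, `suggest`, and `edge_weight`, along with the weight of 3.0, are assumptions chosen for illustration.

```python
def ngrams(token, min_size, max_size):
    """All n-grams of a token within the size range."""
    return {token[i:i + n]
            for n in range(min_size, max_size + 1)
            for i in range(len(token) - n + 1)}


def edge_ngrams(token, min_size, max_size):
    """Only the n-grams anchored at the left edge of a token."""
    return {token[:n]
            for n in range(min_size, min(max_size, len(token)) + 1)}


def score(phrase, typed, edge_weight=3.0, min_size=2, max_size=10):
    """Toy score: 1.0 for an ngram-field hit plus edge_weight for an edge hit."""
    typed = typed.lower()
    tokens = phrase.lower().split()
    ngram_field = set().union(*(ngrams(t, min_size, max_size) for t in tokens))
    edge_field = set().union(*(edge_ngrams(t, min_size, max_size) for t in tokens))
    total = 0.0
    if typed in ngram_field:
        total += 1.0
    if typed in edge_field:
        total += edge_weight  # stands in for the boost on the edge field
    return total


def suggest(phrases, typed):
    """Return matching phrases, highest toy score first (ties keep input order)."""
    scored = [(score(p, typed), p) for p in phrases]
    scored.sort(key=lambda pair: -pair[0])
    return [p for s, p in scored if s > 0]


if __name__ == "__main__":
    print(suggest(["red car", "scared cat", "carpet"], "car"))
```

In this model, a phrase where the typed text begins a word outranks one where it only appears inside a word, which is the ordering the boosted edge field is meant to produce.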

If I can avoid the copyfield, the index will be smaller and the queries very simple, which should make for very high speed.

I will take a look at the source code, but I'm a bit of a Java novice. Does anyone have the knowledge, desire, and time to crank this one out quickly? Is it possible someone has already written such a filter?

Thanks,
Shawn
