Combination of edgengram and ngram

Shawn Heisey Tue, 13 Dec 2011 09:25:04 -0800

I am interested in a new filter type, one that would combine edgengram and ngram. The idea is that it would create all ngrams specified by the min/max gram sizes, but the ngrams that happen to be edgengrams (specifically the left side) would get an index-time boost. Optionally, the boost would be higher if the edgengram came from the first token.
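To make the proposed token stream concrete, here is a plain-Java sketch of which grams such a filter would emit and what weight each would carry. It is not a Lucene TokenFilter: the class name, the `generate` method, and the boost values 2.0 and 4.0 are all hypothetical, and splitting on whitespace plus lowercasing stands in for whatever tokenizer and filters precede it in a real analyzer chain. It also only attaches the weight in memory; how a per-term boost would actually be stored in the index (for example as a payload) is not addressed here.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeBoostedNGrams {

    /** A generated gram and the weight the proposed filter would attach to it. */
    public static final class WeightedGram {
        final String text;
        final float weight;

        WeightedGram(String text, float weight) {
            this.text = text;
            this.weight = weight;
        }

        @Override
        public String toString() {
            return text + "^" + weight;
        }
    }

    /**
     * Emits every n-gram of size minGram..maxGram from each whitespace-separated token.
     * Grams starting at the left edge of a token get edgeBoost; left-edge grams of the
     * first token get firstTokenEdgeBoost instead. All other grams get 1.0.
     */
    public static List<WeightedGram> generate(String phrase, int minGram, int maxGram,
                                              float edgeBoost, float firstTokenEdgeBoost) {
        List<WeightedGram> out = new ArrayList<>();
        String[] tokens = phrase.trim().toLowerCase().split("\\s+");
        for (int t = 0; t < tokens.length; t++) {
            String token = tokens[t];
            for (int start = 0; start < token.length(); start++) {
                for (int size = minGram; size <= maxGram && start + size <= token.length(); size++) {
                    float weight = 1.0f;
                    if (start == 0) {
                        // This gram is also an edgengram: boost it, more so in the first token.
                        weight = (t == 0) ? firstTokenEdgeBoost : edgeBoost;
                    }
                    out.add(new WeightedGram(token.substring(start, start + size), weight));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical boosts: 2.0 for word-initial grams, 4.0 for phrase-initial grams.
        System.out.println(generate("ice cream", 2, 3, 2.0f, 4.0f));
        // prints [ic^4.0, ice^4.0, ce^1.0, cr^2.0, cre^2.0, re^1.0, rea^1.0, ea^1.0, eam^1.0, am^1.0]
    }
}
```

Every gram an ordinary ngram filter would produce is still present; the only difference is the weight on the left-edge ones.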

The use case: an autosuggest dropdown that populates as a user types into a search box. The index would have one field, built from a manually produced list of suggested search phrases. The boosts would ensure that matches from the beginning of a word, and especially from the beginning of the entire suggested phrase, are returned first.
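A toy illustration of the ordering those boosts are meant to produce: each suggestion gets the largest weight among its grams that equal the typed text, and suggestions are sorted by that weight. This is only a stand-in for whatever score Solr would actually compute; the class, method names, gram sizes, and boost values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SuggestRanking {

    // Hypothetical settings, not taken from any real Solr configuration.
    static final int MIN_GRAM = 2;
    static final int MAX_GRAM = 4;
    static final float WORD_EDGE_BOOST = 2.0f;
    static final float PHRASE_EDGE_BOOST = 4.0f;

    /**
     * Weight of the best gram in this phrase that equals the typed text, or 0 if none does.
     * Only grams of the typed length can match, so only those positions are checked.
     */
    static float bestWeight(String phrase, String typed) {
        String query = typed.toLowerCase();
        if (query.length() < MIN_GRAM || query.length() > MAX_GRAM) {
            return 0f; // the typed text could never be one of the indexed grams
        }
        String[] tokens = phrase.toLowerCase().split("\\s+");
        float best = 0f;
        for (int t = 0; t < tokens.length; t++) {
            String token = tokens[t];
            for (int start = 0; start + query.length() <= token.length(); start++) {
                if (token.startsWith(query, start)) {
                    float w = (start != 0) ? 1.0f : (t == 0 ? PHRASE_EDGE_BOOST : WORD_EDGE_BOOST);
                    best = Math.max(best, w);
                }
            }
        }
        return best;
    }

    /** Suggestions containing the typed text, highest weight first (ties keep list order). */
    static List<String> rank(List<String> suggestions, String typed) {
        List<String> hits = new ArrayList<>();
        for (String s : suggestions) {
            if (bestWeight(s, typed) > 0f) {
                hits.add(s);
            }
        }
        hits.sort(Comparator.comparingDouble((String s) -> bestWeight(s, typed)).reversed());
        return hits;
    }

    public static void main(String[] args) {
        List<String> suggestions = List.of("incredible", "ice cream", "cream cheese", "vanilla");
        System.out.println(rank(suggestions, "cre"));
        // prints [cream cheese, ice cream, incredible]
    }
}
```

With these weights, a match at the start of the phrase beats a match at the start of a later word, which in turn beats a match inside a word.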

I could get a similar effect by using a copyField, analyzing one field with ngrams and the other with edgengrams, then using edismax to put a boost on the edge version. I will start with this method, but using copyField makes the index bigger, and using dismax makes the final parsed queries more complicated.

If I can avoid the copyField, the index will be smaller and the queries much simpler, which should make them very fast.
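A rough way to see the size difference is to count the terms each approach emits. For a token of length L, the ngram field gets L - g + 1 grams of each size g from minGram up to min(maxGram, L), and the edgengram field adds one more gram per size. Because the edgengrams are a subset of the ngrams, a combined filter would emit only the ngram set. The sketch below counts terms only; it ignores whatever extra space storing the boost itself would take, and the class and method names are hypothetical.

```java
public class GramCounts {

    /** Number of n-grams of every size minGram..maxGram that fit inside one token. */
    static int ngramCount(int tokenLength, int minGram, int maxGram) {
        int count = 0;
        for (int size = minGram; size <= Math.min(maxGram, tokenLength); size++) {
            count += tokenLength - size + 1;
        }
        return count;
    }

    /** Number of left-edge n-grams (prefixes) of sizes minGram..maxGram for one token. */
    static int edgeCount(int tokenLength, int minGram, int maxGram) {
        return Math.max(0, Math.min(maxGram, tokenLength) - minGram + 1);
    }

    public static void main(String[] args) {
        String phrase = "ice cream";
        int minGram = 2;
        int maxGram = 3;
        int ngrams = 0;
        int edges = 0;
        for (String token : phrase.split("\\s+")) {
            ngrams += ngramCount(token.length(), minGram, maxGram);
            edges += edgeCount(token.length(), minGram, maxGram);
        }
        System.out.println("single field, edge grams boosted: " + ngrams);       // prints 10
        System.out.println("copyField, ngram + edgengram fields: " + (ngrams + edges)); // prints 14
    }
}
```

The overhead of the second field grows with the number of tokens times the number of gram sizes, so short suggestion phrases with a narrow min/max range are where the savings are smallest.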

I will take a look at the source code, but I'm a bit of a Java novice. Does anyone have the knowledge, desire, and time to crank this one out quickly? Is it possible someone has already written such a filter?


Thanks,
Shawn
