Re: Using Shingles to Increase Phrase Search Performance

2008-09-24 Thread Norberto Meijome
On Sat, 16 Aug 2008 15:39:44 -0700 "Chris Harris" <[EMAIL PROTECTED]> wrote:
[...]
> So finally I modified the Lucene ShingleFilter class to add an
> "outputUnigramIfNoNgram" option. Basically, if you set that option,
> and also set outputUnigrams=false, then the filter will tokenize just
> as in

Using Shingles to Increase Phrase Search Performance

2008-08-16 Thread Chris Harris
Mike Klaas suggested last month that I might be able to improve phrase search performance by indexing word bigrams, aka bigram shingles. I've been playing with this, and the initial results are very promising. (I may post some performance data later.) I wanted to describe my technique, which I'm no
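
To make the idea of indexing word bigrams concrete, here is a minimal sketch in plain Python. It is not Lucene's ShingleFilter and not the technique from the message itself. The names word_bigrams, build_index, and phrase_candidates are hypothetical, and the tokenizer is assumed to be simple lowercasing plus whitespace splitting.

```python
def word_bigrams(text):
    """Return adjacent word pairs (bigram shingles) as single tokens.

    Assumption: tokens are produced by lowercasing and splitting on whitespace.
    """
    words = text.lower().split()
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def build_index(docs):
    """Map each bigram shingle to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for shingle in word_bigrams(text):
            index.setdefault(shingle, set()).add(doc_id)
    return index


def phrase_candidates(index, phrase):
    """Return ids of documents containing every bigram of the phrase.

    For a two-word phrase this is an exact match: one term lookup, with no
    position comparison. For longer phrases it only yields candidates, since
    the bigrams could occur in different places within a document.
    """
    shingles = word_bigrams(phrase)
    if not shingles:
        # A one-word query produces no bigram at all in this sketch.
        return set()
    result = set(index.get(shingles[0], set()))
    for shingle in shingles[1:]:
        result &= index.get(shingle, set())
    return result


docs = {1: "the quick brown fox", 2: "quick the brown fox"}
index = build_index(docs)
print(phrase_candidates(index, "quick brown"))
```

In this sketch a two-word phrase query becomes a single term lookup, which is where the speedup over position-based phrase matching would be expected to come from. The cost is a larger term dictionary, since every distinct adjacent word pair becomes its own term.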