On Sat, 16 Aug 2008 15:39:44 -0700
"Chris Harris" <[EMAIL PROTECTED]> wrote:
[...]
> So finally I modified the Lucene ShingleFilter class to add an
> "outputUnigramIfNoNgram" option. Basically, if you set that option,
> and also set outputUnigrams=false, then the filter will tokenize just
> as in
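Here is a rough sketch of the behavior the quoted paragraph describes. It does not use Lucene and is not the ShingleFilter code or the patch. It assumes bigrams are joined with a single space, and it ignores token positions, offsets and types. The class and method names are hypothetical. Only the option names come from the message. It uses List.of, so it needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Lucene-free model of word-bigram shingling with an
 * "output unigrams only if no bigram was possible" fallback.
 * Works on plain strings; positions and offsets are ignored.
 */
public class ShingleSketch {

    /**
     * @param tokens                 analyzed tokens, in order
     * @param outputUnigrams         also emit every single token
     * @param outputUnigramIfNoNgram with outputUnigrams=false, emit the single
     *                               tokens only when no bigram could be formed
     */
    public static List<String> shingle(List<String> tokens,
                                       boolean outputUnigrams,
                                       boolean outputUnigramIfNoNgram) {
        List<String> out = new ArrayList<>();
        boolean producedBigram = false;
        for (int i = 0; i < tokens.size(); i++) {
            if (outputUnigrams) {
                out.add(tokens.get(i));
            }
            // Pair each token with its right-hand neighbour, if it has one.
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + " " + tokens.get(i + 1));
                producedBigram = true;
            }
        }
        // Fallback: without it, a one-word field would yield no tokens at all.
        if (!outputUnigrams && !producedBigram && outputUnigramIfNoNgram) {
            out.addAll(tokens);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingle(List.of("please", "divide", "this"), false, true));
        System.out.println(shingle(List.of("solr"), false, true));
        System.out.println(shingle(List.of("solr"), false, false));
    }
}
```

For the three-word input, the sketch emits only the two bigrams, "please divide" and "divide this". For the one-word input, it falls back to the single word "solr". With the option off, it emits nothing for that field, which is the gap the option is meant to close.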
Mike Klaas suggested last month that I might be able to improve phrase
search performance by indexing word bigrams, aka bigram shingles. I've
been playing with this, and the initial results are very promising. (I
may post some performance data later.) I wanted to describe my
technique, which I'm no