Re: Word Gram?

Brendan Grainger Wed, 13 Aug 2008 14:28:33 -0700

Hi Ryan,

We do basically the same thing, using a modified ShingleFilter (http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html). I have it set up to build 'shingles' of sizes 2, 3, 4 and 5, which I index into separate fields. If there is a better way of doing this sort of thing, I'd love to know :-)
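
To make the idea concrete, here is a rough sketch in plain Python of what word shingles of sizes 2 to 5 look like, and how counting them gives the most common word groups. It does not use Lucene or the ShingleFilter API. The function names, the `shingle_N` field names, and the lowercase/whitespace tokenization are all assumptions made up for the illustration, not our actual setup.

```python
from collections import Counter


def shingles(tokens, size):
    """Return every run of `size` adjacent tokens, joined with a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def shingle_fields(text, sizes=(2, 3, 4, 5)):
    """Build one list of shingles per size, keyed by a hypothetical field name.

    Assumption: tokens are produced by lowercasing and splitting on whitespace.
    """
    tokens = text.lower().split()
    return {f"shingle_{n}": shingles(tokens, n) for n in sizes}


def top_shingles(docs, size, k=3):
    """Count shingles of one size across all documents and return the k most common."""
    counts = Counter()
    for doc in docs:
        counts.update(shingle_fields(doc, sizes=(size,))[f"shingle_{size}"])
    return counts.most_common(k)


if __name__ == "__main__":
    docs = ["new york city", "new york state"]
    print(shingle_fields(docs[0]))
    print(top_shingles(docs, size=2, k=1))
```

Keeping each shingle size in its own field means the top two-word, three-word, ... groups can be ranked independently, which is what the separate-fields setup is for.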


Brendan

On Aug 13, 2008, at 3:59 PM, Ryan McKinley wrote:

I'm looking for a way to get common word groups within documents. That is, what are the top two, three, ... n word groups within the index.
I was messing with indexing adjacent words together (sorry about the earlier commit)... is this a reasonable approach? Any other ideas for pulling out common phrases? Any simple post processing?
ryan
