Hi Brendan,

What modifications have you made to ShingleFilter?  Can you share them?

Karl Wettin recently contributed ShingleMatrixFilter to Lucene - among other 
things, it can generate shingles of more than one size (check the test cases 
for how to do this):

<http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleMatrixFilter.html>
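
To make concrete what "shingles of more than one size" means, here is a minimal plain-Java sketch that does not depend on Lucene. It is not how ShingleFilter or ShingleMatrixFilter is implemented. The class and method names (ShingleCounter, shingles, countShingles) are hypothetical, and lower-casing plus splitting on whitespace is an assumed stand-in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene.
class ShingleCounter {

    // Return every word shingle of minSize..maxSize tokens found in the text.
    // Assumption: tokens are lower-cased, whitespace-separated words.
    static List<String> shingles(String text, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int size = minSize; size <= maxSize; size++) {
            // Slide a window of `size` tokens across the token array.
            for (int start = 0; start + size <= tokens.length; start++) {
                String[] window = Arrays.copyOfRange(tokens, start, start + size);
                out.add(String.join(" ", window));
            }
        }
        return out;
    }

    // Count how often each shingle occurs across a collection of documents.
    static Map<String, Integer> countShingles(List<String> docs, int minSize, int maxSize) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            for (String shingle : shingles(doc, minSize, maxSize)) {
                counts.merge(shingle, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Sorting the resulting map by count would give the top two-, three-, ... n-word groups for a small corpus. For a real index, the same counts would more likely come from the term frequencies of the shingle fields.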

Steve

On 08/13/2008 at 5:27 PM, Brendan Grainger wrote:
> Hi Ryan,
> 
> We do basically the same thing, using a modified ShingleFilter
> (http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html).
> I have it set up to build 'shingles' of size 2, 3, 4, 5
> which I index into separate fields. If there is a better way of doing
> this sort of thing I'd love to know :-)
> 
> Brendan
> 
> On Aug 13, 2008, at 3:59 PM, Ryan McKinley wrote:
> 
> > I'm looking for a way to get common word groups within documents.
> > That is, what are the top two, three, ... n word groups within the
> > index.
> > 
> > I was messing with indexing adjacent words together (sorry about the
> > earlier commit)... is this a reasonable approach?  Any other ideas for
> > pulling out common phrases?  Any simple post processing?
> > 
> > ryan
> 
>

 
