Hi xavier,

It's not clear to me what you want. Is the "edge" you're referring to the beginning of a field? E.g., with the raw text "one two three four", would an EdgeShingleFilter configured to produce unigrams, bigrams and trigrams produce "one", "one two" and "one two three", but nothing else?
If so, I suspect that writing a LimitTokenPositionFilter (which would stop emitting tokens once the token position exceeds a specified limit) would be better than subclassing ShingleFilter. You could use LimitTokenCountFilter as a model, especially its "consumeAllTokens" option. I think this would make a nice addition to Lucene.

Also, what do you plan to use this for?

Steve

On Mar 16, 2013, at 5:02 PM, xavier jmlucjav <jmluc...@gmail.com> wrote:

> Hi,
>
> I need to use shingles but only keep the ones that start from the edge.
>
> I want to confirm there is no way to get this feature without subclassing
> ShingleFilter, cause I thought someone would have already encountered this
> use case....
>
> thanks
> xavier
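Why a position limit would do the job: ShingleFilter gives each shingle the same position as its first word (position increment 0), so every shingle starting at the edge sits at position 1. The sketch below is a plain-Java model of that idea, not Lucene's TokenFilter API. The `Token` record, `shingles` and `limitPosition` are hypothetical names. The shingle ordering (unigram, then longer shingles at the same position) is an assumption modeled on ShingleFilter's default output.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeShingleSketch {
    // Hypothetical token: term text plus its position increment, mirroring
    // what a PositionIncrementAttribute carries in a real token stream.
    record Token(String term, int posInc) {}

    // Models shingle output with unigrams on and shingles up to maxSize words:
    // each shingle shares the position of its first word (posInc 0).
    static List<Token> shingles(String[] words, int maxSize) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            StringBuilder sb = new StringBuilder(words[i]);
            out.add(new Token(sb.toString(), 1));
            for (int n = 2; n <= maxSize && i + n <= words.length; n++) {
                sb.append(' ').append(words[i + n - 1]);
                out.add(new Token(sb.toString(), 0));
            }
        }
        return out;
    }

    // Models the proposed position limit: stop emitting once the accumulated
    // position exceeds maxPosition (the first token is at position 1 here).
    static List<Token> limitPosition(List<Token> in, int maxPosition) {
        List<Token> out = new ArrayList<>();
        int position = 0;
        for (Token t : in) {
            position += t.posInc();
            if (position > maxPosition) {
                break; // every later token is at a higher position
            }
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] words = "one two three four".split(" ");
        for (Token t : limitPosition(shingles(words, 3), 1)) {
            System.out.println(t.term());
        }
    }
}
```

Running `main` prints "one", "one two" and "one two three", which are the edge shingles from the example. A real filter would also need to decide whether to keep reading the rest of the input after the limit, which is what LimitTokenCountFilter's "consumeAllTokens" option controls.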