Hi xavier,

It's not clear to me what you want.  Is the "edge" you're referring to the 
beginning of a field? E.g. raw text "one two three four" with EdgeShingleFilter 
configured to produce unigrams, bigrams and trigrams would produce "one", "one 
two", and "one two three", but nothing else?

If so, I suspect writing a LimitTokenPositionFilter (which would stop emitting 
tokens after the token position exceeds a specified limit) would be better 
than subclassing ShingleFilter.  You could use LimitTokenCountFilter as 
a model, especially its "consumeAllTokens" option.  I think this would make a 
nice addition to Lucene.
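
To make the idea concrete, here is a rough, Lucene-free sketch in plain Java. 
It only simulates the behaviour: the class and method names (EdgeShingleSketch, 
PositionedToken, shingles, limitTokenPosition) are made up for illustration and 
are not Lucene APIs. It assumes that shingles starting at the same word share 
that word's position, so a position limit of 1 applied after shingling keeps 
only the shingles anchored at the start of the field.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeShingleSketch {

    // Hypothetical stand-in for a token: its text plus its 1-based position.
    static final class PositionedToken {
        final String text;
        final int position;

        PositionedToken(String text, int position) {
            this.text = text;
            this.position = position;
        }
    }

    // Simulates shingling: for each start word, emit shingles of 1..maxSize
    // words. Every shingle takes the position of its first word.
    static List<PositionedToken> shingles(List<String> words, int maxSize) {
        List<PositionedToken> out = new ArrayList<>();
        for (int start = 0; start < words.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int len = 1; len <= maxSize && start + len <= words.size(); len++) {
                if (len > 1) {
                    sb.append(' ');
                }
                sb.append(words.get(start + len - 1));
                out.add(new PositionedToken(sb.toString(), start + 1));
            }
        }
        return out;
    }

    // Simulates the proposed filter: stop emitting once a token's position
    // exceeds maxPosition.
    static List<String> limitTokenPosition(List<PositionedToken> tokens, int maxPosition) {
        List<String> kept = new ArrayList<>();
        for (PositionedToken t : tokens) {
            if (t.position > maxPosition) {
                break;
            }
            kept.add(t.text);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = List.of("one", "two", "three", "four");
        // Prints [one, one two, one two three]
        System.out.println(limitTokenPosition(shingles(words, 3), 1));
    }
}
```

In a real analysis chain the position-limiting filter would sit after 
ShingleFilter, not before it; limiting the input to three tokens first would 
still let through "two", "two three" and "three".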

Also, what do you plan to use this for?

Steve

On Mar 16, 2013, at 5:02 PM, xavier jmlucjav <jmluc...@gmail.com> wrote:
> Hi,
> 
> I need to use shingles but only keep the ones that start from the edge.
> 
> I want to confirm there is no way to get this feature without subclassing
> ShingleFilter, cause I thought someone would have already encountered this
> use case....
> 
> thanks
> xavier
