In short, no. I don't think you want to use the shingle filter on a token stream that has multiple tokens at the same position; otherwise you will get confusing "suggestions" like the one you've encountered.
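
To make the problem concrete, here is a rough Python sketch. It is not Lucene code and not a ShingleFilter option; the token list and the helper names (absolute_positions, naive_bigrams, position_aware_bigrams) are made up for illustration. It assumes each token carries a position increment, with 0 meaning "stacked on the previous token", which is how "womens" ends up at the same position as "women's".

```python
from typing import List, Tuple

# (term, position increment); an increment of 0 means the token is
# stacked at the same position as the previous one. Hypothetical data.
Token = Tuple[str, int]


def absolute_positions(tokens: List[Token]) -> List[Tuple[str, int]]:
    """Turn position increments into absolute positions, starting at 0."""
    pos = -1
    placed = []
    for term, inc in tokens:
        pos += inc
        placed.append((term, pos))
    return placed


def naive_bigrams(tokens: List[Token]) -> List[str]:
    """Pair each token with the next one in stream order, ignoring positions."""
    terms = [term for term, _ in tokens]
    return [f"{a} {b}" for a, b in zip(terms, terms[1:])]


def position_aware_bigrams(tokens: List[Token]) -> List[str]:
    """Pair only tokens that sit at consecutive positions."""
    placed = absolute_positions(tokens)
    return [
        f"{a} {b}"
        for a, pos_a in placed
        for b, pos_b in placed
        if pos_b == pos_a + 1
    ]


# "women's" and "womens" share a position; "shoes" follows them.
tokens = [("women's", 1), ("womens", 0), ("shoes", 1)]

print(naive_bigrams(tokens))           # includes the unwanted "women's womens"
print(position_aware_bigrams(tokens))  # only pairs across adjacent positions
```

The first function pairs the two stacked variants with each other, which is the kind of suggestion you are seeing. The second is only meant to show what position-aware behavior would look like; I am not claiming ShingleFilter can be configured to do this.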

-- Jack Krupansky

-----Original Message-----
From: Rounak Jain
Sent: Friday, May 03, 2013 7:34 AM
To: solr-user@lucene.apache.org
Subject: Configure Shingle Filter to ignore ngrams made of tokens with same start and end

Hello,

I was using ShingleFilter with Suggester to implement an autosuggest
dropdown. The field I'm using with ShingleFilter has a WordDelimiterFilter
with preserveOriginal=1 to tokenize "women's" as "women's" and "womens".

Because of this, when ShingleFilter generates word n-grams, apart from
the expected tokens, there's also a "women's womens" token. I wanted to
know if there's any way to configure ShingleFilter so that it ignores
n-grams made of tokens with the same start and end values.

Thanks,
Rounak
