Hi,

My requirement is to detect duplicate titles after removing punctuation marks, stop words, and accented characters.
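
As a rough illustration of that requirement (plain Python, not Solr), the normalization could be sketched as below. The stop-word set and the function name title_key are made up for the example; a real setup would read the words from stopwords.txt. Two titles count as duplicates when their keys are equal.

```python
import string
import unicodedata

# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the"}


def title_key(title: str) -> str:
    """Return a normalized key; titles with equal keys are treated as duplicates."""
    # Decompose accented characters and drop the combining marks (e.g. é -> e).
    decomposed = unicodedata.normalize("NFKD", title)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace ASCII punctuation with spaces, lowercase, and split into words.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = folded.translate(table).lower().split()
    # Drop stop words and rejoin into a single string for exact comparison.
    return " ".join(w for w in words if w not in STOP_WORDS)


titles = ["What is a apple?", "What is an apple?", "What is the apple?"]
keys = {title_key(t) for t in titles}  # all three titles collapse to one key
```

With this stop-word set, the query "What is a apple" yields the same key as all three example titles, which is the matching behaviour being asked for.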

I am trying to do an exact match first, and after that I am thinking of applying filters. I have tried solr.KeywordTokenizerFactory, and it does exact matching. But when I add

<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />

the stop filter does not work. If I use solr.StandardTokenizerFactory instead, I do not get an exact match.

Titles:
What is a apple?
What is an apple?
What is the apple?

When I type "What is a apple", I need to get all of the above. Could you please let me know whether there is any tokenizer/filter that matches my requirement?

--
View this message in context: http://lucene.472066.n3.nabble.com/Regarding-detection-of-duplication-tp4194975.html
Sent from the Solr - User mailing list archive at Nabble.com.