Regarding detection of duplication

Iniyan Tue, 24 Mar 2015 10:23:08 -0700

Hi,

My requirement is to detect duplicate titles after removing punctuation
marks, stop words, and accented characters.


I am trying to get exact matching working first. After that, I am thinking
of applying filters.

I have tried solr.KeywordTokenizerFactory, and it does exact matching. But
when I add

<filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true" />

the stop filter does not work.

But if I apply solr.StandardTokenizerFactory, I do not get an exact
match.
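
As far as I understand it, the difference comes from what each tokenizer
hands to the stop filter. Here is a rough sketch of that in Python, not
Solr. The function names and the stop word list are made up for
illustration, and the regex is only a crude stand-in for a word tokenizer.

```python
import re

# Hypothetical stop word list; the real one would come from stopwords.txt.
STOP_WORDS = {"a", "an", "the"}

def keyword_tokens(text):
    # A keyword-style tokenizer emits the whole input as one single token.
    return [text]

def word_tokens(text):
    # Crude stand-in for a word tokenizer: split into runs of word characters.
    return re.findall(r"\w+", text)

def stop_filter(tokens):
    # A stop filter drops a token only if the WHOLE token is a stop word.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

# One big token: it is never equal to a stop word, so nothing is removed.
print(stop_filter(keyword_tokens("What is a apple")))  # ['What is a apple']

# Separate tokens: "a" is removed, but the title is no longer a single term.
print(stop_filter(word_tokens("What is a apple")))     # ['What', 'is', 'apple']
```

So in this sketch the first case keeps exact matching but never sees a stop
word, and the second removes stop words but loses the single-term match.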


Title:

What is a apple?
What is an apple?
What is the apple?

When I type "What is a apple", I need to get all of the above.
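
To make the expected behaviour concrete, here is a minimal sketch in Python
(outside Solr) of the normalization I have in mind. The function name
dedup_key and the stop word list are assumptions for illustration only; the
real list would be whatever is in stopwords.txt.

```python
import re
import unicodedata

# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"a", "an", "the"}

def dedup_key(title):
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then turn anything that is not a word character
    # or whitespace (i.e. punctuation) into a space.
    cleaned = re.sub(r"[^\w\s]", " ", folded.lower())
    # Drop stop words and rejoin the remaining words into one key.
    words = [w for w in cleaned.split() if w not in STOP_WORDS]
    return " ".join(words)

titles = ["What is a apple?", "What is an apple?", "What is the apple?"]
keys = {dedup_key(t) for t in titles}
print(keys)  # {'what is apple'}
```

All three titles, and the query "What is a apple", reduce to the same key,
and an exact match on that key is the behaviour I am looking for.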

Could you please let me know whether there is any tokenizer/filter that
matches my requirement?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Regarding-detection-of-duplication-tp4194975.html
Sent from the Solr - User mailing list archive at Nabble.com.
