Re: using deduplication with dataimporthandler

Marc Sturlese Mon, 17 Nov 2008 03:49:00 -0800

Thank you so much. I have it sorted.
I am now wondering whether there is a more stable way to use deduplication
than applying this patch to the Solr source tree:
https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
(specifically SOLR-799.patch, dated 2008-11-12 05:10 PM).


I have downloaded the latest nightly build source code and couldn't find the
required classes in it.
Does anyone know anything about this? Should I ask in the developers forum?

Thanks in advance


Marc Sturlese wrote:
> 
> Hey there,
> 
> I posted before about my situation, but I think my explanation
> was a bit confusing...
> I am using DataImportHandler with delta-import and it's working perfectly.
> I have also coded my own SqlEntityProcessor to delete expired rows from
> the index and the database.
> 
> Now I need to do duplicate control at indexing time. In my old Lucene
> core I wrote my own duplicate control, but it was very slow because it
> worked by comparing strings... I have been looking into Solr deduplication
> (http://wiki.apache.org/solr/Deduplication) and it looks great because it
> works with hashes instead of strings.
> 
> I have learned how to use deduplication using the /update requestHandler
> as the wiki says:
>   <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
>     <lst name="defaults">
>       <str name="update.processor">dedupe</str>
>     </lst>
>   </requestHandler>
> 
> But the thing is that I want to use it with the /dataimport requestHandler
> (the one used by DataImportHandler). I don't know whether there is an XML
> configuration that adds deduplication to DataImportHandler or whether I
> should code a plugin... in that case, I don't know exactly where.
> 
> Hope my explanation is clearer now...
> Thanks in advance!
> 
> 
> 
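
For reference, here is a rough sketch of the configuration being asked about.
The first element defines the "dedupe" chain that the /update example refers
to, using the class names from the Deduplication wiki page and SOLR-799. The
second shows how the same default might be set on /dataimport. This is an
assumption only: whether DataImportHandler honours update.processor depends
on the Solr build and is not confirmed in this thread. The field names and
data-config.xml are placeholders.

```xml
<!-- Sketch only: chain definition that the name "dedupe" refers to. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Placeholder: field that stores the computed hash. -->
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <!-- Placeholder: fields that are hashed to detect duplicates. -->
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Assumption: /dataimport accepts the same update.processor default. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- Placeholder file name for the DataImportHandler configuration. -->
    <str name="config">data-config.xml</str>
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>
```

If the handler ignores that default, documents imported through /dataimport
would bypass the chain, and the remaining option would be a plugin or a
patched build.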

-- 
View this message in context: 
http://www.nabble.com/using-deduplication-with-dataimporthandler-tp20536053p20538008.html
Sent from the Solr - User mailing list archive at Nabble.com.
