Thank you so much, I have it sorted. I am now wondering whether there is a more stable way to use deduplication than applying this patch to the Solr source: https://issues.apache.org/jira/browse/SOLR-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel (specifically SOLR-799.patch, 2008-11-12 05:10 PM).
I have downloaded the latest nightly-build source code and couldn't find the needed classes in there. Does anyone know anything about this? Should I ask in the developers forum?

Thanks in advance

Marc Sturlese wrote:
>
> Hey there,
>
> I have posted before about my situation, but I think my explanation
> was a bit confusing...
> I am using DataImportHandler with delta-import and it's working perfectly.
> I have also coded my own SqlEntityProcessor to delete expired rows from
> the index and the database.
>
> Now I need to do duplicate control at indexing time. In my old Lucene
> core I wrote my own duplicate control, but it was very slow because it
> worked by comparing strings... I have been investigating Solr deduplication
> (http://wiki.apache.org/solr/Deduplication) and it looks promising because
> it works with hashes instead of strings.
>
> I have learned how to use deduplication with the /update requestHandler,
> as the wiki says:
> <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" >
>   <lst name="defaults">
>     <str name="update.processor">dedupe</str>
>   </lst>
> </requestHandler>
>
> But the thing is that I want to use it with the /dataimport requestHandler
> (the one used by DataImportHandler). I don't know whether there is an
> XML configuration that adds deduplication to DataImportHandler or whether
> I should code a plugin... and in that case, I don't know exactly where.
>
> I hope my explanation is clearer now...
> Thanks in advance!
>
>

--
View this message in context: http://www.nabble.com/using-deduplication-with-dataimporthandler-tp20536053p20538008.html
Sent from the Solr - User mailing list archive at Nabble.com.
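
To illustrate why hashing is faster than the string comparison described in the quoted message, here is a minimal Java sketch of signature-based duplicate detection. It is not the code from the SOLR-799 patch or Solr's own implementation; the class name SignatureDedupSketch and its methods are hypothetical, and the choice of MD5 over a fixed list of field values is only an assumption made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only: not part of Solr or the SOLR-799 patch.
public class SignatureDedupSketch {
    // Signatures of every document accepted so far.
    private final Set<String> seen = new HashSet<>();

    // Builds a hex MD5 signature over the given field values.
    static String signature(String... fieldValues) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String value : fieldValues) {
                md5.update(value.getBytes(StandardCharsets.UTF_8));
                // Separator byte so ("ab", "c") and ("a", "bc") hash differently.
                md5.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the document is new, false if its signature was already seen.
    boolean addIfNew(String... fieldValues) {
        return seen.add(signature(fieldValues));
    }

    public static void main(String[] args) {
        SignatureDedupSketch dedupe = new SignatureDedupSketch();
        System.out.println(dedupe.addIfNew("Title A", "Body text"));  // true
        System.out.println(dedupe.addIfNew("Title A", "Body text"));  // false
        System.out.println(dedupe.addIfNew("Title B", "Body text"));  // true
    }
}
```

Each incoming document costs one hash computation and one set lookup, instead of a string comparison against every document already indexed.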