Any update processor can be used with DIH. First, register your dedupe update processor as you do now. Then either pass update.processor as a request parameter, or keep it in the 'defaults' of the dataimport handler:
<str name="update.processor">dedupe</str>

On Mon, Nov 17, 2008 at 2:48 PM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
>
> Hey there,
>
> I have posted before about my situation, but I think my explanation
> was a bit confusing...
> I am using DataImportHandler with delta-import and it's working perfectly. I
> have also coded my own SqlEntityProcessor to delete expired rows from the
> index and the database.
>
> Now I need to do duplication control at indexing time. In my old Lucene core
> I wrote my own duplication control, but it was very slow because it compared
> strings... I have been investigating Solr deduplication
> (http://wiki.apache.org/solr/Deduplication) and it looks promising, as it works
> with hashes instead of strings.
>
> I have learned how to use deduplication with the /update requestHandler, as
> the wiki says:
> <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" >
>   <lst name="defaults">
>     <str name="update.processor">dedupe</str>
>   </lst>
> </requestHandler>
>
> But I want to use it with the /dataimport requestHandler
> (the one used by DataImportHandler). I don't know whether there is an XML
> configuration that adds deduplication to DataImportHandler or whether I
> should code a plugin... in that case, I don't know exactly where.
>
> Hope my explanation is clearer now...
> Thanks in advance!
>
>
> --
> View this message in context:
> http://www.nabble.com/using-deduplication-with-dataimporthandler-tp20536053p20536053.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>

--
--Noble Paul
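
For reference, the 'defaults' option from the reply at the top might look roughly like this in solrconfig.xml. This is only a sketch under assumptions not stated in the thread: the handler is registered under /dataimport with the standard DataImportHandler class, the DIH configuration file is called data-config.xml (a hypothetical name), and an update processor chain named "dedupe" is already defined as described on the Deduplication wiki page.

```xml
<!-- Sketch only: assumes an update processor chain named "dedupe" is
     already registered in solrconfig.xml, and that the DIH configuration
     lives in a file called data-config.xml (hypothetical name). -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <!-- Route documents imported by DIH through the dedupe chain -->
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>
```

With the parameter kept in 'defaults', every full-import and delta-import request to this handler would pick it up without it having to be passed on each request.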