Hi Lance,

That sounds interesting. The idea was to use a message digest (e.g. an MD5 hash)
of each file to be indexed as a unique identifier, so that duplicates are
avoided. I wasn't aware of the de-duplication feature you mention; it seems to
be exactly the solution to my problem. In the Solr wiki I found some examples of
how to configure it and trigger it when calling an XmlUpdateRequestHandler. I
guess I can also use it in a similar way when calling the DataImportHandler,
correct?
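Below is a minimal sketch of the digest-as-identifier idea, done on the client side before indexing. Python's `hashlib` MD5 stands in here; Solr's de-duplication feature computes its own signature server-side, so this is not what SignatureUpdateProcessor does internally. The helper names `file_md5` and `build_docs`, and the `id`/`path` field names, are hypothetical and chosen only for illustration.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file's bytes, read in chunks
    so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_docs(paths):
    """Hypothetical helper: build one document per distinct file content.

    The content digest is used as the document id. Files with identical
    bytes get the same id, so a later file replaces an earlier one, much
    as re-adding a document with an existing uniqueKey does in Solr.
    """
    docs = {}
    for p in paths:
        doc_id = file_md5(p)
        docs[doc_id] = {"id": doc_id, "path": str(p)}
    return list(docs.values())
```

Two files with byte-identical contents therefore yield a single document. Note that this only catches exact duplicates; any change to the bytes produces a different digest.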

Many thanks for your suggestion.
Joe

> The SignatureUpdateProcessor implements a smaller, faster cryptohash.
> It is used by the de-duplication feature.
>
> What's the purpose? Do you need the MD5 algorithm, or is any competent
> cryptohash good enough?
>
> On Sat, Apr 21, 2012 at 5:55 AM, <kuchenbr...@mail.org> wrote:
>
>> Hi Otis,
>>
>> thank you very much for the quick response to my question. I'll have a look
>> at your suggested solution. Do you know if there's any documentation about
>> writing such an Update Request Handler or how to trigger it using the Data
>> Import/Tika combination?
>>
>> Thanks.
>>
>> Joe
