Check this setting: <bool name="overwriteDupes">false</bool>
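A minimal sketch of the change this points at, assuming the rest of the chain stays exactly as in the quoted configuration. With overwriteDupes set to false, the processor still computes and stores the signature, but it does not remove earlier documents that carry the same signature. That would match the four identical entries in the results. Setting it to true is what asks Solr to overwrite them:

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Changed from false: documents with a matching signature are overwritten -->
    <bool name="overwriteDupes">true</bool>
    <str name="signatureField">signature</str>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Duplicates that are already in the index will probably stay there until the documents are added again, so re-indexing after the change is a reasonable precaution.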
On Tuesday 14 December 2010 14:26:21 Jason Brown wrote:
> I have configured de-duplication according to the Wiki.
>
> My signature field is defined thus:
>
> <field name="signature" type="string" stored="true" indexed="true"
>        multiValued="false" />
>
> and my updateRequestProcessor as follows:
>
> <updateRequestProcessorChain name="dedupe">
>   <processor
>       class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
>     <bool name="enabled">true</bool>
>     <bool name="overwriteDupes">false</bool>
>     <str name="signatureField">signature</str>
>     <str name="fields">content</str>
>     <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> I am using SolrJ to write to the index with the binary (as opposed to XML)
> format, so my update handler is defined as below:
>
> <requestHandler name="/update/javabin"
>                 class="solr.BinaryUpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.processor">dedupe</str>
>   </lst>
> </requestHandler>
>
> However, I was expecting Solr to allow only 1 instance of a duplicate
> document into the index, but I get the following results when I query my
> index.
>
> I have deliberately added my ISA Letter file 4 times and can see it has
> correctly generated an identical signature for the first 4 entries
> (d91a5ce933457fd5). The fifth entry is a different document and correctly
> has a different signature.
>
> I was expecting to see only 1 instance of the duplicate. Am I
> misinterpreting the way it works? Many thanks.
>
> <result name="response" numFound="36" start="0">
>   <doc>
>     <str name="doctitle">ISA Letter</str>
>     <str name="signature">d91a5ce933457fd5</str>
>   </doc>
>   <doc>
>     <str name="doctitle">ISA Letter</str>
>     <str name="signature">d91a5ce933457fd5</str>
>   </doc>
>   <doc>
>     <str name="doctitle">ISA Letter</str>
>     <str name="signature">d91a5ce933457fd5</str>
>   </doc>
>   <doc>
>     <str name="doctitle">ISA Letter</str>
>     <str name="signature">d91a5ce933457fd5</str>
>   </doc>
>   <doc>
>     <str name="doctitle">ISA Mailing pack letter</str>
>     <str name="signature">fd9d9e1c0de32fb5</str>
>   </doc>

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350