I currently have the following update request processor chain to prevent indexing very similar text items into a core dedicated to storing the queries that our users enter in the web interface of our system:

<!-- Delete similar duplicated documents at index time, using fuzzy text similarity techniques -->
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">false</bool>
    <str name="signatureField">signature</str>
    <str name="fields">textsuggest,textng</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Right now we are trying to implement a custom update request handler to keep track of how many times any given query hits our Solr server. Put simply, we want to keep a field that counts how many times we have tried to insert the same query.

We are using Solr 3.6, so how can we use the deduplication request processor, from the code of our custom update handler, to check whether the query we are trying to insert/update already exists?

Greetings!
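
P.S. To make the goal concrete, here is a minimal sketch of the counting behavior we are after, written in plain Java with no Solr dependency. Everything in it is hypothetical: the class QueryHitCounter, its methods, and the in-memory map are illustrations only, and the signature is a simple MD5 of the normalized text, which only catches exact matches after normalization. It is a stand-in for TextProfileSignature, not the same fuzzy algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not a Solr class): counts how often the same query text is seen. */
final class QueryHitCounter {

    // Maps a query signature to the number of times that query was submitted.
    private final Map<String, Integer> hitsBySignature = new HashMap<String, Integer>();

    /**
     * Stand-in signature: MD5 of the trimmed, lower-cased, whitespace-collapsed text.
     * Assumption: this replaces the fuzzy TextProfileSignature for illustration only.
     */
    public static String signature(String query) {
        String normalized = query.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Records one submission of the query and returns its updated hit count. */
    public int record(String query) {
        String sig = signature(query);
        Integer previous = hitsBySignature.get(sig);
        int next = (previous == null) ? 1 : previous + 1;
        hitsBySignature.put(sig, next);
        return next;
    }
}
```

In the real setup the count would live in a field of the indexed document rather than in a map, and the lookup would go against the signature field of the core; the sketch only shows the "same signature, increment the counter" logic we want to reproduce.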