I currently have the following update request processor chain to prevent 
indexing very similar text items into a core dedicated to storing the queries 
that our users enter into the web interface of our system.

<!-- Delete similar duplicated documents at index time, using some fuzzy text 
similarity techniques -->
<updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <bool name="overwriteDupes">false</bool>
      <str name="signatureField">signature</str>
      <str name="fields">textsuggest,textng</str>
      <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
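
For reference, here is a rough sketch of the other two pieces a chain like this 
normally relies on, following the standard setup in the Solr deduplication 
documentation. The field type, the handler name and the handler class are 
assumptions rather than a copy of our actual config: the signature field has to 
be declared in schema.xml, and the chain has to be attached to the update 
handler through the update.chain parameter.

```xml
<!-- schema.xml (assumed): field that stores the computed signature -->
<field name="signature" type="string" stored="true" indexed="true" multiValued="false" />

<!-- solrconfig.xml (assumed): route updates through the "dedupe" chain -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>
```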

Right now we are trying to implement a custom update request handler to keep 
track of how many times any given query hits our Solr server. Put simply, we 
want to keep a field that counts how many times we have tried to insert the 
same query. We are using Solr 3.6, so how can we use the deduplication update 
request processor, from the code of our custom update handler, to check whether 
the query we are trying to insert/update already exists?

Greetings! 
