Before reindexing (following the recent changes to ‘solrconfig.xml’), I had 
already removed all documents matching the undesired ‘url’ field value pattern, 
using wildcard deletions via ‘curl’.
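
A wildcard deletion of this kind can be sent as a delete-by-query message 
posted to the ‘/update’ handler. The body below is only a sketch: the query 
pattern is an assumption (it presumes ‘url’ is a plain string field and that a 
leading-wildcard match on the old host name is acceptable), not necessarily the 
exact query that was used.

```xml
<!-- Sketch of a delete-by-query body for the /update handler.
     The query pattern is an assumption, not the exact one used. -->
<delete>
  <query>url:*old.domain.com*</query>
</delete>
```

The deletion only becomes visible after a commit, for example a follow-up 
<commit/> message posted to the same handler.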

What finally worked was to stop substituting values in the ‘id’ field, 
restart the SolrCloud cluster, and reindex the specific documents containing 
the undesired ‘url’ field values. Now only the ‘url’ field is rewritten, and, 
for reasons I do not fully understand, that did the trick.

<!-- BEGIN rewrite of URL -->
<updateRequestProcessorChain name="newDomainURL">
  <processor class="solr.RegexReplaceProcessorFactory">

    <!-- REMOVED: <str name="fieldName">id</str> -->

    <str name="fieldName">url</str>
    <str name="pattern">old\.domain\.com</str>
    <str name="replacement">new.domain.net</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<!-- END rewrite of URL -->

Sincerely,
Dishanker Raj

PGP Public Key: http://goo.gl/TulvBO

On 29 Nov 2013, at 14:59, Jack Krupansky <j...@basetechnology.com> wrote:

> Any chance that you had already indexed some data before you finalized these 
> configuration settings? That pre-existing data would need to be manually 
> reindexed for the update processor to be effective.
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: Dishanker Raj
> Sent: Friday, November 29, 2013 5:04 AM
> To: solr-user@lucene.apache.org
> Subject: Why is 'solr.RegexReplaceProcessorFactory' not changing fields that 
> are being indexed?
> 
> Hello!
> 
> The following entries in file ‘solrconfig.xml’ are supposed to match and 
> replace an obsolete URL string with a newer one. But after documents are 
> indexed using the ‘/update’ handler I still see the old URL string in the 
> index that should have been replaced while indexing.
> 
> <!-- BEGIN rewrite of URL -->
> <updateRequestProcessorChain name="newDomainURL">
> <processor class="solr.RegexReplaceProcessorFactory">
>  <str name="fieldName">id</str>
>  <str name="fieldName">url</str>
>  <str name="pattern">old\.domain\.com</str>
>  <str name="replacement">new.domain.net</str>
> </processor>
> <processor class="solr.LogUpdateProcessorFactory" />
> <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> <!-- END rewrite of URL -->
> 
> <requestHandler name="/update" class="solr.UpdateRequestHandler">
> <lst name="defaults">
> <!-- BEGIN rewrite of URL -->
> <str name="update.chain">newDomainURL</str>
> <!-- END rewrite of URL -->
> </lst>
> </requestHandler>
> 
> Anyone got any helpful pointers as to why this is not succeeding? Thanks.
> 
> Sincerely,
> Dishanker Raj
> 
> PGP Public Key: http://goo.gl/TulvBO 
