You should declare this

<str name="update.chain">nohtml</str>

in the "defaults" section of the request handler that corresponds to your DataImportHandler. You should end up with something like this:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-config.xml</str>
    <str name="update.chain">nohtml</str>
  </lst>
</requestHandler>

Otherwise the default update chain is used, and your update request processors (URPs) are not part of it. SolrJ, behind the scenes, is a client of the /update request handler, which is why you see your URPs working when you index that way.
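
If you would rather not repeat update.chain in every handler, another option may be to mark the chain itself as the default, so that any handler without an explicit update.chain picks it up. This is only a sketch and assumes you really want the chain applied to every update path in the core; the placeholder comment stands in for the HTMLStrip processors you already have:

<updateRequestProcessorChain name="nohtml" default="true">
  <!-- your existing HTMLStripFieldUpdateProcessorFactory processors go here, unchanged -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

With that in place, an update.chain entry in a handler's defaults is only needed when that handler should use a different chain.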

Best,
Gazza


On 08/22/2013 05:35 PM, Shawn Heisey wrote:
I have an updateProcessor defined. It seems to work perfectly when I index with SolrJ, but when I use DIH (which I do for a full index rebuild), it doesn't work. This is the case with both Solr 4.4 and Solr 4.5-SNAPSHOT, svn revision 1516342.

Here's a solrconfig.xml excerpt:

<updateRequestProcessorChain name="nohtml">
  <!-- First pass converts entities and strips html. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">ft_text</str>
    <str name="fieldName">ft_subject</str>
    <str name="fieldName">keywords</str>
    <str name="fieldName">text_preview</str>
  </processor>
  <!-- Second pass fixes dually-encoded stuff. -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">ft_text</str>
    <str name="fieldName">ft_subject</str>
    <str name="fieldName">keywords</str>
    <str name="fieldName">text_preview</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">nohtml</str>
    </lst>
  </requestHandler>

If I turn on DEBUG logging for FieldMutatingUpdateProcessorFactory, I see "replace value" debugs, but the contents of the index are only changed if the update happens with SolrJ, not with DIH.

A side issue. FieldMutatingUpdateProcessorFactory has the following line in it, at about line 72:

        if (destVal != srcVal) {

Shouldn't this be the following?

        if (!destVal.equals(srcVal)) {

Thanks,
Shawn
