What are you actually trying to do on a business level? Maybe that's
something that can be handled better by sticking an
UpdateRequestProcessor chain _after_ DIH?

As for your configuration, you define the xxCONTENT column several
times. It might be working, but I think it is non-deterministic. For
ilang, you don't seem to have an xpath attribute, so I suspect it is
just being skipped altogether.
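
One thing that might be worth trying for ilang, as an untested sketch:
build the path in a helper column with TemplateTransformer first, then
point RegexTransformer at it via sourceColName. This assumes the
transformers run in the order they are listed in the transformer
attribute, and the column name ilang_path is made up for illustration.

```xml
<!-- Sketch only: untested; ilang_path is a hypothetical helper column. -->
<entity name="doc"
        processor="XPathEntityProcessor"
        forEach="/mediawiki/page"
        stream="true"
        url="${wf.fileAbsolutePath}"
        transformer="TemplateTransformer,RegexTransformer,HTMLStripTransformer">
  <!-- TemplateTransformer is listed first, so it fills the helper column -->
  <field column="ilang_path" template="${wf.fileAbsolutePath}"/>
  <!-- RegexTransformer then reads the helper column through sourceColName -->
  <field column="ilang" sourceColName="ilang_path"
         regex=".*?(..)\.xml" replaceWith="$1"/>
  <field column="HEADER" xpath="/mediawiki/page/title"
         required="true" stripHTML="true"/>
</entity>
```

If the ordering assumption holds, a template such as ${doc.ilang} on
another column would be evaluated before the regex has produced ilang,
so NGLANG would probably need to be derived some other way.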

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 4 November 2014 09:05, Lemke, Michael  ST/HZA-ZSW
<lemke...@schaeffler.com> wrote:
> I am having a little fight with the DataImportHandler and the
> application of RegexTransformer and TemplateTransformer.
> A stripped down version of what I try in data-config.xml, which
> is taken pretty much from the various solr wikis:
>
> <dataConfig>
>     <dataSource type="FileDataSource" encoding="UTF-8" />
>     <document>
>          <entity name="wf" rootEntity="false" dataSource="null"
>              processor="FileListEntityProcessor"
>              baseDir="d:\inetpub\webapps\searchserver\solr\importdaten\import_wiki"
>              fileName="wiki_..\.xml">
>             <entity name="doc"
>                  processor="XPathEntityProcessor"
>                  forEach="/mediawiki/page"
>                  stream="true"
>                  url="${wf.fileAbsolutePath}"
>                  transformer="RegexTransformer,HTMLStripTransformer,TemplateTransformer"
>                  >
>               <field column="ilang" template="${wf.fileAbsolutePath}"
>                       regex=".*?(..)\.xml" replaceWith="$1"/>
>               <field column="HEADER" xpath="/mediawiki/page/title"
>                       required="true" stripHTML="true"/>
>
>               <field column="xxCONTENT"
>                       xpath="/mediawiki/page/revision/text"/>
>               <field column="xxCONTENT" regex="(?m)^=====(.+?)=====$"
>                       replaceWith="&lt;h4&gt;$1&lt;/h4&gt;"/>
>
>               <!-- more regex transforms here -->
>               <field column="xxCONTENT" stripHTML="true"/>
>
>               <field column="NGLANG"             template="${doc.ilang}" />
>               <field column="CONTENTPREVIEW" template="${doc.xxCONTENT}"/>
>             </entity>
>          </entity>
>     </document>
> </dataConfig>
>
> The problem is with ilang.  The regex is not applied, no matter what I
> try.  Even a straightforward  <... regex=".*" replaceWith="en" ...>
> doesn't work.  I always end up with the full pathname.
>
> The regexes on xxCONTENT work fine, however.  So it's not that my regex
> is wrong or that regexes don't work at all.
>
> I tried all sorts of things like intermediate columns, sourceColumn or
> different sequences in the transformer attribute.  It all led to
> different errors.  Nothing worked or led to any clues.
>
> What am I doing wrong here?  This is with solr 1.4.1.
>
>
> Thanks,
> Michael
>
