I am fighting with the DataImportHandler and the way RegexTransformer
and TemplateTransformer are applied. Here is a stripped-down version of
my data-config.xml, taken pretty much from the various Solr wiki pages:

<dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
         <entity name="wf" rootEntity="false" dataSource="null"
             processor="FileListEntityProcessor"
             baseDir="d:\inetpub\webapps\searchserver\solr\importdaten\import_wiki"
             fileName="wiki_..\.xml">
            <entity name="doc"
                 processor="XPathEntityProcessor"
                 forEach="/mediawiki/page"
                 stream="true"
                 url="${wf.fileAbsolutePath}"
                 transformer="RegexTransformer,HTMLStripTransformer,TemplateTransformer">
              <field column="ilang" template="${wf.fileAbsolutePath}"
                     regex=".*?(..)\.xml" replaceWith="$1"/>
              <field column="HEADER" xpath="/mediawiki/page/title"
                     required="true" stripHTML="true"/>

              <field column="xxCONTENT" xpath="/mediawiki/page/revision/text"/>
              <field column="xxCONTENT" regex="(?m)^=====(.+?)=====$"
                      replaceWith="&lt;h4&gt;$1&lt;/h4&gt;"/>

              <!-- more regex transforms here -->
              <field column="xxCONTENT" stripHTML="true"/>

              <field column="NGLANG"         template="${doc.ilang}"/>
              <field column="CONTENTPREVIEW" template="${doc.xxCONTENT}"/>
            </entity>
         </entity>
    </document>
</dataConfig>

The problem is with ilang. The regex is never applied, no matter what I
try. Even a straightforward <... regex=".*" replaceWith="en" ...> doesn't
work. I always end up with the full pathname.

The regexes on xxCONTENT work fine, however, so it's not that my regex is
wrong or that regexes don't work at all.
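
To rule out the pattern itself, here is a minimal standalone sketch in
plain Java that applies the same regex and replacement to a path shaped
like my import files. It assumes RegexTransformer does the equivalent of
Matcher.replaceAll, which I have not verified. The class name, the method
name, and the file name wiki_en.xml are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: applies the ilang regex from the
// config above to a file path, outside of the DataImportHandler.
final class LangRegexCheck {

    // Same pattern as in the <field column="ilang" .../> definition.
    private static final Pattern LANG = Pattern.compile(".*?(..)\\.xml");

    // Replaces every match with capture group 1, i.e. the two characters
    // directly in front of ".xml".
    static String extractLang(String path) {
        return LANG.matcher(path).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Example path; the file name wiki_en.xml is an assumption.
        String path = "d:\\inetpub\\webapps\\searchserver\\solr\\importdaten\\import_wiki\\wiki_en.xml";
        System.out.println(extractLang(path));
    }
}
```

Run against the full path, the pattern reduces it to the two-letter
language code, which is what I expect to see in ilang.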

I tried all sorts of things, like intermediate columns, sourceColumn, or
different sequences in the transformer attribute. Each attempt led to a
different error. Nothing worked or led to any clues.

What am I doing wrong here? This is with Solr 1.4.1.


Thanks,
Michael
