I am having a little fight with the DataImportHandler and the combination of RegexTransformer and TemplateTransformer. Here is a stripped-down version of what I am trying in data-config.xml, taken pretty much from the various Solr wiki pages:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="wf"
            rootEntity="false"
            dataSource="null"
            processor="FileListEntityProcessor"
            baseDir="d:\inetpub\webapps\searchserver\solr\importdaten\import_wiki"
            fileName="wiki_..\.xml">
      <entity name="doc"
              processor="XPathEntityProcessor"
              forEach="/mediawiki/page"
              stream="true"
              url="${wf.fileAbsolutePath}"
              transformer="RegexTransformer,HTMLStripTransformer,TemplateTransformer">
        <field column="ilang" template="${wf.fileAbsolutePath}" regex=".*?(..)\.xml" replaceWith="$1"/>
        <field column="HEADER" xpath="/mediawiki/page/title" required="true" stripHTML="true"/>
        <field column="xxCONTENT" xpath="/mediawiki/page/revision/text"/>
        <field column="xxCONTENT" regex="(?m)^=====(.+?)=====$" replaceWith="&lt;h4&gt;$1&lt;/h4&gt;"/>
        <!-- more regex transforms here -->
        <field column="xxCONTENT" stripHTML="true"/>
        <field column="NGLANG" template="${doc.ilang}" />
        <field column="CONTENTPREVIEW" template="${doc.xxCONTENT}"/>
      </entity>
    </entity>
  </document>
</dataConfig>

The problem is with ilang. The regex is not applied, no matter what I try. Even a straightforward <... regex=".*" replaceWith="en" ...> doesn't work. I always end up with the full pathname.

The regexes on xxCONTENT work fine, however. So it's not that my regex is wrong or that regexes don't work at all.

I tried all sorts of things, like intermediate columns, sourceColName, or different orderings in the transformer attribute. They all led to different errors. Nothing worked or gave me any clues.

What am I doing wrong here? This is with Solr 1.4.1.

Thanks,
Michael
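
For reference, here is a minimal sketch of what the ilang regex is expected to produce when run outside Solr. It assumes that RegexTransformer's regex/replaceWith pair behaves like Java's Matcher.replaceAll. The class name IlangRegexCheck, the method extractLang, and the file name wiki_en.xml are made up for illustration; only the pattern, the replacement, and the base directory come from the config above.

```java
import java.util.regex.Pattern;

public class IlangRegexCheck {
    // Same pattern as the regex attribute on the ilang field.
    private static final Pattern LANG = Pattern.compile(".*?(..)\\.xml");

    // Applies the replacement "$1" the way replaceWith is assumed to work:
    // every match of the pattern is replaced by its first capture group.
    static String extractLang(String absolutePath) {
        return LANG.matcher(absolutePath).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Hypothetical file below the baseDir from data-config.xml.
        String path = "d:\\inetpub\\webapps\\searchserver\\solr\\importdaten\\import_wiki\\wiki_en.xml";
        System.out.println(extractLang(path)); // prints "en"
    }
}
```

If this holds, the pattern itself reduces the full path to the two-letter language code, which would point to the transformer setup rather than the regex.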