What are you actually trying to do on a business level? Maybe that's something that can be handled better by sticking an UpdateRequestProcessor chain _after_ DIH?
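For example, a chain in solrconfig.xml could look roughly like the sketch below. Several parts of it are assumptions. `com.example.LangFromPathProcessorFactory` is a hypothetical custom factory that you would have to write yourself. It would read a field carrying the file path, which DIH fills in, and set the language field from that path. I am also assuming that `update.processor` is the request parameter Solr 1.4's DataImportHandler uses to select a chain.

```xml
<!-- solrconfig.xml, sketch only -->
<updateRequestProcessorChain name="wikilang">
  <!-- hypothetical custom factory: derives the language code from a path field -->
  <processor class="com.example.LangFromPathProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- must come last so the document is actually indexed -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <!-- assumed 1.4 parameter name for picking the chain above -->
    <str name="update.processor">wikilang</str>
  </lst>
</requestHandler>
```

The processor sees each complete document after DIH has built it, so the language logic would no longer depend on the order of the DIH transformers.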
As to your configuration, you define the xxCONTENT column more than once (three times, in fact). It might be working, but I think the result is non-deterministic. For ilang, you don't seem to have an xpath attribute, so I suspect the regex on it is just being skipped altogether.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 4 November 2014 09:05, Lemke, Michael ST/HZA-ZSW <lemke...@schaeffler.com> wrote:
> I am having a little fight with the DataImportHandler and the
> application of RegexTransformer and TemplateTransformer.
> A stripped-down version of what I am trying in data-config.xml, which
> is taken pretty much from the various Solr wikis:
>
> <dataConfig>
>   <dataSource type="FileDataSource" encoding="UTF-8" />
>   <document>
>     <entity name="wf" rootEntity="false" dataSource="null"
>             processor="FileListEntityProcessor"
>             baseDir="d:\inetpub\webapps\searchserver\solr\importdaten\import_wiki"
>             fileName="wiki_..\.xml">
>       <entity name="doc"
>               processor="XPathEntityProcessor"
>               forEach="/mediawiki/page"
>               stream="true"
>               url="${wf.fileAbsolutePath}"
>               transformer="RegexTransformer,HTMLStripTransformer,TemplateTransformer">
>         <field column="ilang" template="${wf.fileAbsolutePath}"
>                regex=".*?(..)\.xml" replaceWith="$1"/>
>         <field column="HEADER" xpath="/mediawiki/page/title"
>                required="true" stripHTML="true"/>
>
>         <field column="xxCONTENT" xpath="/mediawiki/page/revision/text"/>
>         <field column="xxCONTENT" regex="(?m)^=====(.+?)=====$"
>                replaceWith="&lt;h4&gt;$1&lt;/h4&gt;"/>
>
>         <!-- more regex transforms here -->
>         <field column="xxCONTENT" stripHTML="true"/>
>
>         <field column="NGLANG" template="${doc.ilang}" />
>         <field column="CONTENTPREVIEW" template="${doc.xxCONTENT}"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> The problem is with ilang. The regex is not applied, no matter what I try.
> Even a straightforward <...
> regex=".*" replaceWith="en" ...> doesn't work. I always
> end up with the full pathname.
>
> The regexes on xxCONTENT work fine, however. So it's not that my regex is
> wrong or that regexes don't work at all.
>
> I tried all sorts of things like intermediate columns, sourceColumn or
> different sequences in the transformer attribute. Each led to different errors.
> Nothing worked or led to any clues.
>
> What am I doing wrong here? This is with Solr 1.4.1.
>
> Thanks,
> Michael
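Regarding ilang in the quoted config: one way around the missing xpath is to run the regex on the outer FileListEntityProcessor entity, whose row already carries `fileAbsolutePath`. That way the regex has a real source value to work on. This is a sketch that has not been tested against 1.4.1. It assumes that a transformer declared on the outer entity is applied to that entity's rows. It also assumes that the transformed value is visible to the inner entity as `${wf.ilang}`.

```xml
<!-- Sketch only: the language code is derived on the outer entity -->
<entity name="wf" rootEntity="false" dataSource="null"
        processor="FileListEntityProcessor"
        baseDir="d:\inetpub\webapps\searchserver\solr\importdaten\import_wiki"
        fileName="wiki_..\.xml"
        transformer="RegexTransformer">
  <!-- e.g. ...\import_wiki\wiki_de.xml becomes "de" -->
  <field column="ilang" sourceColName="fileAbsolutePath"
         regex=".*?(..)\.xml" replaceWith="$1"/>

  <entity name="doc"
          processor="XPathEntityProcessor"
          forEach="/mediawiki/page"
          stream="true"
          url="${wf.fileAbsolutePath}"
          transformer="RegexTransformer,HTMLStripTransformer,TemplateTransformer">
    <!-- HEADER and xxCONTENT fields as before; no ilang field in here -->
    <!-- assumes the outer entity's transformed value resolves here -->
    <field column="NGLANG" template="${wf.ilang}"/>
    <field column="CONTENTPREVIEW" template="${doc.xxCONTENT}"/>
  </entity>
</entity>
```

With the regex attached to a column that the outer entity actually produces, ilang no longer depends on the transformer order inside the inner entity.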