RegexTransformer does not replace the placeholders before processing the regex, so the literal ${...} text reaches the regex compiler. It has to be enhanced.
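To illustrate what such an enhancement might look like, here is a rough sketch in plain Java. It is not the DIH code and not a patch: the class `PlaceholderRegexSketch`, its methods, the variable map and the file name `a.xml` are all made up for the example. It assumes the variable value should match literally (hence `Pattern.quote`) and that the replacement is applied with `replaceAll`; the real transformer may want different behaviour.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper for illustration only; not part of Solr or DIH.
class PlaceholderRegexSketch {

    // Matches ${some.name} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace every ${name} in the regex with its value before compiling.
    // Assumption: the value is meant literally, so it is wrapped in \Q...\E.
    static String resolve(String regex, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(regex);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unresolved placeholder: " + m.group(0));
            }
            // quoteReplacement stops appendReplacement from treating the
            // backslashes added by Pattern.quote as escape characters.
            m.appendReplacement(out, Matcher.quoteReplacement(Pattern.quote(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Resolve placeholders, compile, then apply the replacement to the input.
    static String apply(String regex, String replaceWith, String input,
                        Map<String, String> vars) {
        return Pattern.compile(resolve(regex, vars)).matcher(input).replaceAll(replaceWith);
    }

    // True if the string compiles as a regex as-is.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("dataimporter.request.finstalldir", "/Volumes/spare/ts");
        String regex = "${dataimporter.request.finstalldir}(.*)";

        // The unresolved string is not a valid regex, as in the reported trace.
        System.out.println(compiles(regex)); // false

        // After substitution the install dir prefix is stripped.
        System.out.println(apply(regex, "$1",
                "/Volumes/spare/ts/fords/dtd/fordsxml/data/a.xml", vars));
        // /fords/dtd/fordsxml/data/a.xml
    }
}
```

Throwing on an unknown variable is a choice made for the sketch; leaving the placeholder in place would just reproduce the PatternSyntaxException at compile time.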

On Mon, Feb 2, 2009 at 10:34 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
> Hello
>
> As per several postings I noted that I can define variables
> inside an invariants list section of the DIH handler of
> solrconfig.xml:-
>
> <requestHandler name="/dataimport"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">data-config.xml</str>
>   </lst>
>   <lst name="invariants">
>     <str name="finstalldir">/Volumes/spare/ts</str>
>   </lst>
> </requestHandler>
>
>
> I can also reference these variables within data-config.xml. This
> works, the solr field "test" is nicely populated. However how do
> I use this variable within my regex transformer? Here is my
> data-config.xml:-
>
> <dataConfig>
>   <dataSource name="myfilereader" type="FileDataSource"/>
>   <document>
>     <entity name="jc"
>         processor="FileListEntityProcessor"
>         fileName="^.*\.xml$"
>         newerThan="'NOW-1000DAYS'"
>         recursive="true"
>         rootEntity="false"
>         dataSource="null"
>         baseDir="/Volumes/spare/ts/fords/dtd/fordsxml/data">
>       <entity name="x"
>           dataSource="myfilereader"
>           processor="XPathEntityProcessor"
>           url="${jc.fileAbsolutePath}"
>           stream="false"
>           forEach="/record"
>
> transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
>
>         <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
>         <field column="fileWebPath"
>             regex="${dataimporter.request.finstalldir}(.*)" replaceWith="$1"
>             sourceColName="fileAbsolutePath"/>
>         <field column="test"
>             template="${dataimporter.request.finstalldir}" />
>         <field column="title" xpath="/record/title" />
>         <field column="para" xpath="/record/sect1/para"
>             stripHTML="true" />
>         <field column="date"
>             xpath="/record/metadata/da...@qualifier='Date']" dateTimeFormat="yyyyMMdd"
>             />
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> indexing my content I get an error as follows:-
>
>
> INFO: SolrDeletionPolicy.onInit: commits:num=2
>
> commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_7,version=1233583868834,generation=7,filenames=[_7.frq, _4.fdt, _7.tii, _7.fnm, _4.fdx, _7.tis, segments_7, _7.nrm, _7.prx]
>
> commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_8,version=1233583868835,generation=8,filenames=[segments_8]
> Feb 2, 2009 5:00:50 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
> INFO: last commit = 1233583868835
> Feb 2, 2009 5:00:57 PM org.apache.solr.handler.dataimport.EntityProcessorBase applyTransformer
> WARNING: transformer threw error
> java.util.regex.PatternSyntaxException: Illegal repetition near index 0
> ${dataimporter.request.finstalldir}(.*)
> ^
>     at java.util.regex.Pattern.error(Pattern.java:1650)
>     at java.util.regex.Pattern.closure(Pattern.java:2706)
>     at java.util.regex.Pattern.sequence(Pattern.java:1798)
>     at java.util.regex.Pattern.expr(Pattern.java:1687)
>     at java.util.regex.Pattern.compile(Pattern.java:1397)
>     at java.util.regex.Pattern.<init>(Pattern.java:1124)
>     at java.util.regex.Pattern.compile(Pattern.java:817)
>     at org.apache.solr.handler.dataimport.RegexTransformer.getPattern(RegexTransformer.java:129)
>     at org.apache.solr.handler.dataimport.RegexTransformer.process(RegexTransformer.java:88)
>     at org.apache.solr.handler.dataimport.RegexTransformer.transformRow(RegexTransformer.java:74)
>     at org.apache.solr.handler.dataimport.RegexTransformer.transformRow(RegexTransformer.java:42)
>     at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
>     at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
>     at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:333)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:359)
>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:222)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:155)
>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:324)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:384)
>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:365)
>
>
> Is there some simple escape or other syntax to be used or is
> this an enhancement?
>
> Regards Fergus.
> --
>
> ===============================================================
> Fergus McMenemie               Email:fer...@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================
>

--
--Noble Paul