RegexTransformer does not replace the placeholders before processing the regex.
It has to be enhanced to do so.
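
As a rough illustration of what such an enhancement would need to do (a sketch only, not the actual RegexTransformer code): the ${...} placeholder has to be resolved to its value before the string reaches Pattern.compile, otherwise "${" is parsed as a repetition and you get the PatternSyntaxException shown in the trace below. The class name PlaceholderRegexDemo, the methods resolve and stripPrefix, and the plain Map of variables are all hypothetical names made up for this example. Quoting the resolved value with Pattern.quote and applying the replacement with replaceAll are my assumptions, not something DIH is known to do.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderRegexDemo {
    // Matches ${name} placeholders; the name may contain dots.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * Hypothetical helper: replaces each ${name} in the regex with the
     * regex-quoted value from vars. Unknown names are left untouched.
     */
    public static String resolve(String regex, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(regex);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            // Assumption: the value is meant literally, so quote it for the regex engine.
            String replacement = (value == null) ? m.group() : Pattern.quote(value);
            // quoteReplacement keeps '\' and '$' in the replacement from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Resolves placeholders first, then compiles the regex and keeps capture group 1. */
    public static String stripPrefix(String regex, Map<String, String> vars, String input) {
        Pattern p = Pattern.compile(resolve(regex, vars));
        return p.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        Map<String, String> vars =
            Map.of("dataimporter.request.finstalldir", "/Volumes/spare/ts");
        String regex = "${dataimporter.request.finstalldir}(.*)";
        String path = "/Volumes/spare/ts/fords/dtd/fordsxml/data/a.xml";
        System.out.println(stripPrefix(regex, vars, path));
    }
}
```

With the install directory from the quoted solrconfig.xml, stripPrefix returns the part of the path after /Volumes/spare/ts, which is what the fileWebPath field is after. Compiling the same regex string without the resolve step fails in the same way as the import does.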



On Mon, Feb 2, 2009 at 10:34 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
> Hello
>
> As per several postings I noted that I can define variables
> inside an invariants list section of the DIH handler of
> solrconfig.xml:-
>
>  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
>    <lst name="defaults">
>       <str name="config">data-config.xml</str>
>       </lst>
>    <lst name="invariants">
>       <str name="finstalldir">/Volumes/spare/ts</str>
>       </lst>
>    </requestHandler>
>
>
> I can also reference these variables within data-config.xml. This
> works: the Solr field "test" is nicely populated. However, how do
> I use this variable within my regex transformer? Here is my
> data-config.xml:-
>
>   <dataConfig>
>   <dataSource name="myfilereader" type="FileDataSource"/>
>    <document>
>       <entity name="jc"
>               processor="FileListEntityProcessor"
>               fileName="^.*\.xml$"
>               newerThan="'NOW-1000DAYS'"
>               recursive="true"
>               rootEntity="false"
>               dataSource="null"
>               baseDir="/Volumes/spare/ts/fords/dtd/fordsxml/data">
>          <entity name="x"
>                  dataSource="myfilereader"
>                  processor="XPathEntityProcessor"
>                  url="${jc.fileAbsolutePath}"
>                  stream="false"
>                  forEach="/record"
>                  transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
>
>   <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
>   <field column="fileWebPath"      regex="${dataimporter.request.finstalldir}(.*)" replaceWith="$1" sourceColName="fileAbsolutePath"/>
>   <field column="test"             template="${dataimporter.request.finstalldir}" />
>   <field column="title"            xpath="/record/title" />
>   <field column="para"             xpath="/record/sect1/para" stripHTML="true" />
>   <field column="date"             xpath="/record/metadata/da...@qualifier='Date']" dateTimeFormat="yyyyMMdd" />
>             </entity>
>       </entity>
>       </document>
>    </dataConfig>
>
> When indexing my content I get an error as follows:-
>
>
> INFO: SolrDeletionPolicy.onInit: commits:num=2
>        commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_7,version=1233583868834,generation=7,filenames=[_7.frq, _4.fdt, _7.tii, _7.fnm, _4.fdx, _7.tis, segments_7, _7.nrm, _7.prx]
>        commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_8,version=1233583868835,generation=8,filenames=[segments_8]
> Feb 2, 2009 5:00:50 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
> INFO: last commit = 1233583868835
> Feb 2, 2009 5:00:57 PM org.apache.solr.handler.dataimport.EntityProcessorBase applyTransformer
> WARNING: transformer threw error
> java.util.regex.PatternSyntaxException: Illegal repetition near index 0
> ${dataimporter.request.finstalldir}(.*)
> ^
>        at java.util.regex.Pattern.error(Pattern.java:1650)
>        at java.util.regex.Pattern.closure(Pattern.java:2706)
>        at java.util.regex.Pattern.sequence(Pattern.java:1798)
>        at java.util.regex.Pattern.expr(Pattern.java:1687)
>        at java.util.regex.Pattern.compile(Pattern.java:1397)
>        at java.util.regex.Pattern.<init>(Pattern.java:1124)
>        at java.util.regex.Pattern.compile(Pattern.java:817)
>        at org.apache.solr.handler.dataimport.RegexTransformer.getPattern(RegexTransformer.java:129)
>        at org.apache.solr.handler.dataimport.RegexTransformer.process(RegexTransformer.java:88)
>        at org.apache.solr.handler.dataimport.RegexTransformer.transformRow(RegexTransformer.java:74)
>        at org.apache.solr.handler.dataimport.RegexTransformer.transformRow(RegexTransformer.java:42)
>        at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:333)
>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:359)
>        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:222)
>        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:155)
>        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:324)
>        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:384)
>        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:365)
>
>
> Is there some simple escape or other syntax to be used or is
> this an enhancement?
>
> Regards Fergus.
> --
>
> ===============================================================
> Fergus McMenemie               Email:fer...@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================
>



-- 
--Noble Paul
