>This looks fine. Can you post the stack trace?
>
Yep, here is the juicy bit. Let me know if you need more.
Jan 19, 2009 11:08:03 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 2390 ms
Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrCore execute
INFO: [janesdocs] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=12
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [janesdocs] REMOVING ALL DOCUMENTS FROM INDEX
Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=2
commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_1,version=1232363283058,generation=1,filenames=[segments_1]
commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_2,version=1232363283059,generation=2,filenames=[segments_2]
Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: last commit = 1232363283059
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.EntityProcessorBase applyTransformer
WARNING: transformer threw error
java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:33)
at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: janescurrent document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:203)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
Caused by: java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:33)
at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
... 9 more
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:203)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
Caused by: java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:33)
at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
... 9 more
Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
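
Reading the top frames, the NPE is thrown by the java.io.StringReader
constructor, called from HTMLStripTransformer.stripHTML. That is what I
would expect if a row reached the transformer with no value for the para
column, though I have not checked the transformer source, so treat that as
a guess. The sketch below only demonstrates the JDK behaviour. The class and
method names are made up for illustration, and the regex is a crude stand-in
for tag removal, not Solr's HTML stripping.

```java
import java.io.StringReader;

public class NullStripSketch {

    // Returns true if constructing a StringReader from the value throws an NPE.
    // java.io.StringReader rejects a null String in its constructor.
    static boolean readerThrowsNpe(String value) {
        try {
            new StringReader(value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Hypothetical null-safe wrapper: pass nulls through untouched and only
    // strip tags when there is a value. The regex is an assumption made for
    // this sketch and is far cruder than a real HTML stripper.
    static String stripIfPresent(String value) {
        if (value == null) {
            return null;
        }
        return value.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(readerThrowsNpe(null));
        System.out.println(readerThrowsNpe("<b>some</b> text"));
        System.out.println(stripIfPresent(null));
        System.out.println(stripIfPresent("<b>some</b> text"));
    }
}
```

If that guess is right, a record with no /record/sect1/para element would be
enough to trigger the failure.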
>On Mon, Jan 19, 2009 at 4:14 PM, Fergus McMenemie <[email protected]> wrote:
>
>> Hello all,
>>
>> I have the following DIH data-config.xml file. Adding
>> HTMLStripTransformer and the associated stripHTML on the
>> para tag seems to have broken things. I am using a nightly
>> build from 12-jan-2009.
>>
>> The /record/sect1/para contains HTML sub tags which need
>> to be discarded. Is my use of stripHTML correct?
>>
>> <dataConfig>
>> <dataSource name="myfilereader" type="FileDataSource"/>
>> <document>
>> <entity name="jcurrent"
>> processor="FileListEntityProcessor"
>> fileName=".*xml"
>> newerThan="'NOW-1000DAYS'"
>> recursive="true"
>> rootEntity="false"
>> dataSource="null"
>> baseDir="/Volumes/spare/ts/jxml/data/news/groups">
>>
>> <entity name="x"
>> dataSource="myfilereader"
>> processor="XPathEntityProcessor"
>> url="${jcurrent.fileAbsolutePath}"
>> stream="false"
>> forEach="/record"
>>
>> transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
>>
>> <field column="fileAbsPath"
>> template="${jcurrent.fileAbsolutePath}" />
>> <field column="fileWebPath" regex="/Volumes/spare/ts/(.*)"
>> replaceWith="$1" sourceColName="fileAbsPath"/>
>> <field column="title" xpath="/record/title" />
>> <field column="para" xpath="/record/sect1/para"
>> stripHTML="true" />
>> <field column="subject"
>> xpath="/record/metadata/subje...@qualifier='fullTitle']" />
>> <field column="pubname"
>> xpath="/record/metadata/subje...@qualifier='publication']" />
>> <field column="pubdate"
>> xpath="/record/metadata/da...@qualifier='pubDate']"
>> dateTimeFormat="yyyyMMdd" />
>> </entity>
>> </entity>
>> </document>
>> </dataConfig>
>>
>> --
>>
>> ===============================================================
>> Fergus McMenemie
>> Email:[email protected]
>> Techmore Ltd Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets Analyst Programmer
>> ===============================================================
>>
>
>
>
>--
>Regards,
>Shalin Shekhar Mangar.
--
===============================================================
Fergus McMenemie Email:[email protected]
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===============================================================