>This looks fine. Can you post the stack trace?
>
Yep, here is the juicy bit. Let me know if you need more.

Jan 19, 2009 11:08:03 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 2390 ms
Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrCore execute
INFO: [janesdocs] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=12
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [janesdocs] REMOVING ALL DOCUMENTS FROM INDEX
Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=2
        commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_1,version=1232363283058,generation=1,filenames=[segments_1]
        commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_2,version=1232363283059,generation=2,filenames=[segments_2]
Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: last commit = 1232363283059
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.EntityProcessorBase applyTransformer
WARNING: transformer threw error
java.lang.NullPointerException
        at java.io.StringReader.<init>(StringReader.java:33)
        at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
        at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
        at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: janescurrent document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
        at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:203)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
Caused by: java.lang.NullPointerException
        at java.io.StringReader.<init>(StringReader.java:33)
        at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
        at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
        at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
        ... 9 more
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
        at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
        at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:203)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
Caused by: java.lang.NullPointerException
        at java.io.StringReader.<init>(StringReader.java:33)
        at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
        at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
        at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
        ... 9 more
Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
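
Reading the trace: the NullPointerException is raised inside the
java.io.StringReader constructor, which is called from
HTMLStripTransformer.stripHTML. That constructor throws when it is handed
a null String, so my guess (an assumption, I have not checked the Solr
source) is that the para column was null for at least one record. Below is
a minimal Java sketch of that behaviour. NullReaderDemo, throwsNpe and
stripOrSkip are hypothetical names of mine, not Solr code, and the regex
is only a crude stand-in for a real HTML stripper.

```java
import java.io.StringReader;

public class NullReaderDemo {
    // Hypothetical helper: reports whether building a StringReader
    // from the given value throws a NullPointerException.
    static boolean throwsNpe(String value) {
        try {
            new StringReader(value);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Hypothetical null guard of the kind a transformer could apply
    // before stripping markup. Not Solr's actual implementation.
    static String stripOrSkip(String value) {
        if (value == null) {
            return null; // nothing to strip, leave the column absent
        }
        // Crude tag removal, for illustration only.
        return value.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(null));
        System.out.println(stripOrSkip("some <b>bold</b> text"));
    }
}
```

If that guess is right, records without a /record/sect1/para element would
trigger the failure, while records that have one would be stripped
normally.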


>On Mon, Jan 19, 2009 at 4:14 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>
>> Hello all,
>>
>> I have the following DIH data-config.xml file. Adding
>> HTMLStripTransformer and the associated stripHTML on the
>> para tag seems to have broken things. I am using a nightly
>> build from 12-jan-2009.
>>
>> The /record/sect1/para contains HTML sub tags which need
>> to be discarded. Is my use of stripHTML correct?
>>
>> <dataConfig>
>>  <dataSource name="myfilereader" type="FileDataSource"/>
>>  <document>
>>     <entity name="jcurrent"
>>        processor="FileListEntityProcessor"
>>        fileName=".*xml"
>>        newerThan="'NOW-1000DAYS'"
>>        recursive="true"
>>        rootEntity="false"
>>        dataSource="null"
>>        baseDir="/Volumes/spare/ts/jxml/data/news/groups">
>>
>>        <entity name="x"
>>           dataSource="myfilereader"
>>           processor="XPathEntityProcessor"
>>           url="${jcurrent.fileAbsolutePath}"
>>           stream="false"
>>           forEach="/record"
>>
>> transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
>>
>>           <field column="fileAbsPath"
>> template="${jcurrent.fileAbsolutePath}" />
>>           <field column="fileWebPath" regex="/Volumes/spare/ts/(.*)"
>> replaceWith="$1" sourceColName="fileAbsPath"/>
>>           <field column="title"    xpath="/record/title" />
>>           <field column="para"     xpath="/record/sect1/para"
>> stripHTML="true" />
>>           <field column="subject"
>>  xpath="/record/metadata/subje...@qualifier='fullTitle']"   />
>>           <field column="pubname"
>>  xpath="/record/metadata/subje...@qualifier='publication']" />
>>           <field column="pubdate"
>>  xpath="/record/metadata/da...@qualifier='pubDate']"
>> dateTimeFormat="yyyyMMdd"   />
>>           </entity>
>>        </entity>
>>     </document>
>>  </dataConfig>
>>
>> --
>>
>> ===============================================================
>> Fergus McMenemie               Email:fer...@twig.me.uk
>> Techmore Ltd                   Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets             Analyst Programmer
>> ===============================================================
>>
>
>
>
>-- 
>Regards,
>Shalin Shekhar Mangar.

-- 

===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
