>This looks fine. Can you post the stack trace?

Yep, here is the juicy bit. Let me know if you need more.
Jan 19, 2009 11:08:03 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 2390 ms
Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrCore execute
INFO: [janesdocs] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=12
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [janesdocs] REMOVING ALL DOCUMENTS FROM INDEX
Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=2
    commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_1,version=1232363283058,generation=1,filenames=[segments_1]
    commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_2,version=1232363283059,generation=2,filenames=[segments_2]
Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: last commit = 1232363283059
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.EntityProcessorBase applyTransformer
WARNING: transformer threw error
java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:33)
    at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
    at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: janescurrent document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:203)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
Caused by: java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:33)
    at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
    at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
    ... 9 more
Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:203)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
Caused by: java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:33)
    at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
    at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
    ... 9 more
Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback

>On Mon, Jan 19, 2009 at 4:14 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>
>> Hello all,
>>
>> I have the following DIH data-config.xml file. Adding
>> HTMLStripTransformer and the associated stripHTML on the
>> para tag seems to have broken things. I am using a nightly
>> build from 12-jan-2009.
>>
>> The /record/sect1/para contains HTML sub tags which need
>> to be discarded. Is my use of stripHTML correct?
>>
>> <dataConfig>
>>   <dataSource name="myfilereader" type="FileDataSource"/>
>>   <document>
>>     <entity name="jcurrent"
>>             processor="FileListEntityProcessor"
>>             fileName=".*xml"
>>             newerThan="'NOW-1000DAYS'"
>>             recursive="true"
>>             rootEntity="false"
>>             dataSource="null"
>>             baseDir="/Volumes/spare/ts/jxml/data/news/groups">
>>
>>       <entity name="x"
>>               dataSource="myfilereader"
>>               processor="XPathEntityProcessor"
>>               url="${jcurrent.fileAbsolutePath}"
>>               stream="false"
>>               forEach="/record"
>>               transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
>>
>>         <field column="fileAbsPath"
>>                template="${jcurrent.fileAbsolutePath}" />
>>         <field column="fileWebPath" regex="/Volumes/spare/ts/(.*)"
>>                replaceWith="$1" sourceColName="fileAbsPath"/>
>>         <field column="title" xpath="/record/title" />
>>         <field column="para" xpath="/record/sect1/para"
>>                stripHTML="true" />
>>         <field column="subject"
>>                xpath="/record/metadata/subje...@qualifier='fullTitle']" />
>>         <field column="pubname"
>>                xpath="/record/metadata/subje...@qualifier='publication']" />
>>         <field column="pubdate"
>>                xpath="/record/metadata/da...@qualifier='pubDate']"
>>                dateTimeFormat="yyyyMMdd" />
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>> --
>>
>> ===============================================================
>> Fergus McMenemie
>> Email:fer...@twig.me.uk
>> Techmore Ltd Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets Analyst Programmer
>> ===============================================================
>>
>
>
>
>--
>Regards,
>Shalin Shekhar Mangar.
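In the trace, the NullPointerException is thrown inside the java.io.StringReader constructor, called from HTMLStripTransformer.stripHTML. StringReader dereferences its String argument immediately, so the likely trigger is a null value reaching the stripper for the stripHTML column. One plausible (unconfirmed) source is a /record with no /record/sect1/para element, or a multi-valued para list with an empty entry. The Java sketch below illustrates the kind of null guard that avoids this. It is not Solr's code: the class NullSafeStrip and its methods stripColumn and stripTags are hypothetical names, and the regex tag stripping is a naive stand-in for Solr's real HTML stripper.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the Solr DataImportHandler API.
class NullSafeStrip {

    // Naive stand-in for HTML stripping: removes anything shaped like a tag.
    // Solr's own stripper handles entities, comments, scripts, etc.; this does not.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    // Strip one column of a row in place, tolerating a missing (null) value
    // and multi-valued columns (a List) that contain null entries.
    static void stripColumn(Map<String, Object> row, String column) {
        Object value = row.get(column);
        if (value == null) {
            return; // nothing to strip; handing null to a StringReader is what throws
        }
        if (value instanceof List) {
            List<String> cleaned = new ArrayList<>();
            for (Object item : (List<?>) value) {
                if (item != null) { // skip empty entries instead of failing the import
                    cleaned.add(stripTags(item.toString()));
                }
            }
            row.put(column, cleaned);
        } else {
            row.put(column, stripTags(value.toString()));
        }
    }

    public static void main(String[] args) {
        // Multi-valued column with one null entry: the null is dropped.
        Map<String, Object> withPara = new HashMap<>();
        withPara.put("para", Arrays.asList("<b>Hello</b> world", null));
        stripColumn(withPara, "para");
        System.out.println(withPara.get("para")); // [Hello world]

        // Row with no para column at all: no exception, row left unchanged.
        Map<String, Object> withoutPara = new HashMap<>();
        stripColumn(withoutPara, "para");
        System.out.println(withoutPara.containsKey("para")); // false

        // The unguarded call reproduces the failure seen in the trace.
        try {
            new StringReader(null);
        } catch (NullPointerException e) {
            System.out.println("StringReader(null) throws NullPointerException");
        }
    }
}
```

Silently skipping nulls is a design choice; logging the offending file (the fileAbsPath column in this config) alongside the skip would make it easier to track down records that lack a para element.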