Hi Andrew,
My experience with XPathEntityProcessor is non-existent. ;-)
But here is what I found after a quick look at the method that throws the exception:
private void addField0(String xpath, String name, boolean multiValued,
                       boolean isRecord) {
  List<String> paths = new LinkedList<String>(Arrays.asList(xpath.split("/")));
  if ("".equals(paths.get(0).trim()))
    paths.remove(0);
  rootNode.build(paths, name, multiValued, isRecord);
}
and your forEach attribute value in combination with the xpath:
> forEach="/">
> <field column="content"
> xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']"
> />
I would guess that the double slash at the beginning of the xpath does not
work together with your forEach expression. I don't know whether this is
something the processor should expect and handle correctly, or whether you
have to take care of it in your configuration.
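To check how that split line treats the two values involved, here is a small
standalone sketch. The class and method names (SplitCheck, splitPath) are made
up for illustration and are not part of Solr; the sketch only reproduces the
xpath.split("/") call from the method quoted above, not the processor itself.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helper class, not part of Solr.
class SplitCheck {

    // Reproduces only the first statement of addField0: split the path on "/".
    static List<String> splitPath(String xpath) {
        return new LinkedList<String>(Arrays.asList(xpath.split("/")));
    }

    public static void main(String[] args) {
        // String.split drops trailing empty strings, so "/" leaves nothing.
        System.out.println(splitPath("/").size());                         // prints 0
        // Leading empty strings are kept, so "//..." gives "", "" and the step.
        System.out.println(splitPath("//*[local-name()='title']").size()); // prints 3
    }
}
```

If the processor passes the forEach value through the same method, then "/"
would produce an empty list, and paths.get(0) on it would fail with
"Index: 0, Size: 0", while the xpath starting with "//" still yields a
non-empty list. I have not verified this against the processor itself.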
Cheers,
Chantal
Andrew Clegg wrote:
Chantal Ackermann wrote:
Hi Andrew,
Your inner entity uses an XML type datasource. The default entity
processor is the SQL one, however.
For your inner entity, you have to specify the correct entity processor
explicitly. You do that by adding the attribute "processor", and the
value is the classname of the processor you want to use.
e.g. <entity dataSource="filesystem" name="domain_pdb"
processor="XPathEntityProcessor" ....
Thanks -- I was also missing a forEach expression -- in my case, just "/"
since each XML file contains the information for no more than one document.
However, I'm now getting a different exception:
30-Jul-2009 16:48:52 org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: domain document : SolrInputDocument[{id=id(1.0)={1udaA02}, title=title(1.0)={PDB code 1uda, chain A, domain 02}, pdb_code=pdb_code(1.0)={1uda}, doc_type=doc_type(1.0)={domain}, related_ids=related_ids(1.0)={1uda,1udaA}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception while reading xpaths for fields Processing Document # 1
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:135)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:307)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:372)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.LinkedList.entry(LinkedList.java:365)
        at java.util.LinkedList.get(LinkedList.java:315)
        at org.apache.solr.handler.dataimport.XPathRecordReader.addField0(XPathRecordReader.java:71)
        at org.apache.solr.handler.dataimport.XPathRecordReader.<init>(XPathRecordReader.java:50)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:121)
        ... 9 more
My data config now looks like this:
<dataConfig>
  <!-- TODO change this back to v3.3.0 when the appropriate mapping
       tables are available there -->
  <dataSource name="database" driver="org.postgresql.Driver"
              url="jdbc:postgresql://cathdb.info/cathdb_v3_2_0" user="***" password="***"/>
  <dataSource name="filesystem" type="FileDataSource"
              basePath="/cath/people/cathdata/v3_3_0/pdb-XML-noatom/" encoding="UTF-8"
              connectionTimeout="5000" readTimeout="10000"/>
  <document name="domain">
    <entity name="domain" dataSource="database"
            query="select domain_id as id, 'PDB code ' || pdb_code || ', chain ' || chain_code || ', domain ' || domain_code as title, pdb_code || ',' || chain_id as related_ids, 'domain' as doc_type, pdb_code from domain">
      <entity dataSource="filesystem" name="domain_pdb"
              url="${domain.pdb_code}-noatom.xml" processor="XPathEntityProcessor"
              forEach="/">
        <field column="content"
               xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']"/>
      </entity>
    </entity>
  </document>
</dataConfig>
Thanks in advance, again :-)
Andrew.
--
Chantal Ackermann