Hi Andrew,

My experience with XPathEntityProcessor is non-existent. ;-)

Still, after a quick look at the method that throws the exception:

  private void addField0(String xpath, String name, boolean multiValued,
                         boolean isRecord) {
    List<String> paths = new LinkedList<String>(Arrays.asList(xpath.split("/")));
    if ("".equals(paths.get(0).trim()))
      paths.remove(0);
    rootNode.build(paths, name, multiValued, isRecord);
  }

and at your forEach attribute value in combination with the xpath:
> forEach="/">
>                 <field column="content"
> xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']"
> />

I would guess that the double slash at the beginning of the xpath does not work in combination with your forEach value. I don't know whether the processor is supposed to handle this correctly, or whether you have to take care of it in your configuration.
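
One way to check this guess is to run the split from the first line of addField0 in isolation. The sketch below is not Solr code: SplitCheck and splitPath are made-up names, and it only reproduces the xpath.split("/") call quoted above, fed with the two values from the configuration.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class SplitCheck {
    // Hypothetical helper: mirrors only the split step of the quoted addField0,
    // turning a path into a mutable list of segments.
    static List<String> splitPath(String xpath) {
        return new LinkedList<String>(Arrays.asList(xpath.split("/")));
    }

    public static void main(String[] args) {
        // The forEach value from the configuration.
        List<String> forEachParts = splitPath("/");
        System.out.println("forEach segments: " + forEachParts.size());

        // The field xpath from the configuration, starting with a double slash.
        List<String> fieldParts = splitPath(
            "//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']");
        System.out.println("xpath segments: " + fieldParts.size());

        // paths.get(0) in addField0 needs at least one segment to exist.
        System.out.println("forEach list empty? " + forEachParts.isEmpty());
    }
}
```

Java's String.split drops trailing empty strings, so a lone "/" yields an empty list and there is nothing for paths.get(0) to read, whereas the double-slash xpath still produces a non-empty list with empty leading entries. That would be consistent with the "Index: 0, Size: 0" in the trace and with addField0 being reached from the XPathRecordReader constructor, so the forEach value may be the more likely culprit. I have not verified this against the processor itself.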

Cheers,
Chantal

Andrew Clegg wrote:

Chantal Ackermann wrote:
Hi Andrew,

your inner entity uses an XML type datasource. The default entity
processor is the SQL one, however.

For your inner entity, you have to specify the correct entity processor
explicitly. You do that by adding the attribute "processor", and the
value is the classname of the processor you want to use.

e.g. <entity dataSource="filesystem" name="domain_pdb"
processor="XPathEntityProcessor" ....


Thanks -- I was also missing a forEach expression -- in my case, just "/"
since each XML file contains the information for no more than one document.

However, I'm now getting a different exception:


30-Jul-2009 16:48:52 org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: domain document : SolrInputDocument[{id=id(1.0)={1udaA02}, title=title(1.0)={PDB code 1uda, chain A, domain 02}, pdb_code=pdb_code(1.0)={1uda}, doc_type=doc_type(1.0)={domain}, related_ids=related_ids(1.0)={1uda,1udaA}}]
org.apache.solr.handler.dataimport.DataImportHandlerException: Exception while reading xpaths for fields Processing Document # 1
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:135)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:307)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:372)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:225)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:167)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372)
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.LinkedList.entry(LinkedList.java:365)
        at java.util.LinkedList.get(LinkedList.java:315)
        at org.apache.solr.handler.dataimport.XPathRecordReader.addField0(XPathRecordReader.java:71)
        at org.apache.solr.handler.dataimport.XPathRecordReader.<init>(XPathRecordReader.java:50)
        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:121)
        ... 9 more


My data config now looks like this:


<dataConfig>

    <!-- TODO  change this back to v3.3.0 when the appropriate mapping
tables are available there -->

    <dataSource name="database" driver="org.postgresql.Driver"
url="jdbc:postgresql://cathdb.info/cathdb_v3_2_0" user="***" password="***"
/>

    <dataSource name="filesystem" type="FileDataSource"
basePath="/cath/people/cathdata/v3_3_0/pdb-XML-noatom/" encoding="UTF-8"
connectionTimeout="5000" readTimeout="10000"/>

    <document name="domain">

        <entity name="domain" dataSource="database" query="select domain_id
as id, 'PDB code ' || pdb_code || ', chain ' || chain_code || ', domain ' ||
domain_code as title, pdb_code || ',' || chain_id as related_ids, 'domain'
as doc_type, pdb_code from domain">

            <entity dataSource="filesystem" name="domain_pdb"
url="${domain.pdb_code}-noatom.xml" processor="XPathEntityProcessor"
forEach="/">
                <field column="content"
xpath="//*[local-name()='structCategory']/*[local-name()='struct']/*[local-name()='title']"
/>
            </entity>


        </entity>

    </document>

</dataConfig>


Thanks in advance, again :-)

Andrew.



--
Chantal Ackermann
