Incidentally, I tried adding this:
<datasource name="f" type="FieldReaderDataSource" />
<document>
  <entity dataSource="f" processor="XPathEntityProcessor"
          dataField="d.text" forEach="/MESSAGE">
    <field column="body" xpath="//BODY"/>
  </entity>
</document>
But this didn't seem to change anything.
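For reference, here is a minimal, untested sketch of how a FieldReaderDataSource is usually wired in: as a child entity nested under the SQL entity, with dataField written as entityName.COLUMN (so doc.TEXT rather than the SQL alias d.text). The dataSource name "db", the JDBC driver and url values, the child entity name "message", and the Solr field name "body" are placeholders assumed for illustration. As far as I know the element name is case-sensitive and should be spelled dataSource.

```xml
<dataConfig>
  <!-- Assumed JDBC source; "db" and the url are placeholders, not taken from the real setup -->
  <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//HOST:1521/SERVICE" />
  <!-- Reads XML from a field of the parent entity's row instead of from a file or URL -->
  <dataSource name="f" type="FieldReaderDataSource" />
  <document>
    <entity name="doc" dataSource="db" transformer="ClobTransformer"
            query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID,
                   d.XML.getClobVal() AS TEXT FROM DOC d">
      <field column="EFFECTIVE_DT" name="effectiveDate" />
      <field column="ARCHIVE_ID" name="id" />
      <!-- ClobTransformer turns the CLOB into a String under the same column name -->
      <field column="TEXT" clob="true" />
      <!-- Child entity: dataField is parentEntityName.COLUMN, assumed here to be doc.TEXT -->
      <entity name="message" dataSource="f" processor="XPathEntityProcessor"
              dataField="doc.TEXT" forEach="/MESSAGE">
        <field column="body" xpath="//BODY" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

The child entity sits beside the field elements rather than inside one, and it has no url attribute because the XML comes from the parent row.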
Any insight is appreciated.
Thanks.
From: Neil Chaudhuri
Sent: Wednesday, March 17, 2010 3:24 PM
To: [email protected]
Subject: XPath Processing Applied to Clob
I am using the DataImportHandler to index 3 fields in a table: an id, a date,
and the text of a document. This is an Oracle database, and the document is an
XML document stored as Oracle's xmltype data type. Since this is nothing more
than a fancy CLOB, I am using the ClobTransformer to extract the actual XML.
However, I don't want to index/store all of the XML, only the XML within a
particular set of tags. The XPath itself is trivial, but it seems that the
XPathEntityProcessor only works on XML file content, not on the output of
a Transformer.
Here is what I currently have that fails:
<document>
  <entity name="doc" query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID,
      d.XML.getClobVal() AS TEXT FROM DOC d" transformer="ClobTransformer">
    <field column="EFFECTIVE_DT" name="effectiveDate" />
    <field column="ARCHIVE_ID" name="id" />
    <field column="TEXT" name="text" clob="true">
    <entity name="text" processor="XPathEntityProcessor"
            forEach="/MESSAGE" url="${doc.text}">
      <field column="body" xpath="//BODY"/>
    </entity>
  </entity>
</document>
Is there an easy way to do this without writing my own custom transformer?
Thanks.