Hello:
I'm trying to make use of FieldReaderDataSource so that I can read an Oracle
database CLOB, and then use XPathEntityProcessor to derive Solr field values
via xpath notation.
For an extra bit of fun, the CLOB itself is Base64-encoded and gzipped. I
wrote a transformer of my own to take care of the decoding and decompression,
and that seems to work. I patterned the new transformer after the existing
ones (Solr 3.1 trunk). In catalina.out I can see my own debug output:
------------- Processing field: {toWrite=false, clob=true, column=SUMMARY_XML,
boost=1.0, gzip64=true}
------------- Updated field: SUMMARY_XML to type: java.lang.String value:
'<node id="ING:2ylbg" name="LOC677213" type="gene"><synonym-list><synonym
name="LOC677213"/></synonym-list><macromolecule-list><macromolecule id="677213"
source="EG" species="MM" name="similar to U2AF homology motif (UHM) kinase 1"
summary=""/></macromolecule-list><member-of></member-of><molecular-function></molecular-function><biological-process></biological-process><cellular-component></cellular-component><pathway-list></pathway-list><protein-family><term
name="unknown"/></protein-family><subcellular-location></subcellular-location><top-findings></top-findings><additional-findings></additional-findings><reference-list
finding-count="0"></reference-list><copyright>©2000-2010 Ingenuity
Systems, Inc. All rights reserved.</copyright></node>'
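For context, the decode step amounts to roughly the following. This is a minimal standalone sketch, not the actual SolrDihGzip64Transformer code: the class and method names are made up for illustration, it assumes Base64 is the outer layer and gzip the inner one, that the decompressed bytes are UTF-8, and it uses java.util.Base64, which needs Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only.
final class Gzip64Sketch {

    // Base64-decode the CLOB text, then gunzip it into a String.
    // Assumption: the decompressed payload is UTF-8.
    static String decode(String base64Gzip) throws IOException {
        // The MIME decoder tolerates line breaks inside the encoded text.
        byte[] compressed = Base64.getMimeDecoder().decode(base64Gzip);
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Inverse operation, included only so the sketch can be round-trip tested.
    static String encode(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }
}
```

In the real transformer, the result of a call like decode(...) is what gets put back into the row under SUMMARY_XML.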
So, the transformer replaces the original CLOB extracted by ClobTransformer
with a String representing the decoded result. I then want to feed this XML
string to XPathEntityProcessor. So, in my DIH data config file:
<dataConfig>
<dataSource
name="ipsDb"
type="JdbcDataSource"
driver="oracle.jdbc.driver.OracleDriver"
url="jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac1-vip)(PORT=1537))(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac2-vip)(PORT=1537))(sdu=8760)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EIPS1R)))"
user="user"
password="password"
/>
<datasource
name="fieldSource"
type="FieldReaderDataSource"
/>
<document>
<entity
rootEntity="false"
name="ipsNode"
dataSource="ipsDb"
query="select SUMMARY_XML from IPS_NODE where ROWNUM &lt; 10"
transformer="ClobTransformer,com.ingenuity.isec.util.SolrDihGzip64Transformer">
<field column="SUMMARY_XML" clob="true" gzip64="true"/>
<entity
name="node"
dataSource="fieldSource"
dataField="ipsNode.SUMMARY_XML"
processor="XPathEntityProcessor"
forEach="/node">
<field column="n_id" xpath="/node/@id"/>
<field column="n_name" xpath="/node/@name"/>
...
</entity>
</entity>
</document>
</dataConfig>
Basically, I'm trying to specify the (former CLOB, now String) SUMMARY_XML
field as the data field for the FieldReaderDataSource. I can see it has the
ability to simply return a StringReader() for String fields, rather than have
to deal with a Clob itself. So, I figured FieldReaderDataSource would be happy
with that and it would supply XPathEntityProcessor with XML contained in the
field's value.
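As a sanity check outside of DIH, the two xpath expressions from the config can be evaluated against the decoded XML with the standard javax.xml.xpath API. This is only a sketch: the class name is made up, the XML is a shortened copy of the debug output above, and the JDK's XPath engine is not the streaming XPath reader that XPathEntityProcessor uses internally, so it confirms the expressions and the XML, not DIH's behavior.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class XPathCheckSketch {

    // Evaluate an XPath expression against an XML string and
    // return the result as a String (empty if nothing matches).
    static String eval(String xml, String expr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Shortened version of the decoded SUMMARY_XML value.
        String xml = "<node id=\"ING:2ylbg\" name=\"LOC677213\" type=\"gene\">"
                   + "<synonym-list><synonym name=\"LOC677213\"/></synonym-list>"
                   + "</node>";
        System.out.println(eval(xml, "/node/@id"));   // prints ING:2ylbg
        System.out.println(eval(xml, "/node/@name")); // prints LOC677213
    }
}
```

So the XML and the expressions themselves look fine; the open question is whether the inner entity ever receives that XML.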
But, when I do a full import, I see this:
Mar 4, 2011 9:10:26 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Mar 4, 2011 9:10:26 AM org.apache.solr.core.SolrCore execute
INFO: [ing-nodes] webapp=/solr path=/select params={clean=false&commit=true&command=full-import&qt=/dataimport-ips} status=0 QTime=31
Mar 4, 2011 9:10:26 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
WARNING: Unable to read: dataimport-ips.properties
Mar 4, 2011 9:10:26 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity ipsNode with URL: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac1-vip)(PORT=1537))(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac2-vip)(PORT=1537))(sdu=8760)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EIPS1R)))
Mar 4, 2011 9:10:28 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1838
Mar 4, 2011 9:10:28 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity node with URL: jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac1-vip)(PORT=1537))(ADDRESS=(PROTOCOL=TCP)(HOST=ueipa1rac2-vip)(PORT=1537))(sdu=8760)(LOAD_BALANCE=yes)(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EIPS1R)))
Mar 4, 2011 9:10:29 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 1110
Mar 4, 2011 9:10:29 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: ipsNode document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: null Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:262)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:203)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:183)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:586)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:612)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:266)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:374)
Caused by: java.sql.SQLException: SQL statement to execute cannot be empty or null
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208)
    at oracle.jdbc.driver.OracleSql.initialize(OracleSql.java:112)
    at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1683)
    at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1662)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
    ... 13 more
It looks like XPathEntityProcessor is not using the FieldReaderDataSource I
configured for the "node" entity. Instead, a second JDBC connection is created
for the "node" entity, and the stack trace shows that
XPathEntityProcessor.initQuery() invokes getData() on that JDBC data source
rather than on the FieldReaderDataSource.
I see in initQuery():
    private void initQuery(String s) {
      Reader data = null;
      try {
        final List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
        try {
          data = dataSource.getData(s);
          ...
dataSource is set up as:
    @Override
    @SuppressWarnings("unchecked")
    public void init(Context context) {
      super.init(context);
      if (xpathReader == null)
        initXpathReader();
      pk = context.getEntityAttribute("pk");
      dataSource = context.getDataSource();
      rowIterator = null;
    }
I'm not sure how all of these DIH components fit together, but it seems
context.getDataSource() must be returning the JDBC data source configured for
the outer entity (ipsNode), not the FieldReaderDataSource configured for the
inner entity (node), where I'm making use of XPathEntityProcessor.
What am I missing conceptually? I've found a few references to the very
same problem, and I think I'm following the same pattern.
Thanks for any insights you can share,
Jeff
--
Jeff Schmidt
535 Consulting
[email protected]
http://www.535consulting.com