Hi Gora,

The column type in the DB is BLOB; it stores only binary data.

If I do not use TikaEntityProcessor, then the following exception occurs:

        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)
59163 [Thread-16] ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while processing: messages document : SolrInputDocument(fields: [id=2158]):org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: oracle.jdbc.driver.OracleBlobInputStream cannot be cast to java.util.Iterator
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:65)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:469)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:495)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)
Caused by: java.lang.ClassCastException: oracle.jdbc.driver.OracleBlobInputStream cannot be cast to java.util.Iterator
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
        ... 10 more



I have used ClobTransformer in the data-config file as shown below, and it
still does not work:

<dataConfig>
  <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//a.a.a.a:a/d11gr21" user="aaaa" password="aaaaa" />
  <dataSource name="dastream" type="FieldStreamDataSource" />
  <document>
    <entity name="messages" pk="x_MSG_PK"
            query="select * from table1"
            dataSource="db">
      <field column="x_MSG_PK" name="id" />
      <entity name="message"
              transformer="ClobTransformer"
              dataSource="dastream"
              processor="TikaEntityProcessor"
              dataField="messages.MESSAGE"
              format="text">
        <field column="text" name="mxMsg" clob="true" />
      </entity>
    </entity>
  </document>
</dataConfig>


So, what changes do I need to make?

-Chandan


-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Monday, February 24, 2014 5:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Can not index raw binary data stored in Database in BLOB
format.

On 24 February 2014 15:34, Chandan khatua <chand...@nrifintech.com> wrote:
> Hi Gora !
>
> Your concern was "What is the type of the column used to store the 
> binary data in Oracle?"
> The column type in the DB is BLOB. The column can also contain rich-text files.

Um, your original message said that it does *not* contain richtext data. How
do you tell whether it has richtext data, or not? For just a binary blob,
the ClobTransformer should work, but you need the TikaEntityProcessor for
richtext data. If you do not know whether the data in the blob is richtext
or not, you will need to roll your own solution to determine that.
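
As a rough illustration of what "rolling your own" check might look like, the
sketch below inspects the first few bytes of a blob for well-known file
signatures (PDF, RTF, OLE2 as used by legacy MS Office files, and ZIP as used
by OOXML/ODF files). The class and method names (BlobSniffer,
looksLikeRichDocument) are hypothetical and not part of Solr or the
DataImportHandler, the signature list is deliberately short and not
exhaustive, and a ZIP signature also matches ordinary zip archives, so treat
the result as a hint only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr: guesses from the leading bytes
// whether a blob is a rich document or just opaque binary data.
public class BlobSniffer {

    // Leading "magic" bytes of a few common rich-document formats.
    private static final byte[][] SIGNATURES = {
        "%PDF".getBytes(StandardCharsets.US_ASCII),      // PDF
        "{\\rtf".getBytes(StandardCharsets.US_ASCII),    // RTF
        {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0},   // OLE2 (legacy .doc/.xls/.ppt)
        {'P', 'K', 0x03, 0x04}                           // ZIP container (.docx, .odt, ...)
    };

    // Returns true if the given leading bytes start with a known signature.
    public static boolean looksLikeRichDocument(byte[] head) {
        for (byte[] signature : SIGNATURES) {
            if (startsWith(head, signature)) {
                return true;
            }
        }
        return false;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdfHead = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        byte[] rawHead = {0x00, 0x01, 0x02, 0x03};
        System.out.println(looksLikeRichDocument(pdfHead)); // true
        System.out.println(looksLikeRichDocument(rawHead)); // false
    }
}
```

Only the first handful of bytes of each blob is needed for this check, so it
can run before deciding which processing path a row should take.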

Regards,
Gora
