Hi Gora,

The column type in the DB is BLOB. It stores only binary data.
If I do not use TikaEntityProcessor, the following exception occurs:

59163 [Thread-16] ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while processing: messages document : SolrInputDocument(fields: [id=2158]):org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: oracle.jdbc.driver.OracleBlobInputStream cannot be cast to java.util.Iterator
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:65)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:469)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:495)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457)
Caused by: java.lang.ClassCastException: oracle.jdbc.driver.OracleBlobInputStream cannot be cast to java.util.Iterator
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
        ... 10 more

I have used ClobTransformer in the data-config file as shown below, and it still does not work:

<dataConfig>
    <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@//a.a.a.a:a/d11gr21" user="aaaa" password="aaaaa" />
    <dataSource name="dastream" type="FieldStreamDataSource"/>
    <document>
        <entity name="messages" pk="x_MSG_PK" query="select * from table1" dataSource="db">
            <field column="x_MSG_PK" name="id" />
            <entity name="message" transformer="ClobTransformer" dataSource="dastream" processor="TikaEntityProcessor" dataField="messages.MESSAGE" format="text">
                <field column="text" name="mxMsg" clob="true"/>
            </entity>
        </entity>
    </document>
</dataConfig>

So, what changes do I need to make?

-Chandan

-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com]
Sent: Monday, February 24, 2014 5:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Can not index raw binary data stored in Database in BLOB format.

On 24 February 2014 15:34, Chandan khatua <chand...@nrifintech.com> wrote:
> Hi Gora !
>
> Your concern was "What is the type of the column used to store the
> binary data in Oracle?"
> The column type is BLOB in DB. The column can also have rich text file.

Um, your original message said that it does *not* contain richtext data.
How do you tell whether it has richtext data, or not? For just a binary
blob, the ClobTransformer should work, but you need the TikaEntityProcessor
for richtext data. If you do not know whether the data in the blob is
richtext or not, you will need to roll your own solution to determine that.

Regards,
Gora
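Gora's last point, deciding per row whether the BLOB holds a rich-text document, could be approached with a simple file-signature check. Below is a minimal Java sketch under stated assumptions: the class `BlobContentSniffer` and its methods are hypothetical (not part of Solr or DIH), only a handful of well-known magic numbers are checked (PDF, RTF, ZIP-based Office/OpenDocument files, OLE2-based legacy Office files), and everything else, including plain text, is reported as "unknown". Apache Tika's own type detection (for example the `org.apache.tika.Tika` facade class) would be considerably more thorough.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper: guesses whether a BLOB's bytes are a rich-text document
 * (which would need TikaEntityProcessor) by checking a few well-known file
 * signatures ("magic numbers"). A heuristic only, not full content detection.
 */
public final class BlobContentSniffer {

    // Illustrative, not exhaustive, list of document signatures.
    private static final byte[] PDF = "%PDF".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] RTF = "{\\rtf".getBytes(StandardCharsets.US_ASCII);
    // ZIP container (used by .docx, .xlsx, .odt, ...).
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};
    // OLE2 compound file (used by legacy .doc, .xls, .msg, ...).
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    private BlobContentSniffer() {}

    /** Returns true if data begins with prefix. */
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Returns a coarse label: "pdf", "rtf", "zip", "ole2" or "unknown". */
    public static String detectKind(byte[] data) {
        if (data == null) return "unknown";
        if (startsWith(data, PDF)) return "pdf";
        if (startsWith(data, RTF)) return "rtf";
        if (startsWith(data, ZIP)) return "zip";
        if (startsWith(data, OLE2)) return "ole2";
        return "unknown";
    }

    /** True if the bytes match one of the known rich-document signatures. */
    public static boolean looksLikeRichText(byte[] data) {
        return !"unknown".equals(detectKind(data));
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII);
        byte[] plain = "Hello, Solr".getBytes(StandardCharsets.UTF_8);
        System.out.println(detectKind(pdf));          // pdf
        System.out.println(looksLikeRichText(plain)); // false
    }
}
```

How such a check would be wired into the import (for example from a custom transformer, or by splitting the rows into two entities) is left open; the sketch only covers the detection step.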