I've tried as per your guide, but no data is being indexed. The output on the Query screen looks like this:

<doc>
  <str name="id">2158</str>
  <arr name="mxMsg">
    <str><?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta name="Content-Type" content="application/octet-stream"/> <title/> </head> <body/></html></str>
  </arr>
  <long name="_version_">1460918369230258176</long>
</doc>

But the indexed data should appear inside the <body> tag. When XML messages are stored in the DB as BLOBs, indexing works smoothly. But I am trying to index binary data that is stored in the DB as BLOBs. Need help.

Thanking you,
Chandan

-----Original Message-----
From: Raymond Wiker [mailto:rwi...@gmail.com]
Sent: Monday, February 24, 2014 4:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Can not index raw binary data stored in Database in BLOB format.

Try replacing the inner entity with something like

<entity name="message"
        dataSource="dastream"
        processor="TikaEntityProcessor"
        dataField="messages.MESSAGE"
        format="xml">
  <field column="text" name="mxMsg"/>
</entity>

--- this assumes that you get the blob from a column named "MESSAGE" in the outer entity ("messages").

On Mon, Feb 24, 2014 at 11:51 AM, Chandan khatua <chand...@nrifintech.com> wrote:

> Hi Raymond!
>
> I have a data-config.xml like the one below:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <dataConfig>
>   <dataSource name="db" driver="oracle.jdbc.driver.OracleDriver"
>               url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21" user="x" password="x"/>
>   <dataSource name="dastream" type="FieldStreamDataSource" />
>   <document>
>     <entity
>         name="messages" pk=" PK" transformer='DateFormatTransformer'
>         query="select * from table1"
>         dataSource="db">
>       <field column =" PK" name ="id" />
>       <field column="last_modified" dateTimeFormat="YYYY-MM-DD HH24:MI:SS" locale="en" />
>       <entity
>           name="message"
>           dataSource="dastream"
>           processor="TikaEntityProcessor"
>           url="message"
>           dataField="db.MESSAGE"
>           format="text"
>           >
>
>         <field column="text" name="mxMsg" blob="true"/>
>       </entity>
>     </entity>
>
>   </document>
> </dataConfig>
>
> This looks similar to your configuration. When XML data is in the BLOB
> column in the database, indexing works. But when binary data is in the
> BLOB column, indexing is NOT done.
> Please help.
>
> Thanking you,
> -Chandan
>
>
> -----Original Message-----
> From: Raymond Wiker [mailto:rwi...@gmail.com]
> Sent: Monday, February 24, 2014 4:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Can not index raw binary data stored in Database in BLOB format.
>
> I've done something like this; the key was to use a
> FieldStreamDataSource to read from the BLOB field.
>
> Something like
>
> <dataSource name="main" ...>
> <dataSource type="FieldStreamDataSource" name="fieldstream"/>
>
> then
>
> <entity name="tika" processor="TikaEntityProcessor"
>         dataField="main.BLOB" dataSource="fieldstream" format="xml">
>   <field column="Author" meta="true" name="..."/>
>   <field column="title" meta="true" name="title"/>
>   <field column="text" name="content"/>
>   <field column="content_type" name="content_type" meta="true"/>
>   <field column="last_modified" name="last_modified" meta="true"/>
> </entity>
>
> ...
>
>
> On Mon, Feb 24, 2014 at 11:04 AM, Chandan khatua <chand...@nrifintech.com> wrote:
>
> > Hi Gora!
> >
> > Your concern was "What is the type of the column used to store the
> > binary data in Oracle?"
> > The column type is BLOB in DB. The column can also have rich text file.
> >
> > Regards,
> > Chandan
> >
> >
> > -----Original Message-----
> > From: Gora Mohanty [mailto:g...@mimirtech.com]
> > Sent: Monday, February 24, 2014 3:02 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Can not index raw binary data stored in Database in BLOB format.
> >
> > On 24 February 2014 12:51, Chandan khatua <chand...@nrifintech.com> wrote:
> > > Hi,
> > >
> > > We have raw binary data stored in database(not word,excel,xml etc
> > > files) in BLOB.
> > >
> > > We are trying to index using TikaEntityProcessor but nothing seems
> > > to get indexed.
> > >
> > > But the same configuration works when xml/word/excel files are
> > > stored in the BLOB field.
> >
> > Please start by reviewing
> > http://wiki.apache.org/solr/DataImportHandler as the above seems
> > quite confused. Why are you using TikaEntityProcessor if the data in
> > the DB are not richtext files?
> >
> > What is the type of the column used to store the binary data in
> > Oracle? You might be able to convert it with a ClobTransformer.
> > Please see
> > http://wiki.apache.org/solr/DataImportHandler#ClobTransformer
> >
> > http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5
> >
> > Regards,
> > Gora
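
For reference, here is a rough sketch of the data-config.xml from this thread with Raymond's suggested inner entity applied. It is not a tested configuration. It assumes the BLOB column is named MESSAGE (as in Raymond's reply) and the key column is named PK; those names and the connection details are placeholders carried over from the thread. The dateTimeFormat value is an assumption: DateFormatTransformer takes Java SimpleDateFormat patterns, so the Oracle-style "YYYY-MM-DD HH24:MI:SS" from the posted config is swapped for what is presumably the intended equivalent. Also note that the query output in this thread reports a Content-Type of application/octet-stream, which suggests Tika could not recognise the binary format; if so, an empty <body/> may be all it can produce whatever the configuration.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <!-- JDBC source for the outer entity; connection details are placeholders from the thread -->
  <dataSource name="db"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//x.x.x.x:x/d11gr21"
              user="x" password="x"/>
  <!-- Hands a binary column of the parent row to the nested entity as a stream -->
  <dataSource name="dastream" type="FieldStreamDataSource"/>
  <document>
    <entity name="messages" pk="PK"
            transformer="DateFormatTransformer"
            query="select * from table1"
            dataSource="db">
      <field column="PK" name="id"/>
      <!-- Assumption: SimpleDateFormat pattern in place of the Oracle-style one -->
      <field column="last_modified" dateTimeFormat="yyyy-MM-dd HH:mm:ss" locale="en"/>
      <!-- dataField is outerEntityName.COLUMN; MESSAGE is the assumed BLOB column -->
      <entity name="message"
              dataSource="dastream"
              processor="TikaEntityProcessor"
              dataField="messages.MESSAGE"
              format="xml">
        <field column="text" name="mxMsg"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The change that matters relative to the posted config is dataField: it refers to the outer entity's name ("messages"), not the data source's name ("db").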