> Ok, I'm trying to integrate the TikaEntityProcessor as suggested. I'm using
> Solr Version: 1.4.0 and getting the following error:
>
> java.lang.ClassNotFoundException: Unable to load BinURLDataSource or
> org.apache.solr.handler.dataimport.BinURLDataSource

It seems that the DIH-Tika integration, including BinURLDataSource, is not part of the Solr 1.4.0/1.4.1 releases, which is why the class can't be loaded. You should use a trunk / nightly build instead. See https://issues.apache.org/jira/browse/SOLR-1583
> My data-config.xml looks like this:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource"
>     driver="oracle.jdbc.driver.OracleDriver"
>     url="jdbc:oracle:thin:@whatever:12345:whatever"
>     user="me"
>     name="ds-db"
>     password="secret"/>
>
>   <dataSource type="BinURLDataSource"
>     name="ds-url"/>
>
>   <document>
>     <entity name="my_database"
>       dataSource="ds-db"
>       query="select * from my_database where rownum <=2">
>       <field column="CONTENT_ID" name="content_id"/>
>       <field column="CMS_TITLE" name="cms_title"/>
>       <field column="FORM_TITLE" name="form_title"/>
>       <field column="FILE_SIZE" name="file_size"/>
>       <field column="KEYWORDS" name="keywords"/>
>       <field column="DESCRIPTION" name="description"/>
>       <field column="CONTENT_URL" name="content_url"/>
>     </entity>
>
>     <entity name="my_database_url"
>       dataSource="ds-url"
>       query="select CONTENT_URL from my_database where content_id='${my_database.CONTENT_ID}'">
>       <entity processor="TikaEntityProcessor"
>         dataSource="ds-url"
>         format="text">
>         url="http://www.mysite.com/${my_database.content_url}"
>         <field column="text"/>
>       </entity>
>     </entity>
>
>   </document>
> </dataConfig>
>
> I added the entity name="my_database_url" section to an existing (working)
> database entity to be able to have Tika index the content pointed to by the
> content_url.
>
> Is there anything obviously wrong with what I've tried so far?

A couple of things. The ">" right after format="text" closes the Tika entity's start tag, so url="..." ends up as plain text inside the element instead of being an attribute. Also, my_database_url sends a SQL query to ds-url, but that is a BinURLDataSource, which fetches the content of a URL rather than running SQL.

I think you should move the Tika entity into the my_database entity and simplify the whole configuration. As a child entity it runs once for each database row and can take the URL from that row via ${my_database.content_url}, so the extra query isn't needed. Note also that "<" has to be escaped as "&lt;" inside an XML attribute value:

<entity name="my_database"
        dataSource="ds-db"
        query="select * from my_database where rownum &lt;=2">
  ...
  <field column="CONTENT_URL" name="content_url"/>
  <entity processor="TikaEntityProcessor"
          dataSource="ds-url"
          format="text"
          url="http://www.mysite.com/${my_database.content_url}">
    <field column="text"/>
  </entity>
</entity>
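For completeness, here is a sketch of what the whole data-config.xml might look like with the Tika entity nested under my_database. It only reuses the data sources, column names and URL prefix from your message, writes the rownum condition as "&lt;=" because "<" is not allowed inside an XML attribute value, and assumes a trunk / nightly build in which TikaEntityProcessor and BinURLDataSource are available. I haven't run it against your schema, so treat it as a starting point:

```xml
<dataConfig>
  <!-- JDBC source for the metadata rows (settings copied from the original message) -->
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@whatever:12345:whatever"
              user="me"
              name="ds-db"
              password="secret"/>

  <!-- Binary URL source that Tika reads each document from
       (not in Solr 1.4.0/1.4.1, trunk/nightly builds only) -->
  <dataSource type="BinURLDataSource" name="ds-url"/>

  <document>
    <!-- Root entity: one Solr document per database row -->
    <entity name="my_database"
            dataSource="ds-db"
            query="select * from my_database where rownum &lt;=2">
      <field column="CONTENT_ID" name="content_id"/>
      <field column="CMS_TITLE" name="cms_title"/>
      <field column="FORM_TITLE" name="form_title"/>
      <field column="FILE_SIZE" name="file_size"/>
      <field column="KEYWORDS" name="keywords"/>
      <field column="DESCRIPTION" name="description"/>
      <field column="CONTENT_URL" name="content_url"/>

      <!-- Child entity: runs once per parent row, so the url attribute
           can reference that row through ${my_database...} -->
      <entity processor="TikaEntityProcessor"
              dataSource="ds-url"
              format="text"
              url="http://www.mysite.com/${my_database.content_url}">
        <!-- Extracted plain text goes into the "text" field -->
        <field column="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

One thing worth checking if the fetched URL looks wrong: Oracle reports unquoted column names in upper case, so the parent row may expose the value as ${my_database.CONTENT_URL} rather than ${my_database.content_url}.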