> Ok, I'm trying to integrate the TikaEntityProcessor as suggested.  I'm using
> Solr Version: 1.4.0 and getting the following error:
>
> java.lang.ClassNotFoundException: Unable to load BinURLDataSource or
> org.apache.solr.handler.dataimport.BinURLDataSource
It seems that the DIH-Tika integration is not part of the Solr 1.4.0/1.4.1
releases, so BinURLDataSource is not available there. You should use
trunk or a nightly build. See:
https://issues.apache.org/jira/browse/SOLR-1583

> My data-config.xml looks like this:
>
> <dataConfig>
>  <dataSource type="JdbcDataSource"
>    driver="oracle.jdbc.driver.OracleDriver"
>    url="jdbc:oracle:thin:@whatever:12345:whatever"
>    user="me"
>    name="ds-db"
>    password="secret"/>
>
>  <dataSource type="BinURLDataSource"
>    name="ds-url"/>
>
>  <document>
>    <entity name="my_database"
>     dataSource="ds-db"
>     query="select * from my_database where rownum &lt;=2">
>      <field column="CONTENT_ID"                name="content_id"/>
>      <field column="CMS_TITLE"                 name="cms_title"/>
>      <field column="FORM_TITLE"                name="form_title"/>
>      <field column="FILE_SIZE"                 name="file_size"/>
>      <field column="KEYWORDS"                  name="keywords"/>
>      <field column="DESCRIPTION"               name="description"/>
>      <field column="CONTENT_URL"               name="content_url"/>
>    </entity>
>
>    <entity name="my_database_url"
>     dataSource="ds-url"
>     query="select CONTENT_URL from my_database where
> content_id='${my_database.CONTENT_ID}'">
>     <entity processor="TikaEntityProcessor"
>      dataSource="ds-url"
>      format="text">
>      url="http://www.mysite.com/${my_database.content_url}";
>      <field column="text"/>
>     </entity>
>    </entity>
>
>  </document>
> </dataConfig>
>
> I added the entity name="my_database_url" section to an existing (working)
> database entity to be able to have Tika index the content pointed to by the
> content_url.
>
> Is there anything obviously wrong with what I've tried so far?

I think you should move the Tika entity inside the my_database entity and
simplify the whole configuration:

<entity name="my_database" dataSource="ds-db"
        query="select * from my_database where rownum &lt;=2">
    ...
    <field column="CONTENT_URL"               name="content_url"/>

    <entity processor="TikaEntityProcessor" dataSource="ds-url"
            format="text"
            url="http://www.mysite.com/${my_database.content_url}">
        <field column="text"/>
    </entity>
</entity>
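
For reference, a complete data-config.xml along these lines might look
like the sketch below. It assumes a trunk / nightly build that includes
BinURLDataSource and TikaEntityProcessor. The name="tika" attribute on
the nested entity is my own addition for illustration, and I have not
run this against your schema. In particular, the case of the column in
${my_database.content_url} may need to match what the Oracle driver
returns.

<dataConfig>
  <!-- JDBC source for the metadata rows (values copied from your config) -->
  <dataSource type="JdbcDataSource"
              name="ds-db"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@whatever:12345:whatever"
              user="me"
              password="secret"/>

  <!-- Binary source that Tika reads the linked content from -->
  <dataSource type="BinURLDataSource"
              name="ds-url"/>

  <document>
    <entity name="my_database"
            dataSource="ds-db"
            query="select * from my_database where rownum &lt;=2">
      <field column="CONTENT_ID"  name="content_id"/>
      <!-- remaining fields as in your original config -->
      <field column="CONTENT_URL" name="content_url"/>

      <!-- Nested entity: runs once per parent row, so no second query is needed.
           name="tika" is a hypothetical name chosen for this sketch. -->
      <entity name="tika"
              processor="TikaEntityProcessor"
              dataSource="ds-url"
              format="text"
              url="http://www.mysite.com/${my_database.content_url}">
        <field column="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Nesting the Tika entity this way lets it pick up the URL directly from
the parent row, which is why the separate my_database_url entity and
its query can be dropped.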
