It is a file. Only the filename is stored in the database.

Brad


On May 31, 2010, at 2:59 AM, Noble Paul നോബിള്‍ नो ब्ळ् <noble.p...@corp.aol.com> wrote:

BinFileDataSource will only work with a file. Try FieldStreamDataSource.
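
For reference, a minimal sketch of a FieldStreamDataSource setup, which applies only when the PDF bytes themselves are stored in a database column rather than on disk. The data source name ds-field and the column pdf_blob are hypothetical, and the sketch assumes the dataField attribute behaves as described in the DataImportHandler documentation (it names the parent entity's column to stream from):

```xml
<dataConfig>
  <dataSource name="ds-db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/document_db"
              user="user" password="password" readOnly="true"/>
  <!-- Streams bytes from a column of the parent entity's current row -->
  <dataSource name="ds-field" type="FieldStreamDataSource"/>
  <document name="documents">
    <entity name="document" dataSource="ds-db" query="select * from documents">
      <!-- Assumption: pdf_blob is a hypothetical BLOB column holding the PDF bytes -->
      <entity processor="TikaEntityProcessor" dataSource="ds-field"
              dataField="document.pdf_blob" format="text">
        <field column="text" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the column holds only a path to a file on disk, BinFileDataSource with a url attribute remains the matching choice.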

On Mon, May 31, 2010 at 3:30 AM, Brad Greenlee <b...@footle.org> wrote:

Hi. I'm trying to get Solr to index a database in which one column holds the filename of a PDF document I'd like to index. My configuration looks like this:

<dataConfig>
  <dataSource name="ds-db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/document_db"
              user="user" password="password" readOnly="true"/>
  <dataSource name="ds-file" type="BinFileDataSource"/>
  <document name="documents">
    <entity name="document" dataSource="ds-db" query="select * from documents">
      <entity processor="TikaEntityProcessor"
              url="/some/path/${document.filename}"
              dataSource="ds-file" format="text">
        <field column="text" />
      </entity>
    </entity>
  </document>
</dataConfig>

I'm using Solr from trunk (as of two days ago). The import process completes without errors and picks up the columns from the database, but not the content of the PDF file. It is definitely trying to access the PDF file, because it complains if I give it an incorrect path. It doesn't seem to be attempting to index the PDF, though: the import completes in about 40 ms, whereas importing the PDF via the ExtractingRequestHandler takes about 11 seconds.

I've also tried the Tika example in example-DIH, and that doesn't seem to index anything either. Am I doing something wrong, or is this just not working yet?

Cheers,

Brad




--
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com
