It is a file. Only the filename is stored in the database.
Brad
On May 31, 2010, at 2:59 AM, Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com> wrote:
BinFileDataSource will only work with files. Try FieldStreamDataSource.
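If the PDF bytes were stored in a database column instead of on disk, a FieldStreamDataSource setup could look roughly like the sketch below. This is an untested illustration: the `pdf_blob` column and the `ds-field` data source name are hypothetical, and the rest is adapted from the configuration quoted further down.

```xml
<dataConfig>
  <dataSource name="ds-db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/document_db" user="user"
              password="password" readOnly="true"/>
  <!-- Reads a binary stream from a field of the parent entity's row -->
  <dataSource name="ds-field" type="FieldStreamDataSource"/>
  <document name="documents">
    <entity name="document" dataSource="ds-db"
            query="select * from documents">
      <!-- dataField is "<parent entity name>.<column>"; pdf_blob is a
           hypothetical BLOB column, not part of the original schema -->
      <entity processor="TikaEntityProcessor"
              dataSource="ds-field" dataField="document.pdf_blob"
              format="text">
        <field column="text" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

When the database holds only a filename, as in this thread, BinFileDataSource with a `url` attribute remains the applicable data source.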
On Mon, May 31, 2010 at 3:30 AM, Brad Greenlee <b...@footle.org> wrote:
Hi. I'm trying to get Solr to index a database in which one column is a filename of a PDF document I'd like to index. My configuration looks like this:
<dataConfig>
  <dataSource name="ds-db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/document_db" user="user"
              password="password" readOnly="true"/>
  <dataSource name="ds-file" type="BinFileDataSource"/>
  <document name="documents">
    <entity name="document" dataSource="ds-db"
            query="select * from documents">
      <entity processor="TikaEntityProcessor"
              url="/some/path/${document.filename}" dataSource="ds-file"
              format="text">
        <field column="text" />
      </entity>
    </entity>
  </document>
</dataConfig>
I'm using Solr from trunk (as of two days ago). The import process completes without errors, and it picks up the columns from the database, but not the content from the PDF file. It is definitely trying to access the PDF file, for if I give it an incorrect path name, it complains. It doesn't seem to be attempting to index the PDF, though, as it completes in about 40ms, whereas if I import the PDF via the ExtractingRequestHandler, it takes about 11 seconds to index it.
I've also tried the tika example in example-DIH and that doesn't seem to index anything, either. Am I doing something wrong, or is this just not working yet?
Cheers,
Brad
--
-----------------------------------------------------
Noble Paul | Systems Architect| AOL | http://aol.com