BinFileDataSource only works with files. Try FieldStreamDataSource.

On Mon, May 31, 2010 at 3:30 AM, Brad Greenlee <b...@footle.org> wrote:
> Hi. I'm trying to get Solr to index a database in which one column is a
> filename of a PDF document I'd like to index. My configuration looks like
> this:
>
> <dataConfig>
>   <dataSource name="ds-db" driver="com.mysql.jdbc.Driver"
>     url="jdbc:mysql://localhost/document_db" user="user" password="password"
>     readOnly="true"/>
>   <dataSource name="ds-file" type="BinFileDataSource"/>
>   <document name="documents">
>     <entity name="document" dataSource="ds-db" query="select * from documents">
>       <entity processor="TikaEntityProcessor"
>         url="/some/path/${document.filename}" dataSource="ds-file" format="text">
>         <field column="text" />
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> I'm using Solr from trunk (as of two days ago). The import process
> completes without errors, and it picks up the columns from the database, but
> not the content from the PDF file. It is definitely trying to access the PDF
> file, for if I give it an incorrect path name, it complains. It doesn't seem
> to be attempting to index the PDF, though, as it completes in about 40ms,
> whereas if I import the PDF via the ExtractingRequestHandler, it takes about
> 11 seconds to index it.
>
> I've also tried the tika example in example-DIH and that doesn't seem to
> index anything, either. Am I doing something wrong, or is this just not
> working yet?
>
> Cheers,
>
> Brad

-- 
-----------------------------------------------------
Noble Paul | Systems Architect | AOL | http://aol.com
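For reference, here is a minimal sketch of how the FieldStreamDataSource suggestion might be wired into Brad's config. It assumes the PDF bytes themselves are stored in a BLOB column of the documents table rather than only a filename; the column name "content" and the data source name "ds-field" are hypothetical. It also assumes FieldStreamDataSource is driven by the dataField attribute in the same way as FieldReaderDataSource. Check the DIH documentation for your build before relying on it.

```xml
<dataConfig>
  <dataSource name="ds-db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/document_db" user="user" password="password"
              readOnly="true"/>
  <!-- Streams binary data out of a column of the parent entity's current row
       (hypothetical data source name "ds-field") -->
  <dataSource name="ds-field" type="FieldStreamDataSource"/>
  <document name="documents">
    <entity name="document" dataSource="ds-db" query="select * from documents">
      <!-- dataField points at the parent column holding the PDF bytes;
           "document.content" assumes a hypothetical BLOB column named "content" -->
      <entity processor="TikaEntityProcessor" dataSource="ds-field"
              dataField="document.content" format="text">
        <field column="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the table only holds paths, as in the config quoted above, this layout does not apply as written. The file-based BinFileDataSource setup is the one to keep debugging in that case.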