If the only thing you're doing is indexing file content, then you can bypass the Data Import Handler altogether and use the ExtractingRequestHandler (aka Solr Cell). You can feed in a file from a URL using the stream.url capability, much like the stream.file example here: <http://wiki.apache.org/solr/ExtractingRequestHandler#Configuration>

Something like:

    http://localhost:8983/solr/update/extract?stream.url=http://myweb/filename.pdf&literal.id=filename.pdf

But to fix what you're doing below, it looks like you should be using BinURLDataSource rather than BinFileDataSource. Other than that, it looks fine.

	Erik

On Sep 9, 2011, at 06:58 , scorpking wrote:

> Hi everyone,
> Now i have had a problem with tika and solr. I successed in index data from
> various file formats (pdf, doc...) with a file absolute path. but now I have
> a link from internet (ex: http://myweb/filename.pdf). I want to index from
> this link, But it's not ok. I don't why? This is my file dataconfig.xml:
>
> *<dataConfig>
>   <dataSource type="BinFileDataSource" name="bin"/>
>   <document>
>
>     <entity name="tika-test" processor="TikaEntityProcessor" url="
> http://myweb/filename.pdf" format="text" dataSource="bin" >
>
>       <field column="Author" name="author" meta="true"/>
>       <field column="title" name="title" meta="true"/>
>       <field column="text" name="text"/>
>
>     </entity>
>   </document>
> </dataConfig>*
>
> when i change url=" http://myweb/filename.pdf" by a file absolute path, it
> work very good.
> Any one know this?
> Thanks for your help.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3322555.html
> Sent from the Solr - User mailing list archive at Nabble.com.
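
For reference, here is a rough sketch of the quoted dataconfig.xml with the one change suggested above applied, i.e. BinURLDataSource swapped in for BinFileDataSource. It is untested. The url value is also written without the leading whitespace that appears in the quoted config, on the assumption that the whitespace is a line-wrapping artifact of the mail rather than part of the real file.

```xml
<dataConfig>
  <!-- BinURLDataSource reads the binary stream from a URL;
       BinFileDataSource expects a path on the local filesystem. -->
  <dataSource type="BinURLDataSource" name="bin"/>
  <document>
    <!-- Same entity as in the quoted config; only the data source type differs. -->
    <entity name="tika-test" processor="TikaEntityProcessor"
            url="http://myweb/filename.pdf" format="text" dataSource="bin">
      <field column="Author" name="author" meta="true"/>
      <field column="title" name="title" meta="true"/>
      <field column="text" name="text"/>
    </entity>
  </document>
</dataConfig>
```

The entity still refers to the data source by name (dataSource="bin"), so nothing else in the config should need to change.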