Re: indexing data from rich documents - Tika with solr3.1

Erik Hatcher Fri, 09 Sep 2011 05:32:59 -0700

If the only thing you're doing is indexing file content, then you can bypass 
using the Data Import Handler altogether and use the ExtractingRequestHandler 
(aka Solr Cell).  And you can feed in a file from a URL using the stream.url 
capability, like the stream.file example here: 
<http://wiki.apache.org/solr/ExtractingRequestHandler#Configuration>


Something like:
http://localhost:8983/solr/update/extract?stream.url=http://myweb/filename.pdf&literal.id=filename.pdf

But to fix what you're doing below, looks like you should be using 
BinURLDataSource rather than BinFileDataSource - other than that, it looks fine.
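
For reference, the configuration quoted below with only the data source type swapped might look like this. It is a minimal sketch: the entity, the field mappings, and the http://myweb/filename.pdf URL are taken unchanged from the original message, and it assumes the Tika/DIH extras are already on the classpath, as they must be for the file-based version to work.

```xml
<dataConfig>
  <!-- BinURLDataSource reads the binary stream over HTTP;
       BinFileDataSource only resolves paths on the local filesystem -->
  <dataSource type="BinURLDataSource" name="bin"/>
  <document>
    <!-- url is now fetched by the data source named "bin" -->
    <entity name="tika-test" processor="TikaEntityProcessor"
            url="http://myweb/filename.pdf" format="text" dataSource="bin">
      <field column="Author" name="author" meta="true"/>
      <field column="title" name="title" meta="true"/>
      <field column="text" name="text"/>
    </entity>
  </document>
</dataConfig>
```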

        Erik

On Sep 9, 2011, at 06:58 , scorpking wrote:

> Hi everyone,
> I have a problem with Tika and Solr. I have succeeded in indexing data from
> various file formats (pdf, doc...) using an absolute file path, but now I have
> a link from the internet (ex: http://myweb/filename.pdf). I want to index from
> this link, but it does not work and I don't know why. This is my dataconfig.xml:
> 
> <dataConfig>
>    <dataSource type="BinFileDataSource" name="bin"/>
>    <document>
>        <entity name="tika-test" processor="TikaEntityProcessor"
>                url="http://myweb/filename.pdf" format="text" dataSource="bin">
>            <field column="Author" name="author" meta="true"/>
>            <field column="title" name="title" meta="true"/>
>            <field column="text" name="text"/>
>        </entity>
>    </document>
> </dataConfig>
> 
> When I replace url="http://myweb/filename.pdf" with an absolute file path, it
> works very well.
> Does anyone know what is wrong?
> Thanks for your help.
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/indexing-data-from-rich-documents-Tika-with-solr3-1-tp3322555p3322555.html
> Sent from the Solr - User mailing list archive at Nabble.com.
