Re: missing a directory, can not process pdf files

Ahmet Arslan Wed, 19 Sep 2012 11:23:57 -0700

> user:~/solr/example/exampledocs$ java -jar post.jar test.pdf
> doesn't work
> 
> Index binary documents such as Word and PDF with Solr Cell
> (ExtractingRequestHandler).


> How do I do this?
> 
> http://lucene.apache.org/solr/api-4_0_0-BETA/doc-files/tutorial.html
> 
> 
> http://wiki.apache.org/solr/ExtractingRequestHandler
> 
> It says Solr 1.4?
> 
> curl is not normally installed, so how do we do this with
> post.jar instead?
> Also, the docs directory does not exist; it seems very outdated.
> 
> "using "curl" or other command line tools to post documents
> to Solr is nice for testing, but not the recommended update
> method for best performance."
> 
> what then?
> 
> 
> Further down the page:
> 
> java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=doc5 -Dtype=text/html -jar post.jar tutorial.html
> 
> 
> Is this the right way?
> 
> java -Dauto -jar post.jar tutorial.html
> java -Dauto -Drecursive -jar post.jar .
> 
> "NOTE: The post.jar utility is not meant for production use"
> So how do we normally do this, or how should we do this?

I haven't used post.jar to index rich documents; this is a new feature of Solr
4.0. To index rich documents, you can use one of these:

http://wiki.apache.org/solr/ContentStreamUpdateRequestExample
http://wiki.apache.org/solr/TikaEntityProcessor
http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
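The first link uses SolrJ's ContentStreamUpdateRequest. If you would rather not pull in SolrJ or install curl, the same kind of request can be sent with plain JDK classes. The following is only a minimal sketch, assuming a Solr 4.0 example server with the ExtractingRequestHandler at http://localhost:8983/solr/update/extract (the URL from the tutorial command quoted above) and Java 11 or newer. The class name ExtractPost and its methods are hypothetical. It POSTs the raw PDF bytes with Content-Type application/pdf and passes the document's unique key as literal.id, much like the post.jar -Durl/-Dparams invocation does.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: sends one rich document to Solr's ExtractingRequestHandler
// using only JDK classes (no SolrJ, no curl).
class ExtractPost {

    // Assumed handler URL of the Solr 4.0 example server.
    static final String DEFAULT_HANDLER = "http://localhost:8983/solr/update/extract";

    // Appends literal.id (the document's unique key) and, optionally,
    // commit=true to the handler URL. URLEncoder encodes spaces as '+'.
    static String buildExtractUrl(String handlerUrl, String docId, boolean commit) {
        String url = handlerUrl + "?literal.id="
                + URLEncoder.encode(docId, StandardCharsets.UTF_8);
        return commit ? url + "&commit=true" : url;
    }

    // POSTs the raw file bytes as the request body with the given Content-Type
    // and returns the HTTP status code reported by the server.
    static int postFile(String requestUrl, Path file, String contentType) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(requestUrl).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", contentType);
            try (OutputStream out = conn.getOutputStream()) {
                Files.copy(file, out); // stream the file bytes unchanged
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java ExtractPost.java <file.pdf> <doc-id>");
            return;
        }
        String url = buildExtractUrl(DEFAULT_HANDLER, args[1], true);
        int status = postFile(url, Paths.get(args[0]), "application/pdf");
        System.out.println("HTTP " + status + " from " + url);
    }
}
```

For example, `java ExtractPost.java test.pdf doc1` (single-file launch, Java 11+) would send test.pdf to be indexed under id doc1. Committing on every request keeps the sketch simple but is slow; when indexing many files you would normally leave commit off and issue a single commit at the end.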
