: user:~/solr/example/exampledocs$ java -jar post.jar test.pdf doesn't work

1) You can use post.jar to send PDFs, but you have to set the content type 
option to tell Solr you are sending a PDF file, because by default it 
assumes you are posting XML.  You can see the problem by looking at the 
output from post.jar and the Solr logs...

hossman@frisbee:~/tmp/solr-4.0-BETA/bin-zip/apache-solr-4.0.0-BETA/example/exampledocs$ java -jar post.jar /tmp/test.pdf
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update using content-type application/xml..
...

And in the Solr logs...

...
SEVERE: org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)
        at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:159)
...

...if you specify the type, things should work fine on the client side.

As for the Server side...

2) By default Solr's "/update" handler supports Solr Documents in XML, 
JSON, CSV, and JavaBin.  If you want to use the "ExtractingRequestHandler" 
to parse rich documents, you just have to change the URL exactly as noted 
in the wiki you mentioned ("-Durl=http://localhost:8983/solr/update/extract").
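
Putting both points together, a rough sketch of the full command might 
look like the following.  It assumes the content type is set through 
SimplePostTool's -Dtype system property and that application/pdf is the 
right MIME type for your file; the URL is the one from the wiki, and 
/tmp/test.pdf is the file from the example above.  The shell variable 
names are just for illustration.

```shell
# Assumed values; adjust them to your own setup.
SOLR_URL="http://localhost:8983/solr/update/extract"  # extract handler URL from the wiki
CONTENT_TYPE="application/pdf"                        # assumption: MIME type for a PDF
FILE="/tmp/test.pdf"                                  # file used in the example above

# Build the command as a string and print it (a dry run, nothing is posted).
CMD="java -Durl=$SOLR_URL -Dtype=$CONTENT_TYPE -jar post.jar $FILE"
echo "$CMD"
```

Run the printed command from the exampledocs directory (where post.jar 
lives) once it looks right.  If it works, the post.jar output should 
report the content type you passed instead of application/xml.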


-Hoss
