On 10/11/2013 9:32 AM, PeteBleackley wrote:
> I tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a 404
> error, apparently caused by post.jar adding /extract to the end of the URL
In order to use post.jar, you would need the /update/extract handler, which is not defined in the tika core under example-DIH. The example-DIH configurations are intended to use and illustrate the dataimport handler - documents are imported using the /dataimport handler and its config file, not sent directly with post.jar.

Here's a page covering what you would need in order to send PDFs directly rather than import them using DIH:

http://wiki.apache.org/solr/ExtractingRequestHandler

Thanks,
Shawn
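
For illustration, here is a rough sketch of what sending a PDF directly to the extract handler could look like from Python instead of post.jar. It assumes the /update/extract handler has already been added to the core's solrconfig.xml as the wiki page describes. The core URL, the document id, and the helper names build_extract_url and post_pdf are all hypothetical, so treat this as a starting point and not a tested recipe for your setup.

```python
import urllib.parse
import urllib.request


def build_extract_url(core_url, doc_id, commit=True):
    """Return the /update/extract URL for one document.

    core_url is an assumed core address such as
    http://localhost:8983/solr/tika; adjust it for your install.
    """
    # literal.id sets the unique key of the extracted document.
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return core_url.rstrip("/") + "/update/extract?" + urllib.parse.urlencode(params)


def post_pdf(core_url, pdf_path, doc_id):
    """Send a PDF's raw bytes to the extract handler and return the HTTP status."""
    with open(pdf_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        build_extract_url(core_url, doc_id),
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    # Raises urllib.error.HTTPError on a 404, e.g. if the handler is not defined.
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling post_pdf("http://localhost:8983/solr/tika", "report.pdf", "doc1") against a core without the handler should fail with the same 404 you saw from post.jar, which makes it a quick way to check whether the handler is in place.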