Using Solr Cell to index the internal structure of a PDF

2013-10-10 Thread Peter Bleackley
I'm trying to index a set of PDF documents with Solr 4.5.0. So far I can get Solr to ingest the entire document as one long string, stored in the index as "content". However, I want to index structure within the documents. I know that the ExtractingRequestHandler uses Apache Tika to convert the
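
For context on the Solr Cell question: the ExtractingRequestHandler accepts capture and fmap.<element> parameters. Together they copy a chosen XHTML element type from Tika's output (for example p or div) into a separate field, which is one way to keep some document structure. The sketch below is a minimal illustration under stated assumptions. It assumes a local Solr 4.x example core at http://localhost:8983/solr/collection1 with the handler mounted at /update/extract. The helper names, the document id and the field name paragraphs_txt are hypothetical. Only post_pdf contacts a server.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed base URL of a local Solr 4.x example core (hypothetical layout).
SOLR_CORE = "http://localhost:8983/solr/collection1"


def build_extract_url(doc_id, base=SOLR_CORE, element="p", field="paragraphs_txt"):
    """Build a Solr Cell (/update/extract) URL that captures one XHTML
    element type from Tika's output into its own field.
    Helper and field names are hypothetical, for illustration only."""
    params = [
        ("literal.id", doc_id),      # unique key for the indexed document
        ("capture", element),        # capture this XHTML element separately
        ("fmap." + element, field),  # map the captured element to this field
        ("commit", "true"),          # commit after indexing
    ]
    return base + "/update/extract?" + urlencode(params)


def post_pdf(path, doc_id):
    """Send a PDF as the raw request body (requires a running Solr)."""
    with open(path, "rb") as f:
        data = f.read()
    req = Request(build_extract_url(doc_id), data=data,
                  headers={"Content-Type": "application/pdf"})
    with urlopen(req) as resp:
        return resp.read()
```

Whether paragraph-level capture is useful depends on how Tika's PDF parser segments a particular file, so the captured field is worth inspecting before relying on it.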

Problems using DataImportHandler and TikaEntityProcessor

2013-10-11 Thread Peter Bleackley
Starting Solr with the command line "java -Dsolr.solr.home=example-DIH/solr -jar start.jar" and then trying to import some data with "java -Durl=http://localhost:8983/solr/tika/update -Dtype=application/pdf -jar post.jar *.pdf" fails with the error "SimplePostTool: WARNING: Solr returned an error"
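
A hedged note on the DataImportHandler question: in Solr 4.x the plain /update handler does not extract text from PDFs itself. That work is done by the ExtractingRequestHandler or, with DataImportHandler, by TikaEntityProcessor. TikaEntityProcessor reads the files named in the core's data-config file, and the import is triggered with a DIH command rather than by posting documents. The sketch below assumes the example-DIH tika core registers its handler at /dataimport (check that core's solrconfig.xml). The base URL and helper names are assumptions, and only run_import contacts a server.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed URL of the tika core started from example-DIH (hypothetical layout).
TIKA_CORE = "http://localhost:8983/solr/tika"


def dih_command_url(command="full-import", base=TIKA_CORE, **extra):
    """Build a DataImportHandler command URL, assuming the handler is
    registered at /dataimport in the core's solrconfig.xml."""
    params = {"command": command}
    params.update(extra)  # optional extra DIH parameters, e.g. commit="true"
    return base + "/dataimport?" + urlencode(params)


def run_import(base=TIKA_CORE):
    """Trigger a full import and return the raw response (needs a running Solr)."""
    with urlopen(dih_command_url(base=base, commit="true")) as resp:
        return resp.read()
```

Calling dih_command_url(command="status") builds the corresponding status URL, which can be polled to see whether the import finished or reported errors.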