You can have a look here:
http://solr.pl/en/2011/04/04/indexing-files-like-doc-pdf-solr-and-tika-integration/
2013/10/10 Peter Bleackley
I'm trying to index a set of PDF documents with Solr 4.5.0. So far I can
get Solr to ingest the entire document as one long string, stored in the
index as "content". However, I want to index structure within the documents.
I know that the ExtractingRequestHandler uses Apache Tika to convert the