See the section "Uploading Data with Solr Cell using Apache Tika" in the Solr Reference Guide to get started:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

The basic idea is to use Apache Tika to parse the PDF files and then send the extracted data to Solr. You'll need to do a lot of tweaking, particularly mapping the metadata fields to Solr fields, but the above should get you started. Once you have that working, you can refine your approach.

I'm personally not a fan of doing all this on the Solr server in a _production_ environment unless it's a one-time operation. Here's a writeup of why I think that, along with a model Java program that lets you do the parsing on a Java client instead:
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

It uses some older Solr classes (e.g. CloudSolrServer is now CloudSolrClient), but it should give you a starting place if you want to do something similar. It has both a database part and a Tika part, but the database part can simply be taken out; nothing about parsing the files with Tika requires it.

Best,
Erick

On Fri, Nov 18, 2016 at 10:14 AM, vascaino90 <jonas...@gmail.com> wrote:
> Hello, i'm new in Solr and i have a big problem.
> I have many text documents in PDF format (more than 10000) and I need to
> create a site with this PDFs. In this site, I have to create a search by any
> terms in this PDFs.
> I don't have idea how to start.
> Anyone can help me?
>
> Thank you so much.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Index-and-search-on-PDF-text-using-Solr-tp4306486.html
> Sent from the Solr - User mailing list archive at Nabble.com.
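
For reference, the Solr Cell route mentioned at the top of the reply (post each PDF to Solr and let the server run Tika) could be sketched roughly as below. This is only a starting-point sketch and not the program from the linked writeup: extraction happens on the Solr server here, which is exactly what the reply cautions against for production. It assumes Java 11+, a core whose config enables the /update/extract handler, a hypothetical schema field named "text" for the body, and a hypothetical "attr_" prefix for unmapped metadata. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SolrCellUpload {

    // Builds the Solr Cell request URI for one document.
    // literal.id sets the unique key; fmap.content maps Tika's extracted body
    // to the (assumed) "text" field; uprefix prefixes metadata fields that are
    // not in the schema. Commits are deferred until all files are sent.
    static URI buildExtractUri(String coreUrl, String docId) {
        String query = "literal.id=" + URLEncoder.encode(docId, StandardCharsets.UTF_8)
                + "&fmap.content=text"
                + "&uprefix=attr_"
                + "&commit=false";
        return URI.create(coreUrl + "/update/extract?" + query);
    }

    // Posts one PDF as the raw request body and returns the HTTP status code.
    static int upload(HttpClient client, String coreUrl, Path pdf)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(buildExtractUri(coreUrl, pdf.toString()))
                .header("Content-Type", "application/pdf")
                .POST(HttpRequest.BodyPublishers.ofFile(pdf))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        String coreUrl = args[0];      // e.g. http://localhost:8983/solr/pdfs (hypothetical core)
        Path directory = Path.of(args[1]);
        HttpClient client = HttpClient.newHttpClient();

        // Collect every regular file ending in .pdf under the directory.
        List<Path> pdfs;
        try (Stream<Path> files = Files.walk(directory)) {
            pdfs = files.filter(Files::isRegularFile)
                    .filter(p -> p.toString().toLowerCase().endsWith(".pdf"))
                    .collect(Collectors.toList());
        }

        for (Path pdf : pdfs) {
            int status = upload(client, coreUrl, pdf);
            if (status != 200) {
                System.err.println("Failed (" + status + "): " + pdf);
            }
        }

        // One commit at the end makes the new documents searchable.
        HttpRequest commit = HttpRequest.newBuilder(URI.create(coreUrl + "/update?commit=true"))
                .GET()
                .build();
        client.send(commit, HttpResponse.BodyHandlers.discarding());
    }
}
```

It would be run with the core URL and the PDF directory as its two arguments. Using the file path as the document id is just a convenient choice for a first pass; once this works, the field mappings are the part that will likely need the tweaking described in the reply.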