Writing a Java (SolrJ) program that traverses a filesystem and extracts the contents of PDF files is actually quite simple; see https://lucidworks.com/2012/02/14/indexing-with-solrj/ (you can ignore the RDBMS stuff). That code is a little out of date, so it may need some very minor tweaks.
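
For what it's worth, here is a rough sketch of just the traversal half, using only the JDK. The names (PdfTreeWalker, findPdfs, describe, PageRef) are made up for illustration and are not from the linked article, and the newspaper/date/page directory layout is assumed from the question below. The Tika extraction and the SolrJ add/commit are left as a comment placeholder, so treat this as a starting point rather than a finished indexer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PdfTreeWalker {

    /** Hypothetical holder for the fields derived from one file's path. */
    static final class PageRef {
        final String id;        // path relative to the archive root
        final String newspaper; // first directory level
        final String date;      // second directory level
        final String page;      // file name without its extension

        PageRef(String id, String newspaper, String date, String page) {
            this.id = id;
            this.newspaper = newspaper;
            this.date = date;
            this.page = page;
        }
    }

    /** Returns every regular file under root whose name ends in .pdf (any case), sorted. */
    static List<Path> findPdfs(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Derives fields from an assumed root/newspaper/date/page.pdf layout. */
    static PageRef describe(Path root, Path pdf) {
        Path rel = root.relativize(pdf);
        if (rel.getNameCount() != 3) {
            throw new IllegalArgumentException("Unexpected layout: " + rel);
        }
        String fileName = rel.getName(2).toString();
        int dot = fileName.lastIndexOf('.');
        String page = dot > 0 ? fileName.substring(0, dot) : fileName;
        return new PageRef(rel.toString(), rel.getName(0).toString(),
                rel.getName(1).toString(), page);
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : findPdfs(root)) {
            try {
                PageRef ref = describe(root, pdf);
                // Placeholder: extract the text with Tika and send a document
                // carrying these fields to Solr with SolrJ at this point.
                System.out.println(ref.id + " -> " + ref.newspaper
                        + " / " + ref.date + " / " + ref.page);
            } catch (IllegalArgumentException e) {
                // Skip files that do not sit exactly three levels below root.
                System.err.println("Skipping " + pdf + ": " + e.getMessage());
            }
        }
    }
}
```

Run it with the archive root as the only argument. Keeping the path-to-fields step separate from the walk means the directory convention can change without touching the traversal.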

Tika (the library Solr uses to parse PDFs and most other files) may have something that makes the job even easier; I'd ask on their users' list.

Putting WordPress in the middle of it all seems unnecessarily complicated.

Best,
Erick

> On Mar 1, 2019, at 11:18 AM, Paul Buiocchi <pfb6...@yahoo.com.INVALID> wrote:
>
> Thank you Shawn!
>
> Sent from Yahoo Mail on Android
>
> On Fri, Mar 1, 2019 at 12:25 PM, Paul Buiocchi <pfb6...@yahoo.com.INVALID> wrote:
>
> Greetings,
>
> I have a couple of questions about Solr/WordPress integration.
>
> First, I am not committed to using WordPress as a front end. If there is a
> better front-end option, I would be willing to convert. For functionality,
> all I am looking for is full-text search with the search terms highlighted
> in the search results. It should be pretty simple; maybe I am overanalyzing
> it. I am looking for as much "out of the box" as possible.
>
> My scenario is this:
>
> I am putting together an old newspaper archive site: about 25k PDF files
> that are full-text searchable.
>
> Questions on architecture:
> 1) Is there a way for Solr to index from a local file structure, i.e.
> local drive:/newspaper_name/date/page#? From the experimenting I have done
> with WordPress/Solr integration, I found that I had to upload the documents
> in WordPress to get Solr to recognize them.
>
> I'm sure I will have more questions. Any help/suggestions would be greatly
> appreciated. Thank you.