On 3/1/2019 10:25 AM, Paul Buiocchi wrote:
I have a couple of questions about Solr/Wordpress integration:
You would need to talk to the person who wrote the plugin for Wordpress that integrates with Solr. If they indicate that a question can only be answered by the Solr project, then bring that to us.
I am putting together an old newspaper archive site: about 25k PDF files that are full-text searchable.
If you want Solr to index your PDF documents, you would have to use SolrCell, also known as the Extracting Request Handler.
We strongly recommend never using this functionality in production. The reason is that the underlying technology, Apache Tika, can crash when given certain input. PDF documents are more likely than other kinds to cause this problem. If Tika crashes while it is running inside Solr, then Solr will also crash.
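One way to keep a Tika crash from taking Solr down with it is to run the extraction in a separate process and send only the resulting text to Solr. Here is a minimal Python sketch of that idea. It assumes a Tika app jar at the hypothetical path `tika-app.jar`, invoked with its `--text` option; the function name `extract_text` and the timeout value are illustrative, not part of Solr or Tika.

```python
import subprocess

# Assumption: a Tika app jar at this hypothetical path; adjust for your setup.
TIKA_COMMAND = ["java", "-jar", "tika-app.jar", "--text"]


def extract_text(pdf_path, command=TIKA_COMMAND, timeout=120):
    """Run the extractor in its own process; return its text, or None on failure."""
    try:
        result = subprocess.run(
            command + [str(pdf_path)],
            capture_output=True,
            text=True,
            errors="replace",
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # The extractor hung or could not be started: skip this file.
        return None
    if result.returncode != 0:
        # The extractor crashed or rejected the file; only that process died.
        return None
    return result.stdout
```

A file that makes the extractor crash or hang then costs you one skipped document (a `None` result you can log and revisit) and Solr itself is never involved in the parsing.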
Questions on architecture: 1) Is there a way for Solr to index from a local file structure, i.e. local drive:/newpaper_name/date/page#? From the experimenting I have done with Wordpress/Solr integration, I found that I had to upload the documents in Wordpress to get Solr to recognize them.
Yes, you can index just about anything you like if you are willing to create the configuration and the software to do it. But in order for Wordpress to understand that data, it most likely would have to be done through Wordpress.
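As a rough illustration of "the software to do it", the Python sketch below walks a directory tree laid out as newspaper/date/page and sends one document per PDF to Solr's JSON update handler. Everything specific here is an assumption: the `.pdf` extension on page files, the field names (the `_s` suffix relies on a string dynamic field being defined in your schema), and whatever update URL you pass in, for example `http://localhost:8983/solr/newspapers/update` with a hypothetical core named `newspapers`. The extracted page text would go in an additional field.

```python
import json
import urllib.request
from pathlib import Path


def build_docs(root):
    """Yield one Solr document per PDF found at root/<newspaper>/<date>/<page>.pdf."""
    root = Path(root)
    for pdf in sorted(root.glob("*/*/*.pdf")):
        newspaper, date, page = pdf.relative_to(root).with_suffix("").parts
        yield {
            "id": f"{newspaper}/{date}/{page}",
            # Assumed field names; they must match your own schema.
            "newspaper_s": newspaper,
            "date_s": date,
            "page_s": page,
            "path_s": str(pdf),
        }


def post_docs(docs, update_url):
    """Send the documents to Solr's update handler as JSON and commit."""
    body = json.dumps(list(docs)).encode("utf-8")
    request = urllib.request.Request(
        update_url + "?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Indexing this way puts the data in Solr without Wordpress knowing about it, so whether the Wordpress plugin can search or display those documents is still a question for the plugin author.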
Thanks, Shawn