RE: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Phil Scadden
mmit(solr, "prindex"); return true; -----Original Message- From: Erick Erickson Sent: Wednesday, 31 October 2018 06:00 To: solr-user Subject: Re: Indexing PDF file in Apache SOLR via Apache TIKA All of the above work, but for robust production situations you'll wa

Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread ☼ R Nair
I have done a production implementation of this, and it has been running for the last four months without any issue, with just a weekly restart of all components. http://blog.cloudera.com/blog/2015/10/how-to-index-scanned-pdfs-at-scale-using-fewer-than-50-lines-of-code/ Best, Ravion On Tue, Oct 30, 2018, 1:00 PM Er

Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Erick Erickson
All of the above work, but for robust production situations you'll want to consider a SolrJ client, see: https://lucidworks.com/2012/02/14/indexing-with-solrj/. That blog combines indexing from a DB and using Tika, but those are independent. Best, Erick On Tue, Oct 30, 2018 at 12:21 AM Kamuela Lau

Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Kamuela Lau
Hi there, Here are a couple of ways I'm aware of: 1. Extract-handler / post tool You can use the curl command with the extract handler or bin/post to upload a single document. Reference: https://lucene.apache.org/solr/guide/7_5/uploading-data-with-solr-cell-using-apache-tika.html 2. DataImportHa
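
For the extract-handler option mentioned above, here is a minimal sketch of what the curl call amounts to, written in Python with only the standard library. It assumes a Solr instance at http://localhost:8983/solr and a core named "mycore" with the extract handler enabled at /update/extract; the core name, document id, and helper function names are hypothetical and not taken from the thread.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_request(solr_base, core, doc_id, pdf_bytes, commit=True):
    """Build a POST request for Solr's /update/extract handler (Solr Cell).

    solr_base, core and doc_id are placeholders chosen for illustration.
    """
    # literal.id sets the document's unique key; commit=true makes it searchable at once.
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    url = f"{solr_base.rstrip('/')}/{core}/update/extract?{urlencode(params)}"
    # The raw PDF bytes are sent as the request body; Tika parses them on the Solr side.
    return Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def index_pdf(solr_base, core, doc_id, path):
    """Read a PDF from disk, send it to Solr, and return the HTTP status code."""
    with open(path, "rb") as fh:
        req = build_extract_request(solr_base, core, doc_id, fh.read())
    with urlopen(req) as resp:
        return resp.status
```

Calling index_pdf("http://localhost:8983/solr", "mycore", "doc1", "sample.pdf") would post one file. This keeps parsing inside Solr, which is convenient for a handful of documents; for robust production use, extracting on the client side (as suggested elsewhere in the thread) avoids loading Solr with Tika's work.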