Only by 10? You must have quite small documents. OCR is an extremely expensive process; indexing is trivial by comparison. For the quite large documents I am working with, OCR can be 100 times slower than indexing a PDF that is already searchable (text extractable without OCR).
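One way to act on that difference is to run OCR only on documents that lack a usable text layer. The following is a minimal sketch, not anything Tika or Solr provides: the function name `needs_ocr` and the threshold `min_chars_per_page` are hypothetical, and it assumes the text and page count have already been obtained from a plain (non-OCR) extraction pass. It will not catch PDFs that mix real text with scanned inline images.

```python
def needs_ocr(extracted_text: str, page_count: int, min_chars_per_page: int = 50) -> bool:
    """Guess whether a PDF needs OCR, based on how much text a non-OCR pass found.

    Hypothetical heuristic: the threshold is an assumption and should be tuned
    against your own documents.
    """
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    # Count only non-whitespace characters so blank-padded output
    # is not mistaken for a real text layer.
    visible_chars = sum(1 for ch in extracted_text if not ch.isspace())
    return visible_chars / page_count < min_chars_per_page


if __name__ == "__main__":
    # A scanned PDF typically yields little or no text without OCR.
    print(needs_ocr("", page_count=12))
    # A searchable PDF yields plenty of text, so the expensive OCR step can be skipped.
    print(needs_ocr("searchable text " * 500, page_count=12))
```

Documents for which this returns False can go straight to indexing, so the OCR cost is paid only where it is needed.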

-----Original Message-----
From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
Sent: Tuesday, 28 March 2017 4:13 p.m.
To: solr-user@lucene.apache.org
Subject: Indexing speed reduced significantly with OCR

Hi,

Is the indexing speed of Solr reduced significantly when we use Tesseract OCR to extract scanned inline images from PDFs? I found that after I implemented the solution to extract those scanned images from PDFs, indexing is now more than 10 times slower.

I'm using Solr 6.4.2 and Tika App 1.1.4.

Regards,
Edwin