Hi Rick,

Thanks for your reply. I saw the error message below for the file that failed. Am I able to index such files together with the files that store text as an image, in the same indexing threads?

2017-03-19 01:02:26.610 INFO (qtp1543727556-19) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2017-03-19 01:02:26.610 INFO (qtp1543727556-19) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.u.SolrIndexWriter Calling setCommitData with IW:org.apache.solr.update.SolrIndexWriter@2330f07c
2017-03-19 01:02:26.610 ERROR (updateExecutor-2-thread-4-processing-n:192.168.99.1:8983_solr x:collection1_shard1_replica2 s:shard1 c:collection1 r:core_node1) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.u.SolrCmdDistributor org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://192.168.99.1:8984/solr/collection1_shard1_replica1: Expected mime type application/octet-stream but got text/html.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 404 </title>
</head>
<body>
<h2>HTTP ERROR: 404</h2>
<p>Problem accessing /solr/collection1_shard1_replica1/update.
Reason: <pre> Not Found</pre></p>
<hr />
</body>
</html>
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:578)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient.request(ConcurrentUpdateSolrClient.java:430)
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
    at org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:293)
    at org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:282)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
2017-03-19 01:02:26.657 INFO (qtp1543727556-19) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.s.SolrIndexSearcher Opening [Searcher@77e108d5[collection1_shard1_replica2] main]
2017-03-19 01:02:26.658 INFO (qtp1543727556-19) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.u.DirectUpdateHandler2 end_commit_flush
2017-03-19 01:02:26.658 INFO (searcherExecutor-16-thread-1-processing-n:192.168.99.1:8983_solr x:collection1_shard1_replica2 s:shard1 c:collection1 r:core_node1) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.c.QuerySenderListener QuerySenderListener sending requests to Searcher@77e108d5[collection1_shard1_replica2]
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(6.4.2):C3)))}
2017-03-19 01:02:26.658 INFO (searcherExecutor-16-thread-1-processing-n:192.168.99.1:8983_solr x:collection1_shard1_replica2 s:shard1 c:collection1 r:core_node1) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.c.QuerySenderListener QuerySenderListener done.
2017-03-19 01:02:26.659 INFO (searcherExecutor-16-thread-1-processing-n:192.168.99.1:8983_solr x:collection1_shard1_replica2 s:shard1 c:collection1 r:core_node1) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.c.SolrCore [collection1_shard1_replica2] Registered new searcher Searcher@77e108d5[collection1_shard1_replica2] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(6.4.2):C3)))}
2017-03-19 01:02:26.659 INFO (qtp1543727556-19) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard1_replica2] webapp=/solr path=/update params={update.distrib=FROMLEADER&update.chain=files-update-processor&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://192.168.99.1:8983/solr/collection1_shard1_replica2/&commit_end_point=true&wt=javabin&version=2&expungeDeletes=false}{commit=} 0 49
2017-03-19 01:02:26.662 WARN (qtp1543727556-139) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.u.p.DistributedUpdateProcessor Error sending update to http://192.168.99.1:8984/solr
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://192.168.99.1:8984/solr/collection1_shard1_replica1: Expected mime type application/octet-stream but got text/html.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 404 </title>
</head>
<body>
<h2>HTTP ERROR: 404</h2>
<p>Problem accessing /solr/collection1_shard1_replica1/update.
Reason: <pre> Not Found</pre></p>
<hr />
</body>
</html>
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:578)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient.request(ConcurrentUpdateSolrClient.java:430)
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
    at org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:293)
    at org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:282)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
2017-03-19 01:02:26.662 INFO (qtp1543727556-139) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard1_replica2] webapp=/solr path=/update params={commit=true}{commit=} 0 66
2017-03-19 01:02:43.019 INFO (qtp1543727556-21) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.c.S.Request [collection1_shard1_replica2] webapp=/solr path=/admin/file params={wt=json&_=1489885363012} status=0 QTime=4
2017-03-19 01:02:45.453 INFO (qtp1543727556-19) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.c.PluginBag Going to create a new requestHandler with {type = requestHandler,name = /select,class = solr.SearchHandler,attributes
= {enable=true, startup=lazy, name=/select, class=solr.SearchHandler},args = {defaults={echoParams=explicit,rows=10,wt=json,indent=true,df=text,fl=id, content, content_type, content_cat, content_subcat, creation_date, subject, userid, author, entity, location, geolocation, visibility, accesslevel, accessgroup, reference, crossreference, resourcename, importance, tag, popularity, language_s, score}}}
2017-03-19 01:02:45.461 INFO (qtp1543727556-19) [c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2] o.a.s.c.S.Request [collection1_shard1_replica2] webapp=/solr path=/select params={q=*:*&indent=true&wt=json&_=1489885365450} hits=3 status=0 QTime=8

Regards,
Edwin

On 19 March 2017 at 06:31, Rick Leir <rl...@leirtech.com> wrote:

> Hi Edwin
> The pdf file format can store text as an image, and then you need OCR to
> get the text. However, text is more commonly not stored as an image in the
> pdf, and then you should not use OCR to get the text.
>
> Do you get an error message when you have a failure?
> Cheers -- Rick
>
> On March 18, 2017 12:01:17 PM EDT, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com> wrote:
> >Hi,
> >
> >I'm facing the issue of that the Tesseract OCR is not able to extract the
> >words in a PDF file in an attachment in EMLfile and index it into Solr
> >occasionally? However, most of the time it can be extracted.
> >
> >What could be the reason that causes the file in the email attachment to be
> >failed to extracted using OCR?
> >
> >I'm using Solr 6.4.2.
> >
> >Regards,
> >Edwin
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
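
For anyone reading a log like the one at the top of this message, here is a minimal sketch that pulls out only the WARN/ERROR entries that report an error from another Solr node, together with the remote URL and HTTP status. It assumes the timestamp and level layout seen in that excerpt; the function name `remote_failures` and the regular expressions are illustrative and not part of Solr.

```python
import re

# Start of a log entry: "YYYY-MM-DD HH:MM:SS.mmm LEVEL "
# (layout assumed from the log excerpt in this message).
ENTRY_START = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} (INFO|WARN|ERROR) "
)
REMOTE_URL = re.compile(r"Error from server at (\S+): ")
HTTP_STATUS = re.compile(r"HTTP ERROR: (\d{3})")


def remote_failures(log_text):
    """Return (level, remote_url, http_status) for each WARN/ERROR entry
    that reports an error from another Solr node (hypothetical helper)."""
    starts = list(ENTRY_START.finditer(log_text))
    failures = []
    for i, match in enumerate(starts):
        level = match.group(1)
        if level == "INFO":
            continue
        # An entry runs until the next timestamped entry, so stack traces
        # and the embedded HTML error page stay attached to it.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(log_text)
        entry = log_text[match.start():end]
        url = REMOTE_URL.search(entry)
        if url is None:
            continue
        status = HTTP_STATUS.search(entry)
        failures.append((level, url.group(1), status.group(1) if status else None))
    return failures
```

Run against the log above, this should list the ERROR and WARN entries, both pointing at collection1_shard1_replica1 on port 8984 with status 404.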