Re: OCR not working occasionally

Zheng Lin Edwin Yeo Sat, 18 Mar 2017 18:09:07 -0700

Hi Rick,

Thanks for your reply.
I saw this error message for the file that failed.
Can I index such files in the same indexing threads as the other files
that store text as an image?

2017-03-19 01:02:26.610 INFO  (qtp1543727556-19) [c:collection1 s:shard1
r:core_node1 x:collection1_shard1_replica2] o.a.s.u.DirectUpdateHandler2
start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2017-03-19 01:02:26.610 INFO  (qtp1543727556-19) [c:collection1 s:shard1
r:core_node1 x:collection1_shard1_replica2] o.a.s.u.SolrIndexWriter Calling
setCommitData with IW:org.apache.solr.update.SolrIndexWriter@2330f07c
2017-03-19 01:02:26.610 ERROR
(updateExecutor-2-thread-4-processing-n:192.168.99.1:8983_solr
x:collection1_shard1_replica2 s:shard1 c:collection1 r:core_node1)
[c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2]
o.a.s.u.SolrCmdDistributor
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://192.168.99.1:8984/solr/collection1_shard1_replica1:
Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 404 </title>
</head>
<body>
<h2>HTTP ERROR: 404</h2>
<p>Problem accessing /solr/collection1_shard1_replica1/update. Reason:
<pre>    Not Found</pre></p>
<hr />
</body>
</html>

at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:578)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient.request(ConcurrentUpdateSolrClient.java:430)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at
org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:293)
at
org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:282)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

2017-03-19 01:02:26.657 INFO  (qtp1543727556-19) [c:collection1 s:shard1
r:core_node1 x:collection1_shard1_replica2] o.a.s.s.SolrIndexSearcher
Opening [Searcher@77e108d5[collection1_shard1_replica2] main]
2017-03-19 01:02:26.658 INFO  (qtp1543727556-19) [c:collection1 s:shard1
r:core_node1 x:collection1_shard1_replica2] o.a.s.u.DirectUpdateHandler2
end_commit_flush
2017-03-19 01:02:26.658 INFO
 (searcherExecutor-16-thread-1-processing-n:192.168.99.1:8983_solr
x:collection1_shard1_replica2 s:shard1 c:collection1 r:core_node1)
[c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2]
o.a.s.c.QuerySenderListener QuerySenderListener sending requests to
Searcher@77e108d5[collection1_shard1_replica2]
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(6.4.2):C3)))}
2017-03-19 01:02:26.658 INFO
 (searcherExecutor-16-thread-1-processing-n:192.168.99.1:8983_solr
x:collection1_shard1_replica2 s:shard1 c:collection1 r:core_node1)
[c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2]
o.a.s.c.QuerySenderListener QuerySenderListener done.
2017-03-19 01:02:26.659 INFO
 (searcherExecutor-16-thread-1-processing-n:192.168.99.1:8983_solr
x:collection1_shard1_replica2 s:shard1 c:collection1 r:core_node1)
[c:collection1 s:shard1 r:core_node1 x:collection1_shard1_replica2]
o.a.s.c.SolrCore [collection1_shard1_replica2] Registered new searcher
Searcher@77e108d5[collection1_shard1_replica2]
main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(6.4.2):C3)))}
2017-03-19 01:02:26.659 INFO  (qtp1543727556-19) [c:collection1 s:shard1
r:core_node1 x:collection1_shard1_replica2]
o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard1_replica2]
 webapp=/solr path=/update
params={update.distrib=FROMLEADER&update.chain=files-update-processor&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=
http://192.168.99.1:8983/solr/collection1_shard1_replica2/&commit_end_point=true&wt=javabin&version=2&expungeDeletes=false}{commit=}
0 49
2017-03-19 01:02:26.662 WARN  (qtp1543727556-139) [c:collection1 s:shard1
r:core_node1 x:collection1_shard1_replica2]
o.a.s.u.p.DistributedUpdateProcessor Error sending update to
http://192.168.99.1:8984/solr
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://192.168.99.1:8984/solr/collection1_shard1_replica1:
Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 404 </title>
</head>
<body>
<h2>HTTP ERROR: 404</h2>
<p>Problem accessing /solr/collection1_shard1_replica1/update. Reason:
<pre>    Not Found</pre></p>
<hr />
</body>
</html>

at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:578)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient.request(ConcurrentUpdateSolrClient.java:430)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at
org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:293)
at
org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:282)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
2017-03-19 01:02:26.662 INFO  (qtp1543727556-139) [c:collection1 s:shard1
r:core_node1 x:collection1_shard1_replica2]
o.a.s.u.p.LogUpdateProcessorFactory [collection1_shard1_replica2]
 webapp=/solr path=/update params={commit=true}{commit=} 0 66
2017-03-19 01:02:43.019 INFO  (qtp1543727556-21) [c:collection1 s:shard1
r:core_node1 x:collection1_shard1_replica2] o.a.s.c.S.Request
[collection1_shard1_replica2]  webapp=/solr path=/admin/file
params={wt=json&_=1489885363012} status=0 QTime=4
2017-03-19 01:02:45.453 INFO  (qtp1543727556-19) [c:collection1 s:shard1
r:core_node1 x:collection1_shard1_replica2] o.a.s.c.PluginBag Going to
create a new requestHandler with {type = requestHandler,name =
/select,class = solr.SearchHandler,attributes = {enable=true, startup=lazy,
name=/select, class=solr.SearchHandler},args =
{defaults={echoParams=explicit,rows=10,wt=json,indent=true,df=text,fl=id,
content, content_type, content_cat, content_subcat, creation_date, subject,
userid, author, entity, location, geolocation, visibility, accesslevel,
accessgroup, reference, crossreference, resourcename, importance, tag,
popularity, language_s, score}}}
2017-03-19 01:02:45.461 INFO  (qtp1543727556-19) [c:collection1 s:shard1
r:core_node1 x:collection1_shard1_replica2] o.a.s.c.S.Request
[collection1_shard1_replica2]  webapp=/solr path=/select
params={q=*:*&indent=true&wt=json&_=1489885365450} hits=3 status=0 QTime=8

Regards,
Edwin
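
In the log above, the ERROR entry is raised by SolrCmdDistributor while the commit is being forwarded to the replica at http://192.168.99.1:8984/solr/collection1_shard1_replica1. That replica answered with an HTTP 404 HTML page for /update instead of a Solr response. When comparing several such failures, it can help to pull those three details (replica URL, status, path) out of each entry. The following is a minimal sketch under that assumption. The class `ReplicaErrorParser`, its regular expressions and the sample text are hypothetical, fitted only to the message format shown here, and are not part of Solr.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Solr): extracts the remote replica URL,
// HTTP status and request path from a RemoteSolrException message like the
// one logged by SolrCmdDistributor. The regexes are assumptions fitted to
// this log text; \s+ tolerates the line wrapping added by email.
public class ReplicaErrorParser {

    // "Error from server at <url>:" - non-greedy, so the colons inside the URL are kept.
    private static final Pattern SERVER_URL =
            Pattern.compile("Error\\s+from\\s+server\\s+at\\s+(\\S+?):\\s");
    // Status line of the HTML error page, e.g. "HTTP ERROR: 404".
    private static final Pattern HTTP_STATUS =
            Pattern.compile("HTTP\\s+ERROR:\\s*(\\d{3})");
    // Path the remote node could not serve, e.g. "/solr/.../update".
    private static final Pattern PATH =
            Pattern.compile("Problem\\s+accessing\\s+(\\S+?)\\.\\s+Reason");

    public static final class ReplicaError {
        public final String serverUrl; // null if not found
        public final int status;       // -1 if not found
        public final String path;      // null if not found

        ReplicaError(String serverUrl, int status, String path) {
            this.serverUrl = serverUrl;
            this.status = status;
            this.path = path;
        }

        @Override
        public String toString() {
            return "status=" + status + " url=" + serverUrl + " path=" + path;
        }
    }

    // Returns the first capture group of the pattern, or null when it does not match.
    private static String firstGroup(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static ReplicaError parse(String logText) {
        String status = firstGroup(HTTP_STATUS, logText);
        return new ReplicaError(
                firstGroup(SERVER_URL, logText),
                status == null ? -1 : Integer.parseInt(status),
                firstGroup(PATH, logText));
    }

    public static void main(String[] args) {
        // Abbreviated copy of the ERROR entry from the log, wrapped as in the email.
        String sample = "RemoteSolrException: Error\n"
                + "from server at http://192.168.99.1:8984/solr/collection1_shard1_replica1:\n"
                + "Expected mime type application/octet-stream but got text/html.\n"
                + "<h2>HTTP ERROR: 404</h2>\n"
                + "<p>Problem accessing /solr/collection1_shard1_replica1/update. Reason:\n";
        System.out.println(parse(sample));
    }
}
```

Grouping failures by `serverUrl` and `status` makes it easier to see whether they all point at the same replica, independent of which document was being indexed.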

On 19 March 2017 at 06:31, Rick Leir <rl...@leirtech.com> wrote:

> Hi Edwin
> The PDF file format can store text as an image, and then you need OCR to
> get the text. More commonly, however, the text is not stored as an image in
> the PDF, and in that case you should not use OCR to get the text.
>
> Do you get an error message when you have a failure?
> Cheers -- Rick
>
> On March 18, 2017 12:01:17 PM EDT, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com> wrote:
> >Hi,
> >
> >I'm facing an issue where Tesseract OCR occasionally fails to extract
> >the words from a PDF file attached to an EML file, so the text is not
> >indexed into Solr. Most of the time, however, the words are extracted.
> >
> >What could cause OCR to fail to extract the text from the file in the
> >email attachment?
> >
> >I'm using Solr 6.4.2.
> >
> >Regards,
> >Edwin
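
Rick's point above can be sketched as a per-page routing decision: use the PDF's text layer when it has one, and fall back to OCR only when a page yields (almost) no text, which is what happens when the text is stored as an image. This is a minimal sketch, not how Solr or Tika decide internally. The class name `OcrRouting`, the `MIN_CHARS_PER_PAGE` threshold of 20 and the sample page strings are hypothetical, and in practice the per-page text would come from a PDF parser, which is not shown here.

```java
import java.util.List;

// Hypothetical sketch: choose between the PDF text layer and OCR per page.
// MIN_CHARS_PER_PAGE is an assumed threshold, not a Solr or Tika setting.
public class OcrRouting {

    public enum Strategy { TEXT_LAYER, OCR }

    static final int MIN_CHARS_PER_PAGE = 20;

    // pageText is whatever normal (non-OCR) text extraction returned for one page.
    public static Strategy choose(String pageText) {
        if (pageText == null) {
            return Strategy.OCR; // nothing extracted at all
        }
        // Count only non-whitespace characters, so a page of blank lines counts as empty.
        long visible = pageText.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .count();
        return visible >= MIN_CHARS_PER_PAGE ? Strategy.TEXT_LAYER : Strategy.OCR;
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
                "Quarterly report: revenue grew in all regions.", // has a text layer
                "   \n  ",                                        // text stored as an image
                "");                                              // empty page
        for (String page : pages) {
            System.out.println(choose(page));
        }
    }
}
```

Because the decision is made per page, files with a text layer and files that store text as images could in principle go through the same indexing loop, with only the image-only pages sent to OCR.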
