Hi,
I've run some more tests on the issue and managed to find out more details.
The timeout occurs when, during a long indexing run, some documents go to one
shard at the beginning and then no data at all goes to that shard for a long
time (more than 120 sec).
The connection to that core, opened at the beginning of indexing, hits the
idle timeout :( .
If no data at all goes to a shard during indexing, no timeout occurs on
that shard.
If indexing finishes in less than 120 sec, no timeout occurs on that shard.
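
To make the condition above concrete, here is a minimal sketch (plain Python,
not Solr code) that takes per-shard send times, in seconds since the start of
an import, and reports the shards whose connection would sit idle longer than
the limit. The function name, the input layout and the sample numbers are
made up for illustration; only the 120 sec limit comes from the error message
in the log.

```python
IDLE_TIMEOUT_SEC = 120  # from "Idle timeout expired: 120000/120000 ms" in the log


def shards_hitting_idle_timeout(send_times, import_end, limit=IDLE_TIMEOUT_SEC):
    """Return {shard: longest idle gap in sec} for shards whose gap exceeds limit.

    send_times: dict mapping shard name -> list of times (sec) at which a
                document was sent to that shard (hypothetical input layout).
    import_end: time (sec) at which the whole import finished.
    """
    result = {}
    for shard, times in send_times.items():
        if not times:
            # Observed behavior: a shard that never receives data never times out.
            continue
        # Gaps between consecutive sends, plus the gap from the last send
        # to the end of the import.
        points = sorted(times) + [import_end]
        longest = max(b - a for a, b in zip(points, points[1:]))
        if longest > limit:
            result[shard] = longest
    return result


if __name__ == "__main__":
    sample = {
        "s_0": [1, 2, 3],                # data only at the start, then silence
        "s_1": list(range(0, 300, 10)),  # steady trickle of documents
        "s_2": [],                       # never receives data
    }
    print(shards_hitting_idle_timeout(sample, import_end=300))  # {'s_0': 297}
```

In this model only s_0 is flagged, which matches what I see in the tests:
shards with a steady flow of data, or with no data at all, stay clean.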
Unfortunately, in our use case there are a lot of long indexing runs (up to
30 minutes) with uneven distribution of documents across shards.
Any suggestions on how to mitigate the issue?
--
BR
Vadim Ivanov


-----Original Message-----
From: Вадим Иванов [mailto:vadim.iva...@spb.ntk-intourist.ru] 
Sent: Wednesday, September 12, 2018 4:29 PM
To: solr-user@lucene.apache.org
Subject: Idle Timeout while DIH indexing and implicit sharding in 7.4

Hello gurus,
I am using SolrCloud with DIH for indexing my data.
Testing 7.4.0 with an implicitly sharded collection, I have noticed that any
indexing run longer than 2 minutes always fails, with many timeout records in
the log coming from all replicas in the collection.

Such as:
x:Mycol_s_0_replica_t40 RequestHandlerBase
java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 120001/120000 ms
null:java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 120000/120000 ms
        at org.eclipse.jetty.server.HttpInput$ErrorState.noContent(HttpInput.java:1075)
        at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:313)
        at org.apache.solr.servlet.ServletInputStreamWrapper.read(ServletInputStreamWrapper.java:74)
...
Caused by: java.util.concurrent.TimeoutException: Idle timeout expired: 120000/120000 ms
        at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
        at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more
        Suppressed: java.lang.Throwable: HttpInput failure
                at org.eclipse.jetty.server.HttpInput.failed(HttpInput.java:821)
                at org.eclipse.jetty.server.HttpConnection$BlockingReadCallback.failed(HttpConnection.java:649)
                at org.eclipse.jetty.io.FillInterest.onFail(FillInterest.java:134)

Resulting indexing status:
  "statusMessages":{
    "Total Requests made to DataSource":"1",
    "Total Rows Fetched":"2828323",
    "Total Documents Processed":"2828323",
    "Total Documents Skipped":"0",
    "Full Dump Started":"2018-09-12 14:28:21",
    "":"Indexing completed. Added/Updated: 2828323 documents. Deleted 0 documents.",
    "Committed":"2018-09-12 14:33:41",
    "Time taken":"0:5:19.507",
    "Full Import failed":"2018-09-12 14:33:41"}}

Nevertheless, all these documents seem to be indexed fine and are searchable.
If the same collection is not sharded, or is sharded as "compositeId",
indexing completes without any errors.
The type of replicas (nrt or tlog) doesn't matter.
Small indexing runs (taking less than 2 minutes) go smoothly.

Testing environment: 1 node, collection with 6 shards, 1 replica for each
shard.
Collection:
/admin/collections?action=CREATE&name=Mycol
        &numShards=6
        &router.name=implicit
        &shards=s_0,s_1,s_2,s_3,s_4,s_5
        &router.field=sf_shard
        &collection.configName=Mycol 
        &maxShardsPerNode=10
        &nrtReplicas=0&tlogReplicas=1
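
For anyone who wants to reproduce the setup, the same request can be assembled
as a single URL. This is only a sketch: the base address
http://localhost:8983/solr and the helper name are assumptions for
illustration, while the parameters are exactly the ones listed above.

```python
from urllib.parse import urlencode

# Assumption: a local single-node Solr on the default port.
SOLR_BASE = "http://localhost:8983/solr"

# Parameters copied from the CREATE call above, in the same order.
CREATE_PARAMS = {
    "action": "CREATE",
    "name": "Mycol",
    "numShards": 6,
    "router.name": "implicit",
    "shards": "s_0,s_1,s_2,s_3,s_4,s_5",
    "router.field": "sf_shard",
    "collection.configName": "Mycol",
    "maxShardsPerNode": 10,
    "nrtReplicas": 0,
    "tlogReplicas": 1,
}


def create_collection_url(base=SOLR_BASE, params=CREATE_PARAMS):
    """Build the Collections API URL; commas in the shard list stay unescaped."""
    return f"{base}/admin/collections?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    print(create_collection_url())
```

The resulting URL can be pasted into a browser or passed to curl.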


I have never noticed such behavior before on my prod configuration (Solr
6.3.0).
It seems like a bug in the new version, but I could not find any JIRA on the
issue.

Any ideas, please...

--
BR
Vadim Ivanov
