Hello
Mikhail, thank you for your support.
I have already tested this case quite a lot to understand what is happening
under the hood.
As you proposed, I've shuffled the data coming from SQL to Solr to see how
Solr reacts.
I have 6 shards: s_0 ... s_5.
"shard" is the routing field in my collection
(router.name=implicit&router.field=shard).
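Just to restate the mechanics: with the implicit router each document lands on
the shard whose name equals the value of the routing field. For example, a
single update like this (hypothetical host/port and id; collection name as in
my first mail) should go straight to shard s_5:

    # send one document; the "shard" field value picks the target shard
    curl -X POST 'http://localhost:8983/solr/Mycol/update?commit=true' \
         -H 'Content-Type: application/json' \
         -d '[{"id": "1", "shard": "s_5"}]'
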
My SQL query looks like this:

Select
        id
        ....
        , case when RowNumber < 100 then 's_5'
               else 's_' + cast(RowNumber % 4 as varchar)
          end as shard
from ...

Only the first 99 rows go to shard s_5, and all the remaining rows spread
evenly across s_0 ... s_3.
After 120 sec of indexing I receive an IdleTimeout from the shard leader of s_5.
s_4 receives no data and seems not to open a connection at all, so no timeout
occurs there.
s_0 ... s_3 receive data and no timeout occurs either.

When I tweak the IdleTimeout in /opt/solr-7.4.0/server/etc/jetty-http.xml it
helps, but I have concerns about increasing it from 120 sec to 30 min.
Is it safe? What could the consequences be?
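For what it's worth, the value I am editing is the connector idleTimeout in
jetty-http.xml; if I read the config correctly it is exposed as a system
property, so a sketch of raising it to 30 min without patching the file
(assuming the property really is called solr.jetty.http.idleTimeout) would be:

    # solr.in.sh: override the Jetty connector idle timeout (in ms) at startup,
    # assuming jetty-http.xml reads it via the solr.jetty.http.idleTimeout property
    SOLR_OPTS="$SOLR_OPTS -Dsolr.jetty.http.idleTimeout=1800000"

But that only changes where the value is set; the question whether 30 min is
safe remains.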

I have noticed that the IdleTimeout in Jetty 9.3.8 (shipped with Solr 6.3.0)
was 50 sec, and no such behavior was observed in Solr 6.3, so the default was
increased significantly in 9.4.10 for some reason.
Maybe someone could shed some light on the reasons: what was changed in
document routing behavior and why?
Maybe there was a discussion about it that I could not find?

-- 
BR Vadim

-----Original Message-----
From: Mikhail Khludnev [mailto:m...@apache.org] 
Sent: Friday, September 14, 2018 12:10 PM
To: solr-user
Subject: Re: Idle Timeout while DIH indexing and implicit sharding in 7.4

Hello, Vadim.
My guess (and it is only a guess) is that a bunch of updates coming into a
shard causes a heavy merge that blocks new updates in its turn. This can be
verified with logs or a thread dump from the problematic node. The probable
measures are: try to shuffle updates to load other shards for a while and let
a parallel merge pack that shard, or just wait a little by increasing the
timeout in Jetty.
Let us know what you encounter.

On Thu, Sep 13, 2018 at 3:54 PM Vadim Ivanov <
vadim.iva...@spb.ntk-intourist.ru> wrote:

> Hi,
> I've run some more tests on the issue and managed to find out more details.
> The timeout occurs when, during a long indexing run, some documents go to one
> shard at the beginning and then for a long time (more than 120 sec) no data
> at all goes to that shard.
> The connection to that core, opened at the beginning of indexing, hits the
> idle timeout :( .
> If no data at all goes to the shard during indexing, no timeout occurs on
> that shard.
> If indexing finishes in less than 120 sec, no timeout occurs on that shard.
> Unfortunately, in our use case there are a lot of long indexing runs, up to
> 30 minutes, with uneven distribution of documents across shards.
> Any suggestion how to mitigate the issue?
> --
> BR
> Vadim Ivanov
>
>
> -----Original Message-----
> From: Вадим Иванов [mailto:vadim.iva...@spb.ntk-intourist.ru]
> Sent: Wednesday, September 12, 2018 4:29 PM
> To: solr-user@lucene.apache.org
> Subject: Idle Timeout while DIH indexing and implicit sharding in 7.4
>
> Hello gurus,
> I am using SolrCloud with DIH for indexing my data.
> Testing 7.4.0 with an implicitly sharded collection, I have noticed that any
> indexing run longer than 2 minutes always fails with many timeout records in
> the log, coming from all replicas in the collection.
>
> Such as:
> x:Mycol_s_0_replica_t40 RequestHandlerBase
> java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 120001/120000 ms
> null:java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 120000/120000 ms
>         at org.eclipse.jetty.server.HttpInput$ErrorState.noContent(HttpInput.java:1075)
>         at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:313)
>         at org.apache.solr.servlet.ServletInputStreamWrapper.read(ServletInputStreamWrapper.java:74)
> ...
> Caused by: java.util.concurrent.TimeoutException: Idle timeout expired: 120000/120000 ms
>         at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
>         at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         ... 1 more
>         Suppressed: java.lang.Throwable: HttpInput failure
>                 at org.eclipse.jetty.server.HttpInput.failed(HttpInput.java:821)
>                 at org.eclipse.jetty.server.HttpConnection$BlockingReadCallback.failed(HttpConnection.java:649)
>                 at org.eclipse.jetty.io.FillInterest.onFail(FillInterest.java:134)
>
> Resulting indexing status:
>   "statusMessages":{
>     "Total Requests made to DataSource":"1",
>     "Total Rows Fetched":"2828323",
>     "Total Documents Processed":"2828323",
>     "Total Documents Skipped":"0",
>     "Full Dump Started":"2018-09-12 14:28:21",
>     "":"Indexing completed. Added/Updated: 2828323 documents. Deleted 0
> documents.",
>     "Committed":"2018-09-12 14:33:41",
>     "Time taken":"0:5:19.507",
>     "Full Import failed":"2018-09-12 14:33:41"}}
>
> Nevertheless, all these documents seem to be indexed fine and are searchable.
> If the same collection is not sharded, or is sharded with compositeId routing,
> indexing completes without any errors.
> The type of replicas (nrt or tlog) doesn't matter.
> Small indexing runs (taking less than 2 minutes) complete smoothly.
>
> Testing environment - 1 node, Collection with 6 shards, 1 replica for each
> shard
> Collection:
> /admin/collections?action=CREATE&name=Mycol
>         &numShards=6
>         &router.name=implicit
>         &shards=s_0,s_1,s_2,s_3,s_4,s_5
>         &router.field=sf_shard
>         &collection.configName=Mycol
>         &maxShardsPerNode=10
>         &nrtReplicas=0&tlogReplicas=1
>
>
> I have never noticed such behavior before on my prod configuration (Solr
> 6.3.0).
> It seems like a bug in the new version, but I could not find any JIRA issue
> about it.
>
> Any ideas, please...
>
> --
> BR
> Vadim Ivanov
>
>

-- 
Sincerely yours
Mikhail Khludnev
