Hello Mikhail, thank you for your support. I have already tested this case a lot to be sure about what is happening under the hood.

As you proposed, I've shuffled the data coming from SQL to Solr to see how Solr reacts. I have 6 shards, s_0 ... s_5, and "shard" is the routing field in my collection (router.name=implicit&router.field=shard).
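For reference, the timeout I mention below is the idle timeout on the Jetty HTTP connector. A minimal sketch of the fragment I tweak in /opt/solr-7.4.0/server/etc/jetty-http.xml - paraphrased from memory and assuming the stock Solr 7.4 layout with its solr.jetty.http.idleTimeout property, so the exact lines may differ in your install:

    <New id="httpConnector" class="org.eclipse.jetty.server.ServerConnector">
      ...
      <!-- Idle timeout of the HTTP connector, in milliseconds;
           the 120000 default matches the "120000/120000 ms" in the stack traces quoted below -->
      <Set name="idleTimeout"><Property name="solr.jetty.http.idleTimeout" default="120000"/></Set>
      ...
    </New>

Raising that default (e.g. to 1800000 for 30 minutes) is what "tweaking IdleTimeout" means in the rest of this mail.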
My SQL query looks like this:

Select id, ...,
  Case when 100 > RowNumber then 's_5'
       else 's_' + cast(RowNumber % 4 as varchar)
  end as shard
from ...

Only the first 99 rows go to shard s_5, and all the rest of the data spreads evenly between s_0 ... s_3.

After 120 sec of indexing I receive an IdleTimeout from the shard leader of s_5.
s_4 receives no data and seems not to open a connection at all, so no timeout occurs there.
s_0 ... s_3 receive data and no timeout occurs either.

When I tweak IdleTimeout in /opt/solr-7.4.0/server/etc/jetty-http.xml it helps, but I have concerns about increasing it from 120 sec to 30 min. Is it safe? What could the consequences be?

I have noticed that IdleTimeout in Jetty 9.3.8 (coming with Solr 6.3.0) was 50 sec, and no such behavior was observed in Solr 6.3. So the default was increased significantly in 9.4.10 for some reason. Maybe someone could shed some light on the reasons.

What was changed in document routing behavior, and why? Maybe there was a discussion about it that I could not find?

--
BR
Vadim

-----Original Message-----
From: Mikhail Khludnev [mailto:m...@apache.org]
Sent: Friday, September 14, 2018 12:10 PM
To: solr-user
Subject: Re: Idle Timeout while DIH indexing and implicit sharding in 7.4

Hello, Vadim.
My guess (and only a guess) is that a bunch of updates coming into a shard causes a heavy merge that, in its turn, blocks new updates. This can be verified with logs or a thread dump from the problematic node. The probable measures are: try to shuffle updates to load other shards for a while and let a parallel merge pack that shard. And just wait a little by increasing the timeout in Jetty. Let us know what you encounter.

On Thu, Sep 13, 2018 at 3:54 PM Vadim Ivanov <vadim.iva...@spb.ntk-intourist.ru> wrote:

> Hi,
> I've put some more tests on the issue and managed to find out more details.
> The timeout occurs when, during a long indexing run, some documents at the beginning
> go to one shard and then, for a long time (more than 120 sec), no data at all
> goes to that shard.
> The connection to that core, opened at the beginning of indexing, goes to idle
> timeout :( .
> If no data at all goes to the shard during indexing - no timeout occurs on that shard.
> If indexing finishes earlier than 120 sec - no timeout occurs on that shard.
> Unfortunately, in our use case there are a lot of long indexing runs, up to 30
> minutes, with uneven shard distribution of documents.
> Any suggestion how to mitigate the issue?
> --
> BR
> Vadim Ivanov
>
>
> -----Original Message-----
> From: Вадим Иванов [mailto:vadim.iva...@spb.ntk-intourist.ru]
> Sent: Wednesday, September 12, 2018 4:29 PM
> To: solr-user@lucene.apache.org
> Subject: Idle Timeout while DIH indexing and implicit sharding in 7.4
>
> Hello gurus,
> I am using SolrCloud with DIH for indexing my data.
> Testing 7.4.0 with an implicitly sharded collection, I have noticed that any
> indexing longer than 2 minutes always fails with many timeout records in the log,
> coming from all replicas in the collection.
>
> Such as:
> x:Mycol_s_0_replica_t40 RequestHandlerBase
> java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 120001/120000 ms
> null:java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 120000/120000 ms
>         at org.eclipse.jetty.server.HttpInput$ErrorState.noContent(HttpInput.java:1075)
>         at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:313)
>         at org.apache.solr.servlet.ServletInputStreamWrapper.read(ServletInputStreamWrapper.java:74)
>         ...
> Caused by: java.util.concurrent.TimeoutException: Idle timeout expired: 120000/120000 ms
>         at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
>         at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         ... 1 more
>         Suppressed: java.lang.Throwable: HttpInput failure
>                 at org.eclipse.jetty.server.HttpInput.failed(HttpInput.java:821)
>                 at org.eclipse.jetty.server.HttpConnection$BlockingReadCallback.failed(HttpConnection.java:649)
>                 at org.eclipse.jetty.io.FillInterest.onFail(FillInterest.java:134)
>
> Resulting indexing status:
> "statusMessages":{
>   "Total Requests made to DataSource":"1",
>   "Total Rows Fetched":"2828323",
>   "Total Documents Processed":"2828323",
>   "Total Documents Skipped":"0",
>   "Full Dump Started":"2018-09-12 14:28:21",
>   "":"Indexing completed. Added/Updated: 2828323 documents. Deleted 0 documents.",
>   "Committed":"2018-09-12 14:33:41",
>   "Time taken":"0:5:19.507",
>   "Full Import failed":"2018-09-12 14:33:41"}}
>
> Nevertheless, all these documents seem to be indexed fine and are searchable.
> If the same collection is not sharded, or is sharded with compositeId, indexing
> completes without any errors.
> The type of replicas - nrt or tlog - doesn't matter.
> Small indexing runs (taking less than 2 minutes) go smoothly.
>
> Testing environment - 1 node, collection with 6 shards, 1 replica for each shard.
> Collection:
> /admin/collections?action=CREATE&name=Mycol
> &numShards=6
> &router.name=implicit
> &shards=s_0,s_1,s_2,s_3,s_4,s_5
> &router.field=sf_shard
> &collection.configName=Mycol
> &maxShardsPerNode=10
> &nrtReplicas=0&tlogReplicas=1
>
> I have never noticed such behavior before on my prod configuration (Solr 6.3.0).
> Seems like a bug in the new version, but I could not find any JIRA on the issue.
>
> Any ideas, please...
>
> --
> BR
> Vadim Ivanov
>
>

--
Sincerely yours
Mikhail Khludnev