How big is your transaction log? If you don't do a hard commit (openSearcher = true or false doesn't matter), the tlog can grow without bound, and upon restart the tlog gets replayed. I've seen tlogs in the 10s of GB range, which can take a long time to replay. In the meantime, new updates are written to, you guessed it, the tlog.
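For reference, hard commits are configured in the updateHandler section of solrconfig.xml; a minimal sketch follows. The interval values here are illustrative examples, not a recommendation for any particular workload:

```xml
<!-- solrconfig.xml sketch: the maxTime values below are example numbers -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- Hard commit: flushes to stable storage and rolls over the tlog,
       so it stays small. openSearcher=false keeps the commit cheap by
       not opening a new searcher. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: makes documents visible to searches without the
       cost of a hard commit. -->
  <autoSoftCommit>
    <maxTime>120000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With a setup like this, the tlog only has to hold updates since the last hard commit (here at most ~60 seconds' worth), so replay on restart stays short.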
So check the tlog size. If it's big, be sure you have indexing turned off and be very patient (as in hours in some cases). To avoid this, make sure to do a hard commit while indexing. Here's a long blog on the topic:
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

If this is irrelevant, then I'm not quite sure what's going on.

Best,
Erick

On Tue, Dec 9, 2014 at 12:48 AM, Norgorn <lsunnyd...@mail.ru> wrote:
> I'm using SOLR 4.10.1 in cloud mode with 3 instances, 5 shards per instance
> without replication.
> I restarted one SOLR and now all shards from that instance are down, but
> there are no errors in logs.
> All I see is:
>
> 09.12.2014, 11:13:40 WARN UpdateLog Starting log replay
> tlog{file=/opt/data4/data/tlog/tlog.0000000000000000297 refcount=2}
> active=false starting pos=0
> 09.12.2014, 11:13:40 WARN UpdateLog Starting log replay
> tlog{file=/opt/data5/data/tlog/tlog.0000000000000000297 refcount=2}
> active=false starting pos=0
> 09.12.2014, 11:13:40 WARN UpdateLog Starting log replay
> tlog{file=/opt/data/data/tlog/tlog.0000000000000000298 refcount=2}
> active=false starting pos=0
> 09.12.2014, 11:13:40 WARN UpdateLog Starting log replay
> tlog{file=/opt/data3/data/tlog/tlog.0000000000000000298 refcount=2}
> active=false starting pos=0
> 09.12.2014, 11:13:40 WARN UpdateLog Starting log replay
> tlog{file=/opt/data2/data/tlog/tlog.0000000000000000299 refcount=2}
> active=false starting pos=0
>
> SOLR with down shards tries to open a new searcher, and I see something like
> this in the output:
>
> INFO org.apache.solr.servlet.SolrDispatchFilter – [admin] webapp=null
> path=/admin/info/system params={_=1418106009371&wt=json} status=0 QTime=314
> 4020344 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore –
> [collection_shard5_replica1] Registered new searcher
> Searcher@4dcb568c[vk_hbase_shard5_replica1]
> main{StandardDirectoryReader(segments_85:4012:nrt _4l(4.10.1):C8880876
> _88(4.10.1):C8730658 _im(4.10.1):C8773208 _pa(4.10.1):C8435426
> _cy(4.10.1):C9802246 _fc(4.10.1):C9046837 _sc(4.10.1):C7806921
> _m7(4.10.1):C9362895 _zy(4.10.1):C8808455 _w0(4.10.1):C8384542
> _ui(4.10.1):C164859 _1dd(4.10.1):C7764232 _13a(4.10.1):C8240288
> _16n(4.10.1):C8839542 _19w(4.10.1):C1071719 _172(4.10.1):C200551
> _1av(4.10.1):C9141784 _1if(4.10.1):C997348 _1eh(4.10.1):C174190
> _1hb(4.10.1):C9050675 _1dl(4.10.1):C64 _1j9(4.10.1):C119759
> _1fw(4.10.1):C795323 _1gn(4.10.1):C4922 _1ht(4.10.1):C984261
> _1hh(4.10.1):C966986 _1iz(4.10.1):C953605 _1ip(4.10.1):C994842
> _1i6(4.10.1):C75701 _1i7(4.10.1):C4011 _1id(4.10.1):C17581
> _1iy(4.10.1):C75483 _1j1(4.10.1):C102710 _1jj(4.10.1):C1030895
> _1j5(4.10.1):C90936 _1jc(4.10.1):C79955 _1jd(4.10.1):C6312
> _1jh(4.10.1):C96957 _1ji(4.10.1):C71555 _1jk(4.10.1):C3270
> _1jl(4.10.1):C107854 _1jm(4.10.1):C107286 _1jn(4.10.1):C94250
> _1jo(4.10.1):C98851 _1jp(4.10.1):C88492)}
>
> But all shards remain in the down state.
>
> I tried to stop SOLR-1 (the one with problems), delete its shards with the
> DELETESHARD command, and then start it again - that didn't help.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-shards-stay-down-forever-tp4173284.html
> Sent from the Solr - User mailing list archive at Nabble.com.