Indeed no CDCR. All NRT replicas.
In the logs of the node with the big tlog, I don't see anything that looks unusual to me. I'm not sure how to see the commits, though. I can see some "Opening [Searcher@...]" logs, but these are related to the softCommit, right?

Both replicas are in "active" state right now. I can't say for sure which state they were in when the tlog started to accumulate, but I would assume they were "active".

The replica with the large tlog is the leader.

When restarting the node with the large tlog, it stays in "down" status for a while (replaying the tlog, afaik). Then it goes "active". At this stage the large tlog file is still there. When a new document arrives, a new tlog file is created (tlog.0000000000000000301).

I don't see how it could be related, but it seems to have started to accumulate when we changed some collection aliases (I said we have 1 collection, but that's not entirely true: we have 2 collections and we switch to one or the other using aliases; only one is considered to be "in use" at a time).

Kind regards,
Gaël

From: Erick Erickson <erickerick...@gmail.com>
Sent: Wednesday, July 22, 2020 22:14
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Subject: Re: tlog keeps growing

I’m assuming you do not have CDCR configured, correct?

This is weird. Every hard commit should close the current tlog, open a new one and delete old ones respecting numRecordsToKeep.

Are these NRT replicas or TLOG replicas? That shouldn’t make a lot of difference, but might be a clue.

Your solr log in the one with 20G tlogs should show commits, is there anything that points up?

It’s also a bit weird that the numbers are so very different. While not lock-step, I’d expect that they were reasonably close. When you restart the server, does Solr roll over the logs for some period or does it just start accumulating the tlog?

Are both replicas in the “active” state? And is the replica with the large tlogs the follower or the leader?

Mainly asking a bunch of questions because I haven’t seen this happen, the answers to the above might give a clue where to look next.

Best,
Erick

> On Jul 22, 2020, at 3:39 PM, Gael Jourdan-Weil <gael.jourdan-w...@kelkoogroup.com> wrote:
>
> Hello,
>
> I'm facing a situation where a transaction log file keeps growing and is never deleted.
>
> The setup is as follows:
> - Solr 8.4.1
> - SolrCloud with 2 nodes
> - 1 collection, 1 shard
>
> On one of the nodes I can see the tlog files having the expected behavior, that is, new tlog files being created and old ones being deleted at a frequency that matches the autocommit settings.
> For instance, there are currently two files, tlog.0000000000000003226 and tlog.0000000000000003227, each of them around 1G in size.
>
> But on the other node, I see two files, tlog.0000000000000000298 and tlog.0000000000000000299, the latter now being 20G and having been created 10 hours ago.
>
> It has already happened a few times; restarting the server seems to make things go right, but it's obviously not a durable solution.
>
> Do you have any idea what could cause this behavior?
>
> solrconfig.xml:
> <updateHandler class="solr.DirectUpdateHandler2">
>   <updateLog>
>     <str name="dir">${solr.ulog.dir:}</str>
>     <int name="numRecordsToKeep">1000</int>
>     <int name="maxNumLogsToKeep">100</int>
>   </updateLog>
>   <autoCommit>
>     <maxTime>900000</maxTime>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>   <autoSoftCommit>
>     <maxTime>180000</maxTime>
>   </autoSoftCommit>
> </updateHandler>
>
> Kind regards,
> Gaël
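
For anyone trying to catch this earlier than "the disk is filling up": the symptom described in this thread is a tlog file that keeps growing instead of being rolled over at each hard commit (every 900000 ms, i.e. 15 minutes, with the autoCommit settings quoted above). Below is a minimal monitoring sketch, not anything Solr ships. It only assumes the `tlog.<digits>` file naming seen in this thread; the function names and the size threshold are made up for illustration, and the directory to scan depends on your installation.

```python
import os
import re
from typing import List, NamedTuple

# File naming as seen in this thread, e.g. tlog.0000000000000000299
TLOG_NAME = re.compile(r"^tlog\.(\d+)$")


class TlogInfo(NamedTuple):
    name: str
    sequence: int    # numeric suffix of the file name
    size_bytes: int


def list_tlogs(tlog_dir: str) -> List[TlogInfo]:
    """Return the tlog files found in tlog_dir, ordered by sequence number."""
    found = []
    with os.scandir(tlog_dir) as entries:
        for entry in entries:
            match = TLOG_NAME.match(entry.name)
            if match and entry.is_file():
                found.append(
                    TlogInfo(entry.name, int(match.group(1)), entry.stat().st_size)
                )
    return sorted(found, key=lambda t: t.sequence)


def oversized_tlogs(tlogs: List[TlogInfo], max_size_bytes: int) -> List[TlogInfo]:
    """Return the tlogs strictly larger than max_size_bytes (hypothetical threshold)."""
    return [t for t in tlogs if t.size_bytes > max_size_bytes]
```

Usage would look something like `oversized_tlogs(list_tlogs(tlog_dir), 2 * 1024**3)`, where `tlog_dir` is the replica's tlog directory and the 2 GB threshold is an arbitrary choice based on the healthy node showing files of around 1G. Running it from cron on each node and alerting on a non-empty result would have flagged the 20G file well before it got that large.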