tlogs are not deleted
Hi,

We are using Solr 7.7 Cloud with CDCR (every collection has 3 replicas, 1 shard). In solrconfig.xml, the tlog configuration is super simple, like:

There is also a daily data import, and a commit is issued after every import. Indexing works fine, but the problem is that the number of tlogs keeps growing. According to the documentation (https://lucene.apache.org/solr/guide/6_6/updatehandlers-in-solrconfig.html), I expected at most 10 tlogs to be kept (the default value of maxNumLogsToKeep is 10). However, I still have a pile of tlogs - the oldest one is from Sep 6..! I did an experiment by running a data import with the commit option from the Solr Admin UI, but none of the tlogs were deleted.

tlog.002.1643995079881261056  tlog.018.1645444642733293568  tlog.034.1646803619099443200
tlog.003.1644085718240198656  tlog.019.1645535304072822784  tlog.035.1646894195509559296
tlog.004.1644176284537847808  tlog.020.1645625651261079552  tlog.036.1646984623121498112
tlog.005.1644357373324689408  tlog.021.1645625651316654083  tlog.037.1647076244416626688
tlog.006.167899616018432      tlog.022.1645716477747134464  tlog.038.1647165801017376768
tlog.007.1644538486210953216  tlog.023.1645806853961023488  tlog.039.1647165801042542594
tlog.008.1644629084296183808  tlog.024.1645897663703416832  tlog.040.1647256590865137664
tlog.009.1644719895268556800  tlog.025.1645988248838733824  tlog.041.1647347172490870784
tlog.010.1644810493331767296  tlog.026.1646078905702940672  tlog.042.1647437758859313152
tlog.011.1644901113324896256  tlog.027.1646169478772293632  tlog.043.1647528345005457408
tlog.012.1645031030684385280  tlog.028.1646259838395613184  tlog.044.1647618793025830912
tlog.013.164503103008545      tlog.029.1646350429145006080  tlog.045.1647709579019026432
tlog.014.1645082080252526592  tlog.030.1646441456502571008  tlog.046.1647890587519549440
tlog.015.1645172929206419456  tlog.031.1646531802044563456  tlog.047.1647981403286011904
tlog.016.1645263488829882368  tlog.032.16466061568          tlog.048.1648071989042085888
tlog.017.1645353861842468864  tlog.033.1646712822719053824  tlog.049.1648135546466205696

Did I miss something in the solrconfig file?

-- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
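[Editorial note: for reference, tlog retention is controlled by the updateLog section of solrconfig.xml. The sketch below shows what such a section might look like; the element and attribute names come from the Solr reference guide, but the values are illustrative, not the poster's actual config. Note that with CDCR the updateLog class is solr.CdcrUpdateLog, and CDCR retains tlogs until the target cluster has acknowledged them, so the retention limits below are not enforced while replication is stalled.]

```xml
<!-- Illustrative updateLog config with explicit retention limits.
     With CDCR, tlogs are kept until the target acknowledges them,
     so numRecordsToKeep/maxNumLogsToKeep alone will not cap growth
     while CDCR is broken or lagging. -->
<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">100</int>
  <int name="maxNumLogsToKeep">10</int>
</updateLog>
```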
Re: Three questions about huge tlog problem and CDCR
Found a typo; correcting: "updateLogSynchronizer" is set to 6 (1 min), not 1 hour.
Re: Three questions about huge tlog problem and CDCR
Thank you for the advice. By the way, when I upload a new collection configuration to ZooKeeper, enable bidirectional CDCR for the collections on both the prod and DR sides (/cdcr?action=START), and reload the collections, CDCR usually does not work. If I restart all the nodes in the cluster on both prod and DR, CDCR starts working. Should I normally restart Solr after enabling/disabling CDCR? Is reloading the collections without a Solr restart not enough to apply the CDCR change?
Re: Three questions about huge tlog problem and CDCR
Sure. I disabled the buffer and started CDCR by calling the API on both sides. When I index, the size of the tlog folder stays within 1 MB while the size of the index folder keeps growing, so I assumed the tlog was being consumed by the target node and cleared, and that the data was being forwarded to the target. But when I actually checked the target node, its index was still empty; the data had been loaded only on the source node.
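[Editorial note: for reference, this is the call order the Solr 7.x CDCR API expects. The host names and collection name below are placeholders, not values from this thread; DISABLEBUFFER, START, QUEUES and ERRORS are documented CDCR API actions, and QUEUES/ERRORS can be polled afterwards to check whether the source is actually forwarding updates.]

```shell
# Placeholders: source-host, target-host, mycollection.
# Disable the update-log buffer on both clusters first.
curl "https://target-host:8983/solr/mycollection/cdcr?action=DISABLEBUFFER"
curl "https://source-host:8983/solr/mycollection/cdcr?action=DISABLEBUFFER"

# Start CDCR on the source cluster only.
curl "https://source-host:8983/solr/mycollection/cdcr?action=START"

# Check whether updates are queued and forwarded, and whether errors accumulate.
curl "https://source-host:8983/solr/mycollection/cdcr?action=QUEUES"
curl "https://source-host:8983/solr/mycollection/cdcr?action=ERRORS"
```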
Using Solr 7.7.2, is it safe to manually delete tlogs after doing a commit?
Using Solr 7.7.2. Our CDCR is broken for some reason, as I posted in the other question (https://lucene.472066.n3.nabble.com/Three-questions-about-huge-tlog-problem-and-CDCR-td4453788.html), so the tlog directory is huge now. I don't care about CDCR for the moment and just want to clean up all these tlogs first; otherwise the disk will fill up. Is it safe to delete them manually with "rm -rf ./tlog" after a commit via /solr/collectionname/update?commit=true (simply committing could not clean the tlogs because of the CDCR malfunction)?
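[Editorial note: as a general caution, a commit only makes old tlogs eligible for cleanup; with a CdcrUpdateLog they are held until the target acknowledges them, and deleting the tlog directory out from under a running core risks errors or lost updates during restart recovery. A more conservative sequence, sketched below with placeholder paths, a placeholder collection/core name, and the default port, is to stop the node before removing the files.]

```shell
# Placeholders: mycollection, the core data path, port 8983 - adjust to your install.
# 1) Hard commit so the indexed data itself is durable on disk.
curl "http://localhost:8983/solr/mycollection/update?commit=true"

# 2) Stop the node so no writer still holds the tlog files open.
bin/solr stop -p 8983

# 3) Remove the tlog contents for the affected core only.
rm -rf /var/solr/data/mycollection_shard1_replica_n1/data/tlog/*

# 4) Restart in cloud mode; a fresh tlog is created on the next update.
bin/solr start -c -p 8983
```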
solr unable to locate core on CDCR
Whenever I create a new collection on a Solr 7.7.2 CDCR cluster (unidirectional), then disable the buffer on both the source and target nodes and start the CDCR process on the source node, I encounter the error message "solr unable to locate core..." on one of the target nodes. On the source node, the CDCR bootstrap fails because of the target node's failure to locate the core. The exact moment the error occurs is when I trigger the CDCR process on the source node. Is there a bug in Solr 7.7.2 CDCR? Steps to reproduce:
1) Create the collection on both the source and target clusters.
2) Disable the buffer on the source and target.
3) Enable CDCR on the source node.
Then the target node cannot locate the core, and the source node fails to start the CDCR bootstrap.
source cluster sends incorrect recovery request to target cluster when CDCR is enabled
Hi,

Running Solr 7.7.2, cluster with 3 replicas. When CDCR is enabled, one of the target nodes gets an incorrect recovery request. Below is the content of the state.json file from ZooKeeper:

"shards":{
  "shard1":{
    "range":"8000-7fff",
    "state":"active",
    "replicas":{
      "core_node3":{
        "core":"tbh_manuals_test_bi2_shard1_replica_n1",
        "base_url":"https://host1:8983/solr",
        "node_name":"host1:8983_solr",
        "state":"active",
        "type":"NRT",
        "force_set_state":"false"},
      "core_node5":{
        "core":"tbh_manuals_test_bi2_shard1_replica_n2",
        "base_url":"https://host2:8983/solr",
        "node_name":"host2:8983_solr",
        "state":"active",
        "type":"NRT",
        "force_set_state":"false",
        "leader":"true"},
      "core_node6":{
        "core":"tbh_manuals_test_bi2_shard1_replica_n4",
        "base_url":"https://host3:8983/solr",
        "node_name":"host3:8983_solr",
        "state":"active",
        "type":"NRT",
        "force_set_state":"false"}}

As we can see, host1 does not have tbh_manuals_test_bi2_shard1_replica_n4. However, host1 is receiving a request to recover tbh_manuals_test_bi2_shard1_replica_n4, which causes the "unable to locate core" error. Below is the entire error message from host1 on the target cluster:

2020-01-08 03:05:52.355 INFO (zkCallback-7-thread-14) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/tbh_manuals_test_bi2/state.json] for collection [tbh_manuals_test_bi2] has occurred - updating... (live nodes size: [3])
2020-01-08 03:05:52.355 INFO (zkCallback-7-thread-15) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/tbh_manuals_test_bi2/state.json] for collection [tbh_manuals_test_bi2] has occurred - updating... (live nodes size: [3])
2020-01-08 03:05:52.378 INFO (qtp1155769010-87) [ x:tbh_manuals_test_bi2_shard1_replica_n4] o.a.s.h.a.CoreAdminOperation It has been requested that we recover: core=tbh_manuals_test_bi2_shard1_replica_n4
2020-01-08 03:05:52.379 ERROR (qtp1155769010-87) [ x:tbh_manuals_test_bi2_shard1_replica_n4] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Unable to locate core tbh_manuals_test_bi2_shard1_replica_n4
    at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$5(CoreAdminOperation.java:167)
    at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:360)
    at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:396)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
    at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)