tlogs are not deleted

2019-10-22 Thread alwaysbluesky
Hi,

We are using Solr 7.7 cloud with CDCR (every collection has 3 replicas and 1
shard).

In solrconfig.xml,

the tlog configuration is super simple, like:
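(The XML snippet was stripped by the list archiver; a minimal updateLog block for
a CDCR setup, per the Solr reference guide, looks like this:)

    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>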

There is also a daily data import, and a commit is issued after every import.

Indexing works fine, but the problem is that the number of tlogs keeps
growing.

According to the documentation here
(https://lucene.apache.org/solr/guide/6_6/updatehandlers-in-solrconfig.html), I
expected that at most 10 tlog files would remain (10 is the default value of
maxNumLogsToKeep).
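(For reference, the retention options from that page go inside updateLog; the
values below are the documented defaults:)

    <updateLog>
      <int name="numRecordsToKeep">100</int>
      <int name="maxNumLogsToKeep">10</int>
    </updateLog>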

However, I still have a bunch of tlogs - the oldest one is from Sep 6!

I did an experiment by running a data import with the commit option from the Solr
admin UI, but none of the tlogs were deleted.

tlog.002.1643995079881261056  tlog.003.1644085718240198656  tlog.004.1644176284537847808
tlog.005.1644357373324689408  tlog.006.167899616018432  tlog.007.1644538486210953216
tlog.008.1644629084296183808  tlog.009.1644719895268556800  tlog.010.1644810493331767296
tlog.011.1644901113324896256  tlog.012.1645031030684385280  tlog.013.164503103008545
tlog.014.1645082080252526592  tlog.015.1645172929206419456  tlog.016.1645263488829882368
tlog.017.1645353861842468864  tlog.018.1645444642733293568  tlog.019.1645535304072822784
tlog.020.1645625651261079552  tlog.021.1645625651316654083  tlog.022.1645716477747134464
tlog.023.1645806853961023488  tlog.024.1645897663703416832  tlog.025.1645988248838733824
tlog.026.1646078905702940672  tlog.027.1646169478772293632  tlog.028.1646259838395613184
tlog.029.1646350429145006080  tlog.030.1646441456502571008  tlog.031.1646531802044563456
tlog.032.16466061568  tlog.033.1646712822719053824  tlog.034.1646803619099443200
tlog.035.1646894195509559296  tlog.036.1646984623121498112  tlog.037.1647076244416626688
tlog.038.1647165801017376768  tlog.039.1647165801042542594  tlog.040.1647256590865137664
tlog.041.1647347172490870784  tlog.042.1647437758859313152  tlog.043.1647528345005457408
tlog.044.1647618793025830912  tlog.045.1647709579019026432  tlog.046.1647890587519549440
tlog.047.1647981403286011904  tlog.048.1648071989042085888  tlog.049.1648135546466205696

Did I miss something in the solrconfig file?





Re: Three questions about huge tlog problem and CDCR

2019-12-18 Thread alwaysbluesky
Found a typo; correcting: "updateLogSynchronizer" is set to 60000 ms (1 min), not
1 hour.
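(That is, the schedule in the /cdcr request handler, which is specified in
milliseconds; a sketch, assuming the stock CDCR handler config:)

    <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
      ...
      <lst name="updateLogSynchronizer">
        <str name="schedule">60000</str>
      </lst>
    </requestHandler>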





Re: Three questions about huge tlog problem and CDCR

2019-12-19 Thread alwaysbluesky
Thank you for the advice.

By the way, when I upload a new collection configuration to ZooKeeper, enable
bidirectional CDCR for the collections on both the prod and dr sides
(/cdcr?action=START), and reload the collections, CDCR usually doesn't work. Only
after I restarted all the nodes in the cluster on both prod and dr did CDCR start
working.

Should I normally restart Solr after enabling/disabling CDCR? Is reloading the
collections without a Solr restart not enough to apply the CDCR change?
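(For reference, the calls I am making on each side; host and collection names are
illustrative:)

    curl 'http://<host>:8983/solr/<collection>/cdcr?action=START'
    # verify process/buffer state afterwards
    curl 'http://<host>:8983/solr/<collection>/cdcr?action=STATUS'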





Re: Three questions about huge tlog problem and CDCR

2019-12-20 Thread alwaysbluesky
sure.

I disabled the buffer and started CDCR by calling the API on both sides.

When I do indexing, I see that the size of the tlog folder stays within 1MB while
the size of the index folder keeps increasing.

So I assumed the tlog was being consumed by the target node and cleared, and that
data was being forwarded to the target node.. but when I actually checked the
target, the index on the target nodes is still empty and data was loaded only on
the source node.
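(One way to check whether updates are actually queued for the target, assuming the
default /cdcr handler; host and collection names are illustrative:)

    curl 'http://<source-host>:8983/solr/<collection>/cdcr?action=QUEUES'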





Using solr 7.7.2, Is it safe to manually delete tlog after doing commit?

2019-12-20 Thread alwaysbluesky
Using solr 7.7.2.

Our CDCR is broken for some reason, as I posted in another question
(https://lucene.472066.n3.nabble.com/Three-questions-about-huge-tlog-problem-and-CDCR-td4453788.html).

So the size of the tlog is huge now... I don't care about CDCR for now and just
want to clean up all these tlogs first. Otherwise, disk space will run out.

Is it safe to delete them manually with "rm -rf ./tlog" after a commit via
/solr/collectionname/update?commit=true? (Simply committing was not able to clean
the tlog because of the CDCR malfunction.)
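(Concretely, the sequence I am asking about; the data path is illustrative:)

    # commit first
    curl 'http://localhost:8983/solr/<collection>/update?commit=true'
    # then delete the transaction logs by hand
    rm -rf /var/solr/data/<collection>_shard1_replica_n1/data/tlog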







solr unable to locate core on CDCR

2020-01-07 Thread alwaysbluesky
Whenever I create a new collection on a Solr 7.7.2 cluster with CDCR
(unidirectional),

and then disable the buffer on both source and target nodes and start the CDCR
process on the source node, I encounter the error message "solr unable to locate
core..." on one of the target nodes.

On the source node, the CDCR bootstrap failed because of the target node's failure
to locate the core.

The exact moment that the error message occurs is when I trigger the CDCR process
on the source node.

Is there a bug in Solr 7.7.2 CDCR?


Here are the steps to reproduce (the exact API calls are sketched after the list):

1) create the collection on both source and target clusters

2) disable the buffer on the source and target

3) enable CDCR on the source nodes

Then the target node can't locate the core, and the source node fails to start the
CDCR bootstrap.
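(The calls for steps 2 and 3; hosts and collection name are illustrative:)

    # step 2: disable the buffer on both clusters
    curl 'http://<source>:8983/solr/<collection>/cdcr?action=DISABLEBUFFER'
    curl 'http://<target>:8983/solr/<collection>/cdcr?action=DISABLEBUFFER'
    # step 3: start CDCR on the source
    curl 'http://<source>:8983/solr/<collection>/cdcr?action=START'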






source cluster sends incorrect recovery request to target cluster when CDCR is enabled

2020-01-07 Thread alwaysbluesky
Hi,

Running Solr 7.7.2, cluster with 3 replicas

When CDCR is enabled, one of the target nodes gets an incorrect recovery
request.

Below is the content of the state.json file from ZooKeeper.

"shards":{"shard1":{
"range":"8000-7fff",
"state":"active",
"replicas":{
  "core_node3":{
"core":"tbh_manuals_test_bi2_shard1_replica_n1",
"base_url":"https://host1:8983/solr";,
"node_name":"host1:8983_solr",
"state":"active",
"type":"NRT",
"force_set_state":"false"},
  "core_node5":{
"core":"tbh_manuals_test_bi2_shard1_replica_n2",
"base_url":"https://host2:8983/solr";,
"node_name":"host2:8983_solr",
"state":"active",
"type":"NRT",
"force_set_state":"false",
"leader":"true"},
  "core_node6":{
"core":"tbh_manuals_test_bi2_shard1_replica_n4",
"base_url":"https://host3:8983/solr";,
"node_name":"host3:8983_solr",
"state":"active",
"type":"NRT",
"force_set_state":"false"}}

As you can see, host1 doesn't have tbh_manuals_test_bi2_shard1_replica_n4.
However, host1 is receiving a request that tbh_manuals_test_bi2_shard1_replica_n4
be recovered, which causes the "unable to locate core" error.

Below is the full error message from host1 on the target cluster:

2020-01-08 03:05:52.355 INFO  (zkCallback-7-thread-14) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/tbh_manuals_test_bi2/state.json] for collection
[tbh_manuals_test_bi2] has occurred - updating... (live nodes size: [3])
2020-01-08 03:05:52.355 INFO  (zkCallback-7-thread-15) [   ]
o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/collections/tbh_manuals_test_bi2/state.json] for collection
[tbh_manuals_test_bi2] has occurred - updating... (live nodes size: [3])
2020-01-08 03:05:52.378 INFO  (qtp1155769010-87) [  
x:tbh_manuals_test_bi2_shard1_replica_n4] o.a.s.h.a.CoreAdminOperation It
has been requested that we recover:
core=tbh_manuals_test_bi2_shard1_replica_n4
2020-01-08 03:05:52.379 ERROR (qtp1155769010-87) [  
x:tbh_manuals_test_bi2_shard1_replica_n4] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Unable to locate core
tbh_manuals_test_bi2_shard1_replica_n4
at
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$5(CoreAdminOperation.java:167)
at
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:360)
at
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:396)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
at
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)