This is an automated email from the ASF dual-hosted git repository.

kturner pushed a commit to branch elasticity
in repository https://gitbox.apache.org/repos/asf/accumulo.git


The following commit(s) were added to refs/heads/elasticity by this push:
     new d31a1df405 replaces hosting requested cache with in progress set 
(#4285)
d31a1df405 is described below

commit d31a1df4058a6a73bf24781af9a6c87e41fe7f57
Author: Keith Turner <ktur...@apache.org>
AuthorDate: Wed Feb 21 10:26:12 2024 -0500

    replaces hosting requested cache with in progress set (#4285)
    
    A test in MergeIT was running slower than expected. Debugged and saw the 
following happen in the manager logs.
    
    ```
    2024-02-20T22:21:40,578 80 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:21:40,587 80 [manager.EventCoordinator] DEBUG: Tablet hosting 
requested for 3 tablets in 1
    2024-02-20T22:21:42,718 93 [manager.EventCoordinator] DEBUG: tablet 
1;row_0000000250< was unloaded from localhost:35711
    2024-02-20T22:21:45,136 85 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:21:45,136 85 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:21:45,242 98 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:21:45,242 98 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:21:45,431 99 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:21:45,431 99 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:21:45,797 62 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:21:45,797 62 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:21:46,480 71 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:21:46,480 71 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:21:47,885 88 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:21:47,886 88 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:21:50,539 89 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:21:50,539 89 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:21:55,802 80 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:21:55,802 80 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:22:01,034 75 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:22:01,034 75 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:22:06,421 87 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:22:06,421 87 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:22:11,126 60 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:22:11,126 60 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:22:16,457 67 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:22:16,457 67 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:22:21,816 80 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:22:21,816 80 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:22:27,104 62 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:22:27,104 62 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:22:32,472 82 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:22:32,472 82 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:22:37,224 99 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:22:37,224 99 [manager.Manager] TRACE: Ignoring hosting 
request because it was recently requested 1;row_0000000250<
    2024-02-20T22:22:41,811 87 [manager.Manager] INFO : Tablet hosting 
requested for: 1;row_0000000250<
    2024-02-20T22:22:41,817 87 [manager.EventCoordinator] DEBUG: Tablet hosting 
requested for 1 tablets in 1
    ```
    
    A tablet was loaded as a result of a hosting request and then immediately
    unloaded because of table operations done by the client.  The client then
    attempts to scan the table and this hangs for a minute because the recent
    hosting request in the manager has a 1 min timeout.
    
    Trying to keep the cache of recent hosting request consistent for events
    like this would require a lot of effort.  One reason for the cache was to
    deal with the thundering herd.  To avoid cache consistency issues and deal
    with the thundering herd, the cache was replaced with a concurrent set
    that tracks what extents are currently being processed.  If 100s of request
    show up to host a single tablet at the same time, then only one of them
    will be processed.
---
 .../manager/ManagerClientServiceHandler.java       | 50 +++++++++++-----------
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git 
a/server/manager/src/main/java/org/apache/accumulo/manager/ManagerClientServiceHandler.java
 
b/server/manager/src/main/java/org/apache/accumulo/manager/ManagerClientServiceHandler.java
index 499bde7544..9a1fdaf478 100644
--- 
a/server/manager/src/main/java/org/apache/accumulo/manager/ManagerClientServiceHandler.java
+++ 
b/server/manager/src/main/java/org/apache/accumulo/manager/ManagerClientServiceHandler.java
@@ -33,6 +33,7 @@ import java.util.List;
 import java.util.Map;
 import java.util.Map.Entry;
 import java.util.Set;
+import java.util.concurrent.ConcurrentSkipListSet;
 import java.util.concurrent.TimeUnit;
 
 import org.apache.accumulo.core.Constants;
@@ -68,7 +69,6 @@ import 
org.apache.accumulo.core.manager.thrift.TabletLoadState;
 import org.apache.accumulo.core.manager.thrift.ThriftPropertyException;
 import org.apache.accumulo.core.metadata.AccumuloTable;
 import org.apache.accumulo.core.metadata.TServerInstance;
-import org.apache.accumulo.core.metadata.schema.Ample;
 import org.apache.accumulo.core.metadata.schema.Ample.ConditionalResult.Status;
 import org.apache.accumulo.core.metadata.schema.TabletDeletedException;
 import org.apache.accumulo.core.metadata.schema.TabletMetadata;
@@ -77,8 +77,6 @@ import 
org.apache.accumulo.core.securityImpl.thrift.TCredentials;
 import org.apache.accumulo.core.securityImpl.thrift.TDelegationToken;
 import org.apache.accumulo.core.securityImpl.thrift.TDelegationTokenConfig;
 import org.apache.accumulo.core.util.ByteBufferUtil;
-import org.apache.accumulo.core.util.cache.Caches.CacheName;
-import org.apache.accumulo.manager.split.Splitter;
 import org.apache.accumulo.manager.tableOps.TraceRepo;
 import org.apache.accumulo.manager.tserverOps.ShutdownTServer;
 import org.apache.accumulo.server.client.ClientServiceHandler;
@@ -94,24 +92,15 @@ import org.apache.thrift.TException;
 import org.apache.zookeeper.KeeperException.NoNodeException;
 import org.slf4j.Logger;
 
-import com.github.benmanes.caffeine.cache.Cache;
-import com.github.benmanes.caffeine.cache.Weigher;
-
 public class ManagerClientServiceHandler implements ManagerClientService.Iface 
{
 
   private static final Logger log = Manager.log;
   private final Manager manager;
-
-  private final Cache<KeyExtent,Long> recentHostingRequest;
-
-  private static final int TEN_MB = 10 * 1024 * 1024;
+  private final Set<KeyExtent> hostingRequestInProgress;
 
   protected ManagerClientServiceHandler(Manager manager) {
     this.manager = manager;
-    Weigher<KeyExtent,Long> weigher = (extent, t) -> Splitter.weigh(extent) + 
8;
-    this.recentHostingRequest = this.manager.getContext().getCaches()
-        .createNewBuilder(CacheName.HOSTING_REQUEST_CACHE, true)
-        .expireAfterWrite(1, 
TimeUnit.MINUTES).maximumWeight(TEN_MB).weigher(weigher).build();
+    this.hostingRequestInProgress = new ConcurrentSkipListSet<>();
   }
 
   @Override
@@ -623,25 +612,31 @@ public class ManagerClientServiceHandler implements 
ManagerClientService.Iface {
     manager.mustBeOnline(tableId);
 
     final List<KeyExtent> success = new ArrayList<>();
-    final Ample ample = manager.getContext().getAmple();
-    try (var mutator = ample.conditionallyMutateTablets()) {
-      extents.forEach(e -> {
+    final List<KeyExtent> inProgress = new ArrayList<>();
+    extents.forEach(e -> {
+      KeyExtent ke = KeyExtent.fromThrift(e);
+      if (hostingRequestInProgress.add(ke)) {
         log.info("Tablet hosting requested for: {} ", KeyExtent.fromThrift(e));
-        KeyExtent ke = KeyExtent.fromThrift(e);
-        if (recentHostingRequest.getIfPresent(ke) == null) {
-          mutator.mutateTablet(ke).requireAbsentOperation()
-              
.requireTabletAvailability(TabletAvailability.ONDEMAND).requireAbsentLocation()
-              
.setHostingRequested().submit(TabletMetadata::getHostingRequested);
-        } else {
-          log.trace("Ignoring hosting request because it was recently 
requested {}", ke);
-        }
+        inProgress.add(ke);
+      } else {
+        log.trace("Ignoring hosting request because another thread is 
currently processing it {}",
+            ke);
+      }
+    });
+    // Do not add any code here, it may interfere with the finally block 
removing extents from
+    // hostingRequestInProgress
+    try (var mutator = 
manager.getContext().getAmple().conditionallyMutateTablets()) {
+      inProgress.forEach(ke -> {
+        mutator.mutateTablet(ke).requireAbsentOperation()
+            
.requireTabletAvailability(TabletAvailability.ONDEMAND).requireAbsentLocation()
+            .setHostingRequested().submit(TabletMetadata::getHostingRequested);
+
       });
 
       mutator.process().forEach((extent, result) -> {
         if (result.getStatus() == Status.ACCEPTED) {
           // cache this success for a bit
           success.add(extent);
-          recentHostingRequest.put(extent, System.currentTimeMillis());
         } else {
           if (log.isTraceEnabled()) {
             // only read the metadata if the logging is enabled
@@ -649,8 +644,11 @@ public class ManagerClientServiceHandler implements 
ManagerClientService.Iface {
           }
         }
       });
+    } finally {
+      inProgress.forEach(hostingRequestInProgress::remove);
     }
 
+    // ELASTICITY_TODO pass ranges of individual tablets
     manager.getEventCoordinator().event(success, "Tablet hosting requested for 
%d tablets in %s",
         success.size(), tableId);
   }

Reply via email to