This is an automated email from the ASF dual-hosted git repository. kturner pushed a commit to branch elasticity in repository https://gitbox.apache.org/repos/asf/accumulo.git
The following commit(s) were added to refs/heads/elasticity by this push: new d31a1df405 replaces hosting requested cache with in progress set (#4285) d31a1df405 is described below commit d31a1df4058a6a73bf24781af9a6c87e41fe7f57 Author: Keith Turner <ktur...@apache.org> AuthorDate: Wed Feb 21 10:26:12 2024 -0500 replaces hosting requested cache with in progress set (#4285) A test in MergeIT was running slower than expected. Debugged and saw the following happen in the manager logs. ``` 2024-02-20T22:21:40,578 80 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:21:40,587 80 [manager.EventCoordinator] DEBUG: Tablet hosting requested for 3 tablets in 1 2024-02-20T22:21:42,718 93 [manager.EventCoordinator] DEBUG: tablet 1;row_0000000250< was unloaded from localhost:35711 2024-02-20T22:21:45,136 85 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:21:45,136 85 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:21:45,242 98 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:21:45,242 98 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:21:45,431 99 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:21:45,431 99 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:21:45,797 62 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:21:45,797 62 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:21:46,480 71 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:21:46,480 71 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:21:47,885 88 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:21:47,886 88 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:21:50,539 89 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:21:50,539 89 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:21:55,802 80 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:21:55,802 80 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:22:01,034 75 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:22:01,034 75 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:22:06,421 87 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:22:06,421 87 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:22:11,126 60 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:22:11,126 60 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:22:16,457 67 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:22:16,457 67 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:22:21,816 80 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:22:21,816 80 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:22:27,104 62 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:22:27,104 62 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:22:32,472 82 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:22:32,472 82 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:22:37,224 99 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:22:37,224 99 [manager.Manager] TRACE: Ignoring hosting request because it was recently requested 1;row_0000000250< 2024-02-20T22:22:41,811 87 [manager.Manager] INFO : Tablet hosting requested for: 1;row_0000000250< 2024-02-20T22:22:41,817 87 [manager.EventCoordinator] DEBUG: Tablet hosting requested for 1 tablets in 1 ``` A tablet was loaded as a result of a hosting request and then immediately unloaded because of table operations done by the client. The client then attempts to scan the table and this hangs for a minute because the recent hosting request in the manager has a 1 min timeout. Trying to keep the cache of recent hosting request consistent for events like this would require a lot of effort. One reason for the cache was to deal with the thundering herd. To avoid cache consistency issues and deal with the thundering herd, the cache was replaced with a concurrent set that tracks what extents are currently being processed. If 100s of request show up to host a single tablet at the same time, then only one of them will be processed. --- .../manager/ManagerClientServiceHandler.java | 50 +++++++++++----------- 1 file changed, 24 insertions(+), 26 deletions(-) diff --git a/server/manager/src/main/java/org/apache/accumulo/manager/ManagerClientServiceHandler.java b/server/manager/src/main/java/org/apache/accumulo/manager/ManagerClientServiceHandler.java index 499bde7544..9a1fdaf478 100644 --- a/server/manager/src/main/java/org/apache/accumulo/manager/ManagerClientServiceHandler.java +++ b/server/manager/src/main/java/org/apache/accumulo/manager/ManagerClientServiceHandler.java @@ -33,6 +33,7 @@ import java.util.List; import java.util.Map; import java.util.Map.Entry; import java.util.Set; +import java.util.concurrent.ConcurrentSkipListSet; import java.util.concurrent.TimeUnit; import org.apache.accumulo.core.Constants; @@ -68,7 +69,6 @@ import org.apache.accumulo.core.manager.thrift.TabletLoadState; import org.apache.accumulo.core.manager.thrift.ThriftPropertyException; import org.apache.accumulo.core.metadata.AccumuloTable; import org.apache.accumulo.core.metadata.TServerInstance; -import org.apache.accumulo.core.metadata.schema.Ample; import org.apache.accumulo.core.metadata.schema.Ample.ConditionalResult.Status; import org.apache.accumulo.core.metadata.schema.TabletDeletedException; import org.apache.accumulo.core.metadata.schema.TabletMetadata; @@ -77,8 +77,6 @@ import org.apache.accumulo.core.securityImpl.thrift.TCredentials; import org.apache.accumulo.core.securityImpl.thrift.TDelegationToken; import org.apache.accumulo.core.securityImpl.thrift.TDelegationTokenConfig; import org.apache.accumulo.core.util.ByteBufferUtil; -import org.apache.accumulo.core.util.cache.Caches.CacheName; -import org.apache.accumulo.manager.split.Splitter; import org.apache.accumulo.manager.tableOps.TraceRepo; import org.apache.accumulo.manager.tserverOps.ShutdownTServer; import org.apache.accumulo.server.client.ClientServiceHandler; @@ -94,24 +92,15 @@ import org.apache.thrift.TException; import org.apache.zookeeper.KeeperException.NoNodeException; import org.slf4j.Logger; -import com.github.benmanes.caffeine.cache.Cache; -import com.github.benmanes.caffeine.cache.Weigher; - public class ManagerClientServiceHandler implements ManagerClientService.Iface { private static final Logger log = Manager.log; private final Manager manager; - - private final Cache<KeyExtent,Long> recentHostingRequest; - - private static final int TEN_MB = 10 * 1024 * 1024; + private final Set<KeyExtent> hostingRequestInProgress; protected ManagerClientServiceHandler(Manager manager) { this.manager = manager; - Weigher<KeyExtent,Long> weigher = (extent, t) -> Splitter.weigh(extent) + 8; - this.recentHostingRequest = this.manager.getContext().getCaches() - .createNewBuilder(CacheName.HOSTING_REQUEST_CACHE, true) - .expireAfterWrite(1, TimeUnit.MINUTES).maximumWeight(TEN_MB).weigher(weigher).build(); + this.hostingRequestInProgress = new ConcurrentSkipListSet<>(); } @Override @@ -623,25 +612,31 @@ public class ManagerClientServiceHandler implements ManagerClientService.Iface { manager.mustBeOnline(tableId); final List<KeyExtent> success = new ArrayList<>(); - final Ample ample = manager.getContext().getAmple(); - try (var mutator = ample.conditionallyMutateTablets()) { - extents.forEach(e -> { + final List<KeyExtent> inProgress = new ArrayList<>(); + extents.forEach(e -> { + KeyExtent ke = KeyExtent.fromThrift(e); + if (hostingRequestInProgress.add(ke)) { log.info("Tablet hosting requested for: {} ", KeyExtent.fromThrift(e)); - KeyExtent ke = KeyExtent.fromThrift(e); - if (recentHostingRequest.getIfPresent(ke) == null) { - mutator.mutateTablet(ke).requireAbsentOperation() - .requireTabletAvailability(TabletAvailability.ONDEMAND).requireAbsentLocation() - .setHostingRequested().submit(TabletMetadata::getHostingRequested); - } else { - log.trace("Ignoring hosting request because it was recently requested {}", ke); - } + inProgress.add(ke); + } else { + log.trace("Ignoring hosting request because another thread is currently processing it {}", + ke); + } + }); + // Do not add any code here, it may interfere with the finally block removing extents from + // hostingRequestInProgress + try (var mutator = manager.getContext().getAmple().conditionallyMutateTablets()) { + inProgress.forEach(ke -> { + mutator.mutateTablet(ke).requireAbsentOperation() + .requireTabletAvailability(TabletAvailability.ONDEMAND).requireAbsentLocation() + .setHostingRequested().submit(TabletMetadata::getHostingRequested); + }); mutator.process().forEach((extent, result) -> { if (result.getStatus() == Status.ACCEPTED) { // cache this success for a bit success.add(extent); - recentHostingRequest.put(extent, System.currentTimeMillis()); } else { if (log.isTraceEnabled()) { // only read the metadata if the logging is enabled @@ -649,8 +644,11 @@ public class ManagerClientServiceHandler implements ManagerClientService.Iface { } } }); + } finally { + inProgress.forEach(hostingRequestInProgress::remove); } + // ELASTICITY_TODO pass ranges of individual tablets manager.getEventCoordinator().event(success, "Tablet hosting requested for %d tablets in %s", success.size(), tableId); }