This is an automated email from the ASF dual-hosted git repository.

dlmarion pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/accumulo-website.git


The following commit(s) were added to refs/heads/main by this push:
     new 3e353de01 Add updates for merge changes in Accumulo 4.0 (#452)
3e353de01 is described below

commit 3e353de015fa0c2cfbb77907f8eefaa959b947c6
Author: Christopher L. Shannon <cshan...@apache.org>
AuthorDate: Mon Apr 28 15:33:22 2025 -0400

    Add updates for merge changes in Accumulo 4.0 (#452)
---
 _docs-4/administration/merging.md | 94 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/_docs-4/administration/merging.md 
b/_docs-4/administration/merging.md
new file mode 100644
index 000000000..798090045
--- /dev/null
+++ b/_docs-4/administration/merging.md
@@ -0,0 +1,94 @@
+---
+title: Merging
+category: administration
+order: 6
+---
+
+Accumulo 4.0 has improved tablet merging support, including:
+
+* Merging no longer requires "chop" compactions.
+* Merging is now managed by FATE
+* Accumulo now supports auto merging of tablets.
+
+## New Merge Design
+
+Merge used to be a slow operation because tablets had to be compacted before 
merging. This was necessary because Rfiles may contain data outside the tablet 
range and this data needed to be removed.
+The updated merge algorithm works by "fencing" the RFiles in a tablet by the 
valid range. This operation is a fast metadata operation and the valid range of 
a file is now inserted into the file column. 
+Scans will only return data in the specified range so compactions are no 
longer required. The normal system compaction process will eventually remove 
the data outside the range.
+
+## Auto Merge 
+
+Accumulo supports auto merging tablets that are below a certain threshold, 
similar to splitting tablets that are above a threshold.
+The manager runs a task that periodically looks for ranges of tablets that can 
be merged. For a range of tablets to be eligible to be merged the following 
must be true:
+
+1. All tablets in the range must be marked as eligible to be merged using the 
per tablet `TabletMergeability` setting. (more below)
+2. The combined files must be less than `table.merge.file.max`
+3. The total size must be less than `table.mergeability.threshold`. This is 
defined as the combined size of RFiles as a percentage of the split threshold
+
+## Configuration
+
+The following properties are used to configure merging:.
+
+* `manager.tablet.mergeability.interval` - Time to wait between scanning 
tables to identify ranges of tablets that can be auto-merged (default is `24h`)
+* `table.mergeability.threshold` - A range of tablets are eligible for 
automatic merging until the combined size of RFiles reaches this percentage of 
the split threshold. (default is `.25`)
+* `table.merge.file.max` - The maximum number of files that a merge operation 
will process (default is `10000`). This property also applies to merges through 
the API as well.
+
+## Tablet Mergeability
+
+Each tablet can be marked individually with a value to indicate if/when it can 
be auto merged by the system.
+The following are the possible settings:
+
+* `NEVER` - Tablets are never eligible for automatic merging
+* `ALWAYS` - Tablets are always eligible for automatic merging
+* `DELAY` - Tablets are eligible to be merged after the configured delay, 
relative to the Manager time.
+
+### Tablet Mergeability Defaults
+
+* System generated splits - Defaults to `ALWAYS` mergeable. Any system created 
tablets are always eligible to be merged.
+* User added splits - Defaults to `NEVER` mergeable if not specified.
+
+### Upgrade
+
+During upgrade all existing tablets will be marked with a default of `NEVER` 
for the TabletMergeability column to preserve
+the previous behavior. Only new tablets that are generated by system splits 
will be marked as `ALWAYS`.
+
+### Configuring Tablets with the  API
+
+#### Adding/updating splits
+
+There is a new `putSplits()` method that takes a map of splits and 
mergeability settings and will either create those splits or update existing 
with the settings.
+
+```java
+// Adding splits or updating existing splits
+String tableName = "table";
+SortedMap<Text,TabletMergeability> splits = new TreeMap<>();
+// Mark each split with its mergeability setting
+splits.put(new Text(String.format("%09d", 333)), TabletMergeability.always());
+splits.put(new Text(String.format("%09d", 444)), TabletMergeability.always());
+splits.put(new Text(String.format("%09d", 666)), TabletMergeability.never());
+splits.put(new Text(String.format("%09d", 999)),
+  TabletMergeability.after(Duration.ofDays(1)));
+// add or update splits
+client.tableOperations().putSplits(String tableName, splits);
+```
+
+`TabletInformation` contains information describing the current mergeability 
state inside `TabletMergeAbilityInfo`.
+
+#### Listing TabletMergeabilityInfo
+```java
+try (Stream<TabletInformation> tabletInfo =
+        client.tableOperations().getTabletInformation(table, new Range())) {
+        tabletInfo.forEach(ti -> {
+    TabletMergeabilityInfo tmi = ti.getTabletMergeabilityInfo();
+    // Some examples of the API usage
+    // Gets the optional delay that is configured
+    Optional<Duration> delay = tmi.getDelay();
+    // If the tablet is currently eligilbe for merging
+    boolean mergeable = tmi.isMergeable();
+    // Optional estimated elapsed time since the delay was set
+    Optional<Duration> elapsed = tmi.getElapsed();
+    // Optional estimated remaining time before the tablet is eligible for 
merging
+    Optional<Duration> remaining = tmi.getRemaining();
+  });
+}
+```

Reply via email to