This is an automated email from the ASF dual-hosted git repository. dlmarion pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/accumulo-website.git
The following commit(s) were added to refs/heads/main by this push:

new 3e353de01 Add updates for merge changes in Accumulo 4.0 (#452)

3e353de01 is described below

commit 3e353de015fa0c2cfbb77907f8eefaa959b947c6
Author: Christopher L. Shannon <cshan...@apache.org>
AuthorDate: Mon Apr 28 15:33:22 2025 -0400

Add updates for merge changes in Accumulo 4.0 (#452)

---
_docs-4/administration/merging.md | 94 +++++++++++++++++++++++++++++++++++++++
1 file changed, 94 insertions(+)

---
title: Merging
category: administration
order: 6
---

Accumulo 4.0 has improved tablet merging support, including:

* Merging no longer requires "chop" compactions.
* Merging is now managed by FATE.
* Accumulo now supports automatic merging of tablets.

## New Merge Design

Merging used to be a slow operation because tablets had to be compacted before they could be merged. The compaction was necessary because RFiles may contain data outside the tablet's range, and that data had to be removed.
The updated merge algorithm instead "fences" the RFiles in a tablet to the tablet's valid range. Fencing is a fast metadata-only operation: the valid range of each file is now stored in the file column.
Scans only return data within a file's fenced range, so compactions are no longer required before a merge. The normal system compaction process eventually removes the data that falls outside the range.

## Auto Merge

Accumulo supports automatically merging tablets that are below a certain size threshold, similar to how tablets above a threshold are split.
The manager runs a task that periodically looks for ranges of tablets that can be merged. For a range of tablets to be eligible for merging, all of the following must be true:

1. All tablets in the range must be marked as eligible for merging using the per-tablet `TabletMergeability` setting.
2. The combined number of files in the range must be less than `table.merge.file.max`.
3. The combined size of the range's RFiles must be less than `table.mergeability.threshold`, which is expressed as a fraction of the table's split threshold (`table.split.threshold`).

## Configuration

The following properties are used to configure merging:

* `manager.tablet.mergeability.interval` - Time to wait between scans of tables to identify ranges of tablets that can be auto-merged (default is `24h`).
* `table.mergeability.threshold` - A range of tablets is eligible for automatic merging until the combined size of its RFiles reaches this fraction of the split threshold (default is `.25`).
* `table.merge.file.max` - The maximum number of files that a merge operation will process (default is `10000`). This limit also applies to merges requested through the API.

## Tablet Mergeability

Each tablet can be individually marked with a value that indicates if and when the system may automatically merge it.
The possible settings are:

* `NEVER` - The tablet is never eligible for automatic merging.
* `ALWAYS` - The tablet is always eligible for automatic merging.
* `DELAY` - The tablet becomes eligible for merging once the configured delay has elapsed, measured relative to the Manager's time.

### Tablet Mergeability Defaults

* System-generated splits - Default to `ALWAYS`. Any tablets created by the system are always eligible to be merged.
* User-added splits - Default to `NEVER` if no setting is specified.

### Upgrade

During an upgrade, all existing tablets are marked `NEVER` in the TabletMergeability column to preserve
the previous behavior. Only new tablets generated by system splits are marked `ALWAYS`.

### Configuring Tablets with the API

#### Adding/updating splits

The new `putSplits()` method takes a map of splits to mergeability settings and either creates those splits or updates the settings of existing splits.

```java
import java.time.Duration;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.accumulo.core.client.admin.TabletMergeability;
import org.apache.hadoop.io.Text;

// Adding splits or updating existing splits
// (assumes `client` is an open AccumuloClient)
String tableName = "table";
SortedMap<Text,TabletMergeability> splits = new TreeMap<>();
// Mark each split with its mergeability setting
splits.put(new Text(String.format("%09d", 333)), TabletMergeability.always());
splits.put(new Text(String.format("%09d", 444)), TabletMergeability.always());
splits.put(new Text(String.format("%09d", 666)), TabletMergeability.never());
splits.put(new Text(String.format("%09d", 999)),
    TabletMergeability.after(Duration.ofDays(1)));
// Add new splits or update the settings of existing ones
client.tableOperations().putSplits(tableName, splits);
```

`TabletInformation` exposes a tablet's current mergeability state as a `TabletMergeabilityInfo`, available from `getTabletMergeabilityInfo()`.

#### Listing TabletMergeabilityInfo

```java
import java.time.Duration;
import java.util.Optional;
import java.util.stream.Stream;

import org.apache.accumulo.core.client.admin.TabletInformation;
import org.apache.accumulo.core.client.admin.TabletMergeabilityInfo;
import org.apache.accumulo.core.data.Range;

// (assumes `client` is an open AccumuloClient and `table` names an existing table)
try (Stream<TabletInformation> tabletInfo =
    client.tableOperations().getTabletInformation(table, new Range())) {
  tabletInfo.forEach(ti -> {
    TabletMergeabilityInfo tmi = ti.getTabletMergeabilityInfo();
    // Some examples of the API usage
    // Gets the optional delay that is configured
    Optional<Duration> delay = tmi.getDelay();
    // Whether the tablet is currently eligible for merging
    boolean mergeable = tmi.isMergeable();
    // Optional estimated elapsed time since the delay was set
    Optional<Duration> elapsed = tmi.getElapsed();
    // Optional estimated remaining time before the tablet is eligible for merging
    Optional<Duration> remaining = tmi.getRemaining();
  });
}
```
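`isMergeable()` reports whether a single tablet is currently eligible; the auto-merge task then also applies the file-count and size limits to a whole range of tablets. The self-contained model below is a simplified sketch of those documented rules, not Accumulo's manager implementation. The `Mergeability` enum, the `Tablet` record, its `setAt` field (the manager time at which a `DELAY` setting was applied), and `isRangeEligible` are all hypothetical names. Manager time is modeled as a plain `Duration`, and the limits are passed in as numbers standing in for `table.merge.file.max`, `table.mergeability.threshold`, and the split threshold.

```java
import java.time.Duration;
import java.util.List;

public class AutoMergeModel {

  // Hypothetical mirror of the three per-tablet settings (not Accumulo's class).
  enum Mergeability { NEVER, ALWAYS, DELAY }

  // Hypothetical tablet summary. For DELAY, `setAt` is the manager time at which
  // the setting was applied and `delay` is how long to wait after that.
  record Tablet(Mergeability setting, Duration setAt, Duration delay,
      long numFiles, long sizeBytes) {

    // True if the tablet may be auto-merged at the given manager time.
    boolean isMergeable(Duration managerTime) {
      return switch (setting) {
        case NEVER -> false;
        case ALWAYS -> true;
        case DELAY -> managerTime.compareTo(setAt.plus(delay)) >= 0;
      };
    }
  }

  // Applies the three auto-merge conditions to one contiguous range of tablets.
  static boolean isRangeEligible(List<Tablet> range, Duration managerTime,
      long maxFiles, double mergeabilityThreshold, long splitThresholdBytes) {
    long totalFiles = 0;
    long totalSize = 0;
    for (Tablet t : range) {
      // Condition 1: every tablet in the range must currently be mergeable.
      if (!t.isMergeable(managerTime)) {
        return false;
      }
      totalFiles += t.numFiles();
      totalSize += t.sizeBytes();
    }
    // Condition 2: combined file count below the file limit (table.merge.file.max).
    // Condition 3: combined size below the threshold fraction of the split threshold.
    long maxSize = (long) (mergeabilityThreshold * splitThresholdBytes);
    return totalFiles < maxFiles && totalSize < maxSize;
  }

  public static void main(String[] args) {
    long splitThreshold = 1L << 30; // 1 GiB, an illustrative split threshold
    Duration now = Duration.ofHours(48); // illustrative manager time
    Tablet always = new Tablet(Mergeability.ALWAYS, Duration.ZERO, Duration.ZERO, 3, 10L << 20);
    Tablet delayed = new Tablet(Mergeability.DELAY, Duration.ofHours(12), Duration.ofDays(1), 2, 5L << 20);
    Tablet never = new Tablet(Mergeability.NEVER, Duration.ZERO, Duration.ZERO, 1, 1L << 20);

    System.out.println(isRangeEligible(List.of(always, delayed), now, 10_000, 0.25, splitThreshold));
    System.out.println(isRangeEligible(List.of(always, never), now, 10_000, 0.25, splitThreshold));
  }
}
```

Running `main` prints `true` and then `false`. The first range qualifies because the one-day delay applied at hour 12 has elapsed by hour 48 and the combined 15 MiB is well under a quarter of 1 GiB; the second fails because one tablet is marked `NEVER`. The sketch checks a single given range only; how the manager chooses candidate ranges is not modeled here.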