This is an automated email from the ASF dual-hosted git repository.

dlmarion pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/accumulo-website.git


The following commit(s) were added to refs/heads/main by this push:
     new 0d1fea1ae Accumulo 4.0 Feature Preview blog post (#441)
0d1fea1ae is described below

commit 0d1fea1ae5e6878727770f7334c2621afdc7e812
Author: Dave Marion <dlmar...@apache.org>
AuthorDate: Tue Oct 8 07:01:13 2024 -0400

    Accumulo 4.0 Feature Preview blog post (#441)
    
    
    Co-authored-by: Dom Garguilo <domgargu...@apache.org>
---
 _posts/blog/2024-10-07-accumulo4-preview.md        |  73 +++++++++++++++++++++
 .../blog/202409_accumulo4/AccumuloDeployment1.png  | Bin 0 -> 30785 bytes
 .../blog/202409_accumulo4/AccumuloDeployment2.png  | Bin 0 -> 36242 bytes
 .../blog/202409_accumulo4/AccumuloDeployment3.png  | Bin 0 -> 61210 bytes
 .../blog/202409_accumulo4/AccumuloDeployment4.png  | Bin 0 -> 55056 bytes
 .../blog/202409_accumulo4/AccumuloDeployment5.png  | Bin 0 -> 48132 bytes
 6 files changed, 73 insertions(+)

diff --git a/_posts/blog/2024-10-07-accumulo4-preview.md b/_posts/blog/2024-10-07-accumulo4-preview.md
new file mode 100644
index 000000000..a4b88fa6b
--- /dev/null
+++ b/_posts/blog/2024-10-07-accumulo4-preview.md
@@ -0,0 +1,73 @@
+---
+title: "Accumulo 4.0 Feature Preview"
+author: Dave Marion
+---
+
+## Background
+
+In version 2.1, we introduced two new optional and experimental features, 
[External 
Compactions](https://accumulo.apache.org/blog/2021/07/08/external-compactions.html)
 and [ScanServers](https://github.com/apache/accumulo/pull/2665). The External 
Compactions feature included two new server processes, the 
CompactionCoordinator and the Compactor. Using these new processes and their 
related configurations allows the user to perform major compactions on Tablets 
external to the TabletServer pr [...]
+
+The ScanServers feature included one new server process, the ScanServer, which 
allows users to execute scans against a Tablet external to the TabletServer. 
Because the ScanServer does not have access to the in-memory mutations within 
the TabletServer, we introduced a consistency level setting on the Scanner and 
BatchScanner where scans with the “immediate” consistency setting (default) 
would be sent to the TabletServer only and scans with the “eventual” 
consistency setting would be sent  [...]
+
+## New For 4.0
+
+The features in version 4.0 are intended to make running Accumulo in a cloud 
environment more cost-efficient by building on the optional and experimental 
features added in version 2.1. Prior to version 4.0, running an Accumulo 
instance required enough compute resources to host enough TabletServers to 
support the ingest, query, and Tablet maintenance (compact, split, merge, etc.) 
workload as Accumulo was originally designed to keep all Tablets immediately 
accessible all the time. Version  [...]
+
+### On-Demand Tablets
+
+On an upgrade to Accumulo 4.0, the upgrade code will assign all Tablets 
(except those of the root and metadata tables) an availability setting of 
ONDEMAND. This means that the Tablet is not assigned to or hosted by a 
TabletServer by default. If an operation is performed that requires a Tablet to 
be hosted by a TabletServer, then the operation will wait for the Tablet to be 
assigned and hosted. This setting can be changed and checked using the Shell 
commands `setavailability` and `g [...]
+
+User operations that would require a Tablet to be hosted are live ingest and 
immediate consistency scans. Users can still interact with data in unhosted 
tablets via bulk import and eventual consistency scans, and users can still 
perform tablet maintenance operations on unhosted tablets. The root and 
metadata tables have an availability value of HOSTED, which cannot be changed 
by the user. If your application only performs eventual scans and bulk imports, 
then only one TabletServer is req [...]
+
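The hosting rules described above can be summarized in a small sketch. This is an illustrative model only, not Accumulo code: the `Operation` enum, the `REQUIRES_HOSTING` set, and the `must_wait_for_hosting` function are hypothetical names introduced here, and the only facts taken from the post are which operations need a hosted Tablet.

```python
from enum import Enum


class Operation(Enum):
    """User-facing operations named in the post (hypothetical enum)."""
    LIVE_INGEST = "live ingest"
    IMMEDIATE_SCAN = "immediate consistency scan"
    EVENTUAL_SCAN = "eventual consistency scan"
    BULK_IMPORT = "bulk import"
    MAINTENANCE = "tablet maintenance"


# Per the post, only live ingest and immediate consistency scans
# require the Tablet to be hosted by a TabletServer.
REQUIRES_HOSTING = {Operation.LIVE_INGEST, Operation.IMMEDIATE_SCAN}


def must_wait_for_hosting(op: Operation, tablet_hosted: bool) -> bool:
    """Return True when the operation has to wait for the Tablet to be
    assigned and hosted before it can proceed."""
    return op in REQUIRES_HOSTING and not tablet_hosted
```

Under this model, an application that only calls `must_wait_for_hosting` with `BULK_IMPORT` or `EVENTUAL_SCAN` never waits, which matches the post's point that such a workload does not need its Tablets hosted.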
+Because Tablets are now optionally hosted in a TabletServer, the 
implementation of all the Tablet maintenance functions had to be moved out of 
the TabletServer and re-implemented. Split, Merge, and other metadata-only 
operations were re-implemented as Fate operations in the Manager.
+
+### External Compactions Only
+
+If a Tablet is not hosted, and the user is bulk importing to it, this could 
trigger the need for a major compaction. Hosting the Tablet just for the 
purpose of compacting it will cause churn on the cluster as the balancer may 
move Tablets around. This led to the decision to move all major compactions to 
the External Compactions feature. In 4.0, the CompactionCoordinator component 
was merged into the Manager process, so manually running the 
CompactionCoordinator process is no longer requi [...]
+
+### Resource Groups
+
+In version 4.0 a new group property can be supplied to the Compactor, 
ScanServer, and TabletServer processes (this replaces the “queue” property 
mentioned previously for Compactors). If not specified, the default group is 
used. This property allows the user to create named groups of processes 
that can be used to host Tablets, execute major compactions, and 
perform eventual scans. For example, application A may have requirements that 
dictate the need for immediate access to [...]
+
+### Increased Visibility
+
+Speaking of scaling, in version 4.0 we are emitting more metrics that can be 
used to determine when and how a resource needs to be scaled. The resource 
group and application name tags can be used to identify the group and type of 
resource that needs to be scaled. The metric name and value can be used to 
determine how the resource needs to be scaled. For example, if the value for 
metric `accumulo.compactor.queue.jobs.queued` is increasing, you likely need 
more Compactor resources. Likewis [...]
+
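As a rough illustration of the scaling signal described above, the sketch below flags a Compactor group for scale-up when successive samples of `accumulo.compactor.queue.jobs.queued` keep rising. The function names, the strictly-increasing rule, and the `min_samples` parameter are assumptions made for this example; a real autoscaler would likely smooth the series and read it from whatever metrics backend is in use.

```python
from typing import Sequence


def is_increasing(samples: Sequence[float], min_samples: int = 3) -> bool:
    """Return True when there are at least `min_samples` samples and each
    one is strictly greater than the one before it."""
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


def needs_more_compactors(queued_jobs_samples: Sequence[float]) -> bool:
    """Hypothetical rule: a steadily growing queue of compaction jobs
    suggests the group needs more Compactor processes."""
    return is_increasing(queued_jobs_samples)
```

The samples would be grouped by the resource group tag before being passed in, so that each group is evaluated on its own queue.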
+## Possible Deployment Scenarios
+
+With the new features described above, many deployment scenarios are 
possible. We highlight a few of them below.
+
+### Scenario 1
+
+The diagram below depicts the simplest deployment where all tables operate 
within the default group of resources. There are no ScanServers, so only 
immediate scans are available. Tablets for all tables are assigned, hosted, and 
balanced within the same set of TabletServers. Major compactions for all 
tablets are executed within the same set of Compactors.
+
+![Scenario1](/images/blog/202409_accumulo4/AccumuloDeployment1.png)
+
+### Scenario 2
+
+The diagram below depicts a slightly more complicated deployment where 
ScanServers are also being used to support eventual scans against Tablets.
+
+![Scenario2](/images/blog/202409_accumulo4/AccumuloDeployment2.png)
+
+### Scenario 3
+
+The diagram below depicts a deployment where multiple compactor groups are 
configured in the default resource group. The compaction configuration enables 
the user, for example, to send compactions to different groups based on the sum 
of the input file sizes. In this example we have two additional Compactor 
groups, default-small and default-large, that can be configured for some user 
tables. The Compactor group in the default resource group would be used for all 
other tables. See the [Ra [...]
+
+![Scenario3](/images/blog/202409_accumulo4/AccumuloDeployment3.png)
+
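The size-based dispatch mentioned in Scenario 3 can be pictured with the sketch below. It is not the Accumulo compaction configuration itself: the function name and the 128 MiB threshold are hypothetical, and only the group names `default-small` and `default-large` come from the post.

```python
from typing import Iterable

SMALL_GROUP = "default-small"
LARGE_GROUP = "default-large"

# Hypothetical cut-off; the post does not state an actual value.
SIZE_THRESHOLD_BYTES = 128 * 1024 * 1024


def choose_compactor_group(input_file_sizes: Iterable[int],
                           threshold: int = SIZE_THRESHOLD_BYTES) -> str:
    """Pick a Compactor group from the sum of the input file sizes (bytes)."""
    total = sum(input_file_sizes)
    return SMALL_GROUP if total < threshold else LARGE_GROUP
```

Keeping small and large jobs in separate groups means a long-running compaction of large files does not hold up many short ones, and each group can be scaled independently.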
+### Scenario 4
+
+The diagram below depicts a scenario where a second resource group, app1, has 
been created to service Tablets for Tables associated with a particular 
application. The application can perform eventual and immediate consistency 
scans, and performs live ingest into the Tables, so it needs both ScanServers 
and TabletServers. The user would configure their application to perform 
eventual scans using the instructions in the ScanServer blog post, configure 
major compactions to run in the app1 Co [...]
+
+![Scenario4](/images/blog/202409_accumulo4/AccumuloDeployment4.png)
+
+### Scenario 5
+
+The diagram below is a slight modification to the prior scenario that shows 
the same app1 resource group, but without TabletServers. In this situation the 
associated application is only performing bulk imports and eventual scans on 
table data.
+
+![Scenario5](/images/blog/202409_accumulo4/AccumuloDeployment5.png)
+
+## Current State and Path Forward
+
+Version 4.0.0-SNAPSHOT has been merged into the main branch. We have added 
over 100 new integration tests and all of the old and new tests are passing. We 
are planning on performing testing at increasing scales to determine what other 
architectural changes are needed. For example, we have discussed the 
possibility of needing to run multiple active Manager processes as the Manager 
is now responsible for performing more functions (CompactionCoordinator, more 
Fate operations, etc.).
+
diff --git a/images/blog/202409_accumulo4/AccumuloDeployment1.png b/images/blog/202409_accumulo4/AccumuloDeployment1.png
new file mode 100644
index 000000000..9af24857c
Binary files /dev/null and b/images/blog/202409_accumulo4/AccumuloDeployment1.png differ
diff --git a/images/blog/202409_accumulo4/AccumuloDeployment2.png b/images/blog/202409_accumulo4/AccumuloDeployment2.png
new file mode 100644
index 000000000..183ff8b16
Binary files /dev/null and b/images/blog/202409_accumulo4/AccumuloDeployment2.png differ
diff --git a/images/blog/202409_accumulo4/AccumuloDeployment3.png b/images/blog/202409_accumulo4/AccumuloDeployment3.png
new file mode 100644
index 000000000..1e6eeb93b
Binary files /dev/null and b/images/blog/202409_accumulo4/AccumuloDeployment3.png differ
diff --git a/images/blog/202409_accumulo4/AccumuloDeployment4.png b/images/blog/202409_accumulo4/AccumuloDeployment4.png
new file mode 100644
index 000000000..fd4595aa8
Binary files /dev/null and b/images/blog/202409_accumulo4/AccumuloDeployment4.png differ
diff --git a/images/blog/202409_accumulo4/AccumuloDeployment5.png b/images/blog/202409_accumulo4/AccumuloDeployment5.png
new file mode 100644
index 000000000..751cac58d
Binary files /dev/null and b/images/blog/202409_accumulo4/AccumuloDeployment5.png differ
