ACCUMULO-2925 Add warning about server-assigned timestamps with replication

Leave a note about different updates to equal keys that are
assigned the same timestamp by the server.


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/4d7e90ae
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/4d7e90ae
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/4d7e90ae

Branch: refs/heads/master
Commit: 4d7e90aeef3a6de6a36a30a188d5c1bc564ade3a
Parents: 0676057
Author: Josh Elser <els...@apache.org>
Authored: Thu Jun 19 17:58:10 2014 -0700
Committer: Josh Elser <els...@apache.org>
Committed: Thu Jun 19 17:58:10 2014 -0700

----------------------------------------------------------------------
 docs/src/main/asciidoc/chapters/replication.txt | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/4d7e90ae/docs/src/main/asciidoc/chapters/replication.txt
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/chapters/replication.txt b/docs/src/main/asciidoc/chapters/replication.txt
index 8755e24..5d24649 100644
--- a/docs/src/main/asciidoc/chapters/replication.txt
+++ b/docs/src/main/asciidoc/chapters/replication.txt
@@ -361,3 +361,23 @@ primary and peer. As such, the SummingCombiner wouldn't be recommended on a tabl
 While there are changes that could be made to the replication implementation which could attempt to mitigate this risk,
 presently, it is not recommended to configure Iterators or Combiners which are not idempotent to support cases where
 inaccuracy of aggregations is not acceptable.
+
+==== Server-Assigned Timestamps
+
+Accumulo can assign a timestamp to updates made to a table when the client does not provide one. This is a
+very useful feature, as it reduces the amount of code a client must write and also gives some notion of ordering to the
+updates that were made to a table (in addition to solving some very problematic Accumulo implementation details).
+However, replicating Mutations that were created with a server-assigned timestamp can be very problematic. To understand
+this, we must first start at the BatchWriter.
+
+To allow for efficient ingest into Accumulo, the BatchWriter will collect many mutations, group them into batches and
+send them to the correct server to be applied to the appropriate Tablet. For each Mutation in that batch that the server
+receives, the server will set a timestamp that is at least as large as the last timestamp (to account for clock skew). In short,
+this means that all of the Mutations in this batch will get the same timestamp and be deduplicated in a certain order
+via the in-memory map and recorded in the write-ahead log.
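
The timestamp rule described above can be illustrated with a small, self-contained sketch. It is not Accumulo's implementation: the class `BatchTimestampSketch` and its methods are hypothetical names, and the wall-clock value is passed in as a parameter so the behavior is easy to follow. It only shows the idea of one timestamp per batch that never moves backwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not a class from Accumulo.
final class BatchTimestampSketch {
    // Last timestamp handed out for this (imaginary) tablet.
    private long lastTimestamp;

    BatchTimestampSketch(long initialTimestamp) {
        this.lastTimestamp = initialTimestamp;
    }

    // Choose one timestamp for a whole batch. It is never smaller than the
    // previous one, even if the wall clock appears to have gone backwards.
    long assign(long wallClockMillis) {
        lastTimestamp = Math.max(lastTimestamp, wallClockMillis);
        return lastTimestamp;
    }

    // Every mutation in the batch receives the same timestamp.
    List<Long> stampBatch(int batchSize, long wallClockMillis) {
        long ts = assign(wallClockMillis);
        List<Long> stamps = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            stamps.add(ts);
        }
        return stamps;
    }
}
```

Under these assumptions, two updates to the same Key that arrive in one batch carry identical timestamps, so the timestamp alone cannot say which of them is newer.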
+
+The problem is that these updates could be replayed on the peer in different commit sessions, which means that they
+could end up in different RFiles on disk (separate minor compactions). Because of this, mutations with server-assigned
+timestamps which are written within the same batch may be applied in a different order on a peer. In
+the case where a user might submit multiple updates for the same Key in rapid succession, the user should ensure proper
+timestamps are set at the client.
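
To make the ordering hazard concrete, here is a minimal sketch in plain Java. It does not use the Accumulo API. The class `ReplayOrderSketch`, the nested `Update` type, and the tie-breaking rule (for equal timestamps, whichever update is applied last wins) are all assumptions chosen for illustration. In a real client, the timestamp would be supplied through one of the `Mutation.put` overloads that accept a `long` timestamp.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a class from Accumulo.
final class ReplayOrderSketch {
    // One update to a key, carrying the timestamp it was written with.
    static final class Update {
        final String key;
        final long timestamp;
        final String value;

        Update(String key, long timestamp, String value) {
            this.key = key;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Toy "newest version wins" store. Assumption: when two updates to the
    // same key carry equal timestamps, the one applied later is the one kept.
    static Map<String, String> replay(List<Update> updates) {
        Map<String, Update> latest = new HashMap<>();
        for (Update u : updates) {
            Update current = latest.get(u.key);
            if (current == null || u.timestamp >= current.timestamp) {
                latest.put(u.key, u);
            }
        }
        Map<String, String> visible = new HashMap<>();
        for (Map.Entry<String, Update> e : latest.entrySet()) {
            visible.put(e.getKey(), e.getValue().value);
        }
        return visible;
    }
}
```

In this model, replaying two updates to `row1` that share a timestamp yields a different visible value depending on replay order, while giving the second update a strictly larger client-side timestamp yields the same value in either order.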
