Repository: accumulo
Updated Branches:
  refs/heads/master da3534115 -> 4b1196257


ACCUMULO-3502 Update documentation about "server timestamps"

This started as a realization about server-assigned timestamps,
but was really meant to warn that the non-determinism of multiple
updates to the same exact key is independent of replicas and the primary.


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/4b119625
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/4b119625
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/4b119625

Branch: refs/heads/master
Commit: 4b1196257070a1ab788372f03725dc0425567a63
Parents: da35341
Author: Josh Elser <els...@apache.org>
Authored: Wed Jan 21 21:00:16 2015 -0500
Committer: Josh Elser <els...@apache.org>
Committed: Wed Jan 21 21:00:16 2015 -0500

----------------------------------------------------------------------
 docs/src/main/asciidoc/chapters/replication.txt | 34 +++++++++-----------
 1 file changed, 15 insertions(+), 19 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/4b119625/docs/src/main/asciidoc/chapters/replication.txt
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/chapters/replication.txt b/docs/src/main/asciidoc/chapters/replication.txt
index 5d24649..48f6ffa 100644
--- a/docs/src/main/asciidoc/chapters/replication.txt
+++ b/docs/src/main/asciidoc/chapters/replication.txt
@@ -362,22 +362,18 @@ While there are changes that could be made to the replication implementation whi
 presently, it is not recommended to configure Iterators or Combiners which are not idempotent to support cases where
 inaccuracy of aggregations is not acceptable.
 
-==== Server-Assigned Timestamps
-
-Accumulo has the ability to, when not provided by the client, assign a timestamp to updates made to a table. This is a
-very useful feature as it reduces the amount of code a client must write and also gives some notion of ordering to the
-updates that were made to a table (in addition to some solving some very problematic Accumulo implementation details).
-However, replicating Mutations that were created with a server-assigned timestamp can be very problematic. To understand
-this, we must first start at the BatchWriter.
-
-To allow for efficient ingest into Accumulo, the BatchWriter will collect many mutations, group them into batches and
-send them to the correct server to be applied to the appropriate Tablet. For each Mutation in that batch that the server
-receives, the server will set a timestamp that is at least as large as the last timestamp (to account for clock skew). In short,
-this means that all of the Mutations in this batch will get the same timestamp and be deduplicated in a certain order
-via the in-memory map and recorded in the write-ahead log.
-
-The problem is that these updates could be replayed on the remote in different commit sessions, which means that they
-could result in different RFiles on disk (separate minor-compactions). Because of this, mutations with server-assigned
-timestamps which are written within the same batch have the possibility to be applied in a different order on a peer. In
-the case where a user might submit multiple updates for the same Key in rapid succession, the user should ensure proper
-timestamps are set at the client.
+==== Duplicate Keys
+
+In Accumulo, when more than one key exists that are exactly the same, keys that are equal down to the timestamp,
+the retained value is non-deterministic. Replication introduces another level of non-determinism in this case.
+For a table that is being replicated and has multiple equal keys with different values inserted into it, the final
+value in that table on the primary instance is not guaranteed to be the final value on all replicas.
+
+For example, say the values that were inserted on the primary instance were +value1+ and +value2+ and the final
+value was +value1+, it is not guaranteed that all replicas will have +value1+ like the primary. The final value is
+non-deterministic for each instance.
+
+As is the recommendation without replication enabled, if multiple values for the same key (sans timestamp) are written to
+Accumulo, it is strongly recommended that the value in the timestamp properly reflects the intended version by
+the client. That is to say, newer values inserted into the table should have larger timestamps. If the time between
+writing updates to the same key is significant (order minutes), this concern can likely be ignored.
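
The recommendation in the new "Duplicate Keys" text can be illustrated with a small, self-contained Java sketch. This is a toy model and not Accumulo code: the class `DuplicateKeySketch`, its `Update` type, and both methods are hypothetical names invented for illustration. The tie-breaking rule (the last update applied wins when timestamps are equal) is an assumption chosen only to make the non-determinism visible; the documentation above states only that the retained value is non-deterministic. In a real client, the equivalent step would presumably be to use the `Mutation.put` overloads that accept an explicit `long timestamp`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model (not Accumulo code): many timestamped updates to one logical key. */
class DuplicateKeySketch {

  /** One update; row, column and visibility are assumed identical across updates. */
  static final class Update {
    final long timestamp;
    final String value;

    Update(long timestamp, String value) {
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  /**
   * Returns the value left when only the newest version is kept.
   * ASSUMPTION: on equal timestamps the update applied last wins. This stands in
   * for the unspecified (non-deterministic) behavior described in the docs.
   */
  static String retainedValue(List<Update> applied) {
    Update winner = null;
    for (Update u : applied) {
      if (winner == null || u.timestamp >= winner.timestamp) {
        winner = u;
      }
    }
    return winner == null ? null : winner.value;
  }

  /** Applies the same updates in reverse order, as a differently ordered replay might. */
  static String retainedValueReversed(List<Update> applied) {
    List<Update> reversed = new ArrayList<>(applied);
    Collections.reverse(reversed);
    return retainedValue(reversed);
  }

  public static void main(String[] args) {
    // Equal timestamps: the result depends on the order of application.
    List<Update> sameTs = Arrays.asList(new Update(5L, "value1"), new Update(5L, "value2"));
    System.out.println(retainedValue(sameTs) + " vs " + retainedValueReversed(sameTs));

    // Newer value carries a larger timestamp: the result is order-independent.
    List<Update> distinctTs = Arrays.asList(new Update(5L, "value1"), new Update(6L, "value2"));
    System.out.println(retainedValue(distinctTs) + " vs " + retainedValueReversed(distinctTs));
  }
}
```

In this model, two instances that apply the equal-timestamp updates in different orders disagree on the final value, while the updates with increasing timestamps converge on the same value regardless of order.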
