accumulo git commit: ACCUMULO-3500 Update replication docs for bulk imports

elserj Thu, 22 Jan 2015 07:45:35 -0800

Repository: accumulo
Updated Branches:
  refs/heads/master 4b1196257 -> 80805545e



ACCUMULO-3500 Update replication docs for bulk imports


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/80805545
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/80805545
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/80805545

Branch: refs/heads/master
Commit: 80805545e7617bed41bfd5f50c0ba8032fd71d91
Parents: 4b11962
Author: Josh Elser <els...@apache.org>
Authored: Thu Jan 22 10:39:41 2015 -0500
Committer: Josh Elser <els...@apache.org>
Committed: Thu Jan 22 10:39:41 2015 -0500

----------------------------------------------------------------------
 docs/src/main/asciidoc/chapters/replication.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/80805545/docs/src/main/asciidoc/chapters/replication.txt
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/chapters/replication.txt b/docs/src/main/asciidoc/chapters/replication.txt
index 48f6ffa..69bb3c4 100644
--- a/docs/src/main/asciidoc/chapters/replication.txt
+++ b/docs/src/main/asciidoc/chapters/replication.txt
@@ -377,3 +377,13 @@ As is the recommendation without replication enabled, if multiple values for the
 Accumulo, it is strongly recommended that the value in the timestamp properly reflects the intended version by
 the client. That is to say, newer values inserted into the table should have larger timestamps. If the time between
 writing updates to the same key is significant (order minutes), this concern can likely be ignored.
+
+==== Bulk Imports
+
+Currently, files that are bulk imported into a table configured for replication are not replicated. There is no
+technical reason why it was not implemented; it was simply omitted from the initial implementation. This is considered a
+fair limitation because bulk importing generated files into multiple locations is much simpler than bifurcating "live" ingest
+data into two instances. Given some existing bulk import process which creates files and then imports them into an
+Accumulo instance, it is trivial to copy those files to a new HDFS instance and import them into another Accumulo
+instance using the same process. Hadoop's +distcp+ command provides an easy way to copy large amounts of data to another
+HDFS instance, which makes the problem of duplicating bulk imports very easy to solve.
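
For illustration only (not part of the commit): a minimal sketch of the copy step described in the new section. The NameNode hosts and the directory below are hypothetical placeholders, and the helper only prints the distcp invocation so it can be reviewed before being run against a real cluster.

```shell
#!/bin/sh
# Hypothetical locations of one batch of bulk-import files; these names
# are assumptions for this sketch and do not come from the Accumulo docs.
SRC_DIR="hdfs://primary-nn:8020/bulk/batch-0001"
DST_DIR="hdfs://peer-nn:8020/bulk/batch-0001"

# Build the command that copies the generated files from the primary
# HDFS instance to the peer's HDFS instance. It is printed, not executed.
distcp_command() {
  printf 'hadoop distcp %s %s\n' "$1" "$2"
}

distcp_command "$SRC_DIR" "$DST_DIR"
```

Once the copy has completed, the existing bulk import process would be pointed at the copied directory on the peer instance, as the section describes.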
