Repository: accumulo
Updated Branches:
  refs/heads/master 452732c8a -> 7a0788b6f


http://git-wip-us.apache.org/repos/asf/accumulo/blob/e99ec9f0/docs/src/main/asciidoc/chapters/troubleshooting.txt
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/chapters/troubleshooting.txt 
b/docs/src/main/asciidoc/chapters/troubleshooting.txt
deleted file mode 100644
index 0f01d46..0000000
--- a/docs/src/main/asciidoc/chapters/troubleshooting.txt
+++ /dev/null
@@ -1,845 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-## Troubleshooting
-
-### Logs
-
-*Q*: The tablet server does not seem to be running!? What happened?
-
-Accumulo is a distributed system.  It is supposed to run on remote
-equipment, across hundreds of computers.  Each program that runs on
-these remote computers writes down events as they occur, into a local
-file. By default, this is defined in +conf/accumulo-env.sh+ as +ACCUMULO_LOG_DIR+.
-
-*A*: Look in the +$ACCUMULO_LOG_DIR/tserver*.log+ file.  Specifically, check 
the end of the file.
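-
-For example, to look at recent activity and any errors (exact file names vary with the host name):
-
-    $ tail -100 $ACCUMULO_LOG_DIR/tserver*.log
-    $ grep -i -e ERROR -e FATAL $ACCUMULO_LOG_DIR/tserver*.log | tail -20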
-
-*Q*: The tablet server did not start and the debug log does not exist!  What happened?
-
-When the individual programs are started, the stdout and stderr output
-of these programs is stored in +.out+ and +.err+ files in
-+$ACCUMULO_LOG_DIR+.  Often, when there are missing configuration
-options, files or permissions, messages will be left in these files.
-
-*A*: Probably a start-up problem.  Look in +$ACCUMULO_LOG_DIR/tserver*.err+
-
-### Monitor
-
-*Q*: Accumulo is not working, what's wrong?
-
-There's a small web server that collects information about all the
-components that make up a running Accumulo instance. It will highlight
-unusual or unexpected conditions.
-
-*A*: Point your browser to the monitor (typically the master host, on port 
9995).  Is anything red or yellow?
-
-*Q*: My browser is reporting connection refused, and I cannot get to the 
monitor
-
-The monitor program's output is also written to .err and .out files in
-the +$ACCUMULO_LOG_DIR+. Look for problems in these files if the
-+$ACCUMULO_LOG_DIR/monitor*.log+ file does not exist.
-
-*A*: The monitor program is probably not running.  Check the log files for 
errors.
-
-*Q*: My browser hangs trying to talk to the monitor.
-
-Your browser needs to be able to reach the monitor program.  Often
-large clusters are firewalled, or use a VPN for internal
-communications. You can use SSH to proxy your browser to the cluster,
-or consult with your system administrator to gain access to the server
-from your browser.
-
-It is sometimes helpful to use a text-only browser to sanity-check the
-monitor while on the machine running the monitor:
-
-    $ links http://localhost:9995
-
-*A*: Verify that you are not firewalled from the monitor if it is running on a 
remote host.
-
-*Q*: The monitor responds, but there are no numbers for tservers and tables.  
The summary page says the master is down.
-
-The monitor program gathers all the details about the master and the
-tablet servers through the master. It will be mostly blank if the
-master is down.
-
-*A*: Check for a running master.
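-
-One quick check, assuming +jps+ from the JDK is available on the master host (the exact process name can vary between versions):
-
-    $ jps -m | grep -i master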
-
-### HDFS
-
-Accumulo reads and writes to the Hadoop Distributed File System.
-Accumulo needs this file system available at all times for normal operations.
-
-*Q*: Accumulo is having problems ``getting a block blk_1234567890123.'' How do 
I fix it?
-
-This troubleshooting guide does not cover HDFS, but in general, you
-want to make sure that all the datanodes are running and an fsck check
-finds the file system clean:
-
-    $ hadoop fsck /accumulo
-
-You can use:
-
-    $ hadoop fsck /accumulo/path/to/corrupt/file -locations -blocks -files
-
-to locate the block references of individual corrupt files and use those
-references to search the name node and individual data node logs to determine which
-servers those blocks have been assigned to, and then try to fix any underlying file
-system issues on those nodes.
-
-On a larger cluster, you may need to increase the number of Xcievers for HDFS 
DataNodes:
-
-[source,xml]
-<property>
-    <name>dfs.datanode.max.xcievers</name>
-    <value>4096</value>
-</property>
-
-*A*: Verify HDFS is healthy, check the datanode logs.
-
-### Zookeeper
-
-*Q*: +accumulo init+ is hanging.  It says something about talking to zookeeper.
-
-Zookeeper is also a distributed service.  You will need to ensure that
-it is up.  You can run the zookeeper command line tool to connect to
-any one of the zookeeper servers:
-
-    $ zkCli.sh -server zoohost
-    ...
-    [zk: zoohost:2181(CONNECTED) 0]
-
-It is important to see the word +CONNECTED+!  If you only see
-+CONNECTING+ you will need to diagnose zookeeper errors.
-
-*A*: Check to make sure that zookeeper is up, and that
-+accumulo-site.xml+ has been pointed to
-your zookeeper server(s).
-
-*Q*: Zookeeper is running, but it does not say +CONNECTED+
-
-Zookeeper processes talk to each other to elect a leader.  All updates
-go through the leader and propagate to a majority of all the other
-nodes.  If a majority of the nodes cannot be reached, zookeeper will
-not allow updates.  Zookeeper also limits the number of connections to a
-server from any other single host.  By default, this limit can be as small as 10
-and can be reached in some everything-on-one-machine test configurations.
-
-You can check the election status and connection status of clients by
-asking the zookeeper nodes for their status.  You connect to zookeeper
-and ask it with the four-letter +stat+ command:
-
-----
-$ nc zoohost 2181
-stat
-Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
-Clients:
- /127.0.0.1:58289[0](queued=0,recved=1,sent=0)
- /127.0.0.1:60231[1](queued=0,recved=53910,sent=53915)
-
-Latency min/avg/max: 0/5/3008
-Received: 1561459
-Sent: 1561592
-Connections: 2
-Outstanding: 0
-Zxid: 0x621a3b
-Mode: standalone
-Node count: 22524
-----
-
-*A*: Check zookeeper status, verify that it has a quorum, and has not exceeded 
maxClientCnxns.
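-
-If the connection limit is the culprit, +maxClientCnxns+ can be raised in +zoo.cfg+ on each
-ZooKeeper server and ZooKeeper restarted (the value below is only an example):
-
-    maxClientCnxns=100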
-
-*Q*: My tablet server crashed!  The logs say that it lost its zookeeper lock.
-
-Tablet servers reserve a lock in zookeeper to maintain their ownership
-over the tablets that have been assigned to them.  Part of their
-responsibility for keeping the lock is to send zookeeper a keep-alive
-message periodically.  If the tablet server fails to send a message in
-a timely fashion, zookeeper will remove the lock and notify the tablet
-server.  If the tablet server does not receive a message from
-zookeeper, it will assume its lock has been lost, too.  If a tablet
-server loses its lock, it kills itself: everything assumes it is dead
-already.
-
-*A*: Investigate why the tablet server did not send a timely message to
-zookeeper.
-
-#### Keeping the tablet server lock
-
-*Q*: My tablet server lost its lock.  Why?
-
-The primary reason a tablet server loses its lock is that it has been pushed 
into swap.
-
-A large java program (like the tablet server) may have a large portion
-of its memory image unused.  The operating system will favor pushing
-this allocated, but unused memory into swap so that the memory can be
-re-used as a disk buffer.  When the java virtual machine decides to
-access this memory, the OS will begin flushing disk buffers to return that
-memory to the VM.  This can cause the entire process to block long
-enough for the zookeeper lock to be lost.
-
-*A*: Configure your system to reduce the kernel parameter _swappiness_ from 
the default (60) to zero.
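-
-On most Linux systems this can be done with +sysctl+; a minimal sketch (file locations vary by distribution):
-
-    # check the current value
-    $ cat /proc/sys/vm/swappiness
-    # change it for the running kernel
-    $ sudo sysctl -w vm.swappiness=0
-    # persist the setting across reboots
-    $ echo "vm.swappiness = 0" | sudo tee -a /etc/sysctl.conf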
-
-*Q*: My tablet server lost its lock, and I have already set swappiness to
-zero.  Why?
-
-Be careful not to over-subscribe memory.  This can be easy to do if
-your accumulo processes run on the same nodes as hadoop's map-reduce
-framework.  Remember to add up:
-
-* size of the JVM for the tablet server
-* size of the in-memory map, if using the native map implementation
-* size of the JVM for the data node
-* size of the JVM for the task tracker
-* size of the JVM times the maximum number of mappers and reducers
-* size of the kernel and any support processes
-
-If a 16G node can run 2 mappers and 2 reducers, and each can be 2G,
-then there is only 8G for the data node, tserver, task tracker and OS.
-
-*A*: Reduce the memory footprint of each component until it fits comfortably.
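-
-The relevant knobs are the JVM heap sizes set in +accumulo-env.sh+ and, if the native maps
-are in use, the +tserver.memory.maps.max+ property. A sketch only; the value below is
-illustrative and must be sized for your hosts:
-
-[source,xml]
-<property>
-    <name>tserver.memory.maps.max</name>
-    <value>1G</value>
-</property>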
-
-*Q*: My tablet server lost its lock, swappiness is zero, and my node has lots 
of unused memory!
-
-The JVM memory garbage collector may fall behind and cause a
-"stop-the-world" garbage collection. On a large memory virtual
-machine, this collection can take a long time.  This happens more
-frequently when the JVM is getting low on free memory.  Check the logs
-of the tablet server.  You will see lines like this:
-
-    2013-06-20 13:43:20,607 [tabletserver.TabletServer] DEBUG: gc 
ParNew=0.00(+0.00) secs
-        ConcurrentMarkSweep=0.00(+0.00) secs 
freemem=1,868,325,952(+1,868,325,952) totalmem=2,040,135,680
-
-When +freemem+ becomes small relative to the amount of memory
-needed, the JVM will spend more time finding free memory than
-performing work.  This can cause long delays in sending keep-alive
-messages to zookeeper.
-
-*A*: Ensure the tablet server JVM is not running low on memory.
-
-*Q*: I'm seeing errors in tablet server logs that include the words 
"MutationsRejectedException" and "# constraint violations: 1". Moments after 
that the server died.
-
-The error you are seeing is part of a failing tablet server scenario.
-This is a bit complicated, so name two of your tablet servers A and B.
-
-Tablet server A is hosting a tablet, let's call it a-tablet.
-
-Tablet server B is hosting a metadata tablet, let's call it m-tablet.
-
-m-tablet records the information about a-tablet, for example, the names of the 
files it is using to store data.
-
-When A ingests some data, it eventually flushes the updates from memory to a 
file.
-
-Tablet server A then writes this new information to m-tablet, on Tablet server 
B.
-
-Here's a likely failure scenario:
-
-Tablet server A does not have enough memory for all the processes running on 
it.
-The operating system sees a large chunk of the tablet server being unused, and 
swaps it out to disk to make room for other processes.
-Tablet server A does a java memory garbage collection, which causes it to 
start using all the memory allocated to it.
-As the server starts pulling data from swap, it runs very slowly.
-It fails to send the keep-alive messages to zookeeper in a timely fashion, and it loses its zookeeper session.
-
-But, it's running so slowly, that it takes a moment to realize it should no 
longer be hosting tablets.
-
-The thread that is flushing a-tablet memory attempts to update m-tablet with 
the new file information.
-
-Fortunately there's a constraint on m-tablet.
-Mutations to the metadata table must contain a valid zookeeper session.
-This prevents tablet server A from making updates to m-tablet when it no longer has the right to host the tablet.
-
-The "MutationsRejectedException" error is from tablet server A making an 
update to tablet server B's m-tablet.
-It's getting a constraint violation: tablet server A has lost its zookeeper 
session, and will fail momentarily.
-
-*A*: Ensure that memory is not over-allocated.  Monitor swap usage, or turn 
swap off.
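-
-Standard OS tools are enough to watch swap activity on a tablet server host, for example:
-
-    $ free -m
-    $ vmstat 5
-    # disable swap entirely, if that is acceptable for the host
-    $ sudo swapoff -a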
-
-*Q*: My accumulo client is getting a MutationsRejectedException. The monitor 
is displaying "No Such SessionID" errors.
-
-When your client starts sending mutations to accumulo, it creates a session. 
Once the session is created,
-mutations are streamed to accumulo, without acknowledgement, against this 
session.  Once the client is done,
-it will close the session, and get an acknowledgement.
-
-If the client fails to communicate with accumulo, it will release the session, 
assuming that the client has died.
-If the client then attempts to send more mutations against the session, you will see "No Such SessionID" errors on
-the server, and MutationsRejectedExceptions in the client.
-
-The client library should be either actively using the connection to the 
tablet servers,
-or closing the connection and sessions. If the session times out, something is 
causing your client
-to pause.
-
-The most frequent sources of these pauses are java garbage collection pauses
-due to the JVM running out of memory, or being swapped out to disk.
-
-*A*: Ensure your client has adequate memory and is not being swapped out to 
disk.
-
-### Tools
-
-The accumulo script can be used to run classes from the command line.
-This section shows how a few of the utilities work, but there are many
-more.
-
-There's a class that will examine an accumulo storage file and print
-out basic metadata.
-
-----
-$ accumulo org.apache.accumulo.core.file.rfile.PrintInfo 
/accumulo/tables/1/default_tablet/A000000n.rf
-2013-07-16 08:17:14,778 [util.NativeCodeLoader] INFO : Loaded the 
native-hadoop library
-Locality group         : <DEFAULT>
-        Start block          : 0
-        Num   blocks         : 1
-        Index level 0        : 62 bytes  1 blocks
-        First key            : 288be9ab4052fe9e 
span:34078a86a723e5d3:3da450f02108ced5 [] 1373373521623 false
-        Last key             : start:13fc375709e id:615f5ee2dd822d7a [] 
1373373821660 false
-        Num entries          : 466
-        Column families      : [waitForCommits, start, md major compactor 1, 
md major compactor 2, md major compactor 3,
-                                 bringOnline, prep, md major compactor 4, md 
major compactor 5, md root major compactor 3,
-                                 minorCompaction, wal, compactFiles, md root 
major compactor 4, md root major compactor 1,
-                                 md root major compactor 2, compact, id, 
client:update, span, update, commit, write,
-                                 majorCompaction]
-
-Meta block     : BCFile.index
-      Raw size             : 4 bytes
-      Compressed size      : 12 bytes
-      Compression type     : gz
-
-Meta block     : RFile.index
-      Raw size             : 780 bytes
-      Compressed size      : 344 bytes
-      Compression type     : gz
-----
-
-When trying to diagnose problems related to key size, the +PrintInfo+ tool can 
provide a histogram of the individual key sizes:
-
-    $ accumulo org.apache.accumulo.core.file.rfile.PrintInfo --histogram 
/accumulo/tables/1/default_tablet/A000000n.rf
-    ...
-    Up to size      count      %-age
-             10 :        222  28.23%
-            100 :        244  71.77%
-           1000 :          0   0.00%
-          10000 :          0   0.00%
-         100000 :          0   0.00%
-        1000000 :          0   0.00%
-       10000000 :          0   0.00%
-      100000000 :          0   0.00%
-     1000000000 :          0   0.00%
-    10000000000 :          0   0.00%
-
-Likewise, +PrintInfo+ will dump the key-value pairs and show you the contents 
of the RFile:
-
-    $ accumulo org.apache.accumulo.core.file.rfile.PrintInfo --dump 
/accumulo/tables/1/default_tablet/A000000n.rf
-    row columnFamily:columnQualifier [visibility] timestamp deleteFlag -> Value
-    ...
-
-*Q*: Accumulo is not showing me any data!
-
-*A*: Do you have your auths set so that it matches your visibilities?
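-
-You can compare your authorizations to the visibilities in the data from the shell; a quick
-check (the authorization names and table below are placeholders):
-
-    shell> getauths
-    shell> setauths -s vis1,vis2
-    shell> scan -t mytable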
-
-*Q*: What are my visibilities?
-
-*A*: Use +PrintInfo+ on a representative file to get some idea of the 
visibilities in the underlying data.
-
-Note that +PrintInfo+ is an administrative tool and can only
-be used by someone who can access the underlying Accumulo data. It
-does not provide the normal access controls in Accumulo.
-
-If you would like to back up, or otherwise examine, the contents of Zookeeper, there are commands to dump and load to/from XML.
-
-    $ accumulo org.apache.accumulo.server.util.DumpZookeeper --root /accumulo 
>dump.xml
-    $ accumulo org.apache.accumulo.server.util.RestoreZookeeper --overwrite < 
dump.xml
-
-*Q*: How can I get the information in the monitor page for my cluster 
monitoring system?
-
-*A*: Use GetMasterStats:
-
-    $ accumulo org.apache.accumulo.test.GetMasterStats | grep Load
-     OS Load Average: 0.27
-
-*Q*: The monitor page is showing an offline tablet.  How can I find out which 
tablet it is?
-
-*A*: Use FindOfflineTablets:
-
-    $ accumulo org.apache.accumulo.server.util.FindOfflineTablets
-    2<<@(null,null,localhost:9997) is UNASSIGNED  #walogs:2
-
-Here's what the output means:
-
-+2<<+::
-    This is the tablet from (-inf, pass:[+]inf) for the
-    table with id 2.  The command +tables -l+ in the shell will show table ids 
for
-    tables.
-
-+@(null, null, localhost:9997)+::
-    Location information.  The
-    format is +@(assigned, hosted, last)+.  In this case, the
-    tablet has not been assigned, is not hosted anywhere, and was once
-    hosted on localhost.
-
-+#walogs:2+::
-     The number of write-ahead logs that this tablet requires for recovery.
-
-An unassigned tablet with write-ahead logs is probably waiting for
-logs to be sorted for efficient recovery.
-
-*Q*: How can I be sure that the metadata tables are up and consistent?
-
-*A*: +CheckForMetadataProblems+ will verify the start/end of
-every tablet matches, and the start and stop for the table is empty:
-
-    $ accumulo org.apache.accumulo.server.util.CheckForMetadataProblems -u 
root --password
-    Enter the connection password:
-    All is well for table !0
-    All is well for table 1
-
-*Q*: My hadoop cluster has lost a file due to a NameNode failure.  How can I 
remove the file?
-
-*A*: There's a utility that will check every file reference and ensure
-that the file exists in HDFS.  Optionally, it will remove the
-reference:
-
-    $ accumulo org.apache.accumulo.server.util.RemoveEntriesForMissingFiles -u 
root --password
-    Enter the connection password:
-    2013-07-16 13:10:57,293 [util.RemoveEntriesForMissingFiles] INFO : File 
/accumulo/tables/2/default_tablet/F0000005.rf
-     is missing
-    2013-07-16 13:10:57,296 [util.RemoveEntriesForMissingFiles] INFO : 1 files 
of 3 missing
-
-*Q*: I have many entries in zookeeper for old instances I no longer need.  How 
can I remove them?
-
-*A*: Use CleanZookeeper:
-
-    $ accumulo org.apache.accumulo.server.util.CleanZookeeper
-
-This command will not delete the instance pointed to by the local 
+accumulo-site.xml+ file.
-
-*Q*: I need to decommission a node.  How do I stop the tablet server on it?
-
-*A*: Use the admin command:
-
-    $ accumulo admin stop hostname:9997
-    2013-07-16 13:15:38,403 [util.Admin] INFO : Stopping server 
12.34.56.78:9997
-
-*Q*: I cannot login to a tablet server host, and the tablet server will not 
shut down.  How can I kill the server?
-
-*A*: Sometimes you can kill a "stuck" tablet server by deleting its lock in 
zookeeper:
-
-    $ accumulo org.apache.accumulo.server.util.TabletServerLocks --list
-                      127.0.0.1:9997 TSERV_CLIENT=127.0.0.1:9997
-    $ accumulo org.apache.accumulo.server.util.TabletServerLocks -delete 
127.0.0.1:9997
-    $ accumulo org.apache.accumulo.server.util.TabletServerLocks -list
-                      127.0.0.1:9997             null
-
-You can find the master and instance id for any accumulo instances using the 
same zookeeper instance:
-
-----
-$ accumulo org.apache.accumulo.server.util.ListInstances
-INFO : Using ZooKeepers localhost:2181
-
- Instance Name       | Instance ID                          | Master
----------------------+--------------------------------------+-------------------------------
-              "test" | 6140b72e-edd8-4126-b2f5-e74a8bbe323b |                
127.0.0.1:9999
-----
-
-[[metadata]]
-### System Metadata Tables
-
-Accumulo tracks information about tables in metadata tables. The metadata for
-most tables is contained within the metadata table in the accumulo namespace,
-while metadata for the metadata table itself is contained in the root table in
-the accumulo namespace. The root table is composed of a single tablet, which
-does not split, so it is also called the root tablet. Information about the
-root table, such as its location and write-ahead logs, is stored in ZooKeeper.
-
-Let's create a table and put some data into it:
-
-----
-shell> createtable test
-
-shell> tables -l
-accumulo.metadata    =>        !0
-accumulo.root        =>        +r
-test                 =>         2
-trace                =>         1
-
-shell> insert a b c d
-
-shell> flush -w
-----
-
-Now let's take a look at the metadata for this table:
-
-    shell> table accumulo.metadata
-    shell> scan -b 3; -e 3<
-    3< file:/default_tablet/F000009y.rf []    186,1
-    3< last:13fe86cd27101e5 []    127.0.0.1:9997
-    3< loc:13fe86cd27101e5 []    127.0.0.1:9997
-    3< log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 []    
127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995|6
-    3< srv:dir []    /default_tablet
-    3< srv:flush []    1
-    3< srv:lock []    tservers/127.0.0.1:9997/zlock-0000000001$13fe86cd27101e5
-    3< srv:time []    M1373998392323
-    3< ~tab:~pr []    \x00
-
-Let's decode this little session:
-
-+scan -b 3; -e 3<+::   Every tablet gets its own row. Every row starts with 
the table id followed by
-    +;+ or +<+, and followed by the end row split point for that tablet.
-
-+file:/default_tablet/F000009y.rf [] 186,1+::
-    File entry for this tablet.  This tablet contains a single file reference. 
The
-    file is +/accumulo/tables/3/default_tablet/F000009y.rf+.  It contains 1
-    key/value pair, and is 186 bytes long.
-
-+last:13fe86cd27101e5 []    127.0.0.1:9997+::
-    Last location for this tablet.  It was last held on 127.0.0.1:9997, and the
-    unique tablet server lock data was +13fe86cd27101e5+. The default balancer
-    will tend to put tablets back on their last location.
-
-+loc:13fe86cd27101e5 []    127.0.0.1:9997+::
-    The current location of this tablet.
-
-+log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 []    127.0. ...+::
-    This tablet has a reference to a single write-ahead log. This file can be 
found in
-    +/accumulo/wal/127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995+. The 
value
-    of this entry could refer to multiple files. This tablet's data is encoded 
as
-    +6+ within the log.
-
-+srv:dir []    /default_tablet+::
-    Files written for this tablet will be placed into
-    +/accumulo/tables/3/default_tablet+.
-
-+srv:flush []    1+::
-    Flush id.  This table has successfully completed the flush with the id of 
+1+.
-
-+srv:lock []    tservers/127.0.0.1:9997/zlock-0000000001\$13fe86cd27101e5+::
-    This is the lock information for the tablet holding the present lock.  This
-    information is checked against zookeeper whenever this is updated, which
-    prevents a metadata update from a tablet server that no longer holds its
-    lock.
-
-+srv:time []    M1373998392323+::
-    This indicates the time type (+M+ for milliseconds or +L+ for logical) and
-    the timestamp of the most recently written key in this tablet.  It is used
-    to ensure automatically assigned key timestamps are strictly increasing for
-    the tablet, regardless of the tablet server's system time.
-
-`~tab:~pr []    \x00`::
-    The end-row marker for the previous tablet (prev-row).  The first byte
-    indicates the presence of a prev-row.  This tablet has the range (-inf, 
+inf),
-    so it has no prev-row (or end row).
-
-Besides these columns, you may see:
-
-+rowId future:zooKeeperID location+::
-    Tablet has been assigned to a tablet server, but not yet loaded.
-
-+~del:filename+::
-    When a tablet server is done using a file, it will create a delete marker in
-    the appropriate metadata table, unassociated with any tablet.  The garbage
-    collector will remove the marker, and the file, when no other reference to
-    the file exists.
-
-+~blip:txid+::
-    Bulk-Load In Progress marker.
-
-+rowId loaded:filename+::
-    A file has been bulk-loaded into this tablet, however the bulk load has 
not yet completed on other tablets, so this marker prevents the file from being 
loaded multiple times.
-
-+rowId !cloned+::
-    A marker that indicates that this tablet has been successfully cloned.
-
-+rowId splitRatio:ratio+::
-    A marker that indicates a split is in progress, and the files are being 
split at the given ratio.
-
-+rowId chopped+::
-    A marker that indicates that the files in the tablet do not contain keys 
outside the range of the tablet.
-
-+rowId scan+::
-    A marker that prevents a file from being removed while there are still 
active scans using it.
-
-### Simple System Recovery
-
-*Q*: One of my Accumulo processes died. How do I bring it back?
-
-The easiest way to bring all services online for an Accumulo instance is to 
run the +accumulo-cluster+ script.
-
-    $ accumulo-cluster start
-
-This process will check the process listing, using +jps+ on each host before 
attempting to restart a service on the given host.
-Typically, this check is sufficient except in the face of a hung/zombie 
process. For large clusters, it may be
-undesirable to ssh to every node in the cluster to ensure that all hosts are 
running the appropriate processes and +accumulo-service+ may be of use.
-
-    $ ssh host_with_dead_process
-    $ accumulo-service tserver start
-
-*Q*: My process died again. Should I restart it via +cron+ or tools like 
+supervisord+?
-
-*A*: A repeatedly dying Accumulo process is a sign of a larger problem. 
Typically these problems are due to a
-misconfiguration of Accumulo or over-saturation of resources. Blind automation 
of any service restart inside of Accumulo
-is generally an undesirable situation as it is indicative of a problem that is 
being masked and ignored. Accumulo
-processes should be stable on the order of months and not require frequent 
restart.
-
-### Advanced System Recovery
-
-#### HDFS Failure
-*Q*: I had a disastrous HDFS failure.  After bringing everything back up, several tablets refuse to go online.
-
-Data written to tablets is written into memory before being written into 
indexed files.  In case the server
-is lost before the data is saved into an indexed file, all data stored in memory is first written into a
-write-ahead log (WAL).  When a tablet is re-assigned to a new tablet server, 
the write-ahead logs are read to
-recover any mutations that were in memory when the tablet was last hosted.
-
-If a write-ahead log cannot be read, then the tablet is not re-assigned.  All 
it takes is for one of
-the blocks in the write-ahead log to be missing.  This is unlikely unless 
multiple data nodes in HDFS have been
-lost.
-
-*A*: Get the WAL files online and healthy.  Restore any data nodes that may be 
down.
-
-*Q*: How do I find out which tablets are offline?
-
-*A*: Use +accumulo admin checkTablets+
-
-    $ accumulo admin checkTablets
-
-*Q*: I lost three data nodes, and I'm missing blocks in a WAL.  I don't care 
about data loss, how
-can I get those tablets online?
-
-See the discussion in <<metadata>>, which shows a typical metadata table 
listing.
-The entries with a column family of +log+ are references to the WAL for that 
tablet.
-If you know what WAL is bad, you can find all the references with a grep in 
the shell:
-
-    shell> grep 0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995
-    3< log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 []    
127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995|6
-
-*A*: You can remove the WAL references in the metadata table.
-
-    shell> grant -u root Table.WRITE -t accumulo.metadata
-    shell> delete 3< log 127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995
-
-Note: the colon (+:+) is omitted when specifying the _row cf cq_ for the 
delete command.
-
-The master will automatically discover the tablet no longer has a bad WAL 
reference and will
-assign the tablet.  You will need to remove the reference from all the tablets 
to get them
-online.
-
-
-*Q*: The metadata (or root) table has references to a corrupt WAL.
-
-This is a much more serious state, since losing updates to the metadata table 
will result
-in references to old files which may not exist, or lost references to new 
files, resulting
-in tablets that cannot be read, or large amounts of data loss.
-
-The best hope is to restore the WAL by fixing HDFS data nodes and bringing the 
data back online.
-If this is not possible, the best approach is to re-create the instance and bulk import all files from
-the old instance into new tables.
-
-A complete set of instructions for doing this is outside the scope of this 
guide,
-but the basic approach is:
-
-* Use +tables -l+ in the shell to discover the table name to table id mapping
-* Stop all accumulo processes on all nodes
-* Move the accumulo directory in HDFS out of the way:
-       $ hadoop fs -mv /accumulo /corrupt
-* Re-initialize accumulo
-* Recreate tables, users and permissions
-* Import the directories under +/corrupt/tables/<id>+ into the new instance
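-
-A minimal sketch of the last two steps, assuming the old table had id +2+ and the new table is
-named +mytable+ (names and paths here are illustrative; repeat the import for each tablet directory):
-
-    $ hadoop fs -mkdir /import-failures
-    shell> createtable mytable
-    shell> importdirectory /corrupt/tables/2/default_tablet /import-failures false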
-
-*Q*: One or more HDFS Files under /accumulo/tables are corrupt
-
-Accumulo maintains multiple references to the tablet files, both in the metadata
-tables and within the tablet server hosting the file, which makes it difficult to
-reliably just remove those references.
-
-The directory structure in HDFS for tables will follow the general structure:
-
-  /accumulo
-  /accumulo/tables/
-  /accumulo/tables/!0
-  /accumulo/tables/!0/default_tablet/A000001.rf
-  /accumulo/tables/!0/t-00001/A000002.rf
-  /accumulo/tables/1
-  /accumulo/tables/1/default_tablet/A000003.rf
-  /accumulo/tables/1/t-00001/A000004.rf
-  /accumulo/tables/1/t-00001/A000005.rf
-  /accumulo/tables/2/default_tablet/A000006.rf
-  /accumulo/tables/2/t-00001/A000007.rf
-
-If files under +/accumulo/tables+ are corrupt, the best course of action is to
-recover those files in HDFS (see the section on HDFS). Once these recovery efforts
-have been exhausted, the next step depends on where the missing file(s) are
-located. Different actions are required depending on whether the bad files are
-Accumulo data table files or metadata table files.
-
-*Data File Corruption*
-
-When an Accumulo data file is corrupt, the most reliable way to restore Accumulo
-operations is to replace the missing file with an ``empty'' file so that
-references to the file in the METADATA table and within the tablet server
-hosting the file can be resolved by Accumulo. An empty file can be created using
-the CreateEmpty utility:
-
-  $ accumulo org.apache.accumulo.core.file.rfile.CreateEmpty 
/path/to/empty/file/empty.rf
-
-The process is to delete the corrupt file and then move the empty file into its
-place. (The generated empty file can be copied and used multiple times if necessary
-and does not need to be regenerated each time.)
-
-  $ hadoop fs -rm /accumulo/tables/corrupt/file/thename.rf; \
-  hadoop fs -mv /path/to/empty/file/empty.rf 
/accumulo/tables/corrupt/file/thename.rf
-
-*Metadata File Corruption*
-
-If the corrupt files are metadata table files (see <<metadata>>; they live under
-the path +/accumulo/tables/!0+), then you will need to rebuild
-the metadata table by initializing a new instance of Accumulo and then importing
-all of the existing data into the new instance.  This is the same procedure as
-recovering from a zookeeper failure (see <<zookeeper_failure>>), except that
-you will have the benefit of having the existing user and table authorizations
-that are maintained in zookeeper.
-
-You can use the DumpZookeeper utility to save this information for reference
-before creating the new instance.  You will not be able to use RestoreZookeeper
-because the table names and references are likely to be different between the
-original and the new instances, but it can serve as a reference.
-
-*A*: If the files cannot be recovered, replace corrupt data files with empty
-rfiles to allow references in the metadata table and in the tablet servers to be
-resolved. Rebuild the metadata table if the corrupt files are metadata files.
-
-*Write-Ahead Log(WAL) File Corruption*
-
-In certain versions of Accumulo, a corrupt WAL file (caused by HDFS corruption
-or a bug in Accumulo that created the file) can block the successful recovery
-of one to many Tablets. Accumulo can be stuck in a loop trying to recover the
-WAL file, never being able to succeed.
-
-In the cases where the WAL file's original contents are unrecoverable or some degree
-of data loss is acceptable (beware if the WAL file contains updates to the Accumulo
-metadata table!), the following process can be followed to create a valid, empty
-WAL file. Run the following commands as the Accumulo unix user (to ensure that
-the proper file permissions are set in HDFS):
-
-  $ echo -n -e '--- Log File Header (v2) ---\x00\x00\x00\x00' > empty.wal
-
-The above creates a file with the text "--- Log File Header (v2) ---" and then
-four bytes. You should verify the contents of the file with a hexdump tool.
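-
-For example, using any hexdump tool (the file should be 32 bytes in total: the header text
-followed by four zero bytes):
-
-  $ hexdump -C empty.wal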
-
-Then, place this empty WAL in HDFS and then replace the corrupt WAL file in 
HDFS
-with the empty WAL.
-
-  $ hdfs dfs -moveFromLocal empty.wal /user/accumulo/empty.wal
-  $ hdfs dfs -mv /user/accumulo/empty.wal 
/accumulo/wal/tserver-4.example.com+10011/26abec5b-63e7-40dd-9fa1-b8ad2436606e
-
-After the corrupt WAL file has been replaced, the system should automatically recover.
-It may be necessary to restart the Accumulo Master process, as an exponential
-backoff policy is used which could lead to a long wait before Accumulo will
-try to re-load the WAL file.
-
-[[zookeeper_failure]]
-#### ZooKeeper Failure
-*Q*: I lost my ZooKeeper quorum (hardware failure), but HDFS is still intact. 
How can I recover my Accumulo instance?
-
-ZooKeeper, in addition to its lock-service capabilities, also serves to 
bootstrap an Accumulo
-instance from some location in HDFS. It contains the pointers to the root 
tablet in HDFS which
-is then used to load the Accumulo metadata tablets, which then loads all user 
tables. ZooKeeper
-also stores all namespace and table configuration, the user database, the 
mapping of table IDs to
-table names, and more across Accumulo restarts.
-
-Presently, the only way to recover such an instance is to initialize a new 
instance and import all
-of the old data into the new instance. The easiest way to tackle this problem 
is to first recreate
-the mapping of table ID to table name and then recreate each of those tables 
in the new instance.
-Set any necessary configuration on the new tables and add some split points to the tables to close
-the gap between the number of splits the old tables had and the unsplit state of the new tables.
-
-The directory structure in HDFS for tables will follow the general structure:
-
-    /accumulo
-    /accumulo/tables/
-    /accumulo/tables/1
-    /accumulo/tables/1/default_tablet/A000001.rf
-    /accumulo/tables/1/t-00001/A000002.rf
-    /accumulo/tables/1/t-00001/A000003.rf
-    /accumulo/tables/2/default_tablet/A000004.rf
-    /accumulo/tables/2/t-00001/A000005.rf
-
-For each table, make a new directory that you can move (or copy if you have 
the HDFS space to do so)
-all of the rfiles for a given table into. For example, to process the table 
with an ID of +1+, make a new directory,
-say +/new-table-1+ and then copy all files from +/accumulo/tables/1/\*/*.rf+ 
into that directory. Additionally,
-make a directory, +/new-table-1-failures+, for any failures during the import 
process. Then, issue the import
-command using the Accumulo shell into the new table, telling Accumulo to not 
re-set the timestamp:
-
-    user@instance new_table> importdirectory /new-table-1 
/new-table-1-failures false
-
-Any RFiles which failed to load will be placed in +/new-table-1-failures+. RFiles that were successfully
-imported will no longer exist in +/new-table-1+. For failures, move them back to the import directory and retry
-the +importdirectory+ command.
-
-It is *extremely* important to note that this approach may introduce stale 
data back into
-the tables. For a few reasons, RFiles may exist in the table directory which 
are candidates for deletion but have
-not yet been deleted. Additionally, deleted data which was not compacted away, 
but still exists in write-ahead logs if
-the original instance was somehow recoverable, will be re-introduced in the 
new instance. Table splits and merges
-(which also includes the deleteRows API call on TableOperations) are also vulnerable to this problem. This process should
-*not* be used if these are unacceptable risks. It is possible to try to 
re-create a view of the +accumulo.metadata+
-table to prune out files that are candidates for deletion, but this is a 
difficult task that also may not be entirely accurate.
-
-Likewise, it is also possible that data loss may occur from write-ahead log 
(WAL) files which existed on the old table but
-were not minor-compacted into an RFile. Again, it may be possible to 
reconstruct the state of these WAL files to
-replay data not yet in an RFile; however, this is a difficult task and is not 
implemented in any automated fashion.
-
-*A*: The +importdirectory+ shell command can be used to import RFiles from the 
old instance into a newly created instance,
-but extreme care should go into the decision to do this as it may result in 
reintroduction of stale data or the
-omission of new data.
-
-### Upgrade Issues
-
-*Q*: I upgraded from 1.4 to 1.5 to 1.6 but still have some WAL files on local 
disk. Do I have any way to recover them?
-
-*A*: Yes, you can recover them by running the LocalWALRecovery utility (not 
available in 1.8 and later) on each node that needs recovery performed. The 
utility
-will default to using the directory specified by +logger.dir.walog+ in your 
configuration, or can be
-overridden by using the +--local-wal-directories+ option on the tool. It can be invoked as follows:
-
-  accumulo org.apache.accumulo.tserver.log.LocalWALRecovery
-
-### File Naming Conventions
-
-*Q*: Why are files named like they are? Why do some start with +C+ and others 
with +F+?
-
-*A*: The file names give you a basic idea for the source of the file.
-
-The base of the filename is a base-36 unique number. All filenames in accumulo 
are coordinated
-with a counter in zookeeper, so they are always unique, which is useful for 
debugging.
-
-The leading letter gives you an idea of how the file was created:
-
-+F+::
-    Flush: entries in memory were written to a file (Minor Compaction)
-
-+M+::
-    Merging compaction: entries in memory were combined with the smallest file 
to create one new file
-
-+C+::
-    Several files, but not all files, were combined to produce this file 
(Major Compaction)
-
-+A+::
-    All files were compacted, delete entries were dropped
-
-+I+::
-    Bulk import, complete, sorted index files. Always in a directory starting 
with +b-+
-
-This simple file naming convention allows you to see the basic structure of 
the files from just
-their filenames, and reason about what should be happening to them next, just
-by scanning their entries in the metadata tables.
-
-For example, if you see multiple files with +M+ prefixes, the tablet is, or 
was, up against its
-maximum file limit, so it began merging memory updates with files to keep the 
file count reasonable.  This
-slows down ingest performance, so seeing many files like this tells you that ingest
-is outpacing the compaction strategy that reduces the number of files.
-
-### HDFS Decommissioning Issues
-
-*Q*: My Hadoop DataNode is hung for hours trying to decommission.
-
-*A*: Write Ahead Logs stay open until they hit the size threshold, which could 
be many hours or days in some cases. These open files will prevent a DN from 
finishing its decommissioning process (HDFS-3599) in some versions of Hadoop 2. 
If you stop the DN, then the WALog file will not be closed and you could lose 
data. To work around this issue, we now close WALogs on a time period specified 
by the property +tserver.walog.max.age+ with a default period of 24 hours.
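-
-If a different period is needed, the property can be set in +accumulo-site.xml+ like any other
-property; a sketch, using the documented default of 24 hours expressed as an Accumulo duration:
-
-[source,xml]
-<property>
-    <name>tserver.walog.max.age</name>
-    <value>24h</value>
-</property>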

http://git-wip-us.apache.org/repos/asf/accumulo/blob/e99ec9f0/docs/src/main/asciidoc/images/accumulo-logo.png
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/images/accumulo-logo.png 
b/docs/src/main/asciidoc/images/accumulo-logo.png
deleted file mode 100644
index 5b0f6b4..0000000
Binary files a/docs/src/main/asciidoc/images/accumulo-logo.png and /dev/null 
differ

http://git-wip-us.apache.org/repos/asf/accumulo/blob/e99ec9f0/docs/src/main/asciidoc/images/data_distribution.png
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/images/data_distribution.png 
b/docs/src/main/asciidoc/images/data_distribution.png
deleted file mode 100644
index 7f18d3f..0000000
Binary files a/docs/src/main/asciidoc/images/data_distribution.png and 
/dev/null differ

http://git-wip-us.apache.org/repos/asf/accumulo/blob/e99ec9f0/docs/src/main/asciidoc/images/failure_handling.png
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/images/failure_handling.png 
b/docs/src/main/asciidoc/images/failure_handling.png
deleted file mode 100644
index c131de6..0000000
Binary files a/docs/src/main/asciidoc/images/failure_handling.png and /dev/null 
differ

http://git-wip-us.apache.org/repos/asf/accumulo/blob/e99ec9f0/docs/src/main/resources/design/ACCUMULO-378-design.mdtext
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/design/ACCUMULO-378-design.mdtext 
b/docs/src/main/resources/design/ACCUMULO-378-design.mdtext
deleted file mode 100644
index d5f46ef..0000000
--- a/docs/src/main/resources/design/ACCUMULO-378-design.mdtext
+++ /dev/null
@@ -1,468 +0,0 @@
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements.  See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License.  You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
-
-Accumulo Multi-DataCenter Replication
-=====================================
-
-ACCUMULO-378 deals with disaster recovery techniques in Accumulo through 
cross-site replication of tables. Data which is
-written to one Accumulo instance will automatically be replicated to a 
separate Accumulo instance.
-
-
-Justification
--------------
-
-Losing an entire instance really stinks. In addition to natural disasters or 
facility problems, Hadoop always has the
-potential for failure. In the newest versions of Hadoop, the high availability 
(HA) namenode functionality increases the
-redundancy of Hadoop in regards to the single point of failure which the 
namenode previously was. Despite this, there is
-always a varying amount of required administrative intervention to ensure that 
failure does not result in data loss:
-userspace software (the entire Hadoop and Java stack), kernel-space software 
(filesystem implementations), “expected”
-hardware failures (hard drives), unexpected compute hardware failures (NICs, 
CPU, Memory), and infrastructure failures
-(switches and routers). Accumulo currently has the ability for manual 
snapshots/copies across multiple instances;
-however, this is not sufficient for multiple reasons with the biggest reason 
being a lack of automated replication.
-
-
-Background
-----------
-
-Apache HBase has had master-master replication, cyclic replication and 
multi-peer replication since 0.92. This
-satisfies a wide range of cross-site replication strategies. Master-master 
replication lets us have two systems which
-both replicate to each other. Both systems can service new writes and will 
update their “view” of a table from one
-another. Cyclic replication allows us to have cycles in our replication graph. 
This is a generalization of the
-master-master strategy in which we may ultimately have a system which replicates to a system that it receives data
-from. A system with three masters, A, B and C, which replicate in a row (A to 
B, B to C and C to A) is an example of
-this. More complicated examples of this can be envisioned when dealing with 
multiple replicas inside one geographic
-region or data center. Multi-peer replication is relatively simple in that a single master system will replicate to
-multiple peers instead of just one.
-
-
-While these are relatively different from one another, I believe most can be satisfied through a single, master-push
-replication implementation. That said, the proposed data structure should also be capable of supporting a
-peer-pull strategy.
-
-
-Implementation
---------------
-
-As a first implementation, I will prototype a single master with multiple peer 
replication strategy. This should grant
-us the most flexibility and the most functionality. The general implementation 
should be capable of application to the
-other replication structures (master-master and cyclic-replication). I’ll 
outline a simple master-peer replication use
-case, followed by application of this approach to replication cycles and 
master-master replication. This approach does
-not consider conditional mutations.
-
-
-### Replication Framework
-
-In an attempt to be as clear as possible, I’ll use the following terminology 
when explaining the implementation: master
-will refer to the “master” Accumulo cluster (the system accepting new 
writes), peer will refer to the “peer” Accumulo
-cluster (the system which does not receive new data through the Accumulo 
client API, but only from master through
-        replication). The design results in an eventual consistency model of 
replication which will allow for peers to
-be offline and the online master to still process new updates.
-
-
-In the simplest notion, when a new file is created by master, we want to 
ensure that this file is also sent to the
-peer. In practice, this new file can either be an RFile that was bulk-imported 
to master or this can be a write-ahead
-log (WAL) file. The bulk-imported RFile is the easy case, but the WAL case 
merits additional explanation. While data is
-being written to Accumulo is it written to a sorted, in-memory map and an 
append-only WAL file. While the in-memory map
-provides a very useful interface for the TabletServer to use for scans and 
compactions, it is difficult to extract new
-updates at the RFile level. As such, this proposed implementation uses the WAL 
as the transport “file format”[a]. While
-it is noted that in sending a WAL to multiple peers, each peer will need to 
reprocess each WAL to make Mutations to
-apply whereas they could likely be transformed once, that is left as a future 
optimization.
-
-
-To increase the speed with which eventual consistency can be achieved, WAL offsets can be tracked to begin the replication
-process before a WAL is closed. We can bin these mutations together for a lazy 
replication which can be combined to each
-target server which amortizes the cost into a single write set message. It is 
not apparent that this requires
-co-location within each source tablet in the Accumulo metadata table which 
means that the worry of inadvertent errors
-caused by placing this data in the metadata table is entirely removed.
-
-
-In every replication graph, which consists of master(s) and peer(s), each 
system should have a unique identifier. It is
-desirable to be able to uniquely identify each system, and each system should 
have knowledge of the other systems
-participating.
-
-
-These identifiers also make implementing cyclic replication easier, as a 
cluster can ignore any requests to replicate
-some data when that request already contains the current cluster’s 
identifier. In other words, data we try to replicate
-will contain a linked list of identifiers with the provenance of where that 
data came and each cluster can make the
-determination of whether or not it has seen this data already (and thus needs 
to process and propagate it). This also
-lets us treat replication rules as a graph which grants us a common 
terminology to use when describing replication.
-
-
-This framework provides a general strategy to allow pluggable replication 
strategies to export data out of an Accumulo
-cluster. An AccumuloReplicationStrategy is the only presently targeted 
replication strategy; however, the implementation
-should not prohibit alternative approaches to replication such as other 
databases or filesystems.
-
-
-### Replication Strategy Implementation
-
-
-Henceforth, both of the RFiles and WAL files that need replication can be 
treated as a chunk of data. This chunk
-references a start offset and length from the source (RFile or WAL) which 
needs to be replicated. This has the nice
-property of being able to use a Combiner to combine multiple, sequential 
chunks into one larger chunk to amortize RPC
-costs.
-
-
-#### Make the master aware of file to replicate
-
-
-Let us define a column family that is used to denote a chunk that needs to be 
replicated: REPL. We first need to let
-master know that it has a new chunk which needs to be replicated. When the 
file comes from a bulk-import, we need to
-create a new entry in the !METADATA table for the given tablet with the REPL 
column family. If the file is a WAL, we
-also want to write an entry for the REPL column[b]. In both cases, the 
chunk’s URI will be stored in the column
-qualifier. The Value can contain some serialized data structure to track 
cluster replication provenance and offset
-values. Each row (tablet) in the !METADATA table will contain zero to many 
REPL columns. As such, the garbage collector
-needs to be modified to not delete these files on the master’s HDFS instance 
until these files are replicated (copied to
-        the peer).
-
-
-#### Choose local TabletServer to perform replication
-
-
-The Accumulo Master can have a thread that scans the replication table to look 
for chunks to replicate. When it finds
-some, choose a TabletServer to perform the replication to all peers. The 
master should use a FATE operation to manage
-the state machine of this replication process. The expected principles, such 
as exponential backoff on network errors,
-    should be followed. When all peers have reported successfully receiving 
the file, the master can remove the REPL
-    column for the given chunk. 
-
-
-On the peer, before beginning transfer, the peer should ascertain a new local, 
unique filename to use for the remote
-file. When the transfer is complete, the file should be treated like log 
recovery and brought into the appropriate
-Tablet. If the peer is also a master (replicating to other nodes), the 
replicated data should create a new REPL column
-in the peer’s table to repeat the replication process, adding in its cluster 
identifier to the provenance list.
-Otherwise, the file can be a candidate for deletion by the garbage collection.
-
-
-The tserver chosen to replicate the data from the master cluster should 
ideally be the tserver that created that data.
-This helps reduce the complexity of dealing with locality later on. If the 
HDFS blocks written by the tserver are local,
-     then we gain the same locality perks.
-
-
-#### Recurse
-
-
-In our simple master and peer replication scheme, we are done after the new 
updates are made available on peer. As
-aforementioned, it is relatively easy to “schedule” replication of a new 
file on peer because we just repeat the same
-process that master did to replicate to peer in the first place.
-
-
-### Master cluster replication “bookkeeping”
-
-
-This section outlines the steps on the master cluster to manage the lifecycle 
of data: when/what data needs to be
-replicated and when is a file safe to be removed.
-
-
-Two key structures are used to implement this bookkeeping:
-
-
-1. Tablet-level entry: tabletId        repl:fully-qualified-file        []     
   value
-
-
-2. Row-prefix space at end of accumulo.metadata or its own table: 
*~repl*_fully-qualified-file
-clusterName:remoteTableID        []        value
-
-
-These two key structures will be outlined below, with “*repl* column” and 
“*~repl* row” denoting which is being referred to.
-
-
-#### Data Structure in Value
-
-
-To avoid the necessity of using conditional mutations or other 
“transaction-like” operations, we can use a combiner to
-generate an aggregate view of replication information. Protobuf is a decent choice; however, the description isn’t tied to
-any implementation. I believe a Combiner used in conjunction with the 
following data structure provides all necessary
-functionality:
-
-
-        ``// Tracks general lifecycle of the data: file is open and might have 
new data to replicate, or the file has been``
-        ``// closed and will have no new data to replicate``
-
-
-        ``State:Enum { OPEN, CLOSED }``
-
-
-        ``ReplicationStatus { State state, long replication_needed_offset, 
long replication_finished_offset }``
-
-
-The offsets refer to the contiguous ranges of records (key-values) written to 
the WAL. The replication_finished_offset
-value tracks what data has been replicated to the given cluster and while the 
replication_needed_offset value tracks how
-much data has been written to the WAL that is ready for replication. 
replication_finished_offset should always be less
-than or equal to replication_needed_offset. For RFiles instead of WALs, state 
is always CLOSED and
-replication_needed_offset is initialized to the length of the RFile. In this 
context, one can consider the RFile as a
-read-only file and the WAL as an append-only file.
-
-
-For *~repl* entries, the target clusterName and remote tableId would be stored 
in the key to preserve uniqueness. Using
-this information, we would be able to implement the following methods:
-
-
-    ``bool        isFullyReplicated(ReplicationStatus)``
-    ``Pair<long,long> rangeNeedingReplication(ReplicationStatus)``
-
-
-The isFullyReplicated method is straightforward: given the values for start/finish stored for data that needs to be
-replicated, the values for start/finish stored for data that has been replicated, and a state of CLOSED, determine whether
-there is still more data for this ReplicationStatus that needs to be replicated for the given clustername/tableID.
-
-
-rangeNeedingReplication is a bit more complicated. Given the end of a range of data that has already been replicated,
-and the end of a range of data that still needs replication, return the range of data that has
-not yet been replicated. For example, if keyvalues up to offset 100 in a WAL 
have already been
-replicated and keyvalues up to offset 300 are marked as needing replication, 
this method should
-return [101,300]. Ranges of data replicated, and data needing replication must 
always be
-disjoint and contiguous to ensure that data is replayed in the correct order 
on the peer.
-
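-A minimal sketch of these two helpers, assuming a simple value class holding the fields described above (the names are
-illustrative only, not the actual Accumulo API):
-
-    // Illustrative only: mirrors the ReplicationStatus fields described above.
-    class ReplicationStatus {
-      enum State { OPEN, CLOSED }
-      State state;
-      long replicationNeededOffset;    // highest offset written and eligible for replication
-      long replicationFinishedOffset;  // highest offset already replicated to this peer
-
-      boolean isFullyReplicated() {
-        return state == State.CLOSED && replicationFinishedOffset >= replicationNeededOffset;
-      }
-
-      // Returns [begin, end] of records still needing replication, or null if there are none.
-      long[] rangeNeedingReplication() {
-        if (replicationFinishedOffset >= replicationNeededOffset) {
-          return null;
-        }
-        return new long[] {replicationFinishedOffset + 1, replicationNeededOffset};
-      }
-    }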
-
-A Combiner is used to create a basic notion of “addition” and “subtraction”. We cannot use deletes to manage
-this without creating a custom iterator, which would not be desirable since it would be required to run over the entire
-accumulo.metadata table. Avoiding deletions except on cleanup is also desired to avoid handling “tombstone’ing”
-a future version of a Key. The addition operation is when new data is appended
to the WAL which signifies new data to be
-replicated. This equates to an addition to replication_needed_offset. The 
subtraction operation is when data from the
-WAL has be successfully replicated to the peer for this *~repl* record. This 
is implemented as an addition to the
-replication_finished_offset.
-
-
-When CLOSED is set on a ReplicationStatus, this implies that the WAL has been closed and no new offsets will be added to
-what is tracked via the *repl* column. As such, a ReplicationStatus “object” is a candidate for deletion when the state is
-CLOSED and replication_finished_offset is equal to replication_needed_offset. 
A value of CLOSED for state is always
-propagated over the NEW state. An addition after the state is CLOSED is an 
invalid operation and would be a logic error.
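-
-
-To make the aggregation rules concrete, here is a minimal, illustrative merge function in Java, reusing the
-ReplicationStatus sketch from earlier. In a real implementation this logic would live in the reduce step of an Accumulo
-Combiner configured on the *repl* column; value serialization is deliberately omitted. Whether mutations carry deltas
-or absolute offsets is an open detail: this sketch assumes absolute offsets and takes the maximum of each, which also
-keeps the merge idempotent.
-
-
-    // Hypothetical merge of two aggregated ReplicationStatus values for the same key.
-    // Rules from the text: offsets only ever grow, and CLOSED always propagates over OPEN.
-    public class ReplicationStatusMerge {
-        public static ReplicationStatus merge(ReplicationStatus a, ReplicationStatus b) {
-            // CLOSED wins over OPEN.
-            ReplicationStatus.State state =
-                (a.getState() == ReplicationStatus.State.CLOSED || b.getState() == ReplicationStatus.State.CLOSED)
-                    ? ReplicationStatus.State.CLOSED : ReplicationStatus.State.OPEN;
-
-            // "Addition" to replication_needed_offset: new data appended to the WAL.
-            long needed = Math.max(a.getReplicationNeededOffset(), b.getReplicationNeededOffset());
-
-            // "Subtraction" (really an addition to replication_finished_offset): data replicated to the peer.
-            long finished = Math.max(a.getReplicationFinishedOffset(), b.getReplicationFinishedOffset());
-
-            return new ReplicationStatus(state, needed, finished);
-        }
-    }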
-
-
-Consider the case of new data being ingested into the cluster; the following discrete steps should happen. We assume
-replication is already enabled so as not to distract from the actual steps. As previously mentioned, a combiner must
-be set on the *repl* column to aggregate the values and properly maintain replication state. The following is what a
-tserver will do (a sketch of the corresponding mutations follows the steps).
-
-
-1) When a new WAL is created at the request of a tserver, the log column is created along with a *repl* column within
-the tablet’s row to track that this WAL will need to be replicated.
-
-
-        INSERT
-        tablet        repl:hdfs://localhost:8020/accumulo/.../wal/...  -> ReplicationStatus(state=OPEN)
-
-
-2) As the tserver using this WAL finishes commits to it, the tserver should submit a new mutation tracking the current
-length of data written to the WAL that needs to be read for replication.
-
-
-        INSERT
-        tablet        repl:hdfs://localhost:8020/accumulo/.../wal/...  -> ReplicationStatus(addition offset)
-
-
-3) Eventually, the tablet server will finish using a WAL, minor-compact the contents of memory to disk, and mark the
-WAL as unused. This results in updating the state to CLOSED.
-
-
-        INSERT
-        tablet repl:hdfs://localhost:8020/accumulo/.../wal/…  -> ReplicationStatus(state=CLOSED)
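-
-
-The following Java sketch shows roughly how a tserver could issue these three updates with the standard BatchWriter
-API. The table name, row value, WAL URI, the example offset of 300, and the serialize placeholder are illustrative
-only and do not reflect the actual metadata schema constants:
-
-
-    import java.nio.charset.StandardCharsets;
-
-    import org.apache.accumulo.core.client.BatchWriter;
-    import org.apache.accumulo.core.client.BatchWriterConfig;
-    import org.apache.accumulo.core.client.Connector;
-    import org.apache.accumulo.core.data.Mutation;
-    import org.apache.accumulo.core.data.Value;
-
-    public class ReplicationUpdateSketch {
-        // Illustrative only: write the three lifecycle updates for one WAL against a tablet's row.
-        static void writeReplicationUpdates(Connector conn, String tabletRow, String walUri) throws Exception {
-            BatchWriter bw = conn.createBatchWriter("accumulo.metadata", new BatchWriterConfig());
-
-            // 1) New WAL in use: create the repl column with an OPEN status.
-            Mutation open = new Mutation(tabletRow);
-            open.put("repl", walUri, serialize(new ReplicationStatus(ReplicationStatus.State.OPEN, 0, 0)));
-            bw.addMutation(open);
-
-            // 2) More data committed to the WAL: record the new offset needing replication (300 is an example).
-            Mutation append = new Mutation(tabletRow);
-            append.put("repl", walUri, serialize(new ReplicationStatus(ReplicationStatus.State.OPEN, 300, 0)));
-            bw.addMutation(append);
-
-            // 3) WAL no longer in use after minor compaction: mark it CLOSED.
-            Mutation closed = new Mutation(tabletRow);
-            closed.put("repl", walUri, serialize(new ReplicationStatus(ReplicationStatus.State.CLOSED, 300, 0)));
-            bw.addMutation(closed);
-
-            bw.close();
-        }
-
-        // Placeholder encoding: a real implementation would use protobuf or similar; the combiner
-        // configured on this column is expected to aggregate successive values.
-        static Value serialize(ReplicationStatus status) {
-            String encoded = status.getState() + "," + status.getReplicationNeededOffset() + ","
-                + status.getReplicationFinishedOffset();
-            return new Value(encoded.getBytes(StandardCharsets.UTF_8));
-        }
-    }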
-
-
-The master also needs a new thread to process the *repl* columns across all tablets in a table and create *~repl* row
-entries recording each file and where it should be replicated. The high-level goals for this thread are as follows:
-
-
-1) Create mutations for a WAL that outline where the file must be replicated (cluster name and remote tableId)
-
-
-        INSERT
-        *~repl*_hdfs://localhost:8020/accumulo/.../wal/… clusterName:tableId  -> ReplicationStatus(addition offset)
-
-
-2) Determine when the *repl* column in a tablet is safe for deletion (all data 
for it has been replicated). This is the
-sign that the GC can then remove this file.
-
-
-        DELETE
-        tablet repl:hdfs://localhost:8020/accumulo/.../wal/… 
-
-
-This can be accomplished with a single thread that scans the metadata table:
-
-
-1) Construct a “snapshot” of tablet *repl* file entries with aggregated offsets, sorted by file:
-
-
-        [hdfs://localhost:8020/.../file1 => {[tablet1, RS], [tablet2, RS], ... },
-         hdfs://localhost:8020/.../file2 => {[tablet3, RS], [tablet4, RS], ... },
-         hdfs://localhost:8020/.../file3 => {[tablet5, [RS:CLOSED]], [tablet6, [RS:CLOSED]], ... } ]
-
-
-2) Begin scanning the *~repl* row-prefix with a Scanner, performing a merged read to join the aggregated *repl* column
-state across tablets with the columns in the *~repl* row for each file.
-
-
-   for each file in *~repl* rowspace:
-       if all columns in *~repl*_file row isFullyReplicated:
-           issue deletes for file in *repl* column for all tablets with references
-           if delete of *repl* is successful:
-               delete *~repl* row
-       else if *~repl* row exists but no *repl* columns:
-           // Catch failure case from first conditional
-           delete *~repl* row
-       else:
-           for each file in “snapshot” of *repl* columns:
-               make mutation for *~repl*_file
-               for each peer cluster in configuration:
-                   if file should be replicated on peer:
-                       add column for clusterid:remote_tableID -> RS
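-
-
-As a rough illustration of step 1 in this outline, the snapshot could be built with an ordinary Scanner over the
-metadata table, grouping the aggregated *repl* entries by WAL. The table name, column family, and the deserialize
-placeholder are assumptions of this sketch rather than the actual schema:
-
-
-    import java.util.AbstractMap;
-    import java.util.ArrayList;
-    import java.util.HashMap;
-    import java.util.List;
-    import java.util.Map;
-
-    import org.apache.accumulo.core.client.Connector;
-    import org.apache.accumulo.core.client.Scanner;
-    import org.apache.accumulo.core.data.Key;
-    import org.apache.accumulo.core.data.Value;
-    import org.apache.accumulo.core.security.Authorizations;
-    import org.apache.hadoop.io.Text;
-
-    public class ReplSnapshotSketch {
-        // Illustrative only: build file -> [(tablet row, ReplicationStatus), ...] from the repl columns.
-        static Map<String,List<Map.Entry<String,ReplicationStatus>>> snapshotReplEntries(Connector conn)
-                throws Exception {
-            Map<String,List<Map.Entry<String,ReplicationStatus>>> snapshot = new HashMap<>();
-            Scanner scanner = conn.createScanner("accumulo.metadata", Authorizations.EMPTY);
-            scanner.fetchColumnFamily(new Text("repl")); // placeholder column family name
-
-            for (Map.Entry<Key,Value> entry : scanner) {
-                String tabletRow = entry.getKey().getRow().toString();
-                String walUri = entry.getKey().getColumnQualifier().toString();
-                ReplicationStatus status = deserialize(entry.getValue());
-                snapshot.computeIfAbsent(walUri, k -> new ArrayList<>())
-                    .add(new AbstractMap.SimpleEntry<>(tabletRow, status));
-            }
-            return snapshot;
-        }
-
-        // Placeholder decoding: the inverse of however ReplicationStatus values are serialized.
-        static ReplicationStatus deserialize(Value value) {
-            String[] parts = value.toString().split(",");
-            return new ReplicationStatus(ReplicationStatus.State.valueOf(parts[0]),
-                Long.parseLong(parts[1]), Long.parseLong(parts[2]));
-        }
-    }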
-
-
-A Combiner should be set on all columns in the *~repl* prefix rowspace and on the *repl* colfam so that multiple runs
-of the described procedure, even without any actual replication occurring in between, still correctly aggregate the
-data that needs replication.
-
-
-#### Configuration
-
-
-Replication can be configured per locality group, replicating that locality group to one or more peers. Given that we
-have dynamic column families, trying to track per-column-family replication would be unnecessarily difficult. New
-configuration properties need to be introduced to support the necessary information. Each peer is defined with a name
-and the ZooKeeper quorum of the remote cluster, which is used to locate the active Accumulo Master. The API should make
-it easy to configure replication across all locality groups. Replication cannot be configured on the root or metadata
-table.
-
-
-Site-wide:
-// The name and location of other clusters
-instance.cluster.$name.zookeepers=zk1,zk2,zk3[c]
-// The name of this cluster
-instance.replication.name=my_cluster_name[d]
-
-Per-table:
-// Declare the locality group(s) that should be replicated and the clusters 
that they should be replicated to
-table.replication.$locality_group_name=cluster1,cluster2,...
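-
-
-For illustration, these proposed properties could also be set programmatically through the existing client API; the
-property names here are the ones suggested in this design (not shipping Accumulo properties), and the peer, table, and
-locality group names are placeholders matching the shell examples below:
-
-
-    import org.apache.accumulo.core.client.Connector;
-
-    public class ReplicationConfigSketch {
-        // Illustrative only: property names are the ones proposed above, not existing Accumulo properties.
-        static void configureReplication(Connector conn) throws Exception {
-            // Site-wide: where to find the peer cluster, and this cluster's own name.
-            conn.instanceOperations().setProperty("instance.cluster.peer1.zookeepers",
-                "peerZK1:2181,peerZK2:2181,peerZK3:2181");
-            conn.instanceOperations().setProperty("instance.replication.name", "my_cluster_name");
-
-            // Per-table: replicate locality group "cf1" of table "foo" to the peer named "peer1".
-            conn.tableOperations().setProperty("foo", "table.replication.cf1", "peer1");
-        }
-    }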
-
-
-Shell commands can also be created to make this configuration easier.
-
-
-definecluster cluster_name zookeeper_quorum
-
-
-e.g.  definecluster peer peerZK1:2181,peerZK2:2181,peerZK3:2181
-
-
-
-
-deletecluster cluster_name zookeeper_quorum
-
-
-e.g.  deletecluster peer peerZK1:2181,peerZK2:2181,peerZK3:2181
-
-
-
-
-enablereplication -t table (-lg loc_group | --all-loc-groups) cluster_name
-
-
-e.g.  enablereplication -t foo -lg cf1 peer1
-      enablereplication -t foo --all-loc-groups peer1
-
-
-
-
-
-
-disablereplication -t table (-lg loc_group | --all-loc-groups) cluster_name
-
-
-e.g.  disablereplication -t foo -lg cf1 peer1
-      disablereplication -t foo --all-loc-groups peer1
-
-
-For peers, we likely do not want to allow users to perform writes against the cluster; thus, peers should be
-read-only. This likely requires custom configuration and some ZooKeeper state so the peer does not accept regular API
-connections. This should be exposed/controllable by the shell, too.
-
-
-#### Common Questions
-
-
-*How do conditional mutations work with this approach?*
-
-
-They do not. They will need to throw an Exception.
-
-
-*How does replication work on a table which already contains data?*
-
-
-When replication is enabled on a table, only new data will be replicated. This implementation does not attempt to
-replicate pre-existing data, as the existing importtable and exporttable commands already provide a way to do that.
-
-
-*When I update a table property on the master, will it propagate to the peer?*
-
-
-There are both arguments for and against this. We likely want to revisit this 
later as a configuration parameter that
-could allow the user to choose if this should happen. We should avoid 
implementations that would tie us to one or the
-other.
-
-
-As an argument against this, consider a production and a backup cluster where 
the backup cluster is smaller in number of
-nodes, but contains more disks. Despite wanting to replicate the data in a 
table, the configuration of that table may
-not be desired (e.g. split threshold, compression codecs, etc). Another 
argument against could be age-off. If a replica
-cluster is not the same size as the production cluster (which is extremely 
plausible) you would not want the same
-age-off rules for both the production and replica.
-
-
-An argument for this feature is that you would want custom compaction 
iterators (as a combiner, for example) to only be
-configured on a table once. You would want these iterators to appear on all 
replicas. Such an implementation is also
-difficult in master-master situations as we don’t have a shared ZooKeeper 
instance that we can use to reliably commit
-these changes.
-
-
-*What happens in master-master if two Keys are exactly the same with different 
values?*
-
-
-Non-deterministic - mostly because we already have this problem: 
https://issues.apache.org/jira/browse/ACCUMULO-1528
-
-
-*Did you come up with this all on your own?*
-
-
-Ha, no. Big thanks goes out to HBase’s documentation, Enis Söztutar 
(HBase), and other Accumulo devs that I’ve bounced
-these ideas off of (too many to enumerate).
-
-
-
-
-Goals
-1. Master-Peer configuration that doesn’t exclude future master-master work
-2. Per locality-group replication configuration
-3. Shell administration of replication
-4. Accumulo Monitor integration/insight to replication status
-5. State machines for lifecycle of chunks
-6. Versionable (read-as protobuf) datastructure to track chunk metadata
-7. Thrift for RPC
-8. Replication does not require “closed” files (can send incremental updates to peers)
-9. Ability to replicate “live inserts” and “bulk imports”
-10. Provide replication interface with Accumulo->Accumulo implementation
-11. Do not rely on active Accumulo Master to perform replication (send or receive) -- delegate to a TabletServer
-12. Use FATE where applicable
-13. Gracefully handle offline peers
-14. Implement read-only variant Master/TabletServer[e]
-
-
-Non-Goals
-1. Replicate on smaller granularity than locality group (not individual 
colfams/colquals or based on visibilities)
-2. Wire security between master and peer
-3. Support replication of encrypted data[f]
-4. Replication of existing data (use importtable & exporttable)
-5. Enforce replication of table configuration
-
-
-References
-
-
-* http://www.cs.mcgill.ca/~kemme/papers/vldb00.html
-[a] While the WAL is a useful file format for shipping updates (an append-only 
file), the actual LogFileKey and
-LogFileValue pairs may not be sufficient? Might need some extra data 
internally? Maybe the DFSLogger header could
-contain that? 
-[b] This approach makes the assumption that we only begin the replication process when a WAL is closed.
-This is likely too long of a period of time: an offset and length likely need to be introduced to decrease latency.
-[c] This needs to be consistent across clusters. Do we need to control access 
to ensure that it is? Is it excessive to
-force users to configure it correctly? 
-[d] Same as instance.cluster.$name: Do we need to enforce these values? 
-[e] This isn't an immediate necessity, so I'm tempted to consider punting it 
as a non-goal for the first implementation
-[f] While not in the original scope, it is definitely of great concern.

http://git-wip-us.apache.org/repos/asf/accumulo/blob/e99ec9f0/docs/src/main/resources/state/replicationstatus.gv
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/state/replicationstatus.gv 
b/docs/src/main/resources/state/replicationstatus.gv
deleted file mode 100644
index b407172..0000000
--- a/docs/src/main/resources/state/replicationstatus.gv
+++ /dev/null
@@ -1,40 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-digraph Replication {
-    graph [ label="Replication", fontsize=24, fontname=Helvetica];
-    node [fontsize=12, fontname=Helvetica];
-    edge [fontsize=9, fontcolor=blue, fontname=ArialMT];
-    subgraph cluster_ReplicationStatus {
-        label="ReplicationStatus"
-        "ReplicationStatus.OPEN" [ label = "Open for\nreplication" ];
-        "ReplicationStatus.OPEN" -> "ReplicationStatus.DATA_REPLICATED" [ 
label = "Data replicated" ];
-        "ReplicationStatus.OPEN" -> "ReplicationStatus.DATA_INGESTED" [ label 
= "Data ingested locally" ];
-        "ReplicationStatus.OPEN" -> "ReplicationStatus.CLOSED" [ label = 
"Local file closed\nfor addl writes" ];
-
-        "ReplicationStatus.DATA_REPLICATED" [ label = "Data Replicated" ];
-        "ReplicationStatus.DATA_REPLICATED" -> "ReplicationStatus.OPEN" [ 
label = "Increment replication\nfinished offset" ];
-        "ReplicationStatus.DATA_REPLICATED" -> "ReplicationStatus.CLOSED" [ 
label = "Increment replication\nfinished offset" ];
-
-        "ReplicationStatus.DATA_INGESTED" [ label = "Data Ingested" ];
-        "ReplicationStatus.DATA_INGESTED" -> "ReplicationStatus.OPEN" [ label 
= "Increment replication\nneeded offset" ];
-
-        "ReplicationStatus.CLOSED" [ label = "Closed" ];
-        "ReplicationStatus.CLOSED" -> "ReplicationStatus.DATA_REPLICATED" [ 
label = "Data replicated" ];
-        "ReplicationStatus.CLOSED" -> "ReplicationStatus.DELETED" [ label = 
"All data replicated" ];
-
-        "ReplicationStatus.DELETED" [ label = "Local resources ready for 
deletion" ];
-    }
-}

http://git-wip-us.apache.org/repos/asf/accumulo/blob/e99ec9f0/docs/src/main/resources/state/replicationstatus.png
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/state/replicationstatus.png 
b/docs/src/main/resources/state/replicationstatus.png
deleted file mode 100644
index 85ca0e0..0000000
Binary files a/docs/src/main/resources/state/replicationstatus.png and 
/dev/null differ

http://git-wip-us.apache.org/repos/asf/accumulo/blob/e99ec9f0/docs/src/main/resources/state/table-lifecycle.gv
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/state/table-lifecycle.gv 
b/docs/src/main/resources/state/table-lifecycle.gv
deleted file mode 100644
index 228b69f..0000000
--- a/docs/src/main/resources/state/table-lifecycle.gv
+++ /dev/null
@@ -1,77 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-digraph Replication {
-    graph [ label="Replication Pipeline", fontsize=24, fontname=Helvetica];
-    node [fontsize=12, fontname=Helvetica];
-    edge [fontsize=9, fontcolor=blue, fontname=ArialMT];
-
-    subgraph cluster_zookeeper {
-        label = "ZooKeeper"
-        "DistributedWorkQueue" [ label = "DistributedWorkQueue" ];
-    }
-
-    subgraph cluster_tables {
-        label = "Tables"
-        "MetadataTable" [ label = "Metadata Table" ];
-        "ReplicationTable" [ label = "Replication Table" ];
-    }
-
-    subgraph cluster_tserver {
-        label = "TabletServer"
-        "WalCreated" [ label = "'New' WAL used" ];
-        "WalCreated" -> "MetadataTable" [ label = "Create record for WAL\nand 
local table id" ];
-
-        "WalMinC" [ label = "Minor Compaction" ];
-        "WalMinC" -> "MetadataTable" [ label = "Update record for data 
available to replicate" ];
-
-        "ReplicaSystem" [ label = "ReplicaSystem" ];
-        "DistributedWorkQueue" -> "ReplicaSystem" [ label = "ReplicaSystem 
accepts Work" ];
-        "ReplicaSystem" -> "ReplicaSystem" [ label = "Replicate data in chunks 
to peer" ];
-        "ReplicaSystem" -> "ReplicationTable" [ label = "Update Work record 
with\ntotal data replicated" ];
-    }
-
-    subgraph cluster_master {
-        label = "Master"
-
-        "StatusMaker" [ label = "StatusMaker" ];
-        "MetadataTable" -> "StatusMaker" [ label = "Reads records" ];
-        "StatusMaker" -> "ReplicationTable" [ label = "Makes Status records" ];
-
-        "WorkMaker" [ label = "WorkMaker" ];
-        "ReplicationTable" -> "WorkMaker" [ label = "Read Status records" ]; 
-        "WorkMaker" -> "ReplicationTable" [ label = "Write Work record for 
each peer\nwhen work is needed" ];
-
-        "FinishedWorkUpdater" [ label = "FinishedWorkUpdater" ];
-        "ReplicationTable" -> "FinishedWorkUpdater" [ label = "Read all Work 
records for file" ];
-        "FinishedWorkUpdater" -> "ReplicationTable" [ label = "Record new 
Status with\nminimum replication progress" ];
-        "FinishedWorkUpdater" -> "ReplicationTable" [ label = "Delete Work 
records when\nall are fully replicated" ];
-
-        "WorkAssigner" [ label = "WorkAssigner" ];
-        "ReplicationTable" -> "WorkAssigner" [ label = "Read Work records" ];
-        "WorkAssigner" -> "DistributedWorkQueue" [ label = "Make Work 
available\nvia ZooKeeper" ];
-
-        "RemoveCompleteRecords" [ label = "RemoveCompleteReplicationRecords" ];
-        "ReplicationTable" -> "RemoveCompleteRecords" [ label = "Read all 
Status and Work\nrecords by file" ];
-        "RemoveCompleteRecords" -> "ReplicationTable" [ label = "Delete 
records for file if\nall are fully replicated" ];
-    }
-
-    subgraph cluster_gc {
-        label = "Garbage Collector";
-        "CloseWALs" [ label = "CloseWriteAheadLogs" ];
-        "MetadataTable" -> "CloseWALs" [ label = "Find all referenced WALs by 
tserver" ];
-        "CloseWALs" -> "MetadataTable" [ label = "Close replication 
records\nfor unreferenced WALs" ];
-    }
-}

http://git-wip-us.apache.org/repos/asf/accumulo/blob/e99ec9f0/docs/src/main/resources/state/table-lifecycle.png
----------------------------------------------------------------------
diff --git a/docs/src/main/resources/state/table-lifecycle.png 
b/docs/src/main/resources/state/table-lifecycle.png
deleted file mode 100644
index 43d7d21..0000000
Binary files a/docs/src/main/resources/state/table-lifecycle.png and /dev/null 
differ

http://git-wip-us.apache.org/repos/asf/accumulo/blob/e99ec9f0/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index 78a5712..975f078 100644
--- a/pom.xml
+++ b/pom.xml
@@ -82,7 +82,6 @@
   <modules>
     <module>assemble</module>
     <module>core</module>
-    <module>docs</module>
     <module>fate</module>
     <module>iterator-test-harness</module>
     <module>maven-plugin</module>
@@ -257,13 +256,6 @@
       </dependency>
       <dependency>
         <groupId>org.apache.accumulo</groupId>
-        <artifactId>accumulo-docs</artifactId>
-        <version>${project.version}</version>
-        <classifier>user-manual</classifier>
-        <type>html</type>
-      </dependency>
-      <dependency>
-        <groupId>org.apache.accumulo</groupId>
         <artifactId>accumulo-fate</artifactId>
         <version>${project.version}</version>
       </dependency>
@@ -742,11 +734,6 @@
           </configuration>
         </plugin>
         <plugin>
-          <groupId>org.asciidoctor</groupId>
-          <artifactId>asciidoctor-maven-plugin</artifactId>
-          <version>1.5.3</version>
-        </plugin>
-        <plugin>
           <groupId>org.codehaus.mojo</groupId>
           <artifactId>build-helper-maven-plugin</artifactId>
           <version>1.10</version>
