[10/16] accumulo-website git commit: ACCUMULO-4630 Refactored documentation for website

mwalch Mon, 22 May 2017 10:30:09 -0700

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/7cc70b2e/docs/master/administration.md
----------------------------------------------------------------------
diff --git a/docs/master/administration.md b/docs/master/administration.md
deleted file mode 100644
index e780ce4..0000000
--- a/docs/master/administration.md
+++ /dev/null
@@ -1,1145 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-== Administration
-
-=== Hardware
-
-Because we are running essentially two or three systems simultaneously layered
-across the cluster: HDFS, Accumulo and MapReduce, it is typical for hardware to
-consist of 4 to 8 cores, and 8 to 32 GB RAM. This is so each running process 
can have
-at least one core and 2 - 4 GB each.
-
-One core running HDFS can typically keep 2 to 4 disks busy, so each machine may
-typically have as little as 2 x 300GB disks and as much as 4 x 1TB or 2TB 
disks.
-
-It is possible to do with less than this, such as with 1u servers with 2 cores 
and 4GB
-each, but in this case it is recommended to only run up to two processes per
-machine -- i.e. DataNode and TabletServer or DataNode and MapReduce worker but
-not all three. The constraint here is having enough available heap space for 
all the
-processes on a machine.
-
-=== Network
-
-Accumulo communicates via remote procedure calls over TCP/IP for both passing
-data and control messages. In addition, Accumulo uses HDFS clients to
-communicate with HDFS. To achieve good ingest and query performance, sufficient
-network bandwidth must be available between any two machines.
-
-In addition to needing access to ports associated with HDFS and ZooKeeper, 
Accumulo will
-use the following default ports. Please make sure that they are open, or change
-their value in accumulo-site.xml.
-
-.Accumulo default ports
-[width="75%",cols=">,^2,^2"]
-[options="header"]
-|====
-|Port | Description | Property Name
-|4445 | Shutdown Port (Accumulo MiniCluster) | n/a
-|4560 | Accumulo monitor (for centralized log display) | monitor.port.log4j
-|9995 | Accumulo HTTP monitor | monitor.port.client
-|9997 | Tablet Server | tserver.port.client
-|9998 | Accumulo GC | gc.port.client
-|9999 | Master Server | master.port.client
-|12234 | Accumulo Tracer | trace.port.client
-|42424 | Accumulo Proxy Server | n/a
-|10001 | Master Replication service | master.replication.coordinator.port
-|10002 | TabletServer Replication service | replication.receipt.service.port
-|====
-
-In addition, the user can provide +0+ and an ephemeral port will be chosen 
instead. This
-ephemeral port is likely to be unique and not already bound. Thus, configuring 
ports to
-use +0+ instead of an explicit value, should, in most cases, work around any 
issues of
-running multiple distinct Accumulo instances (or any other process which tries 
to use the
-same default ports) on the same hardware. Finally, the *.port.client 
properties will work
-with the port range syntax (M-N) allowing the user to specify a range of ports 
for the
-service to attempt to bind. The ports in the range will be tried in a 1-up 
manner starting
-at the low end of the range to, and including, the high end of the range.
-
-=== Installation
-
-Download a binary distribution of Accumulo and install it to a directory on a 
disk with
-sufficient space:
-
-  cd <install directory>
-  tar xzf accumulo-X.Y.Z-bin.tar.gz   # Replace 'X.Y.Z' with your Accumulo 
version
-  cd accumulo-X.Y.Z
-
-Repeat this step on each machine in your cluster. Typically, the same 
+<install directory>+
-is chosen for all machines in the cluster.
-
-There are four scripts in the `bin/` directory that are used to manage 
Accumulo:
-
-1. `accumulo` - Runs Accumulo command-line tools and starts Accumulo processes
-2. `accumulo-service` - Runs Accumulo processes as services
-3. `accumulo-cluster` - Manages Accumulo cluster on a single node or several 
nodes
-4. `accumulo-util` - Accumulo utilities for creating configuration, native 
libraries, etc.
-
-These scripts will be used in the remaining instructions to configure and run 
Accumulo.
-
-=== Dependencies
-
-Accumulo requires HDFS and ZooKeeper to be configured and running
-before starting. Password-less SSH should be configured between at least the
-Accumulo master and TabletServer machines. It is also a good idea to run 
Network
-Time Protocol (NTP) within the cluster to ensure nodes' clocks don't get too 
out of
-sync, which can cause problems with automatically timestamped data.
-
-=== Configuration
-
-The Accumulo tarball contains a +conf/+ directory where Accumulo looks for 
configuration. If you
-installed Accumulo using downstream packaging, the +conf/+ could be something 
else like
-+/etc/accumulo/+.
-
-Before starting Accumulo, the configuration files +accumulo-env.sh+ and 
+accumulo-site.xml+ must
-exist in +conf/+ and be properly configured. If you are using 
+accumulo-cluster+ to launch
-a cluster, the `conf/` directory must also contain hosts file for Accumulo 
services (i.e +gc+,
-+masters+, +monitor+, +tservers+, +tracers+). You can either create these 
files manually or run
-+accumulo-cluster create-config+.
-
-Logging is configured in +accumulo-env.sh+ to use three log4j configuration 
files in +conf/+. The
-file used depends on the Accumulo command or service being run. Logging for 
most Accumulo services
-(i.e Master, TabletServer, Garbage Collector) is configured by 
+log4j-service.properties+ except for
-the Monitor which is configured by +log4j-monitor.properties+. All Accumulo 
commands (i.e +init+,
-+shell+, etc) are configured by +log4j.properties+.
-
-==== Configure accumulo-env.sh
-
-Accumulo needs to know where to find the software it depends on. Edit 
accumulo-env.sh
-and specify the following:
-
-. Enter the location of Hadoop for +$HADOOP_PREFIX+
-. Enter the location of ZooKeeper for +$ZOOKEEPER_HOME+
-. Optionally, choose a different location for Accumulo logs using 
+$ACCUMULO_LOG_DIR+
-
-Accumulo uses +HADOOP_PREFIX+ and +ZOOKEEPER_HOME+ to locate Hadoop and 
Zookeeper jars
-and add them the +CLASSPATH+ variable. If you are running a vendor-specific 
release of Hadoop
-or Zookeeper, you may need to change how your +CLASSPATH+ is built in 
+accumulo-env.sh+. If
-Accumulo has problems later on finding jars, run +accumulo classpath -d+ to 
debug and print
-Accumulo's classpath.
-
-You may want to change the default memory settings for Accumulo's TabletServer 
which are
-by set in the +JAVA_OPTS+ settings for 'tservers' in +accumulo-env.sh+. Note 
the
-syntax is that of the Java JVM command line options. This value should be less 
than the
-physical memory of the machines running TabletServers.
-
-There are similar options for the master's memory usage and the garbage 
collector
-process. Reduce these if they exceed the physical RAM of your hardware and
-increase them, within the bounds of the physical RAM, if a process fails 
because of
-insufficient memory.
-
-Note that you will be specifying the Java heap space in accumulo-env.sh. You 
should
-make sure that the total heap space used for the Accumulo tserver and the 
Hadoop
-DataNode and TaskTracker is less than the available memory on each worker node 
in
-the cluster. On large clusters, it is recommended that the Accumulo master, 
Hadoop
-NameNode, secondary NameNode, and Hadoop JobTracker all be run on separate
-machines to allow them to use more heap space. If you are running these on the
-same machine on a small cluster, likewise make sure their heap space settings 
fit
-within the available memory.
-
-==== Native Map
-
-The tablet server uses a data structure called a MemTable to store sorted 
key/value
-pairs in memory when they are first received from the client. When a minor 
compaction
-occurs, this data structure is written to HDFS. The MemTable will default to 
using
-memory in the JVM but a JNI version, called the native map, can be used to 
significantly
-speed up performance by utilizing the memory space of the native operating 
system. The
-native map also avoids the performance implications brought on by garbage 
collection
-in the JVM by causing it to pause much less frequently.
-
-===== Building
-
-32-bit and 64-bit Linux and Mac OS X versions of the native map can be built 
by executing
-+accumulo-util build-native+. If your system's default compiler options are 
insufficient,
-you can add additional compiler options to the command line, such as options 
for the
-architecture. These will be passed to the Makefile in the environment variable 
+USERFLAGS+.
-
-Examples:
-
-  accumulo-util build-native
-  accumulo-util build-native -m32
-
-After building the native map from the source, you will find the artifact in
-+lib/native+. Upon starting up, the tablet server will look
-in this directory for the map library. If the file is renamed or moved from its
-target directory, the tablet server may not be able to find it. The system can
-also locate the native maps shared library by setting +LD_LIBRARY_PATH+
-(or +DYLD_LIBRARY_PATH+ on Mac OS X) in +accumulo-env.sh+.
-
-===== Native Maps Configuration
-
-As mentioned, Accumulo will use the native libraries if they are found in the 
expected
-location and +tserver.memory.maps.native.enabled+ is set to +true+ (which is 
the default).
-Using the native maps over JVM Maps nets a noticeable improvement in ingest 
rates; however,
-certain configuration variables are important to modify when increasing the 
size of the
-native map.
-
-To adjust the size of the native map, increase the value of 
+tserver.memory.maps.max+.
-By default, the maximum size of the native map is 1GB. When increasing this 
value, it is
-also important to adjust the values of +table.compaction.minor.logs.threshold+ 
and
-+tserver.walog.max.size+. +table.compaction.minor.logs.threshold+ is the 
maximum
-number of write-ahead log files that a tablet can reference before they will 
be automatically
-minor compacted. +tserver.walog.max.size+ is the maximum size of a write-ahead 
log.
-
-The maximum size of the native maps for a server should be less than the 
product
-of the write-ahead log maximum size and minor compaction threshold for log 
files:
-
-+$table.compaction.minor.logs.threshold * $tserver.walog.max.size >= 
$tserver.memory.maps.max+
-
-This formula ensures that minor compactions won't be automatically triggered 
before the native
-maps can be completely saturated.
-
-Subsequently, when increasing the size of the write-ahead logs, it can also be 
important
-to increase the HDFS block size that Accumulo uses when creating the files for 
the write-ahead log.
-This is controlled via +tserver.wal.blocksize+. A basic recommendation is that 
when
-+tserver.walog.max.size+ is larger than 2GB in size, set 
+tserver.wal.blocksize+ to 2GB.
-Increasing the block size to a value larger than 2GB can result in decreased 
write
-performance to the write-ahead log file which will slow ingest.
-
-==== Cluster Specification
-
-If you are using +accumulo-cluster+ to start a cluster, configure the 
following on the
-machine that will serve as the Accumulo master:
-
-. Write the IP address or domain name of the Accumulo Master to the 
+conf/masters+ file.
-. Write the IP addresses or domain name of the machines that will be 
TabletServers in +conf/tservers+, one per line.
-
-Note that if using domain names rather than IP addresses, DNS must be 
configured
-properly for all machines participating in the cluster. DNS can be a confusing 
source
-of errors.
-
-==== Configure accumulo-site.xml
-
-Specify appropriate values for the following settings in +accumulo-site.xml+:
-
-[source,xml]
-<property>
-    <name>instance.zookeeper.host</name>
-    <value>zooserver-one:2181,zooserver-two:2181</value>
-    <description>list of zookeeper servers</description>
-</property>
-
-This enables Accumulo to find ZooKeeper. Accumulo uses ZooKeeper to coordinate
-settings between processes and helps finalize TabletServer failure.
-
-[source,xml]
-<property>
-    <name>instance.secret</name>
-    <value>DEFAULT</value>
-</property>
-
-The instance needs a secret to enable secure communication between servers. 
Configure your
-secret and make sure that the +accumulo-site.xml+ file is not readable to 
other users.
-For alternatives to storing the +instance.secret+ in plaintext, please read the
-+Sensitive Configuration Values+ section.
-
-Some settings can be modified via the Accumulo shell and take effect 
immediately, but
-some settings require a process restart to take effect. See the configuration 
documentation
-(available in the docs directory of the tarball and in <<configuration>>) for 
details.
-
-==== Hostnames in configuration files
-
-Accumulo has a number of configuration files which can contain references to 
other hosts in your
-network. All of the "host" configuration files for Accumulo (+gc+, +masters+, 
+tservers+, +monitor+,
-+tracers+) as well as +instance.volumes+ in accumulo-site.xml must contain 
some host reference.
-
-While IP address, short hostnames, or fully qualified domain names (FQDN) are 
all technically valid, it
-is good practice to always use FQDNs for both Accumulo and other processes in 
your Hadoop cluster.
-Failing to consistently use FQDNs can have unexpected consequences in how 
Accumulo uses the FileSystem.
-
-A common way for this problem can be observed is via applications that use 
Bulk Ingest. The Accumulo
-Master coordinates moving the input files to Bulk Ingest to an 
Accumulo-managed directory. However,
-Accumulo cannot safely move files across different Hadoop FileSystems. This is 
problematic because
-Accumulo also cannot make reliable assertions across what is the same 
FileSystem which is specified
-with different names. Naively, while 127.0.0.1:8020 might be a valid 
identifier for an HDFS instance,
-Accumulo identifies +localhost:8020+ as a different HDFS instance than 
+127.0.0.1:8020+.
-
-==== Deploy Configuration
-
-Copy accumulo-env.sh and accumulo-site.xml from the +conf/+ directory on the 
master to all Accumulo
-tablet servers.  The "host" configuration files files +accumulo-cluster+ only 
need to be on servers
-where that command is run.
-
-==== Sensitive Configuration Values
-
-Accumulo has a number of properties that can be specified via the 
accumulo-site.xml
-file which are sensitive in nature, instance.secret and 
trace.token.property.password
-are two common examples. Both of these properties, if compromised, have the 
ability
-to result in data being leaked to users who should not have access to that 
data.
-
-In Hadoop-2.6.0, a new CredentialProvider class was introduced which serves as 
a common
-implementation to abstract away the storage and retrieval of passwords from 
plaintext
-storage in configuration files. Any Property marked with the +Sensitive+ 
annotation
-is a candidate for use with these CredentialProviders. For version of Hadoop 
which lack
-these classes, the feature will just be unavailable for use.
-
-A comma separated list of CredentialProviders can be configured using the 
Accumulo Property
-+general.security.credential.provider.paths+. Each configured URL will be 
consulted
-when the Configuration object for accumulo-site.xml is accessed.
-
-==== Using a JavaKeyStoreCredentialProvider for storage
-
-One of the implementations provided in Hadoop-2.6.0 is a Java KeyStore 
CredentialProvider.
-Each entry in the KeyStore is the Accumulo Property key name. For example, to 
store the
-`instance.secret`, the following command can be used:
-
-  hadoop credential create instance.secret --provider 
jceks://file/etc/accumulo/conf/accumulo.jceks
-
-The command will then prompt you to enter the secret to use and create a 
keystore in: 
-
-  /path/to/accumulo/conf/accumulo.jceks
-
-Then, accumulo-site.xml must be configured to use this KeyStore as a 
CredentialProvider:
-
-[source,xml]
-<property>
-    <name>general.security.credential.provider.paths</name>
-    <value>jceks://file/path/to/accumulo/conf/accumulo.jceks</value>
-</property>
-
-This configuration will then transparently extract the +instance.secret+ from
-the configured KeyStore and alleviates a human readable storage of the 
sensitive
-property.
-
-A KeyStore can also be stored in HDFS, which will make the KeyStore readily 
available to
-all Accumulo servers. If the local filesystem is used, be aware that each 
Accumulo server
-will expect the KeyStore in the same location.
-
-[[ClientConfiguration]]
-==== Client Configuration
-
-In version 1.6.0, Accumulo included a new type of configuration file known as 
a client
-configuration file. One problem with the traditional "site.xml" file that is 
prevalent
-through Hadoop is that it is a single file used by both clients and servers. 
This makes
-it very difficult to protect secrets that are only meant for the server 
processes while
-allowing the clients to connect to the servers.
-
-The client configuration file is a subset of the information stored in 
accumulo-site.xml
-meant only for consumption by clients of Accumulo. By default, Accumulo checks 
a number
-of locations for a client configuration by default:
-
-* +/path/to/accumulo/conf/client.conf+
-* +/etc/accumulo/client.conf+
-* +/etc/accumulo/conf/client.conf+
-* +~/.accumulo/config+
-
-These files are https://en.wikipedia.org/wiki/.properties[Java Properties 
files]. These files
-can currently contain information about ZooKeeper servers, RPC properties 
(such as SSL or SASL
-connectors), distributed tracing properties. Valid properties are defined by 
the
-https://github.com/apache/accumulo/blob/f1d0ec93d9f13ff84844b5ac81e4a7b383ced467/core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java#L54[ClientProperty]
-enum contained in the client API.
-
-==== Custom Table Tags
-
-Accumulo has the ability for users to add custom tags to tables.  This allows
-applications to set application-level metadata about a table.  These tags can 
be
-anything from a table description, administrator notes, date created, etc.
-This is done by naming and setting a property with a prefix +table.custom.*+.
-
-Currently, table properties are stored in ZooKeeper. This means that the number
-and size of custom properties should be restricted on the order of 10's of 
properties
-at most without any properties exceeding 1MB in size. ZooKeeper's performance 
can be
-very sensitive to an excessive number of nodes and the sizes of the nodes. 
Applications
-which leverage the user of custom properties should take these warnings into
-consideration. There is no enforcement of these warnings via the API.
-
-==== Configuring the ClassLoader
-
-Accumulo builds its Java classpath in +accumulo-env.sh+.  After an Accumulo 
application has started, it will load classes from the locations
-specified in the deprecated +general.classpaths+ property. Additionally, 
Accumulo will load classes from the locations specified in the
-+general.dynamic.classpaths+ property and will monitor and reload them if they 
change. The reloading  feature is useful during the development
-and testing of iterators as new or modified iterator classes can be deployed 
to Accumulo without having to restart the database.
-/
-Accumulo also has an alternate configuration for the classloader which will 
allow it to load classes from remote locations. This mechanism
-uses Apache Commons VFS which enables locations such as http and hdfs to be 
used. This alternate configuration also uses the
-+general.classpaths+ property in the same manner described above. It differs 
in that you need to configure the
-+general.vfs.classpaths+ property instead of the +general.dynamic.classpath+ 
property. As in the default configuration, this alternate
-configuration will also monitor the vfs locations for changes and reload if 
necessary.
-
-The Accumulo classpath can be viewed in human readable format by running 
+accumulo classpath -d+.
-
-===== ClassLoader Contexts
-
-With the addition of the VFS based classloader, we introduced the notion of 
classloader contexts. A context is identified
-by a name and references a set of locations from which to load classes and can 
be specified in the accumulo-site.xml file or added
-using the +config+ command in the shell. Below is an example for specify the 
app1 context in the accumulo-site.xml file:
-
-[source,xml]
-<property>
-  <name>general.vfs.context.classpath.app1</name>
-  
<value>hdfs://localhost:8020/applicationA/classpath/.*.jar,file:///opt/applicationA/lib/.*.jar</value>
-  <description>Application A classpath, loads jars from HDFS and local file 
system</description>
-</property>
-
-The default behavior follows the Java ClassLoader contract in that classes, if 
they exists, are loaded from the parent classloader first.
-You can override this behavior by delegating to the parent classloader after 
looking in this classloader first. An example of this
-configuration is:
-
-[source,xml]
-<property>
-  <name>general.vfs.context.classpath.app1.delegation=post</name>
-  
<value>hdfs://localhost:8020/applicationA/classpath/.*.jar,file:///opt/applicationA/lib/.*.jar</value>
-  <description>Application A classpath, loads jars from HDFS and local file 
system</description>
-</property>
-
-To use contexts in your application you can set the +table.classpath.context+ 
on your tables or use the +setClassLoaderContext()+ method on Scanner
-and BatchScanner passing in the name of the context, app1 in the example 
above. Setting the property on the table allows your minc, majc, and scan 
-iterators to load classes from the locations defined by the context. Passing 
the context name to the scanners allows you to override the table setting
-to load only scan time iterators from a different location. 
-
-=== Initialization
-
-Accumulo must be initialized to create the structures it uses internally to 
locate
-data across the cluster. HDFS is required to be configured and running before
-Accumulo can be initialized.
-
-Once HDFS is started, initialization can be performed by executing
-+accumulo init+ . This script will prompt for a name
-for this instance of Accumulo. The instance name is used to identify a set of 
tables
-and instance-specific settings. The script will then write some information 
into
-HDFS so Accumulo can start properly.
-
-The initialization script will prompt you to set a root password. Once 
Accumulo is
-initialized it can be started.
-
-=== Running
-
-==== Starting Accumulo
-
-Make sure Hadoop is configured on all of the machines in the cluster, including
-access to a shared HDFS instance. Make sure HDFS and ZooKeeper are running.
-Make sure ZooKeeper is configured and running on at least one machine in the
-cluster.
-Start Accumulo using +accumulo-cluster start+.
-
-To verify that Accumulo is running, check the Status page as described in
-<<monitoring>>. In addition, the Shell can provide some information about the 
status of
-tables via reading the metadata tables.
-
-==== Stopping Accumulo
-
-To shutdown cleanly, run +accumulo-cluster stop+ and the master will 
orchestrate the
-shutdown of all the tablet servers. Shutdown waits for all minor compactions 
to finish, so it may
-take some time for particular configurations.
-
-==== Adding a Tablet Server
-
-Update your +conf/tservers+ file to account for the addition.
-
-Next, ssh to each of the hosts you want to add and run:
-
-  accumulo-service tserver start
-
-Make sure the host in question has the new configuration, or else the tablet
-server won't start; at a minimum this needs to be on the host(s) being added,
-but in practice it's good to ensure consistent configuration across all nodes.
-
-==== Decomissioning a Tablet Server
-
-If you need to take a node out of operation, you can trigger a graceful 
shutdown of a tablet
-server. Accumulo will automatically rebalance the tablets across the available 
tablet servers.
-
-  accumulo admin stop <host(s)> {<host> ...}
-
-Alternatively, you can ssh to each of the hosts you want to remove and run:
-
-  accumulo-service tserver stop
-
-Be sure to update your +conf/tservers+ file to
-account for the removal of these hosts. Bear in mind that the monitor will not 
re-read the
-tservers file automatically, so it will report the decommissioned servers as 
down; it's
-recommended that you restart the monitor so that the node list is up to date.
-
-The steps described to decommission a node can also be used (without removal 
of the host
-from the +conf/tservers+ file) to gracefully stop a node. This will
-ensure that the tabletserver is cleanly stopped and recovery will not need to 
be performed
-when the tablets are re-hosted.
-
-==== Restarting process on a node
-
-Occasionally, it might be necessary to restart the processes on a specific 
node. In addition
-to the +accumulo-cluster+ script, Accumulo has a +accumulo-service+ script that
-can be use to start/stop processes on a node.
-
-===== A note on rolling restarts
-
-For sufficiently large Accumulo clusters, restarting multiple TabletServers 
within a short window can place significant 
-load on the Master server.  If slightly lower availability is acceptable, this 
load can be reduced by globally setting 
-+table.suspend.duration+ to a positive value.  
-
-With +table.suspend.duration+ set to, say, +5m+, Accumulo will wait 
-for 5 minutes for any dead TabletServer to return before reassigning that 
TabletServer's responsibilities to other TabletServers.
-If the TabletServer returns to the cluster before the specified timeout has 
elapsed, Accumulo will assign the TabletServer 
-its original responsibilities.
-
-It is important not to choose too large a value for +table.suspend.duration+, 
as during this time, all scans against the 
-data that TabletServer had hosted will block (or time out).
-
-==== Running multiple TabletServers on a single node
-
-With very powerful nodes, it may be beneficial to run more than one 
TabletServer on a given
-node. This decision should be made carefully and with much deliberation as 
Accumulo is designed
-to be able to scale to using 10's of GB of RAM and 10's of CPU cores.
-
-Accumulo TabletServers bind certain ports on the host to accommodate remote 
procedure calls to/from
-other nodes. Running more than one TabletServer on a host requires that you 
set the environment variable
-+ACCUMULO_SERVICE_INSTANCE+ to an instance number (i.e 1, 2) for each instance 
that is started. Also, set
-these properties in +accumulo-site.xml+:
-
-  <property>
-    <name>tserver.port.search</name>
-    <value>true</value>
-  </property>
-  <property>
-    <name>replication.receipt.service.port</name>
-    <value>0</value>
-  </property>
-
-[[monitoring]]
-=== Monitoring
-
-==== Accumulo Monitor
-The Accumulo Monitor provides an interface for monitoring the status and 
health of
-Accumulo components. The Accumulo Monitor provides a web UI for accessing this 
information at
-+http://_monitorhost_:9995/+.
-
-Things highlighted in yellow may be in need of attention.
-If anything is highlighted in red on the monitor page, it is something that 
definitely needs attention.
-
-The Overview page contains some summary information about the Accumulo 
instance, including the version, instance name, and instance ID.
-There is a table labeled Accumulo Master with current status, a table listing 
the active Zookeeper servers, and graphs displaying various metrics over time.
-These include ingest and scan performance and other useful measurements.
-
-The Master Server, Tablet Servers, and Tables pages display metrics grouped in 
different ways (e.g. by tablet server or by table).
-Metrics typically include number of entries (key/value pairs), ingest and 
query rates.
-The number of running scans, major and minor compactions are in the form 
_number_running_ (_number_queued_).
-Another important metric is hold time, which is the amount of time a tablet 
has been waiting but unable to flush its memory in a minor compaction.
-
-The Server Activity page graphically displays tablet server status, with each 
server represented as a circle or square.
-Different metrics may be assigned to the nodes' color and speed of oscillation.
-The Overall Avg metric is only used on the Server Activity page, and 
represents the average of all the other metrics (after normalization).
-Similarly, the Overall Max metric picks the metric with the maximum normalized 
value.
-
-The Garbage Collector page displays a list of garbage collection cycles, the 
number of files found of each type (including deletion candidates in use and 
files actually deleted), and the length of the deletion cycle.
-The Traces page displays data for recent traces performed (see the following 
section for information on <<tracing>>).
-The Recent Logs page displays warning and error logs forwarded to the monitor 
from all Accumulo processes.
-Also, the XML and JSON links provide metrics in XML and JSON formats, 
respectively.
-
-==== SSL
-SSL may be enabled for the monitor page by setting the following properties in 
the +accumulo-site.xml+ file:
-
-  monitor.ssl.keyStore
-  monitor.ssl.keyStorePassword
-  monitor.ssl.trustStore
-  monitor.ssl.trustStorePassword
-
-If the Accumulo conf directory has been configured (in particular the 
+accumulo-env.sh+ file must be set up), the 
-+accumulo-util gen-monitor-cert+ command can be used to create the keystore 
and truststore files with random passwords. The command
-will print out the properties that need to be added to the +accumulo-site.xml+ 
file. The stores can also be generated manually with the
-Java +keytool+ command, whose usage can be seen in the +accumulo-util+ script.
-
-If desired, the SSL ciphers allowed for connections can be controlled via the 
following properties in +accumulo-site.xml+:
-
-  monitor.ssl.include.ciphers
-  monitor.ssl.exclude.ciphers
-
-If SSL is enabled, the monitor URL can only be accessed via https.
-This also allows you to access the Accumulo shell through the monitor page.
-The left navigation bar will have a new link to Shell.
-An Accumulo user name and password must be entered for access to the shell.
-
-=== Metrics
-
-Accumulo can expose metrics through a legacy metrics library and using the 
Hadoop Metrics2 library.
-
-==== Legacy Metrics
-
-Accumulo has a legacy metrics library that can be exposes metrics using JMX 
endpoints or file-based logging. These metrics can
-be enabled by setting +general.legacy.metrics+ to +true+ in 
+accumulo-site.xml+ and placing the +accumulo-metrics.xml+
-configuration file on the classpath (which is typically done by placing the 
file in the +conf/+ directory). A template for
-+accumulo-metrics.xml+ can be found in +conf/templates+ of the Accumulo 
tarball.
-
-==== Hadoop Metrics2
-
-Hadoop Metrics2 is a library which allows for routing of metrics generated by 
registered MetricsSources to
-configured MetricsSinks. Examples of sinks that are implemented by Hadoop 
include file-based logging, Graphite and Ganglia.
-All metric sources are exposed via JMX when using Metrics2.
-
-Metrics2 is configured by examining the classpath for a file that matches 
+hadoop-metrics2*.properties+. The Accumulo tarball 
-contains an example +hadoop-metrics2-accumulo.properties+ file in 
+conf/templates+ which can be copied to +conf/+ to place
-on classpath. This file is used to enable file, Graphite or Ganglia sinks 
(some minimal configuration required for Graphite
-and Ganglia). Because the Hadoop configuration is also on the Accumulo 
classpath, be sure that you do not have multiple
-Metrics2 configuration files. It is recommended to consolidate metrics in a 
single properties file in a central location to
-remove ambiguity. The contents of +hadoop-metrics2-accumulo.properties+ can be 
added to a central +hadoop-metrics2.properties+
-in +$HADOOP_CONF_DIR+.
-
-As a note for configuring the file sink, the provided path should be absolute. 
A relative path or file name will be created relative
-to the directory in which the Accumulo process was started. External tools, 
such as logrotate, can be used to prevent these files
-from growing without bound.
-
-Each server process should have log messages from the Metrics2 library about 
the sinks that were created. Be sure to check
-the Accumulo processes log files when debugging missing metrics output.
-
-For additional information on configuring Metrics2, visit the
-https://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html[Javadoc
 page for Metrics2].
-
-[[tracing]]
-=== Tracing
-
-It can be difficult to determine why some operations are taking longer
-than expected. For example, you may be looking up items with very low
-latency, but sometimes the lookups take much longer. Determining the
-cause of the delay is difficult because the system is distributed, and
-the typical lookup is fast.
-
-Accumulo has been instrumented to record the time that various
-operations take when tracing is turned on. The fact that tracing is
-enabled follows all the requests made on behalf of the user throughout
-the distributed infrastructure of accumulo, and across all threads of
-execution.
-
-These time spans will be inserted into the +trace+ table in
-Accumulo. You can browse recent traces from the Accumulo monitor
-page. You can also read the +trace+ table directly like any
-other table.
-
-The design of Accumulo's distributed tracing follows that of
-http://research.google.com/pubs/pub36356.html[Google's Dapper].
-
-==== Tracers
-
-To collect traces, Accumulo needs at least one tracer server running. If you 
are using +accumulo-cluster+ to start your cluster,
-configure your server in +conf/tracers+. The server collects traces from 
clients and writes them to the +trace+ table. The Accumulo
-user that the tracer connects to Accumulo with can be configured with the 
following properties (see the <<configuration,Configuration>> 
-section for setting Accumulo server properties)
-
-  trace.user
-  trace.token.property.password
-
-Other tracer configuration properties include
-
-  trace.port.client - port tracer listens on
-  trace.table - table tracer writes to
-  trace.zookeeper.path - zookeeper path where tracers register
-
-The zookeeper path is configured to /tracers by default.  If
-multiple Accumulo instances are sharing the same ZooKeeper
-quorum, take care to configure Accumulo with unique values for
-this property.
-
-==== Configuring Tracing
-
-Traces are collected via SpanReceivers. The default SpanReceiver
-configured is org.apache.accumulo.core.trace.ZooTraceClient, which
-sends spans to an Accumulo Tracer process, as discussed in the
-previous section. This default can be changed to a different span
-receiver, or additional span receivers can be added in a
-comma-separated list, by modifying the property
-
-  trace.span.receivers
-
-Individual span receivers may require their own configuration
-parameters, which are grouped under the trace.span.receiver.*
-prefix.  ZooTraceClient uses the following properties.  The first
-three properties are populated from other Accumulo properties,
-while the remaining ones should be prefixed with
-trace.span.receiver. when set in the Accumulo configuration.
-
-  tracer.zookeeper.host - populated from instance.zookeepers
-  tracer.zookeeper.timeout - populated from instance.zookeeper.timeout
-  tracer.zookeeper.path - populated from trace.zookeeper.path
-  tracer.send.timer.millis - timer for flushing send queue (in ms, default 
1000)
-  tracer.queue.size - max queue size (default 5000)
-  tracer.span.min.ms - minimum span length to store (in ms, default 1)
-
-Note that to configure an Accumulo client for tracing, including
-the Accumulo shell, the client configuration must be given the same
-trace.span.receivers, trace.span.receiver.*, and trace.zookeeper.path
-properties as the servers have.
-
-Hadoop can also be configured to send traces to Accumulo, as of
-Hadoop 2.6.0, by setting properties in Hadoop's core-site.xml
-file.  Instead of using the trace.span.receiver.* prefix, Hadoop
-uses hadoop.htrace.*.  The Hadoop configuration does not have
-access to Accumulo's properties, so the
-hadoop.htrace.tracer.zookeeper.host property must be specified.
-The zookeeper timeout defaults to 30000 (30 seconds), and the
-zookeeper path defaults to /tracers.  An example of configuring
-Hadoop to send traces to ZooTraceClient is
-
-  <property>
-    <name>hadoop.htrace.spanreceiver.classes</name>
-    <value>org.apache.accumulo.core.trace.ZooTraceClient</value>
-  </property>
-  <property>
-    <name>hadoop.htrace.tracer.zookeeper.host</name>
-    <value>zookeeperHost:2181</value>
-  </property>
-  <property>
-    <name>hadoop.htrace.tracer.zookeeper.path</name>
-    <value>/tracers</value>
-  </property>
-  <property>
-    <name>hadoop.htrace.tracer.span.min.ms</name>
-    <value>1</value>
-  </property>
-
-The accumulo-core, accumulo-tracer, accumulo-fate and libthrift
-jars must also be placed on Hadoop's classpath.
-
-===== Adding additional SpanReceivers
-https://github.com/openzipkin/zipkin[Zipkin]
-has a SpanReceiver supported by HTrace and popularized by Twitter
-that users looking for a more graphical trace display may opt to use.
-The following steps configure Accumulo to use 
+org.apache.htrace.impl.ZipkinSpanReceiver+
-in addition to the Accumulo's default ZooTraceClient, and they serve as a 
template
-for adding any SpanReceiver to Accumulo:
-
-1. Add the Jar containing the ZipkinSpanReceiver class file to the
-+lib/+ directory.  It is critical that the Jar is placed in
-+lib/+ and NOT in +lib/ext/+ so that the new SpanReceiver class
-is visible to the same class loader of htrace-core.
-
-2. Add the following to +accumulo-site.xml+:
-
-  <property>
-    <name>trace.span.receivers</name>
-    
<value>org.apache.accumulo.tracer.ZooTraceClient,org.apache.htrace.impl.ZipkinSpanReceiver</value>
-  </property>
-
-3. Restart your Accumulo tablet servers.
-
-In order to use ZipkinSpanReceiver from a client as well as the Accumulo 
server,
-
-1. Ensure your client can see the ZipkinSpanReceiver class at runtime. For 
Maven projects,
-this is easily done by adding to your client's pom.xml (taking care to specify 
a good version)
-
-  <dependency>
-    <groupId>org.apache.htrace</groupId>
-    <artifactId>htrace-zipkin</artifactId>
-    <version>3.1.0-incubating</version>
-    <scope>runtime</scope>
-  </dependency>
-
-2. Add the following to your ClientConfiguration
-(see the <<ClientConfiguration>> section)
-
-  
trace.span.receivers=org.apache.accumulo.tracer.ZooTraceClient,org.apache.htrace.impl.ZipkinSpanReceiver
-
-3. Instrument your client as in the next section.
-
-Your SpanReceiver may require additional properties, and if so these should 
likewise
-be placed in the ClientConfiguration (if applicable) and Accumulo's 
+accumulo-site.xml+.
-Two such properties for ZipkinSpanReceiver, listed with their default values, 
are
-
-  <property>
-    <name>trace.span.receiver.zipkin.collector-hostname</name>
-    <value>localhost</value>
-  </property>
-  <property>
-    <name>trace.span.receiver.zipkin.collector-port</name>
-    <value>9410</value>
-  </property>
-
-==== Instrumenting a Client
-Tracing can be used to measure a client operation, such as a scan, as
-the operation traverses the distributed system. To enable tracing for
-your application call
-
-[source,java]
-import org.apache.accumulo.core.trace.DistributedTrace;
-...
-DistributedTrace.enable(hostname, "myApplication");
-// do some tracing
-...
-DistributedTrace.disable();
-
-Once tracing has been enabled, a client can wrap an operation in a trace.
-
-[source,java]
-import org.apache.htrace.Sampler;
-import org.apache.htrace.Trace;
-import org.apache.htrace.TraceScope;
-...
-TraceScope scope = Trace.startSpan("Client Scan", Sampler.ALWAYS);
-BatchScanner scanner = conn.createBatchScanner(...);
-// Configure your scanner
-for (Entry entry : scanner) {
-}
-scope.close();
-
-The user can create additional Spans within a Trace.
-
-The sampler (such as +Sampler.ALWAYS+) for the trace should only be specified 
with a top-level span,
-and subsequent spans will be collected depending on whether that first span 
was sampled.
-Don't forget to specify a Sampler at the top-level span
-because the default Sampler only samples when part of a pre-existing trace,
-which will never occur in a client that never specifies a Sampler.
-
-[source,java]
-TraceScope scope = Trace.startSpan("Client Update", Sampler.ALWAYS);
-...
-TraceScope readScope = Trace.startSpan("Read");
-...
-readScope.close();
-...
-TraceScope writeScope = Trace.startSpan("Write");
-...
-writeScope.close();
-scope.close();
-
-Like Dapper, Accumulo tracing supports user defined annotations to associate 
additional data with a Trace.
-Checking whether currently tracing is necessary when using a sampler other 
than Sampler.ALWAYS.
-
-[source,java]
-...
-int numberOfEntriesRead = 0;
-TraceScope readScope = Trace.startSpan("Read");
-// Do the read, update the counter
-...
-if (Trace.isTracing)
-  readScope.getSpan().addKVAnnotation("Number of Entries 
Read".getBytes(StandardCharsets.UTF_8),
-      String.valueOf(numberOfEntriesRead).getBytes(StandardCharsets.UTF_8));
-
-It is also possible to add timeline annotations to your spans.
-This associates a string with a given timestamp between the start and stop 
times for a span.
-
-[source,java]
-...
-writeScope.getSpan().addTimelineAnnotation("Initiating Flush");
-
-Some client operations may have a high volume within your
-application. As such, you may wish to only sample a percentage of
-operations for tracing. As seen below, the CountSampler can be used to
-help enable tracing for 1-in-1000 operations
-
-[source,java]
-import org.apache.htrace.impl.CountSampler;
-...
-Sampler sampler = new CountSampler(HTraceConfiguration.fromMap(
-    Collections.singletonMap(CountSampler.SAMPLER_FREQUENCY_CONF_KEY, 
"1000")));
-...
-TraceScope readScope = Trace.startSpan("Read", sampler);
-...
-readScope.close();
-
-Remember to close all spans and disable tracing when finished.
-
-[source,java]
-DistributedTrace.disable();
-
-==== Viewing Collected Traces
-
-To view collected traces, use the "Recent Traces" link on the Monitor
-UI. You can also programmatically access and print traces using the
-+TraceDump+ class.
-
-===== Trace Table Format
-
-This section is for developers looking to use data recorded in the trace table
-directly, above and beyond the default services of the Accumulo monitor.
-Please note the trace table format and its supporting classes
-are not in the public API and may be subject to change in future versions.
-
-Each span received by a tracer's ZooTraceClient is recorded in the trace table
-in the form of three entries: span entries, index entries, and start time 
entries.
-Span and start time entries record full span information,
-whereas index entries provide indexing into span information
-useful for quickly finding spans by type or start time.
-
-Each entry is illustrated by a description and sample of data.
-In the description, a token in quotes is a String literal,
-whereas other other tokens are span variables.
-Parentheses group parts together, to distinguish colon characters inside the
-column family or qualifier from the colon that separates column family and 
qualifier.
-We use the format +row columnFamily:columnQualifier columnVisibility    value+
-(omitting timestamp which records the time an entry is written to the trace 
table).
-
-Span entries take the following form:
-
-  traceId        "span":(parentSpanId:spanId)            []    
spanBinaryEncoding
-  63b318de80de96d1 span:4b8f66077df89de1:3778c6739afe4e1 []    %18;%09;...
-
-The parentSpanId is "" for the root span of a trace.
-The spanBinaryEncoding is a compact Apache Thrift encoding of the original 
Span object.
-This allows clients (and the Accumulo monitor) to recover all the details of 
the original Span
-at a later time, by scanning the trace table and decoding the value of span 
entries
-via +TraceFormatter.getRemoteSpan(entry)+.
-
-The trace table has a formatter class by default 
(org.apache.accumulo.tracer.TraceFormatter)
-that changes how span entries appear from the Accumulo shell.
-Normal scans to the trace table do not use this formatter representation;
-it exists only to make span entries easier to view inside the Accumulo shell.
-
-Index entries take the following form:
-
-  "idx":service:startTime description:sender  []    traceId:elapsedTime
-  idx:tserver:14f3828f58b startScan:localhost []    63b318de80de96d1:1
-
-The service and sender are set by the first call of each Accumulo process
-(and instrumented client processes) to +DistributedTrace.enable(...)+
-(the sender is autodetected if not specified).
-The description is specified in each span.
-Start time and the elapsed time (start - stop, 1 millisecond in the example 
above)
-are recorded in milliseconds as long values serialized to a string in hex.
-
-Start time entries take the following form:
-
-  "start":startTime "id":traceId        []    spanBinaryEncoding
-  start:14f3828a351 id:63b318de80de96d1 []    %18;%09;...
-
-The following classes may be run while Accumulo is running to provide insight 
into trace statistics. These require
-accumulo-trace-VERSION.jar to be provided on the Accumulo classpath (+lib/ext+ 
is fine).
-
-  $ accumulo org.apache.accumulo.tracer.TraceTableStats -u username -p 
password -i instancename
-  $ accumulo org.apache.accumulo.tracer.TraceDump -u username -p password -i 
instancename -r
-
-==== Tracing from the Shell
-You can enable tracing for operations run from the shell by using the
-+trace on+ and +trace off+ commands.
-
-----
-root@test test> trace on
-
-root@test test> scan
-a b:c []    d
-
-root@test test> trace off
-Waiting for trace information
-Waiting for trace information
-Trace started at 2013/08/26 13:24:08.332
-Time  Start  Service@Location       Name
- 3628+0      shell@localhost shell:root
-    8+1690     shell@localhost scan
-    7+1691       shell@localhost scan:location
-    6+1692         tserver@localhost startScan
-    5+1692           tserver@localhost tablet read ahead 6
-----
-
-=== Logging
-
-Accumulo processes each write to a set of log files. By default, these logs 
are found at directory
-set by +ACCUMULO_LOG_DIR+ in +accumulo-env.sh+.
-
-=== Recovery
-
-In the event of TabletServer failure or error on shutting Accumulo down, some
-mutations may not have been minor compacted to HDFS properly. In this case,
-Accumulo will automatically reapply such mutations from the write-ahead log
-either when the tablets from the failed server are reassigned by the Master 
(in the
-case of a single TabletServer failure) or the next time Accumulo starts (in 
the event of
-failure during shutdown).
-
-Recovery is performed by asking a tablet server to sort the logs so that 
tablets can easily find their missing
-updates. The sort status of each file is displayed on
-Accumulo monitor status page. Once the recovery is complete any
-tablets involved should return to an ``online'' state. Until then those 
tablets will be
-unavailable to clients.
-
-The Accumulo client library is configured to retry failed mutations and in many
-cases clients will be able to continue processing after the recovery process 
without
-throwing an exception.
-
-=== Migrating Accumulo from non-HA Namenode to HA Namenode
-
-The following steps will allow a non-HA instance to be migrated to an HA 
instance. Consider an HDFS URL
-+hdfs://namenode.example.com:8020+ which is going to be moved to 
+hdfs://nameservice1+.
-
-Before moving HDFS over to the HA namenode, use +accumulo admin volumes+ to 
confirm
-that the only volume displayed is the volume from the current namenode's HDFS 
URL.
-
-    Listing volumes referenced in zookeeper
-            Volume : hdfs://namenode.example.com:8020/accumulo
-
-    Listing volumes referenced in accumulo.root tablets section
-            Volume : hdfs://namenode.example.com:8020/accumulo
-    Listing volumes referenced in accumulo.root deletes section (volume 
replacement occurrs at deletion time)
-
-    Listing volumes referenced in accumulo.metadata tablets section
-            Volume : hdfs://namenode.example.com:8020/accumulo
-
-    Listing volumes referenced in accumulo.metadata deletes section (volume 
replacement occurrs at deletion time)
-
-After verifying the current volume is correct, shut down the cluster and 
transition HDFS to the HA nameservice.
-
-Edit +accumulo-site.xml+ to notify accumulo that a volume is being replaced. 
First,
-add the new nameservice volume to the +instance.volumes+ property. Next, add 
the
-+instance.volumes.replacements+ property in the form of +old new+. It's 
important to not include
-the volume that's being replaced in +instance.volumes+, otherwise it's 
possible accumulo could continue
-to write to the volume.
-
-[source,xml]
-<!-- instance.dfs.uri and instance.dfs.dir should not be set-->
-<property>
-  <name>instance.volumes</name>
-  <value>hdfs://nameservice1/accumulo</value>
-</property>
-<property>
-  <name>instance.volumes.replacements</name>
-  <value>hdfs://namenode.example.com:8020/accumulo 
hdfs://nameservice1/accumulo</value>
-</property>
-
-Run +accumulo init --add-volumes+ and start up the accumulo cluster. Verify 
that the
-new nameservice volume shows up with +accumulo admin volumes+.
-
-
-    Listing volumes referenced in zookeeper
-            Volume : hdfs://namenode.example.com:8020/accumulo
-            Volume : hdfs://nameservice1/accumulo
-
-    Listing volumes referenced in accumulo.root tablets section
-            Volume : hdfs://namenode.example.com:8020/accumulo
-            Volume : hdfs://nameservice1/accumulo
-    Listing volumes referenced in accumulo.root deletes section (volume 
replacement occurrs at deletion time)
-
-    Listing volumes referenced in accumulo.metadata tablets section
-            Volume : hdfs://namenode.example.com:8020/accumulo
-            Volume : hdfs://nameservice1/accumulo
-    Listing volumes referenced in accumulo.metadata deletes section (volume 
replacement occurrs at deletion time)
-
-Some erroneous GarbageCollector messages may still be seen for a small period 
while data is transitioning to
-the new volumes. This is expected and can usually be ignored.
-
-=== Achieving Stability in a VM Environment
-
-For testing, demonstration, and even operation uses, Accumulo is often
-installed and run in a virtual machine (VM) environment. The majority of
-long-term operational uses of Accumulo are on bare-metal cluster. However, the
-core design of Accumulo and its dependencies do not preclude running stably for
-long periods within a VM. Many of Accumuloâs operational robustness features 
to
-handle failures like periodic network partitioning in a large cluster carry
-over well to VM environments. This guide covers general recommendations for
-maximizing stability in a VM environment, including some of the common failure
-modes that are more common when running in VMs.
-
-==== Known failure modes: Setup and Troubleshooting
-
-In addition to the general failure modes of running Accumulo, VMs can 
introduce a
-couple of environmental challenges that can affect process stability. Clock
-drift is something that is more common in VMs, especially when VMs are
-suspended and resumed. Clock drift can cause Accumulo servers to assume that
-they have lost connectivity to the other Accumulo processes and/or lose their
-locks in Zookeeper. VM environments also frequently have constrained resources,
-such as CPU, RAM, network, and disk throughput and capacity. Accumulo generally
-deals well with constrained resources from a stability perspective (optimizing
-performance will require additional tuning, which is not covered in this
-section), however there are some limits.
-
-===== Physical Memory
-
-One of those limits has to do with the Linux out of memory killer. A common
-failure mode in VM environments (and in some bare metal installations) is when
-the Linux out of memory killer decides to kill processes in order to avoid a
-kernel panic when provisioning a memory page. This often happens in VMs due to
-the large number of processes that must run in a small memory footprint. In
-addition to the Linux core processes, a single-node Accumulo setup requires a
-Hadoop Namenode, a Hadoop Secondary Namenode a Hadoop Datanode, a Zookeeper
-server, an Accumulo Master, an Accumulo GC and an Accumulo TabletServer.
-Typical setups also include an Accumulo Monitor, an Accumulo Tracer, a Hadoop
-ResourceManager, a Hadoop NodeManager, provisioning software, and client
-applications. Between all of these processes, it is not uncommon to
-over-subscribe the available RAM in a VM. We recommend setting up VMs without
-swap enabled, so rather than performance grinding to a halt when physical
-memory is exhausted the kernel will randomly* select processes to kill in order
-to free up memory.
-
-Calculating the maximum possible memory usage is essential in creating a stable
-Accumulo VM setup. Safely engineering memory allocation for stability is a
-matter of then bringing the calculated maximum memory usage under the physical
-memory by a healthy margin. The margin is to account for operating system-level
-operations, such as managing process, maintaining virtual memory pages, and
-file system caching. When the java out-of-memory killer finds your process, you
-will probably only see evidence of that in /var/log/messages. Out-of-memory
-process kills do not show up in Accumulo or Hadoop logs.
-
-To calculate the max memory usage of all java virtual machine (JVM) processes
-add the maximum heap size (often limited by a -Xmx... argument, such as in
-accumulo-site.xml) and the off-heap memory usage. Off-heap memory usage
-includes the following:
-
-* "Permanent Space", where the JVM stores Classes, Methods, and other code 
elements. This can be limited by a JVM flag such as +-XX:MaxPermSize:100m+, and 
is typically tens of megabytes.
-* Code generation space, where the JVM stores just-in-time compiled code. This 
is typically small enough to ignore
-* Socket buffers, where the JVM stores send and receive buffers for each 
socket.
-* Thread stacks, where the JVM allocates memory to manage each thread.
-* Direct memory space and JNI code, where applications can allocate memory 
outside of the JVM-managed space. For Accumulo, this includes the native 
in-memory maps that are allocated with the memory.maps.max parameter in 
accumulo-site.xml.
-* Garbage collection space, where the JVM stores information used for garbage 
collection.
-
-You can assume that each Hadoop and Accumulo process will use ~100-150MB for
-Off-heap memory, plus the in-memory map of the Accumulo TServer process. A
-simple calculation for physical memory requirements follows:
-
-....
-  Physical memory needed
-    = (per-process off-heap memory) + (heap memory) + (other processes) + 
(margin) 
-    = (number of java processes * 150M + native map) + (sum of -Xmx settings 
for java process) + (total applications memory, provisioning memory, etc.) + 
(1G)
-    = (11*150M +500M) + (1G +1G +1G +256M +1G +256M +512M +512M +512M +512M 
+512M) + (2G) + (1G)
-    = (2150M) + (7G) + (2G) + (1G)
-    = ~12GB
-....
-
-These calculations can add up quickly with the large number of processes,
-especially in constrained VM environments. To reduce the physical memory
-requirements, it is a good idea to reduce maximum heap limits and turn off
-unnecessary processes. If you're not using YARN in your application, you can
-turn off the ResourceManager and NodeManager. If you're not expecting to
-re-provision the cluster frequently you can turn off or reduce provisioning
-processes such as Salt Stack minions and masters.
-
-===== Disk Space
-Disk space is primarily used for two operations: storing data and storing logs.
-While Accumulo generally stores all of its key/value data in HDFS, Accumulo,
-Hadoop, and Zookeeper all store a significant amount of logs in a directory on
-a local file system. Care should be taken to make sure that (a) limitations to
-the amount of logs generated are in place, and (b) enough space is available to
-host the generated logs on the partitions that they are assigned. When space is
-not available to log, processes will hang. This can cause interruptions in
-availability of Accumulo, as well as cascade into failures of various
-processes.
-
-Hadoop, Accumulo, and Zookeeper use log4j as a logging mechanism, and each of
-them has a way of limiting the logs and directing them to a particular
-directory. Logs are generated independently for each process, so when
-considering the total space you need to add up the maximum logs generated by
-each process. Typically, a rolling log setup in which each process can generate
-something like 10 100MB files is instituted, resulting in a maximum file system
-usage of 1GB per process. Default setups for Hadoop and Zookeeper are often
-unbounded, so it is important to set these limits in the logging configuration
-files for each subsystem. Consult the user manual for each system for
-instructions on how to limit generated logs.
-
-===== Zookeeper Interaction
-Accumulo is designed to scale up to thousands of nodes. At that scale,
-intermittent interruptions in network service and other rare failures of
-compute nodes become more common. To limit the impact of node failures on
-overall service availability, Accumulo uses a heartbeat monitoring system that
-leverages Zookeeper's ephemeral locks. There are several conditions that can
-occur that cause Accumulo process to lose their Zookeeper locks, some of which
-are true interruptions to availability and some of which are false positives.
-Several of these conditions become more common in VM environments, where they
-can be exacerbated by resource constraints and clock drift.
-
-==== Tested Versions
-Each release of Accumulo is built with a specific version of Apache
-Hadoop, Apache ZooKeeper and Apache Thrift.  We expect Accumulo to
-work with versions that are API compatible with those versions.
-However this compatibility is not guaranteed because Hadoop, ZooKeeper
-and Thift may not provide guarantees between their own versions. We
-have also found that certain versions of Accumulo and Hadoop included
-bugs that greatly affected overall stability.  Thrift is particularly
-prone to compatibility changes between versions and you must use the
-same version your Accumulo is built with.
-
-Please check the release notes for your Accumulo version or use the
-mailing lists at https://accumulo.apache.org for more info.


http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/7cc70b2e/docs/master/analytics.md
----------------------------------------------------------------------
diff --git a/docs/master/analytics.md b/docs/master/analytics.md
deleted file mode 100644
index 3b428bb..0000000
--- a/docs/master/analytics.md
+++ /dev/null
@@ -1,229 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-== Analytics
-
-Accumulo supports more advanced data processing than simply keeping keys
-sorted and performing efficient lookups. Analytics can be developed by using
-MapReduce and Iterators in conjunction with Accumulo tables.
-
-=== MapReduce
-
-Accumulo tables can be used as the source and destination of MapReduce jobs. To
-use an Accumulo table with a MapReduce job (specifically with the new Hadoop 
API
-as of version 0.20), configure the job parameters to use the 
AccumuloInputFormat
-and AccumuloOutputFormat. Accumulo specific parameters can be set via these
-two format classes to do the following:
-
-* Authenticate and provide user credentials for the input
-* Restrict the scan to a range of rows
-* Restrict the input to a subset of available columns
-
-==== Mapper and Reducer classes
-
-To read from an Accumulo table create a Mapper with the following class
-parameterization and be sure to configure the AccumuloInputFormat.
-
-[source,java]
-class MyMapper extends Mapper<Key,Value,WritableComparable,Writable> {
-    public void map(Key k, Value v, Context c) {
-        // transform key and value data here
-    }
-}
-
-To write to an Accumulo table, create a Reducer with the following class
-parameterization and be sure to configure the AccumuloOutputFormat. The key
-emitted from the Reducer identifies the table to which the mutation is sent. 
This
-allows a single Reducer to write to more than one table if desired. A default 
table
-can be configured using the AccumuloOutputFormat, in which case the output 
table
-name does not have to be passed to the Context object within the Reducer.
-
-[source,java]
-class MyReducer extends Reducer<WritableComparable, Writable, Text, Mutation> {
-    public void reduce(WritableComparable key, Iterable<Text> values, Context 
c) {
-        Mutation m;
-        // create the mutation based on input key and value
-        c.write(new Text("output-table"), m);
-    }
-}
-
-The Text object passed as the output should contain the name of the table to 
which
-this mutation should be applied. The Text can be null in which case the 
mutation
-will be applied to the default table name specified in the AccumuloOutputFormat
-options.
-
-==== AccumuloInputFormat options
-
-[source,java]
-----
-Job job = new Job(getConf());
-AccumuloInputFormat.setInputInfo(job,
-        "user",
-        "passwd".getBytes(),
-        "table",
-        new Authorizations());
-
-AccumuloInputFormat.setZooKeeperInstance(job, "myinstance",
-        "zooserver-one,zooserver-two");
-----
-
-*Optional Settings:*
-
-To restrict Accumulo to a set of row ranges:
-
-[source,java]
-ArrayList<Range> ranges = new ArrayList<Range>();
-// populate array list of row ranges ...
-AccumuloInputFormat.setRanges(job, ranges);
-
-To restrict Accumulo to a list of columns:
-
-[source,java]
-ArrayList<Pair<Text,Text>> columns = new ArrayList<Pair<Text,Text>>();
-// populate list of columns
-AccumuloInputFormat.fetchColumns(job, columns);
-
-To use a regular expression to match row IDs:
-
-[source,java]
-IteratorSetting is = new IteratorSetting(30, RexExFilter.class);
-RegExFilter.setRegexs(is, ".*suffix", null, null, null, true);
-AccumuloInputFormat.addIterator(job, is);
-
-==== AccumuloMultiTableInputFormat options
-
-The AccumuloMultiTableInputFormat allows the scanning over multiple tables
-in a single MapReduce job. Separate ranges, columns, and iterators can be
-used for each table.
-
-[source,java]
-InputTableConfig tableOneConfig = new InputTableConfig();
-InputTableConfig tableTwoConfig = new InputTableConfig();
-
-To set the configuration objects on the job:
-
-[source,java]
-Map<String, InputTableConfig> configs = new HashMap<String,InputTableConfig>();
-configs.put("table1", tableOneConfig);
-configs.put("table2", tableTwoConfig);
-AccumuloMultiTableInputFormat.setInputTableConfigs(job, configs);
-
-*Optional settings:*
-
-To restrict to a set of ranges:
-
-[source,java]
-ArrayList<Range> tableOneRanges = new ArrayList<Range>();
-ArrayList<Range> tableTwoRanges = new ArrayList<Range>();
-// populate array lists of row ranges for tables...
-tableOneConfig.setRanges(tableOneRanges);
-tableTwoConfig.setRanges(tableTwoRanges);
-
-To restrict Accumulo to a list of columns:
-
-[source,java]
-ArrayList<Pair<Text,Text>> tableOneColumns = new ArrayList<Pair<Text,Text>>();
-ArrayList<Pair<Text,Text>> tableTwoColumns = new ArrayList<Pair<Text,Text>>();
-// populate lists of columns for each of the tables ...
-tableOneConfig.fetchColumns(tableOneColumns);
-tableTwoConfig.fetchColumns(tableTwoColumns);
-
-To set scan iterators:
-
-[source,java]
-List<IteratorSetting> tableOneIterators = new ArrayList<IteratorSetting>();
-List<IteratorSetting> tableTwoIterators = new ArrayList<IteratorSetting>();
-// populate the lists of iterator settings for each of the tables ...
-tableOneConfig.setIterators(tableOneIterators);
-tableTwoConfig.setIterators(tableTwoIterators);
-
-
-The name of the table can be retrieved from the input split:
-
-[source,java]
-class MyMapper extends Mapper<Key,Value,WritableComparable,Writable> {
-    public void map(Key k, Value v, Context c) {
-        RangeInputSplit split = (RangeInputSplit)c.getInputSplit();
-        String tableName = split.getTableName();
-        // do something with table name
-    }
-}
-
-
-==== AccumuloOutputFormat options
-
-[source,java]
-----
-boolean createTables = true;
-String defaultTable = "mytable";
-
-AccumuloOutputFormat.setOutputInfo(job,
-        "user",
-        "passwd".getBytes(),
-        createTables,
-        defaultTable);
-
-AccumuloOutputFormat.setZooKeeperInstance(job, "myinstance",
-        "zooserver-one,zooserver-two");
-----
-
-*Optional Settings:*
-
-[source,java]
-AccumuloOutputFormat.setMaxLatency(job, 300000); // milliseconds
-AccumuloOutputFormat.setMaxMutationBufferSize(job, 50000000); // bytes
-
-The 
https://github.com/apache/accumulo-examples/blob/master/docs/mapred.md[MapReduce
 example]
-contains a complete example of using MapReduce with Accumulo.
-
-=== Combiners
-
-Many applications can benefit from the ability to aggregate values across 
common
-keys. This can be done via Combiner iterators and is similar to the Reduce 
step in
-MapReduce. This provides the ability to define online, incrementally updated
-analytics without the overhead or latency associated with batch-oriented
-MapReduce jobs.
-
-All that is needed to aggregate values of a table is to identify the fields 
over which
-values will be grouped, insert mutations with those fields as the key, and 
configure
-the table with a combining iterator that supports the summarizing operation
-desired.
-
-The only restriction on an combining iterator is that the combiner developer
-should not assume that all values for a given key have been seen, since new
-mutations can be inserted at anytime. This precludes using the total number of
-values in the aggregation such as when calculating an average, for example.
-
-==== Feature Vectors
-
-An interesting use of combining iterators within an Accumulo table is to store
-feature vectors for use in machine learning algorithms. For example, many
-algorithms such as k-means clustering, support vector machines, anomaly 
detection,
-etc. use the concept of a feature vector and the calculation of distance 
metrics to
-learn a particular model. The columns in an Accumulo table can be used to 
efficiently
-store sparse features and their weights to be incrementally updated via the 
use of an
-combining iterator.
-
-=== Statistical Modeling
-
-Statistical models that need to be updated by many machines in parallel could 
be
-similarly stored within an Accumulo table. For example, a MapReduce job that is
-iteratively updating a global statistical model could have each map or reduce 
worker
-reference the parts of the model to be read and updated through an embedded
-Accumulo client.
-
-Using Accumulo this way enables efficient and fast lookups and updates of small
-pieces of information in a random access pattern, which is complementary to
-MapReduce's sequential access model.

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/7cc70b2e/docs/master/clients.md
----------------------------------------------------------------------
diff --git a/docs/master/clients.md b/docs/master/clients.md
deleted file mode 100644
index a18ab91..0000000
--- a/docs/master/clients.md
+++ /dev/null
@@ -1,403 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-== Writing Accumulo Clients
-
-=== Running Client Code
-
-There are multiple ways to run Java code that uses Accumulo. Below is a list
-of the different ways to execute client code.
-
-* using the +java+ command
-* using the +accumulo+ command
-* using the +accumulo-util hadoop-jar+ command
-
-==== Using the java command
-
-To run Accumulo client code using the +java+ command, use the +accumulo 
classpath+ command 
-to include all of Accumulo's dependencies on your classpath:
-
-  java -classpath /path/to/my.jar:/path/to/dep.jar:$(accumulo classpath) 
com.my.Main arg1 arg2
-
-If you would like to review which jars are included, the +accumulo classpath+ 
command can
-output a more human readable format using the +-d+ option which enables 
debugging:
-
-  accumulo classpath -d
-
-==== Using the accumulo command
-
-Another option for running your code is to use the Accumulo script which can 
execute a
-main class (if it exists on its classpath):
-
-  accumulo com.foo.Client arg1 arg2
-
-While the Accumulo script will add all of Accumulo's dependencies to the 
classpath, you
-will need to add any jars that your create or depend on beyond what Accumulo 
already
-depends on. This can be accomplished by either adding the jars to the 
+lib/ext+ directory
-of your Accumulo installation or by adding jars to the CLASSPATH variable 
before calling
-the accumulo command.
-
-  export CLASSPATH=/path/to/my.jar:/path/to/dep.jar; accumulo com.foo.Client 
arg1 arg2
-
-==== Using the 'accumulo-util hadoop-jar' command
-
-If you are writing map reduce job that accesses Accumulo, then you can use
-+accumulo-util hadoop-jar+ to run those jobs. See the map reduce example.
-
-=== Connecting
-
-All clients must first identify the Accumulo instance to which they will be
-communicating. Code to do this is as follows:
-
-[source,java]
-----
-String instanceName = "myinstance";
-String zooServers = "zooserver-one,zooserver-two"
-Instance inst = new ZooKeeperInstance(instanceName, zooServers);
-
-Connector conn = inst.getConnector("user", new PasswordToken("passwd"));
-----
-
-The PasswordToken is the most common implementation of an 
`AuthenticationToken`.
-This general interface allow authentication as an Accumulo user to come from
-a variety of sources or means. The CredentialProviderToken leverages the Hadoop
-CredentialProviders (new in Hadoop 2.6).
-
-For example, the CredentialProviderToken can be used in conjunction with a Java
-KeyStore to alleviate passwords stored in cleartext. When stored in HDFS, a 
single
-KeyStore can be used across an entire instance. Be aware that KeyStores stored 
on
-the local filesystem must be made available to all nodes in the Accumulo 
cluster.
-
-[source,java]
-----
-KerberosToken token = new KerberosToken();
-Connector conn = inst.getConnector(token.getPrincipal(), token);
-----
-
-The KerberosToken can be provided to use the authentication provided by 
Kerberos.
-Using Kerberos requires external setup and additional configuration, but 
provides
-a single point of authentication through HDFS, YARN and ZooKeeper and allowing
-for password-less authentication with Accumulo.
-
-=== Writing Data
-
-Data are written to Accumulo by creating Mutation objects that represent all 
the
-changes to the columns of a single row. The changes are made atomically in the
-TabletServer. Clients then add Mutations to a BatchWriter which submits them to
-the appropriate TabletServers.
-
-Mutations can be created thus:
-
-[source,java]
-----
-Text rowID = new Text("row1");
-Text colFam = new Text("myColFam");
-Text colQual = new Text("myColQual");
-ColumnVisibility colVis = new ColumnVisibility("public");
-long timestamp = System.currentTimeMillis();
-
-Value value = new Value("myValue".getBytes());
-
-Mutation mutation = new Mutation(rowID);
-mutation.put(colFam, colQual, colVis, timestamp, value);
-----
-
-==== BatchWriter
-
-The BatchWriter is highly optimized to send Mutations to multiple TabletServers
-and automatically batches Mutations destined for the same TabletServer to
-amortize network overhead. Care must be taken to avoid changing the contents of
-any Object passed to the BatchWriter since it keeps objects in memory while
-batching.
-
-Mutations are added to a BatchWriter thus:
-
-[source,java]
-----
-// BatchWriterConfig has reasonable defaults
-BatchWriterConfig config = new BatchWriterConfig();
-config.setMaxMemory(10000000L); // bytes available to batchwriter for 
buffering mutations
-
-BatchWriter writer = conn.createBatchWriter("table", config)
-
-writer.addMutation(mutation);
-
-writer.close();
-----
-
-For more example code, see the 
https://github.com/apache/accumulo-examples/blob/master/docs/batch.md[batch 
writing and scanning example].
-
-==== ConditionalWriter
-
-The ConditionalWriter enables efficient, atomic read-modify-write operations on
-rows.  The ConditionalWriter writes special Mutations which have a list of per
-column conditions that must all be met before the mutation is applied.  The
-conditions are checked in the tablet server while a row lock is
-held (Mutations written by the BatchWriter will not obtain a row
-lock).  The conditions that can be checked for a column are equality and
-absence.  For example a conditional mutation can require that column A is
-absent inorder to be applied.  Iterators can be applied when checking
-conditions.  Using iterators, many other operations besides equality and
-absence can be checked.  For example, using an iterator that converts values
-less than 5 to 0 and everything else to 1, its possible to only apply a
-mutation when a column is less than 5.
-
-In the case when a tablet server dies after a client sent a conditional
-mutation, its not known if the mutation was applied or not.  When this happens
-the ConditionalWriter reports a status of UNKNOWN for the ConditionalMutation.
-In many cases this situation can be dealt with by simply reading the row again
-and possibly sending another conditional mutation.  If this is not sufficient,
-then a higher level of abstraction can be built by storing transactional
-information within a row.
-
-See the 
https://github.com/apache/accumulo-examples/blob/master/docs/reservations.md[reservations
 example]
-for example code that uses the conditional writer.
-
-==== Durability
-
-By default, Accumulo writes out any updates to the Write-Ahead Log (WAL). 
Every change
-goes into a file in HDFS and is sync'd to disk for maximum durability. In
-the event of a failure, writes held in memory are replayed from the WAL. Like
-all files in HDFS, this file is also replicated. Sending updates to the
-replicas, and waiting for a permanent sync to disk can significantly write 
speeds.
-
-Accumulo allows users to use less tolerant forms of durability when writing.
-These levels are:
-
-* none: no durability guarantees are made, the WAL is not used
-* log: the WAL is used, but not flushed; loss of the server probably means 
recent writes are lost
-* flush: updates are written to the WAL, and flushed out to replicas; loss of 
a single server is unlikely to result in data loss.
-* sync: updates are written to the WAL, and synced to disk on all replicas 
before the write is acknowledge. Data will not be lost even if the entire 
cluster suddenly loses power.
-
-The user can set the default durability of a table in the shell.  When
-writing, the user can configure the BatchWriter or ConditionalWriter to use
-a different level of durability for the session. This will override the
-default durability setting.
-
-[source,java]
-----
-BatchWriterConfig cfg = new BatchWriterConfig();
-// We don't care about data loss with these writes:
-// This is DANGEROUS:
-cfg.setDurability(Durability.NONE);
-
-Connection conn = ... ;
-BatchWriter bw = conn.createBatchWriter(table, cfg);
-
-----
-
-=== Reading Data
-
-Accumulo is optimized to quickly retrieve the value associated with a given 
key, and
-to efficiently return ranges of consecutive keys and their associated values.
-
-==== Scanner
-
-To retrieve data, Clients use a Scanner, which acts like an Iterator over
-keys and values. Scanners can be configured to start and stop at particular 
keys, and
-to return a subset of the columns available.
-
-[source,java]
-----
-// specify which visibilities we are allowed to see
-Authorizations auths = new Authorizations("public");
-
-Scanner scan =
-    conn.createScanner("table", auths);
-
-scan.setRange(new Range("harry","john"));
-scan.fetchColumnFamily(new Text("attributes"));
-
-for(Entry<Key,Value> entry : scan) {
-    Text row = entry.getKey().getRow();
-    Value value = entry.getValue();
-}
-----
-
-==== Isolated Scanner
-
-Accumulo supports the ability to present an isolated view of rows when
-scanning. There are three possible ways that a row could change in Accumulo :
-
-* a mutation applied to a table
-* iterators executed as part of a minor or major compaction
-* bulk import of new files
-
-Isolation guarantees that either all or none of the changes made by these
-operations on a row are seen. Use the IsolatedScanner to obtain an isolated
-view of an Accumulo table. When using the regular scanner it is possible to see
-a non isolated view of a row. For example if a mutation modifies three
-columns, it is possible that you will only see two of those modifications.
-With the isolated scanner either all three of the changes are seen or none.
-
-The IsolatedScanner buffers rows on the client side so a large row will not
-crash a tablet server. By default rows are buffered in memory, but the user
-can easily supply their own buffer if they wish to buffer to disk when rows are
-large.
-
-See the 
https://github.com/apache/accumulo-examples/blob/master/docs/isolation.md[isolation
 example]
-for example code that uses the IsolatedScanner.
-
-==== BatchScanner
-
-For some types of access, it is more efficient to retrieve several ranges
-simultaneously. This arises when accessing a set of rows that are not 
consecutive
-whose IDs have been retrieved from a secondary index, for example.
-
-The BatchScanner is configured similarly to the Scanner; it can be configured 
to
-retrieve a subset of the columns available, but rather than passing a single 
Range,
-BatchScanners accept a set of Ranges. It is important to note that the keys 
returned
-by a BatchScanner are not in sorted order since the keys streamed are from 
multiple
-TabletServers in parallel.
-
-[source,java]
-----
-ArrayList<Range> ranges = new ArrayList<Range>();
-// populate list of ranges ...
-
-BatchScanner bscan =
-    conn.createBatchScanner("table", auths, 10);
-bscan.setRanges(ranges);
-bscan.fetchColumnFamily("attributes");
-
-for(Entry<Key,Value> entry : bscan) {
-    System.out.println(entry.getValue());
-}
-----
-
-For more example code, see the 
https://github.com/apache/accumulo-examples/blob/master/docs/batch.md[batch 
writing and scanning example].
-
-At this time, there is no client side isolation support for the BatchScanner.
-You may consider using the WholeRowIterator with the BatchScanner to achieve
-isolation. The drawback of this approach is that entire rows are read into
-memory on the server side. If a row is too big, it may crash a tablet server.
-
-=== Proxy
-
-The proxy API allows the interaction with Accumulo with languages other than 
Java.
-A proxy server is provided in the codebase and a client can further be 
generated.
-The proxy API can also be used instead of the traditional ZooKeeperInstance 
class to
-provide a single TCP port in which clients can be securely routed through a 
firewall,
-without requiring access to all tablet servers in the cluster.
-
-==== Prerequisites
-
-The proxy server can live on any node in which the basic client API would 
work. That
-means it must be able to communicate with the Master, ZooKeepers, NameNode, 
and the
-DataNodes. A proxy client only needs the ability to communicate with the proxy 
server.
-
-==== Configuration
-
-The configuration options for the proxy server live inside of a properties 
file. At
-the very least, you need to supply the following properties:
-
-  protocolFactory=org.apache.thrift.protocol.TCompactProtocol$Factory
-  tokenClass=org.apache.accumulo.core.client.security.tokens.PasswordToken
-  port=42424
-  instance=test
-  zookeepers=localhost:2181
-
-You can find a sample configuration file in your distribution at 
+proxy/proxy.properties+.
-
-This sample configuration file further demonstrates an ability to back the 
proxy server
-by MockAccumulo or the MiniAccumuloCluster.
-
-==== Running the Proxy Server
-
-After the properties file holding the configuration is created, the proxy 
server
-can be started using the following command in the Accumulo distribution 
(assuming
-your properties file is named +config.properties+):
-
-  accumulo proxy -p config.properties
-
-==== Creating a Proxy Client
-
-Aside from installing the Thrift compiler, you will also need the 
language-specific library
-for Thrift installed to generate client code in that language. Typically, your 
operating
-system's package manager will be able to automatically install these for you 
in an expected
-location such as +/usr/lib/python/site-packages/thrift+.
-
-You can find the thrift file for generating the client at +proxy/proxy.thrift+.
-
-After a client is generated, the port specified in the configuration 
properties above will be
-used to connect to the server.
-
-==== Using a Proxy Client
-
-The following examples have been written in Java and the method signatures may 
be
-slightly different depending on the language specified when generating client 
with
-the Thrift compiler. After initiating a connection to the Proxy (see Apache 
Thrift's
-documentation for examples of connecting to a Thrift service), the methods on 
the
-proxy client will be available. The first thing to do is log in:
-
-[source,java]
-Map password = new HashMap<String,String>();
-password.put("password", "secret");
-ByteBuffer token = client.login("root", password);
-
-Once logged in, the token returned will be used for most subsequent calls to 
the client.
-Let's create a table, add some data, scan the table, and delete it.
-
-
-First, create a table.
-
-[source,java]
-client.createTable(token, "myTable", true, TimeType.MILLIS);
-
-
-Next, add some data:
-
-[source,java]
-----
-// first, create a writer on the server
-String writer = client.createWriter(token, "myTable", new WriterOptions());
-
-//rowid
-ByteBuffer rowid = ByteBuffer.wrap("UUID".getBytes());
-
-//mutation like class
-ColumnUpdate cu = new ColumnUpdate();
-cu.setColFamily("MyFamily".getBytes());
-cu.setColQualifier("MyQualifier".getBytes());
-cu.setColVisibility("VisLabel".getBytes());
-cu.setValue("Some Value.".getBytes());
-
-List<ColumnUpdate> updates = new ArrayList<ColumnUpdate>();
-updates.add(cu);
-
-// build column updates
-Map<ByteBuffer, List<ColumnUpdate>> cellsToUpdate = new HashMap<ByteBuffer, 
List<ColumnUpdate>>();
-cellsToUpdate.put(rowid, updates);
-
-// send updates to the server
-client.updateAndFlush(writer, "myTable", cellsToUpdate);
-
-client.closeWriter(writer);
-----
-
-
-Scan for the data and batch the return of the results on the server:
-
-[source,java]
-----
-String scanner = client.createScanner(token, "myTable", new ScanOptions());
-ScanResult results = client.nextK(scanner, 100);
-
-for(KeyValue keyValue : results.getResultsIterator()) {
-  // do something with results
-}
-
-client.closeScanner(scanner);
-----

[10/16] accumulo-website git commit: ACCUMULO-4630 Refactored documentation for website

Reply via email to