http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/32ac61d3/docs/master/kerberos.md ---------------------------------------------------------------------- diff --git a/docs/master/kerberos.md b/docs/master/kerberos.md new file mode 100644 index 0000000..743285a --- /dev/null +++ b/docs/master/kerberos.md @@ -0,0 +1,663 @@

// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements.  See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License.  You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

== Kerberos

=== Overview

Kerberos is a network authentication protocol that provides a secure way for
peers to prove their identity over an insecure network in a client-server model.
A centralized key distribution center (KDC) is the service that coordinates
authentication between a client and a server. Clients and servers use "tickets",
obtained from the KDC via a password or a special file called a "keytab", to
communicate with the KDC and prove their identity. A KDC administrator must
create the principal (the name for the client/server identity) and the password
or keytab, securely passing the necessary information to the actual user/service.
Properly securing the KDC and the generated ticket material is central to the
security model; this is mentioned here only as a warning to administrators
running their own KDC.
To interact with Kerberos programmatically, GSSAPI and SASL are two standards
which allow cross-language integration with Kerberos for authentication. GSSAPI,
the Generic Security Service Application Program Interface, is a standard which
Kerberos implements. The Java programming language also implements GSSAPI,
which is leveraged by other applications like Apache Hadoop and Apache Thrift.
SASL, the Simple Authentication and Security Layer, is a framework for
authentication and security over the network. SASL provides a number of
mechanisms for authentication, one of which is GSSAPI. Thus, SASL provides the
transport which authenticates using the GSSAPI that Kerberos implements.

Kerberos is a very complicated piece of software and deserves much more
description than can be provided here. An http://www.roguelynn.com/words/explain-like-im-5-kerberos/[explain like
I'm 5] blog post is very good at distilling the basics, while http://web.mit.edu/kerberos/[MIT Kerberos's project page]
contains lots of documentation for users and administrators. Various Hadoop "vendors"
also provide free documentation that includes step-by-step instructions for
configuring Hadoop and ZooKeeper (which will henceforth be considered prerequisites).

=== Within Hadoop

Out of the box, HDFS and YARN have no ability to enforce that a user is who
they claim to be. Thus, any basic Hadoop installation should be treated as
insecure: any user with access to the cluster has the ability to access any data.
Using Kerberos to provide authentication, users can be strongly identified,
delegating to Kerberos the task of verifying that a user is who they claim to be.
As such, Kerberos is widely used across the entire Hadoop ecosystem for strong
authentication.
Since server processes accessing HDFS or YARN are required
to use Kerberos to authenticate with HDFS, it makes sense that they also require
Kerberos authentication from their clients, in addition to the other features
provided by SASL.

A typical deployment involves the creation of Kerberos principals for all server
processes (Hadoop datanodes and namenode(s), ZooKeepers), the creation of a keytab
file for each principal, and then proper configuration of the Hadoop site xml files.
Users also need Kerberos principals created for them; however, a user typically
uses a password to identify themselves instead of a keytab. Users can obtain a
ticket granting ticket (TGT) from the KDC using their password, which allows them
to authenticate for the lifetime of the TGT (typically one day by default) and alleviates
the need for further password authentication.

For long-running server applications, like web servers, a keytab can be created which
allows for fully-automated Kerberos identification, removing the need to enter any
password, at the cost of needing to protect the keytab file. These principals
will apply directly to authentication for clients accessing Accumulo and to the
Accumulo processes accessing HDFS.

=== Delegation Tokens

MapReduce, a common way that clients interact with Accumulo, does not map well to the
client-server model that Kerberos was originally designed to support. Specifically, the parallelization
of tasks across many nodes introduces the problem of securely sharing the user credentials across
these tasks in as safe a manner as possible. To address this problem, Hadoop introduced the notion
of a delegation token to be used in distributed execution settings.

A delegation token is nothing more than a short-term, on-the-fly password generated after authenticating with the user's
credentials. In Hadoop itself, the NameNode and ResourceManager, for HDFS and YARN respectively, act as the gateway for
delegation token requests.
For example, before a YARN job is submitted, the implementation will request delegation
tokens from the NameNode and ResourceManager so the YARN tasks can communicate with HDFS and YARN. In the same manner,
support has been added in the Accumulo Master to generate delegation tokens so that, when Kerberos
authentication is enabled, clients can interact with Accumulo via MapReduce just as they do
with HDFS and YARN.

Generating an expiring password is, arguably, more secure than distributing the user's
credentials across the cluster: if a delegation token is compromised, only access to
HDFS, YARN and Accumulo is exposed, as opposed to the user's entire
Kerberos credential. Additional details for clients and servers will be covered
in subsequent sections.

=== Configuring Accumulo

To configure Accumulo for use with Kerberos, both client-facing and server-facing
changes must be made for a functional system on secured Hadoop. As previously mentioned,
numerous guides already exist on the subject of configuring Hadoop and ZooKeeper for
use with Kerberos, so that topic won't be covered here. It is assumed that you already have
functional Hadoop and ZooKeeper installed.

Note that on an existing cluster the server-side changes will require a full cluster shutdown and restart. You should
wait to restart the TraceServers until after you've completed the rest of the cluster setup and provisioned
a trace user with appropriate permissions.

==== Servers

The first step is to obtain a Kerberos identity for the Accumulo server processes.
When running Accumulo with Kerberos enabled, a valid Kerberos identity is required
to initiate any RPC between Accumulo processes (e.g. Master and TabletServer), in addition
to any HDFS action (e.g. client to HDFS or TabletServer to HDFS).
===== Generate Principal and Keytab

In the +kadmin.local+ shell or using the +-q+ option on +kadmin.local+, create a
principal for Accumulo on every host that runs an Accumulo process. A Kerberos
principal is of the form "primary/instance@REALM". "accumulo" is commonly the "primary"
(although this is not required) and the "instance" is the fully-qualified domain name of
the host that will be running the Accumulo process -- this is required.

----
kadmin.local -q "addprinc -randkey accumulo/host.domain.com"
----

Perform the above for each node running Accumulo processes in the instance, modifying
"host.domain.com" for your network. The +randkey+ option generates a random password
because we will use a keytab for authentication, not a password, since the Accumulo
server processes don't have an interactive console to enter a password into.

----
kadmin.local -q "xst -k accumulo.hostname.keytab accumulo/host.domain.com"
----

To simplify deployments, at the cost of security, all Accumulo principals could
be globbed into a single keytab:

----
kadmin.local -q "xst -k accumulo.service.keytab -glob accumulo*"
----

To ensure that the SASL handshake can occur from clients to servers and from servers to servers,
all Accumulo servers must share the same primary and realm principal components, as the
"client" needs to know these to set up the connection with the "server".

===== Server Configuration

A number of properties need to be changed in +accumulo-site.xml+ to properly
configure the servers.

[options="header"]
|=================================================================
|Key | Default Value | Description
|general.kerberos.keytab |/etc/security/keytabs/accumulo.service.keytab |
The path to the keytab for Accumulo on the local filesystem. Change the value to the actual path on your system.

|general.kerberos.principal |accumulo/_HOST@REALM |
The Kerberos principal for Accumulo; needs to match the keytab.
"_HOST" can be used instead of the actual hostname in the principal and will be automatically expanded to the current FQDN, which reduces the configuration file burden.

|instance.rpc.sasl.enabled |true |
Enables SASL for the Thrift servers (supports GSSAPI).

|rpc.sasl.qop |auth |
One of "auth", "auth-int", or "auth-conf". These map to the SASL-defined properties for
quality of protection. "auth" is authentication only; "auth-int" is authentication and data
integrity; "auth-conf" is authentication, data integrity and confidentiality.

|instance.security.authenticator |org.apache.accumulo.server.security.handler.KerberosAuthenticator |
Configures Accumulo to use the Kerberos principal as the Accumulo username/principal.

|instance.security.authorizor |org.apache.accumulo.server.security.handler.KerberosAuthorizor |
Configures Accumulo to use the Kerberos principal for authorization purposes.

|instance.security.permissionHandler |org.apache.accumulo.server.security.handler.KerberosPermissionHandler |
Configures Accumulo to use the Kerberos principal for permission purposes.

|trace.token.type |org.apache.accumulo.core.client.security.tokens.KerberosToken |
Configures the Accumulo Tracer to use the KerberosToken for authentication when serializing traces to the trace table.

|trace.user |accumulo/_HOST@REALM |
The tracer process needs valid credentials to serialize traces to Accumulo. While the other server processes are
creating a SystemToken from the provided keytab and principal, we can still use a normal KerberosToken and the same
keytab/principal to serialize traces. As with non-Kerberized instances, the table must be created and permissions granted
to the trace.user. The same +_HOST+ replacement is performed on this value, substituting the FQDN for +_HOST+.

|trace.token.property.keytab ||
You can optionally specify the path to a keytab file for the principal given in the +trace.user+ property.
If you don't set this path, it will default to the value given in +general.kerberos.keytab+.

|general.delegation.token.lifetime |7d |
The length of time that the server-side secret used to create delegation tokens is valid. After a server-side secret
expires, a delegation token created with that secret is no longer valid.

|general.delegation.token.update.interval |1d |
The frequency with which new server-side secrets should be generated to create delegation tokens for clients. Generating
new secrets reduces the likelihood of cryptographic attacks.

|=================================================================

Although it should be a prerequisite, it is very important that you have DNS properly
configured for your nodes and that Accumulo is configured to use the FQDN. It
is extremely important to use the FQDN in each of the "hosts" files for each
Accumulo process: +masters+, +monitors+, +tservers+, +tracers+, and +gc+.

Normally, no changes are needed in +accumulo-env.sh+ to enable Kerberos. Typically, +krb5.conf+
is installed on the local machine in +/etc/+, and the Java library implementations will look
there to find the necessary configuration to communicate with the KDC. Some installations
may require a different +krb5.conf+ to be used for Accumulo, which can be accomplished
by adding the JVM system property +-Djava.security.krb5.conf=/path/to/other/krb5.conf+ to
+JAVA_OPTS+ in +accumulo-env.sh+.

===== KerberosAuthenticator

The +KerberosAuthenticator+ is an implementation of the pluggable security interfaces
that Accumulo provides. It builds on top of the default ZooKeeper-based implementation,
but removes the need to create user accounts with passwords in Accumulo for clients. As
long as a client has a valid Kerberos identity, they can connect to and interact with
Accumulo, although initially without any permissions (e.g. they cannot create tables or write data).
Leveraging
ZooKeeper removes the need to change the permission handler and authorizor, so other Accumulo
functions regarding permissions and cell-level authorizations do not change.

It is extremely important to note that, while user operations like +SecurityOperations.listLocalUsers()+,
+SecurityOperations.dropLocalUser()+, and +SecurityOperations.createLocalUser()+ will not return
errors, these methods are not equivalent to those in normal installations, as they will only operate on
users which have, at one point in time, authenticated with Accumulo using their Kerberos identity.
The KDC is still the authoritative entity for user management. The previously mentioned methods
are provided because they simplify management of users within Accumulo, especially with respect
to granting Authorizations and Permissions to new users.

===== Administrative User

Out of the box (without Kerberos enabled), Accumulo has a single user with administrative permissions: "root".
This user is used to "bootstrap" other users, creating less-privileged users for applications using
the system. With Kerberos, to authenticate with the system, the client is required to present Kerberos
credentials for the principal (user) the client is trying to authenticate as.

Because of this, an administrative user named "root" would be useless in an instance using Kerberos,
because it is very unlikely that Kerberos credentials exist for a principal named `root`. When Kerberos is
enabled, Accumulo will prompt for the name of a user to grant the same permissions that the `root`
user would normally have. The name of the Accumulo user to grant administrative permissions to can
also be given by the `-u` or `--user` options.

If you are enabling Kerberos on an existing cluster, you will need to reinitialize the security system in
order to replace the existing "root" user with one that can be used with Kerberos.
These steps should be
completed after you have made the previously described configuration changes, and they require access to
a complete +accumulo-site.xml+, including the instance secret. Note that this process will delete all
existing users in the system; you will need to reassign user permissions based on Kerberos principals.

1. Ensure Accumulo is not running.
2. Given the path to an +accumulo-site.xml+ with the instance secret, run the security reset tool. If you are
prompted for a password you can just hit return, since it won't be used.
3. Start the Accumulo cluster.

----
$ accumulo-cluster stop
...
$ accumulo init --reset-security
Running against secured HDFS
Principal (user) to grant administrative privileges to : accumulo_ad...@example.com
Enter initial password for accumulo_ad...@example.com (this may not be applicable for your security setup):
Confirm initial password for accumulo_ad...@example.com:
$ accumulo-cluster start
...
$
----

===== Verifying secure access

To verify that servers have correctly started with Kerberos enabled, ensure that the processes
are actually running (they should exit immediately if login fails) and verify that you see
something similar to the following in the application log.

----
2015-01-07 11:57:56,826 [security.SecurityUtil] INFO : Attempting to login with keytab as accumulo/hostn...@example.com
2015-01-07 11:57:56,830 [security.UserGroupInformation] INFO : Login successful for user accumulo/hostn...@example.com using keytab file /etc/security/keytabs/accumulo.service.keytab
----

===== Impersonation

Impersonation is functionality which allows one user to act as another. One direct application
of this concept within Accumulo is the Thrift proxy. The Thrift proxy is configured to accept
user requests and pass them on to Accumulo, enabling client access to Accumulo via any thrift-compatible
language.
When the proxy runs with SASL transports, clients must present a valid
Kerberos identity to make a connection. In this situation, the Thrift proxy server does not have
access to the secret key material needed to make a secure connection to Accumulo as the client;
it can only connect to Accumulo as itself. Impersonation, in this context, refers to the ability
of the proxy to authenticate to Accumulo as itself, but act on behalf of an Accumulo user.

Accumulo supports basic impersonation of end-users by a third party via static rules in Accumulo's
site configuration file. These two semicolon-separated properties are aligned
by index: the first element in the user impersonation property value matches the first element
in the host impersonation property value, and so on.

----
<property>
  <name>instance.rpc.sasl.allowed.user.impersonation</name>
  <value>$PROXY_USER:*</value>
</property>

<property>
  <name>instance.rpc.sasl.allowed.host.impersonation</name>
  <value>*</value>
</property>
----

Here, +$PROXY_USER+ can impersonate any user from any host.

The following is an example of specifying a subset of users +$PROXY_USER+ can impersonate and also
limiting the hosts from which +$PROXY_USER+ can initiate requests.

----
<property>
  <name>instance.rpc.sasl.allowed.user.impersonation</name>
  <value>$PROXY_USER:user1,user2;$PROXY_USER2:user2,user4</value>
</property>

<property>
  <name>instance.rpc.sasl.allowed.host.impersonation</name>
  <value>host1.domain.com,host2.domain.com;*</value>
</property>
----

Here, +$PROXY_USER+ can impersonate user1 and user2 only from host1.domain.com or host2.domain.com.
+$PROXY_USER2+ can impersonate user2 and user4 from any host.

In these examples, the value +$PROXY_USER+ is the Kerberos principal of the server which is acting on behalf of a user.
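To make the index pairing between the two properties concrete, the following is a small,
hypothetical helper (this is not Accumulo's internal parser; the class name and structure are
invented purely for illustration) that lines up each entry of the user impersonation
property with the corresponding entry of the host impersonation property:

[source,java]
----
import java.util.LinkedHashMap;
import java.util.Map;

public class ImpersonationRules {

  /**
   * Pairs the semicolon-separated entries of the two properties by index.
   * Returns a map of proxy principal to {allowed users, allowed hosts}.
   */
  static Map<String, String[]> parse(String userProp, String hostProp) {
    String[] userEntries = userProp.split(";");
    String[] hostEntries = hostProp.split(";");
    // The two properties are aligned by index, so they must have the same length.
    if (userEntries.length != hostEntries.length) {
      throw new IllegalArgumentException("user and host properties must align by index");
    }
    Map<String, String[]> rules = new LinkedHashMap<>();
    for (int i = 0; i < userEntries.length; i++) {
      // Each user entry is "proxyPrincipal:comma,separated,users"
      String[] parts = userEntries[i].split(":", 2);
      if (parts.length != 2) {
        throw new IllegalArgumentException("malformed user entry: " + userEntries[i]);
      }
      rules.put(parts[0], new String[] {parts[1], hostEntries[i]});
    }
    return rules;
  }

  public static void main(String[] args) {
    Map<String, String[]> rules = parse(
        "$PROXY_USER:user1,user2;$PROXY_USER2:user2,user4",
        "host1.domain.com,host2.domain.com;*");
    for (Map.Entry<String, String[]> e : rules.entrySet()) {
      System.out.println(e.getKey() + " may impersonate " + e.getValue()[0]
          + " from hosts " + e.getValue()[1]);
    }
  }
}
----

Running this against the example values above prints one rule per proxy principal, making
explicit which users and hosts each proxy is allowed to impersonate from.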
Impersonation is enforced by the Kerberos principal and the host from which the RPC originated (from the perspective
of the Accumulo TabletServers/Masters). An asterisk (*) can be used to specify all users or all hosts (depending on the context).

===== Delegation Tokens

Within the Accumulo services, the primary task in implementing delegation tokens is the generation and distribution
of a shared secret among all Accumulo tabletservers and the master. The secret key allows for the generation
of delegation tokens for users and the verification of delegation tokens presented by clients. If a server
process is unaware of the secret key used to create a delegation token, the client cannot be authenticated.
As distribution via ZooKeeper is an asynchronous operation (typically on the order of seconds), the
value for `general.delegation.token.update.interval` should be on the order of hours to days to reduce the
likelihood of servers rejecting valid clients because the server has not yet seen a new secret key.

To support authentication with both Kerberos credentials and delegation tokens, the SASL thrift
server accepts connections with either the `GSSAPI` or `DIGEST-MD5` mechanism. The `DIGEST-MD5` mechanism
enables authentication as a normal username and password exchange, which `DelegationToken`s leverage.

Since delegation tokens are a weaker form of authentication than Kerberos credentials, user access
to obtain delegation tokens from Accumulo is protected with the `DELEGATION_TOKEN` system permission. Only
users with this system permission are allowed to obtain delegation tokens. It is also recommended
to configure confidentiality with SASL, using the `rpc.sasl.qop=auth-conf` configuration property, to
ensure that prying eyes cannot view the `DelegationToken` as it passes over the network.
----
# Check a user's permissions
admin@REALM@accumulo> userpermissions -u user@REALM

# Grant the DELEGATION_TOKEN system permission to a user
admin@REALM@accumulo> grant System.DELEGATION_TOKEN -s -u user@REALM
----

==== Clients

===== Create client principal

Like the Accumulo servers, clients must also have a Kerberos principal created for them. The
primary difference from a server principal is that a user's principal is created
with a password and is not qualified to a specific instance (host).

----
kadmin.local -q "addprinc $user"
----

The above will prompt for a password for the user, which will be used to identify that $user.
The user can verify that they can authenticate with the KDC using the command `kinit $user`.
Upon entering the correct password, a local credentials cache will be created which can be used
to authenticate with Accumulo, access HDFS, etc.

The user can verify the state of their local credentials cache by using the command `klist`.

----
$ klist
Ticket cache: FILE:/tmp/krb5cc_123
Default principal: u...@example.com

Valid starting       Expires              Service principal
01/07/2015 11:56:35  01/08/2015 11:56:35  krbtgt/example....@example.com
	renew until 01/14/2015 11:56:35
----

===== Configuration

The second thing clients need to do is set up their client configuration file. By
default, this file is stored in +~/.accumulo/config+ or +/path/to/accumulo/client.conf+.
Accumulo utilities also allow you to provide your own copy of this file in any location
using the +--config-file+ command line option.

Three items need to be set to enable access to Accumulo:

* +instance.rpc.sasl.enabled+=_true_
* +rpc.sasl.qop+=_auth_
* +kerberos.server.primary+=_accumulo_

Each of these properties *must* match the configuration of the Accumulo servers; this is
required to set up the SASL transport.
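As a reference, a minimal sketch of such a client configuration file, assuming the default
values shown above, could look like the following (the exact values must mirror your
servers' +accumulo-site.xml+; in particular, +kerberos.server.primary+ must match the
"primary" component chosen for the server principals):

----
instance.rpc.sasl.enabled=true
rpc.sasl.qop=auth
kerberos.server.primary=accumulo
----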
===== Verifying Administrative Access

At this point you should have enough configured on the server and client side to interact with
the system. You should verify that the administrative user you chose earlier can successfully
interact with the system.

While this example logs in via +kinit+ with a password, any login method that caches Kerberos tickets
should work.

----
$ kinit accumulo_ad...@example.com
Password for accumulo_ad...@example.com: ******************************
$ accumulo shell

Shell - Apache Accumulo Interactive Shell
-
- version: 1.7.2
- instance name: MYACCUMULO
- instance id: 483b9038-889f-4b2d-b72b-dfa2bb5dbd07
-
- type 'help' for a list of available commands
-
accumulo_ad...@example.com@MYACCUMULO> userpermissions
System permissions: System.GRANT, System.CREATE_TABLE, System.DROP_TABLE, System.ALTER_TABLE, System.CREATE_USER, System.DROP_USER, System.ALTER_USER, System.SYSTEM, System.CREATE_NAMESPACE, System.DROP_NAMESPACE, System.ALTER_NAMESPACE, System.OBTAIN_DELEGATION_TOKEN

Namespace permissions (accumulo): Namespace.READ, Namespace.ALTER_TABLE

Table permissions (accumulo.metadata): Table.READ, Table.ALTER_TABLE
Table permissions (accumulo.replication): Table.READ
Table permissions (accumulo.root): Table.READ, Table.ALTER_TABLE

accumulo_ad...@example.com@MYACCUMULO> quit
$ kdestroy
$
----

===== DelegationTokens with MapReduce

To use DelegationTokens in a custom MapReduce job, the call to the `setConnectorInfo()` method
on `AccumuloInputFormat` or `AccumuloOutputFormat` should be the only necessary change. Instead
of providing an instance of a `KerberosToken`, the user must call `SecurityOperations.getDelegationToken`
using a `Connector` obtained with that `KerberosToken`, and pass the `DelegationToken` to
`setConnectorInfo` instead of the `KerberosToken`.
It is expected that the user launching
the MapReduce job is already logged in to Kerberos via a keytab or a locally cached
Kerberos ticket-granting ticket (TGT).

[source,java]
----
Instance instance = getInstance();
KerberosToken kt = new KerberosToken();
Connector conn = instance.getConnector(principal, kt);
DelegationToken dt = conn.securityOperations().getDelegationToken();

// Reading from Accumulo
AccumuloInputFormat.setConnectorInfo(job, principal, dt);

// Writing to Accumulo
AccumuloOutputFormat.setConnectorInfo(job, principal, dt);
----

If the user passes a `KerberosToken` to the `setConnectorInfo` method, the implementation will
attempt to obtain a `DelegationToken` automatically, but this has limitations
based on the other MapReduce configuration methods already called and the permissions granted
to the calling user. It is best for the user to acquire the `DelegationToken` on their own
and provide it directly to `setConnectorInfo`.

Users must have the `DELEGATION_TOKEN` system permission to call the `getDelegationToken`
method. The obtained delegation token is only valid for the requesting user for a period
of time dependent on Accumulo's configuration (`general.delegation.token.lifetime`).

It is also possible to obtain and use `DelegationToken`s outside of the context
of MapReduce.

[source,java]
----
String principal = "user@REALM";
Instance instance = getInstance();
Connector connector = instance.getConnector(principal, new KerberosToken());
DelegationToken delegationToken = connector.securityOperations().getDelegationToken();

Connector dtConnector = instance.getConnector(principal, delegationToken);
----

Use of the `dtConnector` will perform each operation as the original user, but without
their Kerberos credentials.
For the duration of validity of the `DelegationToken`, the user *must* take the necessary precautions
to protect the `DelegationToken` from prying eyes, as it can be used by any user on any host to impersonate
the user who requested the `DelegationToken`. YARN ensures that passing the delegation token from the client
JVM to each YARN task is secure, even in multi-tenant instances.

==== Debugging

*Q*: I have valid Kerberos credentials and a correct client configuration file but
I still get errors like:

----
java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
----

*A*: When you have a valid client configuration and Kerberos TGT, it is possible that the search
path for your local credentials cache is incorrect. Check the value of the KRB5CCNAME environment
variable, and ensure it matches the value reported by `klist`.

----
$ echo $KRB5CCNAME

$ klist
Ticket cache: FILE:/tmp/krb5cc_123
Default principal: u...@example.com

Valid starting       Expires              Service principal
01/07/2015 11:56:35  01/08/2015 11:56:35  krbtgt/example....@example.com
	renew until 01/14/2015 11:56:35
$ export KRB5CCNAME=/tmp/krb5cc_123
$ echo $KRB5CCNAME
/tmp/krb5cc_123
----

*Q*: I thought I had everything configured correctly, but my client/server still fails to log in.
I don't know what is actually failing.

*A*: Add the following system property to the JVM invocation:

----
-Dsun.security.krb5.debug=true
----

This will enable lots of extra debugging at the JVM level, which is often sufficient to
diagnose some high-level configuration problem. Client applications can add this system property by
hand on the command line; Accumulo server processes, or applications started using the `accumulo`
script, can add it to +JAVA_OPTS+ in +accumulo-env.sh+.
Additionally, you can increase the log4j levels on +org.apache.hadoop.security+, which includes the
Hadoop +UserGroupInformation+ class; this will emit some high-level debug statements. This
can be controlled in your client application, or via +log4j-service.properties+.

*Q*: All of my Accumulo processes successfully start and log in with their
keytab, but they are unable to communicate with each other, showing the
following errors:

----
2015-01-12 14:47:27,055 [transport.TSaslTransport] ERROR: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
        at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
        at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
        at org.apache.accumulo.core.rpc.UGIAssumingTransport$1.run(UGIAssumingTransport.java:53)
        at org.apache.accumulo.core.rpc.UGIAssumingTransport$1.run(UGIAssumingTransport.java:49)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.accumulo.core.rpc.UGIAssumingTransport.open(UGIAssumingTransport.java:49)
        at org.apache.accumulo.core.rpc.ThriftUtil.createClientTransport(ThriftUtil.java:357)
        at org.apache.accumulo.core.rpc.ThriftUtil.createTransport(ThriftUtil.java:255)
        at org.apache.accumulo.server.master.LiveTServerSet$TServerConnection.getTableMap(LiveTServerSet.java:106)
        at org.apache.accumulo.master.Master.gatherTableInformation(Master.java:996)
        at
org.apache.accumulo.master.Master.access$600(Master.java:160)
        at org.apache.accumulo.master.Master$StatusThread.updateStatus(Master.java:911)
        at org.apache.accumulo.master.Master$StatusThread.run(Master.java:901)
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)
        at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:710)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
        ... 16 more
Caused by: KrbException: Server not found in Kerberos database (7) - LOOKING_UP_SERVER
        at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
        at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:192)
        at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:203)
        at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:309)
        at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:115)
        at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:454)
        at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:641)
        ... 19 more
Caused by: KrbException: Identifier doesn't match expected value (906)
        at sun.security.krb5.internal.KDCRep.init(KDCRep.java:143)
        at sun.security.krb5.internal.TGSRep.init(TGSRep.java:66)
        at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:61)
        at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
        ... 25 more
----

or

----
2015-01-12 14:47:29,440 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
+java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed + at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219) + at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51) + at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48) + at java.security.AccessController.doPrivileged(Native Method) + at javax.security.auth.Subject.doAs(Subject.java:356) + at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608) + at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48) + at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208) + at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) + at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) + at java.lang.Thread.run(Thread.java:745) +Caused by: org.apache.thrift.transport.TTransportException: Peer indicated failure: GSS initiate failed + at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190) + at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125) + at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253) + at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) + at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) + ... 10 more +---- + +*A*: As previously mentioned, the hostname, and subsequently the address each Accumulo process is bound/listening +on, is extremely important when negotiating an SASL connection. This problem commonly arises when the Accumulo +servers are not configured to listen on the address denoted by their FQDN. 
+ +The values in the Accumulo "hosts" files (in +accumulo/conf+: +masters+, +monitors+, +tservers+, +tracers+, +and +gc+) should match the instance component of the Kerberos server principal (e.g. +host+ in +accumulo/h...@example.com+). + +*Q*: After configuring my system for Kerberos, server processes come up normally and I can interact with the system. However, +when I attempt to use the "Recent Traces" page on the Monitor UI I get a stacktrace similar to: + +---- + java.lang.AssertionError: AuthenticationToken should not be null + at org.apache.accumulo.monitor.servlets.trace.Basic.getScanner(Basic.java:139) + at org.apache.accumulo.monitor.servlets.trace.Summary.pageBody(Summary.java:164) + at org.apache.accumulo.monitor.servlets.BasicServlet.doGet(BasicServlet.java:63) + at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) + at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) + at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:738) + at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:551) + at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) + at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:568) + at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) + at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111) + at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:478) + at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183) + at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045) + at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) + at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) + at org.eclipse.jetty.server.Server.handle(Server.java:462) + at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:279) + at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:232) + at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534) + at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607) + at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536) + at java.lang.Thread.run(Thread.java:745) + +---- + +*A*: This indicates that the Monitor has not been able to successfully log in a client-side user to read from the +trace+ table. Accumulo allows the TraceServer to rely on the property +general.kerberos.keytab+ as a fallback when logging in the trace user if the +trace.token.property.keytab+ property isn't defined. Some earlier versions of Accumulo did not do this same fallback for the Monitor's use of the trace user. The end result is that if you configure +general.kerberos.keytab+ and not +trace.token.property.keytab+ you will end up with a system that properly logs trace information but can't view it. + +Ensure you have set +trace.token.property.keytab+ to point to a keytab for the principal defined in +trace.user+ in the +accumulo-site.xml+ file for the Monitor, since that should work in all versions of Accumulo.
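As a sketch, the relevant Monitor-side properties in +accumulo-site.xml+ might look like the following (+trace.user+ and +trace.token.property.keytab+ are the real property names from the text above; the principal name and keytab path are hypothetical placeholders):

----
<property>
  <name>trace.user</name>
  <value>tracer@EXAMPLE.COM</value>
</property>
<property>
  <name>trace.token.property.keytab</name>
  <value>/etc/security/keytabs/tracer.keytab</value>
</property>
----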
http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/32ac61d3/docs/master/multivolume.md ---------------------------------------------------------------------- diff --git a/docs/master/multivolume.md b/docs/master/multivolume.md new file mode 100644 index 0000000..8921f2d --- /dev/null +++ b/docs/master/multivolume.md @@ -0,0 +1,82 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +== Multi-Volume Installations + +This is an advanced configuration setting for very large clusters +under a lot of write pressure. + +The HDFS NameNode holds all of the metadata about the files in +HDFS. For fast performance, all of this information needs to be stored +in memory. A single NameNode with 64G of memory can store the +metadata for tens of millions of files. However, when scaling beyond a +thousand nodes, an active Accumulo system can generate lots of updates +to the file system, especially when data is being ingested. The large +number of write transactions to the NameNode, and the speed of a +single edit log, can become the limiting factor for large scale +Accumulo installations. 
+ +You can see the effect of slow write transactions when the Accumulo +Garbage Collector takes a long time (more than 5 minutes) to delete +the files Accumulo no longer needs. If your Garbage Collector +routinely runs in less than a minute, the NameNode is performing well. + +However, if you do begin to experience slow-down and poor GC +performance, Accumulo can be configured to use multiple NameNode +servers. The configuration ``instance.volumes'' should be set to a +comma-separated list, using full URI references to different NameNode +servers: + +[source,xml] +<property> + <name>instance.volumes</name> + <value>hdfs://ns1:9001,hdfs://ns2:9001</value> +</property> + +The introduction of multiple volume support in 1.6 changed the way Accumulo +stores pointers to files. It now stores fully qualified URI references to +files. Before 1.6, Accumulo stored paths that were relative to a table +directory. After an upgrade these relative paths will still exist and are +resolved using instance.dfs.dir, instance.dfs.uri, and Hadoop configuration in +the same way they were before 1.6. + +If the URI for a namenode changes (e.g. the namenode was running on host1 and it is +moved to host2), then Accumulo will no longer function. Even if Hadoop and +Accumulo configurations are changed, the fully qualified URIs stored in +Accumulo will still contain the old URI. To handle this, Accumulo has the +following configuration property for replacing URIs stored in its metadata. The +example configuration below will replace ns1 with nsA and ns2 with nsB in +Accumulo metadata. For this property to take effect, Accumulo will need to be +restarted. + +[source,xml] +<property> + <name>instance.volumes.replacements</name> + <value>hdfs://ns1:9001 hdfs://nsA:9001, hdfs://ns2:9001 hdfs://nsB:9001</value> +</property> + +Using viewfs or HA namenode, introduced in Hadoop 2, offers another option for +managing the fully qualified URIs stored in Accumulo. 
Viewfs and HA namenode +both introduce a level of indirection in the Hadoop configuration. For +example, assume viewfs:///nn1 maps to hdfs://nn1 in the Hadoop configuration. +If viewfs://nn1 is used by Accumulo, then it is easy to map viewfs://nn1 to +hdfs://nnA by changing the Hadoop configuration without doing anything to Accumulo. +A production system should probably use an HA namenode. Viewfs may be useful on +a test system with a single non-HA namenode. + +You may also want to configure your cluster to use Federation, +available in Hadoop 2.0, which allows DataNodes to respond to multiple +NameNode servers, so you do not have to partition your DataNodes by +NameNode. http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/32ac61d3/docs/master/replication.md ---------------------------------------------------------------------- diff --git a/docs/master/replication.md b/docs/master/replication.md new file mode 100644 index 0000000..fd8905e --- /dev/null +++ b/docs/master/replication.md @@ -0,0 +1,399 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +== Replication + +=== Overview + +Replication is a feature of Accumulo which provides a mechanism to automatically +copy data to other systems, typically for the purpose of disaster recovery, +high availability, or geographic locality. It is best to consider this feature +as a framework for automatic replication rather than the ability to copy data +from one Accumulo instance to another, as copying to another Accumulo cluster is +only an implementation detail. The local Accumulo cluster is hereafter referred +to as the +primary+ while systems being replicated to are known as ++peers+. + +This replication framework makes two Accumulo instances, where one instance +replicates to another, eventually consistent with one another, as opposed +to the strong consistency that each single Accumulo instance still holds. That +is to say, attempts to read data from a table on a peer which has pending replication +from the primary will not wait for that data to be replicated before running the scan. +This is desirable for a number of reasons, the most important being that the replication +framework is not limited by network outages or offline peers, but only by the HDFS +space available on the primary system. + +A replication configuration can be considered a directed graph which allows cycles. +The systems from which data was replicated are recorded in each Mutation, which +allows each system to determine whether a peer already has the data +the system wants to send. + +Data is replicated by using the Write-Ahead Logs (WALs) that each TabletServer is +already maintaining. TabletServers record, in the +accumulo.metadata+ table, which WALs have data that needs to be +replicated. The Master uses these records, +combined with the local Accumulo table that the WAL was used with, to create records +in the +replication+ table which track which peers the given WAL should be +replicated to. 
The Master later uses these work entries to assign the actual +replication task to a local TabletServer using ZooKeeper. A TabletServer will get +a lock in ZooKeeper for the replication of this file to a peer, and proceed to +replicate to the peer, recording progress in the +replication+ table as +data is successfully replicated on the peer. Later, the Master and Garbage Collector +will remove records from the +accumulo.metadata+ and +replication+ tables +and files from HDFS, respectively, after replication to all peers is complete. + +=== Configuration + +Configuration of Accumulo to replicate data to another system can be categorized +into the following sections. + +==== Site Configuration + +Each system involved in replication (even the primary) needs a name that uniquely +identifies it across all peers in the replication graph. This should be considered +fixed for an instance, and set in +accumulo-site.xml+. + +---- +<property> + <name>replication.name</name> + <value>primary</value> + <description>Unique name for this system used by replication</description> +</property> +---- + +==== Instance Configuration + +For each peer of this system, Accumulo needs to know the name of that peer, +the class used to replicate data to that system and some configuration information +to connect to this remote peer. In the case of Accumulo, this additional data +is the Accumulo instance name and ZooKeeper quorum; however, this varies with the +replication implementation for the peer. + +These can be set in the site configuration to ease deployments; however, as they may +change, it can be useful to set this information using the Accumulo shell. + +To configure a peer with the name +peer1+ which is an Accumulo system with an instance name of +accumulo_peer+ +and a ZooKeeper quorum of +10.0.0.1,10.0.2.1,10.0.3.1+, invoke the following +command in the shell. 
+ +---- +root@accumulo_primary> config -s +replication.peer.peer1=org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,accumulo_peer,10.0.0.1,10.0.2.1,10.0.3.1 +---- + +Since this is an Accumulo system, we also want to set a username and password +to use when authenticating with this peer. On our peer, we create a special user, "replication" +with a password of "password", which has permission to write to the tables we want to replicate data into. We then need to record this in the primary's configuration. + +---- +root@accumulo_primary> config -s replication.peer.user.peer1=replication +root@accumulo_primary> config -s replication.peer.password.peer1=password +---- + +Alternatively, when configuring replication on Accumulo running Kerberos, a keytab +file per peer can be configured instead of a password. The provided keytabs must be readable +by the unix user running Accumulo. The keytab for a peer can be distinct from the +keytab used by Accumulo or the keytabs for other peers. + +---- +accum...@example.com@accumulo_primary> config -s replication.peer.user.peer1=replicat...@example.com +accum...@example.com@accumulo_primary> config -s replication.peer.keytab.peer1=/path/to/replication.keytab +---- + +==== Table Configuration + +Now that we have a peer defined, we just need to configure which tables will +replicate to that peer. We also need to configure an identifier to determine where +this data will be replicated on the peer. Since we're replicating to another Accumulo +cluster, this is a table ID. In this example, we want to enable replication on ++my_table+ and configure our peer +accumulo_peer+ as a target, sending +the data to the table with an ID of +2+ in +accumulo_peer+. 
+ +---- +root@accumulo_primary> config -t my_table -s table.replication=true +root@accumulo_primary> config -t my_table -s table.replication.target.accumulo_peer=2 +---- + +To replicate a single table on the primary to multiple peers, the second command +in the above shell snippet can be issued for each peer and remote identifier pair. + +=== Monitoring + +Basic information about replication status from a primary can be found on the Accumulo +Monitor server, using the +Replication+ link in the sidebar. + +On this page, information is broken down into the following sections: + +1. Files pending replication by peer and target +2. Files queued for replication, with progress made + +=== Work Assignment + +Depending on the schema of a table, different WorkAssigner implementations can +be configured. The implementation is controlled via the property +replication.work.assigner+ +and the full class name for the implementation. This can be configured via the shell or ++accumulo-site.xml+. + +---- +<property> + <name>replication.work.assigner</name> + <value>org.apache.accumulo.master.replication.SequentialWorkAssigner</value> + <description>Implementation used to assign work for replication</description> +</property> +---- + +---- +root@accumulo_primary> config -t my_table -s replication.work.assigner=org.apache.accumulo.master.replication.SequentialWorkAssigner +---- + +Two implementations are provided. By default, the +SequentialWorkAssigner+ is configured for an +instance. The SequentialWorkAssigner ensures that, per peer and remote identifier, each WAL is +replicated in the order in which it was created. This is sufficient to ensure that updates to a table +will be replayed in the correct order on the peer. This implementation has the downside of only replicating +a single WAL at a time. 
+ +The second implementation, the +UnorderedWorkAssigner+, can be used to overcome the limitation +of only a single WAL being replicated to a target and peer at any time. Depending on the table schema, +it's possible that multiple versions of the same Key with different values are infrequent or nonexistent. +In this case, parallel replication to a peer and target is possible without any downsides. In the case +where this implementation is used and column updates are frequent, it is possible that there will be +an inconsistency between the primary and the peer. + +=== ReplicaSystems + ++ReplicaSystem+ is the interface which allows abstraction of replication of data +to peers of various types. Presently, only an +AccumuloReplicaSystem+ is provided +which will replicate data to another Accumulo instance. A +ReplicaSystem+ implementation +is run inside of the TabletServer process, and can be configured as mentioned in the ++Instance Configuration+ section of this document. Theoretically, an implementation +of this interface could send data to other filesystems, databases, etc. + +==== AccumuloReplicaSystem + +The +AccumuloReplicaSystem+ uses Thrift to communicate with a peer Accumulo instance +and replicate the necessary data. The TabletServer running on the primary will communicate +with the Master on the peer to request the address of a TabletServer on the peer which +this TabletServer will use to replicate the data. + +The TabletServer on the primary will then replicate data in batches of a configurable +size (+replication.max.unit.size+). The TabletServer on the peer will report how many +records were applied back to the primary, which will be used to record how many records +were successfully replicated. The TabletServer on the primary will continue to replicate +data in these batches until no more data can be read from the file. 
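The batch-and-acknowledge protocol described above can be sketched, independently of Accumulo, as a loop that sends chunks of at most +replication.max.unit.size+ bytes and records the peer's acknowledgement before advancing. All names in this sketch are illustrative; none of them are Accumulo APIs.

```python
# Illustrative sketch of batch-and-acknowledge replication; not Accumulo code.

def replicate_file(wal_records, send_batch, max_unit_size=64):
    """Replicate records in batches; advance only on acknowledgement.

    wal_records: records to ship, simplified here to strings whose
                 len() stands in for their serialized size in bytes.
    send_batch:  callable simulating the peer RPC; returns the number
                 of records the peer applied.
    """
    replicated = 0
    i = 0
    while i < len(wal_records):
        # Accumulate records until the batch reaches max_unit_size.
        batch, size = [], 0
        while i < len(wal_records) and size < max_unit_size:
            batch.append(wal_records[i])
            size += len(wal_records[i])
            i += 1
        # The peer reports how many records were applied;
        # the primary records that progress before continuing.
        replicated += send_batch(batch)
    return replicated

records = ["record-%02d" % n for n in range(10)]  # 9 bytes each
applied = replicate_file(records, send_batch=len, max_unit_size=20)
```

Here `send_batch=len` stands in for the peer acknowledging every record in the batch; a real peer could apply fewer, and the primary would resume from its recorded progress.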
+ +=== Other Configuration + +There are a number of configuration values that can be used to control how +the various components operate. + +[width="75%",cols=">,^2,^2"] +[options="header"] +|==== +|Property | Description | Default +|replication.max.work.queue | Maximum number of files queued for replication at one time | 1000 +|replication.work.assignment.sleep | Time between invocations of the WorkAssigner | 30s +|replication.worker.threads | Size of threadpool used to replicate data to peers | 4 +|replication.receipt.service.port | Thrift service port to listen for replication requests, can use '0' for a random port | 10002 +|replication.work.attempts | Number of attempts to replicate to a peer before aborting the attempt | 10 +|replication.receiver.min.threads | Minimum number of idle threads for handling incoming replication | 1 +|replication.receiver.threadcheck.time | Time between attempting adjustments of thread pool for incoming replications | 30s +|replication.max.unit.size | Maximum amount of data to be replicated in one RPC | 64M +|replication.work.assigner | Work Assigner implementation | org.apache.accumulo.master.replication.SequentialWorkAssigner +|tserver.replication.batchwriter.replayer.memory| Size of BatchWriter cache to use in applying replication requests | 50M +|==== + +=== Example Practical Configuration + +A real-life example is now provided to give a concrete application of replication configuration. This +example is a two instance Accumulo system, one primary system and one peer system. They are called +primary and peer, respectively. Each system also has a table of the same name, "my_table". The instance +name for each is also the same (primary and peer), and both have ZooKeeper hosts on a node with a hostname +with that name as well (primary:2181 and peer:2181). + +We want to configure these systems so that "my_table" on "primary" replicates to "my_table" on "peer". 
+ +==== accumulo-site.xml + +We can assign the "unique" name that identifies this Accumulo instance among all others that might participate +in replication together. In this example, we will use the names provided in the description. + +===== Primary + +---- +<property> + <name>replication.name</name> + <value>primary</value> + <description>Defines the unique name</description> +</property> +---- + +===== Peer + +---- +<property> + <name>replication.name</name> + <value>peer</value> +</property> +---- + +==== masters and tservers files + +Be *sure* to use non-local IP addresses. Other nodes need to connect to these processes, and using localhost will likely result in +a node talking to itself rather than the intended remote node. + +==== Start both instances + +The rest of the configuration is dynamic and is best configured on the fly (in ZooKeeper) rather than in accumulo-site.xml. + +==== Peer + +The next series of commands is to be run on the peer system. Create a user account for the primary instance called +"peer". The password for this account will need to be saved in the configuration on the primary. + +---- +root@peer> createtable my_table +root@peer> createuser peer +root@peer> grant -t my_table -u peer Table.WRITE +root@peer> grant -t my_table -u peer Table.READ +root@peer> tables -l +---- + +Remember what the table ID for 'my_table' is. You'll need that to configure the primary instance. + +==== Primary + +Next, configure the primary instance. + +===== Set up the table + +---- +root@primary> createtable my_table +---- + +===== Define the Peer as a replication peer to the Primary + +We're defining the instance with replication.name of 'peer' as a peer. We provide the implementation of ReplicaSystem +that we want to use, and the configuration for the AccumuloReplicaSystem. In this case, the configuration is the Accumulo +Instance name for 'peer' and the ZooKeeper quorum string. The configuration key is of the form +"replication.peer.$peer_name". 
+ +---- +root@primary> config -s replication.peer.peer=org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,peer,$peer_zk_quorum +---- + +===== Set the authentication credentials + +We want to use that special username and password that we created on the peer, so we have a means to write data to +the table that we want to replicate to. The configuration key is of the form "replication.peer.user.$peer_name". + +---- +root@primary> config -s replication.peer.user.peer=peer +root@primary> config -s replication.peer.password.peer=peer +---- + +===== Enable replication on the table + +Now that we have defined the peer on the primary and provided the authentication credentials, we need to configure +our table with the implementation of ReplicaSystem we want to use to replicate to the peer. In this case, our peer +is an Accumulo instance, so we want to use the AccumuloReplicaSystem. + +The configuration for the AccumuloReplicaSystem is the table ID for the table on the peer instance that we +want to replicate into. Be sure to use the correct value for $peer_table_id. The configuration key is of +the form "table.replication.target.$peer_name". + +---- +root@primary> config -t my_table -s table.replication.target.peer=$peer_table_id +---- + +Finally, we can enable replication on this table. + +---- +root@primary> config -t my_table -s table.replication=true +---- + +=== Extra considerations for use + +While this feature is intended for general-purpose use, its implementation does carry some baggage. Like any software, +replication is a feature that operates well within some set of use cases but is not meant to support all use cases. +For the benefit of the users, we can enumerate these cases. + +==== Latency + +As previously mentioned, the replication feature uses the Write-Ahead Log files for a number of reasons, one of which +is to prevent the need for data to be written to RFiles before it is available to be replicated. 
+While this can help +reduce the latency for a batch of Mutations that have been written to Accumulo, the latency is at least seconds to tens +of seconds for replication once ingest is active. For a table on which replication has just been enabled, it will likely +take a few minutes before replication begins. + +Once ingest is active and flowing into the system at a regular rate, replication should be occurring at a similar rate, +given sufficient computing resources. Replication attempts to copy data at a rate that is to be considered low latency +but is not a replacement for custom indexing code which can ensure near real-time referential integrity on secondary indexes. + +==== Table-Configured Iterators + +Accumulo Iterators tend to be a heavy hammer which can be used to solve a variety of problems. In general, it is highly +recommended that Iterators which are applied at major compaction time are both idempotent and associative due to the +non-determinism in which some set of files for a Tablet might be compacted. In practice, this translates to common patterns, +such as aggregation, which are implemented in a manner resilient to duplication (such as using a Set instead of a List). + +Due to the asynchronous nature of replication and the expectation that hardware failures and network partitions will exist, +it is generally not recommended to configure replication on a table which has non-idempotent Iterators set. +While the replication implementation can make some simple assertions to try to avoid re-replication of data, it is not +presently guaranteed that all data will only be sent to a peer once. Data will be replicated at least once. Typically, +this is not a problem as the VersioningIterator will automatically deduplicate this over-replication because the duplicates will +have the same timestamp; however, certain Combiners may result in inaccurate aggregations. 
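The hazard can be demonstrated with a small, self-contained simulation (plain Python, no Accumulo involved): an unacknowledged update is re-sent under at-least-once delivery, which double-counts under a summing aggregation but is harmless for an idempotent, set-based aggregation of the kind mentioned above.

```python
# Simulate at-least-once delivery: the update "2" is applied on the peer,
# but its acknowledgement is lost, so the primary re-sends it.
updates = [1, 2, 2, 3]  # the second "2" is the re-send

# Non-idempotent aggregation (summing): the re-send corrupts the result.
summed = sum(updates)     # 8 on the peer, though the primary holds 6

# Idempotent aggregation (a set of distinct versions): the duplicate
# is absorbed, and the peer agrees with the primary.
distinct = set(updates)   # {1, 2, 3}
```

This is the same failure mode the SummingCombiner discussion below walks through in prose.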
+ +As a concrete example, consider a table which has the SummingCombiner configured to sum all values for +multiple versions of the same Key. For some key, consider a set of numeric values that are written to a table on the +primary: [1, 2, 3]. On the primary, all of these are successfully written and thus the current value for the given key +would be 6, (1 + 2 + 3). Consider, however, that each of these updates to the peer was replicated independently (because +other data was also included in the write-ahead log that needed to be replicated). The update with a value of 1 was +successfully replicated, and then we attempted to replicate the update with a value of 2 but the remote server never +responded. The primary does not know whether the update with a value of 2 was actually applied or not, so the +only recourse is to re-send the update. After we receive confirmation that the update with a value of 2 was replicated, +we will then replicate the update with 3. If the peer never applied the first update of '2', the summation is accurate. +If the update was applied but the acknowledgement was lost for some reason (system failure, network partition), the +update will be resent to the peer. Because addition is non-idempotent, we have created an inconsistency between the +primary and peer. As such, the SummingCombiner wouldn't be recommended on a table being replicated. + +While changes could be made to the replication implementation to attempt to mitigate this risk, +presently it is not recommended to configure non-idempotent Iterators or Combiners on replicated tables in cases where +inaccuracy of aggregations is not acceptable. + +==== Duplicate Keys + +In Accumulo, when more than one key exists that is exactly the same, equal down to the timestamp, +the retained value is non-deterministic. Replication introduces another level of non-determinism in this case. 
+For a table that is being replicated and has multiple equal keys with different values inserted into it, the final +value in that table on the primary instance is not guaranteed to be the final value on all replicas. + +For example, say the values that were inserted on the primary instance were +value1+ and +value2+ and the final +value was +value1+; it is not guaranteed that all replicas will have +value1+ like the primary. The final value is +non-deterministic for each instance. + +As is the recommendation without replication enabled, if multiple values for the same key (sans timestamp) are written to +Accumulo, it is strongly recommended that the timestamp properly reflects the version intended by +the client. That is to say, newer values inserted into the table should have larger timestamps. If the time between +writing updates to the same key is significant (on the order of minutes), this concern can likely be ignored. + +==== Bulk Imports + +Currently, files that are bulk imported into a table configured for replication are not replicated. There is no +technical reason why it was not implemented; it was simply omitted from the initial implementation. This is considered a +fair limitation because bulk importing generated files into multiple locations is much simpler than bifurcating "live" ingest +data into two instances. Given some existing bulk import process which creates files and then imports them into an +Accumulo instance, it is trivial to copy those files to a new HDFS instance and import them into another Accumulo +instance using the same process. Hadoop's +distcp+ command provides an easy way to copy large amounts of data to another +HDFS instance which makes the problem of duplicating bulk imports very easy to solve. 
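For example, an existing bulk import pipeline could be extended with a +distcp+ step before running the same import process against the peer (the paths and NameNode addresses below are hypothetical placeholders):

----
hadoop distcp hdfs://primary-nn:9001/bulk/my_table hdfs://peer-nn:9001/bulk/my_table
----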
http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/32ac61d3/docs/master/sampling.md ---------------------------------------------------------------------- diff --git a/docs/master/sampling.md b/docs/master/sampling.md new file mode 100644 index 0000000..237f43c --- /dev/null +++ b/docs/master/sampling.md @@ -0,0 +1,86 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +== Sampling + +=== Overview + +Accumulo has the ability to generate and scan a per-table set of sample data. +This sample data is kept up to date as a table is mutated. What key values are +placed in the sample data is configurable per table. + +This feature can be used for query estimation and optimization. For an example +of estimation, assume an Accumulo table is configured to generate a sample +containing one millionth of a table's data. If a query is executed against the +sample and returns one thousand results, then the same query against all the +data would probably return a billion results. A nice property of having +Accumulo generate the sample is that it is always up to date, so estimates +will be accurate even when querying the most recently written data. 
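The estimation arithmetic from the example above is just a scale-up by the sampling ratio (the numbers are the ones used in the text):

```python
records_per_sample_record = 1_000_000  # sample holds one millionth of the data
sample_hits = 1000                     # results from querying the sample
# Scale the sample count up by the sampling ratio to estimate the
# number of results the same query would return over all the data.
estimated_total = sample_hits * records_per_sample_record  # about a billion
```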
+ +An example of a query optimization is an iterator using sample data to get an estimate, and then making decisions based on the estimate. + +=== Configuring + +In order to use sampling, an Accumulo table must be configured with a class that implements +org.apache.accumulo.core.sample.Sampler+ along with options for that class. For guidance on implementing a Sampler, see that interface's javadoc. Accumulo provides a few implementations out of the box. For information on how to use the samplers that ship with Accumulo, look in the package `org.apache.accumulo.core.sample` and consult the javadoc of the classes there. See the https://github.com/apache/accumulo-examples/blob/master/docs/sample.md[sampling example] for examples of how to configure a Sampler on a table. + +Once a table is configured with a sampler, all writes after that point will generate sample data. Sample data will not be present for data written before sampling was configured. A compaction can be initiated that only compacts the files in the table that do not have sample data. The example readme shows how to do this. + +If the sampling configuration of a table is changed, then Accumulo will start generating new sample data with the new configuration. However, old data will still have sample data generated with the previous configuration. A selective compaction can also be issued in this case to regenerate the sample data. + +=== Scanning sample data + +In order to scan sample data, use the +setSamplerConfiguration(...)+ method on +Scanner+ or +BatchScanner+. Please consult that method's javadoc for more information. + +Sample data can also be scanned from within an Accumulo +SortedKeyValueIterator+. To see how to do this, look at the example iterator referenced in the https://github.com/apache/accumulo-examples/blob/master/docs/sample.md[sampling example]. Also, consult the javadoc on +org.apache.accumulo.core.iterators.IteratorEnvironment.cloneWithSamplingEnabled()+.
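The samplers that ship with Accumulo are hash-based. The core decision they make can be sketched in plain Java; note this is only an illustration of the idea, not the actual +Sampler+ interface, and CRC32 stands in here for whatever hash a real sampler uses:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch of hash-based row sampling: a key is in the sample iff a hash of
// its row falls into one of 'modulus' buckets. Because the decision depends
// only on the row, every mutation to a sampled row lands on the same side,
// keeping the sample consistent as the table is mutated.
// This is NOT Accumulo's Sampler interface, only an illustration.
public class RowHashSampleSketch {
    public static boolean accept(String row, int modulus) {
        CRC32 crc = new CRC32();
        crc.update(row.getBytes(StandardCharsets.UTF_8));
        return crc.getValue() % modulus == 0;
    }
}
```

A larger modulus yields a smaller sample; a modulus of 1,000,000 would approximate the one-millionth sample used in the estimation example above.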
+ +MapReduce jobs using the +AccumuloInputFormat+ can also read sample data. See the javadoc for the +setSamplerConfiguration()+ method on ++AccumuloInputFormat+. + +Scans over sample data will throw a +SampleNotPresentException+ in the following cases: + +. sample data is not present, +. sample data is present but was generated with multiple configurations, +. sample data is partially present. + +So a scan over sample data can only succeed if all data written has sample data generated with the same configuration. + +=== Bulk import + +When generating rfiles to bulk import into Accumulo, those rfiles can contain sample data. To use this feature, look at the javadoc on the ++AccumuloFileOutputFormat.setSampler(...)+ method. + http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/32ac61d3/docs/master/security.md ---------------------------------------------------------------------- diff --git a/docs/master/security.md b/docs/master/security.md new file mode 100644 index 0000000..4a57f8a --- /dev/null +++ b/docs/master/security.md @@ -0,0 +1,182 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +== Security + +Accumulo extends the BigTable data model to implement a security mechanism known as cell-level security.
Every key-value pair has its own security label, stored under the column visibility element of the key, which is used to determine whether a given user meets the security requirements to read the value. This enables data of various security levels to be stored within the same row, and users of varying degrees of access to query the same table, while preserving data confidentiality. + +=== Security Label Expressions + +When mutations are applied, users can specify a security label for each value. This is done as the Mutation is created by passing a ColumnVisibility object to the put() method: + +[source,java] +---- +Text rowID = new Text("row1"); +Text colFam = new Text("myColFam"); +Text colQual = new Text("myColQual"); +ColumnVisibility colVis = new ColumnVisibility("public"); +long timestamp = System.currentTimeMillis(); + +Value value = new Value("myValue"); + +Mutation mutation = new Mutation(rowID); +mutation.put(colFam, colQual, colVis, timestamp, value); +---- + +=== Security Label Expression Syntax + +Security labels consist of a set of user-defined tokens that are required to read the value the label is associated with. The set of tokens required can be specified using syntax that supports logical AND +&+ and OR +|+ combinations of terms, as well as nesting groups +()+ of terms together. + +Each term consists of one or more alphanumeric characters, hyphens, underscores, or periods. Optionally, each term may be wrapped in quotation marks, which removes the restriction on valid characters. In quoted terms, quotation marks and backslash characters can be used as characters in the term by escaping them with a backslash. + +For example, suppose within our organization we want to label our data values with security labels defined in terms of user roles.
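As a sketch of these semantics, the following is a minimal evaluator for unquoted terms combined with +&+, +|+ and parentheses. This is an illustration only, not Accumulo's own ColumnVisibility parsing; quoted terms are omitted, and, like Accumulo, it rejects expressions that mix operators without parentheses:

```java
import java.util.Set;

// Minimal sketch of evaluating a visibility expression against a set of
// authorizations. NOT Accumulo's VisibilityEvaluator; quoted terms omitted.
public class VisEval {
    private final String expr;
    private int pos;

    private VisEval(String expr) { this.expr = expr; }

    public static boolean evaluate(String expr, Set<String> auths) {
        if (expr.isEmpty()) {
            return true; // an empty label is visible to everyone
        }
        VisEval e = new VisEval(expr);
        boolean result = e.parseExpr(auths);
        if (e.pos != expr.length()) {
            // e.g. "a&b|c": mixing & and | requires parentheses
            throw new IllegalArgumentException("trailing input in expression");
        }
        return result;
    }

    // expr := factor ('&' factor)* | factor ('|' factor)*
    private boolean parseExpr(Set<String> auths) {
        boolean result = parseFactor(auths);
        if (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char op = expr.charAt(pos); // only one operator kind per level
            while (pos < expr.length() && expr.charAt(pos) == op) {
                pos++;
                boolean rhs = parseFactor(auths);
                result = (op == '&') ? (result && rhs) : (result || rhs);
            }
        }
        return result;
    }

    // factor := term | '(' expr ')'
    private boolean parseFactor(Set<String> auths) {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean r = parseExpr(auths);
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("unbalanced parentheses");
            }
            pos++;
            return r;
        }
        // a term: alphanumerics, hyphens, underscores, periods
        int start = pos;
        while (pos < expr.length() && (Character.isLetterOrDigit(expr.charAt(pos))
                || "-_.".indexOf(expr.charAt(pos)) >= 0)) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a term at position " + pos);
        }
        return auths.contains(expr.substring(start, pos));
    }
}
```

A user holding the authorizations admin and audit would, under these semantics, pass the labels admin, admin&audit and (admin|system)&audit, but not system.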
We might have tokens such as: + + admin + audit + system + +These can be specified alone or combined using logical operators: + +---- +// Users must have admin privileges +admin + +// Users must have admin and audit privileges +admin&audit + +// Users with either admin or audit privileges +admin|audit + +// Users must have audit and one or both of admin or system +(admin|system)&audit +---- + +When both +|+ and +&+ operators are used, parentheses must be used to specify precedence of the operators. + +=== Authorization + +When clients attempt to read data from Accumulo, any security labels present are examined against the set of authorizations passed by the client code when the Scanner or BatchScanner is created. If the authorizations are determined to be insufficient to satisfy the security label, the value is suppressed from the set of results sent back to the client. + +Authorizations are specified as a comma-separated list of tokens the user possesses: + +[source,java] +---- +// user possesses both admin and system level access +Authorizations auths = new Authorizations("admin","system"); + +Scanner s = connector.createScanner("table", auths); +---- + +=== User Authorizations + +Each Accumulo user has a set of associated security labels. To manipulate these in the shell while using the default authorizor, use the setauths and getauths commands. These may also be modified for the default authorizor using the Java security operations API. + +When a user creates a scanner, a set of Authorizations is passed. If the authorizations passed to the scanner are not a subset of the user's authorizations, then an exception will be thrown. + +To prevent users from writing data they cannot read, add the visibility constraint to a table. Use the -evc option in the createtable shell command to enable this constraint. For existing tables, use the following shell command to enable the visibility constraint.
Ensure the constraint number does not conflict with any existing constraints. + + config -t table -s table.constraint.1=org.apache.accumulo.core.security.VisibilityConstraint + +Any user with the alter table permission can add or remove this constraint. This constraint is not applied to bulk imported data; if this is a concern, then disable the bulk import permission. + +=== Pluggable Security + +New in Accumulo 1.5 is a pluggable security mechanism. It can be broken into three actions -- authentication, authorization, and permission handling. By default all of these are handled in ZooKeeper, which is how things were handled in Accumulo 1.4 and before. It is worth noting that this is a new feature in 1.5 and may be adjusted in future releases without the standard deprecation cycle. + +Authentication handles the ability for a user to verify their identity. A combination of principal and authentication token is used to verify a user is who they say they are. An authentication token can be constructed directly through its constructor, but it is advised to use the +init(Property)+ method to populate it. It is expected that a user knows the appropriate token to use for their system. The default token is ++PasswordToken+. + +Once a user is authenticated by the Authenticator, the user has access to the other actions within Accumulo. All actions in Accumulo are ACLed, and this ACL check is handled by the Permission Handler. This is what manages all of the permissions, which are divided into system and per-table levels. From there, if a user is performing an action which requires authorizations, the Authorizor is queried to determine what authorizations the user has. + +This setup allows a variety of different mechanisms to be used for handling different aspects of Accumulo's security.
A system like Kerberos can be used for authentication, then a system like LDAP could be used to determine if a user has a specific permission, and then it may default back to the default ZookeeperAuthorizor to determine what Authorizations a user is ultimately allowed to use. This is a pluggable system, so custom components can be created depending on your need. + +=== Secure Authorizations Handling + +For applications serving many users, it is not expected that an Accumulo user will be created for each application user. In this case, an Accumulo user with all authorizations needed by any of the application's users must be created. To service queries, the application should create a scanner with the application user's authorizations. These authorizations could be obtained from a trusted third party. + +Often production systems will integrate with Public-Key Infrastructure (PKI) and designate client code within the query layer to negotiate with PKI servers in order to authenticate users and retrieve their authorization tokens (credentials). This requires users to specify only the information necessary to authenticate themselves to the system. Once user identity is established, their credentials can be accessed by the client code and passed to Accumulo outside of the reach of the user. + +=== Query Services Layer + +Since the primary method of interaction with Accumulo is through the Java API, production environments often call for the implementation of a Query Services Layer. This can be done using web services in containers such as Apache Tomcat, but is not a requirement. The Query Services Layer provides a platform on which user-facing applications can be built. This allows the application designers to isolate potentially complex query logic, and enables a convenient point at which to perform essential security functions.
+ +Several production environments choose to implement authentication at this layer, where users' identifiers are used to retrieve their access credentials, which are then cached within the query layer and presented to Accumulo through the Authorizations mechanism. + +Typically, the query services layer sits between Accumulo and user workstations. http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/32ac61d3/docs/master/shell.md ---------------------------------------------------------------------- diff --git a/docs/master/shell.md b/docs/master/shell.md new file mode 100644 index 0000000..4d53e3d --- /dev/null +++ b/docs/master/shell.md @@ -0,0 +1,159 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +== Accumulo Shell +Accumulo provides a simple shell that can be used to examine the contents and configuration settings of tables, insert/update/delete values, and change configuration settings.
+ +The shell can be started with the following command: + + accumulo shell -u [username] + +The shell will prompt for the password corresponding to the specified username and then display the following prompt: + + Shell - Apache Accumulo Interactive Shell + - + - version: 2.x.x + - instance name: myinstance + - instance id: 00000000-0000-0000-0000-000000000000 + - + - type 'help' for a list of available commands + - + root@myinstance> + +=== Basic Administration + +The Accumulo shell can be used to create and delete tables, as well as to configure table and instance specific options. + +---- +root@myinstance> tables +accumulo.metadata +accumulo.root + +root@myinstance> createtable mytable + +root@myinstance mytable> + +root@myinstance mytable> tables +accumulo.metadata +accumulo.root +mytable + +root@myinstance mytable> createtable testtable + +root@myinstance testtable> + +root@myinstance testtable> deletetable testtable +deletetable { testtable } (yes|no)? yes +Table: [testtable] has been deleted. + +root@myinstance> +---- + +The Shell can also be used to insert updates and scan tables. This is useful for inspecting tables. + +---- +root@myinstance mytable> scan + +root@myinstance mytable> insert row1 colf colq value1 +insert successful + +root@myinstance mytable> scan +row1 colf:colq [] value1 +---- + +The value in brackets ``[]'' is the visibility label. Since none was used, it is empty for this row. You can use the +-st+ option to scan to see the timestamp for the cell, too. + +=== Table Maintenance + +The *compact* command instructs Accumulo to schedule a compaction of the table, during which files are consolidated and deleted entries are removed. + + root@myinstance mytable> compact -t mytable + 07 16:13:53,201 [shell.Shell] INFO : Compaction of table mytable started for given range + +The *flush* command instructs Accumulo to write all entries currently in memory for a given table to disk.
+ + root@myinstance mytable> flush -t mytable + 07 16:14:19,351 [shell.Shell] INFO : Flush of table mytable + initiated... + +=== User Administration + +The Shell can be used to add, remove, and grant privileges to users. + +---- +root@myinstance mytable> createuser bob +Enter new password for 'bob': ********* +Please confirm new password for 'bob': ********* + +root@myinstance mytable> authenticate bob +Enter current password for 'bob': ********* +Valid + +root@myinstance mytable> grant System.CREATE_TABLE -s -u bob + +root@myinstance mytable> user bob +Enter current password for 'bob': ********* + +bob@myinstance mytable> userpermissions +System permissions: System.CREATE_TABLE +Table permissions (accumulo.metadata): Table.READ +Table permissions (mytable): NONE + +bob@myinstance mytable> createtable bobstable + +bob@myinstance bobstable> + +bob@myinstance bobstable> user root +Enter current password for 'root': ********* + +root@myinstance bobstable> revoke System.CREATE_TABLE -s -u bob +---- + +=== JSR-223 Support in the Shell + +The script command can be used to invoke programs written in languages supported by installed JSR-223 +engines. You can get a list of installed engines with the -l argument. Below is an example of the output +of the command when running the Shell with Java 7. + +---- +root@fake> script -l + Engine Alias: ECMAScript + Engine Alias: JavaScript + Engine Alias: ecmascript + Engine Alias: javascript + Engine Alias: js + Engine Alias: rhino + Language: ECMAScript (1.8) + Script Engine: Mozilla Rhino (1.7 release 3 PRERELEASE) +ScriptEngineFactory Info +---- + + A list of compatible languages can be found at https://en.wikipedia.org/wiki/List_of_JVM_languages. The +rhino javascript engine is provided with the JVM. Typically putting a jar on the classpath is all that is +needed to install a new engine. + + When writing scripts to run in the shell, you will have a variable called connection already available +to you. 
This variable is a reference to an Accumulo Connector object, the same connection that the Shell +is using to communicate with the Accumulo servers. At this point you can use any of the public API methods +within your script. Reference the script command help to see all of the execution options. Script and script +invocation examples can be found in ACCUMULO-1399.