Merge branch '1.5.1-SNAPSHOT' into 1.6.0-SNAPSHOT

Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/0d874d05
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/0d874d05
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/0d874d05

Branch: refs/heads/1.6.0-SNAPSHOT
Commit: 0d874d05a67950d9d5d9bec96de22e71f946b62c
Parents: 8f92585 e9423ae
Author: Eric Newton <eric.new...@gmail.com>
Authored: Wed Dec 11 09:15:55 2013 -0500
Committer: Eric Newton <eric.new...@gmail.com>
Committed: Wed Dec 11 09:15:55 2013 -0500

----------------------------------------------------------------------
 .../chapters/administration.tex                 | 32 ++++++++++++++++++++
 .../src/main/resources/docs/administration.html | 23 ++++++++++++++
 2 files changed, 55 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/0d874d05/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/accumulo/blob/0d874d05/server/monitor/src/main/resources/docs/administration.html
----------------------------------------------------------------------
diff --cc server/monitor/src/main/resources/docs/administration.html
index 96f4a8b,0000000..51b1c31
mode 100644,000000..100644
--- a/server/monitor/src/main/resources/docs/administration.html
+++ b/server/monitor/src/main/resources/docs/administration.html
@@@ -1,148 -1,0 +1,171 @@@
 +<!--
 +  Licensed to the Apache Software Foundation (ASF) under one or more
 +  contributor license agreements.  See the NOTICE file distributed with
 +  this work for additional information regarding copyright ownership.
 +  The ASF licenses this file to You under the Apache License, Version 2.0
 +  (the "License"); you may not use this file except in compliance with
 +  the License.  You may obtain a copy of the License at
 +
 +      http://www.apache.org/licenses/LICENSE-2.0
 +
 +  Unless required by applicable law or agreed to in writing, software
 +  distributed under the License is distributed on an "AS IS" BASIS,
 +  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 +  See the License for the specific language governing permissions and
 +  limitations under the License.
 +-->
 +<html>
 +<head>
 +<title>Accumulo Administration</title>
 +<link rel='stylesheet' type='text/css' href='documentation.css' media='screen'/>
 +</head>
 +<body>
 +
 +<h1>Apache Accumulo Documentation : Administration</h1>
 +
 +<h3>Starting Accumulo for the first time</h3>
 +
 +<p>For the most part, Accumulo is ready to go out of the box. To start it, first you must distribute and install
 +the Accumulo software to each machine in the cluster that you wish to run on. The software should be installed
 +in the same directory on each machine and configured identically (or at least similarly... see the configuration
 +sections for more details). Select one machine to be your bootstrap machine, the one that you will start Accumulo
 +with. Note that you must have passphrase-less ssh access to each machine from your bootstrap machine. On this machine,
 +create a conf/masters and conf/slaves file. In the masters file, type the hostname of the machine you wish to run the master on (probably localhost).
 +In the slaves file, type the hostnames, separated by newlines, of each machine you wish to participate in Accumulo as a tablet server. If you neglect
 +to create these files, the startup scripts will assume you are trying to run on localhost only, and will instantiate a single-node instance only.
 +It is probably a good idea to back up these files, or distribute them to the other nodes as well, so that you can easily boot up Accumulo
 +from another machine, if necessary. You can also create a <code>conf/accumulo-env.sh</code> file if you want to configure any custom environment variables.
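 +
 +<p>For example, a minimal sketch of these files for a three-node cluster (the hostnames node1 through node3 are placeholders, not defaults):
 +
 +<pre>
 + $ cat conf/masters
 + node1
 + $ cat conf/slaves
 + node1
 + node2
 + node3
 +</pre>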
 +
 +<p>Once properly configured, you can initialize, or prepare, an instance of Accumulo by running: <code>bin/accumulo&nbsp;init</code><br />
 +Follow the prompts and you are ready to go. This step only prepares Accumulo to run; it does not start Accumulo.
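 +
 +<p>A sketch of an init session (the instance name is a placeholder, and the exact prompt wording varies by version):
 +
 +<pre>
 + $ bin/accumulo init
 + Instance name : myinstance
 + Enter initial password for root: ******
 +</pre>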
 +
 +<h3>Starting Accumulo</h3>
 +
 +<p>Once you have configured Accumulo to your liking, and distributed the appropriate configuration to each machine, you can start Accumulo with <code>bin/start-all.sh</code>.
 +If at any time you wish to bring Accumulo servers online after one or more have been shut down, you can run <code>bin/start-all.sh</code> again.
 +This step will only start services that are not already running. Be aware that if you run this command on more than one machine, you may unintentionally
 +start an extra copy of the garbage collector service and the monitoring service, since each of these will run on the server on which you run this script.
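 +
 +<p>For example, from the bootstrap machine (assuming Accumulo is installed in $ACCUMULO_HOME; re-running the script is safe, since it only starts services that are not already up):
 +
 +<pre>
 + $ cd $ACCUMULO_HOME
 + $ bin/start-all.sh
 +</pre>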
 +
 +<h3>Stopping Accumulo</h3>
 +
 +<p>Similar to the <code>start-all.sh</code> script, we provide a <code>bin/stop-all.sh</code> script to shut down Accumulo. This will prompt for the root password so that it can
 +ask the master to shut down the tablet servers gracefully. If the tablet servers do not respond, or the master takes too long, you can force a shutdown by hitting Ctrl-C
 +at the password prompt and waiting 15 seconds for the script to force a shutdown. Normally, once the shutdown happens gracefully, unresponsive tablet servers are
 +forcibly shut down after 5 seconds.
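 +
 +<p>For example, a minimal sketch (the prompt wording is illustrative and may vary by version):
 +
 +<pre>
 + $ bin/stop-all.sh
 + Enter root password: ******
 +</pre>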
 +
++<h3>Adding a Node</h3>
++
++<p>Update your <code>$ACCUMULO_HOME/conf/slaves</code> (or <code>$ACCUMULO_CONF_DIR/slaves</code>) file to account for the addition; at a minimum this needs to be on the host(s) being added, but in practice it's good to ensure consistent configuration across all nodes.</p>
++
++<pre>
++$ACCUMULO_HOME/bin/accumulo admin start &lt;host&gt; {&lt;host&gt; ...}
++</pre>
++
++<p>Alternatively, you can ssh to each of the hosts you want to add and run <code>$ACCUMULO_HOME/bin/start-here.sh</code>.</p>
++
++<p>Make sure the host in question has the new configuration, or else the tablet server won't start.</p>
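++
++<p>For example, to start a tablet server on a newly added host (the hostname below is hypothetical):</p>
++
++<pre>
++$ACCUMULO_HOME/bin/accumulo admin start worker4.example.com
++</pre>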
++
++<h3>Decommissioning a Node</h3>
++
++<p>If you need to take a node out of operation, you can trigger a graceful shutdown of a tablet server. Accumulo will automatically rebalance the tablets across the available tablet servers.</p>
++
++<pre>
++$ACCUMULO_HOME/bin/accumulo admin stop &lt;host&gt; {&lt;host&gt; ...}
++</pre>
++
++<p>Alternatively, you can ssh to each of the hosts you want to remove and run <code>$ACCUMULO_HOME/bin/stop-here.sh</code>.</p>
++
++<p>Be sure to update your <code>$ACCUMULO_HOME/conf/slaves</code> (or <code>$ACCUMULO_CONF_DIR/slaves</code>) file to account for the removal of these hosts. Bear in mind that the monitor will not re-read the slaves file automatically, so it will report the decommissioned servers as down; it's recommended that you restart the monitor so that the node list is up to date.</p>
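++
++<p>For example, to gracefully stop the tablet server on a host being removed (again, the hostname is hypothetical):</p>
++
++<pre>
++$ACCUMULO_HOME/bin/accumulo admin stop worker4.example.com
++</pre>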
 +
 +<h3>Configuration</h3>
 +<p>Accumulo configuration information is stored in an XML file and in ZooKeeper.  System-wide
 +configuration information is stored in accumulo-site.xml. In order for Accumulo to
 +find this file, its directory must be on the classpath.  Accumulo will log a warning if it cannot find
 +it, and will use built-in default values. The Accumulo scripts try to put the config directory on the classpath.
 +
 +<p>Starting with version 1.0, per-table configuration was
 +introduced. This information is stored in ZooKeeper and can be
 +manipulated using the config command in the Accumulo
 +shell. ZooKeeper will notify all tablet servers when config properties
 +are modified. This makes it possible to change major compaction
 +settings, for example, for a table while Accumulo is running.
 +
 +<p>Per-table configuration settings override system settings. 
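 +
 +<p>For example, a sketch of overriding a system setting for a single table from the shell (the table name and value are illustrative):
 +
 +<pre>
 + root@myinstance&gt; config -t mytable -s table.compaction.major.ratio=3
 +</pre>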
 +
 +<p>See the possible configuration options and their default values <a href='config.html'>here</a>.
 +
 +<h3>Managing system resources</h3>
 +
 +<p>It is very important how disk and memory usage are allocated across the cluster, and how server processes are distributed across the machines.
 +
 +<ul>
 + <li> On larger clusters, run the namenode, secondary namenode, jobtracker, Accumulo master, and ZooKeeper servers on dedicated nodes.  On a smaller cluster you may want to run all master processes on one node.  When doing this, ensure that the max total memory that could be used by all master processes does not exceed system memory.  Swapping on your single master node would not be good.
 + <li> Accumulo 1.2 and earlier rely on ZooKeeper but do not use it heavily.  On a large cluster, setting up 3 or 5 ZooKeeper servers should be plenty.  Since there is no performance gain when running more ZooKeeper servers, fault tolerance is the only benefit.
 + <li> On slave nodes, ensure the memory used by all slave processes is less than system memory.  For example, the following slave node config could use up to 38G of RAM: tablet server 3G, logger 1G, data node 2G, up to 10 mappers each using 2G, and up to 6 reducers each using 2G.  If the slave nodes only have 32G, then using 38G will result in swapping, which could cause tablet servers to lose their locks in ZooKeeper and die.  Even if swapping does not cause tablet servers to die, it will kill performance.
 + <li> Accumulo and map reduce will work with less memory, but it has an impact.  Accumulo will minor compact more frequently when it has less in-memory map space, resulting in more major compactions.  The minor and major compactions both use CPU and HDFS I/O.  The same goes for map reduce: the less memory you give it, the more it has to sort and spill.  Try to minimize spilling and compactions as much as possible without causing swapping.
 + <li> Accumulo writes data to disk before it sorts it in memory.  This allows data that was in memory when a tablet server crashes to be recovered.  Each slave node needs a local directory to write this data to.  Ensure the file system holding this directory has at least 100G free on all nodes.  Also, if this directory is in a filesystem used by map reduce or HDFS, they may affect each other's performance.
 +</ul>
 +
 +<p>There are a few settings that determine how much memory Accumulo tablet
 +servers use.  In accumulo-env.sh there is a setting called
 +ACCUMULO_TSERVER_OPTS.  By default this is set to something like "-Xmx512m
 +-Xms512m".  These are JVM options asking Java to use 512 megabytes of
 +memory.  By default Accumulo stores data written to it outside of the Java
 +memory space in order to avoid pauses caused by the Java garbage collector.  The
 +amount of memory it uses for this data is determined by the Accumulo setting
 +"tserver.memory.maps.max".  Since this memory is outside of the Java managed
 +memory, the process can grow larger than the -Xmx setting.  So if -Xmx is set
 +to 512M and tserver.memory.maps.max is set to 1G, a tablet server process can
 +be expected to use 1.5G.  If tserver.memory.maps.native.enabled is set to
 +false, then Accumulo will only use memory managed by Java, and the process will
 +not use more than what -Xmx is set to.  In this case the
 +tserver.memory.maps.max setting should be 75% of the -Xmx setting.
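 +
 +<p>For example, a sketch of the two knobs together (values are illustrative, not recommendations). In accumulo-env.sh:
 +
 +<pre>
 + test -z "$ACCUMULO_TSERVER_OPTS" &amp;&amp; export ACCUMULO_TSERVER_OPTS="-Xmx512m -Xms512m"
 +</pre>
 +
 +<p>And in accumulo-site.xml, assuming native maps are enabled:
 +
 +<pre>
 +  &lt;property&gt;
 +    &lt;name&gt;tserver.memory.maps.max&lt;/name&gt;
 +    &lt;value&gt;1G&lt;/value&gt;
 +  &lt;/property&gt;
 +</pre>
 +
 +<p>With these settings, plan on each tablet server process using roughly 1.5G of memory.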
 +
 +<h3>Swappiness</h3>
 +
 +<p>The Linux kernel will swap out memory of running programs to increase
 +the size of the disk buffers.  This tendency to swap out is controlled by
 +a kernel setting called "swappiness."  This behavior does not work well for
 +large Java servers.  When a Java process runs a garbage collection, it touches
 +lots of pages, forcing all swapped out pages back into memory.  It is suggested
 +that swappiness be set to zero.
 +
 +<pre>
 + # sysctl -w vm.swappiness=0
 + # echo "vm.swappiness = 0" &gt;&gt; /etc/sysctl.conf
 +</pre>
 +
 +<h3>Hadoop timeouts</h3>
 +
 +<p>In order to detect failed datanodes, use shorter timeouts.  Add the following to your
 +hdfs-site.xml file:
 +
 +<pre>
 +
 +  &lt;property&gt;
 +    &lt;name&gt;dfs.socket.timeout&lt;/name&gt;
 +    &lt;value&gt;3000&lt;/value&gt;
 +  &lt;/property&gt;
 +
 +  &lt;property&gt;
 +    &lt;name&gt;dfs.socket.write.timeout&lt;/name&gt;
 +    &lt;value&gt;5000&lt;/value&gt;
 +  &lt;/property&gt;
 +
 +  &lt;property&gt;
 +    &lt;name&gt;ipc.client.connect.timeout&lt;/name&gt;
 +    &lt;value&gt;1000&lt;/value&gt;
 +  &lt;/property&gt;
 +
 +  &lt;property&gt;
 +    &lt;name&gt;ipc.client.connect.max.retries.on.timeouts&lt;/name&gt;
 +    &lt;value&gt;2&lt;/value&gt;
 +  &lt;/property&gt;
 +
 +</pre>
 +
 +
 +</body>
 +</html>
