1.5.2.html

buildbot Fri, 19 Sep 2014 14:12:06 -0700

Author: buildbot
Date: Fri Sep 19 21:10:53 2014
New Revision: 922883

Log:
Staging update by buildbot for accumulo


Modified:
    websites/staging/accumulo/trunk/content/   (props changed)
    websites/staging/accumulo/trunk/content/release_notes/1.5.2.html

Propchange: websites/staging/accumulo/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Sep 19 21:10:53 2014
@@ -1 +1 @@
-1626327
+1626335

Modified: websites/staging/accumulo/trunk/content/release_notes/1.5.2.html
==============================================================================
--- websites/staging/accumulo/trunk/content/release_notes/1.5.2.html (original)
+++ websites/staging/accumulo/trunk/content/release_notes/1.5.2.html Fri Sep 19 
21:10:53 2014
@@ -204,16 +204,48 @@ to benefit from the improvements.</p>
 to the 1.5 line as development has already shifted towards the 1.6 line. For 
those
 who cannot or do not want to upgrade to 1.6, 1.5.2 is still an excellent choice
 over earlier versions in the 1.5 line.</p>
-<h2 id="notable-improvements">Notable Improvements</h2>
-<p>While new features are typically not added in a bug-fix release as 1.5.2, 
the
-community does create a variety of improvements that are API compatible. 
Contained
-here are some of the more notable improvements.</p>
-<h3 id="performance-improvements">Performance improvements</h3>
+<h2 id="performance-improvements">Performance Improvements</h2>
+<p>Apache Accumulo 1.5.2 includes a number of performance-related fixes over 
previous versions.</p>
+<h3 id="write-ahead-log-sync-performance">Write-Ahead Log sync performance</h3>
 <p>The Write-Ahead Log (WAL) files are used to ensure durability of updates 
made to Accumulo.
 A "sync" is called on the file in HDFS to make sure that the changes to the 
WAL are persisted
 to disk, which allows Accumulo to recover in the case of failure. <a 
href="https://issues.apache.org/jira/browse/ACCUMULO-2766">ACCUMULO-2766</a> 
fixed
 an issue where an operation against a WAL would unnecessarily wait for 
multiple syncs, slowing
 down the ingest on the system.</p>
+<h3 id="minor-compactions-not-aggressive-enough">Minor-Compactions not 
aggressive enough</h3>
+<p>On a system with ample memory provided to Accumulo, long hold times were 
observed, which
+block the ingest of new updates. Freeing server-side memory sooner by 
running minor
+compactions more frequently increased the overall throughput on the node. 
These changes
+were made in <a 
href="https://issues.apache.org/jira/browse/ACCUMULO-2905">ACCUMULO-2905</a>.</p>
+<h3 id="heapiterator-optimization">HeapIterator optimization</h3>
+<p>Iterators, a notable feature of Accumulo, are provided to users as a 
server-side programming
+construct, but are also used internally for numerous server operations. One of 
these system iterators 
+is the HeapIterator, which implements a PriorityQueue of other Iterators. One 
way this iterator is
+used is to merge multiple files in HDFS to present a single, sorted stream of 
Key-Value pairs. <a 
href="https://issues.apache.org/jira/browse/ACCUMULO-2827">ACCUMULO-2827</a>
+introduces an optimization to the HeapIterator that can improve 
its speed
+in common cases.</p>
+<h3 id="write-ahead-log-sync-implementation">Write-Ahead log sync 
implementation</h3>
+<p>In Hadoop-2, two implementations of "sync" are provided: hflush and hsync. 
Both of these
+methods provide a way to request that the datanodes write the data to the 
underlying
+medium and not just hold it in memory (the 'fsync' syscall). While both of 
these methods
+inform the Datanodes to sync the relevant block(s), hflush does not wait for 
acknowledgement
+from the Datanodes that the sync finished, whereas hsync does. To provide the 
most reliable system
+"out of the box", Accumulo defaults to hsync so that your data is as secure as 
possible in 
+a variety of situations (notably, unexpected power outages).</p>
+<p>The downside is that performance tends to suffer because waiting for a sync 
to disk is a very
+expensive operation. <a 
href="https://issues.apache.org/jira/browse/ACCUMULO-2842">ACCUMULO-2842</a> 
introduces a new system property, tserver.wal.sync.method,
+that lets users change the HDFS sync implementation from 'hsync' to 
'hflush'. Using 'hflush' instead
+of 'hsync' should result in about a 30% increase in ingest performance.</p>
+<p>For users upgrading from Hadoop-1 or Hadoop-0.20 releases, "hflush" is 
equivalent to how
+sync was implemented in those releases and should give equivalent performance.</p>
+<h3 id="server-side-mutation-queue-size">Server-side mutation queue size</h3>
+<p>When users desire writes to be as durable as possible, using 'hsync', the 
ingest performance
+of the system can be improved by increasing the tserver.mutation.queue.max 
property. The cost
+of this change is that it will cause TabletServers to use additional memory 
per writer. In 1.5.1,
+the value of this parameter defaulted to a conservative 256K, which resulted 
in sub-par ingest
+performance.</p>
+<p>In 1.5.2, <a 
href="https://issues.apache.org/jira/browse/ACCUMULO-3018">ACCUMULO-3018</a> 
increases this buffer to 1M, which has a noticeable impact on
+ingest performance with a minimal increase in TabletServer memory usage.</p>
 <h2 id="notable-bug-fixes">Notable Bug Fixes</h2>
 <h3 id="fixes-mapreduce-package-name-change">Fixes MapReduce package name 
change</h3>
 <p>1.5.1 inadvertently included a change to RangeInputSplit which created an 
incompatibility
@@ -240,6 +272,11 @@ never returns. Most of these are related
 <p>The Writable interface methods on the RangeInputSplit class accidentally 
omitted
 calls to serialize the IteratorSettings configured for the Job. <a 
href="https://issues.apache.org/jira/browse/ACCUMULO-2962">ACCUMULO-2962</a>
 fixes the serialization and adds some additional tests.</p>
+<h3 id="constraint-violation-causes-hung-scans">Constraint violation causes 
hung scans</h3>
+<p>A failed bulk import transaction could create an infinite 
retry
+loop due to a constraint violation. This directly prevents scans from 
completing
+and also hangs compactions. <a 
href="https://issues.apache.org/jira/browse/ACCUMULO-3096">ACCUMULO-3096</a> 
fixes the issue so that the
+constraint no longer hangs the entire system.</p>
 <h2 id="documentation">Documentation</h2>
 <p>The following documentation updates were made: </p>
 <ul>
