Author: elserj
Date: Fri Sep 19 21:10:47 2014
New Revision: 1626335

URL: http://svn.apache.org/r1626335
Log:
More 1.5.2 release note additions

Modified:
    accumulo/site/trunk/content/release_notes/1.5.2.mdtext

Modified: accumulo/site/trunk/content/release_notes/1.5.2.mdtext
URL: http://svn.apache.org/viewvc/accumulo/site/trunk/content/release_notes/1.5.2.mdtext?rev=1626335&r1=1626334&r2=1626335&view=diff
==============================================================================
--- accumulo/site/trunk/content/release_notes/1.5.2.mdtext (original)
+++ accumulo/site/trunk/content/release_notes/1.5.2.mdtext Fri Sep 19 21:10:47 2014
@@ -31,14 +31,12 @@ who cannot or do not want to upgrade to
 over earlier versions in the 1.5 line.
 
-## Notable Improvements
+## Performance Improvements
 
-While new features are typically not added in a bug-fix release as 1.5.2, the
-community does create a variety of improvements that are API compatible. Contained
-here are some of the more notable improvements.
+Apache Accumulo 1.5.2 includes a number of performance-related fixes over previous versions.
 
-### Performance improvements
+### Write-Ahead Log sync performance
 
 The Write-Ahead Log (WAL) files are used to ensure durability of updates made to Accumulo.
 A "sync" is called on the file in HDFS to make sure that the changes to the WAL are persisted
@@ -46,6 +44,50 @@ to disk, which allows Accumulo to recove
 an issue where an operation against a WAL would unnecessarily wait for multiple syncs, slowing
 down the ingest on the system.
 
+### Minor-Compactions not aggressive enough
+
+On a system with ample memory provided to Accumulo, long hold-times were observed which
+blocks the ingest of new updates. Trying to free more server-side memory by running minor
+compactions more frequently increased the overall throughput on the node. These changes
+were made in [ACCUMULO-2905][10].
+
+### HeapIterator optimization
+
+Iterators, a notable feature of Accumulo, are provided to users as a server-side programming
+construct, but are also used internally for numerous server operations. One of these system iterator
+is the HeapIterator which implements a PriorityQueue of other Iterators. One way this iterator is
+used is to merge multiple files in HDFS to present a single, sorted stream of Key-Value pairs. [ACCUMULO-2827][11]
+introduces a performance optimization to the HeapIterator which can improve the speed of the
+HeapIterator in common cases.
+
+### Write-Ahead log sync implementation
+
+In Hadoop-2, two implementation of "sync" are provider: hflush and hsync. Both of these
+methods provide a way to request that the datanodes write the data to the underlying
+medium and not just hold it in memory (the 'fsync' syscall). While both of these methods
+inform the Datanodes to sync the relevant block(s), hflush does not wait for acknowledgement
+from the Datanodes that the sync finished, where hsync does. To provide the most reliable system
+"out of the box", Accumulo defaults to hsync so that your data is as secure as possible in
+a variety of situations (notably, unexpected power outages).
+
+The downside is that performance tends to suffer because waiting for a sync to disk is a very
+expensive operation. [ACCUMULO-2842][12] introduces a new system property, tserver.wal.sync.method,
+that lets users to change the HDFS sync implementation from 'hsync' to 'hflush'. Using 'hflush' instead
+of 'hsync' should result in about a 30% increase in ingest performance.
+
+For users upgrading from Hadoop-1 or Hadoop-0.20 releases, "hflush" is the equivalent of how
+sync was implemented and should give equivalent performance.
+
+### Server-side mutation queue size
+
+When users desire writes to be as durable as possible, using 'hsync', the ingest performance
+of the system can be improved by increasing the tserver.mutation.queue.max property. The cost
+of this change is that it will cause TabletServers to use additional memory per writer. In 1.5.1,
+the value of this parameter defaulted to a conservative 256K, which resulted in sub-par ingest
+performance.
+
+1.5.2 and [ACCUMULO-3018][13] increases this buffer to 1M which has a noticeable impact on
+ingest performance with a minimal increase in TabletServer memory usage.
 
 ## Notable Bug Fixes
 
@@ -84,6 +126,13 @@ The Writable interface methods on the Ra
 calls to serialize the IteratorSettings configured for the Job. [ACCUMULO-2962][8] fixes the
 serialization and adds some additional tests.
 
+### Constraint violation causes hung scans
+
+A failed bulk import transaction had the ability to create an infinitely retrying
+loop due to a constraint violation. This directly prevents scans from completing,
+but will also hang compactions. [ACCUMULO-3096][14] fixes the issue so that the
+constraint no longer hangs the entire system.
+
 ## Documentation
 
 The following documentation updates were made:
@@ -130,4 +179,9 @@ and, in HDFS High-Availability instances
 [6]: https://issues.apache.org/jira/browse/ACCUMULO-2985
 [7]: https://issues.apache.org/jira/browse/ACCUMULO-3055
 [8]: https://issues.apache.org/jira/browse/ACCUMULO-2962
-[9]: https://issues.apache.org/jira/browse/ACCUMULO-2766
\ No newline at end of file
+[9]: https://issues.apache.org/jira/browse/ACCUMULO-2766
+[10]: https://issues.apache.org/jira/browse/ACCUMULO-2905
+[11]: https://issues.apache.org/jira/browse/ACCUMULO-2827
+[12]: https://issues.apache.org/jira/browse/ACCUMULO-2842
+[13]: https://issues.apache.org/jira/browse/ACCUMULO-3018
+[14]: https://issues.apache.org/jira/browse/ACCUMULO-3096
\ No newline at end of file