Repository: accumulo
Updated Branches:
  refs/heads/gh-pages be06c7629 -> e70549671

Added sampling to release notes

Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/e7054967
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/e7054967
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/e7054967

Branch: refs/heads/gh-pages
Commit: e705496714f39f5bf1383710ba253adb695948d7
Parents: be06c76
Author: Keith Turner <ktur...@apache.org>
Authored: Tue Sep 6 11:18:07 2016 -0400
Committer: Keith Turner <ktur...@apache.org>
Committed: Tue Sep 6 11:18:07 2016 -0400

----------------------------------------------------------------------
 release_notes/1.8.0.md | 55 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 12 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/accumulo/blob/e7054967/release_notes/1.8.0.md
----------------------------------------------------------------------
diff --git a/release_notes/1.8.0.md b/release_notes/1.8.0.md
index b191dfb..6dfe2ad 100644
--- a/release_notes/1.8.0.md
+++ b/release_notes/1.8.0.md
@@ -47,8 +47,8 @@ default. Root tablet assignment can not be suspended. See [ACCUMULO-4353] for mo

 ### Run multiple Tablet Servers on one node

-[ACCUMULO-4328] introduces the capability of running multiple tservers on a single node. This intended for nodes with a large
-amount of memory. This feature is disabled by default. There are several related tickets: [ACCUMULO-4072], [ACCUMULO-4331]
+[ACCUMULO-4328] introduces the capability of running multiple tservers on a single node. This is intended for nodes with a large
+amounts of memory and/or disk. This feature is disabled by default. There are several related tickets: [ACCUMULO-4072], [ACCUMULO-4331]
 and [ACCUMULO-4406]. Note that when this is enabled, the names of the log files change.
 Previous log file names were defined in the generic_logger.xml as
 `${org.apache.accumulo.core.application}_{org.apache.accumulo.core.ip.localhost.hostname}.log`. The files will now include the instance id after the application with
@@ -60,11 +60,32 @@ names do not change if this feature is not used.

 ### Rate limiting Major Compactions

-Major Compactions can significantly increase the amount of load on TabletServers. [ACCUMULO-4187] take a cue from Apache
+Major Compactions can significantly increase the amount of load on TabletServers. [ACCUMULO-4187] takes a cue from Apache
 Cassandra and restricts the rate at which data is read and written when performing major compactions. This has a
 direct effect on the IO load caused by major compactions with a similar effect on the CPU utilization. This behavior
 is controlled by a new property `tserver.compaction.major.throughput` with a defaults of 0B which disables the rate limiting.

+### Sampling
+
+Queryable sample data was added by [ACCUMULO-3913]. This allows users to configure a pluggable
+function to generate sample data. At scan time, the sample data can optionally be scanned.
+Iterators also have access to sample data. Iterators can access all data and sample data, this
+allows an iterator to use sample data for query optimizations. The new user level RFile API
+supports writing RFiles with sample data for bulk import.
+
+A simple configurable sampler function is included with Accumulo. This sampler uses hashing and
+can be configured to use a subset of Key fields. For example if it was desired to have entire rows
+in the sample, then this sampler would be configured to hash+mod the row. Then when a row is
+selected for the sample, all of its columns and all of its updates will be in the sample data.
+Another scenario is one in which a document id is in the column qualifier. In this scenario, one
+would either want all data related to a document in the sample data or none. To achieve this, the
+sample could be configured to hash+mod on the column qualifier. See the sample [Readme
+example][sample] and javadocs on the new APIs for more information.
+
+For sampling to work, all tablets scanned must have pre-generated sample data that was generated in
+the same way. If this is not the case then scans will fail. For existing tables, samples can be
+generated by configuring sampling on the table and compacting the table.
+
 ### Upgrade to Apache Thrift 0.9.3

 Accumulo relies on Apache Thrift to implement remote procedure calls between Accumulo services.
@@ -74,7 +95,7 @@ on the changes to Thrift.
 ### Iterator Test Harness

 Users often write iterators without fully understanding its limits and lifetime. Previously, Accumulo did
-not provide any means in which a user could test iterators to catch common issues that only become apparant
+not provide any means in which a user could test iterators to catch common issues that only become apparent
 in multi-node production deployments. [ACCUMULO-626] provides a framework and a collection of initial tests
 which can be used to simulate common issues with Iterators that only appear in production deployments. This test
 harness can be used directly by users as a supplemental tool to unit tests and integration tests with MiniAccumuloCluster.
@@ -93,14 +114,18 @@ defaults out of the ephemeral range, we can guarantee that the Monitor and GC wi

 ## Other Notable Changes

- * [ACCUMULO-1055][ACCUMULO-1055] Configurable maximum file size for merging minor compactions
- * [ACCUMULO-1124][ACCUMULO-1124] Optimization of RFile index
- * [ACCUMULO-2883][ACCUMULO-2883] API to fetch current tablet assignments
- * [ACCUMULO-3871][ACCUMULO-3871] Support for running integration tests in MapReduce
- * [ACCUMULO-3920][ACCUMULO-3920] Deprecate the MockAccumulo class and remove usage in our tests
- * [ACCUMULO-4339][ACCUMULO-4339] Make hadoop-minicluster optional dependency of acccumulo-minicluster
- * [ACCUMULO-4354][ACCUMULO-4354] Bump dependency versions to include gson, jetty, and sl4j
- * [ACCUMULO-3735][ACCUMULO-3735] Bulk Import status page on the monitor
+ * [ACCUMULO-1055] Configurable maximum file size for merging minor compactions
+ * [ACCUMULO-1124] Optimization of RFile index
+ * [ACCUMULO-2883] API to fetch current tablet assignments
+ * [ACCUMULO-3871] Support for running integration tests in MapReduce
+ * [ACCUMULO-3920] Deprecate the MockAccumulo class and remove usage in our tests
+ * [ACCUMULO-4339] Make hadoop-minicluster optional dependency of acccumulo-minicluster
+ * [ACCUMULO-4318] BatchWriter, ConditionalWriter, and ScannerBase now extend AutoCloseable
+ * [ACCUMULO-4326] Value constructor now accepts Strings (and Charsequences)
+ * [ACCUMULO-4354] Bump dependency versions to include gson, jetty, and sl4j
+ * [ACCUMULO-3735] Bulk Import status page on the monitor
+ * [ACCUMULO-4066] Reduced time to processes conditional mutations.
+ * [ACCUMULO-4164] Reduced seek time for cached data.

 ## Testing

@@ -127,11 +152,16 @@ HDFS High-Availability instances, forcing NameNode failover.
 [ACCUMULO-3423]: https://issues.apache.org/jira/browse/ACCUMULO-3423
 [ACCUMULO-3735]: https://issues.apache.org/jira/browse/ACCUMULO-3735
 [ACCUMULO-3871]: https://issues.apache.org/jira/browse/ACCUMULO-3871
+[ACCUMULO-3913]: https://issues.apache.org/jira/browse/ACCUMULO-3913
 [ACCUMULO-3920]: https://issues.apache.org/jira/browse/ACCUMULO-3920
 [ACCUMULO-4072]: https://issues.apache.org/jira/browse/ACCUMULO-4072
 [ACCUMULO-4077]: https://issues.apache.org/jira/browse/ACCUMULO-4077
+[ACCUMULO-4066]: https://issues.apache.org/jira/browse/ACCUMULO-4066
+[ACCUMULO-4164]: https://issues.apache.org/jira/browse/ACCUMULO-4164
 [ACCUMULO-4165]: https://issues.apache.org/jira/browse/ACCUMULO-4165
 [ACCUMULO-4187]: https://issues.apache.org/jira/browse/ACCUMULO-4187
+[ACCUMULO-4318]: https://issues.apache.org/jira/browse/ACCUMULO-4318
+[ACCUMULO-4326]: https://issues.apache.org/jira/browse/ACCUMULO-4326
 [ACCUMULO-4328]: https://issues.apache.org/jira/browse/ACCUMULO-4328
 [ACCUMULO-4331]: https://issues.apache.org/jira/browse/ACCUMULO-4331
 [ACCUMULO-4339]: https://issues.apache.org/jira/browse/ACCUMULO-4339
@@ -144,4 +174,5 @@ HDFS High-Availability instances, forcing NameNode failover.
 [THRIFT-0.9.3-RN]: https://github.com/apache/thrift/blob/0.9.3/CHANGES
 [api]: https://github.com/apache/accumulo/blob/1.8/README.md#api
 [semver]: http://semver.org
+[sample]: http://accumulo.apache.org/1.8/examples/sample
 [ITER_TEST]: https://accumulo.apache.org/1.8/accumulo_user_manual.html#_iterator_testing
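
Note on the Sampling text added in this commit: the hash+mod selection it describes can be illustrated with a small standalone sketch. This is not Accumulo's sampler and it uses no Accumulo API. The `Entry` class, the `inSample` and `sample` helpers, the CRC32 hash, and the modulus of 4 are all assumptions chosen for illustration. The point it shows is that hashing only the row (or only the column qualifier) puts every entry that shares that field either wholly in the sample or wholly out of it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.zip.CRC32;

public class HashModSampleSketch {

    /** Hypothetical stand-in for one key/value entry; not an Accumulo class. */
    public static final class Entry {
        final String row;
        final String qualifier;
        final String value;

        public Entry(String row, String qualifier, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** True when the hash of the chosen field, taken mod the modulus, is zero. */
    public static boolean inSample(String field, int modulus) {
        CRC32 crc = new CRC32(); // assumption: any stable hash would do here
        crc.update(field.getBytes(StandardCharsets.UTF_8));
        return crc.getValue() % modulus == 0;
    }

    /** Keeps every entry whose selected field hashes into the sample. */
    public static List<Entry> sample(List<Entry> entries,
                                     Function<Entry, String> field,
                                     int modulus) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : entries) {
            if (inSample(field.apply(e), modulus)) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        for (int r = 0; r < 20; r++) {
            entries.add(new Entry("row" + r, "colA", "v1"));
            entries.add(new Entry("row" + r, "colB", "v2"));
        }
        // Hash on the row: a row is either entirely in the sample or entirely out.
        List<Entry> byRow = sample(entries, e -> e.row, 4);
        System.out.println("sampled entries: " + byRow.size());
    }
}
```

Passing `e -> e.qualifier` instead of `e -> e.row` gives the document-id scenario from the release notes, where all data for one qualifier is sampled together. Because membership depends only on the hashed field and the modulus, data sampled with different settings is not comparable, which is consistent with the requirement that all scanned tablets have sample data generated in the same way.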