Hi all,

Lily is a data/content repository that integrates HBase with SOLR: flexible content storage and automatic index maintenance - at scale. It's available under the Apache license.

This release is the result of 3 months of hard work since Lily 0.2 last October. Our focus was stabilization, performance and robustness, providing a platform we can continue building upon. More than 50 tickets were resolved during this development sprint, and we're slowly readying ourselves for the 1.0 release.

Lily 0.3 brings many gradual improvements over Lily 0.2. It has a more solid implementation of the blob fields, automatic retry of operations that fail due to I/O exceptions (between Lily client and Lily server), and other miscellaneous improvements, all listed below.

Everything Lily can be found at www.lilyproject.org. We're now also sharing details of our commercial software subscription service with select prospects; let us know if you're interested!

Here's a concise list of improvements since Lily 0.2:

- Repository
  - Performance / space improvements
    - Shorter column key encoding (field IDs)
    - Reduction of the number of column families used
    - Avoid duplicate values in the table: make use of the sparseness of the table
    - Drop the use of HBase rowlocks, which do not survive region splits/moves.
    - Use byte[] as keys in the RecordType and FieldType cache
  - API
    - Added a new method createOrUpdate which creates or updates a record depending on whether it already exists. This new method has the advantage over the create method that it can be retried in case of IO exceptions, i.e. it is idempotent, similar to PUT in HTTP/REST.
    - Allow updating versioned-mutable fields without specifying the record type.
    - Throw a RecordLockedException instead of a generic exception when a record is locked. This allows Lily clients to retry the operation in that case.
  - Clear historical data when deleting a record and remove any referenced blobs.
  - The link index stores record IDs and field IDs as bytes instead of strings.
  - The record ID string representation was changed to use a comma instead of a semicolon to separate variant properties, since the use of semicolons was problematic in the JAX-RS based REST interface implementation.
  - Upgrade to Apache HBase 0.90
- Blobs
  - Rework blobstore functionality
  - Blobs can only be accessed through the record they are used in, not directly by using their blob key. This is to allow for future record-level access control.
  - Introduce a Repository.getBlob() method, which returns a BlobAccess object that provides access to the blob metadata (Blob object) and the blob input stream. This avoids the need to read the record in case you need the blob metadata.
  - Uploaded blobs which are never used in a record are cleaned up.
  - The HDFS-stored blobs are stored in a hierarchical structure.
- RowLog improvements
  - Performance improvements: the RowLog processor uses a ZooKeeper-based notification system instead of a Netty-based one.
  - Optimize queue scanning: avoid scanning over deleted rows in the table, fix too-frequent scanning, fix endless scanning loop on startup in case of no repository activity.
  - The RowLog processor only processes messages of a minimal age (to avoid conflicts with direct processing of WAL messages).
  - Extended RowLogConfigurationManager to add/update rowlog configuration information.
  - Avoid and remove stale messages in the queue.
  - Allow the rowlog to either use row-level locks (WAL use case) or executionstate-level locks per subscription (MQ use case) when processing messages.
  - Added a WAL processor which handles open WAL messages.
- REST interface
  - Adapted blob support to the new blobstore functionality. The Content-Length header is now set when downloading blobs. Multi-value or hierarchical blobs are now accessible.
  - Support updating versioned-mutable fields.
  - Fixed various smaller bugs reported by users.
- HBase index library
  - Allow adding/removing multiple entries in one call.
  - Performance
    - Fixed an important performance issue whereby row scanning always ran to the end of the index table.
    - Enable scan caching.
  - Added a performance testing tool.
- Indexer
  - Upgrade to Tika 0.8
  - Performance: avoid FieldNotFoundException when evaluating field values
  - Made the SOLR request-writer and response-parser implementation configurable. This allows using the XML format instead of the javabin format.
- LilyClient
  - Automatically retry operations on IOExceptions. This allows operations to survive node failures.
  - Automatic balancing over all Lily nodes. Each method called on the Repository object will automatically be performed on an arbitrarily selected Lily node.
  - Avro: switch from HTTP to Netty transport. For this, upgraded to an Avro 1.5 snapshot with patch AVRO-747.
- Tester tool
  - Allows configuring test scenarios as well as the indexer and SOLR configuration.
  - Has extended logging, metrics and metrics plotting (gnuplot integration) capabilities, allowing for performance evaluations.
  - Introduces a general performance testing library.
- Lily server process
  - Ability to create tables with multiple initial regions at first cluster startup (record table, linkindex, blobincubator, ...). Also allows setting the max file size and the memstore flush size.
  - The initial Lily startup can now be performed on multiple nodes concurrently. Previously this failed because the table creation code did not handle failures in case of concurrent table creation.
  - Configuration files changed so that they allow for inheritance (= fallback from one conf dir to another, to the built-in conf). Include default configuration in Kauri-module jars. All this will help in maintaining Lily configuration across Lily versions.

We hope you'll enjoy this new Lily as much as we enjoyed making it. Let us know how we're doing!

The Outerthought Lily team.

--
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily