Re: Bitmap indexes - reviving CASSANDRA-1472
Druid was our inspiration to layer bitmap indexes on top of Cassandra. Druid doesn't work for us because our data set is too large; we would need many hundreds of nodes just for the pre-processed data. What I envisioned was the ability to perform Druid-style queries (no aggregation) without the limitations imposed by having the entire dataset in memory. I primarily need to query whether a user performed some event, but I also intend to add trigram indexes for LIKE, ILIKE or possibly regex-style matching.

I wasn't aware of CONCISE, thanks for the pointer. We are currently evaluating FastBit, which is a very similar project: https://sdm.lbl.gov/fastbit/

On Wed, Apr 10, 2013 at 5:49 PM, Brian O'Neill wrote:
>
> How does this compare with Druid?
> https://github.com/metamx/druid
>
> We're currently evaluating Acunu, Vertica and Druid...
>
> http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra.html
>
> With its bitmapped indexes, Druid appears to have the most potential.
> They boast some pretty impressive stats, especially WRT handling
> "real-time" updates and adding new dimensions.
>
> They also use a compression algorithm, CONCISE, to cut down on the space
> requirements.
> http://ricerca.mat.uniroma3.it/users/colanton/concise.html
>
> I haven't looked too deep into the Druid code, but I've been meaning to
> see if it could be backed by C*.
>
> We'd be game to join the hunt if you pursue such a beast. (with your code,
> or with portions of Druid)
>
> -brian
>
>
> On Apr 10, 2013, at 5:40 PM, mrevilgnome wrote:
>
> > What do you think about set manipulation via indexes in Cassandra? I'm
> > interested in answering queries such as give me all users that performed
> > event 1, 2, and 3, but not 4. If the answer is yes then I can make a case
> > for spending my time on C*. The only downside for us would be that our
> > current prototype is in C++, so we would lose some performance and the
> > ability to dedicate an entire machine to caching/performing queries.
> >
> >
> > On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis wrote:
> >
> >> If you mean, "Can someone help me figure out how to get started updating
> >> these old patches to trunk and cleaning out the Avro?" then yes, I've been
> >> knee-deep in indexing code recently.
> >>
> >>
> >> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome wrote:
> >>
> >>> I'm currently building a distributed cluster on top of Cassandra to
> >>> perform fast set manipulation via bitmap indexes. This gives me the
> >>> ability to perform unions, intersections, and set subtraction across
> >>> sub-queries. Currently I'm storing index information for thousands of
> >>> dimensions as Cassandra rows, and my cluster keeps this information
> >>> cached, distributed and replicated in order to answer queries.
> >>>
> >>> Every couple of days I think to myself this should really exist in C*.
> >>> Given all the benefits, would there be any interest in
> >>> reviving CASSANDRA-1472?
> >>>
> >>> Some downsides are that this is very memory intensive, even for sparse
> >>> bitmaps.
> >>>
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder, http://www.datastax.com
> >> @spyced
> >>
>
> --
> Brian ONeill
> Lead Architect, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024
> blog: http://weblogs.java.net/blog/boneill42/
> blog: http://brianoneill.blogspot.com/
>
>
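As an aside on the trigram idea above, here is a minimal sketch of how LIKE-style matching could sit on top of row bitmaps. The fixed bitset width, the trigram_index map and the function names are invented for illustration and aren't from any existing code:

    #include <bitset>
    #include <cstddef>
    #include <cstdint>
    #include <map>
    #include <set>
    #include <string>

    // Illustrative only: one bitmap of row ids per trigram. A real index would
    // use a compressed/segmented bitmap rather than a fixed-width bitset.
    using RowBitmap = std::bitset<1024>;
    std::map<std::string, RowBitmap> trigram_index;  // assumed structure

    std::set<std::string> trigrams(const std::string& s) {
        std::set<std::string> out;
        for (std::size_t i = 0; i + 3 <= s.size(); ++i) out.insert(s.substr(i, 3));
        return out;
    }

    void index_row(uint32_t row, const std::string& value) {
        for (const auto& t : trigrams(value)) trigram_index[t].set(row);
    }

    // Candidate rows for LIKE '%pattern%': intersect the bitmaps of the
    // pattern's trigrams. Patterns shorter than 3 characters can't narrow
    // anything, so every row stays a candidate.
    RowBitmap like_candidates(const std::string& pattern) {
        RowBitmap result;
        result.set();  // start with all rows
        for (const auto& t : trigrams(pattern)) {
            auto it = trigram_index.find(t);
            if (it == trigram_index.end()) return RowBitmap{};  // no match possible
            result &= it->second;
        }
        return result;
    }

Since trigram intersection can produce false positives, the surviving rows would still need to be checked against the stored values.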
Re: Bitmap indexes - reviving CASSANDRA-1472
You could embed Lucene, but then you pretty much have DSE Search, and there are people on this list in a better position than I am to describe the difficulty of making that scale. By rolling your own you get simplicity and control. If you use a uniform index size you can just assign chunks of it to the Cassandra ring, making it easy to distribute queries. I think that using Lucene in this way would cause most of the benefit of the library to be lost, and add unnecessary complexity. If Lucene were easy, then given the team's experience with both Lucene and C*, I think it would have been done already.

Sorry if it's a fuzzy answer, but I haven't run down every technical angle on the integration with C* yet. The idea is still very much at the "wouldn't it be very cool if this thing lived in Cassandra" stage. It would be the nail in the coffin for Impala, Redshift, et al.

On Thu, Apr 11, 2013 at 3:15 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:

> What's the advantage over Lucene?
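To make the "uniform index size" point concrete: with fixed-size segments, a global bit position maps to a (segment, offset) pair, and each segment can live under its own partition key so the partitioner spreads the index over the ring. A rough sketch; the segment size and key layout are assumptions, not anything agreed on in this thread:

    #include <cstdint>
    #include <string>

    // Illustrative only: with a uniform segment size, a global bit position maps
    // to (segment id, offset), and each segment can be stored under its own
    // partition key so the Cassandra partitioner spreads segments over the ring.
    constexpr uint64_t kSegmentBits = 8ULL * 1024 * 1024;  // assumed 1 MB segments

    struct SegmentAddress {
        uint64_t segment;  // which chunk of the index
        uint64_t offset;   // bit offset within that chunk
    };

    inline SegmentAddress locate(uint64_t bit_position) {
        return {bit_position / kSegmentBits, bit_position % kSegmentBits};
    }

    // Hypothetical partition key layout: "<index name>:<segment id>". Whatever
    // node owns the token of this key owns (and can evaluate) that segment.
    inline std::string segment_partition_key(const std::string& index_name,
                                             uint64_t segment) {
        return index_name + ":" + std::to_string(segment);
    }

Querying then becomes routing each clause's segments to the replicas that own those keys, evaluating locally, and merging the per-segment results.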
Re: Bitmap indexes - reviving CASSANDRA-1472
It looks like there is some interest, so I'm going to disgorge everything I've learned/considered in the past couple of weeks just so that we have a consistent base. I'm going to break down how the indexes work and the different optimizations and drawbacks, and try to address the points/questions that people have raised. I've broken it down by subject.

Basic Equality:
Bitmap indexes are essentially giant arrays of bits, with one bit per possibility. If you want to know which rows have boolean value "event1" set to true, then you set the bit for those rows to 1 in the index. For example, the index 0100100101 would mean that rows 1, 4, 7 and 9 contain "event1". If you want to know which rows contain event1 and event2, then you do a bitwise AND over the two indexes to get the set intersection, e.g. 00100101 AND 00001101 results in 00000101. From this you can build complex queries by doing simple set arithmetic. Each of these sets is called a query dimension.

Range Queries:
If you want to encode ranges, such as "give me all users who have a counter in the integer interval [0, 2]", then you need a two-dimensional bitmap index. The first dimension records which values between [0, 7] have been hit: 10010011. The second dimension records which rows contain each of those possible values. So for value 0 there would be another index, 00100010, which means that rows 2 and 6 contain value 0. This forms a giant two-dimensional array:

[1] [00100010]
[0] [00001010]
[0] [10100110]
[1] [00100010]
[0] [00101010]
[0] [01101010]
[0] [00000010]
[1] [00100110]
[1] [00000010]

The answer to "who has a counter value in [0, 2]" is the union of the sets [00100010], [00001010] and [10100110], which is [10101110].

Binning:
Each index has an address size which limits the total number of rows/values you can encode in an index. So if you have a 32-bit index you can encode a total of 4,294,967,295 positions. If you need to encode more values than is possible in that address space you can do two things: increase the address space or perform binning. Binning is essentially hash collision, meaning two or more values are assigned to the same bin. For example, if you wanted to index floats you could use the same index as above, but if you want to know the rows that contain the real numbers [0, 1.9] then you would need to check the actual value for the rows in the result set. This is called a candidate check, which is very expensive, often the most expensive part of a query.

Segments:
If you increase the address size of the index then you run into space/memory problems. A 32-bit index is 4 GB when fully materialized; a 64-bit index is 2 PB. To solve this problem you use segments/partitions or compression. Segments work through the use of a sparse table or interval list. You break the index up into chunks of equal size, say 2048 bits. If a bit gets flipped at position 4098, then you go to segment 0x02 and, if that segment exists, you flip the bit there. If the segment doesn't exist, you create it and set the bit. The advantage is that you only have to create segments that contain flipped bits. The downside is that if you have a wide distribution of flipped bits you end up with many segments containing only one or two set bits, and the rest is empty space or wasted memory.
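To make the mechanics above concrete, here is a minimal sketch of an uncompressed bitmap supporting the two query shapes just described: equality via AND, and a range via OR over the per-value row bitmaps. None of this is from the original prototype; the word-array layout and helper names are illustrative only:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Illustrative only: an uncompressed bitmap as a vector of 64-bit words,
    // one bit per row id.
    using Bitmap = std::vector<uint64_t>;

    void set_bit(Bitmap& bm, uint64_t row) {
        uint64_t word = row / 64;
        if (word >= bm.size()) bm.resize(word + 1, 0);
        bm[word] |= 1ULL << (row % 64);
    }

    Bitmap bitmap_and(const Bitmap& a, const Bitmap& b) {
        Bitmap out(std::min(a.size(), b.size()));
        for (size_t i = 0; i < out.size(); ++i) out[i] = a[i] & b[i];
        return out;
    }

    Bitmap bitmap_or(const Bitmap& a, const Bitmap& b) {
        Bitmap out(std::max(a.size(), b.size()), 0);
        for (size_t i = 0; i < a.size(); ++i) out[i] |= a[i];
        for (size_t i = 0; i < b.size(); ++i) out[i] |= b[i];
        return out;
    }

    // "users who did event1 AND event2": intersect the two event bitmaps.
    Bitmap did_both(const Bitmap& event1, const Bitmap& event2) {
        return bitmap_and(event1, event2);
    }

    // "counter value in [lo, hi]": union the row bitmaps of every value in the
    // range. per_value[v] is the bitmap of rows whose counter equals v.
    Bitmap in_range(const std::vector<Bitmap>& per_value, size_t lo, size_t hi) {
        Bitmap result;
        for (size_t v = lo; v <= hi && v < per_value.size(); ++v)
            result = bitmap_or(result, per_value[v]);
        return result;
    }

A multi-clause query like "event1 AND event2 AND NOT event4" is just more of the same word-at-a-time arithmetic, with NOT as a complement over the populated range.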
Compression:
The other approach is to use compression. The trick is that you need a compression algorithm that doesn't require decompression before doing the bitwise operations, which means you need a run-length encoding (RLE). There are three leading contenders for king of the hill: CONCISE, which is used by Druid; WAH, which is used by FastBit; and PWAH, which I don't think is used by anybody yet. They all work the same way, which is to store large blocks of zeros or ones as encoded run values. So you get an index like this one, taken from PWAH: [101 101011 11101100101 0010010], which means 11 blocks of 1-bits, a literal block of 15 bits, and 18 blocks of 0-bits. The downside to this approach is that you lose the ability to index directly to a position in the index. If you want to perform an update you've either got to read/decode to that position and split the index, or rebuild the entire index. This is why Druid and FastBit are very good for static data sets but can't deal with fast-changing data.

A hybrid approach to performance:
You can take a hybrid approach, which means combining segments and compression. You break the index address space up into segments and then perform compression within each of those segments. This negates the problem of many segments with only one bit flipped each but consuming the entire segment's worth of memory, and it also limits the cost of having to split the encoding for any compressed segment. As segments fill up you can compress and join them together using the standard SSTable or LevelDB approach.

Distributing work across the ring:
If you use segments and a uniform index size, you can assign segments of the index to different portions of the C* ring.
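A stripped-down illustration of why the RLE family allows bitwise work without decompressing: represent a bitmap as a list of (bit, run length) pairs and walk two lists in step. This is not the actual CONCISE/WAH/PWAH word format, just the underlying idea with made-up names:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Illustrative only: a bitmap as a list of runs (bit value, run length), in
    // the spirit of RLE schemes like WAH/CONCISE/PWAH but without their word
    // alignment. Runs are assumed to have nonzero length.
    struct Run {
        bool bit;
        uint64_t length;
    };
    using RleBitmap = std::vector<Run>;

    // AND two run-encoded bitmaps by walking both run lists directly, with no
    // decompression to a materialized bit array.
    RleBitmap rle_and(const RleBitmap& a, const RleBitmap& b) {
        RleBitmap out;
        size_t ia = 0, ib = 0;
        uint64_t ra = ia < a.size() ? a[ia].length : 0;  // bits left in current runs
        uint64_t rb = ib < b.size() ? b[ib].length : 0;
        while (ia < a.size() && ib < b.size()) {
            uint64_t step = std::min(ra, rb);      // advance both by the overlap
            bool bit = a[ia].bit && b[ib].bit;     // AND of the two run values
            if (!out.empty() && out.back().bit == bit)
                out.back().length += step;         // merge with the previous run
            else
                out.push_back({bit, step});
            ra -= step;
            rb -= step;
            if (ra == 0 && ++ia < a.size()) ra = a[ia].length;
            if (rb == 0 && ++ib < b.size()) rb = b[ib].length;
        }
        return out;
    }

The same walk gives OR or XOR by changing one line, and in the hybrid layout above it would run per segment rather than over the whole index.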
Re: Bitmap indexes - reviving CASSANDRA-1472
I spent some time this afternoon thinking about ways forward. I need to make progress regardless of whether or not my eventual work makes it into C*. In order to do so, I was thinking about creating an index management library and query engine in C++. Because of the nature of bitmap indexes, it's OK if the engine on any node spits out partial results with a list of indexes it needs in order to complete the query. It would be up to the caller to get information to the index library, and shuttle around partial bitmaps if necessary.

I like this plan for a couple of reasons. I get tight control over memory and the performance of C++. I can wrap the C++ library and make it accessible to C* via JNA. If the work doesn't make it into C* I can just wrap it in riak_core and still meet my goals. A library fills an empty ecosystem niche between full-blown databases and collections.

The unknown areas of risk in my mind are that I don't know enough about the implementation of indexes and the query planner in C* to know whether or not this is crazy. I've read through the old patches for the aborted bitmap index attempt and it looks pretty simple, but I lack context, which is hard to gather from old disembodied code. Are there any docs for the query planner or indexes other than the code and the bug DB?

I spent some time benchmarking over the weekend and my prototype is capable of answering 3-clause queries over fully materialized 4 GB uncompressed indexes (4B encoded values) in ~240 ms on my laptop. I'm fairly certain I'm running into memory bandwidth limitations. There might be one or two small tweaks still available to me, but they're all OS and hardware dependent.

I would love any feedback, even if it's to say "Matt, you're crazy, go home".

Thanks,
Matt Stump
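For anyone who wants to reproduce the flavor of that number, here is a back-of-the-envelope harness (not the prototype itself; the sizes and bit patterns are arbitrary) that times a 3-clause AND over fully materialized word arrays:

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Rough harness for the kind of measurement described above: time a
    // three-clause AND over fully materialized, uncompressed bitmaps. The size
    // here is deliberately small (256M bits per index); grow kBits toward 2^32
    // to approximate the 4B-position case, memory permitting.
    int main() {
        const uint64_t kBits  = 1ULL << 28;
        const uint64_t kWords = kBits / 64;

        std::vector<uint64_t> a(kWords, 0x0123456789abcdefULL);
        std::vector<uint64_t> b(kWords, 0xfedcba9876543210ULL);
        std::vector<uint64_t> c(kWords, 0x00ff00ff00ff00ffULL);
        std::vector<uint64_t> out(kWords);

        auto start = std::chrono::steady_clock::now();
        for (uint64_t i = 0; i < kWords; ++i) out[i] = a[i] & b[i] & c[i];
        auto stop = std::chrono::steady_clock::now();

        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
        std::printf("3-clause AND over %llu bits: %lld ms\n",
                    (unsigned long long)kBits, (long long)ms);
        return 0;
    }

Streaming three inputs and one output like this is exactly the kind of loop that ends up bound by memory bandwidth rather than ALU work.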
Add libcql to the wikis
Can someone with permissions please link to libcql on the wikis? I got yelled at by users who had trouble finding the library.

The URL for libcql: https://github.com/mstump/libcql

The relevant wiki pages I could identify:
http://wiki.apache.org/cassandra/ClientOptions
http://www.planetcassandra.org/Download/ViewDownloads/ClientDriver

Thanks,
Matt Stump