This discussion is off topic for the dev list. If you want to continue it, please move to user@.
Thanks,
Carl

On Wed, Apr 10, 2013 at 10:43 PM, Matt Stump <mrevilgn...@gmail.com> wrote:

> Druid was our inspiration to layer bitmap indexes on top of Cassandra.
> Druid doesn't work for us because our data set is too large. We would need
> many hundreds of nodes just for the pre-processed data. What I envisioned
> was the ability to perform Druid-style queries (no aggregation) without the
> limitations imposed by having the entire dataset in memory. I primarily
> need to query whether a user performed some event, but I also intend to add
> trigram indexes for LIKE, ILIKE or possibly regex-style matching.
>
> I wasn't aware of CONCISE, thanks for the pointer. We are currently
> evaluating FastBit, which is a very similar project:
> https://sdm.lbl.gov/fastbit/
>
>
> On Wed, Apr 10, 2013 at 5:49 PM, Brian O'Neill <b...@alumni.brown.edu> wrote:
>
> > How does this compare with Druid?
> > https://github.com/metamx/druid
> >
> > We're currently evaluating Acunu, Vertica and Druid...
> > http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra.html
> >
> > With its bitmapped indexes, Druid appears to have the most potential.
> > They boast some pretty impressive stats, especially WRT handling
> > "real-time" updates and adding new dimensions.
> >
> > They also use a compression algorithm, CONCISE, to cut down on the space
> > requirements.
> > http://ricerca.mat.uniroma3.it/users/colanton/concise.html
> >
> > I haven't looked too deep into the Druid code, but I've been meaning to
> > see if it could be backed by C*.
> >
> > We'd be game to join the hunt if you pursue such a beast. (with your code,
> > or with portions of Druid)
> >
> > -brian
> >
> >
> > On Apr 10, 2013, at 5:40 PM, mrevilgnome wrote:
> >
> > > What do you think about set manipulation via indexes in Cassandra? I'm
> > > interested in answering queries such as give me all users that performed
> > > event 1, 2, and 3, but not 4. If the answer is yes then I can make a case
> > > for spending my time on C*. The only downside for us would be our current
> > > prototype is in C++ so we would lose some performance and the ability to
> > > dedicate an entire machine to caching/performing queries.
> > >
> > >
> > > On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> > >
> > >> If you mean, "Can someone help me figure out how to get started updating
> > >> these old patches to trunk and cleaning out the Avro?" then yes, I've been
> > >> knee-deep in indexing code recently.
> > >>
> > >>
> > >> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome <mrevilgn...@gmail.com>
> > >> wrote:
> > >>
> > >>> I'm currently building a distributed cluster on top of Cassandra to perform
> > >>> fast set manipulation via bitmap indexes. This gives me the ability to
> > >>> perform unions, intersections, and set subtraction across sub-queries.
> > >>> Currently I'm storing index information for thousands of dimensions as
> > >>> Cassandra rows, and my cluster keeps this information cached, distributed
> > >>> and replicated in order to answer queries.
> > >>>
> > >>> Every couple of days I think to myself this should really exist in C*.
> > >>> Given all the benefits, would there be any interest in
> > >>> reviving CASSANDRA-1472?
> > >>>
> > >>> Some downsides are that this is very memory intensive, even for sparse
> > >>> bitmaps.
> > >>>
> > >>
> > >>
> > >> --
> > >> Jonathan Ellis
> > >> Project Chair, Apache Cassandra
> > >> co-founder, http://www.datastax.com
> > >> @spyced
> > >>
> >
> > --
> > Brian ONeill
> > Lead Architect, Health Market Science (http://healthmarketscience.com)
> > mobile:215.588.6024
> > blog: http://weblogs.java.net/blog/boneill42/
> > blog: http://brianoneill.blogspot.com/
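
For readers following the thread: the query described above (all users that performed events 1, 2, and 3, but not 4) reduces to bitwise operations over one bitmap per event. The following is only a minimal Python sketch of that idea, not code from CASSANDRA-1472, Druid, FastBit, or the prototype discussed here. It assumes small non-negative integer user IDs and uses a plain, uncompressed Python int as each bitmap; the names build_index and users_matching are hypothetical.

```python
from functools import reduce


def build_index(events):
    """Build one bitmap per event id.

    `events` is an iterable of (user_id, event_id) pairs. Each bitmap is a
    Python int in which bit `user_id` is set when that user performed the event.
    """
    index = {}
    for user_id, event_id in events:
        index[event_id] = index.get(event_id, 0) | (1 << user_id)
    return index


def users_matching(index, required, excluded):
    """Return user ids that performed every `required` event and no `excluded` one."""
    if not required:
        raise ValueError("at least one required event is needed")
    # Intersection: a user must appear in every required bitmap.
    included = reduce(lambda a, b: a & b, (index.get(e, 0) for e in required))
    # Union of the excluded bitmaps, then set subtraction via AND NOT.
    banned = reduce(lambda a, b: a | b, (index.get(e, 0) for e in excluded), 0)
    result = included & ~banned
    # Decode the set bits back into user ids.
    return [u for u in range(result.bit_length()) if (result >> u) & 1]


if __name__ == "__main__":
    sample = [
        (0, 1), (0, 2), (0, 3),          # user 0: events 1, 2, 3
        (1, 1), (1, 2), (1, 3), (1, 4),  # user 1: events 1, 2, 3, 4
        (2, 1), (2, 2),                  # user 2: events 1, 2
        (3, 1), (3, 2), (3, 3),          # user 3: events 1, 2, 3
    ]
    print(users_matching(build_index(sample), required=[1, 2, 3], excluded=[4]))
```

An uncompressed bitmap like this grows with the largest user ID rather than with the number of set bits, which is one way to see the memory concern raised in the thread and why compressed schemes such as CONCISE came up.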