Re: Bitmap indexes - reviving CASSANDRA-1472

Carl Yeksigian Wed, 10 Apr 2013 19:51:34 -0700

This discussion is off topic for the dev list. If you want to continue it,
please move to user@.


Thanks,
Carl


On Wed, Apr 10, 2013 at 10:43 PM, Matt Stump <mrevilgn...@gmail.com> wrote:

> Druid was our inspiration to layer bitmap indexes on top of Cassandra.
> Druid doesn't work for us because our data set is too large. We would need
> many hundreds of nodes just for the pre-processed data. What I envisioned
> was the ability to perform Druid-style queries (no aggregation) without the
> limitations imposed by having the entire dataset in memory. I primarily
> need to query whether a user performed some event, but I also intend to add
> trigram indexes for LIKE, ILIKE or possibly regex style matching.
>
> I wasn't aware of CONCISE, thanks for the pointer. We are currently
> evaluating fastbit, which is a very similar project:
> https://sdm.lbl.gov/fastbit/
>
>
> On Wed, Apr 10, 2013 at 5:49 PM, Brian O'Neill <b...@alumni.brown.edu
> >wrote:
>
> >
> > How does this compare with Druid?
> > https://github.com/metamx/druid
> >
> > We're currently evaluating Acunu, Vertica and Druid...
> >
> >
> http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra.html
> >
> > With its bitmapped indexes, Druid appears to have the most potential.
> > They boast some pretty impressive stats, especially WRT handling
> > "real-time" updates and adding new dimensions.
> >
> > They also use a compression algorithm, CONCISE, to cut down on the space
> > requirements.
> > http://ricerca.mat.uniroma3.it/users/colanton/concise.html
> >
> > I haven't looked too deep into the Druid code, but I've been meaning to
> > see if it could be backed by C*.
> >
> > We'd be game to join the hunt if you pursue such a beast. (with your code,
> > or with portions of Druid)
> >
> > -brian
> >
> >
> > On Apr 10, 2013, at 5:40 PM, mrevilgnome wrote:
> >
> > > What do you think about set manipulation via indexes in Cassandra? I'm
> > > interested in answering queries such as "give me all users that performed
> > > events 1, 2, and 3, but not 4." If the answer is yes, then I can make a case
> > > for spending my time on C*. The only downside for us would be that our current
> > > prototype is in C++, so we would lose some performance and the ability to
> > > dedicate an entire machine to caching/performing queries.
> > >
> > >
> > > On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis <jbel...@gmail.com>
> > wrote:
> > >
> > >> If you mean, "Can someone help me figure out how to get started updating
> > >> these old patches to trunk and cleaning out the Avro?" then yes, I've been
> > >> knee-deep in indexing code recently.
> > >>
> > >>
> > >> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome <mrevilgn...@gmail.com>
> > >> wrote:
> > >>
> > >>> I'm currently building a distributed cluster on top of Cassandra to perform
> > >>> fast set manipulation via bitmap indexes. This gives me the ability to
> > >>> perform unions, intersections, and set subtraction across sub-queries.
> > >>> Currently I'm storing index information for thousands of dimensions as
> > >>> Cassandra rows, and my cluster keeps this information cached, distributed
> > >>> and replicated in order to answer queries.
> > >>>
> > >>> Every couple of days I think to myself this should really exist in C*.
> > >>> Given all the benefits, would there be any interest in
> > >>> reviving CASSANDRA-1472?
> > >>>
> > >>> Some downsides are that this is very memory intensive, even for sparse
> > >>> bitmaps.
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Jonathan Ellis
> > >> Project Chair, Apache Cassandra
> > >> co-founder, http://www.datastax.com
> > >> @spyced
> > >>
> >
> > --
> > Brian ONeill
> > Lead Architect, Health Market Science (http://healthmarketscience.com)
> > mobile:215.588.6024
> > blog: http://weblogs.java.net/blog/boneill42/
> > blog: http://brianoneill.blogspot.com/
> >
> >
>
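
For readers following the thread, the query discussed above ("all users that
performed events 1, 2, and 3, but not 4") can be sketched with one bitmap per
event, where bit N is set if user N performed that event. This is only a
minimal illustration under stated assumptions: it uses plain Python integers
as uncompressed bitmaps, the names BitmapIndex, record, and performed_all_but
are hypothetical, and it is neither the C++ prototype mentioned in the thread
nor any Cassandra or Druid API.

```python
from collections import defaultdict


class BitmapIndex:
    """Hypothetical in-memory index: one uncompressed bitmap per event."""

    def __init__(self):
        # event id -> Python int used as a bitset (bit N set = user N did it)
        self._bitmaps = defaultdict(int)

    def record(self, event, user_id):
        """Mark that user_id performed the given event."""
        self._bitmaps[event] |= 1 << user_id

    def bitmap(self, event):
        """Return the bitmap for an event, or 0 if the event is unknown."""
        return self._bitmaps.get(event, 0)


def set_bits(bitmap):
    """Return the positions of all set bits, in ascending order."""
    positions = []
    pos = 0
    while bitmap:
        if bitmap & 1:
            positions.append(pos)
        bitmap >>= 1
        pos += 1
    return positions


def performed_all_but(index, required, excluded):
    """Users who performed every event in `required` and none in `excluded`."""
    acc = None
    for event in required:
        bits = index.bitmap(event)
        acc = bits if acc is None else acc & bits  # intersection
    if acc is None:
        acc = 0  # no required events: treat the result as empty
    for event in excluded:
        acc &= ~index.bitmap(event)  # set subtraction
    return set_bits(acc)


# Small made-up data set, for illustration only.
index = BitmapIndex()
for event, users in {1: [1, 2, 3, 5], 2: [1, 2, 5], 3: [1, 2, 5, 7], 4: [2]}.items():
    for user in users:
        index.record(event, user)

result = performed_all_but(index, required=[1, 2, 3], excluded=[4])
print(result)
```

Intersection maps to bitwise AND and subtraction to AND NOT, which is why
these queries are cheap once the bitmaps are in memory. A real deployment
would likely use a compressed bitmap format (the thread mentions CONCISE and
FastBit) rather than raw integers, given the memory concerns raised above.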
