Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-18 Thread Jonathan Ellis
I can't see us ever committing a dependency to a custom C++ library to the core, for the same reason that despite passionate advocacy we'll probably never have Scala code in the tree -- saying that potential contributors need to be familiar with Java *and* Language X to contribute to the core is ju

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-15 Thread Matt Stump
I spent some time this afternoon thinking about ways forward. I need to make progress regardless of whether or not my eventual work makes it into C*. In order to do so, I was thinking about creating an index management library and query engine in C++. Because of the nature of bitmap indexes it's ok

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-12 Thread Matt Stump
It looks like there is some interest so I'm going to disgorge everything I've learned/considered in the past couple weeks just so that we have a consistent base. I'm going to break down how the indexes work, different optimizations and drawbacks and try to address the points/questions that people h

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-12 Thread Edward Capriolo
I am not sure about the collection case. But for compact storage you can specify multiple ranges in a slice query. https://issues.apache.org/jira/browse/CASSANDRA-3885 I am not sure this will get you all the way to bitmap indexes but in a wide row scenario it seems like you could support a "even

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-12 Thread Jonathan Ellis
Something like this? SELECT * FROM users WHERE user_id IN (select user_id from events where type in (1, 2, 3)) AND user_id NOT IN (select user_id from events where type=4) This doesn't really look like a Cassandra query to me. More like a query for Hive (or Drill, or Impala). But, I know Sylv

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-12 Thread Jason Rutherglen
Brian, The Solr StatsComponent performs aggregations. http://wiki.apache.org/solr/StatsComponent I recommend using DataStax DSE Search... On Fri, Apr 12, 2013 at 10:09 AM, Brian O'Neill wrote: > @Jason, > > I have a lot of experience with SOLR + ES, but mainly for search. (i.e. > Finding the

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-12 Thread Brian O'Neill
@Jason, I have a lot of experience with SOLR + ES, but mainly for search. (i.e. Finding the most relevant records given a query) That's been working well, but now we have requirements to support dashboards. Those dashboards have aggregations in them (sum, average, count(s), etc). I have limited

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-11 Thread Matt Stump
You could embed Lucene, but then you pretty much have DSE search, and there are people on this list in a better position than I to describe the difficulty in making that scale. By rolling your own you get simplicity and control. If you use a uniform index size you can just assign chunks of it to th

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-11 Thread Jason Rutherglen
What's the advantage over Lucene? On Wed, Apr 10, 2013 at 10:43 PM, Matt Stump wrote: > Druid was our inspiration to layer bitmap indexes on top of Cassandra. > Druid doesn't work for us because our data set is too large. We would need > many hundreds of nodes just for the pre-processed data. Wh

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-10 Thread Jawed
The information shared in this discussion is quite informative for developers. I would like to go through this kind of discussion in the group. On Thu, Apr 11, 2013 at 9:14 AM, Brandon Williams wrote: > On Wed, Apr 10, 2013 at 9:50 PM, Carl Yeksigian > wrote: > > > This discussion is off topic for t

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-10 Thread Brandon Williams
On Wed, Apr 10, 2013 at 9:50 PM, Carl Yeksigian wrote: > This discussion is off topic for the dev list. If you want to continue it, > please move to user@. > I disagree entirely, this is absolutely dev-oriented. -Brandon

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-10 Thread Carl Yeksigian
This discussion is off topic for the dev list. If you want to continue it, please move to user@. Thanks, Carl On Wed, Apr 10, 2013 at 10:43 PM, Matt Stump wrote: > Druid was our inspiration to layer bitmap indexes on top of Cassandra. > Druid doesn't work for us because our data set is too larg

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-10 Thread Matt Stump
Druid was our inspiration to layer bitmap indexes on top of Cassandra. Druid doesn't work for us because our data set is too large. We would need many hundreds of nodes just for the pre-processed data. What I envisioned was the ability to perform Druid-style queries (no aggregation) without the limi

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-10 Thread Brian O'Neill
How does this compare with Druid? https://github.com/metamx/druid We're currently evaluating Acunu, Vertica and Druid... http://brianoneill.blogspot.com/2013/04/bianalytics-on-big-datacassandra.html With its bitmapped indexes, Druid appears to have the most potential. They boast some pretty im

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-10 Thread mrevilgnome
What do you think about set manipulation via indexes in Cassandra? I'm interested in answering queries such as: give me all users that performed events 1, 2, and 3, but not 4. If the answer is yes then I can make a case for spending my time on C*. The only downside for us would be our current prototy
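
For illustration, the kind of query described in this message (users with events 1, 2, and 3, but not 4) maps onto bitwise AND and AND-NOT over per-event bitmaps. The following is a minimal Python sketch, not code from the CASSANDRA-1472 patches. It assumes one bitmap per event type, with bit i set when user i performed that event, and it uses plain Python integers as uncompressed bitmaps. The function names `build_bitmaps` and `users_with_all_but_not` are hypothetical.

```python
def build_bitmaps(events):
    """Build one bitmap per event type from (user_id, event_type) pairs.

    Each bitmap is a Python int (an assumption made for this sketch);
    bit i is set when user i performed the event.
    """
    bitmaps = {}
    for user_id, event_type in events:
        bitmaps[event_type] = bitmaps.get(event_type, 0) | (1 << user_id)
    return bitmaps


def users_with_all_but_not(bitmaps, required, excluded):
    """Return user ids that performed every required event and no excluded one.

    `required` must be non-empty. Event types with no recorded rows are
    treated as empty bitmaps.
    """
    required = list(required)
    # Intersect (AND) the bitmaps of all required events.
    result = bitmaps.get(required[0], 0)
    for event_type in required[1:]:
        result &= bitmaps.get(event_type, 0)
    # Clear (AND-NOT) every user that performed an excluded event.
    for event_type in excluded:
        result &= ~bitmaps.get(event_type, 0)
    # Decode the set bits back into user ids, in ascending order.
    return [i for i in range(result.bit_length()) if (result >> i) & 1]


# Hypothetical sample data: (user_id, event_type)
events = [
    (0, 1), (0, 2), (0, 3),          # user 0: events 1, 2, 3
    (1, 1), (1, 2), (1, 3), (1, 4),  # user 1: events 1, 2, 3, 4
    (2, 1), (2, 2),                  # user 2: events 1, 2 only
]
matching = users_with_all_but_not(build_bitmaps(events), [1, 2, 3], [4])
```

With this sample data only user 0 qualifies: user 1 is removed by the AND-NOT on event 4, and user 2 drops out of the intersection because event 3 is missing. A real index would likely use compressed bitmaps instead of raw integers; that choice is outside the scope of this sketch.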

Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-10 Thread Jonathan Ellis
If you mean, "Can someone help me figure out how to get started updating these old patches to trunk and cleaning out the Avro?" then yes, I've been knee-deep in indexing code recently. On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome wrote: > I'm currently building a distributed cluster on top of