[VOTE] Release Apache Cassandra 1.0.9

2012-04-02 Thread Sylvain Lebresne
1.0.8 was released more than a month ago; we have made quite a few bug fixes
since and don't have any major outstanding issues open. I thus propose the
following artifacts for release as 1.0.9.

sha1: 4457839b9da623d9d4a090fa444614c35d39bb4c
Git: 
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/1.0.9-tentative
Artifacts: 
https://repository.apache.org/content/repositories/orgapachecassandra-001/org/apache/cassandra/apache-cassandra/1.0.9/
Staging repository:
https://repository.apache.org/content/repositories/orgapachecassandra-001/

The artifacts as well as the debian package are also available here:
http://people.apache.org/~slebresne/

The vote will be open for 72 hours (longer if needed).

[1]: http://goo.gl/CsEDg (CHANGES.txt)
[2]: http://goo.gl/4ByoR (NEWS.txt)


digest query: why relying on value?

2012-04-02 Thread Nicolas Romanetti
  Hello,

Why does the digest read response include a hash of the column value? Isn't
the timestamp sufficient?

Maybe an answer:
Is the value hash computed to cope with the (I presume rare) race condition
where two nodes end up with the same column name and the same column
timestamp but a different column value?
But then I wonder how to decide which value wins!

Sincerely,

Nicolas.


Re: digest query: why relying on value?

2012-04-02 Thread Jonathan Ellis
Look at Column.reconcile.


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: [VOTE] Release Apache Cassandra 1.0.9

2012-04-02 Thread Jonathan Ellis
+1



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: [VOTE] Release Apache Cassandra 1.0.9

2012-04-02 Thread Pavel Yaskevich
+1 

-- 
Pavel Yaskevich


Re: digest query: why relying on value?

2012-04-02 Thread Sylvain Lebresne
A digest query is about making one digest for many columns, not one
digest per column. If it were one digest per column, then yes, the
timestamp would be an option.
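
To illustrate, here is a minimal sketch of one digest computed over several
columns. `Col` and `DigestSketch.rowDigest` are hypothetical names made up for
this example, and the choice of MD5 and of the byte layout is an assumption,
not a description of Cassandra's actual digest code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical stand-in for a column; not Cassandra's real Column class.
final class Col {
    final String name;
    final String value;
    final long timestamp;

    Col(String name, String value, long timestamp) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }
}

final class DigestSketch {
    // One digest for the whole list: name, value and timestamp of every
    // column are fed into the same MessageDigest instance.
    static byte[] rowDigest(List<Col> columns) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (Col c : columns) {
                md.update(c.name.getBytes(StandardCharsets.UTF_8));
                md.update(c.value.getBytes(StandardCharsets.UTF_8));
                md.update(Long.toString(c.timestamp).getBytes(StandardCharsets.UTF_8));
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, two replicas that hold the same names and timestamps but a
different value for one column produce different digests, so the mismatch is
detected without comparing the columns one by one.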

--
Sylvain


Re: digest query: why relying on value?

2012-04-02 Thread Nicolas Romanetti
Spot on, thanks!

It would be interesting to have some metrics on how rare this case is:

// break ties by comparing values.
if (timestamp() == column.timestamp())
    return value().compareTo(column.value()) < 0 ? column : this;

If it is extremely rare, it might be more efficient not to hash the value
and to request it only when hitting such a case (OK, easy to say :-))
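
For reference, the tie-break rule quoted above can be sketched in isolation as
follows. This is a simplified illustration: `SimpleColumn` is a hypothetical
class, and tombstones and any other cases handled by the real
Column.reconcile are deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified column used only for this sketch.
final class SimpleColumn {
    final ByteBuffer value;
    final long timestamp;

    SimpleColumn(String value, long timestamp) {
        this.value = ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
        this.timestamp = timestamp;
    }

    // The higher timestamp wins; on equal timestamps the greater value wins,
    // so the same winner is picked whatever the order of the arguments.
    SimpleColumn reconcile(SimpleColumn other) {
        if (timestamp == other.timestamp) {
            return value.compareTo(other.value) < 0 ? other : this;
        }
        return timestamp < other.timestamp ? other : this;
    }

    String valueAsString() {
        // Decode a duplicate so the position of the stored buffer is untouched.
        return StandardCharsets.UTF_8.decode(value.duplicate()).toString();
    }
}
```

Because the comparison is deterministic, two nodes that see the same pair of
conflicting columns agree on the winner without any coordination.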





-- 
Nicolas Romanetti
06 18 65 03 89
twitter: @nromanetti
http://www.jaxio.com/
http://www.springfuse.com/


Re: ranges

2012-04-02 Thread Jonathan Ellis
Just List for the most part.  If there are exactly two, maybe
Pair.
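
A minimal sketch of the list-based representation, assuming a hypothetical
`ClosedRange` class with inclusive bounds (this is not a class from the
Cassandra code base, and the membership helper is only for illustration):

```java
import java.util.List;

// Hypothetical closed range [start, end]; made up for this example.
final class ClosedRange<T extends Comparable<T>> {
    final T start;
    final T end;

    ClosedRange(T start, T end) {
        this.start = start;
        this.end = end;
    }

    boolean contains(T point) {
        return start.compareTo(point) <= 0 && point.compareTo(end) <= 0;
    }
}

final class CompoundRange {
    // A compound range is just a list of simple ranges; a point belongs to
    // it if any member range contains the point.
    static <T extends Comparable<T>> boolean contains(List<ClosedRange<T>> ranges, T point) {
        for (ClosedRange<T> r : ranges) {
            if (r.contains(point)) {
                return true;
            }
        }
        return false;
    }
}
```

With this shape, [W, X] + [Y, Z] is simply a two-element list.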

On Mon, Apr 2, 2012 at 6:30 PM, Mark Dewey  wrote:
> Is there an object that is standard for specifying a compound range? (eg
> [W, X] + [Y, Z])
>
> Mark



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


kudos...

2012-04-02 Thread Brian O'Neill
I just wanted to let you guys know that I gave you a shout out...
http://brianoneill.blogspot.com/2012/04/cassandra-vs-couchdb-mongodb-riak-hbase.html

thanks for all the support,
brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


implementation choice with regard to multiple range slice query filters

2012-04-02 Thread David Alves
Hi guys

I'm a PhD student and I'm trying to dip my feet in the water with respect to
Cassandra development, as I'm a long-time fan.
I'm implementing CASSANDRA-3885, which is about supporting returning
multiple slices of a row.

After looking around at the portion of the code that is involved, two
implementation options come to mind, and I'd like to get feedback from you on
whichever you think might work best (or even whether I'm on the right track).

As a first approach I simply subclassed SliceQueryFilter (setting start
and finish to firstRange.start and lastRange.finish) and made the subclass not
return the elements in between the ranges (spinning to the first element of the
next range whenever the final element of the previous one was found). This
approach uses only one IndexedSliceReader, but it scans from firstRange.start to
lastRange.finish.
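
The first approach can be sketched roughly as follows. A `NavigableMap` stands
in for the sorted columns of a row, and `ScanRange` and `SingleScanSketch` are
hypothetical names; none of this is the actual SliceQueryFilter or
IndexedSliceReader code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical inclusive slice range over column names.
final class ScanRange {
    final String start;
    final String finish;

    ScanRange(String start, String finish) {
        this.start = start;
        this.finish = finish;
    }
}

final class SingleScanSketch {
    // Assumes ranges is non-empty, sorted by start and non-overlapping.
    // Walks every column from the first start to the last finish and drops
    // the ones that fall in the gaps between ranges.
    static List<String> slice(NavigableMap<String, String> row, List<ScanRange> ranges) {
        List<String> result = new ArrayList<String>();
        String first = ranges.get(0).start;
        String last = ranges.get(ranges.size() - 1).finish;
        int current = 0; // index of the range currently being matched
        for (String name : row.subMap(first, true, last, true).keySet()) {
            // Advance past ranges that end before this column.
            while (current < ranges.size() && name.compareTo(ranges.get(current).finish) > 0) {
                current++;
            }
            if (current == ranges.size()) {
                break;
            }
            if (name.compareTo(ranges.get(current).start) >= 0) {
                result.add(name);
            }
            // Otherwise the column sits in a gap: it was read but not returned.
        }
        return result;
    }
}
```

Every column between the first start and the last finish is visited, including
the ones in the gaps, which is the cost this approach pays for using a single
reader.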

Still, as I was finishing it occurred to me that in cases where the
filter selects only a small fraction of the row, i.e., the ranges are a sparse
selection of the total number of columns, I might be doing a full row scan for
nothing. So another option came to mind: an iterator of iterators, where I use
one IndexedSliceReader for each of the required slice ranges and simply iterate
through them.
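
The second approach, an iterator of iterators, might look like the sketch
below. Again a `NavigableMap` stands in for the row, with one `subMap` view per
range playing the role of one reader per range; `SeekRange` and
`MultiRangeIterator` are hypothetical names, not existing Cassandra classes.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.NoSuchElementException;

// Hypothetical inclusive slice range over column names.
final class SeekRange {
    final String start;
    final String finish;

    SeekRange(String start, String finish) {
        this.start = start;
        this.finish = finish;
    }
}

// Chains one iterator per range; each per-range iterator seeks directly to
// its own start, so columns in the gaps are never visited.
final class MultiRangeIterator implements Iterator<String> {
    private final NavigableMap<String, String> row;
    private final Iterator<SeekRange> ranges;
    private Iterator<String> current = Collections.<String>emptyIterator();

    MultiRangeIterator(NavigableMap<String, String> row, List<SeekRange> ranges) {
        this.row = row;
        this.ranges = ranges.iterator();
    }

    @Override
    public boolean hasNext() {
        // Open the next range's iterator until one has data or none are left.
        while (!current.hasNext() && ranges.hasNext()) {
            SeekRange r = ranges.next();
            current = row.subMap(r.start, true, r.finish, true).keySet().iterator();
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

The trade-off against the single-scan variant is one seek per range instead of
one pass over everything in between.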

Which do you think is the better option? Am I making any sense, or am I 
completely off track?

Any help would be greatly appreciated.

Cheers
David Ribeiro Alves




Re: kudos...

2012-04-02 Thread Jonathan Ellis
Good post.  Thanks, Brian!



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: implementation choice with regard to multiple range slice query filters

2012-04-02 Thread Jonathan Ellis
That would work, but I think the best approach would actually be to push
multiple ranges down into ISR (IndexedSliceReader) itself; otherwise you
could waste a lot of time reading the row header redundantly (the
skipBloomFilter/deserializeIndex part).

The tricky part would be getting IndexedBlockFetcher to not do extra
work in the case where the ranges' index blocks overlap -- in other
words, the best of both worlds, where we "skip ahead" at the end of one
range when the index says we can, but do a sequential scan when that is
more efficient.
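
One way to picture the block-level decision is the sketch below. It is only an
illustration under assumptions: `IndexBlock`, `NameRange` and
`BlockPlanSketch` are hypothetical names, the index is modelled as a sorted
list of (first, last) column names per block, and this is not the actual
IndexedBlockFetcher logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical index entry: first and last column name stored in one block.
final class IndexBlock {
    final String first;
    final String last;

    IndexBlock(String first, String last) {
        this.first = first;
        this.last = last;
    }
}

// Hypothetical inclusive range of column names.
final class NameRange {
    final String start;
    final String finish;

    NameRange(String start, String finish) {
        this.start = start;
        this.finish = finish;
    }
}

final class BlockPlanSketch {
    // Returns the indexes of the blocks to read, in order and without
    // duplicates. Assumes blocks and ranges are sorted and that ranges do
    // not overlap each other.
    static List<Integer> blocksToRead(List<IndexBlock> blocks, List<NameRange> ranges) {
        List<Integer> plan = new ArrayList<Integer>();
        int b = 0;
        for (NameRange r : ranges) {
            // Skip ahead over blocks that end before this range starts.
            while (b < blocks.size() && blocks.get(b).last.compareTo(r.start) < 0) {
                b++;
            }
            // Take every block overlapping the range, but never the same
            // block twice when two ranges share it.
            for (int i = b; i < blocks.size() && blocks.get(i).first.compareTo(r.finish) <= 0; i++) {
                if (plan.isEmpty() || plan.get(plan.size() - 1) < i) {
                    plan.add(i);
                }
            }
        }
        return plan;
    }
}
```

Consecutive indexes in the resulting plan correspond to a sequential scan, and
a jump between indexes corresponds to skipping ahead.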

(Here's where I admit that I've asked several people to implement 3885
as a technical interview problem for DataStax.  For the purposes of
that interview, this last part is optional.)



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com