Re: [VOTE] 0.7.1 (attempt #2)

2011-01-30 Thread Stephen Connolly
I'm getting

Bad Gateway

The proxy server received an invalid response from an upstream server.

From repository.apache.org.

So the Maven central artifacts will probably be staged tomorrow AM (as
my wife will kill me if I "waste" Sunday working on this! and she'd be
right too!) ;-)

-Stephen

On 28 January 2011 20:34, Stephen Connolly
 wrote:
> I'll drop and restage the artifacts for maven central when I get a chance
>
> - Stephen
>
> ---
> Sent from my Android phone, so random spelling mistakes, random nonsense
> words and other nonsense are a direct result of using swype to type on the
> screen
>
> On 28 Jan 2011 20:30, "Eric Evans"  wrote:
>>
>> CASSANDRA-2058[1] has landed in 0.7, so let's give this another shot. I
>> propose the following for release as 0.7.1.
>>
>> SVN:
>> https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7@r1064845
>> 0.7.1 artifacts: http://people.apache.org/~eevans
>>
>> The vote will be open for 72 hours.
>>
>>
>> [1]: https://issues.apache.org/jira/browse/CASSANDRA-2058
>> [2]: http://goo.gl/5Tafg (CHANGES.txt)
>> [3]: http://goo.gl/PkreZ (NEWS.txt)
>>
>> --
>> Eric Evans
>> eev...@rackspace.com
>>
>>
>>
>


Test Case Failure on Windows

2011-01-30 Thread indika kumara
Hi All,

I am experiencing a test case failure due to the following error (a Windows
file-deletion issue). Is there any solution?

java.io.IOException: Failed to delete
C:\Project\cassandra\build\test\cassandra\commitlog\CommitLog-1296382997375.log
    at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:51)
    at org.apache.cassandra.io.util.FileUtils.deleteRecursive(FileUtils.java:211)
    at org.apache.cassandra.io.util.FileUtils.deleteRecursive(FileUtils.java:207)
    at org.apache.cassandra.CleanupHelper.cleanup(CleanupHelper.java:55)
    at org.apache.cassandra.CleanupHelper.cleanupAndLeaveDirs(CleanupHelper.java:41)
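For context: on Windows, a file cannot be deleted while another handle or
memory mapping to it is still open, which is what typically bites the commit
log segment here. One common workaround is to retry the delete after nudging
the GC to release mapped buffers. A minimal, hypothetical helper sketch (the
class and its parameters are illustrative, not Cassandra code):

    import java.io.File;
    import java.io.IOException;

    // Retry-based delete for Windows, where open handles or memory
    // mappings block deletion until they are released.
    public final class RetryingDelete {
        public static void deleteWithRetry(File f, int attempts) throws IOException {
            for (int i = 0; i < attempts; i++) {
                if (!f.exists() || f.delete())
                    return;
                System.gc();  // encourage release of mapped byte buffers
                try {
                    Thread.sleep(100L * (i + 1));  // back off before retrying
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            throw new IOException("Failed to delete " + f.getAbsolutePath());
        }
    }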

Thanks,

Indika


Re: [VOTE] 0.7.1 (attempt #2)

2011-01-30 Thread Stu Hood
-0
Upgrading from 0.7.0 to these artifacts was fine, but the write-ONE/read-ALL
distributed test times out in an unexpected location, with no error messages
on the server. The test looks valid, but it is also failing in 0.8/trunk.

I'll try to bisect it tomorrow, from CASSANDRA-1964 (which passed
consistently) to the breakage.


On Sun, Jan 30, 2011 at 1:14 AM, Stephen Connolly <
stephen.alan.conno...@gmail.com> wrote:

> I'm getting
>
> Bad Gateway
>
> The proxy server received an invalid response from an upstream server.
>
> From repository.apache.org.
>
> So the Maven central artifacts will probably be staged tomorrow AM (as
> my wife will kill me if I "waste" Sunday working on this! and she'd be
> right too!) ;-)
>
> -Stephen


Simple Compression Idea

2011-01-30 Thread David G. Boney
I propose a simple idea for compression using a compressed string datatype. 

The compressed string datatype could be implemented for column family keys by
creating a compressed-string ordered partitioner: it decompresses the string
and then applies an ordinary ordered partitioner for strings to the result. A
hash-based partitioner could use the compressed string as-is, without any
modification. For column keys, the datatype could be implemented as a
compressed-string comparator, which decompresses the strings and then applies
a string comparator. The datatype could also be implemented for column values.

The compressed string would be an internal datatype for Cassandra: the value
would be converted back to a plain string before being returned to a client,
though I suppose you could also offer an option of passing the compressed form
back if the client wanted it that way.
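To make the comparator idea concrete, here is a minimal sketch, using DEFLATE
from java.util.zip as a stand-in for the adaptive arithmetic coder proposed
below; the class name and the plain Comparator interface are illustrative,
not Cassandra's actual comparator API:

    import java.io.ByteArrayOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.Comparator;
    import java.util.zip.DataFormatException;
    import java.util.zip.Inflater;

    // Compares compressed strings by decompressing both, then comparing as text.
    public final class CompressedStringComparator implements Comparator<byte[]> {
        public int compare(byte[] a, byte[] b) {
            return decompress(a).compareTo(decompress(b));
        }

        private static String decompress(byte[] compressed) {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            try {
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    if (n == 0) break;  // truncated or malformed input
                    out.write(buf, 0, n);
                }
            } catch (DataFormatException e) {
                throw new IllegalArgumentException("Not valid compressed data", e);
            } finally {
                inflater.end();
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }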

I propose using an adaptive arithmetic coding compressor. This type of
compression works a byte at a time and builds its model only from the string
being coded, so no pre-trained dictionary is needed. See the papers below,
and the model sketch after them.

Moffat, Alistair, Radford M. Neal, & Ian H. Witten, (1998), "Arithmetic Coding 
Revisited", ACM Trans. on Info. Systems, Vol. 16, No. 3, pp. 256-294.

Witten, Ian H., Radford M. Neal, & John G. Cleary, (1987), "Arithmetic Coding 
for Data Compression", Communications of the ACM, Vol. 30, No. 6, pp. 520-540.
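To illustrate the adaptive part, here is a minimal sketch of the order-0 byte
model such a coder consults (the class is hypothetical, for illustration
only): every byte starts with a count of one, and counts are bumped as bytes
are coded, so the model is built solely from the string being compressed.

    import java.util.Arrays;

    // Order-0 adaptive model: the probabilities an arithmetic coder would
    // use, learned one byte at a time from the string being compressed.
    public final class AdaptiveByteModel {
        private final int[] freq = new int[256];
        private int total = 256;

        public AdaptiveByteModel() {
            Arrays.fill(freq, 1);  // uniform prior: no training data needed
        }

        // Probability the coder assigns to byte b appearing next.
        public double probability(int b) {
            return (double) freq[b & 0xFF] / total;
        }

        // Call after coding each byte so the model adapts to the input.
        public void update(int b) {
            freq[b & 0xFF]++;
            total++;
        }
    }

Repeated bytes quickly become cheap to code: after ten occurrences of 'a',
probability('a') has grown from 1/256 to 11/266, and an arithmetic coder
spends about -log2(p) bits per byte.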

It has been reported that arithmetic-coding-based compression applied to text
can reach ratios of about 2.2 bits per character. Even assuming you only get
4 bits per character because the strings are short, that is still 50%
compression of text data (4 of 8 bits), including keys and column names. Many
applications would benefit, and it should speed up the overall operation of
Cassandra, because you would be moving significantly less data through the
system.

This would provide a compression option that could be implemented without any
redesign of Cassandra's internal structure, beyond a new partitioner class, a
new comparator class, a new datatype class, and the compression class itself.
-
Sincerely,
David G. Boney
dbon...@semanticartifacts.com
http://www.semanticartifacts.com






Re: Bringing a node back online after failure

2011-01-30 Thread Jonathan Ellis
I think we'd need a new operation type
(https://issues.apache.org/jira/browse/CASSANDRA-957) to go from "some
of the data gets streamed" to "all of the data gets streamed."  A node
that claims a token that is in the ring is assumed to actually have
that data and IMO trying to guess when to break that would be
error-prone -- better to have some explicit signal.

On Sun, Jan 30, 2011 at 1:38 AM, Chris Goffinet  wrote:
> I was looking over the Operations wiki, and with the many improvements in
> 0.7, I wanted to bring up a thought.
>
> The two options today for replacing a node that has lost all data are:
>
> (Recommended approach) Bring up the replacement node with a new IP address, 
> and AutoBootstrap set to true in storage-conf.xml. This will place the 
> replacement node in the cluster and find the appropriate position 
> automatically. Then the bootstrap process begins. While this process runs, 
> the node will not receive reads until finished. Once this process is finished 
> on the replacement node, run nodetool removetoken once, supplying the token 
> of the dead node, and nodetool cleanup on each node.
> (Alternative approach) Bring up a replacement node with the same IP and token 
> as the old, and run nodetool repair. Until the repair process is complete, 
> clients reading only from this node may get no data back. Using a higher 
> ConsistencyLevel on reads will avoid this.
>
> For a node that has had a drive failure but keeps the same IP address, what
> do you think about supplying the node's same token with AutoBootstrap set to
> true? This process works in trunk, but not all the data seems to be streamed
> over from its replicas. It would keep the node from taking reads until the
> replicas have streamed the SSTables over, and would eliminate the
> alternative approach of forcing higher consistency levels.
>
> -Chris
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Bringing a node back online after failure

2011-01-30 Thread Chris Goffinet
So you would be okay if I added -Dreplace_token as the explicit signal to do that?

-Chris
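For what it's worth, a minimal sketch of what reading such a flag at startup
might look like; the property name and the surrounding bootstrap logic are
illustrative only, not an actual Cassandra patch:

    // Hypothetical startup check for a replacement-node flag, e.g.
    //   java -Dreplace_token=85070591730234615865843651857942052864 ...
    public final class ReplaceTokenCheck {
        public static void main(String[] args) {
            String token = System.getProperty("replace_token");
            if (token != null) {
                // Claim the dead node's token, but stay out of the read path
                // until the replicas have streamed all data for its ranges.
                System.out.println("Bootstrapping as replacement for token " + token);
            } else {
                System.out.println("Normal startup: no replace_token supplied");
            }
        }
    }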

On Jan 30, 2011, at 9:55 PM, Jonathan Ellis wrote:

> I think we'd need a new operation type
> (https://issues.apache.org/jira/browse/CASSANDRA-957) to go from "some
> of the data gets streamed" to "all of the data gets streamed."  A node
> that claims a token that is in the ring is assumed to actually have
> that data and IMO trying to guess when to break that would be
> error-prone -- better to have some explicit signal.