cassandra increment counters, Jira #1072

2010-08-12 Thread Robin Bowes
Hi Jonathan,

I'm contacting you in your capacity as project lead for the Cassandra
project. I am wondering how close ticket #1072 is to implementation [1].

We are about to do a proof of concept with cassandra to replace around
20 MySQL partitions (1 partition = 4 machines: master/slave in DC A,
master/slave in DC B).

We're essentially just counting web hits - around 10k/second at peak
times - so increment counters are pretty much essential functionality for us.

How close is the patch in #1072 to being acceptable? What is blocking it?

Thanks,

R.

[1] https://issues.apache.org/jira/browse/CASSANDRA-1072



Re: cassandra increment counters, Jira #1072

2010-08-12 Thread Kelvin Kakugawa
Hi Robin,

Johan and I have brought the code up to trunk.  It's ready to be
reviewed.  However, in Jonathan's defense, it does require separate
code paths, since we're aggregating commutative operations, not
updating a value.
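
To illustrate the distinction drawn above between aggregating commutative operations and updating a value, here is a minimal sketch in plain Python. It is not code from the #1072 patch; the function names are hypothetical, and arrival order stands in for timestamp order in the overwrite case. It only shows why increments can be merged in any order while overwrites cannot.

```python
from itertools import permutations


def apply_increments(deltas):
    """Fold a sequence of increments into a total; order does not matter."""
    total = 0
    for delta in deltas:
        total += delta
    return total


def apply_overwrites(writes):
    """Overwrite semantics: the last write in the sequence is the result."""
    value = None
    for value in writes:
        pass
    return value


deltas = [1, 5, -2]

# Every ordering of the same increments yields the same total.
increment_results = {apply_increments(p) for p in permutations(deltas)}

# Every ordering of the same overwrites can yield a different final value.
overwrite_results = {apply_overwrites(p) for p in permutations(deltas)}

print(increment_results)
print(overwrite_results)
```

Because the increments converge regardless of delivery order, replicas could merge them independently, which is presumably part of why a separate code path is needed.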

I think the underlying unanswered question is whether #1072 is a niche
feature or whether it should be brought into trunk.

-Kelvin



Re: cassandra increment counters, Jira #1072

2010-08-12 Thread Jesse McConnell
out of curiosity are you shooting for incrementing these counters 10k
times a second for sustained periods of time?

cheers,
jesse

--
jesse mcconnell
jesse.mcconn...@gmail.com





RE: [VOTE] 0.7.0-beta1

2010-08-12 Thread Stu Hood
+1
Artifacts look good!

-----Original Message-----
From: "Eric Evans" 
Sent: Tuesday, August 10, 2010 12:20pm
To: dev@cassandra.apache.org
Subject: [VOTE] 0.7.0-beta1


Today is the Cassandra Summit in San Francisco, the first ever.  As I
type this, Jonathan Ellis is at the podium delivering the keynote, and
the room is packed.  It's truly amazing how far this project has come in
such a short time, and it's great to see so many like-minded people
excited about Cassandra.

The work that's happened in trunk/ since 0.6 is just further evidence of
this progress[1], and now seems like the perfect time to expose it to a
wider audience.

I propose the following artifacts for release as 0.7.0 beta1.

SVN Tag:
https://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.7.0-beta1
0.7.0-beta1 artifacts: http://people.apache.org/~eevans

The vote will be open for 72 hours.

Thanks!

[1]:
https://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.7.0-beta1/CHANGES.txt

-- 
Eric Evans
eev...@rackspace.com





RE: cassandra increment counters, Jira #1072

2010-08-12 Thread Viktor Jevdokimov
We're also looking into increment counters with the same load. It will not be
for limited periods; it will be constant.

Viktor





Re: cassandra increment counters, Jira #1072

2010-08-12 Thread Ryan King
On Thu, Aug 12, 2010 at 11:21 AM, Jesse McConnell wrote:
> out of curiosity are you shooting for incrementing these counters 10k
> times a second for sustained periods of time?

Our use cases include hundreds of thousands of increments a second but most of the
values will only be incremented for a relatively short window of time.

This is for a real-time analytics system we're working on for both
business and technical analytics (like system monitoring).

We're hoping to open source this system at some point, but the
architecture is dependent on having distributed counters in cassandra.

-ryan


Re: cassandra increment counters, Jira #1072

2010-08-12 Thread Robin Bowes
On 12/08/10 19:21, Jesse McConnell wrote:
> out of curiosity are you shooting for incrementing these counters 10k
> times a second for sustained periods of time?

Jesse,

Our traffic pattern varies between 5.5k and 10k connections/hits per
second. We currently process the hits and log to MySQL (partitioned
DBs). We're looking into the possibility of using cassandra. I don't
think we'll be sending each hit to the DB individually, i.e. 10k hits/sec
won't correspond to 10k updates/sec, but I imagine the counter updates
will be fairly high volume. We'll bottom that out in our initial testing.
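
The idea above, that 10k hits/sec need not mean 10k updates/sec, can be sketched as client-side aggregation. This is a minimal illustration only: `HitBatcher`, `flush_every`, and the `send` callback are hypothetical names, and `send` stands in for whatever call would eventually write the aggregated counts to the database.

```python
from collections import Counter


class HitBatcher:
    """Accumulate hits in memory and emit one aggregated update per batch."""

    def __init__(self, flush_every, send):
        self.flush_every = flush_every  # hits to buffer before flushing
        self.send = send                # hypothetical callback taking {key: count}
        self.pending = Counter()
        self.seen = 0

    def record(self, key):
        self.pending[key] += 1
        self.seen += 1
        if self.seen >= self.flush_every:
            self.flush()

    def flush(self):
        # Skip the write entirely when nothing has been buffered.
        if self.pending:
            self.send(dict(self.pending))
        self.pending.clear()
        self.seen = 0


updates = []
batcher = HitBatcher(flush_every=1000, send=updates.append)
for i in range(10_000):
    batcher.record("page-%d" % (i % 3))
batcher.flush()  # push out any remainder

print(len(updates))
```

The trade-off in a design like this is that hits buffered in memory are lost if the process dies before a flush, so the batch size bounds both the write rate and the potential undercount.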

R.



Re: cassandra increment counters, Jira #1072

2010-08-12 Thread Benjamin Black
On Thu, Aug 12, 2010 at 10:23 AM, Kelvin Kakugawa  wrote:
>
> I think the underlying unanswered question is whether #1072 is a niche
> feature or whether it should be brought into trunk.
>

This should not be an unanswered question!  #1072 should be considered
essential, as it enables numerous use cases that currently require
bolting something like memcache or redis onto the side to handle
counters.

+1 on getting this into trunk ASAP.


b


Re: cassandra increment counters, Jira #1072

2010-08-12 Thread Colin Taylor
Would it help with prioritization if the silent majority chimed in to
say they are keen on this functionality, which is so key to large-scale
analytical apps? In which case:

+1

Although perhaps I should encourage people to sign up on Jira and vote there.

https://issues.apache.org/jira/secure/Signup!default.jspa
https://issues.apache.org/jira/browse/CASSANDRA-1072

[We intend to count various attributes of the 100 million documents
coming through our system each day]



Re: cassandra increment counters, Jira #1072

2010-08-12 Thread Dave Revell
For what it's worth, my team would very much like to see counters in trunk.

Right now we're trying to think of ways to implement counters by inserting
columns and counting the sizes of slices, and it seems difficult to do it
quickly and correctly at scale, even with low consistency.

-Dave



Embedded integration testing causing strange issues

2010-08-12 Thread Todd Nine
Hi all,
  We've downloaded and used Ran's embedded Cassandra helper from the
Hector client.  It works really well for performing basic integration
testing.  We have a more advanced test that uses TCP sockets and
threading.  We send about 400 pieces of data via TCP.  This results in
~1200 writes and ~800 reads in our embedded Cassandra over 15 seconds.
Occasionally, the test will fail because fewer than the 400 packets of
data were written, though the calling client code receives no errors
from Cassandra on the writes.  We see this behavior more often on
dual-processor systems than on quad-core systems.  If we change the tests
to use Cassandra running on its own, the issues seem to disappear
regardless of the system size.

This isn't a stress test, just a simple test with a very small input
set.  Is it possible to tweak any of the storage-conf.xml settings to
make the embedded test instance more reliable?  We don't need high
throughput, only reliability, to test our code and the results in simple
integration tests.

Version 0.6.3

Thanks,
Todd


Re: cassandra increment counters, Jira #1072

2010-08-12 Thread Jonathan Ellis
There are two concerns that give me pause.

The first is that 1072 is tackling a use case that Cassandra already
handles well: high volume of writes to a counter, with low volume
reads.  (This can be done by inserting uuids into a counter row, and
aggregating them either in the background or at read time or with some
combination of these.  The counter rows can be sharded if necessary.)
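
The approach described in the parenthesis above (one uuid column per increment, sharded counter rows, aggregation at read time) can be sketched with an in-memory stand-in for a column family. This is an illustration under assumptions, not Cassandra client code: the `rows` dict, the `name:shard` row-key scheme, `NUM_SHARDS`, and both function names are hypothetical.

```python
import random
import uuid

NUM_SHARDS = 4  # assumed shard count; would be tuned to write volume

# In-memory stand-in for a column family: row key -> {column name: value}
rows = {}


def record_hit(counter_name):
    """Insert one uuid-named column into a randomly chosen shard row."""
    shard = random.randrange(NUM_SHARDS)
    row_key = "%s:%d" % (counter_name, shard)
    rows.setdefault(row_key, {})[uuid.uuid4()] = b""


def read_count(counter_name):
    """Aggregate at read time by summing the column counts of every shard."""
    return sum(
        len(rows.get("%s:%d" % (counter_name, shard), {}))
        for shard in range(NUM_SHARDS)
    )


for _ in range(500):
    record_hit("home")

print(read_count("home"))
```

Each write here is a plain insert rather than a read-modify-write, so writes stay cheap, while reads have to touch every shard and count columns; that matches the high-write, low-read profile described above.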

The second is that the approach in 1072 resembles an entirely separate
system that happens to use part of Cassandra infrastructure -- the
thrift API, the MessagingService, the sstable format -- but isn't
really part of it.  ConsistencyLevel is not respected, and special
cases abound to weld things in that don't fit, e.g. the AES/Streaming
business.




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Embedded integration testing causing strange issues

2010-08-12 Thread Nathan McCall
Do you have a test case you can put up somewhere that triggers this? I
have not had this issue, but I don't think we test with any more than
100 rows at most for any given unit test.
