Re: How is Cassandra being used?

2011-11-16 Thread Zhu Han
On Wed, Nov 16, 2011 at 3:03 PM, Norman Maurer  wrote:

> 2011/11/16 Jonathan Ellis :
> > I started a "users survey" thread over on the users list (replies are
> > still trickling in), but as useful as that is, I'd like to get
> > feedback that is more quantitative and with a broader base.  This will
> > let us prioritize our development efforts to better address what
> > people are actually using it for, with less guesswork.  For instance:
> > we put a lot of effort into compression for 1.0.0; if it turned out
> > that only 1% of 1.0.x users actually enable compression, then it means
> > that we should spend less effort fine-tuning that moving forward, and
> > use the energy elsewhere.
> >
> > (Of course it could also mean that we did a terrible job getting the
> > word out about new features and explaining how to use them, but either
> > way, it would be good to know!)
> >
> > I propose adding a basic cluster reporting feature to cassandra.yaml,
> > enabled by default.  It would send anonymous information about your
> > cluster to an apache.org VM.  Information like, number (but not names)
> > of keyspaces and columnfamilies, ks-level options like compression, cf
> > options like compaction strategy, data types (again, not names) of
> > columns, average row size (or better: the histogram data), and average
> > sstables per read.
> >
> > Thoughts?
>

-1.

It may scare some admins who store sensitive data in Cassandra. Even if
it can be disabled, we cannot sleep well at night when we know the door
can be opened unintentionally...


> Hi there,
>
> I'm not a Cassandra dev but a user of it. I would really "hate" to
> see such code in the Cassandra code-base. I understand that it would
> be kind of useful to get a better feeling about usage etc, but it's
> really something that scares the shit out of many managers (and even
> devs ;) ).
>
> So -1 to adding this code (*non-binding)
>
> Bye,
> Norman
>


Re: How is Cassandra being used?

2011-11-16 Thread Peter Tillotson
I've read through the thread and have a few comments and an idea.

1) I can understand a preference for opt in
2) As a user I would have probably opted in every time I hit a performance issue
3) Opt in may well be skewed to poorer use cases or hardware issues
4) There is a trust gap that needs to be bridged before opt out is acceptable

Now for the idea: perhaps a report tool in nodetool that generates a
human-readable profile. In the short term, submission would be a manual
process; perhaps down the line it could be fully automated.

So basically there are two good plans in your email
1) Standard reporting  (+1)
2) Automated feedback (opt in +1)

 p
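
A minimal sketch of the "human-readable profile" idea, written in Python for brevity (a real nodetool command would live in Cassandra's Java code). The `render_profile` function and the layout of the report dict are assumptions made for illustration; no such nodetool command exists in the thread. The point is only that the operator can read exactly what would be submitted before sending anything by hand.

```python
def render_profile(report):
    """Render a (hypothetical) usage report dict as plain text for manual review."""
    lines = ["Cluster usage profile (review before submitting)"]
    for key in sorted(report):
        value = report[key]
        if isinstance(value, dict):
            # Nested counts, e.g. how many column families use each compressor.
            lines.append(f"{key}:")
            for name in sorted(value):
                lines.append(f"  {name}: {value[name]}")
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)


# Example input with made-up numbers; note that it holds counts, not names.
profile = render_profile(
    {"keyspaces": 2, "compression": {"none": 3, "SnappyCompressor": 1}}
)
print(profile)
```

The text could then be pasted into an email or a web form, which keeps the submission step fully under the operator's control.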



From: Jonathan Ellis 
To: dev 
Sent: Tuesday, 15 November 2011, 23:23
Subject: How is Cassandra being used?

I started a "users survey" thread over on the users list (replies are
still trickling in), but as useful as that is, I'd like to get
feedback that is more quantitative and with a broader base.  This will
let us prioritize our development efforts to better address what
people are actually using it for, with less guesswork.  For instance:
we put a lot of effort into compression for 1.0.0; if it turned out
that only 1% of 1.0.x users actually enable compression, then it means
that we should spend less effort fine-tuning that moving forward, and
use the energy elsewhere.

(Of course it could also mean that we did a terrible job getting the
word out about new features and explaining how to use them, but either
way, it would be good to know!)

I propose adding a basic cluster reporting feature to cassandra.yaml,
enabled by default.  It would send anonymous information about your
cluster to an apache.org VM.  Information like, number (but not names)
of keyspaces and columnfamilies, ks-level options like compression, cf
options like compaction strategy, data types (again, not names) of
columns, average row size (or better: the histogram data), and average
sstables per read.

Thoughts?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
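
As a rough illustration of the kind of payload proposed above, the following sketch (Python for brevity; Cassandra itself is Java) reduces a schema description to counts and option values while dropping every keyspace and column family name. The input layout, the `build_anonymous_report` function, and the report field names are all hypothetical; nothing here reflects an actual Cassandra API or an agreed-on format.

```python
from collections import Counter


def build_anonymous_report(schema):
    """Summarize a schema as counts and option values, never names.

    `schema` is a hypothetical dict of the form
    keyspace name -> column family name -> options, where options holds
    "compression" (str or None), "compaction_strategy" (str) and
    "column_types" (list of str).
    """
    column_families = [cf for ks in schema.values() for cf in ks.values()]
    return {
        "keyspaces": len(schema),
        "column_families": len(column_families),
        "compression": dict(
            Counter(cf["compression"] or "none" for cf in column_families)
        ),
        "compaction_strategy": dict(
            Counter(cf["compaction_strategy"] for cf in column_families)
        ),
        "column_types": dict(
            Counter(t for cf in column_families for t in cf["column_types"])
        ),
    }


# Made-up schema, used only to exercise the function.
example_schema = {
    "billing": {
        "invoices": {
            "compression": "SnappyCompressor",
            "compaction_strategy": "SizeTieredCompactionStrategy",
            "column_types": ["UTF8Type", "LongType"],
        },
        "customers": {
            "compression": None,
            "compaction_strategy": "LeveledCompactionStrategy",
            "column_types": ["UTF8Type"],
        },
    },
}
report = build_anonymous_report(example_schema)
```

Row-size histograms and sstables-per-read would need live metrics rather than schema data, so they are left out of this sketch.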

Re: How is Cassandra being used?

2011-11-16 Thread Dave Brosius

On 11/15/2011 09:01 PM, Jonathan Ellis wrote:
> On Tue, Nov 15, 2011 at 7:02 PM, Eric Evans  wrote:
>> I think this is potentially quite dangerous; There are a lot of people
>> who get very twitchy at the idea of software that Phones Home.  I've
>> seen this so many times, and in all cases it was for software a lot
>> less sensitive than a database.
>
> True, but unlike most Home Phoners, ours will be out there in the open
> and you can see exactly what it's sending (or not, if you disable it).
> I'm sure there are other examples in the wild of this, but the only one
> I can think of is popcon [1].
>
> More broadly, my sense is that people are getting used to the idea
> that it's okay to give away anonymous statistics as part of the price
> of "free," although YMMclearlyV. I am, after all, a Windows user. :)
>
>> I'm sure you've already considered this though, you're already talking
>> about anonymity, and transparency, and what I assume is neutrality of
>> the collection endpoint (can apache actually provide a VM; is that a
>> thing?).
>
> Yes, they provide Ubuntu or FreeBSD VMs.
>
>> I'm just afraid that we'll scare people off before they can
>> be properly convinced that it's all on the up-and-up.
>
> How would you propose addressing this?
>
>> I'm curious to see what others think, but at the moment I'm hovering
>> somewhere around a -0 if it were opt-in (off by default).
>
> I'm okay with opt-in if you think that's useful as a first step to
> ease the twitchiness you mention, but longer term I think it's only
> really useful if it's on by default. There's a lot of research that
> shows that people tend to stick with whatever is the path of least
> resistance [2], and specifically, my experience with Cassandra users
> is exactly that -- one reason we've spent so much effort getting
> defaults so good is because almost nobody goes beyond that.
>
> [1] http://popcon.debian.org/
> [2] http://www.richmondfed.org/publications/research/region_focus/2007/winter/pdf/feature2.pdf



+1 for opt-in, although perhaps you should think about how to increase
opt-in rates. I'm afraid having it buried in the yaml will keep folks
from enabling it, even if they were OK with doing so (if they knew it
existed). Perhaps prompting (once) on server startup? Not sure.





Re: How is Cassandra being used?

2011-11-16 Thread Dave Brosius
+1 for an opt-in approach. To get better opt-in rates perhaps prompt for it on 
start (once) rather than hope folks find it buried in the yaml

Eric Evans  wrote:

>On Tue, Nov 15, 2011 at 11:23 PM, Jonathan Ellis  wrote:
>> I started a "users survey" thread over on the users list (replies are
>> still trickling in), but as useful as that is, I'd like to get
>> feedback that is more quantitative and with a broader base.  This will
>> let us prioritize our development efforts to better address what
>> people are actually using it for, with less guesswork.  For instance:
>> we put a lot of effort into compression for 1.0.0; if it turned out
>> that only 1% of 1.0.x users actually enable compression, then it means
>> that we should spend less effort fine-tuning that moving forward, and
>> use the energy elsewhere.
>>
>> (Of course it could also mean that we did a terrible job getting the
>> word out about new features and explaining how to use them, but either
>> way, it would be good to know!)
>>
>> I propose adding a basic cluster reporting feature to cassandra.yaml,
>> enabled by default.  It would send anonymous information about your
>> cluster to an apache.org VM.  Information like, number (but not names)
>> of keyspaces and columnfamilies, ks-level options like compression, cf
>> options like compaction strategy, data types (again, not names) of
>> columns, average row size (or better: the histogram data), and average
>> sstables per read.
>>
>> Thoughts?
>
>I think this is potentially quite dangerous; There are a lot of people
>who get very twitchy at the idea of software that Phones Home.  I've
>seen this so many times, and in all cases it was for software a lot
>less sensitive than a database.
>
>I'm sure you've already considered this though, you're already talking
>about anonymity, and transparency, and what I assume is neutrality of
>the collection endpoint (can apache actually provide a VM; is that a
>thing?).  I'm just afraid that we'll scare people off before they can
>be properly convinced that it's all on the up-and-up.
>
>I'm curious to see what others think, but at the moment I'm hovering
>somewhere around a -0 if it were opt-in (off by default).
>
>-- 
>Eric Evans
>Acunu | http://www.acunu.com | @acunu


Re: How is Cassandra being used?

2011-11-16 Thread Roland Gude
I think it is a very good idea to gather such information, to make it easy
for the users who want to provide it (or don't care), and to take the
"twitchiness" into account as well.
How about putting the reporting code in a separate module/jar, and reporting
statistics if the jar is present and not if it is absent (similar to how
native code is used if JNA is present)?
Then one could provide two binary archives on the homepage: one with the jar
and one without. That way people who do not want the code simply select the
distribution where it is not included, or delete the jar.
You could (and should) even stick an "Enable/Disable" switch on top of it.
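
A minimal sketch of that "report only if the module is there" check, in Python for brevity. In Cassandra this would be a classpath lookup in Java, along the lines of the JNA detection mentioned above; here `importlib.util.find_spec` plays that role. The function name, the module name in the usage lines, and the config flag are all hypothetical.

```python
import importlib.util


def reporting_active(module_name, enabled_in_config):
    """Report only if the optional module is installed AND the switch is on.

    `module_name` is a top-level module name; find_spec returns None when
    no such module can be found, which mirrors "the jar is not there".
    """
    module_present = importlib.util.find_spec(module_name) is not None
    return module_present and enabled_in_config


# "json" stands in for an installed reporting module; the second name is a
# made-up module that is assumed not to exist on the machine.
print(reporting_active("json", True))
print(reporting_active("cassandra_usage_reporter_hypothetical", True))
```

With this shape, deleting the module and flipping the switch are two independent ways to turn reporting off, so either one is sufficient.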


-----Original Message-----
From: Dave Brosius [mailto:dbros...@baybroadband.net]
Sent: Wednesday, 16 November 2011 02:25
To: dev@cassandra.apache.org
Subject: Re: How is Cassandra being used?

+1 for an opt-in approach. To get better opt-in rates perhaps prompt for it on 
start (once) rather than hope folks find it buried in the yaml

Eric Evans  wrote:

>On Tue, Nov 15, 2011 at 11:23 PM, Jonathan Ellis  wrote:
>> I started a "users survey" thread over on the users list (replies are
>> still trickling in), but as useful as that is, I'd like to get
>> feedback that is more quantitative and with a broader base.  This will
>> let us prioritize our development efforts to better address what
>> people are actually using it for, with less guesswork.  For instance:
>> we put a lot of effort into compression for 1.0.0; if it turned out
>> that only 1% of 1.0.x users actually enable compression, then it means
>> that we should spend less effort fine-tuning that moving forward, and
>> use the energy elsewhere.
>>
>> (Of course it could also mean that we did a terrible job getting the
>> word out about new features and explaining how to use them, but either
>> way, it would be good to know!)
>>
>> I propose adding a basic cluster reporting feature to cassandra.yaml,
>> enabled by default.  It would send anonymous information about your
>> cluster to an apache.org VM.  Information like, number (but not names)
>> of keyspaces and columnfamilies, ks-level options like compression, cf
>> options like compaction strategy, data types (again, not names) of
>> columns, average row size (or better: the histogram data), and average
>> sstables per read.
>>
>> Thoughts?
>
>I think this is potentially quite dangerous; There are a lot of people
>who get very twitchy at the idea of software that Phones Home.  I've
>seen this so many times, and in all cases it was for software a lot
>less sensitive than a database.
>
>I'm sure you've already considered this though, you're already talking
>about anonymity, and transparency, and what I assume is neutrality of
>the collection endpoint (can apache actually provide a VM; is that a
>thing?).  I'm just afraid that we'll scare people off before they can
>be properly convinced that it's all on the up-and-up.
>
>I'm curious to see what others think, but at the moment I'm hovering
>somewhere around a -0 if it were opt-in (off by default).
>
>-- 
>Eric Evans
>Acunu | http://www.acunu.com | @acunu


Re: How is Cassandra being used?

2011-11-16 Thread Eric Evans
On Wed, Nov 16, 2011 at 2:01 AM, Jonathan Ellis  wrote:
> On Tue, Nov 15, 2011 at 7:02 PM, Eric Evans  wrote:
>> I think this is potentially quite dangerous; There are a lot of people
>> who get very twitchy at the idea of software that Phones Home.  I've
>> seen this so many times, and in all cases it was for software a lot
>> less sensitive than a database.
>
> True, but unlike most Home Phoners, ours will be out there in the open
> and you can see exactly what it's sending (or not, if you disable it).
>  I'm sure there's other examples in the wild of this, but the only one
> I can think of is popcon [1].

I don't think the transparency of the implementation changes things
much.  It's still going to be opaque to a lot of folks, and more
important is the precedent it sets and the way it changes the
project/user trust relationship.

Even if you're satisfied with the implementation, and trust that it
won't be extended to transmit additional data later (unintentionally
or otherwise), there are still very valid privacy concerns.  For
example, seeing as how this must be transmitted over an IP network,
there are only so many guarantees you can make with respect to
anonymity.  There will always be *someone* that can tie the data to a
unique IP, and an IP can almost always be tied to an individual or
organization.  Imagine an organization that doesn't want *anyone* to
know it uses Cassandra, and isn't willing to accept the risk that one
of their admins might accidentally enable this reporting.

It's also interesting that you mention popcon because it has always
been contentious.  It's taken years for it to transition from the
point where it required users to install it themselves, to a prompt at
install-time that defaulted to "No", to the current state of an
install-time prompt that defaults to "Yes".  And, the installer asks
*very* few questions; Whether or not popcon is enabled is on par with
partitioning and the assignment of a root password.

Also, there should be no shame in the admission that we haven't earned
anywhere near the level of trust and respect that the Debian project
has.

> More broadly, my sense is that people are getting used to the idea
> that it's okay to give away anonymous statistics as part of the price
> of "free," although YMMclearlyV. I am, after all, a Windows user. :)

As privacy becomes more threatened people are either capitulating, or
becoming even more defensive; Whether that makes it better or worse
for us if we do this is debatable.

>> I'm sure you've already considered this though, you're already talking
>> about anonymity, and transparency, and what I assume is neutrality of
>> the collection endpoint (can apache actually provide a VM; is that a
>> thing?).
>
> Yes, they provide Ubuntu or FreeBSD VMs.
>
>> I'm just afraid that we'll scare people off before they can
>> be properly convinced that it's all on the up-and-up.
>
> How would you propose addressing this?

Honestly?  The best way to convince people that we take the privacy of
their data seriously is to not transmit any of it to a machine outside
their control.

>> I'm curious to see what others think, but at the moment I'm hovering
>> somewhere around a -0 if it were opt-in (off by default).
>
> I'm okay with opt-in if you think that's useful as a first step to
> ease the twitchiness you mention, but longer term I think it's only
> really useful if it's on by default. There's a lot of research that
> shows that people tend to stick with whatever is the path of least
> resistance [2], and specifically, my experience with Cassandra users
> is exactly that -- one reason we've spent so much effort getting
> defaults so good is because almost nobody goes beyond that.

It's even worse than that.  It's not just that you'll be receiving
less data, it will also be less meaningful (since it's from a
self-selecting group).
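
The self-selection problem can be made concrete with a little arithmetic. The sketch below uses entirely made-up numbers and a hypothetical helper, `observed_share`; it assumes only that operators who use a feature may opt in to reporting at a different rate from those who do not.

```python
def observed_share(true_share, opt_in_rate_with, opt_in_rate_without):
    """Share of *reporting* clusters that have a feature enabled.

    true_share:          fraction of all clusters with the feature enabled
    opt_in_rate_with:    opt-in rate among clusters that have it enabled
    opt_in_rate_without: opt-in rate among clusters that do not
    """
    reporting_with = true_share * opt_in_rate_with
    reporting_without = (1 - true_share) * opt_in_rate_without
    return reporting_with / (reporting_with + reporting_without)


# Made-up numbers: 20% of clusters really use compression, but those
# operators are four times as likely to opt in (40% versus 10%).
estimate = observed_share(0.20, 0.40, 0.10)
print(round(estimate, 2))  # about 0.5, far from the true 0.2
```

Only when both groups opt in at the same rate does the reported share match the true one, and that is exactly what cannot be verified from the reports alone.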

> [1] http://popcon.debian.org/
> [2] 
> http://www.richmondfed.org/publications/research/region_focus/2007/winter/pdf/feature2.pdf
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: How is Cassandra being used?

2011-11-16 Thread Brian O'Neill
Lively thread...

+1 opt-in
+1 in separate module

I'll just substantiate Rick Shaw's comments.  If this is on by default, I
can see it making its way into production at a large corporation, at which
time the traffic would sound an alarm as suspicious activity, which would
immediately get the server's plug pulled and trigger an investigation.
 That would land the architect responsible for deploying that server in the
proverbial principal's office.  In the extreme case, that might
"black-list" the technology and add fuel to any debate that the corporation
should just stick with the 'proven enterprise' solutions.  That is not my
perspective, just be aware that in some large corporations it is an uphill
battle to deploy Cassandra  in the first place given incumbent systems.

In every situation I've been in, even outside of large corporations, we
would need to disable this feature given the sensitivity of the data.

All that said... I would love to see this data. ;)
I'd love to know where our deployment lies on the spectrum of use.

Maybe a good old fashioned web form that allows companies to submit their
usage scenarios might accomplish the same goal? (and you could get
additional context information about the industry, etc.)  It wouldn't be
comprehensive, but it may be sufficiently representative.  Maybe you could
just output a couple lines at server start that said something like "Go
here http://... to see how your usage compares to others."

I personally wouldn't throw too big a hissy fit if it were incorporated into
the actual server and on by default, but I certainly know others who would.

-brian


On Wed, Nov 16, 2011 at 7:17 AM, Eric Evans  wrote:

> On Wed, Nov 16, 2011 at 2:01 AM, Jonathan Ellis  wrote:
> > On Tue, Nov 15, 2011 at 7:02 PM, Eric Evans  wrote:
> >> I think this is potentially quite dangerous; There are a lot of people
> >> who get very twitchy at the idea of software that Phones Home.  I've
> >> seen this so many times, and in all cases it was for software a lot
> >> less sensitive than a database.
> >
> > True, but unlike most Home Phoners, ours will be out there in the open
> > and you can see exactly what it's sending (or not, if you disable it).
> >  I'm sure there's other examples in the wild of this, but the only one
> > I can think of is popcon [1].
>
> I don't think the transparency of the implementation changes things
> much.  It's still going to be opaque to a lot of folks, and more
> important is the precedent it sets and the way it changes the
> project/user trust relationship.
>
> Even if you're satisfied with the implementation, and trust that it
> won't be extended to transmit additional data later (unintentionally
> or otherwise), there are still very valid privacy concerns.  For
> example, seeing as how this must be transmitted over an IP network,
> there are only so many guarantees you can make with respect to
> anonymity.  There will always be *someone* that can tie the data to a
> unique IP, and an IP can almost always be tied to an individual or
> organization.  Imagine an organization that doesn't want *anyone* to
> know it uses Cassandra, and isn't willing to accept the risk that one
> of their admins might accidentally enable this reporting.
>
> It's also interesting that you mention popcon because it has always
> been contentious.  It's taken years for it to transition from the
> point where it required users to install it themselves, to a prompt at
> install-time that defaulted to "No", to the current state of an
> install-time prompt that defaults to "Yes".  And, the installer asks
> *very* few questions; Whether or not popcon is enabled is on par with
> partitioning and the assignment of a root password.
>
> Also, there should be no shame in the admission that we haven't earned
> anywhere near the level of trust and respect that the Debian project
> has.
>
> > More broadly, my sense is that people are getting used to the idea
> > that it's okay to give away anonymous statistics as part of the price
> > of "free," although YMMclearlyV. I am, after all, a Windows user. :)
>
> As privacy becomes more threatened people are either capitulating, or
> becoming even more defensive; Whether that makes it better or worse
> for us if we do this is debatable.
>
> >> I'm sure you've already considered this though, you're already talking
> >> about anonymity, and transparency, and what I assume is neutrality of
> >> the collection endpoint (can apache actually provide a VM; is that a
> >> thing?).
> >
> > Yes, they provide Ubuntu or FreeBSD VMs.
> >
> >> I'm just afraid that we'll scare people off before they can
> >> be properly convinced that it's all on the up-and-up.
> >
> > How would you propose addressing this?
>
> Honestly?  The best way to convince people that we take the privacy of
> their data seriously is to not transmit any of it to a machine outside
> their control.
>
> >> I'm curious to see what others think, but at the moment I'm hovering
> >> somewhere around

Re: How is Cassandra being used?

2011-11-16 Thread Jonathan Ellis
On Wed, Nov 16, 2011 at 8:11 AM, Brian O'Neill  wrote:
> Maybe a good old fashioned web form that allows companies to submit their
> usage scenarios might accomplish the same goal?

I suppose we could dump the information to a file somewhere as a
fallback way to submit, but I wouldn't want to make that the only
option -- too much effort raises the bar to where most people won't
bother. :)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How is Cassandra being used?

2011-11-16 Thread Eric Evans
On Wed, Nov 16, 2011 at 2:17 PM, Jonathan Ellis  wrote:
> On Wed, Nov 16, 2011 at 8:11 AM, Brian O'Neill  wrote:
>> Maybe a good old fashioned web form that allows companies to submit their
>> usage scenarios might accomplish the same goal?
>
> I suppose we could dump the information to a file somewhere as a
> fallback way to submit, but I wouldn't want to make that the only
> option -- too much effort raises the bar to where most people won't
> bother. :)

How much value do you think there is from a subset sent from a
self-selecting group?  Would this still be useful?  How would you
interpret the results; What would it say about the users who sent in
data versus those that didn't?

I guess an equally valid question is, how many users are we willing to
scare off (opt-in or opt-out) before it's not worth it?

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: How is Cassandra being used?

2011-11-16 Thread Jonathan Ellis
On Wed, Nov 16, 2011 at 8:28 AM, Eric Evans  wrote:
> How much value do you think there is from a subset sent from a
> self-selecting group?

A fair amount. [1] [2]

[1] http://amp.cs.berkeley.edu/wp-content/uploads/2011/06/The-Case-for-Evaluating-MapReduce-Performance-Using-Workload-Suites.pdf
[2] http://www.slideshare.net/cloudera/hadoop-world-2011-hadoop-and-performance-todd-lipcon-yanpei-chen-cloudera

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How is Cassandra being used?

2011-11-16 Thread Gary Dusbabek
On Tue, Nov 15, 2011 at 17:23, Jonathan Ellis  wrote:
> I started a "users survey" thread over on the users list (replies are
> still trickling in), but as useful as that is, I'd like to get
> feedback that is more quantitative and with a broader base.  This will
> let us prioritize our development efforts to better address what
> people are actually using it for, with less guesswork.  For instance:

If we're having a hard time prioritizing work on Cassandra we should
consider if the work left is not worth doing or is all the same
priority.

> we put a lot of effort into compression for 1.0.0; if it turned out
> that only 1% of 1.0.x users actually enable compression, then it means
> that we should spend less effort fine-tuning that moving forward, and
> use the energy elsewhere.
>

Here is what should determine where energy is spent:  if enough people
are willing to expend the effort to voice their concerns about feature
X in JIRA and on the mailing list, and there are people willing to do
the technical work, and it doesn't represent a technical Wrong Turn
for the project, then it should (it will) get worked on.

> (Of course it could also mean that we did a terrible job getting the
> word out about new features and explaining how to use them, but either
> way, it would be good to know!)
>
> I propose adding a basic cluster reporting feature to cassandra.yaml,
> enabled by default.  It would send anonymous information about your
> cluster to an apache.org VM.  Information like, number (but not names)
> of keyspaces and columnfamilies, ks-level options like compression, cf
> options like compaction strategy, data types (again, not names) of
> columns, average row size (or better: the histogram data), and average
> sstables per read.
>
> Thoughts?

IMO this well-intentioned proposal, and most of the comments have
missed the mark entirely.

As open-source software, the community should decide which features
(including this proposal) make it into Cassandra.  Isn't this how it
is supposed to work?

Gary.

>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: How is Cassandra being used?

2011-11-16 Thread Jonathan Ellis
On Wed, Nov 16, 2011 at 8:46 AM, Gary Dusbabek  wrote:
> Here is what should determine where energy is spent:  if enough people
> are willing to expend the effort to voice their concerns about feature
> X in JIRA and on the mailing list, and there are people willing to do
> the technical work, and it doesn't represent a technical Wrong Turn
> for the project, then it should (it will) get worked on.

Well, sort of.  I'm *willing* to work on all or most of the 217 open
Cassandra tickets, but since I don't have time to do them all I need
to prioritize aggressively.  My motivation here is to get more data
for that prioritization, which so far has been mostly guided by
intuition.

It sounds like your implicit assumption is that jira + mailing list
are a good enough approximation for who-is-using-what, but I'm not
sure that's the case.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How is Cassandra being used?

2011-11-16 Thread Jake Luciani
Having worked at places where you get fired if software *attempts* to
contact the outside world, I understand the concerns.

However, if it's opt-in via config file and requires a restart then there
is no reason why it should be a concern.
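
A minimal sketch of "opt-in via config file and requires a restart", in Python for brevity. The option name `usage_reporting_enabled` and the `ReportingSettings` class are hypothetical and not an actual cassandra.yaml setting. The value is read once at construction (standing in for server startup) and defaults to off, so a later change to the configuration has no effect on a running process.

```python
class ReportingSettings:
    """Snapshot of the (hypothetical) reporting option, taken once at startup."""

    def __init__(self, config):
        # Opt-in: anything other than an explicit True leaves reporting off.
        self._enabled = config.get("usage_reporting_enabled", False) is True

    @property
    def enabled(self):
        # Read-only: there is deliberately no setter, so nothing can turn
        # reporting on while the process is running.
        return self._enabled


config = {}                                # option absent: reporting stays off
settings = ReportingSettings(config)
config["usage_reporting_enabled"] = True   # edited after startup: no effect
print(settings.enabled)
```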


On Wed, Nov 16, 2011 at 3:29 AM, Zhu Han  wrote:

> On Wed, Nov 16, 2011 at 3:03 PM, Norman Maurer  wrote:
>
> > 2011/11/16 Jonathan Ellis :
> > > I started a "users survey" thread over on the users list (replies are
> > > still trickling in), but as useful as that is, I'd like to get
> > > feedback that is more quantitative and with a broader base.  This will
> > > let us prioritize our development efforts to better address what
> > > people are actually using it for, with less guesswork.  For instance:
> > > we put a lot of effort into compression for 1.0.0; if it turned out
> > > that only 1% of 1.0.x users actually enable compression, then it means
> > > that we should spend less effort fine-tuning that moving forward, and
> > > use the energy elsewhere.
> > >
> > > (Of course it could also mean that we did a terrible job getting the
> > > word out about new features and explaining how to use them, but either
> > > way, it would be good to know!)
> > >
> > > I propose adding a basic cluster reporting feature to cassandra.yaml,
> > > enabled by default.  It would send anonymous information about your
> > > cluster to an apache.org VM.  Information like, number (but not names)
> > > of keyspaces and columnfamilies, ks-level options like compression, cf
> > > options like compaction strategy, data types (again, not names) of
> > > columns, average row size (or better: the histogram data), and average
> > > sstables per read.
> > >
> > > Thoughts?
> >
>
> -1.
>
> It may scare some admins who store sensitive data in Cassandra. Even if
> it can be disabled, we cannot sleep well at night when we know the door
> can be opened unintentionally...
>
>
> > Hi there,
> >
> > I'm not a Cassandra dev but a user of it. I would really "hate" to
> > see such code in the Cassandra code-base. I understand that it would
> > be kind of useful to get a better feeling about usage etc, but it's
> > really something that scares the shit out of many managers (and even
> > devs ;) ).
> >
> > So -1 to adding this code (*non-binding)
> >
> > Bye,
> > Norman
> >
>



-- 
http://twitter.com/tjake


Re: How is Cassandra being used?

2011-11-16 Thread Joe Stein
This also brings up a nice possibility for businesses to offer outsourced
Cassandra monitoring services/solutions from a ring infrastructure
perspective.

Any chance the URL Cassandra posts information to can be configurable
as well?  Maybe even an abstract class so others can extend it, with the
first implementation being for this stuff?  That makes it easy for your
support contract to have people pull up a dashboard of your Cassandra
cluster without your having to give them access to your production network
(I hate giving anyone access to my production network, but so many
consulting/support companies require this) or even having to do anything...
the service can even be proactive when things start to get naughty.
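
A minimal sketch of the "configurable URL plus abstract class" idea, in Python for brevity (in Cassandra this would be a Java abstract class or interface). The class names `UsageReporter` and `HttpPostReporter`, the JSON payload, and the example URL are all hypothetical; only the `urllib.request` calls are real standard-library API.

```python
import json
import urllib.request
from abc import ABC, abstractmethod


class UsageReporter(ABC):
    """Extension point: each implementation decides where a report goes."""

    @abstractmethod
    def send(self, report):
        """Deliver one report (a plain dict of statistics)."""


class HttpPostReporter(UsageReporter):
    """First implementation: POST the report as JSON to a configurable URL."""

    def __init__(self, url):
        self.url = url  # would come from configuration, never hard-coded

    def build_request(self, report):
        body = json.dumps(report).encode("utf-8")
        return urllib.request.Request(
            self.url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def send(self, report):
        # Performs the actual network call; not exercised in this sketch.
        with urllib.request.urlopen(self.build_request(report), timeout=10) as resp:
            return resp.status


# Building the request needs no network access; the URL is a placeholder.
reporter = HttpPostReporter("http://localhost:8080/usage")
request = reporter.build_request({"keyspaces": 1})
print(request.get_method(), request.full_url)
```

A support vendor could supply its own subclass pointing at its dashboard, while a site that wants nothing to leave the network could plug in an implementation that only writes to a local file.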

On Wed, Nov 16, 2011 at 11:35 AM, Jake Luciani  wrote:

> Having worked at places where you get fired if software *attempts* to
> contact the outside world, I understand the concerns.
>
> However, if it's opt-in via config file and requires a restart then there
> is no reason why it should be a concern.
>
>
> On Wed, Nov 16, 2011 at 3:29 AM, Zhu Han  wrote:
>
> > On Wed, Nov 16, 2011 at 3:03 PM, Norman Maurer 
> wrote:
> >
> > > 2011/11/16 Jonathan Ellis :
> > > > I started a "users survey" thread over on the users list (replies are
> > > > still trickling in), but as useful as that is, I'd like to get
> > > > feedback that is more quantitative and with a broader base.  This
> will
> > > > let us prioritize our development efforts to better address what
> > > > people are actually using it for, with less guesswork.  For instance:
> > > > we put a lot of effort into compression for 1.0.0; if it turned out
> > > > that only 1% of 1.0.x users actually enable compression, then it
> means
> > > > that we should spend less effort fine-tuning that moving forward, and
> > > > use the energy elsewhere.
> > > >
> > > > (Of course it could also mean that we did a terrible job getting the
> > > > word out about new features and explaining how to use them, but
> either
> > > > way, it would be good to know!)
> > > >
> > > > I propose adding a basic cluster reporting feature to cassandra.yaml,
> > > > enabled by default.  It would send anonymous information about your
> > > > cluster to an apache.org VM.  Information like, number (but not
> names)
> > > > of keyspaces and columnfamilies, ks-level options like compression,
> cf
> > > > options like compaction strategy, data types (again, not names) of
> > > > columns, average row size (or better: the histogram data), and
> average
> > > > sstables per read.
> > > >
> > > > Thoughts?
> > >
> >
> > -1.
> >
> > It may scare some admins who store sensitive data in Cassandra. Even if
> > it can be disabled, we cannot sleep well at night when we know the door
> > can be opened unintentionally...
> >
> >
> > > Hi there,
> > >
> > > I'm not a cassandra dev but an user of it. I would really "hate" to
> > > see such code in the cassandra code-base. I understand that it would
> > > be kind of useful to get a better feeling about usage etc, but its
> > > really something that scares the shit out of many managers (and even
> > > devs ;) ).
> > >
> > > So -1 to add this code (*non-binding)
> > >
> > > Bye,
> > > Norman
> > >
> >
>
>
>
> --
> http://twitter.com/tjake
>



-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop 
*/


Re: How is Cassandra being used?

2011-11-16 Thread Eric Evans
On Wed, Nov 16, 2011 at 2:59 PM, Jonathan Ellis  wrote:
> On Wed, Nov 16, 2011 at 8:46 AM, Gary Dusbabek  wrote:
>> Here is what should determine where energy is spent:  if enough people
>> are willing to expend the effort to voice their concerns about feature
>> X in JIRA and on the mailing list, and there are people willing to do
>> the technical work, and it doesn't represent a technical Wrong Turn
>> for the project, then it should (it will) get worked on.
>
> Well, sort of.  I'm *willing* to work on all or most of the 217 open
> Cassandra tickets, but since I don't have time to do them all I need
> to prioritize aggressively.  My motivation here is to get more data
> for that prioritization, which so far has been mostly guided by
> intuition.
>
> It sounds like your implicit assumption is that jira + mailing list
> are a good enough approximation for who-is-using-what, but I'm not
> sure that's the case.

There probably is a rather large group of "shadow" users whose
(valuable?) input doesn't make it to the list or bug tracker.  It
sounds like Gary is questioning whether we should be giving these
people a voice.  Assuming I have that right, I agree that's a very
good question.  This is a community-based project after all.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: How is Cassandra being used?

2011-11-16 Thread Jonathan Ellis
On Wed, Nov 16, 2011 at 10:56 AM, Eric Evans  wrote:
> There probably is a rather large group of "shadow" users whose
> (valuable?) input doesn't make it to the list or bug tracker.  It
> sounds like Gary is questioning whether we should be giving these
> people a voice.  Assuming I have that right, I agree that's a very
> good question.  This is a community-based project after all.

First, as attractive (and easy!) as it is to live inside our echo
chamber, yes, I do think we should give them a voice.  Of course, that
doesn't mean you're obliged to listen to it.  If you don't think that
is a valuable source of input for prioritizing your work, you're free
to ignore it.

Second, what I'm talking about is a different type of data from what
you get on jira + ML.  Those are "negative" sources of information --
you mostly only find out someone is using compression if they have a
problem with it.  How many people are using it with no problems?  That
is what this would let us start to find out.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: hintedhandoff in 1.0.3

2011-11-16 Thread Jonathan Ellis
Keys in HCF are nodes it has hints for.  You can try forcing delivery
to the node that still has hints.  It's also possible that new hints
were created (because that node timed out some writes) during the
delivery of the first ones.

On Tue, Nov 15, 2011 at 3:42 AM, Radim Kolar  wrote:
> Same problem on other node:  2 keys in HintsColumnFamily. One delivered, one
> left.
>
>  INFO [HintedHandoff:1] 2011-11-15 10:31:53,181 HintedHandOffManager.java
> (line 268) Started hinted handoff for token:
> 99070591730234615865843651857942052864
>  INFO [HintedHandoff:1] 2011-11-15 10:32:49,385 ColumnFamilyStore.java (line
> 688) Enqueuing flush of Memtable-HintsColumnFamily@797897458(1674737/2093421
> serialized/live bytes, 6176 ops)
>  INFO [FlushWriter:5] 2011-11-15 10:32:49,386 Memtable.java (line 239)
> Writing Memtable-HintsColumnFamily@797897458(1674737/2093421 serialized/live
> bytes, 6176 ops)
>  INFO [CompactionExecutor:10] 2011-11-15 10:32:49,387 CompactionTask.java
> (line 112) Compacting
> [SSTableReader(path='/usr/local/cassandra/data/system/HintsColumnFamily-hb-754-Data.db'),
> SSTableReader(path='/usr/local/cassandra/data/system/HintsColumnFamily-hb-752-Data.db')]
>  INFO [FlushWriter:5] 2011-11-15 10:32:49,523 Memtable.java (line 275)
> Completed flushing
> /usr/local/cassandra/data/system/HintsColumnFamily-hb-755-Data.db (1888357
> bytes)
>  INFO [CompactionExecutor:10] 2011-11-15 10:32:49,820 CompactionTask.java
> (line 213) Compacted to
> [/usr/local/cassandra/data/system/HintsColumnFamily-hb-756-Data.db,].
>  19,913,818 to 19,913,392 (~99% of original) bytes for 2 keys at
> 43.960395MB/s.  Time: 432ms.
>  INFO [HintedHandoff:1] 2011-11-15 10:32:49,820 HintedHandOffManager.java
> (line 334) Finished hinted handoff of 5796 rows
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How is Cassandra being used?

2011-11-16 Thread Eric Evans
On Wed, Nov 16, 2011 at 5:06 PM, Jonathan Ellis  wrote:
> On Wed, Nov 16, 2011 at 10:56 AM, Eric Evans  wrote:
>> There probably is a rather large group of "shadow" users whose
>> (valuable?) input doesn't make it to the list or bug tracker.  It
>> sounds like Gary is questioning whether we should be giving these
>> people a voice.  Assuming I have that right, I agree that's a very
>> good question.  This is a community-based project after all.
>
> First, as attractive (and easy!) as it is to live inside our echo
> chamber, yes, I do think we should give them a voice.  Of course, that
> doesn't mean you're obliged to listen to it.  If you don't think that
> is a valuable source of input for prioritizing your work, you're free
> to ignore it.

Assuming that there were no strings attached to collecting the data, I
would agree completely.  But since you're proposing that we add code to
something that we all use, it's fair to question the benefit.

> Second, what I'm talking about is a different type of data from what
> you get on jira + ML.  Those are "negative" sources of information --
> you mostly only find out someone is using compression if they have a
> problem with it.  How many people are using it with no problems?  That
> is what this would let us start to find out.

There's no doubt that information is power.  That's both the source of
the appeal, and the potential to scare off users.  Incidentally, I
suspect it's those very same shadow-users we stand to scare off.  It
might be the case that we don't know whether or not they're using
compression now, but we won't necessarily know how many we've lost if
we start sending data offsite either.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: How is Cassandra being used?

2011-11-16 Thread Jonathan Ellis
Sounds like the consensus is that if this is a good idea at all, it
needs to be opt-in.  Like I said earlier, I can live with that.

On Wed, Nov 16, 2011 at 10:35 AM, Jake Luciani  wrote:
> Having worked at places where you get fired if software *attempts* to
> contact outside world I understand the concerns.
>
> However, if it's opt-in via config file and requires a restart then there
> is no reason why it should be a concern.
>
>
> On Wed, Nov 16, 2011 at 3:29 AM, Zhu Han  wrote:
>
>> On Wed, Nov 16, 2011 at 3:03 PM, Norman Maurer  wrote:
>>
>> > 2011/11/16 Jonathan Ellis :
>> > > I started a "users survey" thread over on the users list (replies are
>> > > still trickling in), but as useful as that is, I'd like to get
>> > > feedback that is more quantitative and with a broader base.  This will
>> > > let us prioritize our development efforts to better address what
>> > > people are actually using it for, with less guesswork.  For instance:
>> > > we put a lot of effort into compression for 1.0.0; if it turned out
>> > > that only 1% of 1.0.x users actually enable compression, then it means
>> > > that we should spend less effort fine-tuning that moving forward, and
>> > > use the energy elsewhere.
>> > >
>> > > (Of course it could also mean that we did a terrible job getting the
>> > > word out about new features and explaining how to use them, but either
>> > > way, it would be good to know!)
>> > >
>> > > I propose adding a basic cluster reporting feature to cassandra.yaml,
>> > > enabled by default.  It would send anonymous information about your
>> > > cluster to an apache.org VM.  Information like, number (but not names)
>> > > of keyspaces and columnfamilies, ks-level options like compression, cf
>> > > options like compaction strategy, data types (again, not names) of
>> > > columns, average row size (or better: the histogram data), and average
>> > > sstables per read.
>> > >
>> > > Thoughts?
>> >
>>
>> -1.
>>
>> It may scare some admins who stores sensitive data  in cassandra. Even if
>> it can
>> disabled, we can not sleep well in the night when we know the door can be
>> opened unintentionally...
>>
>>
>> > Hi there,
>> >
>> > I'm not a cassandra dev but an user of it. I would really "hate" to
>> > see such code in the cassandra code-base. I understand that it would
>> > be kind of useful to get a better feeling about usage etc, but its
>> > really something that scares the shit out of many managers (and even
>> > devs ;) ).
>> >
>> > So -1 to add this code (*non-binding)
>> >
>> > Bye,
>> > Norman
>> >
>>
>
>
>
> --
> http://twitter.com/tjake
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How is Cassandra being used?

2011-11-16 Thread Ryan King
On Wed, Nov 16, 2011 at 10:02 AM, Jonathan Ellis  wrote:
> Sounds like the consensus is that if this is a good idea at all, it
> needs to be opt-in.  Like I said earlier, I can live with that.

In addition, if you want to get data from large companies that manage
their own datacenters, there needs to be a way to contribute data
without the software phoning home automatically. We aren't allowed to
make connections to the outside world from our datacenter. And I'm not
willing to ask for an exception for this.

A mode that dumps the data to a file which can be uploaded would be
preferable. People probably won't do it often, but imagine if your
periodic "how are you using cassandra?" email threads included data?
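
A minimal sketch of what such a dump-to-file mode might look like, written in
Python purely for illustration. Nothing here is an existing Cassandra API: the
`schema` mapping, the option keys, and the `build_report` / `dump_report`
names are all assumptions. It keeps only counts and option values (no keyspace
or column family names), stays off unless explicitly enabled, and never opens
a network connection.

```python
import json


def build_report(schema, enabled=False):
    """Return an anonymous usage summary, or None unless reporting is opted in.

    `schema` is a hypothetical mapping: keyspace name -> {cf name -> options}.
    Only counts and option values are kept; all names are dropped.
    """
    if not enabled:
        return None
    column_families = [cf for ks in schema.values() for cf in ks.values()]
    compaction = {}
    for cf in column_families:
        strategy = cf.get("compaction_strategy", "unknown")
        compaction[strategy] = compaction.get(strategy, 0) + 1
    return {
        "keyspaces": len(schema),
        "column_families": len(column_families),
        "compression_enabled": sum(
            1 for cf in column_families if cf.get("compression")
        ),
        "compaction_strategies": compaction,
    }


def dump_report(report, path):
    """Write the report to a local file; nothing is sent over the network."""
    with open(path, "w") as f:
        json.dump(report, f, indent=2, sort_keys=True)
```

The resulting file could then be reviewed by an operator and attached to a
reply by hand, which keeps the decision to share entirely on the user's side.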

-ryan


Re: [VOTE] Release Apache Cassandra 1.0.3 (take 2)

2011-11-16 Thread Jonathan Ellis
I'm +1 on either these artifacts as is, or these artifacts with thrift
rebuilt to reflect the correct API version.

On Tue, Nov 15, 2011 at 7:46 AM, Eric Evans  wrote:
> On Tue, Nov 15, 2011 at 1:40 AM, Sylvain Lebresne  wrote:
>> So, CASSANDRA-3491 and CASSANDRA-3492 got in the way of the first take.
>> Now that they are fixed, let's try again. I propose the following artifacts
>> for release as 1.0.3.
>>
>> SVN: 
>> https://svn.apache.org/repos/asf/cassandra/branches/cassandra-1.0@1202082
>> Artifacts: 
>> https://repository.apache.org/content/repositories/orgapachecassandra-186/org/apache/cassandra/apache-cassandra/1.0.3/
>> Staging repository:
>> https://repository.apache.org/content/repositories/orgapachecassandra-186/
>>
>> The artifacts as well as the debian package are also available here:
>> http://people.apache.org/~slebresne/
>>
>> The vote will be open for 72 hours (longer if needed).
>>
>> [1]: http://goo.gl/I1dZG (CHANGES.txt)
>> [2]: http://goo.gl/PeD3Z (NEWS.txt)
>>
>
> It looks like interface/cassandra.thrift has changed without the Java
> code being regenerated.  The test_describe system test is failing
> because of this, (the versions don't match).
>
> Probably not justification for a re-roll, but not a great thing for
> the release either...
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How is Cassandra being used?

2011-11-16 Thread Bill

> Thoughts?
>

We'll turn this off, and would possibly patch it out of the code. That's 
not to say it wouldn't be useful to others.


Bill


On 15/11/11 23:23, Jonathan Ellis wrote:

I started a "users survey" thread over on the users list (replies are
still trickling in), but as useful as that is, I'd like to get
feedback that is more quantitative and with a broader base.  This will
let us prioritize our development efforts to better address what
people are actually using it for, with less guesswork.  For instance:
we put a lot of effort into compression for 1.0.0; if it turned out
that only 1% of 1.0.x users actually enable compression, then it means
that we should spend less effort fine-tuning that moving forward, and
use the energy elsewhere.

(Of course it could also mean that we did a terrible job getting the
word out about new features and explaining how to use them, but either
way, it would be good to know!)

I propose adding a basic cluster reporting feature to cassandra.yaml,
enabled by default.  It would send anonymous information about your
cluster to an apache.org VM.  Information like, number (but not names)
of keyspaces and columnfamilies, ks-level options like compression, cf
options like compaction strategy, data types (again, not names) of
columns, average row size (or better: the histogram data), and average
sstables per read.

Thoughts?






Build failed in Jenkins: Cassandra-quick #124

2011-11-16 Thread Apache Jenkins Server
See 

Changes:

[jbellis] merge from 1.0

--
[...truncated 1585 lines...]
[junit] 
[junit] Testsuite: 
org.apache.cassandra.locator.OldNetworkTopologyStrategyTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.163 sec
[junit] 
[junit] Testsuite: 
org.apache.cassandra.locator.ReplicationStrategyEndpointCacheTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.512 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.locator.SimpleStrategyTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.682 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.locator.TokenMetadataTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.453 sec
[junit] 
[junit] Testsuite: 
org.apache.cassandra.service.AntiEntropyServiceCounterTest
[junit] Tests run: 6, Failures: 0, Errors: 1, Time elapsed: 2.683 sec
[junit] 
[junit] Testcase: 
testValidatorPrepare(org.apache.cassandra.service.AntiEntropyServiceCounterTest):
 Caused an ERROR
[junit] /127.0.0.1:7010 is in use by another process.  Change 
listen_address:storage_port in cassandra.yaml to values that do not conflict 
with other services
[junit] org.apache.cassandra.config.ConfigurationException: /127.0.0.1:7010 
is in use by another process.  Change listen_address:storage_port in 
cassandra.yaml to values that do not conflict with other services
[junit] at 
org.apache.cassandra.net.MessagingService.getServerSocket(MessagingService.java:271)
[junit] at 
org.apache.cassandra.net.MessagingService.listen(MessagingService.java:241)
[junit] at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:484)
[junit] at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:461)
[junit] at 
org.apache.cassandra.service.AntiEntropyServiceTestAbstract.prepare(AntiEntropyServiceTestAbstract.java:80)
[junit] 
[junit] 
[junit] Test org.apache.cassandra.service.AntiEntropyServiceCounterTest 
FAILED
[junit] Testsuite: 
org.apache.cassandra.service.AntiEntropyServiceStandardTest
[junit] Tests run: 6, Failures: 0, Errors: 1, Time elapsed: 2.388 sec
[junit] 
[junit] Testcase: 
testValidatorPrepare(org.apache.cassandra.service.AntiEntropyServiceStandardTest):
Caused an ERROR
[junit] /127.0.0.1:7010 is in use by another process.  Change 
listen_address:storage_port in cassandra.yaml to values that do not conflict 
with other services
[junit] org.apache.cassandra.config.ConfigurationException: /127.0.0.1:7010 
is in use by another process.  Change listen_address:storage_port in 
cassandra.yaml to values that do not conflict with other services
[junit] at 
org.apache.cassandra.net.MessagingService.getServerSocket(MessagingService.java:271)
[junit] at 
org.apache.cassandra.net.MessagingService.listen(MessagingService.java:241)
[junit] at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:484)
[junit] at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:461)
[junit] at 
org.apache.cassandra.service.AntiEntropyServiceTestAbstract.prepare(AntiEntropyServiceTestAbstract.java:80)
[junit] 
[junit] 
[junit] Test org.apache.cassandra.service.AntiEntropyServiceStandardTest 
FAILED
[junit] Testsuite: org.apache.cassandra.service.CassandraServerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.46 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.service.ConsistencyLevelTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.779 sec
[junit] 
[junit] Testcase: 
testReadWriteConsistencyChecks(org.apache.cassandra.service.ConsistencyLevelTest):
Caused an ERROR
[junit] invalid consistency level: ANY
[junit] java.lang.UnsupportedOperationException: invalid consistency level: 
ANY
[junit] at 
org.apache.cassandra.service.ReadCallback.determineBlockFor(ReadCallback.java:195)
[junit] at 
org.apache.cassandra.service.ReadCallback.<init>(ReadCallback.java:68)
[junit] at 
org.apache.cassandra.service.StorageProxy.getReadCallback(StorageProxy.java:798)
[junit] at 
org.apache.cassandra.service.ConsistencyLevelTest.testReadWriteConsistencyChecks(ConsistencyLevelTest.java:110)
[junit] 
[junit] 
[junit] Test org.apache.cassandra.service.ConsistencyLevelTest FAILED
[junit] Testsuite: org.apache.cassandra.service.EmbeddedCassandraServiceTest
[junit] Testsuite: org.apache.cassandra.service.EmbeddedCassandraServiceTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] 
[junit] Testcase: 
org.apache.cassandra.service.EmbeddedCassandraServiceTest:BeforeFirstTest:  
  Caused an ERROR
[junit] Forked Java VM exited abnormally

Build failed in Jenkins: Cassandra #1209

2011-11-16 Thread Apache Jenkins Server
See 

Changes:

[jbellis] merge from 1.0

--
[...truncated 2243 lines...]
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.163 sec
[junit] 
[junit] Testsuite: 
org.apache.cassandra.locator.ReplicationStrategyEndpointCacheTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.513 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.locator.SimpleStrategyTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.687 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.locator.TokenMetadataTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.453 sec
[junit] 
[junit] Testsuite: 
org.apache.cassandra.service.AntiEntropyServiceCounterTest
[junit] Tests run: 6, Failures: 0, Errors: 1, Time elapsed: 2.607 sec
[junit] 
[junit] Testcase: 
testValidatorPrepare(org.apache.cassandra.service.AntiEntropyServiceCounterTest):
 Caused an ERROR
[junit] /127.0.0.1:7010 is in use by another process.  Change 
listen_address:storage_port in cassandra.yaml to values that do not conflict 
with other services
[junit] org.apache.cassandra.config.ConfigurationException: /127.0.0.1:7010 
is in use by another process.  Change listen_address:storage_port in 
cassandra.yaml to values that do not conflict with other services
[junit] at 
org.apache.cassandra.net.MessagingService.getServerSocket(MessagingService.java:271)
[junit] at 
org.apache.cassandra.net.MessagingService.listen(MessagingService.java:241)
[junit] at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:484)
[junit] at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:461)
[junit] at 
org.apache.cassandra.service.AntiEntropyServiceTestAbstract.prepare(AntiEntropyServiceTestAbstract.java:80)
[junit] 
[junit] 
[junit] Test org.apache.cassandra.service.AntiEntropyServiceCounterTest 
FAILED
[junit] Testsuite: 
org.apache.cassandra.service.AntiEntropyServiceStandardTest
[junit] Tests run: 6, Failures: 0, Errors: 1, Time elapsed: 2.437 sec
[junit] 
[junit] Testcase: 
testValidatorPrepare(org.apache.cassandra.service.AntiEntropyServiceStandardTest):
Caused an ERROR
[junit] /127.0.0.1:7010 is in use by another process.  Change 
listen_address:storage_port in cassandra.yaml to values that do not conflict 
with other services
[junit] org.apache.cassandra.config.ConfigurationException: /127.0.0.1:7010 
is in use by another process.  Change listen_address:storage_port in 
cassandra.yaml to values that do not conflict with other services
[junit] at 
org.apache.cassandra.net.MessagingService.getServerSocket(MessagingService.java:271)
[junit] at 
org.apache.cassandra.net.MessagingService.listen(MessagingService.java:241)
[junit] at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:484)
[junit] at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:461)
[junit] at 
org.apache.cassandra.service.AntiEntropyServiceTestAbstract.prepare(AntiEntropyServiceTestAbstract.java:80)
[junit] 
[junit] 
[junit] Test org.apache.cassandra.service.AntiEntropyServiceStandardTest 
FAILED
[junit] Testsuite: org.apache.cassandra.service.CassandraServerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.461 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.service.ConsistencyLevelTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.776 sec
[junit] 
[junit] Testcase: 
testReadWriteConsistencyChecks(org.apache.cassandra.service.ConsistencyLevelTest):
Caused an ERROR
[junit] invalid consistency level: ANY
[junit] java.lang.UnsupportedOperationException: invalid consistency level: 
ANY
[junit] at 
org.apache.cassandra.service.ReadCallback.determineBlockFor(ReadCallback.java:195)
[junit] at 
org.apache.cassandra.service.ReadCallback.<init>(ReadCallback.java:68)
[junit] at 
org.apache.cassandra.service.StorageProxy.getReadCallback(StorageProxy.java:798)
[junit] at 
org.apache.cassandra.service.ConsistencyLevelTest.testReadWriteConsistencyChecks(ConsistencyLevelTest.java:110)
[junit] 
[junit] 
[junit] Test org.apache.cassandra.service.ConsistencyLevelTest FAILED
[junit] Testsuite: org.apache.cassandra.service.EmbeddedCassandraServiceTest
[junit] Testsuite: org.apache.cassandra.service.EmbeddedCassandraServiceTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] 
[junit] Testcase: 
org.apache.cassandra.service.EmbeddedCassandraServiceTest:BeforeFirstTest:  
  Caused an ERROR
[junit] Forked Java VM exited abnormally. Please note the time in the 
report does not reflect the time until the VM exit.
[junit] junit.

Jenkins build is still unstable: Cassandra-Coverage #168

2011-11-16 Thread Apache Jenkins Server
See 




Re: How is Cassandra being used?

2011-11-16 Thread Jeremy Hanna
Sounds like it would be best if it were in a separate jar for people?

On Nov 16, 2011, at 4:58 PM, Bill wrote:

> > Thoughts?
> >
> 
> We'll turn this off, and would possibly patch it out of the code. That's not 
> to say it wouldn't be useful to others.
> 
> Bill
> 
> 
> On 15/11/11 23:23, Jonathan Ellis wrote:
>> I started a "users survey" thread over on the users list (replies are
>> still trickling in), but as useful as that is, I'd like to get
>> feedback that is more quantitative and with a broader base.  This will
>> let us prioritize our development efforts to better address what
>> people are actually using it for, with less guesswork.  For instance:
>> we put a lot of effort into compression for 1.0.0; if it turned out
>> that only 1% of 1.0.x users actually enable compression, then it means
>> that we should spend less effort fine-tuning that moving forward, and
>> use the energy elsewhere.
>> 
>> (Of course it could also mean that we did a terrible job getting the
>> word out about new features and explaining how to use them, but either
>> way, it would be good to know!)
>> 
>> I propose adding a basic cluster reporting feature to cassandra.yaml,
>> enabled by default.  It would send anonymous information about your
>> cluster to an apache.org VM.  Information like, number (but not names)
>> of keyspaces and columnfamilies, ks-level options like compression, cf
>> options like compaction strategy, data types (again, not names) of
>> columns, average row size (or better: the histogram data), and average
>> sstables per read.
>> 
>> Thoughts?
>> 
> 
> 



Re: How is Cassandra being used?

2011-11-16 Thread Jeremiah Jordan
+1 for a separate jar (and a second download link that doesn't include this 
jar, though I would make the primary link include it with BIG BOLD PRINT saying 
it is in there)
+1 for a config option to turn off auto-post (defaulted on in the download that 
has the jar)
+1 for a nodetool command to dump it to a file for manual posting

I think this could be a good debugging tool as well.  It would be nice to have 
a command that dumps "here is what my cluster looks like" to a file, which 
could then be sent through email so that others can help resolve issues.  The 
current nodetool information commands have too much stuff that needs to be 
sanitized out before you can send it outside the firewall.
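
As a rough illustration of the kind of sanitizing meant here, the Python 
sketch below replaces every distinct IPv4 address in a block of text with a 
stable placeholder.  It assumes node addresses are among the things that would 
need scrubbing; that is a guess, and the `sanitize` helper is hypothetical 
rather than anything nodetool provides.

```python
import re

# Matches dotted-quad IPv4 addresses such as 10.0.0.1 (no range validation).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sanitize(text):
    """Replace each distinct IPv4 address with node-1, node-2, ...

    The same address always maps to the same placeholder, so the shape of
    the cluster stays readable while the real addresses are removed.
    """
    aliases = {}

    def alias(match):
        ip = match.group(0)
        if ip not in aliases:
            aliases[ip] = "node-%d" % (len(aliases) + 1)
        return aliases[ip]

    return IPV4.sub(alias, text)
```

Keyspace and column family names would need a similar mapping, driven by the 
schema rather than by a regular expression.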

- Jeremiah

On Nov 16, 2011, at 7:16 PM, Jeremy Hanna wrote:

> Sounds like it would be best if it were in a separate jar for people?
> 
> On Nov 16, 2011, at 4:58 PM, Bill wrote:
> 
>>> Thoughts?
>>> 
>> 
>> We'll turn this off, and would possibly patch it out of the code. That's not 
>> to say it wouldn't be useful to others.
>> 
>> Bill
>> 
>> 
>> On 15/11/11 23:23, Jonathan Ellis wrote:
>>> I started a "users survey" thread over on the users list (replies are
>>> still trickling in), but as useful as that is, I'd like to get
>>> feedback that is more quantitative and with a broader base.  This will
>>> let us prioritize our development efforts to better address what
>>> people are actually using it for, with less guesswork.  For instance:
>>> we put a lot of effort into compression for 1.0.0; if it turned out
>>> that only 1% of 1.0.x users actually enable compression, then it means
>>> that we should spend less effort fine-tuning that moving forward, and
>>> use the energy elsewhere.
>>> 
>>> (Of course it could also mean that we did a terrible job getting the
>>> word out about new features and explaining how to use them, but either
>>> way, it would be good to know!)
>>> 
>>> I propose adding a basic cluster reporting feature to cassandra.yaml,
>>> enabled by default.  It would send anonymous information about your
>>> cluster to an apache.org VM.  Information like, number (but not names)
>>> of keyspaces and columnfamilies, ks-level options like compression, cf
>>> options like compaction strategy, data types (again, not names) of
>>> columns, average row size (or better: the histogram data), and average
>>> sstables per read.
>>> 
>>> Thoughts?
>>> 
>> 
>> 
>