Re: CMS GC / fragmentation / memtables etc

2014-05-21 Thread Benedict Elliott Smith
Graham,

This is largely fixed in 2.1 with the introduction of partially off-heap
memtables - the slabs reside off-heap, so do not cause any GC issues.

As it happens the changes would also permit us to recycle on-heap slabs
reasonably easily as well, so feel free to file a ticket for that, although
it won't be back-ported to 2.0.


On 21 May 2014 00:57, graham sanderson  wrote:

> So I’ve been tinkering a bit with CMS config because we are still seeing
> fairly frequent full compacting GC due to fragmentation/promotion failure
>
> As mentioned below, we are usually too fragmented to promote new in-flight
> memtables.
>
> This is likely caused by sudden write spikes (which we do have), though
> actually the problems don’t generally happen at that time of our largest
> write spikes (though any write spikes likely cause spill of both new
> memtables along with many other new objects of unknown size into the
> tenured gen, so they cause fragmentation if not immediate GC issue). We
> have lots of things going on in this multi-tenant cluster (GC pauses are of
> course extra bad, since they cause spike in hinted-handoff on other nodes
> which were already busy etc…)
>
> Anyway, considering possibilities:
>
> 0) Try and make our application behavior more steady state - this is
> probably possible, but there are lots of other things (e.g. compaction,
> opscenter, repair etc.) which are both tunable and generally throttle-able
> to think about too.
> 1) Play with tweaking PLAB configs to see if we can ease fragmentation
> (I’d be curious what the “crud” is in particular that is getting spilled -
> presumably it is larger objects since it affects the binary tree of large
> objects)
> 2) Given the above, if we can guarantee even > 24 hours without full GC, I
> don’t think we’d mind running a regular rolling re-start on the servers
> during off hours (note usually the GCs don’t have a visible impact, but
> when they hit multiple machines at once they can)
> 3) Zing is seriously an option, if it would save us large amounts of
> tuning, and constant worry about the “next” thing tweaking the allocation
> patterns - does anyone have any experience with Zing & Cassandra?
> 4) Given that we expect periodic bursts of writes,
> memtable_total_space_in_mb is bounded, we are not actually short of memory
> (it just gets fragmented), I’m wondering if anyone has played with pinning
> (up to or initially?) that many 1MB chunks of memory via SlabAllocator and
> re-using… It will get promoted once, and then these 1M chunks won’t be part
> of the subsequent promotion hassle… it will probably also allow more crud
> to die in eden under write load since we aren’t allocating these large
> chunks in eden at the same time. Anyway, I had a little look at the code,
> and the life cycle of memtables is not trivial, but was considering
> attempting a patch to play with… anyone have any thoughts?
>
> Basically in summary, the Slab allocator helps by allocating and freeing
> lots of objects at the same time, however any time slabs are allocated
> under load, we end up promoting them with whatever other live stuff in eden
> is still there. If we only do this once and reuse the slabs, we are likely
> to minimize our promotion problem later (at least for these large objects)
>
> On May 16, 2014, at 9:37 PM, graham sanderson  wrote:
>
> > Excellent - thank you…
> >
> > On May 16, 2014, at 7:08 AM, Samuel CARRIERE 
> wrote:
> >
> >> Hi,
> >> This is arena allocation of memtables. See here for more info:
> >> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance
> >>
> >>
> >>
> >>
> >> From: graham sanderson 
> >> To: dev@cassandra.apache.org,
> >> Date: 16/05/2014 14:03
> >> Subject: Things that are about 1M big
> >>
> >>
> >>
> >> So just throwing this out there for those for whom this might ring a
> bell.
> >>
> >> I’m debugging some CMS memory fragmentation issues on 2.0.5 - and
> >> interestingly enough most of the objects giving us promotion failures
> are
> >> of size 131074 (dwords) - GC logging obviously doesn’t say what those
> are,
> >> but I’d wager money they are either 1M big byte arrays, or less likely
> >> 256k entry object arrays backing large maps
> >>
> >> So not strictly critical to solving my problem, but I was wondering if
> >> anyone can think of any heap allocated C* objects which are (with no
> >> significant changes to standard cassandra config) allocated in 1M
> chunks.
> >> (It would save me scouring the code, or a 9 gig heap dump if I need to
> >> figure it out!)
> >>
> >> Thanks,
> >>
> >> Graham
> >
>
>

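As an aside, the 131074-dword figure quoted above is consistent with a 1MB byte array. A quick sanity check can be sketched as follows; this assumes a 64-bit HotSpot JVM with compressed oops, where a byte[] carries roughly a 16-byte header (12-byte object header plus a 4-byte length field), which is an assumption about the JVM in use, not something stated in the thread:

```java
// Sanity check: is a 131074-dword object a 1MB byte[]?
// Assumes 64-bit HotSpot with compressed oops: byte[] header ~ 16 bytes.
public class SlabSizeCheck {
    public static void main(String[] args) {
        long dwords = 131074;            // object size reported by GC logging
        long bytes = dwords * 8;         // one dword = 8 bytes
        long payload = 1L << 20;         // 1MB slab payload
        long header = bytes - payload;   // what's left over
        System.out.println(bytes);       // 1048592
        System.out.println(header);      // 16 -> plausible array header size
    }
}
```

The leftover 16 bytes matching a typical array header is what makes the "1M big byte arrays" guess in the thread plausible.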

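The reuse idea in point 4 of the thread above (pin 1MB chunks once, let them be promoted once, then recycle them) can be sketched as a simple pool. This is only an illustration of the idea under discussion, not Cassandra's actual SlabAllocator; the class name `RecyclingSlabPool` is made up for the example:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: a pool that hands out 1MB byte[] slabs and
// takes them back for reuse. Each slab is allocated (and promoted to the
// tenured generation) at most once, instead of fresh slabs being promoted
// alongside other live eden objects on every write burst.
final class RecyclingSlabPool {
    static final int SLAB_SIZE = 1 << 20; // 1MB, the chunk size discussed

    private final ConcurrentLinkedQueue<byte[]> free = new ConcurrentLinkedQueue<>();
    private final AtomicInteger created = new AtomicInteger();

    byte[] acquire() {
        byte[] slab = free.poll();        // reuse a recycled slab if available
        if (slab != null) return slab;
        created.incrementAndGet();        // otherwise allocate a new one
        return new byte[SLAB_SIZE];
    }

    // Called when a memtable is flushed: its slabs go back into the pool.
    void release(byte[] slab) {
        free.add(slab);
    }

    int slabsCreated() { return created.get(); }
}
```

A real patch would have to tie `release` into the non-trivial memtable life cycle mentioned in the thread, which is exactly the part the poster flags as hard.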
Re: CQL unit tests vs dtests

2014-05-21 Thread Sylvain Lebresne
Just to be clear, I'm not strongly opposed to having CQL tests in the unit
tests suite per se (I happen to find dtests easier to work with, probably
because I don't use debuggers, but I'm good with saying that this just means
I'm crazy and shouldn't be taken into account). Having tests that are
intrinsically
the same kind of tests in two places bugs me a bit more however. We
currently
have all of our CQL tests in dtests (
https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py)
and there are quite a bunch of them. Here we're talking about starting to
slowly
add the same kind of tests in the unit test suite. So do we start adding
every
new CQL test from now on in the unit tests? And what about the existing
tests?

--
Sylvain



On Wed, May 21, 2014 at 4:51 AM, Brandon Williams  wrote:

> On Tue, May 20, 2014 at 6:42 PM, Jonathan Ellis  wrote:
>
> > So my preferred approach is, unit test when possible without writing a
> lot
> > of scaffolding and mock superstructure. Mocking is your code telling you
> to
> > write a system test.
>
>
> This.
>


Re: CQL unit tests vs dtests

2014-05-21 Thread Ryan Svihla
The standard reasoning for unit tests is specificity of errors. Well-written
test suites tell you where you screwed up exactly, just by the
success and failure pattern, often cutting down the need for a debugger.

The standard rationale for system tests is validating that these units are
wired up correctly. Hence the doubling up, where system tests and unit
tests occasionally overlap code pathways, is considered worth it.

In my experience, however, unless everyone is on board with crafting isolated
unit tests well, the promised Nirvana of rapid feedback, specific errors
and proper test coverage never happens.
On May 21, 2014 4:07 AM, "Sylvain Lebresne"  wrote:

> Just to be clear, I'm not strongly opposed to having CQL tests in the unit
> tests suite per se (I happen to find dtests easier to work with, probably
> because I don't use debuggers, but I'm good with saying that this just means
> I'm crazy and shouldn't be taken into account). Having tests that are
> intrinsically
> the same kind of tests in two places bugs me a bit more however. We
> currently
> have all of our CQL tests in dtests (
> https://github.com/riptano/cassandra-dtest/blob/master/cql_tests.py)
> and there are quite a bunch of them. Here we're talking about starting to
> slowly
> add the same kind of tests in the unit test suite. So do we start adding
> every
> new CQL test from now on in the unit tests? And what about the existing
> tests?
>
> --
> Sylvain
>
>
>
> On Wed, May 21, 2014 at 4:51 AM, Brandon Williams 
> wrote:
>
> > On Tue, May 20, 2014 at 6:42 PM, Jonathan Ellis 
> wrote:
> >
> > > So my preferred approach is, unit test when possible without writing a
> > lot
> > > of scaffolding and mock superstructure. Mocking is your code telling
> you
> > to
> > > write a system test.
> >
> >
> > This.
> >
>


[VOTE] Release Apache Cassandra 2.0.8 (strike 2)

2014-05-21 Thread Sylvain Lebresne
Since we closed the first try we've fixed even more bugs, with notable
things
like CASSANDRA-6285. In any case, that changelog is getting pretty damn
long so
I propose the following artifacts for release as 2.0.8.

sha1: 484d2816940cd2eb22d2365fcb376dd27e059e2e
Git:
http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.8-tentative
Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1012/org/apache/cassandra/apache-cassandra/2.0.8/
Staging repository:
https://repository.apache.org/content/repositories/orgapachecassandra-1012/

The artifacts as well as the debian package are also available here:
http://people.apache.org/~slebresne/

The vote will be open for 72 hours (longer if needed).

[1]: http://goo.gl/EE3aHy (CHANGES.txt)
[2]: http://goo.gl/dkl3Yu (NEWS.txt)


Re: [VOTE] Release Apache Cassandra 2.0.8 (strike 2)

2014-05-21 Thread Gary Dusbabek
One of the reasons it is so long is because there are quite a few duplicate
entries.

* (Hadoop) support authentication in CqlRecordReader (CASSANDRA-7221)
 * (Hadoop) Close java driver Cluster in CQLRR.close (CASSANDRA-7228)
 * Fix potential SlabAllocator yield-starvation (CASSANDRA-7133)
 * Warn when 'USING TIMESTAMP' is used on a CAS BATCH (CASSANDRA-7067)
 * Starting threads in OutboundTcpConnectionPool constructor causes
race conditions (CASSANDRA-7177)
 * return all cpu values from BackgroundActivityMonitor.readAndCompute
(CASSANDRA-7183)
 * fix c* launch issues on Russian os's due to output of linux 'free'
cmd (CASSANDRA-6162)
 * Fix disabling autocompaction (CASSANDRA-7187)
 * Fix potential NumberFormatException when deserializing IntegerType
(CASSANDRA-7088)
 * cqlsh can't tab-complete disabling compaction (CASSANDRA-7185)
 * cqlsh: Accept and execute CQL statement(s) from command-line
parameter (CASSANDRA-7172)
 * Fix IllegalStateException in CqlPagingRecordReader (CASSANDRA-7198)
 * Fix the InvertedIndex trigger example (CASSANDRA-7211)


+1 from me though.

Gary.


On Wed, May 21, 2014 at 8:19 AM, Sylvain Lebresne wrote:

> Since we closed the first try we've fixed even more bugs, with notable
> things
> like CASSANDRA-6285. In any case, that changelog is getting pretty damn
> long so
> I propose the following artifacts for release as 2.0.8.
>
> sha1: 484d2816940cd2eb22d2365fcb376dd27e059e2e
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.8-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1012/org/apache/cassandra/apache-cassandra/2.0.8/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1012/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~slebresne/
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/EE3aHy (CHANGES.txt)
> [2]: http://goo.gl/dkl3Yu (NEWS.txt)
>


Getting to RC1

2014-05-21 Thread Jonathan Ellis
Please prioritize issues assigned to you that are blocking RC1:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20resolution%20%3D%20Unresolved%20AND%20assignee%20%3D%20currentUser%28%29%20and%20fixVersion%20%3D%20%272.1%20rc1%27%20ORDER%20BY%20priority%20DESC

Also, issues you're reviewing for RC1:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20resolution%20%3D%20Unresolved%20AND%20reviewer%20%3D%20currentUser%28%29%20and%20fixVersion%20%3D%20%272.1%20rc1%27%20ORDER%20BY%20priority%20DESC

Special shout-out to Sylvain, who has 5 of the 17 outstanding issues.
Which of those could someone else help with?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Getting to RC1

2014-05-21 Thread Tyler Hobbs
On Wed, May 21, 2014 at 11:34 AM, Jonathan Ellis  wrote:

> Which of those could someone else help with?


I'll take CASSANDRA-7120 (and CASSANDRA-7267, if needed).


-- 
Tyler Hobbs
DataStax 


Re: [jira] [Assigned] (CASSANDRA-7120) Bad paging state returned for prepared statements for last page

2014-05-21 Thread Benedict Elliott Smith
We need to add 7245 to that list. I'll try to get to it tomorrow.


On 21 May 2014 17:40, Tyler Hobbs (JIRA)  wrote:

>
>  [
> https://issues.apache.org/jira/browse/CASSANDRA-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Tyler Hobbs reassigned CASSANDRA-7120:
> --
>
> Assignee: Tyler Hobbs  (was: Sylvain Lebresne)
>
> > Bad paging state returned for prepared statements for last page
> > ---
> >
> > Key: CASSANDRA-7120
> > URL:
> https://issues.apache.org/jira/browse/CASSANDRA-7120
> > Project: Cassandra
> >  Issue Type: Bug
> >  Components: Core
> >Reporter: Tyler Hobbs
> >Assignee: Tyler Hobbs
> > Fix For: 2.1 rc1
> >
> >
> > When executing a paged query with a prepared statement, a non-null
> paging state is sometimes being returned for the final page, causing an
> endless paging loop.
> > Specifically, this is the schema being used:
> > {noformat}
> > CREATE KEYSPACE test3rf WITH replication = {'class':
> 'SimpleStrategy', 'replication_factor': '3'};
> > USE test3rf;
> > CREATE TABLE test3rf.test (
> > k int PRIMARY KEY,
> > v int
> > )
> > {noformat}
> > The inserts are like so:
> > {noformat}
> > INSERT INTO test3rf.test (k, v) VALUES (?, 0)
> > {noformat}
> > With values from [0, 99] used for k.
> > The query is {{SELECT * FROM test3rf.test}} with a fetch size of 3.
> > The final page returns the row with k=3, and the paging state is
> {{000400420004000176007fa2}}.  This matches the paging state from
> three pages earlier.  When executing this with a non-prepared statement, no
> paging state is returned for this page.
> > This problem doesn't happen with the 2.0 branch.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>

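While the bug stands, the endless-loop symptom described in the ticket (the same paging state recurring on the final page) can be guarded against on the client side. A minimal sketch of the idea, purely illustrative (this is not java-driver code, and `PagingGuard` is a made-up name):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative client-side guard: stop fetching pages once the paging
// state is null (normal end of result set) or repeats (the endless
// paging loop described in CASSANDRA-7120).
final class PagingGuard {
    private final Set<String> seen = new HashSet<>();

    // Returns true if it is safe to fetch the next page with this state.
    boolean shouldContinue(String pagingState) {
        if (pagingState == null) return false;  // normal termination
        return seen.add(pagingState);           // false if seen before: loop
    }
}
```

In the reported scenario the state `000400420004000176007fa2` comes back a second time, so a guard like this would break the loop after one extra page rather than paging forever.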

Re: [VOTE] Release Apache Cassandra 2.0.8 (strike 2)

2014-05-21 Thread Mikhail Stepura
+1

-M

On May 21, 2014, at 6:19, Sylvain Lebresne  wrote:

> Since we closed the first try we've fixed even more bugs, with notable
> things
> like CASSANDRA-6285. In any case, that changelog is getting pretty damn
> long so
> I propose the following artifacts for release as 2.0.8.
> 
> sha1: 484d2816940cd2eb22d2365fcb376dd27e059e2e
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.8-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1012/org/apache/cassandra/apache-cassandra/2.0.8/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1012/
> 
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~slebresne/
> 
> The vote will be open for 72 hours (longer if needed).
> 
> [1]: http://goo.gl/EE3aHy (CHANGES.txt)
> [2]: http://goo.gl/dkl3Yu (NEWS.txt)



Re: [VOTE] Release Apache Cassandra 2.0.8 (strike 2)

2014-05-21 Thread Pavel Yaskevich
+1


On Wed, May 21, 2014 at 10:32 AM, Mikhail Stepura <
mikhail.step...@outlook.com> wrote:

> +1
>
> -M
>
> On May 21, 2014, at 6:19, Sylvain Lebresne  wrote:
>
> > Since we closed the first try we've fixed even more bugs, with notable
> > things
> > like CASSANDRA-6285. In any case, that changelog is getting pretty damn
> > long so
> > I propose the following artifacts for release as 2.0.8.
> >
> > sha1: 484d2816940cd2eb22d2365fcb376dd27e059e2e
> > Git:
> >
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.8-tentative
> > Artifacts:
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1012/org/apache/cassandra/apache-cassandra/2.0.8/
> > Staging repository:
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1012/
> >
> > The artifacts as well as the debian package are also available here:
> > http://people.apache.org/~slebresne/
> >
> > The vote will be open for 72 hours (longer if needed).
> >
> > [1]: http://goo.gl/EE3aHy (CHANGES.txt)
> > [2]: http://goo.gl/dkl3Yu (NEWS.txt)
>
>


Re: [VOTE] Release Apache Cassandra 2.0.8 (strike 2)

2014-05-21 Thread Jonathan Ellis
+1

On Wed, May 21, 2014 at 8:19 AM, Sylvain Lebresne  wrote:
> Since we closed the first try we've fixed even more bugs, with notable
> things
> like CASSANDRA-6285. In any case, that changelog is getting pretty damn
> long so
> I propose the following artifacts for release as 2.0.8.
>
> sha1: 484d2816940cd2eb22d2365fcb376dd27e059e2e
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.8-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1012/org/apache/cassandra/apache-cassandra/2.0.8/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1012/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~slebresne/
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/EE3aHy (CHANGES.txt)
> [2]: http://goo.gl/dkl3Yu (NEWS.txt)



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: [VOTE] Release Apache Cassandra 2.0.8 (strike 2)

2014-05-21 Thread Brandon Williams
+1


On Wed, May 21, 2014 at 8:19 AM, Sylvain Lebresne wrote:

> Since we closed the first try we've fixed even more bugs, with notable
> things
> like CASSANDRA-6285. In any case, that changelog is getting pretty damn
> long so
> I propose the following artifacts for release as 2.0.8.
>
> sha1: 484d2816940cd2eb22d2365fcb376dd27e059e2e
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.8-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1012/org/apache/cassandra/apache-cassandra/2.0.8/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1012/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~slebresne/
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/EE3aHy (CHANGES.txt)
> [2]: http://goo.gl/dkl3Yu (NEWS.txt)
>


Re: CQL unit tests vs dtests

2014-05-21 Thread Jonathan Ellis
On Wed, May 21, 2014 at 4:06 AM, Sylvain Lebresne  wrote:
>  Having tests that are
> intrinsically
> the same kind of tests in two places bugs me a bit more however.

I do think that CQL tests in general make more sense as unit tests,
but I'm not so anal that I'm going to insist on rewriting existing
ones.  But in theory, if I had an infinite army of interns, sure. I'd
have one of them do that. :)

But in the real world, compared to saying "we don't have any cql unit
tests, so we should always write them as dtests to be consistent" I
think having mixed unit + dtests is the lesser of evils.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: CMS GC / fragmentation / memtables etc

2014-05-21 Thread graham sanderson
Thanks so much for the info…

I will read the relevant code and/or JIRA issues since I’m not sure what the 
trigger for on/off heap is yet, and maybe open a ticket for recycling. 
Hopefully in the short term we can mitigate our fragmentation problem trivially 
- we upped JVM memory on some nodes last night and fingers crossed it’s been 
good so far (for those nodes)… but in either case we’ll be early testers of 2.1 
in beta.

Thanks,

Graham.

On May 21, 2014, at 2:20 AM, Benedict Elliott Smith 
 wrote:

> Graham,
> 
> This is largely fixed in 2.1 with the introduction of partially off-heap
> memtables - the slabs reside off-heap, so do not cause any GC issues.
> 
> As it happens the changes would also permit us to recycle on-heap slabs
> reasonably easily as well, so feel free to file a ticket for that, although
> it won't be back-ported to 2.0.
> 
> 
> On 21 May 2014 00:57, graham sanderson  wrote:
> 
>> So I’ve been tinkering a bit with CMS config because we are still seeing
>> fairly frequent full compacting GC due to fragmentation/promotion failure
>> 
>> As mentioned below, we are usually too fragmented to promote new in-flight
>> memtables.
>> 
>> This is likely caused by sudden write spikes (which we do have), though
>> actually the problems don’t generally happen at that time of our largest
>> write spikes (though any write spikes likely cause spill of both new
>> memtables along with many other new objects of unknown size into the
>> tenured gen, so they cause fragmentation if not immediate GC issue). We
>> have lots of things going on in this multi-tenant cluster (GC pauses are of
>> course extra bad, since they cause spike in hinted-handoff on other nodes
>> which were already busy etc…)
>> 
>> Anyway, considering possibilities:
>> 
>> 0) Try and make our application behavior more steady state - this is
>> probably possible, but there are lots of other things (e.g. compaction,
>> opscenter, repair etc.) which are both tunable and generally throttle-able
>> to think about too.
>> 1) Play with tweaking PLAB configs to see if we can ease fragmentation
>> (I’d be curious what the “crud” is in particular that is getting spilled -
>> presumably it is larger objects since it affects the binary tree of large
>> objects)
>> 2) Given the above, if we can guarantee even > 24 hours without full GC, I
>> don’t think we’d mind running a regular rolling re-start on the servers
>> during off hours (note usually the GCs don’t have a visible impact, but
>> when they hit multiple machines at once they can)
>> 3) Zing is seriously an option, if it would save us large amounts of
>> tuning, and constant worry about the “next” thing tweaking the allocation
>> patterns - does anyone have any experience with Zing & Cassandra?
>> 4) Given that we expect periodic bursts of writes,
>> memtable_total_space_in_mb is bounded, we are not actually short of memory
>> (it just gets fragmented), I’m wondering if anyone has played with pinning
>> (up to or initially?) that many 1MB chunks of memory via SlabAllocator and
>> re-using… It will get promoted once, and then these 1M chunks won’t be part
>> of the subsequent promotion hassle… it will probably also allow more crud
>> to die in eden under write load since we aren’t allocating these large
>> chunks in eden at the same time. Anyway, I had a little look at the code,
>> and the life cycle of memtables is not trivial, but was considering
>> attempting a patch to play with… anyone have any thoughts?
>> 
>> Basically in summary, the Slab allocator helps by allocating and freeing
>> lots of objects at the same time, however any time slabs are allocated
>> under load, we end up promoting them with whatever other live stuff in eden
>> is still there. If we only do this once and reuse the slabs, we are likely
>> to minimize our promotion problem later (at least for these large objects)
>> 
>> On May 16, 2014, at 9:37 PM, graham sanderson  wrote:
>> 
>>> Excellent - thank you…
>>> 
>>> On May 16, 2014, at 7:08 AM, Samuel CARRIERE 
>> wrote:
>>> 
 Hi,
 This is arena allocation of memtables. See here for more info:
 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance
 
 
 
 
 From: graham sanderson 
 To: dev@cassandra.apache.org,
 Date: 16/05/2014 14:03
 Subject: Things that are about 1M big
 
 
 
 So just throwing this out there for those for whom this might ring a
>> bell.
 
 I’m debugging some CMS memory fragmentation issues on 2.0.5 - and
 interestingly enough most of the objects giving us promotion failures
>> are
 of size 131074 (dwords) - GC logging obviously doesn’t say what those
>> are,
 but I’d wager money they are either 1M big byte arrays, or less likely
 256k entry object arrays backing large maps
 
 So not strictly critical to solving my problem, but I was wondering if
 anyone can think of any heap allocated C* objects which are (with no
 signif