[GitHub] cassandra pull request: Fix bug for running on ppc64

2016-03-23 Thread mlpesant
GitHub user mlpesant opened a pull request:

https://github.com/apache/cassandra/pull/65

Fix bug for running on ppc64

I was getting errors like the one below when trying to build on PowerPC
(ppc64). I fixed this by adding an encoding setting (UTF-8) to the javac
task in build.xml.
Thanks.

```
[javac] /tmp/pkb/cassandra/test/unit/org/apache/cassandra/cql3/validation/operations/UpdateTest.java:38: error: unmappable character for encoding ASCII
[javac] execute("INSERT INTO %s (k, c, v, s) VALUES ('??', '??', '??', {'??'})");
```
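
For reference, a minimal sketch of the kind of build.xml change that fixes this
class of error. The property names below are placeholders, not Cassandra's
actual ones; the relevant piece is the `encoding` attribute on Ant's built-in
javac task, which stops javac from falling back to the platform default
encoding (ASCII on this ppc64 box):

```
<!-- Sketch only: read source files as UTF-8 regardless of the platform
     default encoding, so non-ASCII string literals in tests such as
     UpdateTest.java compile cleanly. Property names are hypothetical. -->
<javac srcdir="${build.src}" destdir="${build.classes}" encoding="utf-8"/>
```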


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mlpesant/cassandra ppc64

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/65.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #65


commit 605d28e69463280018465a12e40ea16cc2d1205f
Author: mlpesant 
Date:   2016-03-23T16:38:12Z

Fix bug for running on ppc64




---


[GitHub] cassandra pull request: Fix bug for running on ppc64

2016-03-23 Thread verma7
GitHub user verma7 commented on the pull request:

https://github.com/apache/cassandra/pull/65#issuecomment-200539241
  
Hi @mlpesant, the steps for contributing to Cassandra are available on the 
wiki (https://wiki.apache.org/cassandra/HowToContribute).

Notably, we don't use pull requests. It would be great if you could close 
your pull request on GitHub. Please create a new JIRA ticket with a patch 
attached, and somebody should be able to review and merge it.

Disclaimer: I am also new to the Cassandra dev community.



---


[GitHub] cassandra pull request: Fix bug for running on ppc64

2016-03-23 Thread mlpesant
GitHub user mlpesant closed the pull request at:

https://github.com/apache/cassandra/pull/65


---


[GitHub] cassandra pull request: Fix bug for running on ppc64

2016-03-23 Thread mlpesant
GitHub user mlpesant commented on the pull request:

https://github.com/apache/cassandra/pull/65#issuecomment-200555378
  
Moved this request to JIRA: https://issues.apache.org/jira/browse/THRIFT-3754


---


Re: How to measure the write amplification of C*?

2016-03-23 Thread Dikang Gu
As a follow-up, I'm going to write a simple patch to expose the number of
bytes flushed from the memtable via JMX, so that we can easily monitor it.

Here is the JIRA: https://issues.apache.org/jira/browse/CASSANDRA-11420
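
As a hedged illustration (not the actual patch), this is roughly how such a
metric could be polled once it is exposed over Cassandra's JMX port; the
ObjectName below is an assumed placeholder, not necessarily the name the
CASSANDRA-11420 patch ends up registering:

```
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class FlushedBytesProbe {
    public static void main(String[] args) throws Exception {
        // Cassandra exposes JMX on port 7199 by default.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Hypothetical metric name, for illustration only; the real name
            // is whatever the patch registers for the flushed-bytes counter.
            ObjectName name = new ObjectName(
                    "org.apache.cassandra.metrics:type=Table,"
                            + "keyspace=ks1,scope=tbl1,name=BytesFlushed");
            Object flushed = mbs.getAttribute(name, "Count");
            System.out.println("Bytes flushed from memtable: " + flushed);
        }
    }
}
```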

On Thu, Mar 10, 2016 at 12:55 PM, Jack Krupansky wrote:

> The doc does say this:
>
> "A log-structured engine that avoids overwrites and uses sequential IO to
> update data is essential for writing to solid-state disks (SSD) and hard
> disks (HDD) On HDD, writing randomly involves a higher number of seek
> operations than sequential writing. The seek penalty incurred can be
> substantial. Using sequential IO (thereby avoiding write amplification
>  and disk failure),
> Cassandra accommodates inexpensive, consumer SSDs extremely well."
>
> I presume that write amplification argues for placing the commit log on a
> separate SSD device. That should probably be mentioned.
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 12:52 PM, Matt Kennedy wrote:
>
>> It isn't really the data written by the host that you're concerned with,
>> it's the data written by your application. I'd start by instrumenting your
>> application tier to tally up the size of the values that it writes to C*.
>>
>> However, it may not be very useful to have this value, because you can't do
>> much with the information it provides. It is probably a better idea to
>> track the bytes written to flash for each drive, so that you know the
>> physical endurance of that type of drive given your workload. Unfortunately,
>> the rated TBW endurance for the drive may not be very useful either, given
>> the difference between the synthetic workload used to create those ratings
>> and the workload that Cassandra produces in your particular case. You can
>> find out more about those ratings here:
>> https://www.jedec.org/standards-documents/docs/jesd219a
>>
>>
>> Matt Kennedy
>>
>> Sr. Product Manager, DSE Core
>>
>> matt.kenn...@datastax.com | Public Calendar 
>>
>> *DataStax Enterprise - the database for cloud applications.*
>>
>> On Thu, Mar 10, 2016 at 11:44 AM, Dikang Gu wrote:
>>
>>> Hi Matt,
>>>
>>> Thanks for the detailed explanation! Yes, this is exactly what I'm
>>> looking for: "write amplification = data written to flash / data written
>>> by the host".
>>>
>>> We are heavily using LCS in production, so I'd like to figure out the
>>> amplification caused by that and see what we can do to optimize it. I
>>> have the metrics for "data written to flash", and I'm wondering whether
>>> there is an easy way to get the "data written by the host" on each C* node.
>>>
>>> Thanks
>>>
>>> On Thu, Mar 10, 2016 at 8:48 AM, Matt Kennedy wrote:
>>>
>>>> TL;DR - Cassandra actually causes a ton of write amplification but it
>>>> doesn't freaking matter any more. Read on for details...
>>>>
>>>> That slide deck does have a lot of very good information on it, but
>>>> unfortunately I think it has led to a fundamental misunderstanding about
>>>> Cassandra and write amplification. In particular, slide 51 vastly
>>>> oversimplifies the situation.
>>>>
>>>> The Wikipedia definition of write amplification looks at this from the
>>>> perspective of the SSD controller:
>>>> https://en.wikipedia.org/wiki/Write_amplification#Calculating_the_value
>>>>
>>>> In short, write amplification = data written to flash / data written by
>>>> the host.
>>>>
>>>> So, if I write 1MB in my application, but the SSD has to write my 1MB
>>>> plus rearrange another 1MB of data in order to make room for it, then I've
>>>> written a total of 2MB and my write amplification is 2x.
>>>>
>>>> In other words, it is measuring how much extra the SSD controller has
>>>> to write in order to do its own housekeeping.
>>>>
>>>> However, the Wikipedia definition is a bit more constrained than how
>>>> the term is used in the storage industry. The whole point of looking at
>>>> write amplification is to understand the impact that a particular workload
>>>> is going to have on the underlying NAND by virtue of the data written. So a
>>>> definition of write amplification that is a little more relevant to the
>>>> context of Cassandra is this:
>>>>
>>>> write amplification = data written to flash / data written to the database
>>>>
>>>> So, while the fact that we only sequentially write large immutable
>>>> SSTables does in fact mean that controller-level write amplification is
>>>> near zero, compaction comes along and completely destroys that tidy little
>>>> story. Think about it: every time a compaction re-writes data that has
>>>> already been written, we are creating a lot of application-level write
>>>> amplification. Different compaction strategies and the workload itself
>>>> affect what the real application-level write amp is, but generally
>>>> speaking, LCS is the worst, followed by STCS, and DTCS causes the least
>>>> write amp. To measure this, you can usually use smartctl (may be another
>>>> mechanism
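
To make the two ratios above concrete, here is a tiny arithmetic sketch with
made-up numbers (the SMART attribute and sector size are assumptions for
illustration, not measurements from this thread):

```
public class WriteAmpExample {
    public static void main(String[] args) {
        // Hypothetical SMART reading: many drives report attribute 241
        // ("Total_LBAs_Written"); multiplying by the logical sector size
        // approximates total bytes written to flash.
        long lbasWritten = 4_000_000_000L; // assumed smartctl value
        long sectorSize = 512;             // bytes per LBA (assumed)
        double bytesToFlash = (double) lbasWritten * sectorSize;

        // Bytes the application wrote to the database, tallied at the
        // application tier as suggested earlier in the thread.
        double bytesToDatabase = 128e9; // assumed

        // Application-level write amplification, per the definition above:
        // WA = data written to flash / data written to the database.
        System.out.printf("write amplification = %.2fx%n",
                bytesToFlash / bytesToDatabase);
    }
}
```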

Counter values become under-counted when running repair.

2016-03-23 Thread Dikang Gu
Hello there,

We are experimenting with counters in Cassandra 2.2.5. Our setup is 6 nodes
across three different regions, with a replication factor of 2 in each region.
Basically, each node holds a full copy of the data.

We are doing 30k/s counter increments/decrements per node, and at the same
time we are double-writing to our MySQL tier, so that we can measure the
accuracy of the C* counters against MySQL.
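
For context, the counter writes involved look roughly like this (the table
and column names are made-up for illustration, not our actual schema):

```
-- Hypothetical counter table; in CQL, every non-key column of a
-- counter table must be a counter.
CREATE TABLE app.page_hits (
    page text PRIMARY KEY,
    hits counter
);

-- Counters are never INSERTed, only incremented or decremented:
UPDATE app.page_hits SET hits = hits + 1 WHERE page = 'home';
UPDATE app.page_hits SET hits = hits - 1 WHERE page = 'home';
```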

The experiment results were great at the beginning: the counter values in C*
and MySQL were very close, with a difference of less than 0.1%.

But when we started running repair on one node, the counter values in C*
became much lower than the values in MySQL; the difference grew to more
than 1%.

My question: is it a known problem that counter values become under-counted
while repair is running? Should we avoid running repair on counter tables?

Thanks.

-- 
Dikang