Re: Which approach should we use for exposing metrics through Virtual tables?

2018-06-25 Thread dinesh.jo...@yahoo.com.INVALID
+1 on doing this on a case-by-case basis. The threadpool_metrics table looks 
reasonable. It's best not to shoehorn all metrics into a single table with all 
possible columns.
Dinesh 

On Friday, June 22, 2018, 8:11:33 AM PDT, Chris Lohfink wrote:
 
I think this can really be case by case. tpstats (I have a patch for that, by 
the way, in CASSANDRA-14523) is pretty intuitive in the way you listed. Table 
metrics are another beast and we will likely have many tables for them, i.e. a 
table for viewing latencies, caches, on-disk statistics... those can be 
discussed in their respective tickets.

Having a general table for viewing all metrics is, I think, an additional thing 
(i.e. like the table_stats below): not a general-purpose browsing tool, but an 
alternative to JMX. The custom tables that expose things in a nice (or at least 
attempted) intuitive manner won't have _all_ the metrics, and it's very likely 
that people will want those for reporting. Unfortunately, the metrics are 
currently not something you can readily expose in a single table: some have 
type/scope names while others have type/keyspace/scope, type/keyspace, or 
type/path/scope, so there will likely need to be some breakup here into things 
like "table_metrics", "client_metrics", "streaming_metrics", etc.
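
As a purely hypothetical illustration of that breakup (the table layouts and
column names here are my assumptions, not a proposal), the different metric
name structures would mostly show up as different primary keys:

VIRTUAL TABLE table_metrics (        -- metrics with a keyspace and a scope in their name
    keyspace_name TEXT,
    table_name TEXT,
    metric TEXT,
    value DOUBLE,
    count BIGINT,
    PRIMARY KEY (keyspace_name, table_name, metric)
)

VIRTUAL TABLE streaming_metrics (    -- metrics with only a scope
    scope TEXT,
    metric TEXT,
    value DOUBLE,
    count BIGINT,
    PRIMARY KEY (scope, metric)
)

The value columns are just placeholders; the real set would presumably mirror
whatever the general table below ends up exposing.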

I agree with Benedict that we should attempt not to expose the internal 
implementation details in the metrics, so that we're covered when there are 
changes again (there are always changes). However, it is kinda necessary at 
some level for this "generalized" metrics table. This is something the "custom" 
tables that expose data in the nodetool way don't have as many issues with, and 
those are what I personally have been working on first.

Chris

> On Jun 22, 2018, at 5:14 AM, Benjamin Lerer wrote:
> 
> Hi,
> 
> I would like to start working on exposing the metrics through virtual
> tables in CASSANDRA-14537.
> 
> We had some long discussions already in CASSANDRA-7622 about which schema to
> use to expose the metrics; unfortunately, in the end I was not truly
> convinced by any solution (including my own).
> 
> I would like to lay out the possible solutions with their limitations and
> advantages, to find out which solution people prefer, or to see
> if somebody can come up with another one.
> 
> In CASSANDRA-7622, Chris Lohfink proposed to expose the table metric using
> the following schema:
> 
> VIRTUAL TABLE table_stats (
>    keyspace_name TEXT,
>    table_name TEXT,
>    metric TEXT,
>    value DOUBLE,
>    fifteen_min_rate DOUBLE,
>    five_min_rate DOUBLE,
>    mean_rate DOUBLE,
>    one_min_rate DOUBLE,
>    p75th DOUBLE,
>    p95th DOUBLE,
>    p99th DOUBLE,
>    p999th DOUBLE,
>    min BIGINT,
>    max BIGINT,
>    mean DOUBLE,
>    std_dev DOUBLE,
>    median DOUBLE,
>    count BIGINT,
>    PRIMARY KEY( keyspace_name,  table_name , metric));
> 
> This approach has some advantages:
> 
>  - It is easy to use for all the metric categories that we have
>    (http://cassandra.apache.org/doc/latest/operating/metrics.html)
>  - The number of columns is relatively small and fits in the cqlsh console.
> 
> 
> The main disadvantage that I see with that approach is that it might not
> always be super readable. A Gauge or a Counter metric will have data in only
> one column and will return NULL for all the others. If you know precisely
> which metric is what and you only target that type of metric, you can build
> your query in such a way that the output is nicely formatted.
> Unfortunately, I do not expect every user to know which metric is what.
> The output format can also be problematic for monitoring tools, as they
> might need some extra logic to determine how to process each metric.
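>
> As an illustration (the rows below are hypothetical, using a latency Timer
> and a single-valued Gauge as the two extremes), a query that mixes metric
> types shows the problem:
>
>    SELECT metric, count, p99th, one_min_rate, value
>    FROM table_stats
>    WHERE keyspace_name = 'ks1' AND table_name = 't1';
>
>     metric                 | count | p99th | one_min_rate | value
>    ------------------------+-------+-------+--------------+-------
>     read_latency           | 15000 |  1.25 |         42.5 |  null
>     memtable_columns_count |  null |  null |         null |   120
>
> Only the columns that match a metric's type are populated; everything else
> comes back as NULL.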
> 
> My preferred approach was to use metrics as columns. For example, for the
> threadpool metrics it would have given the following schema:
> 
> VIRTUAL TABLE threadpool_metrics (
>    pool_name TEXT,
>    active INT,
>    pending INT,
>    completed BIGINT,
>    blocked BIGINT,
>    total_blocked BIGINT,
>    max_pool_size INT,
>    PRIMARY KEY( pool_name )
> )
> 
> That approach provides an output similar to that of nodetool
> tpstats, which is, in my opinion, more readable than the previous
> approach.
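>
> As a hypothetical example (the column names follow the schema above; the pool
> names are real thread pools but the numbers are made up):
>
>    SELECT pool_name, active, pending, completed, blocked
>    FROM threadpool_metrics;
>
>     pool_name          | active | pending | completed | blocked
>    --------------------+--------+---------+-----------+---------
>     MutationStage      |      2 |       0 |   1837294 |       0
>     ReadStage           |      1 |       0 |    992817 |       0
>     CompactionExecutor |      0 |       0 |     48211 |       0
>
> which mirrors the layout of nodetool tpstats.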
> 
> Unfortunately, it also has several serious drawbacks:
> 
>  - It works for a small set of metrics but does not work well for the
>  table or keyspace metrics, where we have more than 63 metrics. If you
>  split the histograms, meters and timers into multiple columns, you easily
>  reach more than a hundred columns. As Chris pointed out in CASSANDRA-7622,
>  it makes the whole thing unusable.
>  - It also does not work properly for sets of metrics like the commit log
>  metrics, because you cannot get a natural primary key and will have to
>  somehow create a fake one (see the sketch below).
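> 
> As a purely hypothetical sketch of that second point (the table and column
> names are mine, picked from a few of the commit log metrics), such a table
> boils down to a single row behind an artificial key:
> 
> VIRTUAL TABLE commitlog_metrics (
>    name TEXT,                  -- artificial key, always 'commitlog'
>    completed_tasks BIGINT,
>    pending_tasks BIGINT,
>    total_commit_log_size BIGINT,
>    PRIMARY KEY ( name )
> )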
> 
> 
> Nodetool solved the table and keyspace metric problems by splitting 

Re: Which approach should we use for exposing metrics through Virtual tables?

2018-06-25 Thread Aleksey Yeshchenko
I don’t think it’s really necessary. Or at least I’m not seeing why having 
individual, specialised virtual tables is not sufficient.

Nor do I think that we should expose everything that nodetool does, so IMO we 
shouldn’t aim for that. Even if the goal were to eventually deprecate and 
remove JMX/nodetool, we could do that without providing a CQL equivalent for 
everything it did.

—
AY

On 22 June 2018 at 16:11:43, Chris Lohfink (clohf...@apple.com) wrote:

However it is kinda necessary at some level for this "generalized" metrics.

Recommendation: running Cassandra in containers

2018-06-25 Thread Pierre Mavro
Hi,

Regarding the limits in Linux cgroups (as used in Kubernetes/Mesos), I
was wondering if there are any recommendations (I didn't find anything on
this topic).

In general, for instances running Java 8, it is advised to use these
options so that the JVM takes the cgroup environment into account:

-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap

Other tuning options for this exist (e.g. MaxRAMFraction); I was
wondering if there is any information about them somewhere.
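
As an illustration only (assuming Java 8u131 or later; the fraction value is an
arbitrary example, not a recommendation), the flags could be combined like
this, e.g. in conf/jvm.options:

# Size the heap from the cgroup memory limit rather than the host's RAM
-XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap
# Allow up to 1/2 of the cgroup limit for the heap (default MaxRAMFraction is 4)
-XX:MaxRAMFraction=2

Note that JVM ergonomics like these only apply when the heap is not already set
explicitly (e.g. via MAX_HEAP_SIZE / -Xmx in cassandra-env.sh).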

Thanks in advance

Pierre





Re: Recommendation: running Cassandra in containers

2018-06-25 Thread daemeon reiydelle
The use of Mesos in production for Cassandra was a failure due to the
inability to reserve network bandwidth, as Mesos can only allocate CPU and
memory profiles to a task. So, assuming you are either running on
dedicated/manually controlled VMs, or are not running a production/meaningful
data storage footprint, your questions are relevant. Otherwise Mesos is not
a viable solution. Note this same issue hit me at several clients with
Jenkins CI workloads as well. Look at K8s for these contra-Mesos scenarios.


<==>
"When I finish a project for a client, I have ... learned their issues with
life, their personal secrets, I have come to care about them.
Once the project is over, I lose them as if I lost family. For the client,
however, they’ve just dismissed a service worker." ...
"Thought on the Gig Economy" by Francine Brevetti

*Daemeon C.M. Reiydelle*

*email: daeme...@gmail.com *
*San Francisco 1.415.501.0198/London 44 020 8144 9872/Skype
daemeon.c.m.reiydelle*


On Mon, Jun 25, 2018 at 8:12 AM, Pierre Mavro  wrote:

> Hi,
>
> Regarding the limits in Linux cgroups (as used in Kubernetes/Mesos), I
> was wondering if there are any recommendations (I didn't find anything on
> this topic).
>
> In general, for instances running Java 8, it is advised to use these
> options so that the JVM takes the cgroup environment into account:
>
> -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
>
> Other tuning options for this exist (e.g. MaxRAMFraction); I was
> wondering if there is any information about them somewhere.
>
> Thanks in advance
>
> Pierre
>
>
>
>


CVE-2018-8016 on Apache Cassandra

2018-06-25 Thread Nate McCall
CVE-2018-8016 describes an issue with the default configuration of
Apache Cassandra releases 3.8 through 3.11.1, which binds an
unauthenticated JMX/RMI interface to all network interfaces, allowing
attackers to execute arbitrary Java code via an RMI request. This
issue is a regression of the previously disclosed CVE-2015-0225.

The regression was introduced in
https://issues.apache.org/jira/browse/CASSANDRA-12109. The fix for the
regression is implemented in
https://issues.apache.org/jira/browse/CASSANDRA-14173. This fix is
contained in the 3.11.2 release of Apache Cassandra.

- The Apache Cassandra PMC




Re: Tombstone passed GC period causes un-repairable inconsistent data

2018-06-25 Thread Jay Zhuang
Thanks Jeff. CASSANDRA-6434 is exactly the issue. Do we have a plan/ticket
to get rid of GCGS (and make only_purge_repaired_tombstones the default)? Will
it be covered in CASSANDRA-14145?

I created ticket CASSANDRA-14543 for replaying purgeable tombstone hints,
which doesn't fix the root cause but reduces the chance of hitting this
issue; please comment if you have any suggestions.

On Thu, Jun 21, 2018 at 12:55 PM Jeff Jirsa  wrote:

> Think he's talking about
> https://issues.apache.org/jira/browse/CASSANDRA-6434
>
> Doesn't solve every problem if you don't run repair at all, but if you're
> not running repairs, you're nearly guaranteed problems with resurrection
> after gcgs anyway.
>
>
>
> On Thu, Jun 21, 2018 at 11:33 AM, Jay Zhuang wrote:
>
> > Yes, I also agree that the user should run (incremental) repair within GCGS
> > to prevent it from happening.
> >
> > @Sankalp, would you please point us to the patch you mentioned from Marcus?
> > The problem is basically the same as
> > https://issues.apache.org/jira/browse/CASSANDRA-14145
> >
> > CASSANDRA-11427 is actually the opposite of this problem. Since purgeable
> > tombstones are repaired there, this un-repairable problem cannot be
> > reproduced. I tried 2.2.5 (before the fix): it is able to repair the
> > purgeable tombstone from node1 to node2, so the data is deleted as expected.
> > But that doesn't mean it's the right behaviour, as it also causes purgeable
> > tombstones to keep bouncing around the nodes.
> > I think https://issues.apache.org/jira/browse/CASSANDRA-14145 will fix the
> > problem by detecting the repaired/un-repaired data.
> >
> > How about having hint dispatch deliver/replay purgeable (not live)
> > tombstones? It would reduce the chance of hitting this issue, especially
> > when GCGS < the hinted handoff window.
> >
> > On Wed, Jun 20, 2018 at 9:36 AM sankalp kohli wrote:
> >
> > > I agree with Stefan that we should use incremental repair and use patches
> > > from Marcus to drop tombstones only from repaired data.
> > > Regarding deep repair, you can bump the gc grace and run the repair. The
> > > issue will be that you will stream a lot of data, and your blocking read
> > > repairs will also go up when you bump the gc grace to a higher value.
> > >
> > > On Wed, Jun 20, 2018 at 1:10 AM Stefan Podkowinski wrote:
> > >
> > > > Sounds like an older issue that I tried to address two years ago:
> > > > https://issues.apache.org/jira/browse/CASSANDRA-11427
> > > >
> > > > As you can see, the result hasn't been as expected and we got some
> > > > unintended side effects from the patch. I'm not sure I'd be willing
> > > > to give this another try, considering the behaviour we'd like to fix in
> > > > the first place is rather harmless and the read repairs shouldn't happen
> > > > at all for any users who regularly run repairs within gc_grace.
> > > >
> > > > What I'd suggest is to think more in the direction of a
> > > > post-full-repair world and to fully embrace incremental repairs, as
> > > > fixed by Blake in 4.0. In that case, we should stop doing read repairs
> > > > at all for repaired data, as described in
> > > > https://issues.apache.org/jira/browse/CASSANDRA-13912. RRs are certainly
> > > > useful, but can be very risky if not very, very carefully implemented. So
> > > > I'm wondering if we shouldn't disable RRs for everything but unrepaired
> > > > data. I'd btw also be interested to hear any opinions on this in the
> > > > context of transient replicas.
> > > >
> > > >
> > > > On 20.06.2018 03:07, Jay Zhuang wrote:
> > > > > Hi,
> > > > >
> > > > > We know that deleted data may re-appear if repair is not run within
> > > > > gc_grace_seconds. When the tombstone is not propagated to all nodes,
> > > > > the data will re-appear. But it also causes the following 2 issues
> > > > > before the tombstone is compacted away:
> > > > > a. inconsistent query results
> > > > >
> > > > > With consistency level ONE or QUORUM, it may or may not return the
> > > > > value.
> > > > > b. lots of read repairs, but nothing gets repaired
> > > > >
> > > > > With consistency level ALL, it always triggers a read repair.
> > > > > With consistency level QUORUM, it also very likely (2/3) causes a read
> > > > > repair. But it doesn't repair the data, so it causes a repair every
> > > > > time.
> > > > >
> > > > >
> > > > > Here are the reproducing steps:
> > > > >
> > > > > 1. Create a 3 nodes cluster
> > > > > 2. Create a table (with small gc_grace_seconds):
> > > > >
> > > > > CREATE KEYSPACE foo WITH replication = {'class': 'SimpleStrategy',
> > > > > 'replication_factor': 3};
> > > > > CREATE TABLE foo.bar (
> > > > >     id int PRIMARY KEY,
> > > > >     name text
> > > > > ) WITH gc_grace_seconds=30;
> > > > >
> > > > > 3. Insert data with consistency all:
> > > > >
> > > > > INSERT INTO foo.bar (id, name) VALUES(1, 'cstar');
>