Re: Evolving the client protocol

2018-04-24 Thread Avi Kivity

I have not asked this list to do any work on the drivers.


If Cassandra agrees to Scylla protocol changes (either proactively or 
retroactively), the benefit to Cassandra is that when the drivers are 
changed (by the driver maintainers or by Scylla developers), 
Cassandra developers need not do additional work to update the drivers. 
So there is less work for you, in the future, if those features are of 
interest to you.



On 2018-04-24 02:13, Jonathan Haddad wrote:

 From where I stand it looks like you've got only two options for any
feature that involves updating the protocol:

1. Don't build the feature
2. Build it in Cassandra & ScyllaDB, update the drivers accordingly

I don't think you have a third option, which is to build it only in ScyllaDB,
because that means you have to fork *all* the drivers and make it work,
then maintain them.  Your business model appears to be built on not doing
any of the driver work yourself, and you certainly aren't giving back to
the open source community via a permissive license on ScyllaDB itself, so
I'm a bit lost here.

To me it looks like you're asking a bunch of volunteers that work on
Cassandra to accommodate you.  What exactly do we get out of this
relationship?  What incentive do I or anyone else have to spend time
helping you instead of working on something that interests me?

Jon


On Mon, Apr 23, 2018 at 7:59 AM Ben Bromhead  wrote:


This doesn't work without additional changes, for RF>1. The token ring
could place two replicas of the same token range on the same physical
server, even though those are two separate cores of the same server. You
could add another element to the hierarchy (cluster -> datacenter -> rack
-> node -> core/shard), but that generates unneeded range movements when a
node is added.

I have seen rack awareness used/abused to solve this.


But then you lose real rack awareness. It's fine for a quick hack, but
not a long-term solution.

(it also creates a lot more tokens, something nobody needs)


I'm having trouble understanding how you lose "real" rack awareness, as
these shards are in the same rack anyway, because the address and port are
on the same server in the same rack. So it behaves as expected. Could you
explain a situation where the shards on a single server would be in
different racks (or fault domains)?

If you wanted to support a situation where you have a single rack per DC
for simple deployments, extending NetworkTopologyStrategy to behave the way
it did before https://issues.apache.org/jira/browse/CASSANDRA-7544 with
respect to treating InetAddresses as servers rather than the address and
port would be simple. Both this implementation in Apache Cassandra and the
respective load balancing classes in the drivers are explicitly designed to
be pluggable so that would be an easier integration point for you.

I'm not sure how it creates more tokens? If a server normally owns 256
tokens, each shard on a different port would just advertise ownership of
256/# of cores (e.g. 4 tokens if you had 64 cores).



Regards,
Ariel


On Apr 22, 2018, at 8:26 AM, Avi Kivity  wrote:




On 2018-04-19 21:15, Ben Bromhead wrote:
Re #3:

Yup I was thinking each shard/port would appear as a discrete server
to the client.

This doesn't work without additional changes, for RF>1. The token ring
could place two replicas of the same token range on the same physical
server, even though those are two separate cores of the same server. You
could add another element to the hierarchy (cluster -> datacenter -> rack
-> node -> core/shard), but that generates unneeded range movements when a
node is added.

If the per port suggestion is unacceptable due to hardware requirements,
remembering that Cassandra is built with the concept of scaling
*commodity* hardware horizontally, you'll have to spend your time and
energy convincing the community to support a protocol feature it has no
(current) use for or find another interim solution.

Those servers are commodity servers (not x86, but still commodity). In
any case 60+ logical cores are common now (hello AWS i3.16xlarge or even
i3.metal), and we can only expect logical core count to continue to
increase (there are 48-core ARM processors now).

Another way would be to build support and consensus around a clear
technical need in the Apache Cassandra project as it stands today.

One way to build community support might be to contribute an Apache
licensed thread per core implementation in Java that matches the protocol
change and shard concept you are looking for ;P

I doubt I'll survive the egregious top-posting that is going on in this
list.

On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg wrote:

Hi,

So at technical level I don't understand this yet.

So you have a database consisting of single threaded shards and a socket
for accept that is generating TCP connections and in advance you don't know
which connection is going to send messages to which shard.

What is the mechanism by wh

Re: Evolving the client protocol

2018-04-24 Thread Avi Kivity



On 2018-04-23 17:59, Ben Bromhead wrote:


>> This doesn't work without additional changes, for RF>1. The
token ring could place two replicas of the same token range on the
same physical server, even though those are two separate cores of
the same server. You could add another element to the hierarchy
(cluster -> datacenter -> rack -> node -> core/shard), but that
generates unneeded range movements when a node is added.
> I have seen rack awareness used/abused to solve this.
>

But then you lose real rack awareness. It's fine for a quick hack, but
not a long-term solution.

(it also creates a lot more tokens, something nobody needs)


I'm having trouble understanding how you lose "real" rack awareness, 
as these shards are in the same rack anyway, because the address and 
port are on the same server in the same rack. So it behaves as 
expected. Could you explain a situation where the shards on a single 
server would be in different racks (or fault domains)?


You're right - it continues to work.



If you wanted to support a situation where you have a single rack per 
DC for simple deployments, extending NetworkTopologyStrategy to behave 
the way it did before 
https://issues.apache.org/jira/browse/CASSANDRA-7544 with respect to 
treating InetAddresses as servers rather than the address and port 
would be simple. Both this implementation in Apache Cassandra and the 
respective load balancing classes in the drivers are explicitly 
designed to be pluggable so that would be an easier integration point 
for you.


I'm not sure how it creates more tokens? If a server normally owns 256 
tokens, each shard on a different port would just advertise ownership 
of 256/# of cores (e.g. 4 tokens if you had 64 cores).


Having just 4 tokens results in imbalance. CASSANDRA-7032 mitigates it, 
but only for one replication factor, and doesn't work for decommission.


(and if you have 60 lcores then you get between 4 and 5 tokens per 
lcore, which is a 20% imbalance right there)
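The arithmetic behind that imbalance can be checked with a short sketch, using the hypothetical figures from this thread (256 tokens, 60 logical cores):

```java
// Sketch of the imbalance Avi describes: 256 vnode tokens cannot divide
// evenly over 60 shards, so some shards own 5 tokens and others only 4.
// A 4-token shard owns 20% less of the ring than a 5-token shard.
public class TokenImbalance {
    public static void main(String[] args) {
        int tokens = 256, shards = 60;
        int[] owned = new int[shards];
        for (int t = 0; t < tokens; t++) {
            owned[t % shards]++;              // round-robin token assignment
        }
        int min = Integer.MAX_VALUE, max = 0;
        for (int c : owned) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        // 256 = 4 * 60 + 16, so 16 shards get 5 tokens and 44 get 4.
        System.out.printf("min=%d max=%d imbalance=%.0f%%%n",
                min, max, 100.0 * (max - min) / max);
    }
}
```

This ignores vnode placement randomness, which in practice makes per-shard ownership even less uniform than the round-robin best case shown here.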




> Regards,
> Ariel
>
>> On Apr 22, 2018, at 8:26 AM, Avi Kivity mailto:a...@scylladb.com>> wrote:
>>
>>
>>
>>> On 2018-04-19 21:15, Ben Bromhead wrote:
>>> Re #3:
>>>
>>> Yup I was thinking each shard/port would appear as a discrete server
>>> to the client.
>> This doesn't work without additional changes, for RF>1. The
token ring could place two replicas of the same token range on the
same physical server, even though those are two separate cores of
the same server. You could add another element to the hierarchy
(cluster -> datacenter -> rack -> node -> core/shard), but that
generates unneeded range movements when a node is added.
>>
>>> If the per port suggestion is unacceptable due to hardware requirements,
>>> remembering that Cassandra is built with the concept of scaling *commodity*
>>> hardware horizontally, you'll have to spend your time and energy convincing
>>> the community to support a protocol feature it has no (current) use for or
>>> find another interim solution.
>> Those servers are commodity servers (not x86, but still commodity). In
>> any case 60+ logical cores are common now (hello AWS i3.16xlarge or even
>> i3.metal), and we can only expect logical core count to continue to
>> increase (there are 48-core ARM processors now).
>>
>>> Another way would be to build support and consensus around a clear
>>> technical need in the Apache Cassandra project as it stands today.
>>>
>>> One way to build community support might be to contribute an Apache
>>> licensed thread per core implementation in Java that matches the protocol
>>> change and shard concept you are looking for ;P
>> I doubt I'll survive the egregious top-posting that is going on in this list.
>>
>>>
 On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg
mailto:ar...@weisberg.ws>> wrote:

 Hi,

 So at technical level I don't understand this yet.

 So you have a database consisting of single threaded shards and a socket
 for accept that is generating TCP connections and in advance you don't know
 which connection is going to send messages to which shard.

 What is the mechanism by which you get the packets for a given TCP
 connection delivered to a specific core? I know that a given TCP connection
 will normally have all of its packets delivered to the same queue from the
 NIC because the tuple of source address + port and destination address +
 port is typically hashed to pick one of the queues the NIC presents. I
 might have the contents of the tuple slightly wrong, but it always includes
 a component you don't get to control.

 Since it's hashing how do you man
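The receive-side scaling (RSS) mechanism Ariel describes can be modeled with a toy sketch. Real NICs use a Toeplitz hash over the connection 4-tuple with a configurable key; `Objects.hash` below is only an illustrative stand-in, and the addresses and ports are invented:

```java
import java.util.Objects;

// Toy model of receive-side scaling: the NIC hashes the TCP 4-tuple and
// uses the hash to pick one of its rx queues, so every packet of a given
// connection lands on the same queue (and typically the same core). The
// client does not control its ephemeral source port, which is the crux of
// the question: the hash outcome cannot be steered to a chosen shard.
public class RssSketch {
    static int queueFor(String srcIp, int srcPort,
                        String dstIp, int dstPort, int queues) {
        int h = Objects.hash(srcIp, srcPort, dstIp, dstPort);
        return Math.floorMod(h, queues);   // stable: same tuple -> same queue
    }

    public static void main(String[] args) {
        int q1 = queueFor("10.0.0.1", 51234, "10.0.0.2", 9042, 8);
        int q2 = queueFor("10.0.0.1", 51234, "10.0.0.2", 9042, 8);
        System.out.println("stable=" + (q1 == q2) + " queue=" + q1);
    }
}
```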

Re: Evolving the client protocol

2018-04-24 Thread Eric Stevens
Let me just say that as an observer to this conversation -- and someone
who believes that compatibility, extensibility, and frankly competition
bring out the best in products -- I'm fairly surprised and disappointed
with the apparent hostility many community members have shown toward a
sincere attempt by another open source product to find common ground here.

Yes, Scylla has a competing OSS project (albeit under a different
license).  They also have a business built around it.  It's hard for me to
see that as dramatically different than the DataStax relationship to this
community.  Though I would love to be shown why.


Re: Evolving the client protocol

2018-04-24 Thread Jonathan Haddad
DataStax invested millions of dollars into Cassandra, tens of thousands of
man hours, hosted hundreds of events and has been a major factor in the
success of the project.

ScyllaDB wants us to change the C* protocol in order to improve features in
a competing database which contributes nothing back to the Cassandra
community.

Seems a little different to me.

On Tue, Apr 24, 2018 at 8:30 AM Eric Stevens  wrote:

> Let me just say that as an observer to this conversation -- and someone
> who believes that compatibility, extensibility, and frankly competition
> bring out the best in products -- I'm fairly surprised and disappointed
> with the apparent hostility many community members have shown toward a
> sincere attempt by another open source product to find common ground here.
>
> Yes, Scylla has a competing OSS project (albeit under a different
> license).  They also have a business built around it.  It's hard for me to
> see that as dramatically different than the DataStax relationship to this
> community.  Though I would love to be shown why.
>


Re: Evolving the client protocol

2018-04-24 Thread Russell Bateman

Eric,

You have to understand the poisonous GPL. It's very different from 
Apache licensing in the sense that, roughly speaking, you're welcome to 
contribute to Scylla, but legally barred from distributing it with or 
inside any product you base on it unless your product source code is 
also open or you contract with ScyllaDB. The objections raised by some 
in this thread are based on the inequality of contribution in the two models.


On 04/24/2018 09:30 AM, Eric Stevens wrote:

Let me just say that as an observer to this conversation -- and someone
who believes that compatibility, extensibility, and frankly competition
bring out the best in products -- I'm fairly surprised and disappointed
with the apparent hostility many community members have shown toward a
sincere attempt by another open source product to find common ground here.

Yes, Scylla has a competing OSS project (albeit under a different
license).  They also have a business built around it.  It's hard for me to
see that as dramatically different than the DataStax relationship to this
community.  Though I would love to be shown why.





Re: Evolving the client protocol

2018-04-24 Thread Avi Kivity



On 2018-04-24 04:18, Nate McCall wrote:

Folks,
Before this goes much further, let's take a step back for a second.

I am hearing the following: Folks are fine with CASSANDRA-14311 and
CASSANDRA-2848 *BUT* they don't make much sense from the project's
perspective without a reference implementation. I think the shard
concept is too abstract for the project right now, so we should
probably set that one aside.

Dor and Avi, I appreciate you both engaging directly on this. Where
can we find common ground on this?



I started with three options:

1. Scylla (or other protocol implementers) contribute spec changes, and 
each implementer implements them on their own


This was rejected.

2. Scylla defines and implements spec changes on its own, and when 
Cassandra implements similar changes, it will retroactively apply the 
Scylla change if it makes technical sense


IOW, no gratuitous divergence, but no hard commitment either.

I received no feedback on this.

3. No cooperation.

This is the fall-back option which I would like to avoid if possible. 
Its main advantage is that it avoids long email threads and flamewars.


There was also a suggestion made in this thread:

4. Scylla defines spec changes and also implements them for Cassandra

That works for some changes but not all (for example, thread-per-core 
awareness, or changes that require significant effort). I would like to 
find a way that works for all of the changes that we want to make.



-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Evolving the client protocol

2018-04-24 Thread Dor Laor
The main point is that we made a strategic decision to invest in the
client side. We always wanted to get to this state, but for natural
reasons it took us a while.
The client-side changes aren't just about a small feature here and there,
or stopping at thread per core. Think about the changes that will come in
a 3-5 year scope.

Avi had a great idea about changing the underlying transport from TCP to
UDP. It removes head-of-line blocking, removes limitations on the number
of sockets, and since clients retransmit on timeouts, it will improve
performance a lot.
Another change is in the CDC domain.

Some other idea that comes to my mind is to use an IDL to automatically
generate bindings for different languages, to improve reuse and
standardization. Scylla automatically generated its internal RPC code
from an IDL, and modern implementations should take this path, especially
with a polyglot of client languages. Believe me, it sounds more and more
compelling to me as an easier path.
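As a hedged illustration of the IDL idea: a message layout declared once as a list of field widths can drive a mechanical encoder, instead of hand-written serialization in every driver. The field names and widths below are invented for the example and are not the actual native-protocol frame format:

```java
import java.nio.ByteBuffer;

// Illustrative-only sketch of the IDL idea: declare a message layout once
// (field widths in bytes, big-endian) and derive the encoder from it. A
// real IDL toolchain would emit full bindings per driver language; this
// "header" is invented and is NOT the actual CQL frame layout.
public class IdlSketch {
    static byte[] encode(int[] widths, long[] values) {
        int size = 0;
        for (int w : widths) size += w;
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (int i = 0; i < widths.length; i++) {
            // Emit each value big-endian in its declared width.
            for (int b = widths[i] - 1; b >= 0; b--) {
                buf.put((byte) (values[i] >>> (8 * b)));
            }
        }
        return buf.array();
    }

    public static void main(String[] args) {
        // "IDL": version (1 byte), stream id (2 bytes), opcode (1 byte)
        int[] header = {1, 2, 1};
        byte[] wire = encode(header, new long[] {4, 0x0102, 7});
        System.out.println(wire.length + " bytes on the wire");
    }
}
```

The appeal is that the decoder, and equivalents in every driver language, can be generated from the same declaration, so a protocol change is made in exactly one place.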




On Tue, Apr 24, 2018 at 9:26 AM, Avi Kivity  wrote:

>
>
> On 2018-04-24 04:18, Nate McCall wrote:
>
>> Folks,
>> Before this goes much further, let's take a step back for a second.
>>
>> I am hearing the following: Folks are fine with CASSANDRA-14311 and
>> CASSANDRA-2848 *BUT* they don't make much sense from the project's
>> perspective without a reference implementation. I think the shard
>> concept is too abstract for the project right now, so we should
>> probably set that one aside.
>>
>> Dor and Avi, I appreciate you both engaging directly on this. Where
>> can we find common ground on this?
>>
>>
> I started with three options:
>
> 1. Scylla (or other protocol implementers) contribute spec changes, and
> each implementer implements them on their own
>
> This was rejected.
>
> 2. Scylla defines and implements spec changes on its own, and when
> Cassandra implements similar changes, it will retroactively apply the
> Scylla change if it makes technical sense
>
> IOW, no gratuitous divergence, but no hard commitment either.
>
> I received no feedback on this.
>
> 3. No cooperation.
>
> This is the fall-back option which I would like to avoid if possible. Its
> main advantage is that it avoids long email threads and flamewars.
>
> There was also a suggestion made in this thread:
>
> 4. Scylla defines spec changes and also implements them for Cassandra
>
> That works for some changes but not all (for example, thread-per-core
> awareness, or changes that require significant effort). I would like to
> find a way that works for all of the changes that we want to make.
>
>
>
>
>


Re: Optimizing queries for partition keys

2018-04-24 Thread Sam Klock
Thanks.  For those interested: opened CASSANDRA-14415.

SK

On 2018-04-19 06:04, Benjamin Lerer wrote:
> Hi Sam,
> 
> Your finding is interesting. Effectively, if the number of bytes to skip is
> larger than the remaining bytes in the buffer + the buffer size it could be
> faster to use seek.
> Feel free to open a JIRA ticket and attach your patch. It will be great if
> you could add to the ticket your table schema as well
>  as some information on your environment (e.g. disk type).
> 
> On Tue, Apr 17, 2018 at 8:53 PM, Sam Klock  wrote:
> 
>> Thanks (and apologies for the delayed response); that was the kind of
>> feedback we were looking for.
>>
>> We backported the fix for CASSANDRA-10657 to 3.0.16, and it partially
>> addresses our problem in the sense that it does limit the data sent on
>> the wire.  The performance is still extremely poor, however, due to the
>> fact that Cassandra continues to read large volumes of data from disk.
>> (We've also confirmed this behavior in 3.11.2.)
>>
>> With a bit more investigation, we now believe the problem (after
>> CASSANDRA-10657 is applied) is in RebufferingInputStream.skipBytes(),
>> which appears to read bytes in order to skip them.  The subclass used in
>> our case, RandomAccessReader, exposes a seek(), so we overrode
>> skipBytes() in it to make use of seek(), and that seems to resolve the
>> problem.
>>
>> This change is intuitively much safer than the one we'd originally
>> identified, but we'd still like to confirm with you folks whether it's
>> likely safe and, if so, whether it's also potentially worth contributing.
>>
>> Thanks,
>> Sk
>>
>>
>> On 2018-03-22 18:16, Benjamin Lerer wrote:
>>
>>> You should check the 3.x release. CASSANDRA-10657 could have fixed your
>>> problem.
>>>
>>>
>>> On Thu, Mar 22, 2018 at 9:15 PM, Benjamin Lerer <
>>> benjamin.le...@datastax.com
>>>
 wrote:

>>>
>>> Sylvain explained the problem in CASSANDRA-4536:
 " Let me note that in CQL3 a row that have no live column don't exist, so
 we can't really implement this with a range slice having an empty columns
 list. Instead we should do a range slice with a full-row slice predicate
 with a count of 1, to make sure we do have a live column before including
 the partition key. "

 By using ColumnFilter.selectionBuilder(); you do not select all the
 columns. By consequence, some partitions might be returned while they
 should not.

 On Thu, Mar 22, 2018 at 6:24 PM, Sam Klock  wrote:

 Cassandra devs,
>
> We use workflows in some of our clusters (running 3.0.15) that involve
> "SELECT DISTINCT key FROM..."-style queries.  For some tables, we
> observed extremely poor performance under light load (i.e., a small
> number of rows per second and frequent timeouts), which we eventually
> traced to replicas shipping entire rows (which in some cases could store
> on the order of MBs of data) to service the query.  That surprised us
> (partly because 2.1 doesn't seem to behave this way), so we did some
> digging, and we eventually came up with a patch that modifies
> SelectStatement.java in the following way: if the selection in the query
> only includes the partition key, then when building a ColumnFilter for
> the query, use:
>
>  builder = ColumnFilter.selectionBuilder();
>
> instead of:
>
>  builder = ColumnFilter.allColumnsBuilder();
>
> to initialize the ColumnFilter.Builder in gatherQueriedColumns().  That
> seems to repair the performance regression, and it doesn't appear to
> break any functionality (based on the unit tests and some smoke tests we
> ran involving insertions and deletions).
>
> We'd like to contribute this patch back to the project, but we're not
> convinced that there aren't subtle correctness issues we're missing,
> judging both from comments in the code and the existence of
> CASSANDRA-5912, which suggests optimizing this kind of query is
> nontrivial.
>
> So: does this change sound safe to make, or are there corner cases we
> need to account for?  If there are corner cases, are there plausibly
> ways of addressing them at the SelectStatement level, or will we need to
> look deeper?
>
> Thanks,
> SK
>
>
>
>

>>>
>>
>>
> 

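The skip-via-seek fix Sam describes in this thread can be sketched as follows. This is a simplified model under stated assumptions, not the actual Cassandra classes: the real RebufferingInputStream and RandomAccessReader have different signatures and buffer management.

```java
// Simplified model of the skipBytes() fix discussed above: a buffered
// reader skips by reading and discarding bytes; a seekable subclass
// overrides skipBytes() to jump the position directly, so skipping a
// large column payload costs one position update instead of n reads.
public class SkipViaSeek {
    static class BufferedReaderSketch {
        protected final byte[] data;
        protected long pos = 0;
        BufferedReaderSketch(byte[] data) { this.data = data; }
        int read() { return pos < data.length ? data[(int) pos++] & 0xFF : -1; }
        long skipBytes(long n) {           // base: skip by reading
            long skipped = 0;
            while (skipped < n && read() != -1) skipped++;
            return skipped;
        }
    }

    static class SeekableReaderSketch extends BufferedReaderSketch {
        SeekableReaderSketch(byte[] data) { super(data); }
        void seek(long newPos) { pos = Math.min(newPos, data.length); }
        @Override
        long skipBytes(long n) {           // the fix: just move the position
            long target = Math.min(pos + n, data.length);
            long skipped = target - pos;
            seek(target);
            return skipped;
        }
    }

    public static void main(String[] args) {
        SeekableReaderSketch r = new SeekableReaderSketch(new byte[1_000_000]);
        System.out.println("skipped=" + r.skipBytes(999_999));
    }
}
```

Both implementations return the same count; only the cost differs, which matches the observation that the original code was reading bytes solely in order to discard them.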



Re: Evolving the client protocol

2018-04-24 Thread Jeff Jirsa
They aren't even remotely similar; they're VERY different. Here are a few
starting points:

1) Most of Datastax's work for the first 5, 6, 8 years of existence focused
on driving users to cassandra from other DBs (see all of the "Cassandra
Summits" that eventually created trademark friction) ; Scylla's marketing
is squarely Scylla vs. Cassandra. Ultimately they're both companies out to
make money, but one has a history of driving users to Cassandra, and the
other is trying to siphon users away from Cassandra.
2) Datastax may not be actively contributing as much as they used to, but
some ridiculous number of engineering hours got paid out of their budget -
maybe 80% of total lines of code? Maybe higher (though it's decreasing day
by day). By contrast, Scylla has exactly zero meaningful concrete code
contributions to the project, uses a license that makes even sharing
concepts prohibitive, only has a handful or so JIRAs opened (which is
better than zero), but has effectively no goodwill in the eyes of many of
the longer-term community members (in large part because of #1, and also
because of the way they positioned their talk-turned-product announcement
at the competitor-funded 2016 summit).
3) Datastax apparently respects the project enough that they'd NEVER come
in and ask for a protocol spec change without providing a reference
implementation.
4) To that end, native protocol changes aren't something anyone is anxious
to shove in without good reason. Even with a reference implementation, and
a REALLY GOOD REASON (namely data correctness / protection from
corruption), https://issues.apache.org/jira/browse/CASSANDRA-13304 has been
sitting patch available for OVER A YEAR.

So again: we have a Cassandra native protocol, and we have a process for
changing it, and that process is contributor agnostic.  Anyone who wants a
change can submit a patch, and it'll get reviewed, and maybe if it's a good
idea, it'll get committed, but the chances of a review leading to a commit
without an implementation is nearly zero.

Would be happy to see this thread die now. There's nothing new coming out
of it.

- Jeff


On Tue, Apr 24, 2018 at 8:30 AM, Eric Stevens  wrote:

> Let me just say that as an observer to this conversation -- and someone
> who believes that compatibility, extensibility, and frankly competition
> bring out the best in products -- I'm fairly surprised and disappointed
> with the apparent hostility many community members have shown toward a
> sincere attempt by another open source product to find common ground here.
>
> Yes, Scylla has a competing OSS project (albeit under a different
> license).  They also have a business built around it.  It's hard for me to
> see that as dramatically different than the DataStax relationship to this
> community.  Though I would love to be shown why.
>