Re: Evolving the client protocol
I have not asked this list to do any work on the drivers. If Cassandra agrees to Scylla protocol changes (either proactively or retroactively), then the benefit to Cassandra is that if the drivers are changed (by the driver maintainers or by Scylla developers), Cassandra developers need not do additional work to update the drivers. So there is less work for you, in the future, if those features are of interest to you.

On 2018-04-24 02:13, Jonathan Haddad wrote:

From where I stand it looks like you've got only two options for any feature that involves updating the protocol:

1. Don't build the feature
2. Build it in Cassandra & ScyllaDB, update the drivers accordingly

I don't think you have a third option, which is build it only in ScyllaDB, because that means you have to fork *all* the drivers and make them work, then maintain them. Your business model appears to be built on not doing any of the driver work yourself, and you certainly aren't giving back to the open source community via a permissive license on ScyllaDB itself, so I'm a bit lost here.

To me it looks like you're asking a bunch of volunteers that work on Cassandra to accommodate you. What exactly do we get out of this relationship? What incentive do I or anyone else have to spend time helping you instead of working on something that interests me?

Jon

On Mon, Apr 23, 2018 at 7:59 AM Ben Bromhead wrote:

This doesn't work without additional changes, for RF>1. The token ring could place two replicas of the same token range on the same physical server, even though those are two separate cores of the same server. You could add another element to the hierarchy (cluster -> datacenter -> rack -> node -> core/shard), but that generates unneeded range movements when a node is added.

I have seen rack awareness used/abused to solve this.

But then you lose real rack awareness. It's fine for a quick hack, but not a long-term solution. (it also creates a lot more tokens, something nobody needs)

I'm having trouble understanding how you lose "real" rack awareness, as these shards are in the same rack anyway, because the address and port are on the same server in the same rack. So it behaves as expected. Could you explain a situation where the shards on a single server would be in different racks (or fault domains)?

If you wanted to support a situation where you have a single rack per DC for simple deployments, extending NetworkTopologyStrategy to behave the way it did before https://issues.apache.org/jira/browse/CASSANDRA-7544 with respect to treating InetAddresses as servers rather than the address and port would be simple. Both this implementation in Apache Cassandra and the respective load balancing classes in the drivers are explicitly designed to be pluggable, so that would be an easier integration point for you.

I'm not sure how it creates more tokens? If a server normally owns 256 tokens, each shard on a different port would just advertise ownership of 256/# of cores (e.g. 4 tokens if you had 64 cores).

Regards,
Ariel

On Apr 22, 2018, at 8:26 AM, Avi Kivity wrote:

On 2018-04-19 21:15, Ben Bromhead wrote:

Re #3: Yup I was thinking each shard/port would appear as a discrete server to the client.

This doesn't work without additional changes, for RF>1. The token ring could place two replicas of the same token range on the same physical server, even though those are two separate cores of the same server.
You could add another element to the hierarchy (cluster -> datacenter -> rack -> node -> core/shard), but that generates unneeded range movements when a node is added.

If the per-port suggestion is unacceptable due to hardware requirements, remembering that Cassandra is built with the concept of scaling *commodity* hardware horizontally, you'll have to spend your time and energy convincing the community to support a protocol feature it has no (current) use for, or find another interim solution.

Those servers are commodity servers (not x86, but still commodity). In any case 60+ logical cores are common now (hello AWS i3.16xlarge or even i3.metal), and we can only expect logical core count to continue to increase (there are 48-core ARM processors now).

Another way would be to build support and consensus around a clear technical need in the Apache Cassandra project as it stands today. One way to build community support might be to contribute an Apache-licensed thread-per-core implementation in Java that matches the protocol change and shard concept you are looking for ;P

I doubt I'll survive the egregious top-posting that is going on in this list.

On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg wrote:

Hi,

So at a technical level I don't understand this yet. So you have a database consisting of single-threaded shards and a socket for accept that is generating TCP connections, and in advance you don't know which connection is going to send messages to which shard. What is the mechanism by wh
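To make the replica-placement point in this exchange concrete: if every core advertises itself as an independent member of the token ring, a naive SimpleStrategy-style walk of the ring can pick two shards of the same physical host as the RF=2 replicas for a range. The following toy Java sketch is purely illustrative (it is not Cassandra's actual replication code; hosts, shards, and token values are made up) and just demonstrates the effect.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeMap;

    // Toy model of a token ring where every core ("shard") registers as its own
    // ring member. Walking the ring clockwise for RF=2 can then select two shards
    // of the same physical host as "replicas" of a range. Purely illustrative.
    class ShardRingDemo {
        public static void main(String[] args) {
            TreeMap<Long, String> ring = new TreeMap<>();
            // Two physical hosts, two shards each. Tokens are chosen so that
            // host1's shards happen to be adjacent on the ring.
            ring.put(100L, "host1:shard0");
            ring.put(200L, "host1:shard1");
            ring.put(300L, "host2:shard0");
            ring.put(400L, "host2:shard1");

            // Prints [host1:shard0, host1:shard1] -- both "replicas" on one box.
            System.out.println("RF=2 replicas for token 50: " + replicasFor(ring, 50L, 2));
        }

        // Pick the next `rf` distinct ring members clockwise from the token.
        static List<String> replicasFor(TreeMap<Long, String> ring, long token, int rf) {
            List<String> replicas = new ArrayList<>();
            Long key = ring.ceilingKey(token);
            while (replicas.size() < rf) {
                if (key == null)
                    key = ring.firstKey();           // wrap around the ring
                String endpoint = ring.get(key);
                if (!replicas.contains(endpoint))
                    replicas.add(endpoint);
                key = ring.higherKey(key);
            }
            return replicas;
        }
    }

Grouping ring members by host address (rather than address and port) when counting replicas, as discussed above, is one way to avoid this outcome.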
Re: Evolving the client protocol
On 2018-04-23 17:59, Ben Bromhead wrote:
>> This doesn't work without additional changes, for RF>1. The token ring could place two replicas of the same token range on the same physical server, even though those are two separate cores of the same server. You could add another element to the hierarchy (cluster -> datacenter -> rack -> node -> core/shard), but that generates unneeded range movements when a node is added.
> I have seen rack awareness used/abused to solve this.
> But then you lose real rack awareness. It's fine for a quick hack, but not a long-term solution. (it also creates a lot more tokens, something nobody needs)

I'm having trouble understanding how you lose "real" rack awareness, as these shards are in the same rack anyway, because the address and port are on the same server in the same rack. So it behaves as expected. Could you explain a situation where the shards on a single server would be in different racks (or fault domains)?

You're right - it continues to work.

If you wanted to support a situation where you have a single rack per DC for simple deployments, extending NetworkTopologyStrategy to behave the way it did before https://issues.apache.org/jira/browse/CASSANDRA-7544 with respect to treating InetAddresses as servers rather than the address and port would be simple. Both this implementation in Apache Cassandra and the respective load balancing classes in the drivers are explicitly designed to be pluggable, so that would be an easier integration point for you.

I'm not sure how it creates more tokens? If a server normally owns 256 tokens, each shard on a different port would just advertise ownership of 256/# of cores (e.g. 4 tokens if you had 64 cores).

Having just 4 tokens results in imbalance. CASSANDRA-7032 mitigates it, but only for one replication factor, and doesn't work for decommission. (and if you have 60 lcores then you get between 4 and 5 tokens per lcore, which is a 20% imbalance right there)

> Regards,
> Ariel
>
>> On Apr 22, 2018, at 8:26 AM, Avi Kivity <a...@scylladb.com> wrote:
>>
>>> On 2018-04-19 21:15, Ben Bromhead wrote:
>>> Re #3:
>>>
>>> Yup I was thinking each shard/port would appear as a discrete server to the client.
>> This doesn't work without additional changes, for RF>1. The token ring could place two replicas of the same token range on the same physical server, even though those are two separate cores of the same server. You could add another element to the hierarchy (cluster -> datacenter -> rack -> node -> core/shard), but that generates unneeded range movements when a node is added.
>>
>>> If the per port suggestion is unacceptable due to hardware requirements, remembering that Cassandra is built with the concept of scaling *commodity* hardware horizontally, you'll have to spend your time and energy convincing the community to support a protocol feature it has no (current) use for or find another interim solution.
>> Those servers are commodity servers (not x86, but still commodity). In any case 60+ logical cores are common now (hello AWS i3.16xlarge or even i3.metal), and we can only expect logical core count to continue to increase (there are 48-core ARM processors now).
>>
>>> Another way, would be to build support and consensus around a clear technical need in the Apache Cassandra project as it stands today.
>>>
>>> One way to build community support might be to contribute an Apache licensed thread per core implementation in Java that matches the protocol change and shard concept you are looking for ;P
>> I doubt I'll survive the egregious top-posting that is going on in this list.
>>
>>> On Thu, Apr 19, 2018 at 1:43 PM Ariel Weisberg <ar...@weisberg.ws> wrote:

Hi,

So at technical level I don't understand this yet. So you have a database consisting of single threaded shards and a socket for accept that is generating TCP connections and in advance you don't know which connection is going to send messages to which shard. What is the mechanism by which you get the packets for a given TCP connection delivered to a specific core?

I know that a given TCP connection will normally have all of its packets delivered to the same queue from the NIC because the tuple of source address + port and destination address + port is typically hashed to pick one of the queues the NIC presents. I might have the contents of the tuple slightly wrong, but it always includes a component you don't get to control. Since it's hashing how do you man
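On the token-count arithmetic in the exchange above: splitting a fixed 256-token budget across 60 logical cores cannot come out even, and the resulting 4-versus-5 token split is roughly the 20% imbalance Avi mentions. A quick illustrative calculation (not Cassandra code; it assumes each token covers an equal share of the ring, which vnodes make approximately true):

    // Illustration of the ownership imbalance from dividing 256 tokens
    // across 60 shards. Not Cassandra code.
    class TokenImbalance {
        public static void main(String[] args) {
            int tokensPerNode = 256;
            int shards = 60;

            int base = tokensPerNode / shards;        // 4 tokens per shard
            int remainder = tokensPerNode % shards;   // 16 shards get one extra token

            double minShare = (double) base / tokensPerNode;        // share of a 4-token shard
            double maxShare = (double) (base + 1) / tokensPerNode;  // share of a 5-token shard

            System.out.printf("%d shards with %d tokens, %d shards with %d tokens%n",
                              shards - remainder, base, remainder, base + 1);
            // (5 - 4) / 5 = 20%: the smallest shard owns a fifth less than the largest.
            System.out.printf("smallest shard owns %.1f%% less than the largest%n",
                              100.0 * (maxShare - minShare) / maxShare);
        }
    }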
Re: Evolving the client protocol
Let me just say that as an observer to this conversation -- and someone who believes that compatibility, extensibility, and frankly competition bring out the best in products -- I'm fairly surprised and disappointed with the apparent hostility many community members have shown toward a sincere attempt by another open source product to find common ground here.

Yes, Scylla has a competing OSS project (albeit under a different license). They also have a business built around it. It's hard for me to see that as dramatically different than the DataStax relationship to this community. Though I would love to be shown why.
Re: Evolving the client protocol
DataStax invested millions of dollars into Cassandra, tens of thousands of man hours, hosted hundreds of events, and has been a major factor in the success of the project. ScyllaDB wants us to change the C* protocol in order to improve features in a competing database which contributes nothing back to the Cassandra community. Seems a little different to me.

On Tue, Apr 24, 2018 at 8:30 AM Eric Stevens wrote:
> Let me just say that as an observer to this conversation -- and someone who believes that compatibility, extensibility, and frankly competition bring out the best in products -- I'm fairly surprised and disappointed with the apparent hostility many community members have shown toward a sincere attempt by another open source product to find common ground here.
>
> Yes, Scylla has a competing OSS project (albeit under a different license). They also have a business built around it. It's hard for me to see that as dramatically different than the DataStax relationship to this community. Though I would love to be shown why.
Re: Evolving the client protocol
Eric,

You have to understand the poisonous GPL. It's very different from Apache licensing in the sense that, roughly speaking, you're welcome to contribute to Scylla, but legally barred from distributing it with or inside any product you base on it unless your product source code is also open or you contract with ScyllaDB. The objections raised by some in this thread are based on the inequality of contribution in the two models.

On 04/24/2018 09:30 AM, Eric Stevens wrote:

Let me just say that as an observer to this conversation -- and someone who believes that compatibility, extensibility, and frankly competition bring out the best in products -- I'm fairly surprised and disappointed with the apparent hostility many community members have shown toward a sincere attempt by another open source product to find common ground here.

Yes, Scylla has a competing OSS project (albeit under a different license). They also have a business built around it. It's hard for me to see that as dramatically different than the DataStax relationship to this community. Though I would love to be shown why.
Re: Evolving the client protocol
On 2018-04-24 04:18, Nate McCall wrote:
> Folks,
> Before this goes much further, let's take a step back for a second.
>
> I am hearing the following: Folks are fine with CASSANDRA-14311 and CASSANDRA-2848 *BUT* they don't make much sense from the project's perspective without a reference implementation. I think the shard concept is too abstract for the project right now, so we should probably set that one aside.
>
> Dor and Avi, I appreciate you both engaging directly on this. Where can we find common ground on this?

I started with three options:

1. Scylla (or other protocol implementers) contribute spec changes, and each implementer implements them on their own

This was rejected.

2. Scylla defines and implements spec changes on its own, and when Cassandra implements similar changes, it will retroactively apply the Scylla change if it makes technical sense

IOW, no gratuitous divergence, but no hard commitment either.

I received no feedback on this.

3. No cooperation.

This is the fall-back option, which I would like to avoid if possible. Its main advantage is that it avoids long email threads and flamewars.

There was also a suggestion made in this thread:

4. Scylla defines spec changes and also implements them for Cassandra

That works for some changes but not all (for example, thread-per-core awareness, or changes that require significant effort). I would like to find a way that works for all of the changes that we want to make.
Re: Evolving the client protocol
The main point is that we decided to take a strategic decision to invest in the client side. We always wanted to get to this state but, for natural reasons, it took us a while. The client-side changes aren't just about a small feature here and there, or stopping at thread per core. Think about the changes that will come in a 3-5 year scope. Avi had a great idea about changing the underlying TCP to UDP. It removes head-of-the-line blocking, removes limitations on the number of sockets, and since clients retransmit on timeouts, it will improve performance a lot. Another change is in the CDC domain. Another idea that comes to my mind is to use an IDL and automatically generate bindings for different languages, to improve reuse and standardization. Scylla automatically generates its internal RPC code from an IDL, and modern implementations should take this path, especially with a polyglot of languages. Believe me, it sounds more and more compelling to me as an easier path.

On Tue, Apr 24, 2018 at 9:26 AM, Avi Kivity wrote:
> On 2018-04-24 04:18, Nate McCall wrote:
>> Folks,
>> Before this goes much further, let's take a step back for a second.
>>
>> I am hearing the following: Folks are fine with CASSANDRA-14311 and CASSANDRA-2848 *BUT* they don't make much sense from the project's perspective without a reference implementation. I think the shard concept is too abstract for the project right now, so we should probably set that one aside.
>>
>> Dor and Avi, I appreciate you both engaging directly on this. Where can we find common ground on this?
>
> I started with three options:
>
> 1. Scylla (or other protocol implementers) contribute spec changes, and each implementer implements them on their own
>
> This was rejected.
>
> 2. Scylla defines and implements spec changes on its own, and when Cassandra implements similar changes, it will retroactively apply the Scylla change if it makes technical sense
>
> IOW, no gratuitous divergence, but no hard commitment either.
>
> I received no feedback on this.
>
> 3. No cooperation.
>
> This is the fall-back option, which I would like to avoid if possible. Its main advantage is that it avoids long email threads and flamewars.
>
> There was also a suggestion made in this thread:
>
> 4. Scylla defines spec changes and also implements them for Cassandra
>
> That works for some changes but not all (for example, thread-per-core awareness, or changes that require significant effort). I would like to find a way that works for all of the changes that we want to make.
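For readers unfamiliar with the retransmit-on-timeout pattern mentioned above, here is a minimal, self-contained Java sketch of the general idea using a plain DatagramSocket. It is purely illustrative: the class, method, and parameters are made up for this example, and this is not part of any existing Cassandra or Scylla protocol.

    import java.io.IOException;
    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetSocketAddress;
    import java.net.SocketTimeoutException;
    import java.util.Arrays;

    // Toy sketch of "client retransmits on timeout" over UDP. A real
    // datagram-based protocol would also need request ids, deduplication,
    // fragmentation handling, etc.
    class UdpRetryClient {
        static byte[] requestWithRetry(InetSocketAddress server, byte[] request,
                                       int attempts, int timeoutMillis) throws IOException {
            try (DatagramSocket socket = new DatagramSocket()) {
                socket.setSoTimeout(timeoutMillis);
                byte[] buf = new byte[64 * 1024];
                for (int i = 0; i < attempts; i++) {
                    socket.send(new DatagramPacket(request, request.length, server));
                    try {
                        DatagramPacket reply = new DatagramPacket(buf, buf.length);
                        socket.receive(reply);               // block until reply or timeout
                        return Arrays.copyOf(reply.getData(), reply.getLength());
                    } catch (SocketTimeoutException e) {
                        // No reply in time: loop and retransmit the same request.
                    }
                }
                throw new IOException("no reply after " + attempts + " attempts");
            }
        }
    }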
Re: Optimizing queries for partition keys
Thanks. For those interested: opened CASSANDRA-14415.

SK

On 2018-04-19 06:04, Benjamin Lerer wrote:
> Hi Sam,
>
> Your finding is interesting. Effectively, if the number of bytes to skip is larger than the remaining bytes in the buffer + the buffer size, it could be faster to use seek. Feel free to open a JIRA ticket and attach your patch. It will be great if you could add to the ticket your table schema as well as some information on your environment (e.g. disk type).
>
> On Tue, Apr 17, 2018 at 8:53 PM, Sam Klock wrote:
>
>> Thanks (and apologies for the delayed response); that was the kind of feedback we were looking for.
>>
>> We backported the fix for CASSANDRA-10657 to 3.0.16, and it partially addresses our problem in the sense that it does limit the data sent on the wire. The performance is still extremely poor, however, due to the fact that Cassandra continues to read large volumes of data from disk. (We've also confirmed this behavior in 3.11.2.)
>>
>> With a bit more investigation, we now believe the problem (after CASSANDRA-10657 is applied) is in RebufferingInputStream.skipBytes(), which appears to read bytes in order to skip them. The subclass used in our case, RandomAccessReader, exposes a seek(), so we overrode skipBytes() in it to make use of seek(), and that seems to resolve the problem.
>>
>> This change is intuitively much safer than the one we'd originally identified, but we'd still like to confirm with you folks whether it's likely safe and, if so, whether it's also potentially worth contributing.
>>
>> Thanks,
>> SK
>>
>> On 2018-03-22 18:16, Benjamin Lerer wrote:
>>
>>> You should check the 3.x release. CASSANDRA-10657 could have fixed your problem.
>>>
>>> On Thu, Mar 22, 2018 at 9:15 PM, Benjamin Lerer <benjamin.le...@datastax.com> wrote:
>>>
>>> Sylvain explained the problem in CASSANDRA-4536: "Let me note that in CQL3 a row that have no live column don't exist, so we can't really implement this with a range slice having an empty columns list. Instead we should do a range slice with a full-row slice predicate with a count of 1, to make sure we do have a live column before including the partition key."
>>>
>>> By using ColumnFilter.selectionBuilder(); you do not select all the columns. By consequence, some partitions might be returned while they should not.
>>>
>>> On Thu, Mar 22, 2018 at 6:24 PM, Sam Klock wrote:
>>>
>>> Cassandra devs,
>>>
>>> We use workflows in some of our clusters (running 3.0.15) that involve "SELECT DISTINCT key FROM..."-style queries. For some tables, we observed extremely poor performance under light load (i.e., a small number of rows per second and frequent timeouts), which we eventually traced to replicas shipping entire rows (which in some cases could store on the order of MBs of data) to service the query. That surprised us (partly because 2.1 doesn't seem to behave this way), so we did some digging, and we eventually came up with a patch that modifies SelectStatement.java in the following way: if the selection in the query only includes the partition key, then when building a ColumnFilter for the query, use:
>>>
>>>     builder = ColumnFilter.selectionBuilder();
>>>
>>> instead of:
>>>
>>>     builder = ColumnFilter.allColumnsBuilder();
>>>
>>> to initialize the ColumnFilter.Builder in gatherQueriedColumns(). That seems to repair the performance regression, and it doesn't appear to break any functionality (based on the unit tests and some smoke tests we ran involving insertions and deletions).
>>>
>>> We'd like to contribute this patch back to the project, but we're not convinced that there aren't subtle correctness issues we're missing, judging both from comments in the code and the existence of CASSANDRA-5912, which suggests optimizing this kind of query is nontrivial.
>>>
>>> So: does this change sound safe to make, or are there corner cases we need to account for? If there are corner cases, are there plausibly ways of addressing them at the SelectStatement level, or will we need to look deeper?
>>>
>>> Thanks,
>>> SK
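For readers following along, here is a minimal, self-contained sketch of the seek-versus-read trade-off described in the quoted messages. The class and method names below are illustrative only; in the actual patch the override is skipBytes() in Cassandra's RandomAccessReader (a subclass of RebufferingInputStream), whose exact signatures are not reproduced here.

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch of the optimization: skip by repositioning the file pointer
    // instead of reading and discarding the skipped bytes.
    class SkippingReaderSketch {
        private final RandomAccessFile file;

        SkippingReaderSketch(String path) throws IOException {
            this.file = new RandomAccessFile(path, "r");
        }

        // Naive skip: read and discard, touching every skipped byte on disk.
        long skipByReading(long toSkip) throws IOException {
            byte[] scratch = new byte[8192];
            long skipped = 0;
            while (skipped < toSkip) {
                int n = file.read(scratch, 0, (int) Math.min(scratch.length, toSkip - skipped));
                if (n < 0)
                    break;                  // hit end of file
                skipped += n;
            }
            return skipped;
        }

        // Seek-based skip: reposition without reading the skipped range at all.
        long skipBySeeking(long toSkip) throws IOException {
            long pos = file.getFilePointer();
            long target = Math.min(pos + toSkip, file.length());  // don't seek past EOF
            file.seek(target);
            return target - pos;
        }
    }

The seek-based variant only pays off when the skip distance exceeds what is already buffered, which matches Benjamin's observation in the quoted reply.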
Re: Evolving the client protocol
They aren't even remotely similar, they're VERY different. Here's a few starting points:

1) Most of Datastax's work for the first 5, 6, 8 years of existence focused on driving users to Cassandra from other DBs (see all of the "Cassandra Summits" that eventually created trademark friction); Scylla's marketing is squarely Scylla v Cassandra. Ultimately they're both companies out to make money, but one has a history of driving users to Cassandra, and the other is trying to siphon users away from Cassandra.

2) Datastax may not be actively contributing as much as they used to, but some ridiculous number of engineering hours got paid out of their budget - maybe 80% of total lines of code? Maybe higher (though it's decreasing day by day). By contrast, Scylla has exactly zero meaningful concrete code contributions to the project, uses a license that makes even sharing concepts prohibitive, only has a handful or so JIRAs opened (which is better than zero), but has effectively no goodwill in the eyes of many of the longer-term community members (in large part because of #1, and also because of the way they positioned their talk-turned-product announcement at the competitor-funded 2016 summit).

3) Datastax apparently respects the project enough that they'd NEVER come in and ask for a protocol spec change without providing a reference implementation.

4) To that end, native protocol changes aren't something anyone is anxious to shove in without good reason. Even with a reference implementation, and a REALLY GOOD REASON (namely data correctness / protection from corruption), https://issues.apache.org/jira/browse/CASSANDRA-13304 has been sitting patch available for OVER A YEAR.

So again: we have a Cassandra native protocol, and we have a process for changing it, and that process is contributor agnostic. Anyone who wants a change can submit a patch, and it'll get reviewed, and maybe if it's a good idea, it'll get committed, but the chances of a review leading to a commit without an implementation are nearly zero.

Would be happy to see this thread die now. There's nothing new coming out of it.

- Jeff

On Tue, Apr 24, 2018 at 8:30 AM, Eric Stevens wrote:
> Let me just say that as an observer to this conversation -- and someone who believes that compatibility, extensibility, and frankly competition bring out the best in products -- I'm fairly surprised and disappointed with the apparent hostility many community members have shown toward a sincere attempt by another open source product to find common ground here.
>
> Yes, Scylla has a competing OSS project (albeit under a different license). They also have a business built around it. It's hard for me to see that as dramatically different than the DataStax relationship to this community. Though I would love to be shown why.