Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-10 Thread Jon Haddad
Good suggestion, Mike. I'm +1 on the idea and agree that the name KEYSPACE is
confusing to new users.

Jon

On 2023/04/04 15:48:26 Mike Adamson wrote:
> Hi,
> 
> I'd like to propose that we add DATABASE to the CQL grammar as an
> alternative to KEYSPACE.
> 
> Background: While TABLE was introduced as an alternative to COLUMNFAMILY
> in the grammar, we have kept KEYSPACE as the name of the container for a
> group of tables. Nearly all traditional SQL databases use DATABASE as the
> name of that container, so it would make sense for Cassandra to adopt
> this naming as well.
> 
> KEYSPACE would be kept in the grammar, but we would update some logging
> and documentation to encourage use of the new name.
> 
> Mike Adamson
> 
> 


Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-04-10 Thread Doug Rohrer
I’ve updated the CEP with two overview diagrams of the interactions between
Sidecar, Cassandra, and the Bulk Analytics library. I hope this helps folks
better understand how things work. Thanks for your patience; it took a bit
longer than expected for me to find the time for this.

Doug

> On Apr 5, 2023, at 11:18 AM, Doug Rohrer  wrote:
> 
> Sorry for the delay in responding here. Yes, we can add some diagrams to the
> CEP; I’ll try to get that done by the end of the week.
> 
> Thanks,
> 
> Doug
> 
>> On Mar 28, 2023, at 1:14 PM, J. D. Jordan  wrote:
>> 
>> Maybe some data flow diagrams could be added to the CEP showing some example
>> read/write operations?
>> 
>>> On Mar 28, 2023, at 11:35 AM, Yifan Cai  wrote:
>>> 
>>> A lot of great discussion!
>>> 
>>> On the Sidecar front, especially regarding the role Sidecar plays in this
>>> CEP, I think there may be some confusion. Once the code is published, we
>>> should have clarity.
>>> Sidecar neither reads SSTables nor coordinates analytics queries. It is
>>> local to its companion Cassandra instance. For bulk reads, it takes
>>> snapshots and streams the SSTables to Spark workers, which read them. For
>>> bulk writes, it imports the SSTables uploaded by Spark workers. All of
>>> these commands are existing JMX/nodetool functionality in Cassandra;
>>> Sidecar adds an HTTP interface to them. This may be an oversimplified
>>> description, but the complex computation is performed only in the Spark
>>> clusters.
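The division of labor described above can be modeled as a toy Python sketch. Every class, method, and file name here (`CassandraNode`, `Sidecar`, `bulk_read`, `bulk_write`, `count_rows`, `nb-1-big-Data.db`) is hypothetical and is not the Sidecar's actual API. The snapshot and import steps only stand in for the existing `nodetool snapshot` and `nodetool import` operations, and the HTTP layer is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical representation: an SSTable is (file name, rows it contains).
SSTable = Tuple[str, List[str]]


@dataclass
class CassandraNode:
    """Stand-in for the local Cassandra instance (hypothetical model)."""
    live_sstables: List[SSTable] = field(default_factory=list)
    snapshots: Dict[str, List[SSTable]] = field(default_factory=dict)

    def take_snapshot(self, tag: str) -> List[SSTable]:
        # Plays the role of `nodetool snapshot`: freeze the current SSTable set.
        self.snapshots[tag] = list(self.live_sstables)
        return list(self.snapshots[tag])

    def import_sstables(self, sstables: List[SSTable]) -> None:
        # Plays the role of `nodetool import`: load externally built SSTables.
        self.live_sstables.extend(sstables)


class Sidecar:
    """Hypothetical Sidecar: local to one node, never parses SSTable contents."""

    def __init__(self, node: CassandraNode) -> None:
        self.node = node

    def bulk_read(self, tag: str) -> List[SSTable]:
        # Snapshot, then hand the files to Spark workers as-is.
        return self.node.take_snapshot(tag)

    def bulk_write(self, uploaded: List[SSTable]) -> None:
        # Import SSTables that Spark workers built and uploaded.
        self.node.import_sstables(uploaded)


def count_rows(sstables: List[SSTable]) -> int:
    # Stand-in for a Spark job: the actual computation happens here.
    return sum(len(rows) for _, rows in sstables)


node = CassandraNode(live_sstables=[("nb-1-big-Data.db", ["a", "b"])])
sidecar = Sidecar(node)
print(count_rows(sidecar.bulk_read("analytics-1")))  # prints 2
sidecar.bulk_write([("nb-2-big-Data.db", ["c"])])
print(count_rows(node.live_sstables))  # prints 3
```

In this model, Sidecar only moves files and calls operations the node already has. Any reading or aggregation of row data happens in the worker-side function.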
>>> 
>>> In the long run, Cassandra might evolve into a database that does both
>>> OLTP and OLAP (that is not what this thread aims for). At the current
>>> stage, Spark is well suited for analytics.
>>> 
>>> On Tue, Mar 28, 2023 at 9:06 AM Benedict wrote:
 I disagree with the first claim, as the process has all the information it 
 chooses to utilise about which resources it’s using and what it’s using 
 those resources for.
 
  The inability to isolate GC domains is something we cannot address, but it
  is also probably not a problem if we were doing everything with memory
  management as well as we could be.
 
  But this isn't worth derailing the thread over. Today we do very little well
  on this front within the process, and a separate process is well justified
  given the state of play.
 
> On 28 Mar 2023, at 16:38, Derek Chen-Becker wrote:
> 
> On Tue, Mar 28, 2023 at 9:03 AM Joseph Lynch wrote:
> ...
> 
>> I think we might be underselling how valuable JVM isolation is,
>> especially for analytics queries that are going to pass the entire
>> dataset through the heap somewhat constantly.
> 
> Big +1 here. The JVM simply does not offer fine-grained control over
> resource utilization, whereas that control is explicitly a feature of
> separate processes. Add the ability to separate GC domains, and you can
> avoid a lot of noisy-neighbor behavior between the disparate workloads
> inside a single VM.
> 
> Cheers,
> 
> Derek
> 