Re: [Marketing] For Review: Performance Benchmarking of Apache Cassandra in the Cloud

2022-08-26 Thread Sharan Foga
Hi Chris

Added a few suggestions (and I see some others have too!). As always, please
feel free to use or ignore them as you see fit.

Thanks
Sharan

On 2022/08/24 21:35:01 Chris Thornett wrote:
> Here is Part 1 in a series of 3 on performance benchmarking in Apache
> Cassandra by Daniel Seybold:
> https://docs.google.com/document/d/1eMFYEOp8lNxZCYelYCWj6jXZ-VaJGNbl2YE3jLWRdOA/edit?usp=sharing
> 
> We are opening this up for 72-hour community review. Please add your amends
> in the comments—thanks very much!
> 
> We are looking at 30 August for publication.
> 
> Thanks,
> -- 
> 
> Chris Thornett
> Senior Content Strategist, Constantia.io
> 


Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-26 Thread Benjamin Lerer
Views (even projection-only views) are a completely new feature with their own
set of complexities and limitations. My first feeling is that it might not
be as simple as it sounds. There is a significant number of use cases to
cover. It will definitely require its own CEP. :-)

I like Andrés' proposal. It offers some nice and easy to use safeguards.
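
The permission rules in that proposal (quoted further down in this thread) can be sketched in Python. This is only an illustration of the stated rules, not Cassandra code: the Permission class and both function names are hypothetical, and only the permission names SELECT, UNMASK and SELECT_MASKED come from the proposal.

```python
from enum import Flag, auto


class Permission(Flag):
    """Hypothetical flags named after the permissions in the proposal."""
    NONE = 0
    SELECT = auto()
    UNMASK = auto()
    SELECT_MASKED = auto()


def can_filter_on_masked(granted: Permission) -> bool:
    """WHERE/IF on a masked column: SELECT plus either UNMASK or SELECT_MASKED."""
    has_select = Permission.SELECT in granted
    has_either = bool(granted & (Permission.UNMASK | Permission.SELECT_MASKED))
    return has_select and has_either


def can_see_unmasked(granted: Permission) -> bool:
    """Unmasked values in query results: both SELECT and UNMASK."""
    return (Permission.SELECT | Permission.UNMASK) in granted
```

Under this reading, a role with SELECT and SELECT_MASKED could filter on a masked column but would still only see masked values in the results.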

On Fri, 26 Aug 2022 at 03:45, Derek Chen-Becker wrote:

> Yes, I was thinking that simple projection views (essentially a SELECT
> statement with application of transform functions) would complement masking
> functions, and from the discussion it sounds like this is basically what
> some of the other databases do. Projection views seem like they would be
> useful in their own right, so would it be proper to write a separate CEP
> for that? I would be happy to help drive that document and discussion. I'm
> not sure if it's the best name, but I'm trying to distinguish views that
> expose a subset of an existing schema vs materialized views, which offer
> more complex capabilities.
>
> Cheers,
>
> Derek
>
> On Thu, Aug 25, 2022, 3:11 PM Benedict  wrote:
>
>> I’m inclined to agree that this seems a more straightforward approach
>> that makes fewer implied promises.
>>
>> Perhaps we could deliver simple views backed by virtual tables, and model
>> our approach on that of Postgres, MySQL et al?
>>
>> Views in C* would be very simple, just offering a subset of fields with
>> some UDFs applied. It would allow users to define roles with access only to
>> the views, or for applications to use the views for presentation purposes.
>>
>> It feels like a cleaner approach to me, and we’d get two features for the
>> price of one. BUT I don’t feel super strongly about this.
>>
>> On 25 Aug 2022, at 20:16, Derek Chen-Becker 
>> wrote:
>>
>> 
>> To make sure I understand, if I wanted to use a masked column for a
>> conditional update, you're saying we would need SELECT_MASKED to use it in
>> the IF clause? I worry that this proposal is increasing in complexity; I
>> would actually be OK starting with something smaller in scope. Perhaps just
>> providing the masking functions and not tying masking to schema would be
>> sufficient for an initial goal? That wouldn't preclude additional
>> permissions, schema integration, or perhaps just plain Views in the future.
>>
>> Cheers,
>>
>> Derek
>>
>> On Thu, Aug 25, 2022 at 11:12 AM Andrés de la Peña 
>> wrote:
>>
>>> I have modified the proposal adding a new SELECT_MASKED permission.
>>> Using masked columns on WHERE/IF clauses would require having SELECT and
>>> either UNMASK or SELECT_MASKED permissions. Seeing the unmasked values in
>>> the query results would always require both SELECT and UNMASK.
>>>
>>> This way we can have the best of both worlds, allowing admins to decide
>>> whether they trust their immediate users or not. wdyt?
>>>
>>> On Wed, 24 Aug 2022 at 16:06, Henrik Ingo 
>>> wrote:
>>>
>>>> This is the difference between security and compliance I guess :-D
>>>>
>>>> The way I see this, the attacker or threat in this concept is not the
>>>> developer with access to the database. Rather, a feature like this is just a
>>>> convenient way to apply some masking rule in a centralized way. The
>>>> protection is against an end user of the application, who should not be
>>>> able to see the personal data of someone else. Or themselves, even. As long
>>>> as the application end user doesn't have access to run arbitrary CQL, then
>>>> these forms of masking prevent accidental unauthorized use/leaking of
>>>> personal data.
>>>>
>>>> henrik
>>>>
>>>> On Wed, Aug 24, 2022 at 10:40 AM Benedict wrote:

> Is it typical for a masking feature to make no effort to prevent
> unmasking? I’m just struggling to see the value of this without such
> mechanisms. Otherwise it’s just a default formatter, and we should
> consider renaming the feature IMO
>
> On 23 Aug 2022, at 21:27, Andrés de la Peña 
> wrote:
>
> 
> As mentioned in the CEP document, dynamic data masking doesn't try to
> prevent malicious users with SELECT permissions from indirectly guessing the
> real value of a masked column. This can easily be done by just trying
> values in the WHERE clause of SELECT queries. DDM would not be a
> replacement for proper column-level permissions.
>
> The data served by the database is usually consumed by applications
> that present this data to end users. These end users are not necessarily
> the users directly connecting to the database. With DDM, it would be easy
> for applications to mask sensitive data that is going to be consumed by the
> end users. However, the users directly connecting to the database should be
> trusted, provided that they have the right SELECT permissions.
>
> In other words, DDM doesn't directly protect the data, but it eases
> the production of protected data.
>
> Said that, we co

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-26 Thread Andrés de la Peña
>
> Yes, I was thinking that simple projection views (essentially a SELECT
> statement with application of transform functions) would complement masking
> functions, and from the discussion it sounds like this is basically what
> some of the other databases do.


I don't see that the mentioned databases in general suggest using views for
dynamic data masking. So far, I have only seen this blog post entry
suggesting to use MySQL's non-materialized views with masking functions,
probably because MySQL lacks the more sophisticated mechanisms for data
masking that other databases offer.

However, using MySQL views can allow malicious users to run queries to
infer the masked data, which is what we were trying to avoid. For example:

CREATE TABLE employees(
 id INT NOT NULL AUTO_INCREMENT,
 name VARCHAR(100) NOT NULL,
 PRIMARY KEY (id));

CREATE VIEW employee_mask AS SELECT
  id,
  mask_inner(name, 1, 0, _binary'*') AS name
  FROM employees;

INSERT INTO employees(name) SELECT "Joseph";
INSERT INTO employees(name) SELECT "Olivia";

SELECT * FROM employee_mask WHERE name="Joseph";
+----+--------+
| id | name   |
+----+--------+
|  1 | J***** |
+----+--------+
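
The inference risk shown above can be sketched in plain Python. This is a minimal model, not MySQL or Cassandra code. The mask_inner function here is a simplified stand-in for the MySQL function of the same name, and EMPLOYEES, select_masked and infer_name are hypothetical names. It assumes the filter is evaluated against the real value while only the masked value is returned, as in the output above.

```python
from typing import Iterable, List, Optional, Tuple

# Toy stand-in for the employees table: id -> real name.
EMPLOYEES = {1: "Joseph", 2: "Olivia"}


def mask_inner(value: str, margin_left: int, margin_right: int,
               mask_char: str = "X") -> str:
    """Mask everything except the first margin_left and last margin_right chars."""
    if margin_left + margin_right >= len(value):
        return value  # nothing left to mask in this simplified model
    inner = len(value) - margin_left - margin_right
    tail = value[len(value) - margin_right:]
    return value[:margin_left] + mask_char * inner + tail


def select_masked(guess: str) -> List[Tuple[int, str]]:
    """Filter on the real name, but return only the masked name."""
    return [(emp_id, mask_inner(name, 1, 0, "*"))
            for emp_id, name in EMPLOYEES.items()
            if name == guess]


def infer_name(emp_id: int, candidates: Iterable[str]) -> Optional[str]:
    """Recover a real name by trying candidate values in the filter."""
    for guess in candidates:
        if any(row_id == emp_id for row_id, _ in select_masked(guess)):
            return guess
    return None
```

A caller who never sees an unmasked value can still confirm a guess, because a non-empty result for a given filter value reveals that the guess matched.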

On Fri, 26 Aug 2022 at 02:45, Derek Chen-Becker 
wrote:

> Yes, I was thinking that simple projection views (essentially a SELECT
> statement with application of transform functions) would complement masking
> functions, and from the discussion it sounds like this is basically what
> some of the other databases do. Projection views seem like they would be
> useful in their own right, so would it be proper to write a separate CEP
> for that? I would be happy to help drive that document and discussion. I'm
> not sure if it's the best name, but I'm trying to distinguish views that
> expose a subset of an existing schema vs materialized views, which offer
> more complex capabilities.
>
> Cheers,
>
> Derek
>
> On Thu, Aug 25, 2022, 3:11 PM Benedict  wrote:
>
>> I’m inclined to agree that this seems a more straightforward approach
>> that makes fewer implied promises.
>>
>> Perhaps we could deliver simple views backed by virtual tables, and model
>> our approach on that of Postgres, MySQL et al?
>>
>> Views in C* would be very simple, just offering a subset of fields with
>> some UDFs applied. It would allow users to define roles with access only to
>> the views, or for applications to use the views for presentation purposes.
>>
>> It feels like a cleaner approach to me, and we’d get two features for the
>> price of one. BUT I don’t feel super strongly about this.
>>
>> On 25 Aug 2022, at 20:16, Derek Chen-Becker 
>> wrote:
>>
>> 
>> To make sure I understand, if I wanted to use a masked column for a
>> conditional update, you're saying we would need SELECT_MASKED to use it in
>> the IF clause? I worry that this proposal is increasing in complexity; I
>> would actually be OK starting with something smaller in scope. Perhaps just
>> providing the masking functions and not tying masking to schema would be
>> sufficient for an initial goal? That wouldn't preclude additional
>> permissions, schema integration, or perhaps just plain Views in the future.
>>
>> Cheers,
>>
>> Derek
>>
>> On Thu, Aug 25, 2022 at 11:12 AM Andrés de la Peña 
>> wrote:
>>
>>> I have modified the proposal adding a new SELECT_MASKED permission.
>>> Using masked columns on WHERE/IF clauses would require having SELECT and
>>> either UNMASK or SELECT_MASKED permissions. Seeing the unmasked values in
>>> the query results would always require both SELECT and UNMASK.
>>>
>>> This way we can have the best of both worlds, allowing admins to decide
>>> whether they trust their immediate users or not. wdyt?
>>>
>>> On Wed, 24 Aug 2022 at 16:06, Henrik Ingo 
>>> wrote:
>>>
>>>> This is the difference between security and compliance I guess :-D
>>>>
>>>> The way I see this, the attacker or threat in this concept is not the
>>>> developer with access to the database. Rather, a feature like this is just a
>>>> convenient way to apply some masking rule in a centralized way. The
>>>> protection is against an end user of the application, who should not be
>>>> able to see the personal data of someone else. Or themselves, even. As long
>>>> as the application end user doesn't have access to run arbitrary CQL, then
>>>> these forms of masking prevent accidental unauthorized use/leaking of
>>>> personal data.
>>>>
>>>> henrik
>>>>
>>>> On Wed, Aug 24, 2022 at 10:40 AM Benedict wrote:

> Is it typical for a masking feature to make no effort to prevent
> unmasking? I’m just struggling to see the value of this without such
> mechanisms. Otherwise it’s just a default formatter, and we should
> consider renaming the feature IMO
>
> On 23 Aug 2022, at 21:27, Andrés de la Peña 
> wrote:
>
> 
> As mentioned in the CEP document, dynamic data masking doesn't try to
> prevent malicious users w

New episode of the Apache Cassandra (R) Corner podcast!

2022-08-26 Thread Aaron Ploetz
Link to next episode:

Ep8 - Sarma Pydipally (Udemy instructor, open source dev)
https://drive.google.com/file/d/15-qRcWOyLwQi5lsY06rUYBdLViXGk_xs/view?usp=sharing

(You may have to download it to listen)

It will remain in staging for 72 hours, going live (assuming no objections)
by Wednesday, August 31st.

If anyone should have any questions, comments, or if you want to be a
guest, please reach out to me.

For my guest pipeline, I have recording sessions scheduled with:
- Otavio Santana (Java Champion and Open Source Committer w/ the Eclipse
Foundation)


Thanks everyone!

Aaron Ploetz


CEP Creation Permissions

2022-08-26 Thread Jackson Fleming
Hi,

I would like to create a draft CEP on Confluence. Could you please grant
permissions to my Confluence user (jfleming)?


Thanks,
Jackson


Re: [VOTE] Release Apache Cassandra 4.0.6

2022-08-26 Thread Rahul Xavier Singh
+1 nb
Rahul Singh

Chief Executive Officer | Business Platform Architect m: 202.905.2818 e:
rahul.si...@anant.us li: http://linkedin.com/in/xingh ca:
http://calendly.com/xingh

*We create, support, and manage real-time global data & analytics platforms
for the modern enterprise.*

*Anant | https://anant.us *

3 Washington Circle, Suite 301

Washington, D.C. 20037

*http://Cassandra.Link * : The best resources for
Apache Cassandra


On Tue, Aug 23, 2022 at 8:56 AM Berenguer Blasi 
wrote:

> +1
> On 23/8/22 14:50, Ekaterina Dimitrova wrote:
>
>
> +1(nb)
> On Tue, 23 Aug 2022 at 8:49, Josh McKenzie  wrote:
>
>> +1
>>
>> On Tue, Aug 23, 2022, at 6:47 AM, Benjamin Lerer wrote:
>>
>> +1
>>
>> On Tue, 23 Aug 2022 at 11:30, Andrés de la Peña wrote:
>>
>> +1 (nb)
>>
>> On Tue, 23 Aug 2022 at 06:14, Tommy Stendahl via dev <
>> dev@cassandra.apache.org> wrote:
>>
>> +1 nb
>>
>> -----Original Message-----
>> *From*: Brandon Williams
>> *Reply-To*: dev@cassandra.apache.org
>> *To*: dev
>> *Subject*: Re: [VOTE] Release Apache Cassandra 4.0.6
>> *Date*: Mon, 22 Aug 2022 17:47:59 -0500
>>
>> +1
>>
>> On Sun, Aug 21, 2022 at 7:44 AM Mick Semb Wever <m...@apache.org> wrote:
>>
>> Proposing the test build of Cassandra 4.0.6 for release.
>>
>> sha1: eb2375718483f4c360810127ae457f2a26ccce67
>>
>> Git:
>>
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0.6-tentative
>>
>> Maven Artifacts:
>>
>> https://repository.apache.org/content/repositories/orgapachecassandra-/org/apache/cassandra/cassandra-all/4.0.6/
>>
>> The Source and Build Artifacts, and the Debian and RPM packages and 
>> repositories, are available here:
>>
>> https://dist.apache.org/repos/dist/dev/cassandra/4.0.6/
>>
>> The vote will be open for 72 hours (longer if needed). Everyone who has 
>> tested the build is invited to vote. Votes by PMC members are considered 
>> binding. A vote passes if there are at least three binding +1s and no -1's.
>>
>> [1]: CHANGES.txt:
>>
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0.6-tentative
>>
>> [2]: NEWS.txt:
>>
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0.6-tentative
>>
>>


Re: [Marketing] For Review: Learn How CommitLog Works in Apache Cassandra

2022-08-26 Thread Rahul Xavier Singh
Added a comment about "ACID". I would recommend not saying ACID until it's
there. C* has strong consistency when needed. It doesn't, for example,
guarantee that two competing mutations will be executed (or can be rolled
back to the previous state) in exactly the order they were intended if they
come in at the same time, especially if they are coming from two different
data centers.

Maybe it can be explained later that the commitlog mechanism provides
ACID-like features ... ?

From my understanding the Accord white paper has not been implemented into
any working Cassandra code. I may be wrong.


Rahul Singh



On Tue, Aug 23, 2022 at 5:43 AM Sharan Foga  wrote:

> Hi Chris
>
> I've added a few comments and suggestions. Please feel free to use /ignore
> whichever ones you think :-)
>
> Thanks
> Sharan
>
> On 2022/08/23 00:08:52 Chris Thornett wrote:
> > Opening up Alex Sorokoumov's guide 'Learn How CommitLog Works in Apache
> > Cassandra' for a 72-hr community review by lazy consensus.
> >
> > Please add any amends and suggestions in the comments:
> >
> https://docs.google.com/document/d/1cyOi-IeU_I9GBkpQbJS6IIrmemAesEqvzLb-eeFs_rM/edit#
> >
> > Thanks!
> >
> > --
> >
> > Chris Thornett
> > Senior Content Strategist, Constantia.io
> >
>


Re: CEP Creation Permissions

2022-08-26 Thread Brandon Williams
I think you should have access now.

Kind Regards,
Brandon


On Fri, Aug 26, 2022 at 4:52 PM Jackson Fleming
 wrote:
>
> Hi,
>
> I would like to create a draft CEP on confluence, can you please grant my 
> confluence user (jfleming) permissions.
>
>
> Thanks,
> Jackson


Re: [Marketing] For Review: Learn How CommitLog Works in Apache Cassandra

2022-08-26 Thread Alexander Sorokoumov
Hey Rahul,

Thanks for the feedback. I have changed it to just durability (without
mentioning ACID) to prevent confusion.

Best,
Alex

On Fri, Aug 26, 2022 at 11:53 PM Rahul Xavier Singh <
rahul.xavier.si...@gmail.com> wrote:

> Added a comment about "ACID". I would recommend not saying ACID until it's
> there. C* has strong consistency when needed. It doesn't, for example,
> guarantee that two competing mutations will be executed (or can be rolled
> back to the previous state) in exactly the order they were intended if they
> come in at the same time, especially if they are coming from two different
> data centers.
>
> Maybe it can be explained later that the commitlog mechanism provides
> ACID-like features ... ?
>
> From my understanding the Accord white paper has not been implemented into
> any working Cassandra code. I may be wrong.
>
>
> Rahul Singh
>
>
> On Tue, Aug 23, 2022 at 5:43 AM Sharan Foga  wrote:
>
>> Hi Chris
>>
>> I've added a few comments and suggestions. Please feel free to use
>> /ignore whichever ones you think :-)
>>
>> Thanks
>> Sharan
>>
>> On 2022/08/23 00:08:52 Chris Thornett wrote:
>> > Opening up Alex Sorokoumov's guide 'Learn How CommitLog Works in Apache
>> > Cassandra' for a 72-hr community review by lazy consensus.
>> >
>> > Please add any amends and suggestions in the comments:
>> >
>> https://docs.google.com/document/d/1cyOi-IeU_I9GBkpQbJS6IIrmemAesEqvzLb-eeFs_rM/edit#
>> >
>> > Thanks!
>> >
>> > --
>> >
>> > Chris Thornett
>> > Senior Content Strategist, Constantia.io
>> >
>>
>