[RESULT][VOTE] Release Apache Cassandra 4.0.6

2022-08-25 Thread Mick Semb Wever
The vote passes with eleven +1's (four binding) and no vetoes.
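
As an illustration only, the passing rule quoted in the proposal below (at least three binding +1s and no -1s) can be sketched as follows. The function name and the (value, binding) tuple layout are assumptions made for this example, not part of any ASF tooling.

```python
def vote_passes(votes):
    """Return True if the vote passes under the quoted rule.

    votes: iterable of (value, binding) pairs, where value is +1, 0 or -1
    and binding is True for PMC members (hypothetical layout).
    """
    votes = list(votes)
    binding_plus_ones = sum(1 for value, binding in votes if value == 1 and binding)
    vetoes = sum(1 for value, _ in votes if value == -1)
    return binding_plus_ones >= 3 and vetoes == 0


# Eleven +1s, four of them binding, and no vetoes, as in the result above.
tally = [(1, True)] * 4 + [(1, False)] * 7
print(vote_passes(tally))
```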




On Tue, 23 Aug 2022 at 14:56, Berenguer Blasi 
wrote:

> +1
> On 23/8/22 14:50, Ekaterina Dimitrova wrote:
>
>
> +1(nb)
> On Tue, 23 Aug 2022 at 8:49, Josh McKenzie  wrote:
>
>> +1
>>
>> On Tue, Aug 23, 2022, at 6:47 AM, Benjamin Lerer wrote:
>>
>> +1
>>
>> On Tue, 23 Aug 2022 at 11:30, Andrés de la Peña
>> wrote:
>>
>> +1 (nb)
>>
>> On Tue, 23 Aug 2022 at 06:14, Tommy Stendahl via dev <
>> dev@cassandra.apache.org> wrote:
>>
>> +1 nb
>>
>> -Original Message-
>> *From*: Brandon Williams
>> *Reply-To*: dev@cassandra.apache.org
>> *To*: dev
>> *Subject*: Re: [VOTE] Release Apache Cassandra 4.0.6
>> *Date*: Mon, 22 Aug 2022 17:47:59 -0500
>>
>> +1
>>
>> On Sun, Aug 21, 2022 at 7:44 AM Mick Semb Wever <m...@apache.org> wrote:
>>
>> Proposing the test build of Cassandra 4.0.6 for release.
>>
>> sha1: eb2375718483f4c360810127ae457f2a26ccce67
>>
>> Git:
>>
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0.6-tentative
>>
>> Maven Artifacts:
>>
>> https://repository.apache.org/content/repositories/orgapachecassandra-/org/apache/cassandra/cassandra-all/4.0.6/
>>
>> The Source and Build Artifacts, and the Debian and RPM packages and 
>> repositories, are available here:
>>
>> https://dist.apache.org/repos/dist/dev/cassandra/4.0.6/
>>
>> The vote will be open for 72 hours (longer if needed). Everyone who has 
>> tested the build is invited to vote. Votes by PMC members are considered 
>> binding. A vote passes if there are at least three binding +1s and no -1's.
>>
>> [1]: CHANGES.txt:
>>
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0.6-tentative
>>
>> [2]: NEWS.txt:
>>
>> https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0.6-tentative
>>
>>


[RELEASE] Apache Cassandra 4.0.6 released

2022-08-25 Thread Mick Semb Wever
The Cassandra team is pleased to announce the release of Apache Cassandra
version 4.0.6.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 4.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

[WARNING] Debian and RedHat package repositories have moved! Debian
`/etc/apt/sources.list.d/cassandra.sources.list` and RedHat
`/etc/yum.repos.d/cassandra.repo` files must be updated to the new
repository URLs. For Debian it is now https://debian.cassandra.apache.org .
For RedHat it is now https://redhat.cassandra.apache.org/40x/ .

Enjoy!

[1]: CHANGES.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-4.0.6
[2]: NEWS.txt
https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-4.0.6
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Invitation to take the 2022 ASF Community Survey

2022-08-25 Thread Paulo Motta
Hello everyone,

The 2022 ASF Community Survey is looking to gather scientific data that
allows us to understand our community better, both in its demographic
composition, and also in collaboration styles and preferences. We want to
find areas where we can continue to do great work, and others where we need
to provide more support so that our projects can keep growing healthy and
diverse.

If you have an apache.org email, you should have received an email with an
invitation to take the 2022 ASF Community Survey. Please take 15 minutes to
complete it.

If you do not have an apache.org email address or you didn’t receive a
link, please follow this link to the survey:

https://edi-asf.limesurvey.net/912832?lang=en

You can find information about privacy on the survey’s Confluence page. The
last surveys of this kind were implemented in 2016 and 2020, which means we
are finally in a position to see trends over time.

Your participation is paramount to the success of this project! Please
consider filling out the survey, and share this news with your fellow
Apache contributors. As individuals form the Apache community, your opinion
matters: we want to hear your voice.

If you have any questions about the survey or otherwise, please reach out
to Katia Rojas, ASF V.P. of Diversity and Inclusion.

Thanks,

Paulo


Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-25 Thread Andrés de la Peña
I have modified the proposal adding a new SELECT_MASKED permission. Using
masked columns on WHERE/IF clauses would require having SELECT and either
UNMASK or SELECT_MASKED permissions. Seeing the unmasked values in the
query results would always require both SELECT and UNMASK.

This way we can have the best of both worlds, allowing admins to decide
whether they trust their immediate users or not. wdyt?
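
To make the rule concrete, here is a minimal sketch of the two checks described above. It is not the CEP's implementation: the Permission enum and both function names are made up for illustration, and only the permission names SELECT, UNMASK and SELECT_MASKED come from the proposal.

```python
from enum import Flag, auto


class Permission(Flag):
    # Hypothetical model of the permissions named in the proposal.
    NONE = 0
    SELECT = auto()
    UNMASK = auto()
    SELECT_MASKED = auto()


def can_restrict_on_masked_column(granted):
    """WHERE/IF on a masked column: SELECT plus either UNMASK or SELECT_MASKED."""
    has_select = Permission.SELECT in granted
    has_either = bool(granted & (Permission.UNMASK | Permission.SELECT_MASKED))
    return has_select and has_either


def can_see_unmasked_values(granted):
    """Unmasked values in query results: always both SELECT and UNMASK."""
    return Permission.SELECT in granted and Permission.UNMASK in granted


reader = Permission.SELECT
analyst = Permission.SELECT | Permission.SELECT_MASKED
auditor = Permission.SELECT | Permission.UNMASK

for name, granted in [("reader", reader), ("analyst", analyst), ("auditor", auditor)]:
    print(name, can_restrict_on_masked_column(granted), can_see_unmasked_values(granted))
```

Under this sketch, a role with only SELECT can read masked output but cannot filter on the masked column, which is the "untrusted immediate user" case.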

On Wed, 24 Aug 2022 at 16:06, Henrik Ingo  wrote:

> This is the difference between security and compliance I guess :-D
>
> The way I see this, the attacker or threat in this concept is not the
> developer with access to the database. Rather a feature like this is just a
> convenient way to apply some masking rule in a centralized way. The
> protection is against an end user of the application, who should not be
> able to see the personal data of someone else. Or themselves, even. As long
> as the application end user doesn't have access to run arbitrary CQL, then
> these forms of masking prevent accidental unauthorized use/leaking of
> personal data.
>
> henrik
>
>
>
> On Wed, Aug 24, 2022 at 10:40 AM Benedict  wrote:
>
>> Is it typical for a masking feature to make no effort to prevent
>> unmasking? I’m just struggling to see the value of this without such
>> mechanisms. Otherwise it’s just a default formatter, and we should consider
>> renaming the feature IMO
>>
>> On 23 Aug 2022, at 21:27, Andrés de la Peña  wrote:
>>
>> 
>> As mentioned in the CEP document, dynamic data masking doesn't try to
>> prevent malicious users with SELECT permissions from indirectly guessing
>> the real value of the masked value. This can easily be done by just trying
>> values on the WHERE clause of SELECT queries. DDM would not be a
>> replacement for proper column-level permissions.
>>
>> The data served by the database is usually consumed by applications that
>> present this data to end users. These end users are not necessarily the
>> users directly connecting to the database. With DDM, it would be easy for
>> applications to mask sensitive data that is going to be consumed by the end
>> users. However, the users directly connecting to the database should be
>> trusted, provided that they have the right SELECT permissions.
>>
>> In other words, DDM doesn't directly protect the data, but it eases the
>> production of protected data.
>>
>> That said, we could later go one step further and add a way to prevent
>> untrusted users from inferring the masked data. That could be done by adding
>> a new permission required to use certain columns on WHERE clauses, different
>> to the current SELECT permission. That would play especially well with
>> column-level permissions, which is something that we still have pending.
>>
>> On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz  wrote:
>>
>>> Applying this should prevent querying on a field, else you could leak
>>> its contents, surely?
>>>
>>> In theory, yes.  Although I could see folks doing something like this:
>>>
>>> SELECT COUNT(*) FROM patients
>>> WHERE year_of_birth = 2002
>>> AND date_of_birth >= '2002-04-01'
>>> AND date_of_birth < '2002-11-01';
>>>
>>> In this case, the rows containing the masked key column(s) could be
>>> filtered on without revealing the actual data.  But again, that's probably
>>> better for a "phase 2" of the implementation.
>>>
>>> Agreed on not being a queryable field. That would also preclude
>>> secondary indexing, right?
>>>
>>>
>>> Yes, that's my thought as well.
>>>
>>> On Tue, Aug 23, 2022 at 12:42 PM Derek Chen-Becker <
>>> de...@chen-becker.org> wrote:
>>>
 Agreed on not being a queryable field. That would also preclude
 secondary indexing, right?

 On Tue, Aug 23, 2022 at 11:20 AM Benedict  wrote:

> Applying this should prevent querying on a field, else you could leak
> its contents, surely? This pretty much prohibits using it in a clustering
> key, and a partition key with the ordered partitioner - but probably also a
> hashed partitioner since we do not use a cryptographic hash and the hash
> function is well defined.
>
> We probably also need to ensure that any ALLOW FILTERING queries on
> such a field are disabled.
>
> Plausibly the data could be cryptographically jumbled before using it
> in a primary key component (or permitting filtering), but it is probably
> easier and safer to exclude for now…
>
> On 23 Aug 2022, at 18:13, Aaron Ploetz  wrote:
>
> 
> Some thoughts on this one:
>
> In a prior job, we'd give app teams access to a single keyspace, and
> two roles: a read-write role and a read-only role.  In some cases, a
> "privileged" application role was also requested.  Depending on the
> requirements, I could see the UNMASK permission being applied to the RW or
> privileged roles.  But if there's a problem on the table and the operators
> go in to investigate, they will likely use a SUPERUSER account, and 
> t

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-25 Thread Derek Chen-Becker
To make sure I understand, if I wanted to use a masked column for a
conditional update, you're saying we would need SELECT_MASKED to use it in
the IF clause? I worry that this proposal is increasing in complexity; I
would actually be OK starting with something smaller in scope. Perhaps just
providing the masking functions and not tying masking to schema would be
sufficient for an initial goal? That wouldn't preclude additional
permissions, schema integration, or perhaps just plain Views in the future.

Cheers,

Derek

On Thu, Aug 25, 2022 at 11:12 AM Andrés de la Peña 
wrote:

> I have modified the proposal adding a new SELECT_MASKED permission. Using
> masked columns on WHERE/IF clauses would require having SELECT and either
> UNMASK or SELECT_MASKED permissions. Seeing the unmasked values in the
> query results would always require both SELECT and UNMASK.
>
> This way we can have the best of both worlds, allowing admins to decide
> whether they trust their immediate users or not. wdyt?

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-25 Thread Andrés de la Peña
Note that conditional updates return true or false to notify whether the
update has happened or not. That can also be exploited to infer the masked
data. Indeed, at the moment they also require SELECT permissions.
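
The inference risk can be shown with a small simulation. This is only a sketch: the dictionary stands in for a table row, and conditional_update_applied is a hypothetical stand-in for an `UPDATE ... IF column = guess` statement that reports nothing but whether it was applied.

```python
def conditional_update_applied(row, column, guess):
    """Stand-in for a conditional update: reveals only applied / not applied."""
    return row[column] == guess


def infer_masked_value(row, column, candidates):
    """Recover a masked value by trying candidates against the true/false result."""
    for guess in candidates:
        if conditional_update_applied(row, column, guess):
            return guess
    return None


# Hypothetical row; the caller never reads year_of_birth directly.
patient = {"id": 1, "year_of_birth": 2002}
print(infer_masked_value(patient, "year_of_birth", range(1900, 2023)))  # prints 2002
```

The smaller the domain of the masked column, the fewer attempts such guessing needs.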

The masking functions can always be used on their own, as any other CQL
function and without necessarily associating them with the schema.

You would only need either UNMASK or SELECT_MASKED permissions for a
conditional update if the masking function is attached to the column
declaration in the schema of the table.

There is a timeline section of the CEP listing the planned development
steps. The first step is adding the functions on their own. The next steps
allow attaching those functions to columns, with the mentioned permissions.

On Thu, 25 Aug 2022 at 20:16, Derek Chen-Becker 
wrote:

> To make sure I understand, if I wanted to use a masked column for a
> conditional update, you're saying we would need SELECT_MASKED to use it in
> the IF clause? I worry that this proposal is increasing in complexity; I
> would actually be OK starting with something smaller in scope. Perhaps just
> providing the masking functions and not tying masking to schema would be
> sufficient for an initial goal? That wouldn't preclude additional
> permissions, schema integration, or perhaps just plain Views in the future.
>
> Cheers,
>
> Derek

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-25 Thread Benedict
I’m inclined to agree that this seems a more straightforward approach that 
makes fewer implied promises.

Perhaps we could deliver simple views backed by virtual tables, and model our 
approach on that of Postgres, MySQL et al?

Views in C* would be very simple, just offering a subset of fields with some 
UDFs applied. It would allow users to define roles with access only to the 
views, or for applications to use the views for presentation purposes.
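
As a rough sketch of what such a view means (a subset of fields with functions applied), and not a proposal for how C* would implement it, the idea can be modelled like this. The make_view helper, the column names and the sample row are all invented for the example.

```python
def make_view(columns):
    """Build a projection view.

    columns: mapping of exposed column name -> transform function
    (hypothetical shape). Columns not listed are not exposed at all.
    """
    def view(rows):
        for row in rows:
            yield {name: transform(row[name]) for name, transform in columns.items()}
    return view


# Hypothetical base table with one sample row.
patients = [{"id": 1, "name": "Alice", "phone": "0609110911", "diagnosis": "flu"}]

# The view exposes id unchanged and phone masked; name and diagnosis are hidden.
masked_patients = make_view({
    "id": lambda value: value,
    "phone": lambda value: "****",
})

print(list(masked_patients(patients)))  # prints [{'id': 1, 'phone': '****'}]
```

A role granted access only to the view would never see the hidden columns or the unmasked phone number.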

It feels like a cleaner approach to me, and we’d get two features for the price 
of one. BUT I don’t feel super strongly about this.

> On 25 Aug 2022, at 20:16, Derek Chen-Becker  wrote:
> 
> 
> To make sure I understand, if I wanted to use a masked column for a 
> conditional update, you're saying we would need SELECT_MASKED to use it in 
> the IF clause? I worry that this proposal is increasing in complexity; I 
> would actually be OK starting with something smaller in scope. Perhaps just 
> providing the masking functions and not tying masking to schema would be 
> sufficient for an initial goal? That wouldn't preclude additional 
> permissions, schema integration, or perhaps just plain Views in the future.
> 
> Cheers,
> 
> Derek

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-25 Thread Andrés de la Peña
>
> Perhaps we could deliver simple views backed by virtual tables, and model
> our approach on that of Postgres, MySQL et al?


The approach of PostgreSQL allows attaching masking functions to columns
and users with commands such as:

SECURITY LABEL FOR anon ON COLUMN people.phone
IS 'MASKED WITH FUNCTION anon.partial(phone,2,$$**$$,2)';
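
For readers unfamiliar with partial masking, here is a minimal sketch of what a function like the one in that command might compute. The semantics are an assumption on my side (keep the first `prefix` characters, insert the padding, keep the last `suffix` characters); the Python function below is illustrative and is not the extension's code.

```python
def partial(value, prefix, padding, suffix):
    """Keep the first `prefix` and last `suffix` characters, pad the middle.

    Assumed semantics for illustration; if prefix + suffix exceeds the
    length of value, the kept parts overlap and little is hidden.
    """
    head = value[:prefix]
    tail = value[len(value) - suffix:] if suffix > 0 else ""
    return head + padding + tail


print(partial("0609110911", 2, "**", 2))  # prints 06**11
```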

MySQL, however, only provides the masking functions, without the ability
to attach them to either columns or users, as far as I know.

The approach most similar to the proposed one is that of Azure/SQL Server,
which is almost identical except that the CEP tries to address the recent
concerns about querying masked columns.



On Thu, 25 Aug 2022 at 22:10, Benedict  wrote:

> I’m inclined to agree that this seems a more straightforward approach that
> makes fewer implied promises.
>
> Perhaps we could deliver simple views backed by virtual tables, and model
> our approach on that of Postgres, MySQL et al?
>
> Views in C* would be very simple, just offering a subset of fields with
> some UDFs applied. It would allow users to define roles with access only to
> the views, or for applications to use the views for presentation purposes.
>
> It feels like a cleaner approach to me, and we’d get two features for the
> price of one. BUT I don’t feel super strongly about this.

Re: [DISCUSS] CEP-20: Dynamic Data Masking

2022-08-25 Thread Derek Chen-Becker
Yes, I was thinking that simple projection views (essentially a SELECT
statement with application of transform functions) would complement masking
functions, and from the discussion it sounds like this is basically what
some of the other databases do. Projection views seem like they would be
useful in their own right, so would it be proper to write a separate CEP
for that? I would be happy to help drive that document and discussion. I'm
not sure if it's the best name, but I'm trying to distinguish views that
expose a subset of an existing schema vs materialized views, which offer
more complex capabilities.

Cheers,

Derek

On Thu, Aug 25, 2022, 3:11 PM Benedict  wrote:

> I’m inclined to agree that this seems a more straightforward approach that
> makes fewer implied promises.
>
> Perhaps we could deliver simple views backed by virtual tables, and model
> our approach on that of Postgres, MySQL et al?
>
> Views in C* would be very simple, just offering a subset of fields with
> some UDFs applied. It would allow users to define roles with access only to
> the views, or for applications to use the views for presentation purposes.
>
> It feels like a cleaner approach to me, and we’d get two features for the
> price of one. BUT I don’t feel super strongly about this.