>
> Perhaps we could deliver simple views backed by virtual tables, and model
> our approach on that of Postgres, MySQL et al?


The approach of PostgreSQL
<https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/>
allows attaching masking functions to columns and users with commands such
as:

SECURITY LABEL FOR anon ON COLUMN people.phone
IS 'MASKED WITH FUNCTION anon.partial(phone,2,$$******$$,2)';
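
For anyone not familiar with that kind of function, here is a rough Python
sketch of what a partial mask like the one above does, as I understand it:
keep the first and last few characters and replace everything in between
with a fixed padding. The Python function and its parameter names are just
illustrative, and it doesn't try to reproduce how the real function treats
very short or null values.

```python
def partial(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Keep the first `prefix` and the last `suffix` characters of `value`,
    replacing everything in between with the fixed `padding` string."""
    head = value[:prefix]
    # value[-0:] would return the whole string, so handle suffix == 0 explicitly.
    tail = value[-suffix:] if suffix > 0 else ""
    return head + padding + tail


if __name__ == "__main__":
    # Made-up phone number, masked like the SECURITY LABEL example above.
    print(partial("0609110911", 2, "******", 2))
```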

MySQL, however, only provides the masking functions, without the ability
to attach them to either columns or users, as far as I know.

The approach most similar to the proposed one is that of Azure/SQL Server,
which is almost identical, except that the CEP tries to address the recent
concerns about querying masked columns.
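
To make that concrete, this is a rough Python sketch of the permission rule
from the modified proposal quoted further down in this thread: using a masked
column in WHERE/IF needs SELECT plus either UNMASK or SELECT_MASKED, while
seeing the clear values needs both SELECT and UNMASK. It is only an
illustration; the enum and function names are made up and this is not how
Cassandra's authorization code is written.

```python
from enum import Enum, auto


class Permission(Enum):
    SELECT = auto()
    UNMASK = auto()
    SELECT_MASKED = auto()


def can_filter_on_masked_column(granted: set) -> bool:
    """WHERE/IF on a masked column: SELECT and (UNMASK or SELECT_MASKED)."""
    has_select = Permission.SELECT in granted
    has_either = bool(granted & {Permission.UNMASK, Permission.SELECT_MASKED})
    return has_select and has_either


def can_see_clear_values(granted: set) -> bool:
    """Unmasked values in query results: both SELECT and UNMASK."""
    return {Permission.SELECT, Permission.UNMASK} <= granted


if __name__ == "__main__":
    role = {Permission.SELECT, Permission.SELECT_MASKED}
    print(can_filter_on_masked_column(role), can_see_clear_values(role))
```

A role holding only SELECT and SELECT_MASKED can filter on the masked column
but still gets masked values back, which is the "trusted immediate user"
case.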



On Thu, 25 Aug 2022 at 22:10, Benedict <bened...@apache.org> wrote:

> I’m inclined to agree that this seems a more straightforward approach that
> makes fewer implied promises.
>
> Perhaps we could deliver simple views backed by virtual tables, and model
> our approach on that of Postgres, MySQL et al?
>
> Views in C* would be very simple, just offering a subset of fields with
> some UDFs applied. It would allow users to define roles with access only to
> the views, or for applications to use the views for presentation purposes.
>
> It feels like a cleaner approach to me, and we’d get two features for the
> price of one. BUT I don’t feel super strongly about this.
>
> On 25 Aug 2022, at 20:16, Derek Chen-Becker <de...@chen-becker.org> wrote:
>
> 
> To make sure I understand, if I wanted to use a masked column for a
> conditional update, you're saying we would need SELECT_MASKED to use it in
> the IF clause? I worry that this proposal is increasing in complexity; I
> would actually be OK starting with something smaller in scope. Perhaps just
> providing the masking functions and not tying masking to schema would be
> sufficient for an initial goal? That wouldn't preclude additional
> permissions, schema integration, or perhaps just plain Views in the future.
>
> Cheers,
>
> Derek
>
> On Thu, Aug 25, 2022 at 11:12 AM Andrés de la Peña <adelap...@apache.org>
> wrote:
>
>> I have modified the proposal adding a new SELECT_MASKED permission. Using
>> masked columns on WHERE/IF clauses would require having SELECT and either
>> UNMASK or SELECT_MASKED permissions. Seeing the unmasked values in the
>> query results would always require both SELECT and UNMASK.
>>
>> This way we can have the best of both worlds, allowing admins to decide
>> whether they trust their immediate users or not. wdyt?
>>
>> On Wed, 24 Aug 2022 at 16:06, Henrik Ingo <henrik.i...@datastax.com>
>> wrote:
>>
>>> This is the difference between security and compliance I guess :-D
>>>
>>> The way I see this, the attacker or threat in this concept is not the
>>> developer with access to the database. Rather a feature like this is just a
>>> convenient way to apply some masking rule in a centralized way. The
>>> protection is against an end user of the application, who should not be
>>> able to see the personal data of someone else. Or themselves, even. As long
>>> as the application end user doesn't have access to run arbitrary CQL, then
>>> these forms of masking prevent accidental unauthorized use/leaking of
>>> personal data.
>>>
>>> henrik
>>>
>>>
>>>
>>> On Wed, Aug 24, 2022 at 10:40 AM Benedict <bened...@apache.org> wrote:
>>>
>>>> Is it typical for a masking feature to make no effort to prevent
>>>> unmasking? I’m just struggling to see the value of this without such
>>>> mechanisms. Otherwise it’s just a default formatter, and we should consider
>>>> renaming the feature IMO
>>>>
>>>> On 23 Aug 2022, at 21:27, Andrés de la Peña <adelap...@apache.org>
>>>> wrote:
>>>>
>>>> 
>>>> As mentioned in the CEP document, dynamic data masking doesn't try to
>>>> prevent malicious users with SELECT permissions from indirectly guessing
>>>> the real value of the masked value. This can easily be done by just trying
>>>> values on the WHERE clause of SELECT queries. DDM would not be a
>>>> replacement for proper column-level permissions.
>>>>
>>>> The data served by the database is usually consumed by applications
>>>> that present this data to end users. These end users are not necessarily
>>>> the users directly connecting to the database. With DDM, it would be easy
>>>> for applications to mask sensitive data that is going to be consumed by the
>>>> end users. However, the users directly connecting to the database should be
>>>> trusted, provided that they have the right SELECT permissions.
>>>>
>>>> In other words, DDM doesn't directly protect the data, but it eases the
>>>> production of protected data.
>>>>
>>>> That said, we could later go one step further and add a way to prevent
>>>> untrusted users from inferring the masked data. That could be done by
>>>> adding a new permission required to use certain columns on WHERE clauses,
>>>> different to the current SELECT permission. That would play especially
>>>> well with column-level permissions, which is something that we still have
>>>> pending.
>>>>
>>>> On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz <aaronplo...@gmail.com>
>>>> wrote:
>>>>
>>>>>> Applying this should prevent querying on a field, else you could leak
>>>>>> its contents, surely?
>>>>>>
>>>>>
>>>>> In theory, yes.  Although I could see folks doing something like this:
>>>>>
>>>>> SELECT COUNT(*) FROM patients
>>>>> WHERE year_of_birth = 2002
>>>>> AND date_of_birth >= '2002-04-01'
>>>>> AND date_of_birth < '2002-11-01';
>>>>>
>>>>> In this case, the rows containing the masked key column(s) could be
>>>>> filtered on without revealing the actual data.  But again, that's probably
>>>>> better for a "phase 2" of the implementation.
>>>>>
>>>>>> Agreed on not being a queryable field. That would also preclude
>>>>>> secondary indexing, right?
>>>>>
>>>>>
>>>>> Yes, that's my thought as well.
>>>>>
>>>>> On Tue, Aug 23, 2022 at 12:42 PM Derek Chen-Becker <
>>>>> de...@chen-becker.org> wrote:
>>>>>
>>>>>> Agreed on not being a queryable field. That would also preclude
>>>>>> secondary indexing, right?
>>>>>>
>>>>>> On Tue, Aug 23, 2022 at 11:20 AM Benedict <bened...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Applying this should prevent querying on a field, else you could
>>>>>>> leak its contents, surely? This pretty much prohibits using it in a
>>>>>>> clustering key, and a partition key with the ordered partitioner - but
>>>>>>> probably also a hashed partitioner since we do not use a cryptographic
>>>>>>> hash and the hash function is well defined.
>>>>>>>
>>>>>>> We probably also need to ensure that any ALLOW FILTERING queries on
>>>>>>> such a field are disabled.
>>>>>>>
>>>>>>> Plausibly the data could be cryptographically jumbled before using
>>>>>>> it in a primary key component (or permitting filtering), but it is
>>>>>>> probably easier and safer to exclude for now…
>>>>>>>
>>>>>>> On 23 Aug 2022, at 18:13, Aaron Ploetz <aaronplo...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> 
>>>>>>> Some thoughts on this one:
>>>>>>>
>>>>>>> In a prior job, we'd give app teams access to a single keyspace, and
>>>>>>> two roles: a read-write role and a read-only role.  In some cases, a
>>>>>>> "privileged" application role was also requested.  Depending on the
>>>>>>> requirements, I could see the UNMASK permission being applied to the RW
>>>>>>> or privileged roles.  But if there's a problem on the table and the
>>>>>>> operators go in to investigate, they will likely use a SUPERUSER account,
>>>>>>> and they'll see that data.
>>>>>>>
>>>>>>> How hard would it be for SUPERUSERs to *not* automatically get the
>>>>>>> UNMASK permission?
>>>>>>>
>>>>>>> I'll also echo the concerns around masking primary key components.
>>>>>>> It's highly likely that certain personal data properties would be used
>>>>>>> as a partition or clustering key (ex: range query for people born within
>>>>>>> a certain timeframe).  In addition to the "breaks existing" concern, I'm
>>>>>>> curious about the challenges around getting that to work with the
>>>>>>> current primary key implementation.
>>>>>>>
>>>>>>> Does this first implementation only apply to payload (non-key)
>>>>>>> columns?  The examples in the CEP currently do not show primary key
>>>>>>> components being masked.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Aaron
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo <
>>>>>>> henrik.i...@datastax.com> wrote:
>>>>>>>
>>>>>>>> On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña <
>>>>>>>> adelap...@apache.org> wrote:
>>>>>>>>
>>>>>>>>>> One thought: The way the CEP is currently written, it is only
>>>>>>>>>> possible to mask a column one way. You can only define one masking
>>>>>>>>>> function for a column, and since you use the original column name, you
>>>>>>>>>> could only return one version of it in the result set, even if you had
>>>>>>>>>> a way to define several functions.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Right, it's one single type of mapping per column, declared on
>>>>>>>>> CREATE/ALTER TABLE statements. Also, users can manually specify their
>>>>>>>>> own masking function in SELECT statements if they have permissions for
>>>>>>>>> seeing the clear data.
>>>>>>>>>
>>>>>>>>> For those cases where the data is automatically masked for an
>>>>>>>>> unprivileged user, I don't see the use of including different types of
>>>>>>>>> masking for the same column in the same result set. Instead, we might
>>>>>>>>> be interested in having different types of masking associated with
>>>>>>>>> different roles. We could do so with dedicated CREATE/DROP/LIST MASK
>>>>>>>>> statements, instead of using the CREATE/ALTER/DESCRIBE TABLE
>>>>>>>>> statements. That CREATE MASK statement would associate a masking
>>>>>>>>> function with a column and role. However, I'm not sure we need that
>>>>>>>>> type of granularity instead of the simplicity of attaching the masking
>>>>>>>>> to the column declaration. wdyt?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> My gut feeling likewise is that this adds complexity but little
>>>>>>>> value.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Henrik Ingo
>>>>>>>>
>>>>>>>> +358 40 569 7354 <358405697354>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> +---------------------------------------------------------------+
>>>>>> | Derek Chen-Becker                                             |
>>>>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>>>>>> +---------------------------------------------------------------+
>>>>>>
>>>>>>
>>>
>>> --
>>>
>>> Henrik Ingo
>>>
>>> +358 40 569 7354 <358405697354>
>>>
>>>
>>
>
> --
> +---------------------------------------------------------------+
> | Derek Chen-Becker                                             |
> | GPG Key available at https://keybase.io/dchenbecker and       |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---------------------------------------------------------------+
>
>
