> Perhaps we could deliver simple views backed by virtual tables, and model
> our approach on that of Postgres, MySQL et al?

The approach of PostgreSQL (through the PostgreSQL Anonymizer extension)
<https://postgresql-anonymizer.readthedocs.io/en/latest/dynamic_masking/>
allows attaching masking functions to columns and users with commands such
as:

SECURITY LABEL FOR anon ON COLUMN people.phone
IS 'MASKED WITH FUNCTION anon.partial(phone,2,$$******$$,2)';

MySQL, however, only provides the masking functions, without the ability to
attach them to columns or users, as far as I know.

The approach most similar to the proposed one is that of Azure/SQL Server,
which is almost identical, except that the CEP tries to address the recent
concerns about querying masked columns.

On Thu, 25 Aug 2022 at 22:10, Benedict <bened...@apache.org> wrote:

> I’m inclined to agree that this seems a more straightforward approach that
> makes fewer implied promises.
>
> Perhaps we could deliver simple views backed by virtual tables, and model
> our approach on that of Postgres, MySQL et al?
>
> Views in C* would be very simple, just offering a subset of fields with
> some UDFs applied. It would allow users to define roles with access only to
> the views, or for applications to use the views for presentation purposes.
>
> It feels like a cleaner approach to me, and we’d get two features for the
> price of one. BUT I don’t feel super strongly about this.
>
> On 25 Aug 2022, at 20:16, Derek Chen-Becker <de...@chen-becker.org> wrote:
>
> To make sure I understand, if I wanted to use a masked column for a
> conditional update, you're saying we would need SELECT_MASKED to use it in
> the IF clause? I worry that this proposal is increasing in complexity; I
> would actually be OK starting with something smaller in scope. Perhaps just
> providing the masking functions and not tying masking to schema would be
> sufficient for an initial goal? That wouldn't preclude additional
> permissions, schema integration, or perhaps just plain Views in the future.
>
> Cheers,
>
> Derek
>
> On Thu, Aug 25, 2022 at 11:12 AM Andrés de la Peña <adelap...@apache.org>
> wrote:
>
>> I have modified the proposal adding a new SELECT_MASKED permission. Using
>> masked columns on WHERE/IF clauses would require having SELECT and either
>> UNMASK or SELECT_MASKED permissions. Seeing the unmasked values in the
>> query results would always require both SELECT and UNMASK.
>>
>> This way we can have the best of both worlds, allowing admins to decide
>> whether they trust their immediate users or not. wdyt?
>>
>> On Wed, 24 Aug 2022 at 16:06, Henrik Ingo <henrik.i...@datastax.com>
>> wrote:
>>
>>> This is the difference between security and compliance I guess :-D
>>>
>>> The way I see this, the attacker or threat in this concept is not the
>>> developer with access to the database. Rather a feature like this is just a
>>> convenient way to apply some masking rule in a centralized way. The
>>> protection is against an end user of the application, who should not be
>>> able to see the personal data of someone else. Or themselves, even. As long
>>> as the application end user doesn't have access to run arbitrary CQL, then
>>> these forms of masking prevent accidental unauthorized use/leaking of
>>> personal data.
>>>
>>> henrik
>>>
>>> On Wed, Aug 24, 2022 at 10:40 AM Benedict <bened...@apache.org> wrote:
>>>
>>>> Is it typical for a masking feature to make no effort to prevent
>>>> unmasking? I’m just struggling to see the value of this without such
>>>> mechanisms. Otherwise it’s just a default formatter, and we should consider
>>>> renaming the feature IMO
>>>>
>>>> On 23 Aug 2022, at 21:27, Andrés de la Peña <adelap...@apache.org>
>>>> wrote:
>>>>
>>>> As mentioned in the CEP document, dynamic data masking doesn't try to
>>>> prevent malicious users with SELECT permissions from indirectly guessing
>>>> the real value of the masked value. This can easily be done by just trying
>>>> values on the WHERE clause of SELECT queries.
>>>> DDM would not be a replacement for proper column-level permissions.
>>>>
>>>> The data served by the database is usually consumed by applications
>>>> that present this data to end users. These end users are not necessarily
>>>> the users directly connecting to the database. With DDM, it would be easy
>>>> for applications to mask sensitive data that is going to be consumed by the
>>>> end users. However, the users directly connecting to the database should be
>>>> trusted, provided that they have the right SELECT permissions.
>>>>
>>>> In other words, DDM doesn't directly protect the data, but it eases the
>>>> production of protected data.
>>>>
>>>> That said, we could later go one step further and add a way to prevent
>>>> untrusted users from inferring the masked data. That could be done by
>>>> adding a new permission required to use certain columns on WHERE clauses,
>>>> different to the current SELECT permission. That would play especially well
>>>> with column-level permissions, which is something that we still have
>>>> pending.
>>>>
>>>> On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz <aaronplo...@gmail.com>
>>>> wrote:
>>>>
>>>>>> Applying this should prevent querying on a field, else you could leak
>>>>>> its contents, surely?
>>>>>
>>>>> In theory, yes. Although I could see folks doing something like this:
>>>>>
>>>>> SELECT COUNT(*) FROM patients
>>>>> WHERE year_of_birth = 2002
>>>>>   AND date_of_birth >= '2002-04-01'
>>>>>   AND date_of_birth < '2002-11-01';
>>>>>
>>>>> In this case, the rows containing the masked key column(s) could be
>>>>> filtered on without revealing the actual data. But again, that's probably
>>>>> better for a "phase 2" of the implementation.
>>>>>
>>>>>> Agreed on not being a queryable field. That would also preclude
>>>>>> secondary indexing, right?
>>>>>
>>>>> Yes, that's my thought as well.
>>>>>
>>>>> On Tue, Aug 23, 2022 at 12:42 PM Derek Chen-Becker <de...@chen-becker.org>
>>>>> wrote:
>>>>>
>>>>>> Agreed on not being a queryable field. That would also preclude
>>>>>> secondary indexing, right?
>>>>>>
>>>>>> On Tue, Aug 23, 2022 at 11:20 AM Benedict <bened...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Applying this should prevent querying on a field, else you could
>>>>>>> leak its contents, surely? This pretty much prohibits using it in a
>>>>>>> clustering key, and a partition key with the ordered partitioner - but
>>>>>>> probably also a hashed partitioner since we do not use a cryptographic
>>>>>>> hash and the hash function is well defined.
>>>>>>>
>>>>>>> We probably also need to ensure that any ALLOW FILTERING queries on
>>>>>>> such a field are disabled.
>>>>>>>
>>>>>>> Plausibly the data could be cryptographically jumbled before using
>>>>>>> it in a primary key component (or permitting filtering), but it is
>>>>>>> probably easier and safer to exclude for now…
>>>>>>>
>>>>>>> On 23 Aug 2022, at 18:13, Aaron Ploetz <aaronplo...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Some thoughts on this one:
>>>>>>>
>>>>>>> In a prior job, we'd give app teams access to a single keyspace, and
>>>>>>> two roles: a read-write role and a read-only role. In some cases, a
>>>>>>> "privileged" application role was also requested. Depending on the
>>>>>>> requirements, I could see the UNMASK permission being applied to the RW
>>>>>>> or privileged roles. But if there's a problem on the table and the
>>>>>>> operators go in to investigate, they will likely use a SUPERUSER
>>>>>>> account, and they'll see that data.
>>>>>>>
>>>>>>> How hard would it be for SUPERUSERs to *not* automatically get the
>>>>>>> UNMASK permission?
>>>>>>>
>>>>>>> I'll also echo the concerns around masking primary key components.
>>>>>>> It's highly likely that certain personal data properties would be used
>>>>>>> as a partition or clustering key (ex: range query for people born within
>>>>>>> a certain timeframe). In addition to the "breaks existing" concern, I'm
>>>>>>> curious about the challenges around getting that to work with the
>>>>>>> current primary key implementation.
>>>>>>>
>>>>>>> Does this first implementation only apply to payload (non-key)
>>>>>>> columns? The examples in the CEP currently do not show primary key
>>>>>>> components being masked.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Aaron
>>>>>>>
>>>>>>> On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo <henrik.i...@datastax.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Tue, Aug 23, 2022 at 1:10 PM Andrés de la Peña
>>>>>>>> <adelap...@apache.org> wrote:
>>>>>>>>
>>>>>>>>>> One thought: The way the CEP is currently written, it is only
>>>>>>>>>> possible to mask a column one way. You can only define one masking
>>>>>>>>>> function for a column, and since you use the original column name,
>>>>>>>>>> you could only return one version of it in the result set, even if
>>>>>>>>>> you had a way to define several functions.
>>>>>>>>>
>>>>>>>>> Right, it's one single type of mapping per column, declared on
>>>>>>>>> CREATE/ALTER TABLE statements. Also, users can manually specify their
>>>>>>>>> own masking function in SELECT statements if they have permissions for
>>>>>>>>> seeing the clear data.
>>>>>>>>>
>>>>>>>>> For those cases where the data is automatically masked for an
>>>>>>>>> unprivileged user, I don't see the use of including different types of
>>>>>>>>> masking for the same column into the same result set. Instead, we
>>>>>>>>> might be interested in having different types of masking associated
>>>>>>>>> with different roles.
>>>>>>>>> We could do so with dedicated CREATE/DROP/LIST MASK statements,
>>>>>>>>> instead of using the CREATE/ALTER/DESCRIBE TABLE statements. That
>>>>>>>>> CREATE MASK statement would associate a masking function with a column
>>>>>>>>> and role. However, I'm not sure we need that type of granularity
>>>>>>>>> instead of the simplicity of attaching the masking to the column
>>>>>>>>> declaration. wdyt?
>>>>>>>>>
>>>>>>>> My gut feeling likewise is that this adds complexity but little
>>>>>>>> value.
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Henrik Ingo
>>>>>>>>
>>>>>>>> +358 40 569 7354
>>>>>>
>>>>>> --
>>>>>> +---------------------------------------------------------------+
>>>>>> | Derek Chen-Becker                                             |
>>>>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC   |
>>>>>> +---------------------------------------------------------------+
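
For reference, the permission rules discussed in this thread (SELECT plus either UNMASK or SELECT_MASKED to use a masked column in WHERE/IF, and SELECT plus UNMASK to see clear values) can be sketched as below. This is only an illustrative Python model, not Cassandra code: the Permission enum and the functions can_filter_on_masked_column, can_see_unmasked_values, partial_mask and render_phone are hypothetical names made up for this sketch. partial_mask imitates the shape of the anon.partial(phone,2,$$******$$,2) call quoted above, under the assumption that it keeps the first and last two characters and replaces the rest with the padding.

```python
from enum import Enum, auto
from typing import AbstractSet


class Permission(Enum):
    """Hypothetical model of the permissions named in the thread."""
    SELECT = auto()
    UNMASK = auto()
    SELECT_MASKED = auto()


def can_filter_on_masked_column(perms: AbstractSet[Permission]) -> bool:
    # WHERE/IF on a masked column: SELECT and either UNMASK or SELECT_MASKED.
    return Permission.SELECT in perms and (
        Permission.UNMASK in perms or Permission.SELECT_MASKED in perms
    )


def can_see_unmasked_values(perms: AbstractSet[Permission]) -> bool:
    # Clear values in query results: always both SELECT and UNMASK.
    return Permission.SELECT in perms and Permission.UNMASK in perms


def partial_mask(value: str, prefix: int, padding: str, suffix: int) -> str:
    # Assumed behaviour: keep `prefix` leading and `suffix` trailing
    # characters and put `padding` between them.
    head = value[:prefix]
    tail = value[max(0, len(value) - suffix):] if suffix > 0 else ""
    return head + padding + tail


def render_phone(phone: str, perms: AbstractSet[Permission]) -> str:
    # What a role would get back when selecting the masked column.
    if Permission.SELECT not in perms:
        raise PermissionError("SELECT is required to read the column")
    if can_see_unmasked_values(perms):
        return phone
    return partial_mask(phone, 2, "******", 2)
```

Under these assumptions, a role holding only SELECT gets masked results and cannot filter on the masked column, a role holding SELECT and SELECT_MASKED can filter but still gets masked results, and only SELECT plus UNMASK returns clear values.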