Agree with views, or alternatively, column permissions together with computed columns:

CREATE TABLE foo (
  id int PRIMARY KEY,
  unmasked_name text,
  name text GENERATED ALWAYS AS (some_mask_function(unmasked_name, 'xxx', 7)) STORED
);


(syntax from postgresql)


GRANT SELECT (name) ON foo TO general_use;
GRANT SELECT (unmasked_name) ON foo TO top_secret;


On 26/08/2022 00.10, Benedict wrote:
I’m inclined to agree that this seems a more straightforward approach that makes fewer implied promises.

Perhaps we could deliver simple views backed by virtual tables, and model our approach on that of Postgres, MySQL et al?

Views in C* would be very simple, just offering a subset of fields with some UDFs applied. It would allow users to define roles with access only to the views, or for applications to use the views for presentation purposes.

It feels like a cleaner approach to me, and we’d get two features for the price of one. BUT I don’t feel super strongly about this.

On 25 Aug 2022, at 20:16, Derek Chen-Becker <de...@chen-becker.org> wrote:


To make sure I understand, if I wanted to use a masked column for a conditional update, you're saying we would need SELECT_MASKED to use it in the IF clause? I worry that this proposal is increasing in complexity; I would actually be OK starting with something smaller in scope. Perhaps just providing the masking functions and not tying masking to schema would be sufficient for an initial goal? That wouldn't preclude additional permissions, schema integration, or perhaps just plain Views in the future.

Cheers,

Derek

On Thu, Aug 25, 2022 at 11:12 AM Andrés de la Peña <adelap...@apache.org> wrote:

    I have modified the proposal adding a new SELECT_MASKED
    permission. Using masked columns on WHERE/IF clauses would
    require having SELECT and either UNMASK or SELECT_MASKED
    permissions. Seeing the unmasked values in the query results
    would always require both SELECT and UNMASK.

    This way we can have the best of both worlds, allowing admins to
    decide whether they trust their immediate users or not. wdyt?
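A minimal sketch of the permission rule described above, written as plain Python rather than anything from the Cassandra codebase. The function names and the representation of permissions as a set of strings are assumptions made for illustration; only the permission names SELECT, UNMASK and SELECT_MASKED come from the proposal.

```python
from typing import AbstractSet


def can_filter_on_masked(perms: AbstractSet[str]) -> bool:
    """Using a masked column in WHERE/IF needs SELECT plus
    either UNMASK or SELECT_MASKED (hypothetical helper)."""
    return "SELECT" in perms and bool({"UNMASK", "SELECT_MASKED"} & perms)


def sees_clear_values(perms: AbstractSet[str]) -> bool:
    """Seeing unmasked values in results needs both SELECT and UNMASK
    (hypothetical helper)."""
    return {"SELECT", "UNMASK"} <= perms


# A role that may filter on the column but only ever sees masked output.
app_role = {"SELECT", "SELECT_MASKED"}
print(can_filter_on_masked(app_role), sees_clear_values(app_role))
```

Under this reading, a role with only SELECT can read masked results but cannot put the masked column in a WHERE or IF clause.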

    On Wed, 24 Aug 2022 at 16:06, Henrik Ingo
    <henrik.i...@datastax.com> wrote:

        This is the difference between security and compliance I
        guess :-D

        The way I see this, the attacker or threat in this concept is
        not the developer with access to the database. Rather a
        feature like this is just a convenient way to apply some
        masking rule in a centralized way. The protection is against
        an end user of the application, who should not be able to see
        the personal data of someone else. Or themselves, even. As
        long as the application end user doesn't have access to run
        arbitrary CQL, then these forms of masking prevent
        accidental unauthorized use/leaking of personal data.

        henrik



        On Wed, Aug 24, 2022 at 10:40 AM Benedict
        <bened...@apache.org> wrote:

            Is it typical for a masking feature to make no effort to
            prevent unmasking? I’m just struggling to see the value
            of this without such mechanisms. Otherwise it’s just a
            default formatter, and we should consider renaming the
            feature IMO

            On 23 Aug 2022, at 21:27, Andrés de la Peña
            <adelap...@apache.org> wrote:

            
            As mentioned in the CEP document, dynamic data masking
            doesn't try to prevent malicious users with SELECT
            permissions from indirectly guessing the real value
            behind a masked value. This can easily be done by just
            trying values in the WHERE clause of SELECT queries. DDM
            would not be a replacement for proper column-level
            permissions.
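The guessing approach mentioned above can be illustrated with a small simulation. This is only a sketch: the in-memory `ROWS` table, the `mask` function and the `select` helper are hypothetical stand-ins for a table with a masked column, assuming the filter is evaluated on the clear value while the result shows the masked one.

```python
# Hypothetical in-memory table; "name" plays the role of the masked column.
ROWS = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]


def mask(value: str) -> str:
    """Stand-in masking function: hides the whole value."""
    return "****"


def select(where_name: str) -> list:
    """Filter on the clear value, but return only the masked value."""
    return [
        {"id": row["id"], "name": mask(row["name"])}
        for row in ROWS
        if row["name"] == where_name
    ]


def guess_names(candidates: list) -> dict:
    """Recover clear values by trying candidates in the filter."""
    recovered = {}
    for candidate in candidates:
        for row in select(candidate):
            # The row matched, so its hidden name must equal the candidate.
            recovered[row["id"]] = candidate
    return recovered


print(guess_names(["carol", "bob", "alice"]))
```

Every result row only ever contains the masked string, yet a match on the filter is enough to learn the clear value.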

            The data served by the database is usually consumed by
            applications that present this data to end users. These
            end users are not necessarily the users directly
            connecting to the database. With DDM, it would be easy
            for applications to mask sensitive data that is going to
            be consumed by the end users. However, the users
            directly connecting to the database should be trusted,
            provided that they have the right SELECT permissions.

            In other words, DDM doesn't directly protect the data,
            but it eases the production of protected data.

            That said, we could later go one step further and add a
            way to prevent untrusted users from inferring the masked
            data. That could be done by adding a new permission,
            distinct from the current SELECT permission, required to
            use certain columns in WHERE clauses. That would play
            especially well with column-level permissions, which is
            something that we still have pending.

            On Tue, 23 Aug 2022 at 19:13, Aaron Ploetz
            <aaronplo...@gmail.com> wrote:

                    Applying this should prevent querying on a
                    field, else you could leak its contents, surely?


                In theory, yes.  Although I could see folks doing
                something like this:

                SELECT COUNT(*) FROM patients
                WHERE year_of_birth = 2002
                AND date_of_birth >= '2002-04-01'
                AND date_of_birth < '2002-11-01';

                In this case, the rows containing the masked key
                column(s) could be filtered on without revealing the
                actual data.  But again, that's probably better for
                a "phase 2" of the implementation.

                    Agreed on not being a queryable field. That
                    would also preclude secondary indexing, right?


                Yes, that's my thought as well.

                On Tue, Aug 23, 2022 at 12:42 PM Derek Chen-Becker
                <de...@chen-becker.org> wrote:

                    Agreed on not being a queryable field. That
                    would also preclude secondary indexing, right?

                    On Tue, Aug 23, 2022 at 11:20 AM Benedict
                    <bened...@apache.org> wrote:

                        Applying this should prevent querying on a
                        field, else you could leak its contents,
                        surely? This pretty much prohibits using it
                        in a clustering key, and a partition key
                        with the ordered partitioner - but probably
                        also a hashed partitioner since we do not
                        use a cryptographic hash and the hash
                        function is well defined.
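To illustrate the concern about hashed partitioners, here is a rough sketch of how a publicly known, deterministic hash lets someone map an observed token back to a low-entropy value such as a date of birth. It assumes the token of the masked partition key is observable to the user. `zlib.crc32` is only a stand-in for the partitioner's hash, and the function names are hypothetical.

```python
import zlib
from datetime import date, timedelta


def token(value: str) -> int:
    """Stand-in for a partitioner hash: deterministic and publicly known."""
    return zlib.crc32(value.encode("utf-8"))


def candidates_for_token(observed: int, start: date, end: date) -> list:
    """Enumerate every date in [start, end] and keep those whose
    token equals the observed one."""
    matches = []
    day = start
    while day <= end:
        if token(day.isoformat()) == observed:
            matches.append(day.isoformat())
        day += timedelta(days=1)
    return matches


secret = "2002-04-17"
found = candidates_for_token(token(secret), date(2002, 1, 1), date(2002, 12, 31))
print(secret in found)
```

The search space for a date of birth is tiny, so enumerating it takes almost no work once the hash function is known.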

                        We probably also need to ensure that any
                        ALLOW FILTERING queries on such a field are
                        disabled.

                        Plausibly the data could be
                        cryptographically jumbled before using it in
                        a primary key component (or permitting
                        filtering), but it is probably easier and
                        safer to exclude for now…

                        On 23 Aug 2022, at 18:13, Aaron Ploetz
                        <aaronplo...@gmail.com> wrote:

                        
                        Some thoughts on this one:

                        In a prior job, we'd give app teams access
                        to a single keyspace, and two roles: a
                        read-write role and a read-only role.  In
                        some cases, a "privileged" application role
                        was also requested. Depending on the
                        requirements, I could see the UNMASK
                        permission being applied to the RW or
                        privileged roles.  But if there's a problem
                        on the table and the operators go in to
                        investigate, they will likely use a
                        SUPERUSER account, and they'll see that data.

                        How hard would it be for SUPERUSERs to
                        *not* automatically get the UNMASK permission?

                        I'll also echo the concerns around masking
                        primary key components.  It's highly likely
                        that certain personal data properties would
                        be used as a partition or clustering key
                        (ex: range query for people born within a
                        certain timeframe).  In addition to the
                        "breaks existing" concern, I'm curious
                        about the challenges around getting that to
                        work with the current primary key
                        implementation.

                        Does this first implementation only apply
                        to payload (non-key) columns? The examples
                        in the CEP currently do not show primary
                        key components being masked.

                        Thanks,

                        Aaron


                        On Tue, Aug 23, 2022 at 6:44 AM Henrik Ingo
                        <henrik.i...@datastax.com> wrote:

                            On Tue, Aug 23, 2022 at 1:10 PM Andrés
                            de la Peña <adelap...@apache.org> wrote:

                                    One thought: The way the CEP is
                                    currently written, it is only
                                    possible to mask a column one
                                    way. You can only define one
                                    masking function for a column,
                                    and since you use the original
                                    column name, you could only
                                    return one version of it in the
                                    result set, even if you had a
                                    way to define several functions.


                                Right, it's one single type of
                                mapping per the column, declared on
                                CREATE/ALTER TABLE statements.
                                Also, users can manually specify
                                their own masking function in
                                SELECT statements if they have
                                permissions for seeing the clear data.

                                For those cases where the data is
                                automatically masked for an
                                unprivileged user, I don't see the
                                use of including different types of
                                masking for the same column in the
                                same result set. Instead, we might
                                be interested in having different
                                types of masking associated with
                                different roles. We could do so
                                with dedicated CREATE/DROP/LIST
                                MASK statements, instead of using
                                the CREATE/ALTER/DESCRIBE TABLE
                                statements. That CREATE MASK
                                statement would associate a masking
                                function with a column and a role.
                                However, I'm not sure we need that
                                type of granularity instead of the
                                simplicity of attaching the masking
                                to the column declaration. wdyt?



                            My gut feeling likewise is that this
                            adds complexity but little value.




--
                            Henrik Ingo

                            +358 40 569 7354

                            Visit us online: https://www.datastax.com/
                            Twitter: https://twitter.com/DataStaxEng
                            YouTube: https://www.youtube.com/channel/UCqA6zOSMpQ55vvguq4Y0jAg
                            LinkedIn: https://www.linkedin.com/in/heingo/



--
                    +---------------------------------------------------------------+
                    | Derek Chen-Becker                                             |
                    | GPG Key available at https://keybase.io/dchenbecker and       |
                    | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
                    | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
                    +---------------------------------------------------------------+


