Re: [DISCUSS] CEP-20: Dynamic Data Masking

Andrés de la Peña Tue, 13 Sep 2022 07:06:13 -0700

That's 5 votes for A and 2 votes for B so far. Neither of these options
opposes the CEP, so I think we can probably start the vote, unless we
want to wait longer for the poll.


On Mon, 12 Sept 2022 at 13:51, Benjamin Lerer <ble...@apache.org> wrote:

> A
>
> On Wed, 7 Sept 2022 at 17:02, Jeremiah D Jordan <jeremiah.jor...@gmail.com>
> wrote:
>
>> A
>>
>> On Sep 7, 2022, at 8:58 AM, Benedict <bened...@apache.org> wrote:
>>
>> Well, I am not convinced these changes will materially impact the
>> outcome, but at least we’ll have some extra fun collating the votes.
>>
>>
>> On 7 Sep 2022, at 14:05, Andrés de la Peña <adelap...@apache.org> wrote:
>>
>> 
>> The poll makes sense to me. I would slightly change it to:
>>
>> A) We shouldn't prefer either approach, and I agree to the implementor
>> selecting the table schema approach for this CEP
>> B) We should prefer the view approach, but I am not opposed to the
>> implementor selecting the table schema approach for this CEP
>> C) We should NOT implement the table schema approach, and should
>> implement the view approach
>> D) We should NOT implement the view approach, and should implement
>> the table schema approach
>> E) We should NOT implement the table schema approach, and should
>> implement some other scheme (or not implement this feature)
>>
>> Where my vote is for A.
>>
>>
>> On Wed, 7 Sept 2022 at 13:12, Benedict <bened...@apache.org> wrote:
>>
>>> I’m not convinced there’s been adequate resolution over which approach
>>> is adopted. I know you have expressed a preference for the table schema
>>> approach, but the weight of other opinion so far appears to be against this
>>> approach - even if it is broadly adopted by other databases. I will note
>>> that Postgres does not adopt this approach; it has a more sophisticated
>>> security label approach that has not been proposed by anybody so far.
>>>
>>> I think extra weight should be given to the implementer’s preference, so
>>> while I personally do not like the table schema approach, I am happy to
>>> accept this is an industry norm, and leave the decision to you.
>>>
>>> However, we should ensure the community as a whole endorses this. I
>>> think an indicative poll should be undertaken first, eg:
>>>
>>> A) We should implement the table schema approach, as proposed
>>> B) We should prefer the view approach, but I am not opposed to the
>>> implementor selecting the table schema approach for this CEP
>>> C) We should NOT implement the table schema approach, and should
>>> implement the view approach
>>> D) We should NOT implement the table schema approach, and should
>>> implement some other scheme (or not implement this feature)
>>>
>>> Where my vote is B
>>>
>>> On 7 Sep 2022, at 12:50, Andrés de la Peña <adelap...@apache.org> wrote:
>>>
>>> 
>>> If nobody has more concerns regarding the CEP I will start the vote
>>> tomorrow.
>>>
>>> On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña <adelap...@apache.org>
>>> wrote:
>>>
>>>>> Is there enough support here for VIEWS to be the implementation
>>>>> strategy for displaying masking functions?
>>>>
>>>>
>>>> I'm not sure that views should be "the" strategy for masking functions.
>>>> We have multiple approaches here:
>>>>
>>>> 1) CQL functions only. Users can decide to use the masking functions of
>>>> their own accord. I think most dbs allow this pattern of usage, which is
>>>> quite straightforward. Obviously, it doesn't allow admins to enforce that
>>>> users see only masked data. Nevertheless, it's still useful for trusted
>>>> database users generating masked data that will be consumed by the end
>>>> users of the application.
>>>>
>>>> 2) Masking functions attached to specific columns. This way the same
>>>> queries will see different data (masked or not) depending on the
>>>> permissions of the user running the query. It has the advantage of not
>>>> requiring changes to the queries that users with different permissions run.
>>>> The downside is that users would need to query the schema if they need to
>>>> know whether a column is masked, unless we change the names of the returned
>>>> columns. This is the approach offered by Azure/SQL Server, PostgreSQL, IBM
>>>> Db2, Oracle, MariaDB/MaxScale and Snowflake. All these databases support
>>>> applying the masking function to columns on the base table, and some of
>>>> them also allow applying masking to views.
>>>>
>>>> 3) Masking functions as part of projected views. This way users might
>>>> need to query the view appropriate for their permissions instead of the
>>>> base table. This might mean changing the queries if the masking policy is
>>>> changed by the admin. MySQL recommends this approach in a blog entry,
>>>> although it's not part of its main documentation for data masking, and the
>>>> implementation has security issues. Some of the other databases offering
>>>> approach 2) as their main option also support masking on view columns.
>>>>
>>>> Each approach has its own advantages and limitations, and I don't think
>>>> we necessarily have to choose. The CEP proposes implementing 1) and 2), but
>>>> nothing prevents us from also having 3) if we get to have projected views.
>>>> However, I think that projected views are a new general-purpose feature with
>>>> their own complexities, so they would deserve their own CEP, if someone is
>>>> willing to work on the implementation.
>>>>
>>>>
>>>>
>>>> On Wed, 31 Aug 2022 at 12:03, Claude Warren via dev <
>>>> dev@cassandra.apache.org> wrote:
>>>>
>>>>> Is there enough support here for VIEWS to be the implementation
>>>>> strategy for displaying masking functions?
>>>>>
>>>>> It seems to me the view would have to store the query and apply a
>>>>> where clause to it, so the same PK would be in play.
>>>>>
>>>>> It has data leaking properties.
>>>>>
>>>>> It has more use cases as it can be used to
>>>>>
>>>>>    - construct views that filter out sensitive columns
>>>>>    - apply transforms to convert units of measure
>>>>>
>>>>> Are there more thoughts along this line?
>>>>>
>>>>
>>
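
For reference, the three approaches enumerated in the quoted message of
31 Aug can be contrasted in a minimal Python sketch. Nothing here is
Cassandra code or CEP-20 syntax: mask_inner, identity, Table, ProjectedView,
can_unmask and the sample row are all hypothetical names made up for
illustration, and the permission check is reduced to a single boolean.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, str]
Mask = Callable[[str], str]


def mask_inner(value: str, keep: int = 1) -> str:
    """Hypothetical masking function: keep the first `keep` characters."""
    return value[:keep] + "*" * max(len(value) - keep, 0)


def identity(value: str) -> str:
    """No-op mask for columns that are shown as stored."""
    return value


@dataclass
class Table:
    """Toy base table; column_masks models approach 2 (masks in the schema)."""
    rows: List[Row]
    column_masks: Dict[str, Mask] = field(default_factory=dict)

    def select(self, can_unmask: bool = True) -> List[Row]:
        # Same query for everyone; the result depends on the caller's permission.
        if can_unmask:
            return [dict(r) for r in self.rows]
        return [
            {c: self.column_masks.get(c, identity)(v) for c, v in r.items()}
            for r in self.rows
        ]


@dataclass
class ProjectedView:
    """Toy projected view; models approach 3 (masks live in the view)."""
    base: Table
    projection: Dict[str, Mask]

    def select(self) -> List[Row]:
        # Callers must query the view rather than the base table.
        return [
            {c: fn(r[c]) for c, fn in self.projection.items()}
            for r in self.base.rows
        ]


users = Table(rows=[{"name": "alice", "ssn": "123456789"}])

# 1) Function only: a trusted user applies the mask explicitly in the query.
by_hand = [{**r, "ssn": mask_inner(r["ssn"])} for r in users.select()]

# 2) Mask attached to the column: the query is unchanged, the output is not.
users.column_masks["ssn"] = mask_inner
as_admin = users.select(can_unmask=True)
as_reader = users.select(can_unmask=False)

# 3) Mask as part of a projected view over the same base table.
masked_users = ProjectedView(users, {"name": identity, "ssn": mask_inner})
via_view = masked_users.select()
```

In this sketch all three routes produce the same masked row; what differs
is who decides that masking happens (the querying user in 1, the schema
plus permissions in 2, the choice of view in 3).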
