My vote is B

On 07/09/2022 13:12, Benedict wrote:
I’m not convinced there’s been adequate resolution over which approach should be adopted. I know you have expressed a preference for the table schema approach, but the weight of other opinion so far appears to be against this approach, even if it is broadly adopted by other databases. I will note that Postgres does not adopt this approach; it has a more sophisticated security label approach that nobody has proposed so far.

I think extra weight should be given to the implementer’s preference, so while I personally do not like the table schema approach, I am happy to accept that this is an industry norm, and to leave the decision to you.

However, we should ensure the community as a whole endorses this. I think an indicative poll should be undertaken first, e.g.:

A) We should implement the table schema approach, as proposed
B) We should prefer the view approach, but I am not opposed to the implementor selecting the table schema approach for this CEP
C) We should NOT implement the table schema approach, and should implement the view approach
D) We should NOT implement the table schema approach, and should implement some other scheme (or not implement this feature)

My vote is B.

On 7 Sep 2022, at 12:50, Andrés de la Peña <adelap...@apache.org> wrote:


If nobody has more concerns regarding the CEP I will start the vote tomorrow.

On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña <adelap...@apache.org> wrote:

        Is there enough support here for VIEWS to be the
        implementation strategy for displaying masking functions?


    I'm not sure that views should be "the" strategy for masking
    functions. We have multiple approaches here:

    1) CQL functions only. Users can decide to use the masking
    functions at their own discretion. I think most databases allow
    this pattern of usage, which is quite straightforward. Obviously,
    it doesn't allow admins to enforce that users see only masked
    data. Nevertheless, it's still useful for trusted database users
    generating masked data that will be consumed by the end users of
    the application.
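    To make approach 1) concrete, here is a minimal Python sketch
    (not CQL, and not the CEP's actual function set) in which masking
    is just an ordinary function that a trusted caller chooses to
    apply in its own projection. The names `mask_inner` and `select`
    and the sample row are hypothetical.

```python
def mask_inner(value: str, keep_start: int, keep_end: int, pad: str = "*") -> str:
    """Hypothetical helper: replace all but the first keep_start and last keep_end chars."""
    if keep_start + keep_end >= len(value):
        return value  # nothing left to mask
    hidden = len(value) - keep_start - keep_end
    return value[:keep_start] + pad * hidden + value[len(value) - keep_end:]


rows = [{"name": "alice", "card": "4111222233334444"}]


def select(rows, projection):
    """Apply a caller-chosen projection to each row, like a SELECT list."""
    return [projection(row) for row in rows]


# The caller decides to mask; nothing stops another caller from reading "card" directly.
masked = select(rows, lambda r: {"name": r["name"], "card": mask_inner(r["card"], 0, 4)})
raw = select(rows, lambda r: {"name": r["name"], "card": r["card"]})
```

    Both queries succeed for the same user, which is why this option
    only helps trusted users who want to produce masked output.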

    2) Masking functions attached to specific columns. This way the
    same queries will see different data (masked or not) depending on
    the permissions of the user running the query. It has the
    advantage of not requiring changes to the queries run by users
    with different permissions. The downside is that users would need
    to query the schema if they need to know whether a column is
    masked, unless we change the names of the returned columns. This
    is the approach offered by Azure/SQL Server, PostgreSQL, IBM Db2,
    Oracle, MariaDB/MaxScale and SnowFlake. All these databases
    support applying the masking function to columns on the base
    table, and some of them also allow applying masking to views.
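    A minimal Python sketch of approach 2), assuming masks are stored
    as per-column schema metadata and that a single boolean stands in
    for whatever permission would grant access to clear data. The
    `Table`, `User` and `query` names and the `can_unmask` flag are
    hypothetical, not Cassandra APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Mask = Callable[[object], object]


@dataclass
class Table:
    rows: List[dict]
    # Column name -> masking function attached by an admin (hypothetical schema metadata).
    masks: Dict[str, Mask] = field(default_factory=dict)


@dataclass
class User:
    name: str
    can_unmask: bool = False  # hypothetical permission flag


def query(table: Table, user: User, columns: List[str]) -> List[dict]:
    """The same query text returns masked or clear data depending on the user."""
    result = []
    for row in table.rows:
        out = {}
        for col in columns:
            mask = table.masks.get(col)
            value = row[col]
            out[col] = value if mask is None or user.can_unmask else mask(value)
        result.append(out)
    return result


people = Table(rows=[{"name": "alice", "ssn": "123-45-6789"}],
               masks={"ssn": lambda v: "XXX-XX-" + v[-4:]})
analyst = User("analyst")
admin = User("admin", can_unmask=True)
```

    Here `query(people, analyst, ["name", "ssn"])` and
    `query(people, admin, ["name", "ssn"])` are identical calls, yet
    only the admin sees the clear `ssn`; the caller cannot tell from
    the result alone that the column was masked.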

    3) Masking functions as part of projected views. This way, users
    might need to query the view appropriate for their permissions
    instead of the base table. This might mean changing the queries
    if the masking policy is changed by the admin. MySQL recommends
    this approach in a blog entry, although it's not part of its main
    documentation for data masking, and the implementation has
    security issues. Some of the other databases offering approach
    2) as their main option also support masking on view columns.
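    For comparison, a minimal Python sketch of approach 3), assuming
    projected views are simply named projections over the base table
    and that read grants are given per view. The `views`, `grants`
    and `select_from` names are hypothetical and only illustrate the
    access pattern, not any proposed CQL syntax.

```python
from typing import Callable, Dict, List, Set

Row = dict
Projection = Callable[[Row], Row]

base_rows: List[Row] = [{"name": "alice", "ssn": "123-45-6789"}]

# Hypothetical projected views: each one is a named projection over the base table.
views: Dict[str, Projection] = {
    "people": lambda r: dict(r),  # the base table, unmasked
    "people_masked": lambda r: {"name": r["name"], "ssn": "XXX-XX-" + r["ssn"][-4:]},
}

# Hypothetical grants: which relations each user may read.
grants: Dict[str, Set[str]] = {
    "analyst": {"people_masked"},
    "admin": {"people", "people_masked"},
}


def select_from(user: str, relation: str) -> List[Row]:
    """Users must name a relation they may read, so the query text differs per user."""
    if relation not in grants.get(user, set()):
        raise PermissionError(f"{user} may not read {relation}")
    return [views[relation](row) for row in base_rows]
```

    If the admin later moves the analyst from `people_masked` to some
    other view, the analyst's queries have to change as well, which
    is the trade-off described above.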

    Each approach has its own advantages and limitations, and I don't
    think we necessarily have to choose. The CEP proposes
    implementing 1) and 2), but nothing prevents us from also having
    3) if we get projected views. However, I think that projected
    views are a new general-purpose feature with their own
    complexities, so they would deserve their own CEP, if someone is
    willing to work on the implementation.



    On Wed, 31 Aug 2022 at 12:03, Claude Warren via dev
    <dev@cassandra.apache.org> wrote:

        Is there enough support here for VIEWS to be the
        implementation strategy for displaying masking functions?

        It seems to me the view would have to store the query and
        apply a WHERE clause to it, so the same PK would be in play.

        It has data-leaking properties.

        It has more use cases as it can be used to

          * construct views that filter out sensitive columns
          * apply transforms to convert units of measure

        Are there more thoughts along this line?
