Is this an MV bug?
# Table definitions

Table [ Primary key ] other data
base [ A B C ] D E
MV [ D C ] A B E

# Initial data

base -> MV
[ a b c ] d e -> [ d c ] a b e
[ a' b c ] d e -> [ d c ] a' b e

## Mutations -> expected outcome

M1: base [ a b c ] d e' -> MV [ d c ] a b e'
M2: base [ a b c ] d' e -> MV [ d' c ] a b e

## processing bug

Assume lock can not be obtained during processing of M1.

The mutation M1 sleeps to wait for lock. (Trunk Keyspace.java : 601 )

Assume M2 obtains the lock and executes.

MV is now
[ d' c ] a b e

M1 then obtains the lock and executes

MV is now
[ d c ] a b e'
[ d' c ] a b e

base is
[ a b c ] d e'

MV entry "[ d' c ] a b e" is orphaned
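The end state described above can be checked mechanically. The following is a minimal sketch, not Cassandra code: it models the base table as a dictionary and the view as a set of tuples, and reports view rows that no base row implies. The names `expected_view` and `find_orphans` are hypothetical, and the model assumes a view row is simply the base row with its columns reordered.

```python
from typing import Dict, Set, Tuple

BaseKey = Tuple[str, str, str]             # (A, B, C): base primary key
BaseRow = Tuple[str, str]                  # (D, E): other base columns
ViewRow = Tuple[str, str, str, str, str]   # (D, C, A, B, E): view layout


def expected_view(base: Dict[BaseKey, BaseRow]) -> Set[ViewRow]:
    """Derive the view rows implied by the current base table."""
    return {(d, c, a, b, e) for (a, b, c), (d, e) in base.items()}


def find_orphans(base: Dict[BaseKey, BaseRow], view: Set[ViewRow]) -> Set[ViewRow]:
    """Return view rows that have no matching base row."""
    return view - expected_view(base)


# Final state from the scenario: M2 ran first, then M1.
base = {
    ("a", "b", "c"): ("d", "e'"),
    ("a'", "b", "c"): ("d", "e"),
}
view = {
    ("d", "c", "a", "b", "e'"),
    ("d'", "c", "a", "b", "e"),
    ("d", "c", "a'", "b", "e"),
}

orphans = find_orphans(base, view)
print(orphans)
```

With the state listed in the message, the only row reported is the `[ d' c ] a b e` entry.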
Re: Is this an MV bug?
If M1 and M2 both operate over the same partition key, they won’t be separate mutations; they should be combined into a single mutation before submission to SP.mutate.

> On 19 Aug 2022, at 10:05, Claude Warren, Jr via dev <dev@cassandra.apache.org> wrote:
>
> # Table definitions
>
> Table [ Primary key ] other data
> base [ A B C ] D E
> MV[ D C ] A B E
>
> # Initial data
> base -> MV
> [ a b c ] d e -> [d c] a b e
> [ a' b c ] d e -> [d c] a' b e
>
> ## Mutations -> expected outcome
>
> M1: base [ a b c ] d e' -> MV [ d c ] a b e'
> M2: base [ a b c ] d' e -> MV [ d' c ] a b e
>
> ## processing bug
> Assume lock can not be obtained during processing of M1.
>
> The mutation M1 sleeps to wait for lock. (Trunk Keyspace.java : 601 )
>
> Assume M2 obtains the lock and executes.
>
> MV is now
> [ d' c ] a b e
>
> M1 then obtains the lock and executes
>
> MV is now
> [ d c ] a b e'
> [ d' c] a b e
>
> base is
> [ a b c ] d e'
>
> MV entry "[ d' c ] a b e" is orphaned
Re: Is this an MV bug?
If each mutation comes from a separate CQL statement, they would be separate, no?

On Fri, Aug 19, 2022 at 10:17 AM Benedict wrote:

> If M1 and M2 both operate over the same partition key they won’t be
> separate mutations, they should be combined into a single mutation before
> submission to SP.mutate
Re: Is this an MV bug?
Perhaps my diagram was not clear. I am starting with mutations on the base table. I assume they are not bundled together, so they come from separate CQL statements.

On Fri, Aug 19, 2022 at 11:11 AM Claude Warren, Jr wrote:

> If each mutation comes from a separate CQL they would be separate, no?
[DISCUSS] CEP-20: Dynamic Data Masking
Hi everyone,

I'd like to start a discussion about this proposal for dynamic data masking:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynamic+Data+Masking

Dynamic data masking makes it possible to obscure sensitive information without changing the stored data. It would be based on a set of native CQL functions providing different types of masking, such as replacing the column value by "". These functions could be used as regular functions or attached to table columns with CREATE/ALTER TABLE. There would be a new UNMASK permission, so only users with this permission would be able to see the unmasked column values. It would be possible to customize masking by using UDFs as masking functions.

Thanks,
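To make the idea concrete, here is a rough Python model of read-time masking. It is only an illustration of the behaviour described above, not the CQL syntax or API from the CEP. The function names (`mask_replace`, `mask_keep_last`, `read_column`), the `"****"` replacement string, and the representation of permissions as a set of strings are all assumptions made for this sketch.

```python
from typing import Callable, Set

MaskFn = Callable[[str], str]


def mask_replace(value: str) -> str:
    """Hypothetical built-in style mask: hide the whole value."""
    return "****"


def mask_keep_last(n: int) -> MaskFn:
    """Hypothetical user-defined mask: keep only the last n characters."""
    def mask(value: str) -> str:
        hidden = max(len(value) - n, 0)
        return "*" * hidden + value[hidden:]
    return mask


def read_column(stored: str, mask: MaskFn, permissions: Set[str]) -> str:
    """Return the clear value only to callers holding UNMASK.

    The stored value itself is never modified; masking happens on read.
    """
    return stored if "UNMASK" in permissions else mask(stored)


ssn = "123-45-6789"
clear = read_column(ssn, mask_replace, {"SELECT", "UNMASK"})
masked = read_column(ssn, mask_replace, {"SELECT"})
partial = read_column(ssn, mask_keep_last(4), {"SELECT"})
print(clear, masked, partial)
```

The same stored value yields different results depending only on the reader's permissions, which is the property the proposal relies on.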
Re: Is this an MV bug?
You mean entirely distinct CQL statements issued by the same client “concurrently”?

If they’re submitted to the same coordinator then M2 will have a higher timestamp than M1, so if M2 applies first then M1 will be a no-op and should not generate any view update.

If submitted to different coordinators with server-issued timestamps then, unless timestamps clash, one of them will win, but it may not be M2.

> On 19 Aug 2022, at 11:14, Claude Warren, Jr via dev <dev@cassandra.apache.org> wrote:
>
> Perhaps my diagram was not clear. I am starting with mutations on the base
> table. I assume they are not bundled together so from separate CQL
> statements.
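The timestamp argument can be illustrated with a simplified last-write-wins model. This is a sketch under stated assumptions, not Cassandra's implementation: each cell stores a value with its write timestamp, a write only replaces a cell when its timestamp is strictly higher, and tie-breaking on equal timestamps is ignored. The function name `apply_mutation` and the integer timestamps are hypothetical.

```python
from typing import Dict, Tuple

Cell = Tuple[str, int]  # (value, write timestamp)


def apply_mutation(row: Dict[str, Cell], values: Dict[str, str], timestamp: int) -> bool:
    """Apply a write cell by cell; return True if any cell changed."""
    changed = False
    for column, value in values.items():
        current = row.get(column)
        # A cell is replaced only by a strictly newer write.
        if current is None or timestamp > current[1]:
            row[column] = (value, timestamp)
            changed = True
    return changed


# Initial base row [ a b c ] d e, written at timestamp 0.
row: Dict[str, Cell] = {"D": ("d", 0), "E": ("e", 0)}

# Same coordinator: M2 is issued after M1, so it gets the higher timestamp.
m1 = ({"D": "d", "E": "e'"}, 1)
m2 = ({"D": "d'", "E": "e"}, 2)

# M1 waits for the lock, so M2 is applied first.
m2_changed = apply_mutation(row, *m2)
m1_changed = apply_mutation(row, *m1)
print(m2_changed, m1_changed, row)
```

In this model M1 changes nothing once M2 has been applied, so there is no base change from which a view update would be derived.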
Re: [DISCUSS] CEP-20: Dynamic Data Masking
Sounds interesting. I would like to understand a couple of things here.

If the column names are the same for masked and unmasked data, it would impact existing applications. I am curious what the transition plan looks like for applications that expect unmasked data.

For example, let’s say you store SSNs and birth dates. Upon enabling this feature, let’s say the app user is not given the UNMASK permission. Now the app is receiving masked values for these columns. This is fine for most read-only applications. However, a lot of times these columns may be used as primary keys or part of primary keys in other tables. This would break existing applications.

How would this work in mixed mode, when new nodes in the cluster are masking data and others aren’t? How would it impact the driver?

How would the application learn that the column values are masked? This is important in case a user has the UNMASK permission and it is later taken away. Again, this would break a lot of applications.

Dinesh

> On Aug 19, 2022, at 4:50 AM, Andrés de la Peña wrote:
>
> Hi everyone,
>
> I'd like to start a discussion about this proposal for dynamic data masking:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynamic+Data+Masking
>
> Dynamic data masking allows to obscure sensitive information without changing
> the stored data. It would be based on a set of native CQL functions providing
> different types of masking, such as replacing the column value by "".
> These functions could be used as regular functions or attached to table
> columns with CREATE/ALTER table. There would be a new UNMASK permission, so
> only the users with this permissions would be able to see the unmasked column
> values. It would be possible to customize masking by using UDFs as masking
> functions.
>
> Thanks,
Re: [DISCUSS] CEP-20: Dynamic Data Masking
This type of feature is very useful, but it may be easier to analyze this proposal if it’s compared with other DDM implementations from other databases? Would it be reasonable to add a table to the proposal comparing syntax and output from e.g. Azure SQL vs Cassandra vs whatever?

> On Aug 19, 2022, at 4:50 AM, Andrés de la Peña wrote:
>
> Hi everyone,
>
> I'd like to start a discussion about this proposal for dynamic data masking:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynamic+Data+Masking
>
> Dynamic data masking allows to obscure sensitive information without changing
> the stored data. It would be based on a set of native CQL functions providing
> different types of masking, such as replacing the column value by "".
> These functions could be used as regular functions or attached to table
> columns with CREATE/ALTER table. There would be a new UNMASK permission, so
> only the users with this permissions would be able to see the unmasked column
> values. It would be possible to customize masking by using UDFs as masking
> functions.
>
> Thanks,
Re: [DISCUSS] CEP-20: Dynamic Data Masking
On Fri, 19 Aug 2022 at 17:30, Jeff Jirsa wrote:

> This type of feature is very useful, but it may be easier to analyze
> this proposal if it’s compared with other DDM implementations from other
> databases? Would it be reasonable to add a table to the proposal comparing
> syntax and output from eg Azure SQL vs Cassandra vs whatever ?

Good idea. I have added a section at the end of the document briefly describing how some other databases deal with data masking, with links to their documentation on the topic. I am not an expert in any of those databases, so please take my comments there with a grain of salt.