[
https://issues.apache.org/jira/browse/CASSANDRA-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18069417#comment-18069417
]
Dmitry Konstantinov commented on CASSANDRA-21152:
-------------------------------------------------
I’ve debugged the original issue to better understand what’s going on.
When a row is deleted from a table, the current MV change calculation logic
uses the timestamp of the existing MV row to determine the timestamp for the
delete, instead of using the original one. As a result, we end up writing a
TTL-ed row to the MV table with the same timestamp as the previous row state.
In this scenario, cursor compaction behaves differently from standard
compaction (and from the merge logic during reads): with standard compaction,
the deleted TTL-ed row supersedes the previous one, whereas with cursor
compaction it does not.
Tracing this discrepancy leads to the following logic in
org.apache.cassandra.db.compaction.CursorCompactor#mergeRows:
{code:java}
for (int i = 1; i < rowMergeLimit; i++)
{
    row = sstableCursors[i].unfiltered();
    if (row.livenessInfo().supersedes(mergedRowInfo))
        mergedRowInfo = row.livenessInfo();
    if (row.deletionTime().supersedes(mergedRowDeletion))
        mergedRowDeletion = row.deletionTime();
}
{code}
The issue originates in the livenessInfo().supersedes(...) implementation:
{code:java}
default boolean supersedes(LivenessInfo other)
{
    long tTimestamp = timestamp();
    long oTimestamp = other.timestamp();
    if (tTimestamp != oTimestamp)
        return tTimestamp > oTimestamp;
    if (isExpired() ^ other.isExpired())
        return isExpired();
    if (isExpiring() == other.isExpiring())
    {
        return localExpirationTime() > other.localExpirationTime() ||
               (localExpirationTime() == other.localExpirationTime() && ttl() < other.ttl());
    }
    return isExpiring();
}
{code}
When candidate rows have equal timestamps, the logic falls back to isExpired()
to break the tie.
The key difference is that standard compaction and read-time merging compare an
ExpiringLivenessInfo against an ExpiredLivenessInfo, whose isExpired()
implementations correctly return false and true, respectively. Cursor
compaction, however, uses ReusableLivenessInfo, which does not override
isExpired(), so it always returns false.
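The effect of that missing override can be illustrated with a minimal
standalone model of the tie-break (a simplification, not Cassandra's actual
classes; the Info class and its expiredImplemented flag are hypothetical,
with values taken from the cursor dump below):

```java
// Minimal model of LivenessInfo.supersedes() tie-break behavior.
// Assumption: EXPIRED_LIVENESS_TTL == Integer.MAX_VALUE (2147483647, the
// ttl seen in the second cursor-dump row below).
public class SupersedesTieBreak
{
    static final int EXPIRED_LIVENESS_TTL = Integer.MAX_VALUE;
    static final int NO_TTL = 0;

    static class Info
    {
        final long timestamp;
        final int ttl;
        final int localExpirationTime;
        // Models the difference between ExpiredLivenessInfo (true) and
        // ReusableLivenessInfo without the fix (false).
        final boolean expiredImplemented;

        Info(long timestamp, int ttl, int localExpirationTime, boolean expiredImplemented)
        {
            this.timestamp = timestamp;
            this.ttl = ttl;
            this.localExpirationTime = localExpirationTime;
            this.expiredImplemented = expiredImplemented;
        }

        boolean isExpiring() { return ttl != NO_TTL; }
        boolean isExpired()  { return expiredImplemented && ttl == EXPIRED_LIVENESS_TTL; }

        // Same tie-break logic as the supersedes() snippet quoted above.
        boolean supersedes(Info other)
        {
            if (timestamp != other.timestamp)
                return timestamp > other.timestamp;
            if (isExpired() ^ other.isExpired())
                return isExpired();
            if (isExpiring() == other.isExpiring())
                return localExpirationTime > other.localExpirationTime ||
                       (localExpirationTime == other.localExpirationTime && ttl < other.ttl);
            return isExpiring();
        }
    }

    public static void main(String[] args)
    {
        long ts = 1774711566649000L; // equal timestamps, as in the dump

        // Correct isExpired(): the expired marker wins the tie-break.
        Info live    = new Info(ts, 6000, 1774717566, true);
        Info expired = new Info(ts, EXPIRED_LIVENESS_TTL, 1774711567, true);
        System.out.println("correct: expired supersedes live = " + expired.supersedes(live));

        // isExpired() always false: the XOR branch never fires, so the
        // comparison falls through to localExpirationTime and the live row
        // wrongly supersedes the expired marker.
        Info liveB    = new Info(ts, 6000, 1774717566, false);
        Info expiredB = new Info(ts, EXPIRED_LIVENESS_TTL, 1774711567, false);
        System.out.println("broken: live supersedes expired = " + liveB.supersedes(expiredB));
    }
}
```

With the correct isExpired(), the expired deletion marker supersedes the live
row; with it always returning false, the decision is instead made on
localExpirationTime, where the live row wins.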
!non_cursor_row_merge.png|width=1000!
vs. the cursor compaction values:
{code}
UnfilteredDescriptor{rowLivenessInfo=ReusableLivenessInfo{timestamp=1774711566649000, ttl=6000, localExpirationTime=1774717566}, deletionTime=LIVE, position=7, flags=12, extFlags=0, unfilteredSize=5, prevUnfilteredSize=0, unfilteredDataStart=14, rowColumns=[], clusteringTypes=[org.apache.cassandra.db.marshal.Int32Type]}
UnfilteredDescriptor{rowLivenessInfo=ReusableLivenessInfo{timestamp=1774711566649000, ttl=2147483647, localExpirationTime=1774711567}, deletionTime=LIVE, position=7, flags=12, extFlags=0, unfilteredSize=5, prevUnfilteredSize=7, unfilteredDataStart=14, rowColumns=[], clusteringTypes=[org.apache.cassandra.db.marshal.Int32Type]}
{code}
As a result, the comparison behaves incorrectly.
The fix is to add a proper implementation of isExpired() to
ReusableLivenessInfo:
{code:java}
@Override
public boolean isExpired()
{
    return ttl == EXPIRED_LIVENESS_TTL;
}
{code}
> Test failure: dtest.TestMaterializedViews.test_mv_with_default_ttl_with_flush
> ------------------------------------------------------------------------------
>
> Key: CASSANDRA-21152
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21152
> Project: Apache Cassandra
> Issue Type: Bug
> Reporter: Dmitry Konstantinov
> Assignee: Arvind Kandpal
> Priority: Normal
> Fix For: 6.x
>
> Attachments: breaking_point.png, non_cursor_row_merge.png
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> It fails consistently now, example:
> [https://ci-cassandra.apache.org/job/Cassandra-trunk/2391/testReport/junit/dtest.materialized_views_test/TestMaterializedViews/Tests___dtest_jdk11_1_64___test_mv_with_default_ttl_with_flush/]
> {code:java}
> self = <materialized_views_test.TestMaterializedViews object at
> 0x7f808eb3b3d0>
> @since('3.0')
> def test_mv_with_default_ttl_with_flush(self):
> > self._test_mv_with_default_ttl(True)
> materialized_views_test.py:1333:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _
> materialized_views_test.py:1368: in _test_mv_with_default_ttl
> assert_none(session, "SELECT k,a,b FROM mv2")
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _
> session = <cassandra.cluster.Session object at 0x7f808dcc1d60>
> query = 'SELECT k,a,b FROM mv2', cl = None
> def assert_none(session, query, cl=None):
> """
> Assert query returns nothing
> @param session Session to use
> @param query Query to run
> @param cl Optional Consistency Level setting. Default ONE
>
> Examples:
> assert_none(self.session1, "SELECT * FROM test where key=2;")
> assert_none(cursor, "SELECT * FROM test WHERE k=2",
> cl=ConsistencyLevel.SERIAL)
> """
> simple_query = SimpleStatement(query, consistency_level=cl)
> res = session.execute(simple_query)
> list_res = _rows_to_list(res)
> > assert list_res == [], "Expected nothing from {}, but got
> > {}".format(query, list_res)
> E AssertionError: Expected nothing from SELECT k,a,b FROM mv2, but got
> [[1, 1, None]]
> tools/assertions.py:149: AssertionError
> {code}
> !breaking_point.png|width=700!
> It started failing between Cassandra-trunk runs 2364 and 2367.
> [https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-trunk/failure/materialized_views_test/TestMaterializedViews/test_mv_with_default_ttl_with_flush]