Re: [PR] CASSANALYTICS-27: Cassandra 5 bridges without BTI support [cassandra-analytics]

via GitHub Tue, 12 Aug 2025 23:53:48 -0700


yifan-c commented on code in PR #131:
URL: 
https://github.com/apache/cassandra-analytics/pull/131#discussion_r2271640977



##########
cassandra-analytics-common/src/main/java/org/apache/cassandra/spark/data/SSTable.java:
##########
@@ -95,11 +101,38 @@ public void verify() throws IncompleteSSTableException
             throw new IncompleteSSTableException(FileType.STATISTICS);
         }
         // Need Summary.db or Index.db to read first/last partition key
-        if (isMissing(FileType.SUMMARY) && isMissing(FileType.INDEX))
+        if (isBigFormat() && isMissing(FileType.SUMMARY) && 
isMissing(FileType.INDEX))
         {
             throw new IncompleteSSTableException(FileType.SUMMARY, 
FileType.INDEX);
         }
+        // For BTI format, we just need partitions index
+        if (isBtiFormat() && isMissing(FileType.PARTITIONS_INDEX))
+        {
+            throw new IncompleteSSTableException(FileType.PARTITIONS_INDEX);
+        }
     }
 
     public abstract String getDataFileName();
+
+    public boolean isBigFormat()
+    {
+        return getDataFileName().contains("-big-");
+    }
+
+    public boolean isBtiFormat()
+    {
+        return getDataFileName().contains("-bti-");
+    }
+
+    public String getFormat()
+    {
+        List<String> tokens = filenameSplitter.splitToList(getDataFileName());
+        return tokens.get(tokens.size() - 2);
+    }
+
+    public String getVersion()
+    {
+        List<String> tokens = filenameSplitter.splitToList(getDataFileName());
+        return tokens.get(tokens.size() - 4);
+    }

Review Comment:
   I am a bit worried about depending on the file name format, since it could 
change. Just a minor one. Not need to take any action.



##########
cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/bulkwriter/RecordWriterTest.java:
##########
@@ -74,8 +78,12 @@ class RecordWriterTest
 {
     private static final int REPLICA_COUNT = 3;
     private static final int FILES_PER_SSTABLE = 8;
-    // writing 270 rows with sstable size cap of 1 MB should produce 2 sstable
-    private static final int UPLOADED_SSTABLES = 2;
+    // writing 270 rows with sstable size cap of 1 MB should produce 2 
sstables (Cassandra 4) or 3 sstables (Cassandra 5)
+    private static final Map<Integer, Integer> 
UPLOADED_SSTABLES_PER_CASSANDRA_VERSION = ImmutableMap.<Integer, 
Integer>builder()
+                                                                               
                      .put(40, 2)
+                                                                               
                      .put(41, 2)
+                                                                               
                      .put(50, 3)

Review Comment:
   I figure out why running with 50 outputs has 3 sstables. 
   The key difference is at this method 
`org.apache.cassandra.spark.bulkwriter.TableSchemaTestCommon.MockTableInfoProvider#getCompression`.
 
   
   50 is using lz4. And 40 is using zstd. They make significant difference 
since the generated testing data is highly compressible. 



##########
cassandra-analytics-core/src/main/java/org/apache/cassandra/spark/bulkwriter/BulkSparkConf.java:
##########
@@ -98,6 +98,7 @@ public class BulkSparkConf implements Serializable
     //       which will throw a configuration exception for each setting with 
that prefix it does not recognize
     public static final String SETTING_PREFIX = "spark.cassandra_analytics.";
 
+    public static final String CASSANDRA_VERSION                       = 
SETTING_PREFIX + "cassandra.version";

Review Comment:
   This option is not used for testing. I tried to set the value to either 
4.0.0 or 5.0.0 in the integration test. Both run passed. 
   
   Beside that, can you document what is the expected value and what is the 
expected effect for this option?



##########
cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/TestUtils.java:
##########
@@ -278,6 +278,14 @@ public static Gen<CassandraVersion> versions()
         return arbitrary().pick(CassandraVersion.implementedVersions());
     }
 
+    public static Gen<CassandraVersion> versions(CassandraBridge bridge)
+    {
+        List<CassandraVersion> versions = 
Arrays.stream(CassandraVersion.implementedVersions())
+                                                .filter(v -> 
bridge.getVersion().equals(v))
+                                                .collect(Collectors.toList());
+        return arbitrary().pick(versions);
+    }

Review Comment:
   hmm.. what is the point to having the `Gen` here? It will only produce one 
and only one version. 



##########
cassandra-analytics-cdc-sidecar/build.gradle:
##########
@@ -43,20 +43,34 @@ configurations {
     fourzero {
         description = 'Cassandra 4.0 dependency'
     }
-    fourzerobridge {
+    fourzeroBridge {

Review Comment:
   OK. I think it reads better. +1



##########
cassandra-analytics-cdc/build.gradle:
##########
@@ -67,15 +86,22 @@ dependencies {
     implementation 
"com.fasterxml.jackson.core:jackson-annotations:${jacksonVersion}"
     implementation "com.fasterxml.jackson.core:jackson-core:${jacksonVersion}"
     implementation 
"com.fasterxml.jackson.core:jackson-databind:${jacksonVersion}"
+    implementation 
"com.fasterxml.jackson.datatype:jackson-datatype-jsr310:${jacksonVersion}"
 
     testImplementation project(":cassandra-analytics-common")
 
     // pull in cassandra-bridge so we can re-use TestSchema to generate 
arbitrary schemas for the cdc tests
     testImplementation project(":cassandra-bridge")
     testImplementation(testFixtures(project(':cassandra-bridge')))
-    testImplementation project(":cassandra-four-zero-bridge")
+
     testImplementation project(":cassandra-four-zero-types")
+    testImplementation project(":cassandra-four-zero-bridge")
     testImplementation project(path: ':cassandra-four-zero', configuration: 
'shadow')
+    // unit tests are performed only with Cassandra 4
+    // testImplementation project(":cassandra-five-zero-bridge")
+    // testImplementation project(":cassandra-five-zero-types")
+    // testImplementation project(path: ':cassandra-five-zero', configuration: 
'shadow')

Review Comment:
   It is supposed to be a TODO or something else? 



##########
cassandra-analytics-core/src/test/java/org/apache/cassandra/spark/TestUtils.java:
##########
@@ -290,6 +298,16 @@ public static List<CassandraVersion> testableVersions()
         return ImmutableList.copyOf(CassandraVersion.implementedVersions());
     }
 
+    public static List<CassandraVersion> testableVersions(CassandraBridge 
bridge)
+    {
+        return Arrays.stream(CassandraVersion.implementedVersions()).filter(v 
-> bridge.getVersion().equals(v)).collect(Collectors.toList());
+    }

Review Comment:
   Same. only one version will be produced. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] CASSANALYTICS-27: Cassandra 5 bridges without BTI support [cassandra-analytics]

Reply via email to