lukasz-antoniak commented on code in PR #102:
URL:
https://github.com/apache/cassandra-analytics/pull/102#discussion_r1993301580
##########
cassandra-four-zero-types/build.gradle:
##########
@@ -33,6 +33,7 @@ dependencies {
compileOnly project(":cassandra-analytics-common")
compileOnly(project(path: ':cassandra-four-zero', configuration: 'shadow'))
compileOnly "com.esotericsoftware:kryo-shaded:${kryoVersion}"
+ compileOnly(group: "${sparkGroupId}", name: "spark-core_${scalaMajorVersion}", version: "${project.rootProject.sparkVersion}")
Review Comment:
FWIW, I cannot easily move the logic of `Duration` to `SparkDuration`, because
the Cassandra serializer expects `org.apache.cassandra.cql3.Duration`. To do
that, I would need to add a Cassandra dependency to the
`cassandra-analytics-spark-converter` module. Is that OK?
I am not sure whether `toSparkSqlType()` should return
`org.apache.cassandra.cql3.Duration`, as this is a C* type. I may have
misunderstood your intention. This attempt fails with the exception below:
```
@Override
public Object toSparkSqlType(@NotNull Object value, boolean isFrozen)
{
    CalendarInterval cl = (CalendarInterval) value;
    return Duration.newInstance(cl.months, cl.days, cl.microseconds * 1000);
}
```
```
Caused by: java.lang.ClassCastException: class org.apache.cassandra.cql3.Duration cannot be cast to class org.apache.spark.unsafe.types.CalendarInterval (org.apache.cassandra.cql3.Duration and org.apache.spark.unsafe.types.CalendarInterval are in unnamed module of loader 'app')
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getInterval(rows.scala:49)
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getInterval$(rows.scala:49)
    at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getInterval(rows.scala:195)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
```
I have tried to exclude the Spark dependency from the
`cassandra-four-zero-types` module in various ways. The best I could come up
with was to introduce a POJO, `CqlDuration`, that maps internally to
`CalendarInterval`. See commit:
https://github.com/apache/cassandra-analytics/pull/102/commits/df40c73b4e73728d8455609526314f01e845df13
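
For illustration only, here is a minimal sketch of what such a POJO could look
like. It is not the code from the commit above: the field layout,
`fromSparkComponents`, and the accessor names are assumptions. The idea is that
the type references neither Cassandra nor Spark classes and only handles the
unit conversion between the nanoseconds passed to `Duration.newInstance` and
the microseconds held by `CalendarInterval`.

```java
// Hypothetical sketch; names and structure are assumptions, not the PR's code.
final class CqlDuration
{
    private static final long NANOS_PER_MICRO = 1_000L;

    private final int months;
    private final int days;
    private final long nanoseconds;

    CqlDuration(int months, int days, long nanoseconds)
    {
        this.months = months;
        this.days = days;
        this.nanoseconds = nanoseconds;
    }

    // Builds a duration from Spark-style components, where the sub-day part
    // is expressed in microseconds (as in CalendarInterval.microseconds).
    static CqlDuration fromSparkComponents(int months, int days, long microseconds)
    {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping
        return new CqlDuration(months, days, Math.multiplyExact(microseconds, NANOS_PER_MICRO));
    }

    int months()
    {
        return months;
    }

    int days()
    {
        return days;
    }

    // Sub-day part in nanoseconds, the unit the Cassandra side expects.
    long nanoseconds()
    {
        return nanoseconds;
    }

    // Sub-day part in microseconds, the unit the Spark side expects.
    // Anything finer than a microsecond is truncated toward zero.
    long microseconds()
    {
        return nanoseconds / NANOS_PER_MICRO;
    }
}
```

With something like this, the Spark-facing module could build a
`CalendarInterval` from `months()`, `days()` and `microseconds()`, while the
Cassandra-facing module passes `months()`, `days()` and `nanoseconds()` to
`Duration.newInstance`, so neither module needs the other's dependency for this
type.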
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]