amogh-jahagirdar commented on code in PR #7593:
URL: https://github.com/apache/iceberg/pull/7593#discussion_r1192863722
##########
spark/v3.4/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateOrReplaceBranchExec.scala:
##########
@@ -41,6 +41,9 @@ case class CreateOrReplaceBranchExec(
override protected def run(): Seq[InternalRow] = {
catalog.loadTable(ident) match {
case iceberg: SparkTable =>
+ if (iceberg.table.currentSnapshot() == null) {
+ throw new UnsupportedOperationException(s"The Iceberg table:
$iceberg" + " has no snapshots")
+ }
Review Comment:
We definitely should throw a better exception here thanks for identifying
this issue! Few points:
1.) I don't think the fix is correct since it is a valid case that there is
no `main` branch and thus table.currentSnapshot() is null, and the user
specifies an explicit snapshot to create from some other non-main branch. We
don't want to throw unnecessarily
So the commit graph looks something like
```
(empty main table state)
S4 (Branch "b")
/
S1 -> S2 -> S3 (Branch "a")
```
Creating branch b should still succeed but with this implementation we'll
fail unnecessarily.
So in the core library invalid snapshots are prevented here
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1181
but this case is a bit different just because in the spark procedure we
default to choosing the latest main snapshot if a snapshot is not specified. I
think one simple way forward is assign snapshotId to a `Long` which can be null
in case table.currentSnapshot() is null. Then before the API call to create the
branch, throw our own exception in case the current snapshot is null.
2.) `UnsupportedOperationException` is not the right exception imo. I think
we should be throwing an `IllegalArgumentException` to indicate that there is
no latest main snapshot, and that the user should specify an explicit snapshot.
3.) Could we just keep the PR focused to 3.4 and also add unit tests to
cover this case? We can later backport and address if any other DDLs are
impacted
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]