rdblue commented on code in PR #6965:
URL: https://github.com/apache/iceberg/pull/6965#discussion_r1125772423
##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java:
##########
@@ -214,11 +220,14 @@ public void
testMergeIntoEmptyTargetInsertAllNonMatchingRows() {
row(3, "emp-id-3") // new
);
assertEquals(
- "Should have expected rows", expectedRows, sql("SELECT * FROM %s ORDER
BY id", tableName));
+ "Should have expected rows",
+ expectedRows,
+ sql("SELECT * FROM %s ORDER BY id", selectTarget()));
}
@Test
public void testMergeIntoEmptyTargetInsertOnlyMatchingRows() {
+ Assume.assumeTrue("Branch can only be used for non-empty table", branch ==
null);
Review Comment:
Looking into these test assumptions made me think more about what should be
valid.
Currently, all of the test cases use a branch for all reads and writes, so
there isn't anything testing implicit branching from the current table state.
That is when writing, we will automatically create a branch from the current
table state if it does not already exist. That was intended for convenience, so
you can set an environment branch and auto-create on write.
For commands like MERGE that use the current table state, I'm wondering if
that makes any sense because the command needs to create a read for the
branch's state, which doesn't yet exist.
I'm leaning toward not allowing that for MERGE, UPDATE, or DELETE because it
is so strange. Does that mean we shouldn't allow implicit branch creation for
INSERT either? On one hand, that's what these tests rely on to write from the
start into a branch. But on the other, there could be an expectation that
writing to a branch that doesn't exist actually creates it from an empty state.
So...
1. I think we need tests for all of the operations that validate what
happens when (a) the table is empty and an operation uses a branch that doesn't
exist and (b) when the table is not empty and an operation uses a branch that
doesn't exist
2. I think that MERGE, UPDATE, and DELETE should fail right away with a
"Branch does not exist: %s" error (currently this is an NPE in job planning)
3. I think INSERT behavior should be clearly defined for branches, whether
it is based on the current table state or an empty table state.
@aokolnychyi, @jackye1995, @danielcweeks, @amogh-jahagirdar, @namrathamyske,
what do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]