amogh-jahagirdar commented on PR #6651: URL: https://github.com/apache/iceberg/pull/6651#issuecomment-1427162229
I raised some more tests to @namrathamyske branch https://github.com/namrathamyske/iceberg/pull/26 please take a look. A couple points I want to raise: 1.) We won't be able to really write tests for the SparkPositionDeltaWrite until the SQL conf change is in because this PR just focuses on data frame write options. Currently (as far as I can tell) there aren't any row level operations that can be done through dataframes. So here we have 2 options, a.) either just leave the code in SparkPositionDeltaWrite as is, and address the tests in the SQL conf PR or b.) we just table the change for SparkPositionDelta for the SQL PR. My thinking is favoring option b, we shouldn't be merging any changes without test coverage there. Then this PR just focuses on coverage of all the dataframe write operations, which my PR above does address the remainder. 2.) Behavior of createOrReplaceTable when the branch write option is passed in. Say for example a user does the following: ``` data.writeTo("prod.db.table") .tableProperty("write.format.default", "orc") .partitionedBy($"level", days($"ts")) .writeOption(SparkWriteOption.BRANCH, "branch") .createOrReplace() ``` What should the desired behavior be? Currently what happens is the write will be performed on branch and then if it's a replace the main table state will stay the same. So the `replace` loses its meaning. Another possible desired behavior is the branch is written to and replace will actually perform a replace branch to replace the head of main with the head of the target branch. This is what makes sense to me but I think it'll be hard to achieve this based on catalog/planner abstractions. My thinking for this is we ideally prevent create/replace when there's a write option present just to avoid any confusion. Thoughts @namrathamyske @jackye1995 @aokolnychyi @rdblue ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org