amogh-jahagirdar commented on PR #6651:
URL: https://github.com/apache/iceberg/pull/6651#issuecomment-1427162229

   I raised some more tests to @namrathamyske branch 
https://github.com/namrathamyske/iceberg/pull/26 please take a look. A couple 
points I want to raise:
   
   1.) We won't be able to really write tests for the SparkPositionDeltaWrite 
until the SQL conf change is in because this PR just focuses on data frame 
write options. Currently (as far as I can tell) there aren't any row level 
operations that can be done through dataframes. So here we have 2 options, a.) 
either just leave the code in SparkPositionDeltaWrite as is, and address the 
tests in the SQL conf PR or b.) we just table the change for SparkPositionDelta 
for the SQL PR. My thinking is favoring option b, we shouldn't be merging any 
changes without test coverage there. Then this PR just focuses on coverage of 
all the dataframe write operations, which my PR above does address the 
remainder.
   
   2.) Behavior of createOrReplaceTable when the branch write option is passed 
in. Say for example a user does the following:
   
   ```
   data.writeTo("prod.db.table")
       .tableProperty("write.format.default", "orc")
       .partitionedBy($"level", days($"ts"))
      .writeOption(SparkWriteOption.BRANCH, "branch")
       .createOrReplace()
   ```
   What should the desired behavior be? Currently what happens is the write 
will be performed on branch and then if it's a replace the main table state 
will stay the same. So the `replace`  loses its meaning. Another possible 
desired behavior is the branch is written to and replace will actually perform 
a replace branch to replace the head of main with the head of the target 
branch. This is what makes sense to me but I think it'll be hard to achieve 
this based on catalog/planner abstractions.
   
   My thinking for this is we ideally prevent create/replace when there's a 
write option present just to avoid any confusion. Thoughts @namrathamyske 
@jackye1995 @aokolnychyi @rdblue ?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to