amogh-jahagirdar commented on code in PR #6965:
URL: https://github.com/apache/iceberg/pull/6965#discussion_r1121979187
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java:
##########
@@ -1017,4 +1019,41 @@ public String unknown(
      return String.format("%s(%s) %s %s", transform, sourceName, direction, nullOrder);
}
}
+
+  public static Map<String, String> buildWriteOptions(String branch, LogicalWriteInfo info) {
+ ImmutableMap.Builder<String, String> builder =
+ ImmutableMap.<String, String>builder().putAll(info.options());
+
+ if (branch != null) {
+ String optionBranch = info.options().get(SparkReadOptions.BRANCH);
+ if (optionBranch == null) {
+ builder.put(SparkWriteOptions.BRANCH, branch);
+ } else if (!optionBranch.equals(branch)) {
+ throw new ValidationException(
          "Cannot override branch to write more than once, received %s in identifier and %s in write options",
+ branch, optionBranch);
+ }
+ }
+
+ return builder.build();
+ }
+
+ public static CaseInsensitiveStringMap buildScanOptions(
+ String branch, CaseInsensitiveStringMap options) {
+ ImmutableMap.Builder<String, String> builder =
+ ImmutableMap.<String, String>builder().putAll(options);
+
+ if (branch != null) {
+ String optionBranch = options.get(SparkReadOptions.BRANCH);
+ if (optionBranch == null) {
+ builder.put(SparkReadOptions.BRANCH, branch);
+ } else if (!optionBranch.equals(branch)) {
+ throw new ValidationException(
            "Cannot override branch to read more than once, received %s in identifier and %s in scan options",
Review Comment:
Yeah, I think this is a point of discussion. My take was that we deprecate the
SparkReadOptions so that, from a user experience perspective, there aren't a
bunch of ways to query the head of a branch. Every operation (SQL reads and
writes, or DataFrame) has the branch in the identifier; the only addition for
SQL is the time travel syntax.
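To make the conflict rule in the hunk above concrete, here is a minimal, self-contained sketch of the same validation with plain collections: the branch from the identifier wins when no option is set, a matching option is a no-op, and a conflicting option fails fast. The class name, the `"branch"` key, and the use of `IllegalArgumentException` (in place of Iceberg's `ValidationException`) are illustrative stand-ins, not the actual Iceberg API.

```java
import java.util.HashMap;
import java.util.Map;

public class BranchOptionCheck {
  // Stand-in for SparkWriteOptions.BRANCH.
  static final String BRANCH_OPTION = "branch";

  // Mirrors the logic of Spark3Util.buildWriteOptions in the diff above.
  static Map<String, String> resolve(String identifierBranch, Map<String, String> options) {
    Map<String, String> resolved = new HashMap<>(options);
    if (identifierBranch != null) {
      String optionBranch = options.get(BRANCH_OPTION);
      if (optionBranch == null) {
        // No branch in the options: the identifier's branch is applied.
        resolved.put(BRANCH_OPTION, identifierBranch);
      } else if (!optionBranch.equals(identifierBranch)) {
        // Two different branches were supplied: reject the write.
        throw new IllegalArgumentException(String.format(
            "Cannot override branch to write more than once, received %s in identifier and %s in write options",
            identifierBranch, optionBranch));
      }
    }
    return resolved;
  }

  public static void main(String[] args) {
    Map<String, String> opts = new HashMap<>();
    System.out.println(resolve("audit", opts).get(BRANCH_OPTION)); // audit
    opts.put(BRANCH_OPTION, "other");
    try {
      resolve("audit", opts);
    } catch (IllegalArgumentException e) {
      System.out.println("conflict rejected");
    }
  }
}
```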
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java:
##########
@@ -659,10 +659,7 @@ private Table load(Identifier ident) {
Matcher branch = BRANCH.matcher(ident.name());
if (branch.matches()) {
- Snapshot branchSnapshot = table.snapshot(branch.group(1));
- if (branchSnapshot != null) {
-      return new SparkTable(table, branchSnapshot.snapshotId(), !cacheEnabled);
- }
+ return new SparkTable(table, branch.group(1), !cacheEnabled);
Review Comment:
Do we want to keep the "branch_" prefix in the identifier? I just removed it
in the previous PR, so everything after the table part in the identifier is
treated as a branch.
https://github.com/apache/iceberg/pull/6651/files#diff-6028f80634c027dee9ea947b9f0e50a51b1e5ced1834b902f0f47303514d0e96L660
I'm okay either way; one benefit of the prefix is that, from a user experience
point of view, it's explicit that it's a branch and less likely to collide
with a column identifier. It's just a bit more verbose.
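For illustration, a prefixed identifier could be parsed roughly like the sketch below. The exact regex behind the `BRANCH` matcher in SparkCatalog is an assumption here; only the shape of the match-and-extract step is meant.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BranchIdentifier {
  // Assumed shape of the BRANCH pattern discussed above; the real pattern
  // in SparkCatalog may differ.
  static final Pattern BRANCH = Pattern.compile("branch_(.*)");

  // Returns the branch name if the identifier part carries the prefix,
  // otherwise null (the part could then be a column or something else).
  static String branchName(String identPart) {
    Matcher m = BRANCH.matcher(identPart);
    return m.matches() ? m.group(1) : null;
  }

  public static void main(String[] args) {
    System.out.println(branchName("branch_audit")); // audit
    System.out.println(branchName("audit"));        // null: no prefix
  }
}
```

The prefix makes the intent unambiguous at parse time, which is the collision-avoidance benefit mentioned above.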
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java:
##########
@@ -252,12 +266,12 @@ public WriteBuilder newWriteBuilder(LogicalWriteInfo info) {
Preconditions.checkArgument(
        snapshotId == null, "Cannot write to table at a specific snapshot: %s", snapshotId);
- return new SparkWriteBuilder(sparkSession(), icebergTable, info);
+ return new SparkWriteBuilder(sparkSession(), icebergTable, branch, info);
}
@Override
public RowLevelOperationBuilder
newRowLevelOperationBuilder(RowLevelOperationInfo info) {
-    return new SparkRowLevelOperationBuilder(sparkSession(), icebergTable, info);
+    return new SparkRowLevelOperationBuilder(sparkSession(), icebergTable, branch, info);
Review Comment:
https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L290
SparkTable#canDeleteUsingMetadata needs to be updated to use the branch as
well during the scan
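The concern here can be shown with a toy model of ref-to-snapshot resolution: a metadata-only check that always resolves against the main branch would look at the wrong snapshot when the table was loaded for another branch. Names and types below are illustrative stand-ins, not Iceberg APIs.

```java
import java.util.HashMap;
import java.util.Map;

public class RefResolution {
  // Toy model: each named ref points at a snapshot id.
  static Long snapshotFor(Map<String, Long> refs, String branch) {
    // Fall back to "main" when no branch was given, mirroring default reads.
    return refs.get(branch != null ? branch : "main");
  }

  public static void main(String[] args) {
    Map<String, Long> refs = new HashMap<>();
    refs.put("main", 100L);
    refs.put("audit", 200L);
    // A branch-unaware scan resolves main's snapshot...
    System.out.println(snapshotFor(refs, null));    // 100
    // ...while the table was actually loaded for the audit branch.
    System.out.println(snapshotFor(refs, "audit")); // 200
  }
}
```

This is why any scan constructed inside canDeleteUsingMetadata should carry the same branch as the table it belongs to.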
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]