1raghavmahajan commented on code in PR #14291:
URL: https://github.com/apache/iceberg/pull/14291#discussion_r2421192085
##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/procedures/CreateChangelogViewProcedure.java:
##########
@@ -314,6 +310,18 @@ private static Column[] sortSpec(Dataset<Row> df, Column[]
repartitionSpec, bool
return sortSpec;
}
+ private static Column[] getColumns(Dataset<Row> df, String[] initialColumns,
String... addCols) {
Review Comment:
This procedure has a few places
([#1](https://github.com/apache/iceberg/blob/7f411135562e682916f8f411bfc7455e3f79956d/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/procedures/CreateChangelogViewProcedure.java#L189),
[#2](https://github.com/apache/iceberg/blob/7f411135562e682916f8f411bfc7455e3f79956d/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/procedures/CreateChangelogViewProcedure.java#L304))
where it needs to build the column references for a list of identifier column
strings and combine them with the metadata columns (change ordinal / change
type).
This leads to verbose code just to build a partition spec.
I wasn't able to generalize the second instance, but I do think it cleans
things up a bit on the callee side.
I agree with you that it can lead to confusion since it's still doing a bit
too much and now we lack the context of _why_. Let me see if I can refactor
this better.
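
As a rough illustration of the pattern described above (not the code in this PR), the sketch below combines a list of identifier column names with extra metadata column names and resolves each one through a single function. The class name `ColumnRefs`, the generic `resolve` parameter, and the returned `List` are assumptions made so the example runs without Spark; in the procedure itself the resolver would presumably be something like `df::col` and the result a `Column[]`. The metadata column name `_change_ordinal` is also assumed here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the helper under review; not the PR's implementation.
class ColumnRefs {

  // Resolves the identifier columns first, then any additional columns
  // (for example the change ordinal / change type metadata columns),
  // preserving that order in the result.
  static <C> List<C> getColumns(
      Function<String, C> resolve, String[] initialColumns, String... addCols) {
    return Stream.concat(Arrays.stream(initialColumns), Arrays.stream(addCols))
        .map(resolve)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    String[] identifierColumns = {"id", "name"};

    // A string-building resolver stands in for Dataset#col(String) (assumption).
    List<String> refs =
        getColumns(name -> "col(" + name + ")", identifierColumns, "_change_ordinal");

    System.out.println(refs); // prints [col(id), col(name), col(_change_ordinal)]
  }
}
```

Passing the resolver as a parameter keeps the helper to one job (name list in, column references out), which may help with the concern that the current version does too much without explaining why.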
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]