amogh-jahagirdar commented on code in PR #13310:
URL: https://github.com/apache/iceberg/pull/13310#discussion_r2198388161

##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java:
##########
@@ -426,17 +428,35 @@ public DeltaWriter<InternalRow> createWriter(int partitionId, long taskId) {
             .writeProperties(writeProperties)
             .build();

+      Function<InternalRow, InternalRow> rowLineageProjector =
+          context.dataSchema() != null
+                  && context.dataSchema().findField(MetadataColumns.ROW_ID.fieldId()) != null
+              ? new ProjectRowLineageFromMetadata()

Review Comment:
   > ok. looks like context.dataSchema() can be null. when could it be null? if it is null, row lineage is not carried over. doesn't it violate the spec?

   Yeah, I observed during the very initial analysis rules that context.dataSchema() technically won't be defined at certain points, but Spark will still attempt to build SparkPositionDeltaWrite. Ultimately it will always be non-null before execution, because there will be some output schema for the write. I added the null check to be defensive; otherwise we'd needlessly fail with an NPE in the middle of analysis when looking up whether the lineage fields are defined. But I do hear your bigger point, which is that it's probably cleaner to abstract as much as possible behind the projection logic.
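
   To illustrate the defensive pattern under discussion, here is a minimal, self-contained sketch (not the actual Iceberg/Spark code): the projector falls back to an identity function when the data schema is still unresolved (null during early analysis) or does not contain the row-lineage field. `Schema`, `ROW_ID_FIELD_ID`, and the projector body are simplified stand-ins for `context.dataSchema()`, `MetadataColumns.ROW_ID.fieldId()`, and `ProjectRowLineageFromMetadata`.

```java
import java.util.Set;
import java.util.function.Function;

public class RowLineageProjectorSketch {

  // Hypothetical stand-in for Iceberg's Schema; findField returns null when the field is absent.
  static class Schema {
    private final Set<Integer> fieldIds;

    Schema(Integer... ids) {
      this.fieldIds = Set.of(ids);
    }

    Integer findField(int fieldId) {
      return fieldIds.contains(fieldId) ? fieldId : null;
    }
  }

  // Illustrative constant, not necessarily the real MetadataColumns.ROW_ID field id.
  static final int ROW_ID_FIELD_ID = 2147483540;

  // Defensive selection: identity when the schema is null (early analysis) or
  // lacks the row-lineage field; otherwise a projecting function.
  static Function<String, String> rowLineageProjector(Schema dataSchema) {
    boolean hasRowId =
        dataSchema != null && dataSchema.findField(ROW_ID_FIELD_ID) != null;
    return hasRowId ? row -> "projected:" + row : Function.identity();
  }

  public static void main(String[] args) {
    // Null schema: no NPE, lineage projection is simply skipped.
    System.out.println(rowLineageProjector(null).apply("r1"));
    // Schema with the row-id field: projection applies.
    System.out.println(rowLineageProjector(new Schema(ROW_ID_FIELD_ID)).apply("r1"));
  }
}
```

   The key point the null check buys: callers never branch on schema state themselves; they always receive a usable `Function`, which is a step toward the "hide it all behind the projection logic" suggestion in the thread.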
########## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java: ########## @@ -426,17 +428,35 @@ public DeltaWriter<InternalRow> createWriter(int partitionId, long taskId) { .writeProperties(writeProperties) .build(); + Function<InternalRow, InternalRow> rowLineageProjector = + context.dataSchema() != null + && context.dataSchema().findField(MetadataColumns.ROW_ID.fieldId()) != null + ? new ProjectRowLineageFromMetadata() Review Comment: >ok. looks like context.dataSchema() can be null. when could it be null? if it is null, row lineage is not carried over. doesn't it violate the spec? Yeah I observed that during the very initial analysis rules, context.dataSchema() won't technically be defined at certain points but SparkPositionDeltaWrite will still attempted to be built. Ultimately before execution it'll always be non-null because there will be some output schema for a write. I added the null check to be defensive otherwise we'd fail with an NPE in the middle of analysis needlessly when trying to lookup to determine if lineage fields are defined. But I do hear your bigger point, which is it's probably cleaner to try and abstract as much as possible behind the project logic -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org