amogh-jahagirdar commented on code in PR #13310:
URL: https://github.com/apache/iceberg/pull/13310#discussion_r2198388161

##########
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java:
##########
@@ -426,17 +428,35 @@ public DeltaWriter<InternalRow> createWriter(int partitionId, long taskId) {
             .writeProperties(writeProperties)
             .build();

+      Function<InternalRow, InternalRow> rowLineageProjector =
+          context.dataSchema() != null
+                  && context.dataSchema().findField(MetadataColumns.ROW_ID.fieldId()) != null
+              ? new ProjectRowLineageFromMetadata()

Review Comment:
   > ok. looks like context.dataSchema() can be null. when could it be null? if it is null, row lineage is not carried over. doesn't it violate the spec?

   Yeah, I observed during the very initial analysis rules that context.dataSchema() technically won't be defined at certain points, but Spark will still attempt to build SparkPositionDeltaWrite. Ultimately it will always be non-null before execution, because there will be some output schema for the write. I added the null check to be defensive; otherwise we'd needlessly fail with an NPE in the middle of analysis when looking up whether the lineage fields are defined. But I do hear your bigger point, which is that it's probably cleaner to abstract as much as possible behind the projection logic.
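
   To illustrate the defensive pattern under discussion, here is a minimal, self-contained sketch (not the actual Iceberg/Spark code): the projector falls back to an identity function when the data schema is still unresolved (null during early analysis) or does not contain the row-lineage field. `Schema`, `ROW_ID_FIELD_ID`, and the projector body are simplified stand-ins for `context.dataSchema()`, `MetadataColumns.ROW_ID.fieldId()`, and `ProjectRowLineageFromMetadata`.

```java
import java.util.Set;
import java.util.function.Function;

public class RowLineageProjectorSketch {

  // Hypothetical stand-in for Iceberg's Schema; findField returns null when the field is absent.
  static class Schema {
    private final Set<Integer> fieldIds;

    Schema(Integer... ids) {
      this.fieldIds = Set.of(ids);
    }

    Integer findField(int fieldId) {
      return fieldIds.contains(fieldId) ? fieldId : null;
    }
  }

  // Illustrative constant, not necessarily the real MetadataColumns.ROW_ID field id.
  static final int ROW_ID_FIELD_ID = 2147483540;

  // Defensive selection: identity when the schema is null (early analysis) or
  // lacks the row-lineage field; otherwise a projecting function.
  static Function<String, String> rowLineageProjector(Schema dataSchema) {
    boolean hasRowId =
        dataSchema != null && dataSchema.findField(ROW_ID_FIELD_ID) != null;
    return hasRowId ? row -> "projected:" + row : Function.identity();
  }

  public static void main(String[] args) {
    // Null schema: no NPE, lineage projection is simply skipped.
    System.out.println(rowLineageProjector(null).apply("r1"));
    // Schema with the row-id field: projection applies.
    System.out.println(rowLineageProjector(new Schema(ROW_ID_FIELD_ID)).apply("r1"));
  }
}
```

   The key point the null check buys: callers never branch on schema state themselves; they always receive a usable `Function`, which is a step toward the "hide it all behind the projection logic" suggestion in the thread.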
########## spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java: ########## @@ -426,17 +428,35 @@ public DeltaWriter<InternalRow> createWriter(int partitionId, long taskId) { .writeProperties(writeProperties) .build(); + Function<InternalRow, InternalRow> rowLineageProjector = + context.dataSchema() != null + && context.dataSchema().findField(MetadataColumns.ROW_ID.fieldId()) != null + ? new ProjectRowLineageFromMetadata() Review Comment: >ok. looks like context.dataSchema() can be null. when could it be null? if it is null, row lineage is not carried over. doesn't it violate the spec? Yeah I observed that during the very initial analysis rules, context.dataSchema() won't technically be defined at certain points but SparkPositionDeltaWrite will still attempted to be built. Ultimately before execution it'll always be non-null because there will be some output schema for a write. I added the null check to be defensive otherwise we'd fail with an NPE in the middle of analysis needlessly when trying to lookup to determine if lineage fields are defined. But I do hear your bigger point, which is it's probably cleaner to try and abstract as much as possible behind the project logic -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org