Re: [PR] Collect list of columns written in file [iceberg]

via GitHub Wed, 24 Sep 2025 15:10:20 -0700


manirajv06 commented on code in PR #14126:
URL: https://github.com/apache/iceberg/pull/14126#discussion_r2376497638



##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetrics.java:
##########
@@ -87,6 +92,7 @@ static Metrics metrics(
         }
 
         int fieldId = id.intValue();
+        columnsWritten.add(fieldId);

Review Comment:
   My understanding is that more than one name can be mapped to the same field 
id. Say Name1 and Name2 are both mapped to field id 1. `columnsWritten` would 
then always contain `1` for the blocks read above, irrespective of whether Name1 
or Name2 was used as the column name at write time.
   
   When an `Expression` and/or `Predicates` reference a column (either Name1 or 
Name2), both names resolve to the same field id `1`, and that id is what gets 
passed to the evaluation (strict, inclusive, etc.) process. Keeping both the data 
files written using `Name1` and the data files written using `Name2` in the 
remaining scan process should therefore not cause any problems. It does not 
really matter what the column is called externally, as long as we use the same 
field id internally during reads and writes.
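   
   To make the question concrete, here is a minimal, self-contained sketch of 
the scenario I have in mind. It does not use any Iceberg classes: 
`FieldIdResolutionSketch`, `NAME_TO_ID`, `columnsWritten(...)` and 
`fileHasColumn(...)` are hypothetical names made up for illustration, and the 
name-to-id mapping is an assumption, not how the PR resolves names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names are Iceberg APIs.
public class FieldIdResolutionSketch {
  // Assumed mapping: Name1 and Name2 both resolve to field id 1.
  static final Map<String, Integer> NAME_TO_ID = new HashMap<>();

  static {
    NAME_TO_ID.put("Name1", 1);
    NAME_TO_ID.put("Name2", 1);
    NAME_TO_ID.put("Other", 2);
  }

  // Stand-in for the per-file set populated by columnsWritten.add(fieldId).
  static Set<Integer> columnsWritten(String... namesUsedAtWrite) {
    Set<Integer> ids = new HashSet<>();
    for (String name : namesUsedAtWrite) {
      ids.add(NAME_TO_ID.get(name));
    }
    return ids;
  }

  // The predicate column is resolved to its field id before the lookup.
  static boolean fileHasColumn(Set<Integer> written, String predicateColumn) {
    Integer id = NAME_TO_ID.get(predicateColumn);
    return id != null && written.contains(id);
  }

  public static void main(String[] args) {
    Set<Integer> fileA = columnsWritten("Name1"); // file written using Name1
    Set<Integer> fileB = columnsWritten("Name2"); // file written using Name2

    System.out.println(fileA.equals(fileB));           // both sets hold only id 1
    System.out.println(fileHasColumn(fileA, "Name2")); // Name2 resolves to id 1
    System.out.println(fileHasColumn(fileB, "Name1")); // Name1 resolves to id 1
    System.out.println(fileHasColumn(fileA, "Other")); // id 2 was never written
  }
}
```

   In this sketch, the two files end up with identical sets, so a lookup by 
either name gives the same answer for both files.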
   
   Is my understanding correct?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


