manirajv06 commented on issue #13855:
URL: https://github.com/apache/iceberg/issues/13855#issuecomment-3217957429

   @RussellSpitzer Thanks for sharing your views.
   
   Yes, the problem statement is to know the list of columns in each file. I 
also understand the problem with optional fields: if a column is null for 
every record in a data file, we might be misled if we go by the linked schema 
ID. However, the max field ID of the linked schema could still help us make 
lenient decisions (with a few % error rate), which might not be correct in 
places where decisions have to be made strictly.
   
   "Columns written" as a per-file metric helps in making accurate decisions 
and suits all places. It can be added easily, since we should already have all 
the information at hand, but it comes with the cost of keeping the list for 
every file. If we think this cost is not really an overhead to worry about, 
then we can go with this approach.
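
   As a rough illustration of the trade-off between the two options, here is 
a minimal sketch. It is not Iceberg's API: `DataFileInfo`, 
`schema_max_field_id`, `columns_written`, and both helper functions are 
hypothetical names made up for this example, and the assumption is that an 
optional column may be absent from a file even though the linked schema 
contains it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFileInfo:
    """Hypothetical per-file metadata, for illustration only."""
    path: str
    schema_max_field_id: int      # max field ID of the schema linked to the file
    columns_written: frozenset    # field IDs actually written (the proposed metric)


def maybe_has_column(f: DataFileInfo, field_id: int) -> bool:
    # Lenient check: the field existed in the linked schema, so it *may* be in the file.
    return field_id <= f.schema_max_field_id


def has_column(f: DataFileInfo, field_id: int) -> bool:
    # Strict check: relies on the explicit per-file list of written columns.
    return field_id in f.columns_written


# Assumed scenario: the schema has fields 1..3, but optional field 3 was not written.
f = DataFileInfo("data/f1.parquet", schema_max_field_id=3,
                 columns_written=frozenset({1, 2}))

print(maybe_has_column(f, 3))  # lenient answer
print(has_column(f, 3))        # strict answer
```

   The lenient check needs only one integer per schema, while the strict 
check needs a list per file, which is the cost mentioned above.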
   
   @Fokko @amogh-jahagirdar @rdblue Please share your views as well so that we 
can move ahead with making changes.
     


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

