emkornfield commented on issue #13855:
URL: https://github.com/apache/iceberg/issues/13855#issuecomment-3246179320

   > Currently metrics are not good for this because we have almost no way to 
determine the difference between, didn't store metrics for a column and column 
wasn't written. In general we assume that if we don't see metrics, that column 
exists.
   
   I agree with this.
   
   > Linking Schema has a similar issue, If I don't have metrics for an 
optional column it could be missing or It could have values, so I can't make 
the call.
   
   IIUC, the proposal is not meant to address this.  There are two cases:
   
   1.  The column did not exist at the time of writing (and therefore we 
couldn't possibly write statistics for that file) so we need some additional 
metadata to record this fact (e.g. schema ID).
   2. Some writers are not writing out optional columns even if they are in the 
schema or we want to identify all null values for a column that existed in the 
schema.
   
   (1) seems reasonable.  (2) seems like it can already be covered (see below).
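
To make the two cases concrete, here is a minimal sketch of how a reader might tell them apart. It assumes the file entry records which field IDs were in the schema at write time (for example via a schema ID), and that value counts include nulls. `FileEntry`, `written_field_ids`, and `column_status` are hypothetical names for illustration, not Iceberg's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class FileEntry:
    # Hypothetical stand-in for per-file metadata, not an Iceberg class.
    # Field IDs present in the schema used at write time; None if not recorded.
    written_field_ids: Optional[Set[int]]
    # Per-column counts keyed by field ID; value counts are assumed to include nulls.
    value_counts: Dict[int, int] = field(default_factory=dict)
    null_value_counts: Dict[int, int] = field(default_factory=dict)


def column_status(entry: FileEntry, field_id: int) -> str:
    # Case (1): the column was not in the schema when the file was written.
    if entry.written_field_ids is not None and field_id not in entry.written_field_ids:
        return "absent_at_write"
    nulls = entry.null_value_counts.get(field_id)
    values = entry.value_counts.get(field_id)
    # Without both counts we cannot say anything about the column's contents.
    if nulls is None or values is None:
        return "unknown"
    # Case (2): the column existed and every value is null.
    return "all_null" if nulls == values else "has_values"
```

Without the write-time schema information, case (1) collapses into "unknown", which is the ambiguity described in the quoted text.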
   
   > I just worry that the more common case is schemas with optional columns 
(possibly many many optional columns) where we aren't storing metrics.
   
   This is an implementation issue though?  It seems it can be mostly solved by 
adapting metadata writers to write out null-counts when null-count = value 
count? (It might be a little awkward in V4).  But we have a [current PR 
reiterating that all columns must be written even if values are 
null](https://github.com/apache/iceberg/pull/13936).  Or is the argument here 
solely that the cost of storing the statistics in the current data structures is 
too expensive?
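
As a rough sketch of the writer-side change I have in mind: even when metrics are disabled for a column, the writer could still keep the counts for columns that are entirely null. The function name and arguments are hypothetical, and this again assumes value counts include nulls.

```python
from typing import Dict, Set, Tuple


def counts_to_keep(
    value_counts: Dict[int, int],
    null_value_counts: Dict[int, int],
    metrics_enabled_ids: Set[int],
) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Return the (value_counts, null_value_counts) a writer would persist.

    Columns with metrics enabled keep their counts as usual. Columns with
    metrics disabled keep them only when every value is null, so a reader
    can still identify all-null columns cheaply.
    """
    kept_values: Dict[int, int] = {}
    kept_nulls: Dict[int, int] = {}
    for field_id, nulls in null_value_counts.items():
        values = value_counts.get(field_id)
        if values is None:
            # No value count collected, so all-null cannot be established.
            continue
        if field_id in metrics_enabled_ids or nulls == values:
            kept_values[field_id] = values
            kept_nulls[field_id] = nulls
    return kept_values, kept_nulls
```

The extra cost is two small map entries per all-null column, which is the storage question I am asking about.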
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

