chenjunjiedada commented on PR #6026:
URL: https://github.com/apache/iceberg/pull/6026#issuecomment-1287561773

   > 
   > I'm a bit confused by this behavior: `ReadConf.startRowPositions` is valid 
only if the `_pos` column exists in the `expectedSchema` due to #1716. Are there 
use cases where `_pos` is absent and we still need `ReadConf.startRowPositions`? 
Looking at the classes `VectorizedParquetReader` and `ParquetReader`, which 
consume `ReadConf.startRowPositions`, it seems likely the schema doesn't have 
`_pos`. cc @chenjunjiedada @aokolnychyi
   
   The row group start positions are always computed, but right now they are only 
correct when `_pos` is projected.  That's intentional, because we don't want to read 
the Parquet footer one more time.  But since the footer must be read at least once, 
we should be able to cache what we need during the first access, drop the 
current optimization, and thus simplify the logic that checks for the `_pos` column.
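
   To illustrate why this depends on the footer: a row group's start position is the 
sum of the row counts of all row groups that precede it in the file, so it needs the 
row counts of every row group, not only the ones selected for the read. A minimal 
sketch of that computation follows. The class and method names are hypothetical, 
and this is not the actual Iceberg code:

```java
import java.util.Arrays;

public class RowGroupPositions {
  // Hypothetical helper, not part of Iceberg.
  // rowCounts[i] is the number of rows in row group i, listed for every
  // row group in the file, in file order (as recorded in the footer).
  public static long[] startRowPositions(long[] rowCounts) {
    long[] starts = new long[rowCounts.length];
    long next = 0L;
    for (int i = 0; i < rowCounts.length; i++) {
      starts[i] = next;      // rows in all earlier row groups
      next += rowCounts[i];  // advance past this row group
    }
    return starts;
  }

  public static void main(String[] args) {
    long[] starts = startRowPositions(new long[] {100L, 250L, 50L});
    System.out.println(Arrays.toString(starts)); // prints [0, 100, 350]
  }
}
```

   If these values were cached on the first footer read, the readers could presumably 
use them whether or not `_pos` is in the schema.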
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

