wypoon commented on code in PR #11520: URL: https://github.com/apache/iceberg/pull/11520#discussion_r1841283880
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java:
##########
@@ -55,8 +55,14 @@ public ColumnarBatchReader(List<VectorizedReader<?>> readers) {
   @Override
   public void setRowGroupInfo(
       PageReadStore pageStore, Map<ColumnPath, ColumnChunkMetaData> metaData, long rowPosition) {
-    super.setRowGroupInfo(pageStore, metaData, rowPosition);
-    this.rowStartPosInBatch = rowPosition;
+    setRowGroupInfo(pageStore, metaData);
+  }
+
+  @Override
+  public void setRowGroupInfo(
+      PageReadStore pageStore, Map<ColumnPath, ColumnChunkMetaData> metaData) {
+    super.setRowGroupInfo(pageStore, metaData);
+    this.rowStartPosInBatch = pageStore.getRowIndexOffset().orElse(0L);

Review Comment:
   That is a good question. As I understand it, the `PageReadStore` implementation (`ColumnChunkPageReadStore`) is normally constructed with the `rowIndexOffset`, but if the offset is not available, it is constructed with -1 for the `rowIndexOffset`. `PageReadStore::getRowIndexOffset()` will not return a negative value; it returns `Optional.empty()` in that case. I suppose we could throw an `IllegalArgumentException` in that situation instead of setting `rowStartPosInBatch` to 0.
   @flyrain do you have an opinion on this? Is there someone who knows Parquet well who can confirm that in normal operation, `PageReadStore::getRowIndexOffset()` should *not* return `Optional.empty()`?
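
   To make the two options concrete, here is a minimal, self-contained sketch. It does not use Parquet: `StubPageStore`, `lenientStart`, and `strictStart` are hypothetical names for illustration, and the stub only imitates the behaviour described above (a negative offset is reported as `Optional.empty()`). The exception message is also made up.

```java
import java.util.Optional;

public class RowIndexOffsetSketch {

  // Hypothetical stand-in for a page store; NOT the Parquet PageReadStore.
  static final class StubPageStore {
    private final long rowIndexOffset;

    StubPageStore(long rowIndexOffset) {
      this.rowIndexOffset = rowIndexOffset;
    }

    // Imitates the described behaviour: a negative offset is never returned,
    // it is reported as Optional.empty() instead.
    Optional<Long> getRowIndexOffset() {
      return rowIndexOffset < 0 ? Optional.empty() : Optional.of(rowIndexOffset);
    }
  }

  // Option 1 (what the diff does): silently fall back to 0 when the offset is missing.
  static long lenientStart(StubPageStore store) {
    return store.getRowIndexOffset().orElse(0L);
  }

  // Option 2 (the alternative raised in the comment): fail fast when the offset is missing.
  static long strictStart(StubPageStore store) {
    return store
        .getRowIndexOffset()
        .orElseThrow(() -> new IllegalArgumentException("Row index offset is not available"));
  }

  public static void main(String[] args) {
    StubPageStore withOffset = new StubPageStore(128L);
    StubPageStore withoutOffset = new StubPageStore(-1L);

    System.out.println(lenientStart(withOffset));
    System.out.println(lenientStart(withoutOffset));
    System.out.println(strictStart(withOffset));
    try {
      strictStart(withoutOffset);
    } catch (IllegalArgumentException e) {
      System.out.println("strict variant rejected the missing offset: " + e.getMessage());
    }
  }
}
```

   With the lenient variant, a missing offset is indistinguishable from a row group that really starts at row 0, which is why the question of whether `Optional.empty()` can occur in normal operation matters.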