wypoon commented on code in PR #11520:
URL: https://github.com/apache/iceberg/pull/11520#discussion_r1841283880


##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/data/vectorized/ColumnarBatchReader.java:
##########
@@ -55,8 +55,14 @@ public ColumnarBatchReader(List<VectorizedReader<?>> readers) {
   @Override
   public void setRowGroupInfo(
       PageReadStore pageStore, Map<ColumnPath, ColumnChunkMetaData> metaData, long rowPosition) {
-    super.setRowGroupInfo(pageStore, metaData, rowPosition);
-    this.rowStartPosInBatch = rowPosition;
+    setRowGroupInfo(pageStore, metaData);
+  }
+
+  @Override
+  public void setRowGroupInfo(
+      PageReadStore pageStore, Map<ColumnPath, ColumnChunkMetaData> metaData) {
+    super.setRowGroupInfo(pageStore, metaData);
+    this.rowStartPosInBatch = pageStore.getRowIndexOffset().orElse(0L);

Review Comment:
   That is a good question.
   As I understand it, the `PageReadStore` implementation (`ColumnChunkPageReadStore`) is normally constructed with the `rowIndexOffset`; if the offset is not available, it is constructed with -1 for the `rowIndexOffset` instead. `PageReadStore::getRowIndexOffset()` will not return a negative value; it returns `Optional.empty()` in that case.
   I suppose we could throw an `IllegalArgumentException` in that situation instead of setting `rowStartPosInBatch` to 0.
   @flyrain, do you have an opinion on this?
   Is there someone who knows Parquet well who can confirm that, in normal operation, `PageReadStore::getRowIndexOffset()` should *not* return `Optional.empty()`?
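
To make the two options concrete, here is a minimal, self-contained sketch. It does not use the Parquet classes: `RowOffsetSource`, `RowStartPositions`, and `storeWithRawOffset` are hypothetical stand-ins that only mimic the behavior described above (a negative raw offset surfaces as `Optional.empty()`), and the exception message is illustrative.

```java
import java.util.Optional;

// Hypothetical stand-in for PageReadStore; models only the one method under discussion.
interface RowOffsetSource {
  Optional<Long> getRowIndexOffset();
}

final class RowStartPositions {
  private RowStartPositions() {}

  // Assumption: mirrors the behavior described in the comment, where a store
  // built with a negative offset reports Optional.empty() rather than the raw value.
  static RowOffsetSource storeWithRawOffset(long rawOffset) {
    return () -> rawOffset < 0 ? Optional.<Long>empty() : Optional.of(rawOffset);
  }

  // What the diff currently does: silently fall back to 0 when the offset is absent.
  static long lenient(RowOffsetSource store) {
    return store.getRowIndexOffset().orElse(0L);
  }

  // The alternative floated in the comment: fail fast when the offset is absent.
  static long strict(RowOffsetSource store) {
    return store
        .getRowIndexOffset()
        .orElseThrow(
            () -> new IllegalArgumentException("Row index offset is not available"));
  }

  public static void main(String[] args) {
    RowOffsetSource withOffset = storeWithRawOffset(100L);
    RowOffsetSource withoutOffset = storeWithRawOffset(-1L);

    System.out.println("lenient, offset present: " + lenient(withOffset));
    System.out.println("lenient, offset absent:  " + lenient(withoutOffset));
    System.out.println("strict, offset present:  " + strict(withOffset));
    try {
      strict(withoutOffset);
    } catch (IllegalArgumentException e) {
      System.out.println("strict, offset absent:   threw " + e.getMessage());
    }
  }
}
```

With the lenient variant, a missing offset is indistinguishable from a row group that really starts at position 0, which is the concern behind the question; the strict variant turns that case into an immediate failure.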



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org