[I] [Parquet][C++] PageIndex is useless with current API [arrow]

via GitHub Thu, 16 Jan 2025 04:45:06 -0800


mpoeter opened a new issue, #45284:
URL: https://github.com/apache/arrow/issues/45284


   ### Describe the enhancement requested
   
   The `ParquetFileReader` provides a `PageIndexReader` via which we can 
eventually get to a `ColumnIndex` and an `OffsetIndex` - so far so good. Those 
indexes provide page based information, but in virtually all APIs the concept 
of pages is completely abstracted away. For higher level APIs that makes sense, 
but even if we go down to the level of the `PageReader` we can only read all 
pages serially, one after the other. The only way I found to skip some pages is 
via the `PageReader`'s data page filter, but that operates on the page's 
metadata and does not utilize the index. I did not find a way to load a 
specific page (e.g., via index or file offset). But then I don't see how one can 
utilize the PageIndex with the current API. Did I miss anything?
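
To make the intended use concrete, here is a minimal sketch of the lookup I would like to be able to act on. It assumes each `OffsetIndex` entry carries a file offset, a compressed size, and the index of the first row in the page. The `PageLocation` struct and the `FindPageForRow` helper below are hypothetical stand-ins written for illustration, not part of the Arrow API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one OffsetIndex entry (assumed fields).
struct PageLocation {
  int64_t offset;                // file offset of the page
  int32_t compressed_page_size;  // size of the page on disk, in bytes
  int64_t first_row_index;       // index of the first row stored in the page
};

// Hypothetical helper: returns the position of the page that contains `row`,
// or -1 if there are no pages or `row` lies before the first page.
// Precondition: `pages` is sorted by first_row_index.
int64_t FindPageForRow(const std::vector<PageLocation>& pages, int64_t row) {
  assert(std::is_sorted(pages.begin(), pages.end(),
                        [](const PageLocation& a, const PageLocation& b) {
                          return a.first_row_index < b.first_row_index;
                        }));
  // First page whose first row is strictly greater than `row`.
  auto it = std::upper_bound(
      pages.begin(), pages.end(), row,
      [](int64_t r, const PageLocation& p) { return r < p.first_row_index; });
  if (it == pages.begin()) return -1;
  // The page just before it is the one that contains `row`.
  return static_cast<int64_t>(it - pages.begin()) - 1;
}
```

With such a lookup the offset and size of the wanted page are known, but I see no way to hand them to the `PageReader` so that only that page is read.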
   
   ### Component(s)
   
   C++, Parquet


