[I] [Go][Parquet] Does this implementation support page indexes? [arrow-go]

via GitHub Thu, 29 Aug 2024 18:47:46 -0700


jhump opened a new issue, #33:
URL: https://github.com/apache/arrow-go/issues/33


   I would like to create page indexes (per [this 
doc](https://github.com/apache/parquet-format/blob/master/PageIndex.md)) for a 
column when writing a Parquet file, and then use that index to seek to a 
particular row. In my case the file is sorted by an ID column, and queries 
often want to start at a particular ID and then read all rows after it. 
Without an index, I can find the right row group using statistics, but then I 
have to scan every value of the column in that row group to find the ID and 
determine the starting row.
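The sketch below shows the kind of lookup a page index would allow. It assumes the column index and offset index for one column chunk have already been decoded into memory. `PageLocation`, `ColumnIndex` and `seekRow` are hypothetical Go mirrors of the Thrift structures described in PageIndex.md, not arrow-go APIs. The sketch also assumes the ID column is a non-null int64 sorted ascending (BoundaryOrder ASCENDING).

```go
package main

import (
	"fmt"
	"sort"
)

// PageLocation is a hypothetical mirror of the Thrift PageLocation struct
// stored in a column chunk's OffsetIndex (see PageIndex.md).
type PageLocation struct {
	Offset             int64 // file offset of the page
	CompressedPageSize int32 // size of the page in bytes, including its header
	FirstRowIndex      int64 // row group-relative index of the page's first row
}

// ColumnIndex is a hypothetical, already-decoded view of the per-page min/max
// statistics for an int64 column (the real struct stores encoded bytes).
type ColumnIndex struct {
	MinValues []int64
	MaxValues []int64
}

// seekRow finds the first page that may hold a value >= target, assuming
// ascending, non-null values. It returns that page and the row group-relative
// row where the page starts; ok is false if every page's max is below target.
func seekRow(ci ColumnIndex, pages []PageLocation, target int64) (page int, row int64, ok bool) {
	// Pages whose max is below target can be skipped; because max values are
	// ascending, the predicate is monotone and binary search applies.
	page = sort.Search(len(ci.MaxValues), func(i int) bool {
		return ci.MaxValues[i] >= target
	})
	if page == len(ci.MaxValues) {
		return 0, 0, false
	}
	return page, pages[page].FirstRowIndex, true
}

func main() {
	ci := ColumnIndex{
		MinValues: []int64{1, 101, 201},
		MaxValues: []int64{100, 200, 300},
	}
	pages := []PageLocation{
		{Offset: 4, CompressedPageSize: 800, FirstRowIndex: 0},
		{Offset: 804, CompressedPageSize: 800, FirstRowIndex: 100},
		{Offset: 1604, CompressedPageSize: 800, FirstRowIndex: 200},
	}
	page, row, ok := seekRow(ci, pages, 150)
	fmt.Println(page, row, ok) // 1 100 true
}
```

A reader would still scan values inside the selected page to find the exact row. However, the scan is limited to that one page instead of the whole row group, and `Offset` lets the reader jump straight to the page.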
   
   I've gone through all of the code and API in the parquet package and 
sub-packages, and all I can find are fields in the Thrift-generated code for 
this, plus accessors in the column chunk metadata that return the file offset 
where the index is stored. There seems to be no API to actually read the 
index and use it, and on the write side there is no configuration option to 
control whether an index is created.
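The sketch below shows the bookkeeping a writer would need to produce a page index. `pageIndexBuilder` is a hypothetical type, not part of arrow-go. It assumes the writer reports each flushed data page of an int64 column along with the page's values, first row and file location. Its slices line up with the ColumnIndex and OffsetIndex fields in PageIndex.md, but min/max are kept decoded rather than as encoded bytes.

```go
package main

import "fmt"

// pageIndexBuilder is a hypothetical accumulator (not an arrow-go type) that a
// column writer could feed each time it flushes a data page of an int64 column.
type pageIndexBuilder struct {
	nullPages []bool  // ColumnIndex.null_pages
	minValues []int64 // ColumnIndex.min_values (decoded; 0 for null pages)
	maxValues []int64 // ColumnIndex.max_values (decoded; 0 for null pages)
	offsets   []int64 // PageLocation.offset
	sizes     []int32 // PageLocation.compressed_page_size
	firstRows []int64 // PageLocation.first_row_index
}

// addPage records one flushed page. vals holds the page's non-null values
// (empty means the page is all nulls), firstRow is the row group-relative index
// of the page's first row, and offset/size locate the page in the file.
func (b *pageIndexBuilder) addPage(vals []int64, firstRow, offset int64, size int32) {
	var lo, hi int64
	allNull := len(vals) == 0
	if !allNull {
		lo, hi = vals[0], vals[0]
		for _, v := range vals[1:] {
			if v < lo {
				lo = v
			}
			if v > hi {
				hi = v
			}
		}
	}
	b.nullPages = append(b.nullPages, allNull)
	b.minValues = append(b.minValues, lo)
	b.maxValues = append(b.maxValues, hi)
	b.offsets = append(b.offsets, offset)
	b.sizes = append(b.sizes, size)
	b.firstRows = append(b.firstRows, firstRow)
}

// ascending reports whether the min and max values of the non-null pages are
// both non-decreasing, i.e. whether BoundaryOrder ASCENDING could be written.
func (b *pageIndexBuilder) ascending() bool {
	prev := -1
	for i, isNull := range b.nullPages {
		if isNull {
			continue
		}
		if prev >= 0 && (b.minValues[i] < b.minValues[prev] || b.maxValues[i] < b.maxValues[prev]) {
			return false
		}
		prev = i
	}
	return true
}

func main() {
	var b pageIndexBuilder
	b.addPage([]int64{1, 50, 100}, 0, 4, 800)
	b.addPage([]int64{101, 150, 200}, 3, 804, 800)
	fmt.Println(b.minValues, b.maxValues, b.firstRows, b.ascending())
	// [1 101] [100 200] [0 3] true
}
```

A write-side option would then serialize these slices as the ColumnIndex and OffsetIndex structs after the row group's pages. It would also record their offsets in the column chunk metadata, where the existing accessors look for them.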
   
   When will this be supported? How active is development on the Go implementation?

