abbit opened a new issue, #142:
URL: https://github.com/apache/arrow-go/issues/142

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   `parquet.thrift` in the parquet-format repo describes the `RowGroup` field 
`total_byte_size` 
[as](https://github.com/apache/parquet-format/blob/737ea12e56357e83b14fd3e27ef274145beed399/src/main/thrift/parquet.thrift#L920C7-L920C76)
   > Total byte size of all the uncompressed column data in this row group
   
   The C++ implementation of Parquet in the arrow repo uses the same definition.
   
   However, the Go implementation describes `total_byte_size` 
[as](https://github.com/apache/arrow-go/blob/14844aea32054a0b7cc086df58a4a74610b0b306/parquet/metadata/row_group.go#L62)
   > TotalByteSize is the total size of this rowgroup on disk
   
   The difference between these two values can be large when compression is 
applied to column chunks.
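
To make the gap concrete, here is a minimal, self-contained sketch. It is not arrow-go code: `columnChunk` is a hypothetical struct that only mirrors the `total_uncompressed_size` and `total_compressed_size` fields that `parquet.thrift` records per column chunk, and the byte counts are made up for illustration.

```go
package main

import "fmt"

// columnChunk is a hypothetical stand-in (not an arrow-go type) for the two
// size fields that parquet.thrift's ColumnMetaData keeps per column chunk.
type columnChunk struct {
	totalUncompressedSize int64 // mirrors total_uncompressed_size
	totalCompressedSize   int64 // mirrors total_compressed_size
}

// uncompressedSize sums the uncompressed sizes of all chunks. This matches
// the parquet.thrift wording for RowGroup.total_byte_size.
func uncompressedSize(chunks []columnChunk) int64 {
	var n int64
	for _, c := range chunks {
		n += c.totalUncompressedSize
	}
	return n
}

// compressedSize sums the compressed sizes of all chunks. This matches the
// "size on disk" wording in the Go doc comment.
func compressedSize(chunks []columnChunk) int64 {
	var n int64
	for _, c := range chunks {
		n += c.totalCompressedSize
	}
	return n
}

func main() {
	// Made-up sizes for a row group with two compressed column chunks.
	chunks := []columnChunk{
		{totalUncompressedSize: 1000, totalCompressedSize: 200},
		{totalUncompressedSize: 500, totalCompressedSize: 100},
	}
	fmt.Println("uncompressed interpretation:", uncompressedSize(chunks))
	fmt.Println("on-disk interpretation:", compressedSize(chunks))
}
```

With these illustrative numbers the two readings of `total_byte_size` differ by a factor of five, so a reader that expects one meaning and receives the other would badly misestimate memory or I/O.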
   
   My question is: is this inconsistency with the format definition and 
other implementations intentional? If so, why was the distinction made?
   
   ### Component(s)
   
   Go, Parquet
   
   P.S.:
   This issue is a duplicate of https://github.com/apache/arrow/issues/44205, but 
as far as I can see, this repo is now the main location of Arrow Go.

