mapleFU opened a new issue, #45257:
URL: https://github.com/apache/arrow/issues/45257

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The code at [1] cleans up the min-max statistics in Parquet. For ByteArray, multiple statistics may be merged (`Merge`) when reading from a file. This becomes tricky in the code below when `min = ""`:
   
   1. In [2], the encoded min is empty, so `PlainDecode` is not called, yet `has_min_max_` is `true`. The `ByteArray` therefore keeps its default-constructed value, which leaves `ptr == nullptr` [3].
   2. When `TypedStatistics::Merge` is called, it calls the cleanup at [1], which rejects the value because `ptr` is null, so the min-max statistics are left unchanged.
   
   So, when statistics with `min = ""` are merged, the min-max keeps the old values instead of being updated.
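   The two steps above can be reproduced with a small, self-contained model. Everything in it (`ByteArrayModel`, `StatsModel`, `PlainDecodeModel`, `MakeStatsModel`, `CleanupModel`, `MergeModel`) is a hypothetical simplification written for illustration, not Arrow's actual API; in particular, the null-`ptr` check in `CleanupModel` is an assumption about how the cleanup at [1] rejects the value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for parquet::ByteArray: a non-owning (len, ptr) view.
// As in [3], a default-constructed value has ptr == nullptr.
struct ByteArrayModel {
  uint32_t len = 0;
  const uint8_t* ptr = nullptr;
};

struct StatsModel {
  bool has_min_max = false;
  ByteArrayModel min;
  ByteArrayModel max;
};

// Hypothetical decode: points `out` at the encoded bytes (the caller must keep
// `encoded` alive). Only ever called for non-empty input, mirroring the guard in [2].
void PlainDecodeModel(const std::string& encoded, ByteArrayModel* out) {
  assert(!encoded.empty());
  out->len = static_cast<uint32_t>(encoded.size());
  out->ptr = reinterpret_cast<const uint8_t*>(encoded.data());
}

// Step 1: an empty encoded min skips decoding, but has_min_max is still set,
// so `min` keeps ptr == nullptr.
StatsModel MakeStatsModel(const std::string& encoded_min,
                          const std::string& encoded_max, bool has_min_max) {
  StatsModel s;
  if (!encoded_min.empty()) PlainDecodeModel(encoded_min, &s.min);
  if (!encoded_max.empty()) PlainDecodeModel(encoded_max, &s.max);
  s.has_min_max = has_min_max;
  return s;
}

// Assumed cleanup: a null ptr is treated as an invalid value and the whole
// (min, max) pair is dropped.
std::optional<std::pair<ByteArrayModel, ByteArrayModel>> CleanupModel(
    std::pair<ByteArrayModel, ByteArrayModel> min_max) {
  if (min_max.first.ptr == nullptr || min_max.second.ptr == nullptr) {
    return std::nullopt;
  }
  return min_max;
}

// Unsigned lexicographic byte comparison.
bool LessModel(const ByteArrayModel& a, const ByteArrayModel& b) {
  return std::lexicographical_compare(a.ptr, a.ptr + a.len, b.ptr, b.ptr + b.len);
}

// Step 2: merging runs the cleanup first; when it rejects the pair,
// the current min/max are returned untouched, even though "" is a valid min.
void MergeModel(StatsModel* self, const StatsModel& other) {
  if (!other.has_min_max) return;
  auto cleaned = CleanupModel({other.min, other.max});
  if (!cleaned) return;  // the empty min is silently lost here
  if (!self->has_min_max) {
    self->min = cleaned->first;
    self->max = cleaned->second;
    self->has_min_max = true;
    return;
  }
  if (LessModel(cleaned->first, self->min)) self->min = cleaned->first;
  if (LessModel(self->max, cleaned->second)) self->max = cleaned->second;
}
```

   For example, merging statistics with min `""` and max `"z"` into existing statistics with min `"b"` and max `"d"` leaves both `"b"` and `"d"` in place in this model, although the expected result would be min `""` and max `"z"`.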
   
   [1] https://github.com/apache/arrow/blob/ea47172bd80b5ee040c19e605f7e4a6f872b470f/cpp/src/parquet/statistics.cc#L408
   [2] https://github.com/apache/arrow/blob/ea47172bd80b5ee040c19e605f7e4a6f872b470f/cpp/src/parquet/statistics.cc#L609
   [3] https://github.com/apache/arrow/blob/ea47172bd80b5ee040c19e605f7e4a6f872b470f/cpp/src/parquet/types.h#L587
   
   ### Component(s)
   
   C++, Parquet


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
