adamreeve opened a new issue, #45185:
URL: https://github.com/apache/arrow/issues/45185

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   When looking into #45073 I found that Arrow doesn't raise an error when 
reading data with invalid repetition levels into Arrow list arrays.
   
   The encryption test files included an int64 list column with leaf values 
equal to i * 1,000,000,000,000, where i is the leaf-value index. The repetition 
level was set to 1 for even leaf indices and 0 for odd indices, so the first 
repetition level was 1. That is invalid, because the first value in a column 
always starts a new record and must have repetition level 0. However, PyArrow 
reads this file without raising any error and silently skips the first leaf 
value (0):
   ```
   pyarrow.Table
   int64_field: list<int64_field: int64 not null> not null
     child 0, int64_field: int64 not null
   ----
   int64_field: 
[[[1000000000000,2000000000000],[3000000000000,4000000000000],...,[97000000000000,98000000000000],[99000000000000]]]
   ```
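
To make the expected behaviour concrete, here is a minimal sketch of the kind of check that would reject this data. It is not Arrow's reader logic: `assemble_lists` is a hypothetical helper, it only handles a single list level (maximum repetition level 1), and it ignores definition levels because the column and its elements are non-nullable. The count of 100 leaf values is an assumption inferred from the output above.

```python
def assemble_lists(values, rep_levels):
    """Group leaf values into lists using repetition levels (max level 1).

    Hypothetical helper for illustration only; not part of Arrow or PyArrow.
    """
    if len(values) != len(rep_levels):
        raise ValueError("values and rep_levels must have the same length")
    # The first value always starts a new record, so its level must be 0.
    if rep_levels and rep_levels[0] != 0:
        raise ValueError(
            f"invalid repetition level {rep_levels[0]} for first value, expected 0"
        )
    lists = []
    for value, level in zip(values, rep_levels):
        if level == 0:
            lists.append([value])    # level 0 starts a new list
        elif level == 1:
            lists[-1].append(value)  # level 1 continues the current list
        else:
            raise ValueError(f"repetition level {level} exceeds maximum of 1")
    return lists


# Data shaped like the test file (assumed 100 leaf values):
# repetition level 1 at even indices, 0 at odd indices.
values = [i * 1_000_000_000_000 for i in range(100)]
bad_levels = [1 if i % 2 == 0 else 0 for i in range(100)]

try:
    assemble_lists(values, bad_levels)
except ValueError as exc:
    print(f"rejected: {exc}")

# With the parity flipped, the levels are valid and every list has two values.
good_levels = [0 if i % 2 == 0 else 1 for i in range(100)]
print(assemble_lists(values, good_levels)[:2])
```

With the invalid levels the helper raises instead of dropping the first value, which is the behaviour suggested here for the Arrow list reader.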
   
   I wouldn't expect an error to be raised when reading the raw values and 
repetition levels with the lower-level Parquet C++ API, but I think reading 
this data as an Arrow list should raise an error.
   
   ### Component(s)
   
   C++, Parquet


