etseidl opened a new issue, #34086: URL: https://github.com/apache/arrow/issues/34086
### Describe the bug, including details regarding any error messages, version, and platform.

When writing Parquet files with version 2 page headers, the `num_rows` field is incorrect. This appears to be because in `column_writer.cc` `ColumnWriterImpl::BuildDataPageV2()`, `num_values` is passed twice to the constructor for `DataPageV2`. The 4th argument should be `num_rows`.

To reproduce:

```python
import pyarrow.parquet as pq
import pyarrow as pa
table = pa.table({'col0': [[1,2,3]]})
pq.write_table(table, 'bug.parquet', data_page_version="2.0")
```

Examining with parquet-cli:

```sh
% parquet-cli pages bug.parquet

Column: col0.list.item
--------------------------------------------------------------------------------
  page   type  enc  count   avg size   size       rows     nulls   min / max
  0-D    dict  S _  3       8.00 B     24 B
  0-1    data  _ R  3       2.67 B     8 B        3        0       "1" / "3"
```

"rows" should be 1. Rewriting the file with parquet-mr gives:

```sh
% parquet-cli pages bug-mr.parquet

Column: col0.list.element
--------------------------------------------------------------------------------
  page   type  enc  count   avg size   size       rows     nulls   min / max
  0-0    data  _ D  3       5.00 B     15 B       1        0
```

### Component(s)

C++, Parquet
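To illustrate why "rows" should be 1 rather than 3 in the reproduction above, here is a minimal sketch of how the two counts differ for a list column. It assumes the standard Parquet repetition-level encoding, where a repetition level of 0 marks the start of a new top-level row. The helper `count_rows_and_values` is a hypothetical name used only for illustration; it is not part of Arrow or pyarrow, and it is not how the C++ writer computes these fields.

```python
def count_rows_and_values(repetition_levels):
    """Return (num_rows, num_values) for one page's repetition levels.

    Hypothetical helper for illustration only, not an Arrow API.
    """
    # Every entry in the page counts as one value.
    num_values = len(repetition_levels)
    # A repetition level of 0 marks the first value of a new top-level row.
    num_rows = sum(1 for level in repetition_levels if level == 0)
    return num_rows, num_values


# Repetition levels for the single row [[1, 2, 3]]: the first element
# starts a new row (0), the remaining two continue the same list (1).
rep_levels = [0, 1, 1]
num_rows, num_values = count_rows_and_values(rep_levels)
print(f"num_rows={num_rows}, num_values={num_values}")
```

For the table in the reproduction, this gives one row and three values, which matches the parquet-mr output. Passing `num_values` in both positions yields 3 for both fields, as seen in the pyarrow-written file.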