leaves12138 opened a new issue, #46010:
URL: https://github.com/apache/arrow/issues/46010
### Describe the usage question you have. Please include as many useful
details as possible.
I am using Arrow Parquet to write Arrow data to Parquet files. Currently the Arrow Parquet writer options only support `max_row_group_length`, which defines the maximum number of rows contained in one row group. But I need to limit the row group size in bytes rather than by row count.
If I change the `WriteRecordBatch` method in `src/parquet/arrow/writer.cc` from:
```
// Initialize a new buffered row group writer if necessary.
if (row_group_writer_ == nullptr || !row_group_writer_->buffered() ||
    row_group_writer_->num_rows() >= max_row_group_length) {
  RETURN_NOT_OK(NewBufferedRowGroup());
}
```
to
```
// Initialize a new buffered row group writer if necessary.
if (row_group_writer_ == nullptr || !row_group_writer_->buffered() ||
    row_group_writer_->num_rows() >= max_row_group_length ||
    (row_group_writer_->total_compressed_bytes() +
         row_group_writer_->total_compressed_bytes_written() >=
     stripe_size_expected)) {
  RETURN_NOT_OK(NewBufferedRowGroup());
}
```
Would this change work, or is there a better way to achieve this?
### Component(s)
C++
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]