leaves12138 opened a new issue, #46010:
URL: https://github.com/apache/arrow/issues/46010

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   I am using Arrow Parquet to write Arrow data to Parquet files. Currently, the Arrow Parquet writer options only support `max_row_group_length`, which defines the maximum number of rows contained in one row group. But I am puzzled about how to limit the row group size in bytes rather than by row count.
   
   If I change the `WriteRecordBatch` method in `src/parquet/arrow/writer.cc` from:
   
   ```
   // Initialize a new buffered row group writer if necessary.
   if (row_group_writer_ == nullptr || !row_group_writer_->buffered() ||
       row_group_writer_->num_rows() >= max_row_group_length) {
     RETURN_NOT_OK(NewBufferedRowGroup());
   }
   ```
   
   to
   ```
   // Initialize a new buffered row group writer if necessary.
   if (row_group_writer_ == nullptr || !row_group_writer_->buffered() ||
       row_group_writer_->num_rows() >= max_row_group_length ||
       (row_group_writer_->total_compressed_bytes() +
            row_group_writer_->total_compressed_bytes_written() >=
        stripe_size_expected)) {
     RETURN_NOT_OK(NewBufferedRowGroup());
   }
   ```
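   The core of the proposed change is a byte-budget packing rule: close the current row group once its accumulated compressed bytes would exceed a threshold. A minimal, self-contained sketch of that rule follows (it does not use the Arrow/Parquet API; `SplitIntoRowGroups` and `stripe_size_expected` are hypothetical names chosen here to mirror the condition added to `WriteRecordBatch`):
   
   ```cpp
   #include <cstdint>
   #include <iostream>
   #include <vector>
   
   // Pack a sequence of batch sizes (in bytes) into row groups so that each
   // group stays within a byte budget, starting a new group whenever adding
   // the next batch would push the running total past the budget.
   std::vector<std::vector<int64_t>> SplitIntoRowGroups(
       const std::vector<int64_t>& batch_bytes, int64_t stripe_size_expected) {
     std::vector<std::vector<int64_t>> groups;
     int64_t current_bytes = 0;
     for (int64_t bytes : batch_bytes) {
       // Mirrors the issue's condition: open a new buffered row group when
       // the accumulated compressed bytes would exceed stripe_size_expected.
       if (groups.empty() || current_bytes + bytes > stripe_size_expected) {
         groups.emplace_back();
         current_bytes = 0;
       }
       groups.back().push_back(bytes);
       current_bytes += bytes;
     }
     return groups;
   }
   
   int main() {
     // Batches of 40, 70, 30, 90, and 10 bytes with a 100-byte budget.
     auto groups = SplitIntoRowGroups({40, 70, 30, 90, 10}, 100);
     for (const auto& g : groups) {
       std::cout << "group of " << g.size() << " batch(es)\n";
     }
     return 0;
   }
   ```
   
   Note that a single oversized batch still lands in its own group here; inside the writer, the split would happen between record batches for the same reason.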
   
   Is this a reasonable way to solve it?
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
