xiewajueji opened a new issue, #45061:
URL: https://github.com/apache/arrow/issues/45061

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I use this method to estimate the size of the RowGroup currently being written.
   With 1000 columns and an average of 20 MB per row, the combined size of all
   dictionaries exceeds 1 GB, which is larger than the configured RowGroup size.
   
   ```cpp
     int64_t EstimatedDataEncodedSize() override {
       return kDataPageBitWidthBytes +
              RlePreserveBufferSize(static_cast<int>(buffered_indices_.size()), bit_width());
     }
   ```
   
   I found that this method does not count the dictionary size, whereas the Java
   implementation does. Would there be any problem if the dictionary size were counted?
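
To make the difference concrete, here is a minimal standalone sketch of the two estimates. It is not the Arrow code: `ToyDictEncoder`, `IndexBytes`, and both method names are hypothetical, the index-size helper is a simplified bit-packing stand-in for `RlePreserveBufferSize`, and the one-byte bit-width prefix is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a dictionary encoder; not the Arrow class.
struct ToyDictEncoder {
  std::vector<int32_t> buffered_indices;  // indices not yet flushed to a data page
  int64_t dict_encoded_size = 0;          // bytes held by the dictionary entries
  int bit_width = 0;                      // bits used per index

  // Simplified stand-in for RlePreserveBufferSize: plain bit-packed size,
  // rounded up to whole bytes. The real helper reserves extra headroom.
  int64_t IndexBytes() const {
    assert(bit_width >= 0 && bit_width <= 32);
    const int64_t bits =
        static_cast<int64_t>(buffered_indices.size()) * bit_width;
    return (bits + 7) / 8;
  }

  // Same shape as the estimate quoted above: bit-width prefix plus indices.
  // The prefix is assumed to be 1 byte here.
  int64_t EstimatedIndicesSize() const { return 1 + IndexBytes(); }

  // The variant being asked about: the dictionary bytes are counted too.
  int64_t EstimatedSizeWithDict() const {
    return EstimatedIndicesSize() + dict_encoded_size;
  }
};
```

In this sketch, a column with many distinct large values has a small index estimate but a large `dict_encoded_size`, so only the second method reflects the memory actually held before the RowGroup is flushed.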
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
