sherman opened a new issue, #599:
URL: https://github.com/apache/arrow-java/issues/599

   ### Describe the enhancement requested
   
   Hi everyone,
   
   I have the following use case: I’m benchmarking read throughput for a table 
with a large number of non-dictionary-encoded string columns (300 columns). 
Based on the profiler output (see the attached screenshot), a significant 
amount of time is spent in the `fillHoles()` method, which is invoked while 
expanding the read buffers.
   
   My question is: why is the buffer filled one element at a time instead of 
using a bulk operation? Wouldn’t a batch approach be more efficient?
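   To make the question concrete, here is a minimal sketch of the two 
approaches (this is an illustration, not Arrow's actual implementation — in 
`BaseVariableWidthVector` the offsets live in an `ArrowBuf`, while a plain 
`int[]` stands in here; the method names are hypothetical). Filling holes 
means propagating the last written offset into every skipped slot:

   ```java
   import java.util.Arrays;

   public class FillHolesSketch {
       // Per-element fill: sets one offset per loop iteration,
       // analogous to the pattern seen in the profile.
       static void fillHolesOneByOne(int[] offsets, int lastSet, int index) {
           int fillValue = offsets[lastSet + 1];
           for (int i = lastSet + 2; i <= index; i++) {
               offsets[i] = fillValue;
           }
       }

       // Bulk fill: one Arrays.fill call over the same range.
       static void fillHolesBulk(int[] offsets, int lastSet, int index) {
           int fillValue = offsets[lastSet + 1];
           Arrays.fill(offsets, lastSet + 2, index + 1, fillValue);
       }

       public static void main(String[] args) {
           int[] a = {0, 5, 0, 0, 0, 0};
           int[] b = a.clone();
           fillHolesOneByOne(a, 0, 4); // holes at slots 2..4
           fillHolesBulk(b, 0, 4);
           System.out.println(Arrays.toString(a)); // [0, 5, 5, 5, 5, 0]
           System.out.println(Arrays.toString(b)); // [0, 5, 5, 5, 5, 0]
       }
   }
   ```

   Both variants produce the same offsets; the question is whether the 
second shape (or an equivalent bulk memory write on the underlying buffer) 
would be faster for wide schemas like this one.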
   
   Looking forward to your insights. Thanks!
   
   
![Image](https://github.com/user-attachments/assets/57108c7a-126d-4370-9a01-4f0aa85218d9)

