rbalamohan opened a new issue, #6320:
URL: https://github.com/apache/iceberg/issues/6320

   ### Apache Iceberg version
   
   1.1.0 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   When running queries like Q27 on Iceberg V2 with vectorized Parquet reading, 
I observed that they run slower than with plain Spark vectorized Parquet 
reading. Further profiling revealed that cache allocation was putting pressure 
on the JVM, which is covered in 
https://github.com/apache/iceberg/issues/6319.
   
   To get past that issue and look for other bottlenecks, I applied a local 
patch that disables the cache and then profiled for CPU. The profile showed 
that a good amount of CPU time was spent on ArrowBuf boundary checks. These 
checks can be disabled by adding "-Darrow.enable_unsafe_memory_access=true" to 
the JVM options. With both issues addressed, I observed a 25% improvement in 
Q27 runtime with vectorized processing. We need to check whether this option 
can be enabled in Iceberg directly, or whether it needs to be documented so 
that users can include it in the executor and driver JVM options.
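
   As a rough sketch of the documentation route, the flag could be passed to 
both JVMs through Spark's `spark.driver.extraJavaOptions` and 
`spark.executor.extraJavaOptions` settings. The helper names below 
(`with_arrow_flag`, `spark_submit_conf`) are hypothetical and only build the 
`spark-submit` arguments. Only the flag itself comes from this report.

```python
# Hypothetical helpers: build spark-submit --conf arguments that pass the
# Arrow flag from this report to both the driver and the executor JVMs.
ARROW_FLAG = "-Darrow.enable_unsafe_memory_access=true"


def with_arrow_flag(existing_opts=""):
    """Append the Arrow flag to an extraJavaOptions string, at most once.

    Assumption: options are whitespace-separated and contain no quoted
    values with embedded spaces.
    """
    opts = existing_opts.split()
    if ARROW_FLAG not in opts:
        opts.append(ARROW_FLAG)
    return " ".join(opts)


def spark_submit_conf(driver_opts="", executor_opts=""):
    """Return the --conf arguments for spark-submit as a list of strings."""
    return [
        "--conf",
        "spark.driver.extraJavaOptions=" + with_arrow_flag(driver_opts),
        "--conf",
        "spark.executor.extraJavaOptions=" + with_arrow_flag(executor_opts),
    ]


if __name__ == "__main__":
    # Example: keep an existing GC option on the executors.
    for arg in spark_submit_conf(executor_opts="-XX:+UseG1GC"):
        print(arg)
```

   The flag is supplied at JVM launch here, since setting it after Arrow's 
classes are loaded may be too late for it to take effect. In client mode the 
driver JVM is already running when the application's SparkConf is read, so 
the driver option has to come from the command line or a properties file.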
   
   
![q27_ice_alloc_cpu_v2](https://user-images.githubusercontent.com/7969713/204759995-feaea6f6-1fd8-45a3-915c-9a7294863ff3.png)
   
   


