rbalamohan opened a new issue, #6320: URL: https://github.com/apache/iceberg/issues/6320
### Apache Iceberg version

1.1.0 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

When running queries like Q27 in Iceberg V2 with vectorized Parquet reading, I observed that they are slower than plain Spark with its vectorized Parquet reader. Further profiling revealed that cache allocation was putting pressure on the JVM. That is covered in https://github.com/apache/iceberg/issues/6319.

To get past that issue and look for other bottlenecks, I added a local patch that disables the cache and then profiled CPU usage. This showed that a significant amount of CPU time was spent on bounds checks in `ArrowBuf`. These checks can be disabled by adding `-Darrow.enable_unsafe_memory_access=true` to the JVM options.

With both issues addressed, I observed a 25% runtime improvement for Q27 with vectorized reading.

We need to check whether this option can be enabled in Iceberg directly, or whether it should be documented so that users can add it to the executor and driver JVM options.
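The following stdlib-only Java sketch illustrates how a JVM system property can turn per-access bounds checks on or off. `CheckedBuffer` is hypothetical and is not Arrow's `ArrowBuf`. Only the property name `arrow.enable_unsafe_memory_access` comes from this issue. The rest is an assumption made for illustration: the property is read once into a `static final` flag, and that flag guards an explicit range check on every `getInt`/`setInt` call. Each check is cheap on its own, but a tight vectorized loop over millions of values pays for it on every element.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only: NOT Arrow's ArrowBuf. A simplified buffer whose
// explicit per-access bounds check is gated on the same system property.
final class CheckedBuffer {
  // Read once at class initialization. Checks stay enabled unless the property
  // is exactly "true" (e.g. -Darrow.enable_unsafe_memory_access=true).
  static final boolean BOUNDS_CHECKING_ENABLED =
      !"true".equals(System.getProperty("arrow.enable_unsafe_memory_access"));

  private final ByteBuffer data;

  CheckedBuffer(int capacity) {
    this.data = ByteBuffer.allocate(capacity);
  }

  // Explicit range check, executed on every access while checking is enabled.
  private void checkIndex(int index, int length) {
    if (index < 0 || index > data.capacity() - length) {
      throw new IndexOutOfBoundsException(
          "index: " + index + ", length: " + length + ", capacity: " + data.capacity());
    }
  }

  void setInt(int index, int value) {
    if (BOUNDS_CHECKING_ENABLED) {
      checkIndex(index, Integer.BYTES);
    }
    // A heap ByteBuffer still validates the index itself, which keeps this
    // sketch memory-safe even when the explicit check above is skipped.
    data.putInt(index, value);
  }

  int getInt(int index) {
    if (BOUNDS_CHECKING_ENABLED) {
      checkIndex(index, Integer.BYTES);
    }
    return data.getInt(index);
  }

  public static void main(String[] args) {
    CheckedBuffer buf = new CheckedBuffer(4 * Integer.BYTES);
    for (int i = 0; i < 4; i++) {
      buf.setInt(i * Integer.BYTES, i * 10);
    }
    long sum = 0;
    for (int i = 0; i < 4; i++) {
      sum += buf.getInt(i * Integer.BYTES);
    }
    System.out.println("bounds checking enabled: " + BOUNDS_CHECKING_ENABLED);
    System.out.println("sum = " + sum); // 0 + 10 + 20 + 30 = 60
  }
}
```

With Java 11+, you can run the sketch with the checks disabled using `java -Darrow.enable_unsafe_memory_access=true CheckedBuffer.java`. In a Spark job, the flag has to reach both the driver and every executor JVM. For example, you can set it through `spark.executor.extraJavaOptions` and `spark.driver.extraJavaOptions`. In client mode, pass the driver setting on the command line (e.g. `--driver-java-options`) or in the defaults file instead, because the driver JVM is already running by the time application code builds its `SparkConf`.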