jeesou commented on PR #10659: URL: https://github.com/apache/iceberg/pull/10659#issuecomment-2252176015

Hi @huaxingao, @karuppayya (cc @RussellSpitzer),

We were running some tests on Spark with the latest code. We took the changes from the previous PR https://github.com/apache/iceberg/pull/10288 along with the changes in this PR, generated the .stats file, and can see the NDV values in the metadata. To verify the performance enhancement we ran the TPC-H queries. On running them at 1000 G scale, we are facing issues on certain queries (query numbers 5, 7, 8, 9, 10) where an error occurs while performing the broadcast join. I am sharing the log for query number 8 so you can check, along with the logical plan. Sharing the error log for query 5 as well.

Sharing the config we used for reference:
- "spark.executor.cores": "6"
- "spark.executor.memory": "24G"
- "spark.driver.cores": "6"
- "spark.driver.memory": "24G"
- "spark.driver.maxResultSize": "0"
- "spark.sql.iceberg.enable-column-stats": "true"
- "spark.sql.cbo.enabled": "true"
- "spark.sql.cbo.joinReorder.enabled": "true"

We have also tried scaling the executor and driver cores and memory up to 12 cores / 48G, but received the same issue. Kindly help us understand whether we are missing anything, or whether this is an issue.
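For completeness, here is roughly how we applied these settings when building the Spark session; this is only a minimal sketch, and anything beyond the properties quoted above (the application name, and how the Iceberg catalog itself is configured) is an assumption and may differ from your setup:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of the session setup used for the TPC-H runs.
// Only the executor/driver sizing and the CBO / column-stats flags
// come from the config quoted above; the app name is illustrative.
val spark = SparkSession.builder()
  .appName("tpch-cbo-ndv-test")
  .config("spark.executor.cores", "6")
  .config("spark.executor.memory", "24G")
  .config("spark.driver.cores", "6")
  .config("spark.driver.memory", "24G")
  .config("spark.driver.maxResultSize", "0")
  .config("spark.sql.iceberg.enable-column-stats", "true")
  .config("spark.sql.cbo.enabled", "true")
  .config("spark.sql.cbo.joinReorder.enabled", "true")
  .getOrCreate()
```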
Hi @huaxingao , @karuppayya cc : @RussellSpitzer We were running some tests on Spark with the latest codes. We took the changes of the previous PR https://github.com/apache/iceberg/pull/10288, along with this PR changes, and generated the .stats file and we can see the ndv values in the metadata. So to verify the performance enhancement we ran the TPCH queries. On running the query at 1000 G scale we are facing some issues on certain queries (query umbers - 5,7,8,9,10) where while performing the broadcast join, some error occurred. I am sharing the log for query number 8 you can check. I am sharing logical plan as well.   Sharing the error log for query 5 as well  Sharing the config we used for reference - "spark.executor.cores": "6", "spark.executor.memory": "24G", "spark.driver.cores": "6", "spark.driver.memory": "24G", "spark.driver.maxResultSize":"0", "spark.sql.iceberg.enable-column-stats": "true", "spark.sql.cbo.enabled": "true", "spark.sql.cbo.joinReorder.enabled": "true", we have tried by upscaling the executor and driver cores and memoryto 12/48 scale. but received the same issue. Kindly help us understand if we are missing anything out, or is this an issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org