ajayky-os commented on PR #14333:
URL: https://github.com/apache/iceberg/pull/14333#issuecomment-3628508658

   > > Ran TPC-DS and TPC-H suites on a Spark cluster to validate the functionality.
   > 
   > @prudhvimaharishi could you share the results if possible?
   
   We are still verifying the scan-time improvements. The micro-benchmarks are now documented at https://github.com/GoogleCloudPlatform/gcs-analytics-core?tab=readme-ov-file#micro-benchmarks.
   
   For end-to-end query execution time, we ran two sets of benchmarks:
   
   1. (gcs-analytics-core enabled, vectored read enabled) vs (gcs-analytics-core disabled and vectored read disabled)
   2. (gcs-analytics-core enabled, vectored read enabled) vs (gcs-analytics-core disabled and vectored read enabled)
   
   E2E benchmarking setup:
   - Three iterations of the TPC-DS and TPC-H queries were run in a Spark job; sparkMeasure was used to capture stats such as CPU time and GC time.
   - E2E is the end-to-end time for running the Spark SQL query.
   - A negative percentage change means gcs-analytics-core performed better.
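   
   For readers who want to reproduce the "% Change" columns, here is a minimal sketch of one plausible way to compute them. It assumes the change is calculated on the median (or average) of the three iterations, with the gcs-analytics-core-disabled run as the baseline; the exact aggregation used for the tables is not stated here. The function names and sample timings are hypothetical and are not taken from the benchmark.

```python
from statistics import mean, median


def pct_change(baseline, candidate):
    """Percentage change of candidate relative to baseline.

    A negative value means the candidate metric is lower than the baseline,
    which for a time metric means the candidate was faster.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


def summarize(baseline_runs, candidate_runs):
    """Aggregate each set of runs first, then compare (assumed order of operations)."""
    return {
        "median": round(pct_change(median(baseline_runs), median(candidate_runs)), 2),
        "average": round(pct_change(mean(baseline_runs), mean(candidate_runs)), 2),
    }


# Hypothetical E2E times in seconds for 3 iterations (illustrative values only).
baseline_e2e = [100.0, 110.0, 150.0]   # gcs-analytics-core disabled
candidate_e2e = [96.0, 104.0, 106.0]   # gcs-analytics-core enabled

result = summarize(baseline_e2e, candidate_e2e)
print(result)
```

   With only three iterations, a single slow run moves the average much more than the median, which is one reason the two summaries can differ noticeably for the same configuration.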
   
   **(gcs-analytics-core enabled, vectored read enabled) vs (gcs-analytics-core disabled and vectored read disabled)**
   
   Median of 3 iterations:
   
   |Schema Size|% Change E2E Time|% Change Executor CPU Time|% Change JVM GC Time|% Change Shuffle Fetch Wait Time|
   |---|---|---|---|---|
   |tpcds_sf10|-4.81|5.50|28.31|36.51|
   |tpcds_sf100|-4.33|38.60|78.18|60.40|
   |tpcds_sf1000|-6.59|9.63|95.48|204.39|
   |tpch_sf10|-1.76|-4.68|49.11|42.96|
   |tpch_sf100|-2.73|-5.27|113.64|73.02|
   |tpch_sf1000|-7.87|-4.08|64.41|89.27|
   
   Average of 3 iterations:
   
   |Schema Size|% Change E2E Time|% Change Executor CPU Time|% Change JVM GC Time|% Change Shuffle Fetch Wait Time|
   |---|---|---|---|---|
   |tpcds_sf10|-4.14|5.49|40.84|150.38|
   |tpcds_sf100|-3.45|6.67|85.25|140.36|
   |tpcds_sf1000|-4.92|8.46|100.96|113.59|
   |tpch_sf10|-2.34|-4.53|41.88|95.81|
   |tpch_sf100|-2.68|-5.10|104.57|298.36|
   |tpch_sf1000|-6.79|-4.66|84.59|102.29|
   
   
   **(gcs-analytics-core enabled, vectored read enabled) vs (gcs-analytics-core disabled and vectored read enabled)**
   
   
   Median of 3 iterations:
   
   
   |Schema Size|% Change E2E Time|% Change Executor CPU Time|% Change JVM GC Time|% Change Shuffle Fetch Wait Time|
   |---|---|---|---|---|
   |tpcds_sf10|-7.66|34.07|5.94|-27.07|
   |tpcds_sf100|-0.81|4701.41|20.89|15.47|
   |tpcds_sf1000|-8.93|38.85|33.71|35.10|
   |tpch_sf10|-2.21|-8.04|-17.37|-3.11|
   |tpch_sf100|-1.40|-5.30|9.73|-3.21|
   |tpch_sf1000|6.73|-6.28|17.29|647.98|
   
   Average of 3 iterations:
   
   |Schema Size|% Change E2E Time|% Change Executor CPU Time|% Change JVM GC Time|% Change Shuffle Fetch Wait Time|
   |---|---|---|---|---|
   |tpcds_sf10|-8.14|28.59|18.44|56.34|
   |tpcds_sf100|-1.28|4051.78|48.83|89.41|
   |tpcds_sf1000|-4.10|30301.87|6431.37|-9.43|
   |tpch_sf10|-2.15|-7.66|-20.96|-27.98|
   |tpch_sf100|-3.14|-5.11|10.50|169.43|
   |tpch_sf1000|7.89|-5.54|17.97|318.64|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

