jinyangli34 commented on PR #11258: URL: https://github.com/apache/iceberg/pull/11258#issuecomment-2403412692
Ran the benchmark again with `NUM_RECORDS` increased from 1M to 5M.

Tested 4 groups:

- **main**: main branch without the change in this PR
- **PR**: this PR
- **PR+2**: two more `getBufferedSize` calls per added value

```
   @Override
   public void add(T value) {
     recordCount += 1;
+    long size1 = writeStore.getBufferedSize();
+    long size2 = writeStore.getBufferedSize();
+    if (size1 != size2) {
+      throw new RuntimeException("Buffered size changed after adding a record");
+    }
     long sizeBeforeWrite = writeStore.getBufferedSize();
     model.write(0, value);
     this.currentRawBufferedSize += writeStore.getBufferedSize() - sizeBeforeWrite;
```

- **PR+4**: four more `getBufferedSize` calls per added value

```
   @Override
   public void add(T value) {
     recordCount += 1;
+    long size1 = writeStore.getBufferedSize();
+    long size2 = writeStore.getBufferedSize();
+    long size3 = writeStore.getBufferedSize();
+    long size4 = writeStore.getBufferedSize();
+    if (size1 != size2 || size3 != size4) {
+      throw new RuntimeException("Buffered size changed after adding a record");
+    }
     long sizeBeforeWrite = writeStore.getBufferedSize();
     model.write(0, value);
     this.currentRawBufferedSize += writeStore.getBufferedSize() - sizeBeforeWrite;
```

Avg numbers:

```
Flat Benchmark Avg         Main     PR       PR+2     PR+4
writeUsingIcebergWriter    15.773   15.976   16.672   17.133
writeUsingSparkWriter      16.056   15.826   15.830   15.891

Nested Benchmark Avg       Main     PR       PR+2     PR+4
writeUsingIcebergWriter     9.683    9.775    9.978   10.199
writeUsingSparkWriter      10.156    9.676    9.698    9.683
```

Comparing this PR against the main branch, after this change:

- Iceberg Writer is 1.3% slower for flat data and 0.95% slower for nested data
- Spark Writer is 1.4% faster for flat data and 4.7% faster for nested data

[iceberg-pr-11258-perf-test.csv](https://github.com/user-attachments/files/17318460/iceberg-pr-11258-perf-test.csv)
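For anyone who wants to re-derive the percentages from the averages, here is a minimal sketch. It assumes the scores are time per operation (lower is better), which matches how a higher number is reported as "slower" above. The class and method names are hypothetical and not part of the PR.

```java
import java.util.Locale;

// Hypothetical helper, not part of the PR: recomputes the PR-vs-main deltas
// from the averaged benchmark scores quoted in this comment.
class RelativeChange {

  // Percent change of the PR score relative to main.
  // Positive means the PR is slower, assuming lower scores are better.
  static double percentChange(double main, double pr) {
    return (pr - main) / main * 100.0;
  }

  public static void main(String[] args) {
    String[] labels = {
      "flat / writeUsingIcebergWriter",
      "flat / writeUsingSparkWriter",
      "nested / writeUsingIcebergWriter",
      "nested / writeUsingSparkWriter"
    };
    // {main, PR} averages copied from the table in this comment
    double[][] scores = {
      {15.773, 15.976},
      {16.056, 15.826},
      {9.683, 9.775},
      {10.156, 9.676}
    };
    for (int i = 0; i < labels.length; i++) {
      double change = percentChange(scores[i][0], scores[i][1]);
      System.out.printf(Locale.ROOT, "%-34s %+.2f%%%n", labels[i], change);
    }
  }
}
```

The same function can be pointed at the PR+2 and PR+4 columns to estimate the cost of each extra pair of `getBufferedSize` calls.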