paleolimbot commented on PR #737:
URL: https://github.com/apache/sedona-db/pull/737#issuecomment-4143136709
Thanks @pwrliang!
I ran the spill benchmarks, and this PR does change their performance (full output below). My personal summary is:
- Spilling with LZ4 compression is slower: the spill files are bigger, and compressing/decompressing them takes more time than just reconstructing the bounding rectangles on the fly.
- Spilling without compression is faster: the bounding rectangles don't have to be recomputed, and reading is fast without compression.
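The trade-off can be made concrete with a minimal Rust sketch. It assumes a 2D bounding rectangle is persisted as four `f64` values per row. The `Rect` type and both helper functions are hypothetical and are not the sedona-db API. Recomputing a rectangle means one pass over every coordinate of the geometry. Persisting it instead adds a fixed number of bytes per row to the spill file.

```rust
/// Hypothetical 2D bounding rectangle (assumption: four f64 values = 32 bytes per row).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    min_x: f64,
    min_y: f64,
    max_x: f64,
    max_y: f64,
}

/// Recompute a bounding rectangle on the fly by scanning every coordinate once.
/// Returns None for an empty geometry.
fn rect_from_coords(coords: &[(f64, f64)]) -> Option<Rect> {
    let (&(x0, y0), rest) = coords.split_first()?;
    let mut r = Rect { min_x: x0, min_y: y0, max_x: x0, max_y: y0 };
    for &(x, y) in rest {
        r.min_x = r.min_x.min(x);
        r.min_y = r.min_y.min(y);
        r.max_x = r.max_x.max(x);
        r.max_y = r.max_y.max(y);
    }
    Some(r)
}

/// Extra (uncompressed) bytes written to a spill file if one Rect is persisted per row.
fn extra_spill_bytes(rows_per_batch: usize, batches: usize) -> usize {
    rows_per_batch * batches * std::mem::size_of::<Rect>()
}

fn main() {
    let rect = rect_from_coords(&[(1.0, 2.0), (-3.0, 5.0), (4.0, -1.0)]).unwrap();
    println!("{rect:?}");
    // Same shape as the benchmarks below: 8192 rows x 64 batches.
    println!("{} extra bytes", extra_spill_bytes(8192, 64));
}
```

For the benchmark shape (8192 rows × 64 batches), that assumption works out to 16 MiB of extra spill data before compression. The on-the-fly path avoids those bytes but pays for one scan of each geometry's coordinates.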
@Kontinuation, if you have bandwidth, do you have an opinion on whether threading the `SpatialJoinProvider` through the spill reader machinery is worth it? (It's a bit ugly, but it can be done.) Any critiques of the approach are also very welcome!
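One way to picture the plumbing question is this: the spill reader either trusts rectangles persisted in the file, or it carries a provider it can call to rebuild them after reading. Below is a minimal sketch of the second option under simplifying assumptions:

- `RectProvider`, `EnvelopeProvider`, and `SpillReader` are hypothetical stand-ins, not the real `SpatialJoinProvider` interface.
- Geometries are plain coordinate lists rather than WKB.

```rust
/// Hypothetical geometry representation: a plain list of (x, y) coordinates.
type Geom = Vec<(f64, f64)>;

/// Hypothetical stand-in for whatever the provider exposes to rebuild rectangles.
trait RectProvider {
    /// Returns [min_x, min_y, max_x, max_y], or None for an empty geometry.
    fn rect(&self, coords: &[(f64, f64)]) -> Option<[f64; 4]>;
}

/// Simple provider that computes the envelope by scanning all coordinates.
struct EnvelopeProvider;

impl RectProvider for EnvelopeProvider {
    fn rect(&self, coords: &[(f64, f64)]) -> Option<[f64; 4]> {
        let (&(x0, y0), rest) = coords.split_first()?;
        Some(rest.iter().fold([x0, y0, x0, y0], |[a, b, c, d], &(x, y)| {
            [a.min(x), b.min(y), c.max(x), d.max(y)]
        }))
    }
}

/// Hypothetical spill reader that is handed the provider so it can rebuild rects after reading.
struct SpillReader<P: RectProvider> {
    provider: P,
}

impl<P: RectProvider> SpillReader<P> {
    fn new(provider: P) -> Self {
        Self { provider }
    }

    /// Takes already-deserialized geometries and attaches a recomputed rect to each one.
    fn read_batch(&self, geoms: Vec<Geom>) -> Vec<(Geom, Option<[f64; 4]>)> {
        geoms
            .into_iter()
            .map(|g| {
                let r = self.provider.rect(&g);
                (g, r)
            })
            .collect()
    }
}

fn main() {
    let reader = SpillReader::new(EnvelopeProvider);
    let batch = reader.read_batch(vec![vec![(0.0, 0.0), (2.0, 3.0)], vec![]]);
    for (geom, rect) in &batch {
        println!("{} coords -> {:?}", geom.len(), rect);
    }
}
```

Because the reader is generic over the provider, every call site that constructs a reader must also be handed one. With real spill machinery that spans several layers, this plumbing may be part of what makes the change feel awkward.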
<details>
```
Gnuplot not found, using plotters backend
evaluated_batch_spill/wkb/uncompressed/spill_writer/rows_8192_batches_64
    time:   [54.238 ms 81.606 ms 112.42 ms]
    thrpt:  [4.6636 Melem/s 6.4246 Melem/s 9.6665 Melem/s]
    change:
        time:   [−16.184% +38.800% +123.16%] (p = 0.21 > 0.05)
        thrpt:  [−55.189% −27.954% +19.310%]
        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  15 (15.00%) high severe
evaluated_batch_spill/wkb/uncompressed/spill_reader/rows_8192_batches_64
    time:   [8.9170 ms 8.9792 ms 9.0470 ms]
    thrpt:  [57.952 Melem/s 58.389 Melem/s 58.796 Melem/s]
    change:
        time:   [−22.032% −21.417% −20.801%] (p = 0.00 < 0.05)
        thrpt:  [+26.264% +27.254% +28.258%]
        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.2s, enable flat sampling, or reduce sample count to 50.
evaluated_batch_spill/wkb/uncompressed/spill_reader_raw/rows_8192_batches_64
    time:   [1.4042 ms 1.4155 ms 1.4282 ms]
    thrpt:  [367.08 Melem/s 370.39 Melem/s 373.36 Melem/s]
    change:
        time:   [+44.828% +46.664% +48.687%] (p = 0.00 < 0.05)
        thrpt:  [−32.744% −31.817% −30.953%]
        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
evaluated_batch_spill/wkb/lz4/spill_writer/rows_8192_batches_64
    time:   [7.1988 ms 7.4609 ms 7.7557 ms]
    thrpt:  [67.600 Melem/s 70.271 Melem/s 72.830 Melem/s]
    change:
        time:   [−65.392% −51.088% −25.395%] (p = 0.00 < 0.05)
        thrpt:  [+34.040% +104.45% +188.95%]
        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
evaluated_batch_spill/wkb/lz4/spill_reader/rows_8192_batches_64
    time:   [29.579 ms 30.308 ms 31.130 ms]
    thrpt:  [16.842 Melem/s 17.299 Melem/s 17.725 Melem/s]
    change:
        time:   [+57.496% +61.635% +66.431%] (p = 0.00 < 0.05)
        thrpt:  [−39.915% −38.132% −36.506%]
        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe
evaluated_batch_spill/wkb/lz4/spill_reader_raw/rows_8192_batches_64
    time:   [17.819 ms 18.098 ms 18.439 ms]
    thrpt:  [28.434 Melem/s 28.969 Melem/s 29.423 Melem/s]
    change:
        time:   [+140.87% +148.00% +155.46%] (p = 0.00 < 0.05)
        thrpt:  [−60.854% −59.677% −58.484%]
        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe
evaluated_batch_spill/wkb_view/uncompressed/spill_writer/rows_8192_batches_64
    time:   [32.857 ms 47.923 ms 65.380 ms]
    thrpt:  [8.0191 Melem/s 10.940 Melem/s 15.957 Melem/s]
    change:
        time:   [−59.034% −30.761% +21.066%] (p = 0.21 > 0.05)
        thrpt:  [−17.401% +44.426% +144.11%]
        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
evaluated_batch_spill/wkb_view/uncompressed/spill_reader/rows_8192_batches_64
    time:   [9.8706 ms 10.117 ms 10.386 ms]
    thrpt:  [50.480 Melem/s 51.822 Melem/s 53.116 Melem/s]
    change:
        time:   [−18.493% −16.582% −14.334%] (p = 0.00 < 0.05)
        thrpt:  [+16.732% +19.878% +22.689%]
        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  8 (8.00%) high mild
  8 (8.00%) high severe
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
evaluated_batch_spill/wkb_view/uncompressed/spill_reader_raw/rows_8192_batches_64
    time:   [1.7920 ms 1.7987 ms 1.8048 ms]
    thrpt:  [290.50 Melem/s 291.48 Melem/s 292.56 Melem/s]
    change:
        time:   [+39.303% +40.498% +41.641%] (p = 0.00 < 0.05)
        thrpt:  [−29.399% −28.825% −28.214%]
        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe
evaluated_batch_spill/wkb_view/lz4/spill_writer/rows_8192_batches_64
    time:   [13.684 ms 14.010 ms 14.360 ms]
    thrpt:  [36.509 Melem/s 37.422 Melem/s 38.313 Melem/s]
    change:
        time:   [+25.113% +28.922% +32.844%] (p = 0.00 < 0.05)
        thrpt:  [−24.724% −22.434% −20.072%]
        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
evaluated_batch_spill/wkb_view/lz4/spill_reader/rows_8192_batches_64
    time:   [31.779 ms 32.911 ms 34.165 ms]
    thrpt:  [15.346 Melem/s 15.931 Melem/s 16.498 Melem/s]
    change:
        time:   [+54.988% +60.293% +66.871%] (p = 0.00 < 0.05)
        thrpt:  [−40.074% −37.614% −35.479%]
        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  18 (18.00%) high severe
evaluated_batch_spill/wkb_view/lz4/spill_reader_raw/rows_8192_batches_64
    time:   [20.703 ms 21.384 ms 22.131 ms]
    thrpt:  [23.690 Melem/s 24.518 Melem/s 25.325 Melem/s]
    change:
        time:   [+110.17% +116.91% +125.53%] (p = 0.00 < 0.05)
        thrpt:  [−55.660% −53.899% −52.419%]
        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  6 (6.00%) high mild
  10 (10.00%) high severe
Running bench/evaluated_batch/external_evaluated_batch_stream.rs (target/release/deps/external_evaluated_batch_stream-d4faefea9c812cdb)
Gnuplot not found, using plotters backend
external_evaluated_batch_stream/wkb/uncompressed/external_stream/rows_8192_batches_64
    time:   [7.8205 ms 7.8563 ms 7.9107 ms]
    thrpt:  [66.276 Melem/s 66.734 Melem/s 67.040 Melem/s]
    change:
        time:   [−26.831% −26.480% −25.997%] (p = 0.00 < 0.05)
        thrpt:  [+35.129% +36.018% +36.670%]
        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
external_evaluated_batch_stream/wkb/lz4/external_stream/rows_8192_batches_64
    time:   [20.015 ms 20.235 ms 20.528 ms]
    thrpt:  [25.540 Melem/s 25.910 Melem/s 26.195 Melem/s]
    change:
        time:   [+81.356% +83.458% +86.292%] (p = 0.00 < 0.05)
        thrpt:  [−46.321% −45.492% −44.860%]
        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
external_evaluated_batch_stream/wkb_view/uncompressed/external_stream/rows_8192_batches_64
    time:   [7.8501 ms 7.8644 ms 7.8796 ms]
    thrpt:  [66.538 Melem/s 66.666 Melem/s 66.788 Melem/s]
    change:
        time:   [−27.684% −27.499% −27.327%] (p = 0.00 < 0.05)
        thrpt:  [+37.603% +37.929% +38.281%]
        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
external_evaluated_batch_stream/wkb_view/lz4/external_stream/rows_8192_batches_64
    time:   [22.449 ms 22.549 ms 22.652 ms]
    thrpt:  [23.145 Melem/s 23.251 Melem/s 23.355 Melem/s]
    change:
        time:   [+96.852% +97.931% +99.105%] (p = 0.00 < 0.05)
        thrpt:  [−49.775% −49.477% −49.200%]
        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild
```
</details>
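Some notes on reading these numbers:

- Each benchmark appears to process 8192 rows × 64 batches = 524,288 elements per iteration. This is inferred from the benchmark names and is consistent with the reported throughput.
- Criterion's `thrpt` is elements divided by time.
- A relative time change Δt corresponds to a throughput change of 1/(1 + Δt) − 1. That is why a −21.417% time change shows up as +27.254% throughput.

A small sketch of that arithmetic follows. The function names are just for illustration.

```rust
/// Elements per iteration implied by the benchmark names (rows_8192_batches_64).
const ELEMENTS: f64 = 8192.0 * 64.0;

/// Throughput in Melem/s for a per-iteration time given in milliseconds.
fn melem_per_s(time_ms: f64) -> f64 {
    ELEMENTS / (time_ms / 1000.0) / 1e6
}

/// Relative throughput change implied by a relative time change
/// (both as fractions, e.g. -0.21417 for -21.417%).
fn thrpt_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // wkb/uncompressed/spill_reader: median time 8.9792 ms, median time change -21.417%.
    println!("{:.3} Melem/s", melem_per_s(8.9792));
    println!("{:+.3}%", thrpt_change(-0.21417) * 100.0);
}
```

Plugging in the `wkb/uncompressed/spill_reader` medians reproduces the reported 58.389 Melem/s and +27.254%.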