[PR] perf(rust/sedona-spatial-join): Use row count first to decide join order [sedona-db]

via GitHub Thu, 19 Mar 2026 03:50:22 -0700


2010YOUY01 opened a new pull request, #725:
URL: https://github.com/apache/sedona-db/pull/725


   The code change is around 10 lines, PR is mostly test
   
   # Motivation
   I noticed a bad join order in SpatialBench Q10:
   - 
https://sedona.apache.org/spatialbench/queries/#q10-zone-statistics-for-trips-starting-within-each-zone
   
   Here is a reproducer in `sedona-cli` for a simplified Q10
   ```sh
   yongting@Yongtings-MacBook-Pro-2 ~/C/s/sedona-cli (main)> cargo run 
--profile release -- -m unlimited
   
   Sedona CLI v0.4.0
   > CREATE EXTERNAL TABLE building
   STORED AS PARQUET
   LOCATION '/Users/yongting/data/geo/building.parquet';
   
   CREATE EXTERNAL TABLE customer
   STORED AS PARQUET
   LOCATION '/Users/yongting/data/geo/customer.parquet';
   
   CREATE EXTERNAL TABLE driver
   STORED AS PARQUET
   LOCATION '/Users/yongting/data/geo/driver.parquet';
   
   CREATE EXTERNAL TABLE trip
   STORED AS PARQUET
   LOCATION '/Users/yongting/data/geo/trip.parquet';
   
   CREATE EXTERNAL TABLE vehicle
   STORED AS PARQUET
   LOCATION '/Users/yongting/data/geo/vehicle.parquet';
   
   CREATE EXTERNAL TABLE zone
   STORED AS PARQUET
   LOCATION '/Users/yongting/data/geo/zone.parquet';
   
   explain analyze select *
   from zone join trip
   on st_intersects(st_geomfromwkb(zone.z_boundary), 
st_geomfromwkb(trip.t_pickuploc));
   ```
   
   The explain output said the join order is swapped, `trip` table is the build 
side, and `zone` table is the probe side.
   
   Their stats are
   - trip: 6M rows, 300MiB size
   - zone: 150k rows, 1.3GiB size
   
   The existing spatial join reordering heuristic is
   1. first compare size, if memory size for plan statistics available, swap 
the smaller side to the build(left) side
   2. If size NA, compare number of rows, and swap the side with fewer rows to 
the build side
   ...
   
   As a result, the `trip` table is swapped to the build side because it has a 
smaller in-mem size, even its row count is larger. This make the query runs 
less efficient
   
   # Purposed Change
   First compare row count, then memory size for spatial join reordering.
   
   I think this works better in general workloads, and it's not a overfit for 
the benchmark. The only influenced workload is like the above `zone` and `trip` 
table example, `zone` has fewer rows, but it might be wider, or its geometry 
payload has larger per-entry size (polygons are larger than points for memory 
size).
   If we build the r-tree with `zone`, the r-tree will be smaller and it is 
faster to probe, the remaining work remains the same, because the r-tree is 
built/probed with rectangles, and building it with more complex geometries 
won't get slower.
   
   (Though it's only a simple default rule, join reordering is a very hard 
topic, I believe there are much more work to do in the future, after DataFusion 
has better statistics module for cardinality estimation)
   
   After the change, the query in the above example runs in:
   ```
   Before 1.9s
   After 1.3s
   ```
   
   `SpatialBench Q10` should also get faster
   
   # Implementation changes
   1. Spatial join reordering rule change
   2. UTs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] perf(rust/sedona-spatial-join): Use row count first to decide join order [sedona-db]

Reply via email to