2010YOUY01 commented on code in PR #733:
URL: https://github.com/apache/sedona-db/pull/733#discussion_r2978755863


##########
python/sedonadb/tests/test_sjoin.py:
##########
@@ -68,6 +69,96 @@ def test_spatial_join(join_type, on):
         eng_postgis.assert_query_result(sql, sedonadb_results)
 
 
+def _plan_text(df):
+    query_plan = df.to_pandas()
+    return "\n".join(query_plan.iloc[:, 1].astype(str).tolist())
+
+
+def _spatial_join_side_file_names(plan_text):
+    """Extract the left/right parquet file names used by `SpatialJoinExec`.
+
+    Example input:
+        SpatialJoinExec: join_type=Inner, on=ST_intersects(geo_right@0, 
geo_left@0)
+          ProjectionExec: expr=[geometry@0 as geo_right]
+            DataSourceExec: file_groups={1 group: 
[[.../natural-earth_countries_geo.parquet]]}, projection=[geometry], 
file_type=parquet
+          ProbeShuffleExec: partitioning=RoundRobinBatch(1)
+            ProjectionExec: expr=[geometry@0 as geo_left]
+              DataSourceExec: file_groups={1 group: 
[[.../natural-earth_cities_geo.parquet]]}, projection=[geometry], 
file_type=parquet
+
+    Example output:
+        ["natural-earth_countries_geo", "natural-earth_cities_geo"]
+    """
+    spatial_join_idx = plan_text.find("SpatialJoinExec:")
+    assert spatial_join_idx != -1, plan_text
+
+    file_names = re.findall(
+        r"DataSourceExec:.*?/([^/\]]+)\.parquet", plan_text[spatial_join_idx:]
+    )
+    assert len(file_names) >= 2, plan_text
+    return file_names[:2]
+
+
+def test_spatial_join_reordering_can_be_disabled_e2e(geoarrow_data):

Review Comment:
   One consideration is the test’s running time: the two Parquet files are 
around 10 KB, and this test runs in about 0.1 s on my machine.
   
   I couldn’t find another way to perform the e2e test. The 
`sd_random_geometry()` table function does not seem to produce statistics, so 
spatial join reordering cannot occur.
   
   Hope this is okay.



##########
rust/sedona-spatial-join/src/planner/physical_planner.rs:
##########
@@ -192,8 +197,20 @@ impl ExtensionPlanner for SpatialJoinExtensionPlanner {
 ///    produce a smaller and more efficient spatial index (R-tree).
 /// 2. If row-count statistics are unavailable (for example, for CSV sources),
 ///    fall back to total input size as an estimate.
-/// 3. Do not swap the join order if no relevant statistics are available.
-fn should_swap_join_order(left: &dyn ExecutionPlan, right: &dyn ExecutionPlan) 
-> Result<bool> {
+/// 3. Do not swap the join order if join reordering is disabled or no relevant
+///    statistics are available.
+fn should_swap_join_order(
+    spatial_join_options: &SpatialJoinOptions,

Review Comment:
   Use `SpatialJoinOptions` arg instead of a boolean flag, since this function 
will likely require additional options in the future and this approach is more 
extensible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to