jiayuasu commented on issue #2861:
URL: https://github.com/apache/sedona/issues/2861#issuecomment-4322972958

   ## Update: validated internally
   
   I implemented this exact proposal in our internal Sedona fork as a sanity 
check. A few notes from doing it on the full Catalog:
   
   **1. Easy diff, registration order preserved.**  
   Sole change is splitting the flat `Seq[FunctionDescription]` into named 
`val`s and concatenating them in the same order. No new types, no behavior 
change. The diff is large (lots of moved lines) but reviewable line-by-line 
because every function maps cleanly to a docs section.
   
   **2. Two functions don't fit any docs category — propose handling each:**
   - `Barrier` — internal helper for join planning, not in any docs page. 
Suggest adding it to a small `otherExprs` catch-all.
   - `ST_KNN` — has its own docs page (`NearestNeighbourSearching.md`) but 
isn't listed in any of the 18 categories on `/api/sql/Geometry-Functions/`. 
Suggest grouping it under `spatialIndexingExprs` since it's a nearest-neighbor 
lookup helper.
   
   **3. dbx-incompatible group splits naturally by docs category:**
   The current OSS `geoStatsFunctions()` block already maps cleanly:
   - `ST_DBSCAN`, `ST_LocalOutlierFactor` → `Clustering-Functions`
   - `ST_GLocal`, `ST_BinaryDistanceBandColumn`, 
`ST_WeightedDistanceBandColumn` → `Spatial-Statistics`
   
   So the function can be split into two categorized sub-sequences, and a CI 
test like "every registered function has a docs category" becomes possible.
   
   **4. Catch test (worth landing).**  
   A tiny test that asserts every entry in `Catalog.expressions` lives in 
exactly one of the named sequences caught a real regression for me — three 
functions had been added on master but missed during the move. Without that 
test you can drop functions silently. Strongly recommend adding it as part of 
the PR.
   
   **5. No need for a parallel `functionCategories` map in OSS.**  
   Since OSS doesn't have telemetry / docs-generation consumers today, the 
named sequences alone are enough. If a consumer shows up later, deriving a 
category map from the named sequences is ~10 lines.
   
   Happy to send a PR aligned to the body of this issue — the only thing I want 
maintainer input on first is the tradeoff in (5): include the 
`functionCategories` map upfront for forks/downstream to consume, or keep the 
change minimal and let it be added on demand?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to