jiayuasu commented on issue #2861: URL: https://github.com/apache/sedona/issues/2861#issuecomment-4322972958
## Update: validated internally I implemented this exact proposal in our internal Sedona fork as a sanity check. A few notes from doing it on the full Catalog: **1. Easy diff, registration order preserved.** Sole change is splitting the flat `Seq[FunctionDescription]` into named `val`s and concatenating them in the same order. No new types, no behavior change. The diff is large (lots of moved lines) but reviewable line-by-line because every function maps cleanly to a docs section. **2. Two functions don't fit any docs category — propose handling each:** - `Barrier` — internal helper for join planning, not in any docs page. Suggest adding it to a small `otherExprs` catch-all. - `ST_KNN` — has its own docs page (`NearestNeighbourSearching.md`) but isn't listed in any of the 18 categories on `/api/sql/Geometry-Functions/`. Suggest grouping it under `spatialIndexingExprs` since it's a nearest-neighbor lookup helper. **3. dbx-incompatible group splits naturally by docs category:** The current OSS `geoStatsFunctions()` block already maps cleanly: - `ST_DBSCAN`, `ST_LocalOutlierFactor` → `Clustering-Functions` - `ST_GLocal`, `ST_BinaryDistanceBandColumn`, `ST_WeightedDistanceBandColumn` → `Spatial-Statistics` So the function can be split into two categorized sub-sequences, and a CI test like "every registered function has a docs category" becomes possible. **4. Catch test (worth landing).** A tiny test that asserts every entry in `Catalog.expressions` lives in exactly one of the named sequences caught a real regression for me — three functions had been added on master but missed during the move. Without that test you can drop functions silently. Strongly recommend adding it as part of the PR. **5. No need for a parallel `functionCategories` map in OSS.** Since OSS doesn't have telemetry / docs-generation consumers today, the named sequences alone are enough. If a consumer shows up later, deriving a category map from the named sequences is ~10 lines. Happy to send a PR aligned to the body of this issue — the only thing I want maintainer input on first is the tradeoff in (5): include the `functionCategories` map upfront for forks/downstream to consume, or keep the change minimal and let it be added on demand? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
