jiayuasu opened a new issue, #2861:
URL: https://github.com/apache/sedona/issues/2861
## Description
`Catalog.scala` currently registers ~340 functions in a single flat
`Seq[FunctionDescription]`, with comments delimiting groups (`// Expression for
vectors`, `// Expression for rasters`, `// geom <-> geog conversion
functions`). The flat structure has two practical drawbacks:
1. **Categorization drifts.** The comment-based grouping is loose, and over
time predicates, accessors, and operations end up interleaved in the vector
section. New contributors don't have a clear hint about where to add a new
function.
2. **Hard to reuse the grouping.** Downstream consumers that want
category-level information (for telemetry buckets, docs generation, or registry
partitioning) have to maintain a parallel mapping that drifts from the
canonical list.
## Proposal
Split the flat `expressions` list into named category sequences and
concatenate them, so the file's structure encodes the categorization explicitly:
```scala
val stConstructorExprs: Seq[FunctionDescription] = Seq(...)
val stPredicateExprs: Seq[FunctionDescription] = Seq(...)
val stAccessorExprs: Seq[FunctionDescription] = Seq(...)
val stOperationExprs: Seq[FunctionDescription] = Seq(...)
val stSerializationExprs: Seq[FunctionDescription] = Seq(...)
val stIndexingExprs: Seq[FunctionDescription] = Seq(...)
val stJoinExprs: Seq[FunctionDescription] = Seq(...)
val stGeographyExprs: Seq[FunctionDescription] = Seq(...)
val otherExprs: Seq[FunctionDescription] = Seq(...)
val rsConstructorExprs: Seq[FunctionDescription] = Seq(...)
val rsAccessorExprs: Seq[FunctionDescription] = Seq(...)
val rsOperationExprs: Seq[FunctionDescription] = Seq(...)
val rsOutputExprs: Seq[FunctionDescription] = Seq(...)
override val expressions: Seq[FunctionDescription] =
  stConstructorExprs ++ stGeographyExprs ++ stPredicateExprs ++
    stAccessorExprs ++ stOperationExprs ++ stSerializationExprs ++
    stIndexingExprs ++ stJoinExprs ++ otherExprs ++
    rsConstructorExprs ++ rsAccessorExprs ++ rsOperationExprs ++
    rsOutputExprs ++ geoStatsFunctions()
```
### Benefits
- **Explicit categorization** in the code structure, not just in comments.
Adding a new function requires picking a category sequence, which is a much
clearer hint than "add it somewhere in this list of ~340 entries".
- **Reusable for downstream needs.** Anyone wanting category-level
information (e.g., for usage telemetry buckets, docs generation, or selective
registration) can map over the named sequences directly.
- **Preserved registration order.** Concatenating in the same order as today
keeps registration semantics identical, so there is no behavior change.
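
To illustrate the "reusable for downstream needs" point, here is a minimal sketch of how a consumer could derive a function-to-category lookup from the named sequences. The `FunctionDescription` case class is a simplified stand-in for the real type in `Catalog.scala`, and `CatalogSketch`, `categories`, and `categoryOf` are hypothetical names that are not part of this proposal.

```scala
// Hypothetical stand-in: the real FunctionDescription carries Spark
// registration details that are omitted here.
final case class FunctionDescription(name: String)

object CatalogSketch {
  // A few category sequences, populated with examples from the table below.
  val stConstructorExprs: Seq[FunctionDescription] =
    Seq(FunctionDescription("ST_Point"), FunctionDescription("ST_GeomFromText"))
  val stPredicateExprs: Seq[FunctionDescription] =
    Seq(FunctionDescription("ST_Intersects"), FunctionDescription("ST_Contains"))
  val rsOutputExprs: Seq[FunctionDescription] =
    Seq(FunctionDescription("RS_AsGeoTiff"))

  // Hypothetical helper: category label paired with its sequence,
  // listed in registration order.
  val categories: Seq[(String, Seq[FunctionDescription])] = Seq(
    "stConstructor" -> stConstructorExprs,
    "stPredicate" -> stPredicateExprs,
    "rsOutput" -> rsOutputExprs)

  // Flattening the categories yields the same list as chaining them with ++.
  val expressions: Seq[FunctionDescription] = categories.flatMap(_._2)

  // Lookup from function name to category label, e.g. for telemetry buckets.
  val categoryOf: Map[String, String] =
    categories.flatMap { case (category, exprs) =>
      exprs.map(e => e.name -> category)
    }.toMap
}
```

With the flat list, the same lookup would have to be maintained by hand next to the catalog.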
### Non-goals
- No new functions, no removals, no signature changes. Pure code
organization.
- The category names are not part of any public API and can be tuned during
review.
## Categories (proposed)
| Category | Examples |
|----------|----------|
| `stConstructorExprs` | ST_Point, ST_GeomFromText, ST_MakeLine |
| `stGeographyExprs` | ST_GeogFromText, ST_GeogFromWKB, ST_GeomToGeography |
| `stPredicateExprs` | ST_Intersects, ST_Contains, ST_Within, ST_DWithin |
| `stAccessorExprs` | ST_Area, ST_Length, ST_Envelope, ST_X, ST_Y |
| `stOperationExprs` | ST_Buffer, ST_Union, ST_Transform, ST_Simplify |
| `stSerializationExprs` | ST_AsText, ST_AsGeoJSON, ST_GeoHash |
| `stIndexingExprs` | ST_H3CellIDs, ST_S2CellIDs, ST_BingTile |
| `stJoinExprs` | ST_KNN |
| `otherExprs` | ExpandAddress, ParseAddress, Barrier |
| `rsConstructorExprs` | RS_FromGeoTiff, RS_MakeRaster, RS_AsRaster |
| `rsAccessorExprs` | RS_Envelope, RS_Metadata, RS_Value |
| `rsOperationExprs` | RS_MapAlgebra, RS_Add, RS_Clip, RS_Tile |
| `rsOutputExprs` | RS_AsGeoTiff, RS_AsPNG, RS_AsBase64 |
## Backward compatibility
No impact. `expressions` is still a `Seq[FunctionDescription]` of the
same size and order; `registerAll` is unchanged.
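
One way to guard the "same size and order" claim during review is a small comparison between the legacy flat list and the concatenated one. The sketch below assumes a simplified `FunctionDescription` with only a name; `CompatCheckSketch` and `firstMismatch` are hypothetical and do not exist in Sedona.

```scala
// Hypothetical stand-in for the real FunctionDescription type.
final case class FunctionDescription(name: String)

object CompatCheckSketch {
  // Returns the first index at which the two lists differ, or None when
  // they have the same names in the same order.
  def firstMismatch(
      legacy: Seq[FunctionDescription],
      split: Seq[FunctionDescription]): Option[Int] = {
    val a = legacy.map(_.name)
    val b = split.map(_.name)
    if (a == b) None
    else {
      val i = a.zip(b).indexWhere { case (x, y) => x != y }
      // If the common prefix matches, the lists differ only in length.
      Some(if (i == -1) math.min(a.size, b.size) else i)
    }
  }
}
```

A unit test could call `firstMismatch` on a snapshot of the current list and the new `expressions`, and fail if the result is not `None`.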
I'd like to send a PR for this if there is interest. Happy to take feedback
on the category names and granularity before coding it up.