jiayuasu opened a new pull request, #2952: URL: https://github.com/apache/sedona/pull/2952
## Did you read the Contributor Guide?

- Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/latest/community/develop/)

## Is this PR related to a ticket?

- Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes #2927.

## What changes were proposed in this PR?

Spark's `Cast.canCast` rejects arbitrary UDT-to-UDT casts, so today users have to call `ST_Box2D(geom)` / `ST_GeomFromBox2D(box)` explicitly. This PR teaches Catalyst to handle the conversions as ordinary SQL casts:

- SQL `CAST(geom AS box2d)` and `CAST(box AS geometry)` parse and execute.
- DataFrame `col.cast(Box2DUDT)` and `col.cast(GeometryUDT())` work the same way.

### How it works

A new analyzer rule `Box2DCastResolutionRule` rewrites Cast nodes during resolution, before `CheckAnalysis` runs:

| Cast                      | Rewritten to            |
| ------------------------- | ----------------------- |
| `Cast(geom, Box2DUDT)`    | `ST_Box2D(geom)`        |
| `Cast(box, GeometryUDT)`  | `ST_GeomFromBox2D(box)` |

Because the rewrite happens before `CheckAnalysis`, Catalyst never observes the rejected Cast: the downstream optimizer and codegen path see the same expression tree that the explicit `ST_Box2D(geom)` / `ST_GeomFromBox2D(box)` constructs already produce. No new code-gen path, no `canCast` change, no extra null/eval handling.

The rule is registered via `SparkSessionExtensions.injectResolutionRule` from `SedonaSqlExtensions`, so it activates whenever users wire Sedona through `spark.sql.extensions=org.apache.sedona.sql.SedonaSqlExtensions` (the standard install).

### SQL type keyword

`SedonaSqlAstBuilder.visitPrimitiveDataType` already recognized `GEOMETRY` as a type keyword. This PR adds `BOX2D` alongside it, across all supported Spark versions (3.4 / 3.5 / 4.0 / 4.1). Without it, the SQL parser would reject `CAST(... AS box2d)` before any analyzer rule could fire.

### Scope notes

Implicit type coercion in function dispatch (e.g. passing a Geometry directly into a Box2D-typed function argument without an explicit cast) is intentionally out of scope here; it requires hooking into Catalyst's type coercion rules and is tracked as a follow-up.

## How was this patch tested?

- `Box2DCastResolutionRuleSuite` (spark/common): unit test of the rewrite for both directions, plus a no-op assertion for unrelated casts.
- `Box2DCastSuite` (spark-3.5): end-to-end tests covering SQL `CAST` (both directions), DataFrame `.cast(Box2DUDT)` / `.cast(GeometryUDT())`, the round-trip `Geometry → Box2D → Geometry`, and NULL propagation.

## Did this PR include necessary documentation updates?

- No, this PR does not affect any public SQL API documentation surface in isolation. The new cast syntax is covered by the consolidated Phase 1+2+3 Box2D docs update.
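
For readers who want to see the shape of the rewrite described under "How it works" without a Spark session, here is a minimal, self-contained sketch in plain Python. It is a toy model only: `Column`, `Call`, `Cast`, and `rewrite_box2d_casts` are hypothetical names invented for illustration, types are plain strings rather than UDTs, and none of this is the actual `Box2DCastResolutionRule` code. It only mirrors the two table rows and the "leave unrelated casts alone" behavior that the test suites are described as covering.

```python
from dataclasses import dataclass
from typing import Union


# Toy stand-ins for Catalyst expression nodes (hypothetical, not Spark classes).
@dataclass(frozen=True)
class Column:
    name: str
    data_type: str  # e.g. "geometry", "box2d", "int"


@dataclass(frozen=True)
class Call:
    function: str
    arg: "Expr"
    data_type: str  # result type of the function call


@dataclass(frozen=True)
class Cast:
    child: "Expr"
    target: str

    @property
    def data_type(self) -> str:
        # A cast's result type is its target type.
        return self.target


Expr = Union[Column, Call, Cast]


def rewrite_box2d_casts(expr: Expr) -> Expr:
    """Rewrite Geometry<->Box2D casts into function calls, bottom-up."""
    if isinstance(expr, Column):
        return expr
    if isinstance(expr, Call):
        return Call(expr.function, rewrite_box2d_casts(expr.arg), expr.data_type)

    # expr is a Cast: rewrite the child first so nested casts resolve too.
    child = rewrite_box2d_casts(expr.child)
    if child.data_type == "geometry" and expr.target == "box2d":
        return Call("ST_Box2D", child, "box2d")
    if child.data_type == "box2d" and expr.target == "geometry":
        return Call("ST_GeomFromBox2D", child, "geometry")
    # Any other cast is left as a Cast (the no-op case).
    return Cast(child, expr.target)
```

Under these assumptions, `rewrite_box2d_casts(Cast(Column("geom", "geometry"), "box2d"))` yields a `Call` to `ST_Box2D`, and a nested `Cast(Cast(geom, "box2d"), "geometry")` becomes `ST_GeomFromBox2D(ST_Box2D(geom))`, which corresponds to the round-trip case listed in the tests. The real rule operates on Catalyst `Cast` nodes and UDT instances, so treat this purely as a reading aid.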
