jiayuasu opened a new pull request, #2952:
URL: https://github.com/apache/sedona/pull/2952

   ## Did you read the Contributor Guide?
   
   - Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/latest/community/develop/)
   
   ## Is this PR related to a ticket?
   
   - Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes #2927.
   
   ## What changes were proposed in this PR?
   
   Spark's `Cast.canCast` rejects arbitrary UDT-to-UDT casts, so today users have to call `ST_Box2D(geom)` / `ST_GeomFromBox2D(box)` explicitly. This PR teaches Catalyst to handle the conversions as ordinary SQL casts:
   
   - SQL `CAST(geom AS box2d)` and `CAST(box AS geometry)` parse and execute.
   - DataFrame `col.cast(Box2DUDT)` and `col.cast(GeometryUDT())` work the same way.
   
   ### How it works
   
   A new analyzer rule `Box2DCastResolutionRule` rewrites Cast nodes during resolution, before `CheckAnalysis` runs:
   
   | Cast                     | Rewritten to            |
   | ------------------------ | ----------------------- |
   | `Cast(geom, Box2DUDT)`   | `ST_Box2D(geom)`        |
   | `Cast(box, GeometryUDT)` | `ST_GeomFromBox2D(box)` |
   
   Because the rewrite happens before `CheckAnalysis`, Catalyst never observes the rejected Cast: the downstream optimizer and codegen path see the same expression tree that the explicit `ST_Box2D(geom)` / `ST_GeomFromBox2D(box)` calls already produce. The PR therefore needs no new codegen path, no `canCast` change, and no extra null/eval handling.
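
The ordering argument above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark or Sedona code: the `Column`, `Call`, and `Cast` classes, the string type names, and the `ALLOWED_BUILTIN_CASTS` table are all hypothetical stand-ins for Catalyst expressions and `Cast.canCast`. It shows a resolution-style rewrite running before a check pass, so the check never sees the UDT-to-UDT cast.

```python
from dataclasses import dataclass
from typing import Union


# Toy stand-ins for Catalyst expression nodes (assumption: not Spark classes).
@dataclass(frozen=True)
class Column:
    name: str
    data_type: str  # e.g. "geometry", "box2d", "int"


@dataclass(frozen=True)
class Call:
    function: str
    arg: "Expr"
    data_type: str


@dataclass(frozen=True)
class Cast:
    child: "Expr"
    target: str

    @property
    def data_type(self) -> str:
        return self.target


Expr = Union[Column, Call, Cast]

# Hypothetical stand-in for the casts the built-in check accepts.
ALLOWED_BUILTIN_CASTS = {("int", "string"), ("string", "int")}


def rewrite_box2d_casts(expr: Expr) -> Expr:
    """Replace the two supported UDT casts with the equivalent function call."""
    if isinstance(expr, Cast):
        child = rewrite_box2d_casts(expr.child)
        pair = (child.data_type, expr.target)
        if pair == ("geometry", "box2d"):
            return Call("ST_Box2D", child, "box2d")
        if pair == ("box2d", "geometry"):
            return Call("ST_GeomFromBox2D", child, "geometry")
        return Cast(child, expr.target)  # unrelated casts are left alone
    if isinstance(expr, Call):
        return Call(expr.function, rewrite_box2d_casts(expr.arg), expr.data_type)
    return expr


def check_analysis(expr: Expr) -> None:
    """Toy check pass: reject any Cast the built-in table does not allow."""
    if isinstance(expr, Cast):
        check_analysis(expr.child)
        if (expr.child.data_type, expr.target) not in ALLOWED_BUILTIN_CASTS:
            raise TypeError(f"cannot cast {expr.child.data_type} to {expr.target}")
    elif isinstance(expr, Call):
        check_analysis(expr.arg)


def analyze(expr: Expr) -> Expr:
    """Run the rewrite first, then the check, mirroring the ordering in the PR."""
    resolved = rewrite_box2d_casts(expr)
    check_analysis(resolved)
    return resolved
```

In this model, `analyze(Cast(Column("geom", "geometry"), "box2d"))` returns the same `Call("ST_Box2D", ...)` node that an explicit function call would build, while calling `check_analysis` directly on the unrewritten cast raises `TypeError`.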
   
   The rule is registered via `SparkSessionExtensions.injectResolutionRule` from `SedonaSqlExtensions`, so it activates whenever users wire Sedona through `spark.sql.extensions=org.apache.sedona.sql.SedonaSqlExtensions` (the standard install).
   
   ### SQL type keyword
   
   `SedonaSqlAstBuilder.visitPrimitiveDataType` already recognized `GEOMETRY` as a type keyword. This PR adds `BOX2D` alongside it, across all supported Spark versions (3.4 / 3.5 / 4.0 / 4.1). Without it, the SQL parser would reject `CAST(... AS box2d)` before any analyzer rule could fire.
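
Why the keyword matters can be sketched with a toy lookup. This is a hypothetical plain-Python model, not the actual `SedonaSqlAstBuilder` logic: the `resolve_type_keyword` function, the `ParseError` class, the keyword tables, and the case-insensitive matching are all assumptions made for illustration. The point is only that an unknown type name fails at parse time, before any analyzer rule has a chance to run.

```python
from typing import Dict, Optional


class ParseError(Exception):
    """Raised by the toy parser when a type keyword is not recognized."""


# Hypothetical keyword tables; the values are just labels for this sketch.
BASE_TYPE_KEYWORDS: Dict[str, str] = {"int": "IntegerType", "string": "StringType"}
KEYWORDS_BEFORE_PR: Dict[str, str] = {"geometry": "GeometryUDT"}
KEYWORDS_AFTER_PR: Dict[str, str] = {"geometry": "GeometryUDT", "box2d": "Box2DUDT"}


def resolve_type_keyword(keyword: str, extra: Optional[Dict[str, str]] = None) -> str:
    """Map a SQL type keyword to a type label, or fail as a parser would."""
    table = {**BASE_TYPE_KEYWORDS, **(extra or {})}
    key = keyword.lower()  # assumption: keywords are matched case-insensitively
    if key not in table:
        raise ParseError(f"unsupported data type: {keyword}")
    return table[key]
```

With `KEYWORDS_BEFORE_PR`, resolving `"box2d"` raises `ParseError`; with `KEYWORDS_AFTER_PR`, it resolves to the `Box2DUDT` label and the cast can reach the analyzer.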
   
   ### Scope notes
   
   Implicit type coercion in function dispatch (e.g. passing a Geometry directly into a Box2D-typed function argument without an explicit cast) is intentionally out of scope here. It requires hooking into Catalyst's type coercion rules and is tracked as a follow-up.
   
   ## How was this patch tested?
   
   - `Box2DCastResolutionRuleSuite` (spark/common): unit test of the rewrite for both directions, plus a no-op assertion for unrelated casts.
   - `Box2DCastSuite` (spark-3.5): end-to-end tests covering SQL `CAST` (both directions), DataFrame `.cast(Box2DUDT)` / `.cast(GeometryUDT())`, the round-trip `Geometry → Box2D → Geometry`, and NULL propagation.
   
   ## Did this PR include necessary documentation updates?
   
   - No, this PR does not by itself require changes to the public SQL API documentation. The new cast syntax is covered by the consolidated Phase 1+2+3 Box2D docs update.

