andygrove commented on PR #21508: URL: https://github.com/apache/datafusion/pull/21508#issuecomment-4229845772
@shehabgamin @parthchandra @comphead PTAL. I made some improvements based on
feedback so far.
PR Review Feedback
- Fixed SQL translation bugs raised by @parthchandra:
  - Handle escaped quotes in string literals ('Andy''s'::TYPE)
  - Handle commas inside quoted strings in arrow_cast() parsing
  - Skip unsupported cast types instead of passing them through and hoping
    for the best
- Added type mappings per @shehabgamin: Utf8View→STRING, LargeUtf8→STRING,
  BinaryView→BINARY, LargeBinary→BINARY
- Added Decimal Arrow type handling: Decimal32/64/128/256(p,s) →
  DECIMAL(p,s) with precision validation
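
For reviewers who want a concrete picture of the type mapping, here is a minimal sketch of the idea, not the code in the PR. The names `SIMPLE_TYPE_MAP`, `DECIMAL_RE`, and `arrow_type_to_spark` are hypothetical, and the validation rule (precision between 1 and 38, scale between 0 and the precision) is an assumption based on Spark's DecimalType limits. Unsupported types return None so the caller can skip the cast.

```python
import re
from typing import Optional

# Hypothetical table: only these four entries are named in the comment above.
SIMPLE_TYPE_MAP = {
    "Utf8View": "STRING",
    "LargeUtf8": "STRING",
    "BinaryView": "BINARY",
    "LargeBinary": "BINARY",
}

# Compiled once at module level rather than on every call.
DECIMAL_RE = re.compile(r"^Decimal(?:32|64|128|256)\(\s*(\d+)\s*,\s*(-?\d+)\s*\)$")

# Assumption: Spark's DecimalType supports at most 38 digits of precision.
MAX_SPARK_DECIMAL_PRECISION = 38


def arrow_type_to_spark(arrow_type: str) -> Optional[str]:
    """Return a Spark SQL type name, or None if the cast should be skipped."""
    arrow_type = arrow_type.strip()
    if arrow_type in SIMPLE_TYPE_MAP:
        return SIMPLE_TYPE_MAP[arrow_type]

    match = DECIMAL_RE.match(arrow_type)
    if match is None:
        return None  # unsupported type: skip rather than pass through

    precision, scale = int(match.group(1)), int(match.group(2))
    # Assumed validation: reject anything Spark could not represent.
    if not 1 <= precision <= MAX_SPARK_DECIMAL_PRECISION:
        return None
    if not 0 <= scale <= precision:
        return None
    return f"DECIMAL({precision},{scale})"
```

Under these assumptions, Decimal128(10, 2) maps to DECIMAL(10,2), while Decimal256(76, 10) is skipped because its precision exceeds what Spark can hold.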
Multi-version CI
- Changed the workflow from a single PySpark version to a matrix strategy
  that tests 3.4.4, 3.5.8, and 4.1.1
- Added version-conditional known-failures syntax ([spark>=4.0]
math/abs.slt) so entries can target specific Spark versions
- Added Spark runtime version detection (spark_version())
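
To illustrate the version-conditional known-failures syntax, here is a rough sketch of how such entries could be evaluated. It is not the PR's implementation: `parse_version` and `active_known_failures` are hypothetical names, and the handling of comment lines, unconditional entries, and operators other than `>=` is assumed.

```python
import operator
import re
from typing import Iterable, Set, Tuple

Version = Tuple[int, ...]

# Matches entries such as "[spark>=4.0] math/abs.slt".
_CONDITION_RE = re.compile(
    r"^\[spark\s*(>=|<=|==|>|<)\s*(\d+(?:\.\d+)*)\]\s+(\S+)$"
)
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> Version:
    """Turn "3.5.8" into (3, 5, 8)."""
    return tuple(int(part) for part in text.strip().split("."))


def _padded(a: Version, b: Version) -> Tuple[Version, Version]:
    """Zero-pad both versions to equal length so that 4.0 compares equal to 4.0.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def active_known_failures(lines: Iterable[str], spark_version: str) -> Set[str]:
    """Return the known-failure paths that apply to the given Spark version."""
    current = parse_version(spark_version)
    active: Set[str] = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumed: blank lines and '#' comments are ignored
        if not line.startswith("["):
            active.add(line)  # unconditional entry applies to every version
            continue
        match = _CONDITION_RE.match(line)
        if match is None:
            raise ValueError(f"malformed known-failures entry: {line!r}")
        op, required, path = match.groups()
        lhs, rhs = _padded(current, parse_version(required))
        if _OPS[op](lhs, rhs):
            active.add(path)
    return active
```

With this sketch, the entry [spark>=4.0] math/abs.slt is active when running against 4.1.1 and ignored for 3.4.4 and 3.5.8.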
Unresolved Function Handling
- UNRESOLVED_ROUTINE errors from PySpark are now skipped rather than
  reported as failures. They indicate functions that are not available in
  the current Spark version (e.g., bitmap_bit_position in 3.4,
  try_parse_url in 3.5), not bugs
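
The skip-versus-fail decision can be pictured roughly as below. This is a hedged sketch only: `Outcome` and `classify_result` are hypothetical names, and it assumes the error class name UNRESOLVED_ROUTINE appears in the text of the exception raised by PySpark.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASSED = "passed"
    SKIPPED = "skipped"
    FAILED = "failed"


def classify_result(error: Optional[BaseException]) -> Outcome:
    """Classify one query run; `error` is the exception it raised, if any."""
    if error is None:
        return Outcome.PASSED
    # Assumption: the error class name is present in the exception message.
    if "UNRESOLVED_ROUTINE" in str(error):
        # The function does not exist in this Spark version, so the query
        # cannot be validated here. Skip it instead of counting a failure.
        return Outcome.SKIPPED
    return Outcome.FAILED
```

Any other exception still counts as a failure, so genuine translation bugs are not hidden by the skip rule.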
Code Cleanup
- Removed dead UNSUPPORTED_ARROW_TYPES empty set and its check
- Pre-compiled Decimal regex at module level
- Extracted _parse_version() helper
- Fixed in_quote type from bool|str to Optional[str]
- Decoupled known-failures loading from version resolution
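
For context on the quoting fixes and the in_quote type change, a quote-aware argument splitter might look roughly like this. It is a sketch under assumptions, not the PR's code: `split_top_level_args` is a hypothetical name, and tracking parenthesis depth outside quotes is an addition for illustration. Here in_quote is either None or the quote character that opened the current literal, which is why Optional[str] fits better than bool|str.

```python
from typing import List, Optional


def split_top_level_args(args: str) -> List[str]:
    """Split a SQL argument list on commas that are outside quotes and parens."""
    parts: List[str] = []
    current: List[str] = []
    in_quote: Optional[str] = None  # the opening quote character, or None
    depth = 0  # parenthesis nesting outside of quotes
    i = 0
    while i < len(args):
        ch = args[i]
        if in_quote is not None:
            current.append(ch)
            if ch == in_quote:
                if i + 1 < len(args) and args[i + 1] == in_quote:
                    # A doubled quote ('') is an escaped quote, not a terminator.
                    current.append(args[i + 1])
                    i += 1
                else:
                    in_quote = None
        elif ch in ("'", '"'):
            in_quote = ch
            current.append(ch)
        elif ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            current.append(ch)
        i += 1
    parts.append("".join(current).strip())
    return parts
```

For example, splitting the arguments 'Andy''s, x', 'Utf8' yields two parts, because both the escaped quote and the comma sit inside the first literal.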
