kosiew opened a new issue, #22818:
URL: https://github.com/apache/datafusion/issues/22818

   ## Summary
   
   DataFusion currently keeps its own SQL dialect metadata in several separate 
places that must stay manually synchronized:
   
   - `datafusion/common/src/config.rs`
     - `Dialect` enum variants
     - `Dialect::AVAILABLE`
     - `impl AsRef<str> for Dialect`
     - `impl FromStr for Dialect`
   
   This creates drift risk whenever a dialect is added, renamed, aliased, or 
removed. Keep this refactor focused on DataFusion's configuration metadata; do 
not move SQL parser construction into `datafusion-common`.
   
   ## Problem
   
   Adding a new dialect requires editing multiple independent lists/matches:
   
   1. Add a `Dialect` enum variant.
   2. Add the human-readable name to `Dialect::AVAILABLE`.
   3. Add canonical serialization in `AsRef<str>` / `Display`.
   4. Add config parsing and aliases in `FromStr`.
   5. Update error messages and docs generated from config metadata.
   6. Separately verify, in an SQL-enabled crate if needed, that canonical 
names remain accepted by `sqlparser::dialect::dialect_from_str`.
   
   If any step is missed, users can see inconsistent behavior. Examples of 
possible drift:
   
   - A dialect appears in an error message but is not accepted by config 
parsing.
   - A dialect is accepted by `FromStr` but omitted from `Dialect::AVAILABLE`.
   - Aliases are accepted by DataFusion config but not covered by tests.
   - CLI/core error messages drift because they all rely on the same manually 
maintained available-dialect string.
   
   The recent Spark dialect addition is correct, but it shows the maintenance 
cost: `Spark` needed updates in the enum, `AVAILABLE`, `AsRef`, `FromStr`, 
docs/config output, and downstream parser error messages.
   
   ## Proposed direction
   
   Introduce one central dialect metadata source in `datafusion-common`, then 
derive the public behavior from it.
   
   Possible shape:
   
   ```rust
   struct DialectInfo {
       dialect: Dialect,
       canonical_name: &'static str,
       display_name: &'static str,
       aliases: &'static [&'static str],
   }
   
   const DIALECTS: &[DialectInfo] = &[
       DialectInfo {
           dialect: Dialect::Generic,
           canonical_name: "generic",
           display_name: "Generic",
           aliases: &[],
       },
       // ...
   ];
   ```
   
   Then implement:
   
   - `Dialect::as_ref()` from `canonical_name`
   - `Display` from `canonical_name`
   - `FromStr` from `canonical_name` + `aliases`
   - `Dialect::AVAILABLE` or `Dialect::available()` from `display_name`
   
   Keep parser construction outside `datafusion-common`. In particular, do not 
add a direct `sqlparser` dependency to `datafusion-common` for this refactor.
   
   ## Acceptance criteria
   
   - Dialect names, aliases, display names, and canonical serialized values are 
defined in one place in `datafusion-common`.
   - `Dialect::from_str`, `Dialect::as_ref`, and supported-dialect error 
messages are generated from that source.
   - Existing accepted values continue to work, including aliases such as 
`postgres` and `sparksql`.
   - Error messages remain clear and include all supported dialects.
   - Existing config docs output remains equivalent except for intentional 
formatting changes.
   - No new `sqlparser` dependency is added to `datafusion-common`.
   - Tests cover:
     - every `Dialect` variant round-trips through `as_ref()` / `Display` / 
`FromStr`
     - documented aliases parse to the expected variant
     - available-dialect listing includes all variants exactly once
   
   ## Suggested tests
   
   Add unit tests in `datafusion/common/src/config.rs`:
   
   - Iterate over the central dialect table and assert canonical names parse 
back to the same variant.
   - Assert every alias parses to the expected variant.
   - Assert `Dialect::available()` or equivalent contains each display name 
once.
   
   Optionally add an integration-level test in a crate that already depends on 
SQL parser support to ensure every canonical `Dialect::as_ref()` string 
accepted by DataFusion is also accepted by 
`sqlparser::dialect::dialect_from_str`. Do not add that dependency to 
`datafusion-common`.
   
   ## Related PR
   #22529 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to