adriangb opened a new pull request, #21965:
URL: https://github.com/apache/datafusion/pull/21965

   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   When you point `CREATE EXTERNAL TABLE` at an empty directory (or one that does not exist yet) without specifying an explicit column list, DataFusion silently creates a table with **0 columns**. Any query against that table then fails with a confusing "column not found" / "no such column" error that gives no hint that the real problem is that schema inference had no files to read.
   
   This has the same root cause as the discussion at https://github.com/apache/datafusion/pull/21806#issuecomment-4355371528. That thread approached it from the angle of benchmark runners hitting the problem, but the confusion is not specific to benchmarks. Failing at `CREATE EXTERNAL TABLE` time with a clear, actionable message seemed like the right fix overall.
   
   ## What changes are included in this PR?
   
   `ListingOptions::infer_schema` now returns a `Plan` error when the location yields no files (after the existing filter that drops 0-byte files), telling the user to either add data files or declare an explicit schema:
   
   ```
   Error during planning: No files found at file:///tmp/empty_dir/. Cannot infer schema from an empty location; either add data files or declare an explicit schema for the table.
   ```
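The check described above can be sketched in plain Rust. This is not DataFusion's code: `FileEntry`, `PlanError`, and `files_for_inference` are hypothetical stand-ins for the listing result, DataFusion's `Plan` error, and the relevant part of `ListingOptions::infer_schema`. The sketch assumes the listing has already produced (path, size) pairs; the message text is copied from the example above.

```rust
use std::fmt;

/// Hypothetical stand-in for one object returned by listing the table location.
#[derive(Debug)]
struct FileEntry {
    path: String,
    size: u64,
}

/// Hypothetical stand-in for DataFusion's planning error.
#[derive(Debug, PartialEq)]
struct PlanError(String);

impl fmt::Display for PlanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Error during planning: {}", self.0)
    }
}

/// Drops 0-byte files (mirroring the existing filter), then refuses to
/// continue if nothing is left to infer a schema from.
fn files_for_inference(location: &str, listed: Vec<FileEntry>) -> Result<Vec<FileEntry>, PlanError> {
    let files: Vec<FileEntry> = listed.into_iter().filter(|f| f.size > 0).collect();
    if files.is_empty() {
        return Err(PlanError(format!(
            "No files found at {location}. Cannot infer schema from an empty location; \
             either add data files or declare an explicit schema for the table."
        )));
    }
    Ok(files)
}

fn main() {
    // An empty directory fails with the planning error.
    let empty = files_for_inference("file:///tmp/empty_dir/", vec![]);
    println!("{}", empty.unwrap_err());

    // A directory holding only a 0-byte file fails the same way.
    let only_zero_byte = vec![FileEntry { path: "part-0.parquet".into(), size: 0 }];
    assert!(files_for_inference("file:///tmp/empty_dir/", only_zero_byte).is_err());

    // A location with at least one non-empty file proceeds to inference.
    let with_data = vec![FileEntry { path: "part-0.parquet".into(), size: 1024 }];
    let kept = files_for_inference("file:///tmp/data/", with_data).unwrap();
    assert_eq!(kept[0].path, "part-0.parquet");
}
```

Treating a location that contains only 0-byte files the same as an empty one follows from running the check after the existing filter, as the description states.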
   
   Pre-declaring an empty table with an explicit schema (e.g. `CREATE EXTERNAL TABLE t(x int) STORED AS PARQUET LOCATION '...'` for a later `INSERT`) still works, because the inference path is only triggered when no schema is provided.
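The branching that keeps explicit-schema tables working can be sketched as follows. This is an illustrative simplification, not DataFusion's planner code: `resolve_schema`, the `Schema` alias (column names only), and `infer_single_column` are hypothetical, and format-specific inference is reduced to a caller-supplied function.

```rust
/// Hypothetical simplification: a schema is reduced to its column names.
type Schema = Vec<String>;

/// Hypothetical stand-in for format-specific inference (Parquet, CSV, JSON):
/// it pretends every data file has a single column named `x`.
fn infer_single_column(_files: &[&str]) -> Schema {
    vec!["x".to_string()]
}

/// Hypothetical sketch of schema resolution for CREATE EXTERNAL TABLE.
/// `files` is assumed to be the listing after 0-byte files were dropped.
fn resolve_schema<F>(
    location: &str,
    explicit: Option<Schema>,
    files: &[&str],
    infer: F,
) -> Result<Schema, String>
where
    F: FnOnce(&[&str]) -> Schema,
{
    // An explicit column list is used as-is, so an empty location is fine.
    if let Some(schema) = explicit {
        return Ok(schema);
    }
    // Only the inference path can reach the new empty-location error.
    if files.is_empty() {
        return Err(format!(
            "No files found at {location}. Cannot infer schema from an empty location; \
             either add data files or declare an explicit schema for the table."
        ));
    }
    Ok(infer(files))
}

fn main() {
    // Like `CREATE EXTERNAL TABLE t(id int) ...` over an empty directory: allowed.
    let declared = resolve_schema("file:///tmp/empty_dir/", Some(vec!["id".to_string()]), &[], infer_single_column);
    assert_eq!(declared, Ok(vec!["id".to_string()]));

    // No column list and no files: the planning-time error.
    let missing = resolve_schema("file:///tmp/empty_dir/", None, &[], infer_single_column);
    assert!(missing.is_err());

    // No column list but data present: inference runs as before.
    let inferred = resolve_schema("file:///tmp/data/", None, &["part-0.parquet"], infer_single_column);
    assert_eq!(inferred, Ok(vec!["x".to_string()]));
    println!("explicit: {declared:?}, inferred: {inferred:?}");
}
```

Checking the explicit schema first means the new error cannot affect tables whose columns are declared up front, which matches the behavior the description promises.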
   
   ## Are these changes tested?
   
   Yes. New cases in `datafusion/sqllogictest/test_files/ddl.slt` cover:
   - Parquet, CSV, and JSON over an empty location without an explicit schema → 
all return the new `Plan` error.
   - An empty location with an explicit schema → table creation still succeeds and queries against it run without error.
   - Schema inference still succeeds once files exist at the location, so the 
new check does not regress the happy path.
   
   ## Are there any user-facing changes?
   
   Yes. `CREATE EXTERNAL TABLE ... LOCATION '<empty-dir>'` without an explicit schema now fails at planning time instead of creating a 0-column table. Anyone relying on the previous behavior must add an explicit schema declaration, and the error message tells them how.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
