truffle-dev commented on issue #22506:
URL: https://github.com/apache/datafusion/issues/22506#issuecomment-4582934307
Did some empirical poking at Postgres 16.13 and DuckDB v1.4.1 since the
question was what other systems do.
### Postgres: first-occurrence anchors, implicit cast everywhere else
Postgres walks the AST, picks the type of `$N`'s first concrete-context
occurrence, then requires every other occurrence to find an implicit cast or
operator that lets the anchored type interact with the other side. If no such
operator exists, PREPARE fails at planning time.
**In-family widening (string types):**
```sql
PREPARE qA AS SELECT $1 = s /* varchar */, $1 = x /* text */ FROM t, t_text;
PREPARE qB AS SELECT $1 = x /* text */, $1 = s /* varchar */ FROM t_text, t;
SELECT parameter_types FROM pg_prepared_statements WHERE name IN ('qA','qB');
-- both: {text}
```
varchar and text are binary-coercible, so both orderings resolve to
`{text}`. Looks like "widening" but it's anchor + cast where the cast is no-op.
**Numeric widening, order-dependent:**
```sql
PREPARE qC AS SELECT $1 = a.n /* int */, $1 = b.n /* bigint */; --
{integer}
PREPARE qD AS SELECT $1 = b.n /* bigint */, $1 = a.n /* int */; --
{bigint}
```
First occurrence anchors; second occurrence gets the implicit int/bigint
cast.
**Cross-family conflict (RatulDawar's example):**
```sql
PREPARE q AS SELECT $1 = s FROM t JOIN t2 ON t2.id = $1;
-- ERROR: operator does not exist: integer = character varying
```
Anchors to varchar from `$1 = s`, then `t2.id = $1` needs `integer =
character varying` operator. None exists. PREPARE fails at parse time, not
EXECUTE.
**Pure UNKNOWN defaults to text:**
```sql
PREPARE q AS SELECT $1; -- {text}
PREPARE q AS SELECT $1 = NULL FROM t; -- {text}
```
### DuckDB: defer the conflict to EXECUTE
```sql
PREPARE q AS SELECT $1 = s /* varchar */ FROM t JOIN t2 ON t2.id = $1;
EXECUTE q(1);
-- Conversion Error: Could not convert string 'a' to INT32 when casting from
source column s
```
DuckDB picks int at PREPARE time and only errors when casting varchar row
data to int. Less predictable for the user, since PREPARE looks fine and the
failure mode depends on the row contents.
### How this maps onto the issue
@cetra3's failing case (`$1 = 'x'` Utf8 and `$1 = s` Utf8View) parallels
postgres's varchar+text test. Postgres resolves to the wider string type for
both orderings, which is the `comparison_coercion`-equivalent result.
@RatulDawar's instinct is correct for the in-family case.
@RatulDawar's conflict example (`$1 = s` varchar joined with `t2.id = $1`
int) is the cross-family case. Postgres errors at PREPARE. DuckDB defers it to
EXECUTE. Either is defensible. Which one datafusion picks depends on whether
you want the same coercion rules at the placeholder site as at every other
binary-comparison site (see the recommendation below).
### Subtlety on the explicit-cast escape hatch
One thing worth knowing when documenting the user-side workaround: in
postgres, casting only the parameter is not sufficient when the column type
itself is incompatible.
```sql
PREPARE q AS SELECT $1::int = s FROM t JOIN t2 ON t2.id = $1::int;
-- ERROR: operator does not exist: integer = character varying
```
`s` is still varchar and `int = varchar` has no implicit operator, so the
cast on `$1` alone does not save the comparison. Users have to either cast the
column side too (`s::int`) or cast `t2.id` to varchar so the parameter can stay
in the varchar family. Worth surfacing in whatever error hint datafusion emits
if it adopts the error-at-plan model.
### Suggested shape for the fix
For the immediate case in this issue, replacing the strict-equality merge in
[`get_parameter_fields`](https://github.com/apache/datafusion/blob/2e7b8e1b2d3cd800823b3813f2f7efdde0d21af8/datafusion/expr/src/logical_plan/plan.rs#L1626-L1658)
with
[`comparison_coercion`](https://github.com/apache/datafusion/blob/2e7b8e1b2d3cd800823b3813f2f7efdde0d21af8/datafusion/expr-common/src/type_coercion/binary.rs#L924-L941)
looks like the smallest change that fixes cetra3's example and stays
consistent with how datafusion treats binary-comparison contexts elsewhere.
That gets the Utf8/Utf8View widening for free (via `string_coercion`), the
decimal/integer widening for free (via `binary_numeric_coercion`), and degrades
to `None` only when no coercion path exists in either direction.
The RatulDawar concern about Int + Utf8/Utf8View is worth pulling apart from
the placeholder issue. Today `comparison_coercion` is permissive about
string-to-numeric pairs (the [`string_numeric_coercion`
arm](https://github.com/apache/datafusion/blob/2e7b8e1b2d3cd800823b3813f2f7efdde0d21af8/datafusion/expr-common/src/type_coercion/binary.rs#L936)),
so `comparison_coercion(Int32, Utf8View)` returns `Some(Int32)` rather than
`None`. Using it for placeholder merge therefore gives DuckDB-shaped behavior
in that case: PREPARE succeeds, then EXECUTE either succeeds or fails depending
on whether row data parses as Int32. If the team wants postgres-shaped errors
at PREPARE for cross-family conflicts, that is a separate question about
`comparison_coercion`'s string-to-numeric policy across the whole project, not
just at the placeholder site.
So as a layered call: ship the `comparison_coercion` swap to unblock the
in-family case in this issue, and treat the strictness question as a follow-up
that probably wants a wider RFC since it touches every binary-comparison site,
not just placeholders.
For the "$N has zero concrete contexts" tail case (e.g. `SELECT $1;`),
postgres defaults to text. Defaulting to `Utf8` would match.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]