kosiew commented on code in PR #21402:
URL: https://github.com/apache/datafusion/pull/21402#discussion_r3136075495
##########
datafusion/physical-expr/src/expressions/in_list.rs:
##########
@@ -642,20 +659,34 @@ impl InListExpr {
/// Create a new InList expression directly from an array, bypassing
expression evaluation.
///
- /// This is more efficient than `in_list()` when you already have the list
as an array,
- /// as it avoids the conversion: `ArrayRef -> Vec<PhysicalExpr> ->
ArrayRef -> StaticFilter`.
- /// Instead it goes directly: `ArrayRef -> StaticFilter`.
+ /// This is more efficient than [`InListExpr::try_new`] when you already
have the list
+ /// as an array, as it builds the static filter directly from the array
instead of
+ /// reconstructing an intermediate array from literal expressions.
+ ///
+ /// The `list` field is populated with literal expressions extracted from
+ /// the array, and the array is used to build a static filter for
+ /// efficient set membership evaluation.
///
- /// The `list` field will be empty when using this constructor, as the
array is stored
- /// directly in the static filter.
+ /// The `array` may be dictionary-encoded — it will be flattened to its
+ /// value type such that specialized filters are used.
///
- /// This does not make the expression any more performant at runtime, but
it does make it slightly
- /// cheaper to build.
+ /// Returns an error if the expression's data type and the array's data
type
+ /// are not logically equal. Null arrays are always accepted.
pub fn try_new_from_array(
Review Comment:
I noticed that `try_new_from_array` now does its own logical type
validation, but `try_new` still has a very similar check a few lines below.
Would it make sense to extract this into a small helper so both constructors
share the same validation path and error message? That might help avoid subtle
drift in the future, especially around dictionary or logical equality handling.
##########
datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs:
##########
Review Comment:
Small cohesion nit: this comment still mentions an `in_list_from_array()`
helper, but the code now calls `InListExpr::try_new_from_array(...)` directly.
Updating the wording so it matches the current API would make it easier to
follow when tracing the dynamic filter path.
##########
datafusion/physical-expr/src/expressions/in_list.rs:
##########
@@ -4045,43 +4080,182 @@ mod tests {
Ok(())
}
+ fn make_int32_dict_array(values: Vec<Option<i32>>) -> ArrayRef {
+ let mut builder = PrimitiveDictionaryBuilder::<Int8Type,
Int32Type>::new();
+ for v in values {
+ match v {
+ Some(val) => builder.append_value(val),
+ None => builder.append_null(),
+ }
+ }
+ Arc::new(builder.finish())
+ }
+
+ fn make_f64_dict_array(values: Vec<Option<f64>>) -> ArrayRef {
+ let mut builder = PrimitiveDictionaryBuilder::<Int8Type,
Float64Type>::new();
+ for v in values {
+ match v {
+ Some(val) => builder.append_value(val),
+ None => builder.append_null(),
+ }
+ }
+ Arc::new(builder.finish())
+ }
+
+ #[test]
+ fn test_try_new_from_array_dict_haystack_int32() -> Result<()> {
Review Comment:
Nice coverage here for primitive, string, and float dictionary haystacks.
One gap I still see is the multi-column or `StructArray` path, which is what
`shared_bounds.rs` builds for composite hash-join filters.
It could be worth adding a focused `try_new_from_array` test with a struct
haystack to cover the exact dynamic filter path this PR is enabling.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]