kosiew commented on code in PR #21402:
URL: https://github.com/apache/datafusion/pull/21402#discussion_r3136075495


##########
datafusion/physical-expr/src/expressions/in_list.rs:
##########
@@ -642,20 +659,34 @@ impl InListExpr {
 
     /// Create a new InList expression directly from an array, bypassing 
expression evaluation.
     ///
-    /// This is more efficient than `in_list()` when you already have the list 
as an array,
-    /// as it avoids the conversion: `ArrayRef -> Vec<PhysicalExpr> -> 
ArrayRef -> StaticFilter`.
-    /// Instead it goes directly: `ArrayRef -> StaticFilter`.
+    /// This is more efficient than [`InListExpr::try_new`] when you already 
have the list
+    /// as an array, as it builds the static filter directly from the array 
instead of
+    /// reconstructing an intermediate array from literal expressions.
+    ///
+    /// The `list` field is populated with literal expressions extracted from
+    /// the array, and the array is used to build a static filter for
+    /// efficient set membership evaluation.
     ///
-    /// The `list` field will be empty when using this constructor, as the 
array is stored
-    /// directly in the static filter.
+    /// The `array` may be dictionary-encoded — it will be flattened to its
+    /// value type such that specialized filters are used.
     ///
-    /// This does not make the expression any more performant at runtime, but 
it does make it slightly
-    /// cheaper to build.
+    /// Returns an error if the expression's data type and the array's data 
type
+    /// are not logically equal. Null arrays are always accepted.
     pub fn try_new_from_array(

Review Comment:
   I noticed that `try_new_from_array` now does its own logical type 
validation, but `try_new` still has a very similar check a few lines below. 
   Would it make sense to extract this into a small helper so both constructors 
share the same validation path and error message? That might help avoid subtle 
drift in the future, especially around dictionary or logical equality handling.



##########
datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs:
##########


Review Comment:
   Small cohesion nit: this comment still mentions an `in_list_from_array()` 
helper, but the code now calls `InListExpr::try_new_from_array(...)` directly. 
   
   Updating the wording so it matches the current API would make it easier to 
follow when tracing the dynamic filter path.



##########
datafusion/physical-expr/src/expressions/in_list.rs:
##########
@@ -4045,43 +4080,182 @@ mod tests {
         Ok(())
     }
 
+    fn make_int32_dict_array(values: Vec<Option<i32>>) -> ArrayRef {
+        let mut builder = PrimitiveDictionaryBuilder::<Int8Type, 
Int32Type>::new();
+        for v in values {
+            match v {
+                Some(val) => builder.append_value(val),
+                None => builder.append_null(),
+            }
+        }
+        Arc::new(builder.finish())
+    }
+
+    fn make_f64_dict_array(values: Vec<Option<f64>>) -> ArrayRef {
+        let mut builder = PrimitiveDictionaryBuilder::<Int8Type, 
Float64Type>::new();
+        for v in values {
+            match v {
+                Some(val) => builder.append_value(val),
+                None => builder.append_null(),
+            }
+        }
+        Arc::new(builder.finish())
+    }
+
+    #[test]
+    fn test_try_new_from_array_dict_haystack_int32() -> Result<()> {

Review Comment:
   Nice coverage here for primitive, string, and float dictionary haystacks. 
One gap I still see is the multi-column or `StructArray` path, which is what 
`shared_bounds.rs` builds for composite hash-join filters. 
   It could be worth adding a focused `try_new_from_array` test with a struct 
haystack to cover the exact dynamic filter path this PR is enabling.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to