Copilot commented on code in PR #21701:
URL: https://github.com/apache/datafusion/pull/21701#discussion_r3101391479
##########
datafusion/expr/Cargo.toml:
##########
@@ -63,6 +63,11 @@ serde_json = { workspace = true }
sqlparser = { workspace = true, optional = true }
[dev-dependencies]
+criterion = { version = "0.5" }
Review Comment:
`criterion` is pinned to version 0.5 here, but the workspace already defines
`criterion = "0.8"` and other DataFusion crates use `criterion = { workspace =
true }`. Pinning 0.5 introduces a second Criterion major version into the build
(see Cargo.lock), increasing compile time and dependency surface. Use the
workspace dependency instead (and add any needed features on top) to keep the
workspace consistent.
```suggestion
criterion = { workspace = true }
```
##########
datafusion/expr/benches/map_expressions.rs:
##########
@@ -0,0 +1,240 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Micro-benchmark for `LogicalPlan::map_expressions` on Extension nodes.
+//!
+//! Extension nodes can have many children but no expressions. When
+//! `expressions()` returns empty, `map_expressions` should skip the
+//! expensive clone-all-inputs + `with_exprs_and_inputs` rebuild.
+
+use arrow::datatypes::{DataType, Field, Schema};
+use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
+use datafusion_common::tree_node::{Transformed, TreeNode};
Review Comment:
`use datafusion_common::tree_node::{Transformed, TreeNode};` imports
`TreeNode`, but it isn't referenced in this benchmark (since `map_expressions`
is an inherent method on `LogicalPlan`). Dropping the unused import avoids
warnings and keeps the bench focused on the required dependencies.
```suggestion
use datafusion_common::tree_node::Transformed;
```
##########
datafusion/expr/benches/map_expressions.rs:
##########
@@ -0,0 +1,240 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Micro-benchmark for `LogicalPlan::map_expressions` on Extension nodes.
+//!
+//! Extension nodes can have many children but no expressions. When
+//! `expressions()` returns empty, `map_expressions` should skip the
+//! expensive clone-all-inputs + `with_exprs_and_inputs` rebuild.
+
+use arrow::datatypes::{DataType, Field, Schema};
+use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
+use datafusion_common::tree_node::{Transformed, TreeNode};
+use datafusion_common::{DFSchema, DFSchemaRef, Result};
+use datafusion_expr::{col, Extension, Expr, LogicalPlan,
UserDefinedLogicalNodeCore};
+use std::fmt;
+use std::hash::{Hash, Hasher};
+use std::sync::Arc;
+
+// ---------------------------------------------------------------------------
+// Extension node with NO expressions (e.g. OneOf in view matching)
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Clone)]
+struct NoExprExtension {
+ children: Vec<LogicalPlan>,
+ schema: DFSchemaRef,
+}
+
+impl PartialEq for NoExprExtension {
+ fn eq(&self, _: &Self) -> bool {
+ false
+ }
+}
+impl Eq for NoExprExtension {}
+
+impl Hash for NoExprExtension {
+ fn hash<H: Hasher>(&self, _: &mut H) {}
+}
+
+impl PartialOrd for NoExprExtension {
+ fn partial_cmp(&self, _: &Self) -> Option<std::cmp::Ordering> {
+ None
Review Comment:
Both `PartialEq::eq` implementations return `false` unconditionally, but the
types also implement `Eq`. This violates the `Eq` contract (reflexivity: `a ==
a` must be true) and can break any code that relies on equality/hashing for
extension nodes (e.g., HashMap/HashSet usage, plan comparisons) during the
benchmark run. Implement a reflexive `PartialEq` (and make `Hash`/`PartialOrd`
consistent with it), even if it is only based on a small stable subset of
fields.
```suggestion
fn eq(&self, other: &Self) -> bool {
self.children.len() == other.children.len()
}
}
impl Eq for NoExprExtension {}
impl Hash for NoExprExtension {
fn hash<H: Hasher>(&self, state: &mut H) {
self.children.len().hash(state);
}
}
impl PartialOrd for NoExprExtension {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.children.len().cmp(&other.children.len()))
```
##########
datafusion/expr/benches/map_expressions.rs:
##########
@@ -0,0 +1,240 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Micro-benchmark for `LogicalPlan::map_expressions` on Extension nodes.
+//!
+//! Extension nodes can have many children but no expressions. When
+//! `expressions()` returns empty, `map_expressions` should skip the
+//! expensive clone-all-inputs + `with_exprs_and_inputs` rebuild.
+
+use arrow::datatypes::{DataType, Field, Schema};
+use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
+use datafusion_common::tree_node::{Transformed, TreeNode};
+use datafusion_common::{DFSchema, DFSchemaRef, Result};
+use datafusion_expr::{col, Extension, Expr, LogicalPlan,
UserDefinedLogicalNodeCore};
+use std::fmt;
+use std::hash::{Hash, Hasher};
+use std::sync::Arc;
+
+// ---------------------------------------------------------------------------
+// Extension node with NO expressions (e.g. OneOf in view matching)
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Clone)]
+struct NoExprExtension {
+ children: Vec<LogicalPlan>,
+ schema: DFSchemaRef,
+}
+
+impl PartialEq for NoExprExtension {
+ fn eq(&self, _: &Self) -> bool {
+ false
+ }
+}
+impl Eq for NoExprExtension {}
+
+impl Hash for NoExprExtension {
+ fn hash<H: Hasher>(&self, _: &mut H) {}
+}
+
+impl PartialOrd for NoExprExtension {
+ fn partial_cmp(&self, _: &Self) -> Option<std::cmp::Ordering> {
+ None
+ }
+}
+
+impl UserDefinedLogicalNodeCore for NoExprExtension {
+ fn name(&self) -> &str {
+ "NoExprExtension"
+ }
+ fn inputs(&self) -> Vec<&LogicalPlan> {
+ self.children.iter().collect()
+ }
+ fn schema(&self) -> &DFSchemaRef {
+ &self.schema
+ }
+ fn expressions(&self) -> Vec<Expr> {
+ vec![] // Key: no expressions
+ }
+ fn fmt_for_explain(&self, f: &mut fmt::Formatter) -> fmt::Result {
+ write!(f, "NoExprExtension(children={})", self.children.len())
+ }
+ fn with_exprs_and_inputs(
+ &self,
+ _: Vec<Expr>,
+ inputs: Vec<LogicalPlan>,
+ ) -> Result<Self> {
+ Ok(Self {
+ children: inputs,
+ schema: Arc::clone(&self.schema),
+ })
+ }
+}
+
+// ---------------------------------------------------------------------------
+// Extension node WITH expressions (control group)
+// ---------------------------------------------------------------------------
+
+#[derive(Debug, Clone)]
+struct WithExprExtension {
+ children: Vec<LogicalPlan>,
+ exprs: Vec<Expr>,
+ schema: DFSchemaRef,
+}
+
+impl PartialEq for WithExprExtension {
+ fn eq(&self, _: &Self) -> bool {
+ false
+ }
+}
+impl Eq for WithExprExtension {}
+
Review Comment:
Same issue as above: `PartialEq::eq` always returning `false` while also
implementing `Eq` violates the `Eq` contract. Please make `PartialEq` reflexive
and keep `Hash`/`PartialOrd` consistent for `WithExprExtension` as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]