rdblue commented on code in PR #9176:
URL: https://github.com/apache/iceberg/pull/9176#discussion_r1468947657
##########
api/src/main/java/org/apache/iceberg/expressions/ValueAggregate.java:
##########
@@ -30,13 +30,16 @@ protected ValueAggregate(Operation op, BoundTerm<T> term) {
@Override
public T eval(StructLike struct) {
- return term().eval(struct);
+ if (struct.size() > 1) {
+ throw new UnsupportedOperationException("Expected struct like of size
1");
+ }
+
+ return (T) struct.get(0, term().type().typeId().javaClass());
}
@Override
public T eval(DataFile file) {
- valueStruct.setValue(evaluateRef(file));
- return term().eval(valueStruct);
+ return (T) evaluateRef(file);
Review Comment:
The case that I think is missing is when the term is not a simple reference
and is instead a transformed value. That's why the `SingleValueStruct` class is
here. The idea is to evaluate the original term (which could be a transform of
a field) on the value that was retrieved from the `DataFile` metadata.
Also, it's good to keep in mind what each `eval` method is for. The
`eval(StructLike)` method is for evaluating the aggregate on individual rows.
That's because this is generic and we can use the aggregate framework to
calculate aggregates of normal rows. That means the change to that method isn't
correct. That eval method should call the accessor that was bound to the schema
of incoming rows.
This eval method, `eval(DataFile)` is called to produce a value for a set of
rows. As I said, that value may be then transformed by the expression so we
need to run the expression term on it. To be able to run the term, we need a
row that can supply the value that came from the `DataFile` and that's where
`SingleValueStruct` comes in: it returns the `DataFile` value for _any_
position that is queried. It should work with any accessor. The problem is that
I didn't consider nested cases when writing it. It always returns the value,
but it should return itself if the caller expected the value to be nested.
Here's an alternative fix for `SingleValueStruct` that keeps the original
`eval` implementations but handles nested values:
```java
public <T> T get(int pos, Class<T> javaClass) {
if (javaClass.isAssignableFrom(StructLike.class)) {
return (T) this;
} else {
return (T) value;
}
}
```
The tests from this PR pass with that change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]