mapleFU opened a new issue, #41923:
URL: https://github.com/apache/arrow/issues/41923
### Describe the enhancement requested
The `ExecBatch` documentation says it holds `Array` and `Scalar` values for execution:
```
/// \brief A unit of work for kernel execution. It contains a collection of
/// Array and Scalar values and an optional SelectionVector indicating that
/// there is an unmaterialized filter that either must be materialized, or (if
/// the kernel supports it) pushed down into the kernel implementation.
///
/// ExecBatch is semantically similar to RecordBatch in that in a SQL context
/// it represents a collection of records, but constant "columns" are
/// represented by Scalar values rather than having to be converted into arrays
/// with repeated values.
///
/// TODO: Datum uses arrow/util/variant.h which may be a bit heavier-weight
/// than is desirable for this class. Microbenchmarks would help determine for
/// sure. See ARROW-8928.
/// \addtogroup acero-internals
/// @{
struct ARROW_EXPORT ExecBatch {
```
However, some code implicitly shows that it might contain a chunked array, see:
```
Datum WrapResults(const std::vector<Datum>& inputs,
                  const std::vector<Datum>& outputs) override {
  // If execution yielded multiple chunks (because large arrays were split
  // based on the ExecContext parameters, then the result is a ChunkedArray
  if (HaveChunkedArray(inputs) || outputs.size() > 1) {
    return ToChunkedArray(outputs, output_type_);
  } else {
    // Outputs have just one element
    return outputs[0];
  }
}
```
And:
```
ExecBatch ExecBatch::Slice(int64_t offset, int64_t length) const {
  ExecBatch out = *this;
  for (auto& value : out.values) {
    if (value.is_scalar()) {
      // keep value as is
    } else if (value.is_array()) {
      value = value.array()->Slice(offset, length);
    } else if (value.is_chunked_array()) {
      value = value.chunked_array()->Slice(offset, length);
    } else {
      ARROW_DCHECK(false);
    }
  }
  out.length = std::min(length, this->length - offset);
  return out;
}
```
If the input contains a chunked array, a problem might happen here:
```
bool all_scalar = true;
for (size_t i = 0; i < arguments.size(); ++i) {
  ARROW_ASSIGN_OR_RAISE(
      arguments[i], ExecuteScalarExpression(call->arguments[i], input,
                                            exec_context));
  if (arguments[i].is_array()) {
    all_scalar = false;
  }
}
int64_t input_length;
if (!arguments.empty() && all_scalar) {
  // all inputs are scalar, so use a 1-long batch to avoid
  // computing input.length equivalent outputs
  input_length = 1;
} else {
  input_length = input.length;
}
```
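The `arguments[i].is_array()` check should be based on `is_scalar()` instead
(clearing `all_scalar` whenever an argument is not a scalar) if non-array,
non-scalar values such as chunked arrays can appear here.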
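
To illustrate the difference, here is a minimal sketch that does not use Arrow itself. `Kind`, `AllScalarCurrent`, `AllScalarProposed` and `InputLength` are hypothetical stand-ins for the `Datum` kind and for the loop quoted above, not Arrow APIs. It only models how the two predicates treat a chunked-array argument.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the kinds of value a Datum may hold
// (assumption: only these three matter for this issue).
enum class Kind { Scalar, Array, ChunkedArray };

// Mirrors the current check: only a plain array clears the flag,
// so a chunked array is silently treated as if it were a scalar.
bool AllScalarCurrent(const std::vector<Kind>& args) {
  bool all_scalar = true;
  for (Kind k : args) {
    if (k == Kind::Array) all_scalar = false;
  }
  return all_scalar;
}

// Proposed check: anything that is not a scalar clears the flag.
bool AllScalarProposed(const std::vector<Kind>& args) {
  bool all_scalar = true;
  for (Kind k : args) {
    if (k != Kind::Scalar) all_scalar = false;
  }
  return all_scalar;
}

// Same length selection as in the snippet above: a 1-long batch is used
// only when there are arguments and all of them are scalars.
int64_t InputLength(bool has_args, bool all_scalar, int64_t batch_length) {
  return (has_args && all_scalar) ? 1 : batch_length;
}
```

With a single chunked-array argument and a batch of length 100, the current predicate would select an input length of 1, while the proposed one keeps 100.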
### Component(s)
C++