mapleFU opened a new issue, #41923: URL: https://github.com/apache/arrow/issues/41923
### Describe the enhancement requested

The documentation for `ExecBatch` says it holds `Array` or `Scalar` values for execution:

```
/// \brief A unit of work for kernel execution. It contains a collection of
/// Array and Scalar values and an optional SelectionVector indicating that
/// there is an unmaterialized filter that either must be materialized, or (if
/// the kernel supports it) pushed down into the kernel implementation.
///
/// ExecBatch is semantically similar to RecordBatch in that in a SQL context
/// it represents a collection of records, but constant "columns" are
/// represented by Scalar values rather than having to be converted into arrays
/// with repeated values.
///
/// TODO: Datum uses arrow/util/variant.h which may be a bit heavier-weight
/// than is desirable for this class. Microbenchmarks would help determine for
/// sure. See ARROW-8928.

/// \addtogroup acero-internals
/// @{

struct ARROW_EXPORT ExecBatch {
```

However, some code implicitly suggests that it might also contain chunked arrays. See:

```
  Datum WrapResults(const std::vector<Datum>& inputs,
                    const std::vector<Datum>& outputs) override {
    // If execution yielded multiple chunks (because large arrays were split
    // based on the ExecContext parameters, then the result is a ChunkedArray
    if (HaveChunkedArray(inputs) || outputs.size() > 1) {
      return ToChunkedArray(outputs, output_type_);
    } else {
      // Outputs have just one element
      return outputs[0];
    }
  }
```

And:

```
ExecBatch ExecBatch::Slice(int64_t offset, int64_t length) const {
  ExecBatch out = *this;
  for (auto& value : out.values) {
    if (value.is_scalar()) {
      // keep value as is
    } else if (value.is_array()) {
      value = value.array()->Slice(offset, length);
    } else if (value.is_chunked_array()) {
      value = value.chunked_array()->Slice(offset, length);
    } else {
      ARROW_DCHECK(false);
    }
  }
  out.length = std::min(length, this->length - offset);
  return out;
}
```

If the input contains a chunked array, a problem might occur here:

```
  bool all_scalar = true;
  for (size_t i = 0; i < arguments.size(); ++i) {
    ARROW_ASSIGN_OR_RAISE(
        arguments[i], ExecuteScalarExpression(call->arguments[i], input, exec_context));
    if (arguments[i].is_array()) {
      all_scalar = false;
    }
  }

  int64_t input_length;
  if (!arguments.empty() && all_scalar) {
    // all inputs are scalar, so use a 1-long batch to avoid
    // computing input.length equivalent outputs
    input_length = 1;
  } else {
    input_length = input.length;
  }
```

If values that are neither arrays nor scalars (such as chunked arrays) can appear here, the `arguments[i].is_array()` check should be based on `is_scalar()` instead, that is, `!arguments[i].is_scalar()`. Otherwise a chunked-array argument leaves `all_scalar` set to `true`, and `input_length` is wrongly set to 1.

### Component(s)

C++
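
As a rough illustration of the `all_scalar` concern, the following self-contained sketch compares the two checks. `FakeDatum`, `Kind`, `AllScalarViaIsArray`, `AllScalarViaIsScalar`, and `InputLength` are hypothetical stand-ins written for this example, not Arrow APIs; only the `is_scalar()` / `is_array()` method names mirror `Datum`. It assumes a chunked array can actually reach this code path, which is the open question of this issue.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for arrow::Datum; only the value kind is modeled.
enum class Kind { Scalar, Array, ChunkedArray };

struct FakeDatum {
  Kind kind;
  bool is_scalar() const { return kind == Kind::Scalar; }
  bool is_array() const { return kind == Kind::Array; }
};

// Mirrors the quoted check: only a plain Array clears all_scalar,
// so a ChunkedArray argument is silently treated as a scalar.
bool AllScalarViaIsArray(const std::vector<FakeDatum>& args) {
  bool all_scalar = true;
  for (const auto& arg : args) {
    if (arg.is_array()) all_scalar = false;
  }
  return all_scalar;
}

// Suggested check: any value that is not a Scalar clears all_scalar.
bool AllScalarViaIsScalar(const std::vector<FakeDatum>& args) {
  bool all_scalar = true;
  for (const auto& arg : args) {
    if (!arg.is_scalar()) all_scalar = false;
  }
  return all_scalar;
}

// Same length selection as in the quoted snippet.
int64_t InputLength(bool all_scalar, bool has_args, int64_t batch_length) {
  return (has_args && all_scalar) ? 1 : batch_length;
}
```

With one scalar and one chunked-array argument, the `is_array()`-based check leaves `all_scalar` as `true` and the sketch selects a length of 1, while the `is_scalar()`-based check selects the full batch length.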