mapleFU opened a new issue, #41923:
URL: https://github.com/apache/arrow/issues/41923

   ### Describe the enhancement requested
   
   The `ExecBatch` documentation says it holds only `Array` or `Scalar` values for execution:
   
   ```
   /// \brief A unit of work for kernel execution. It contains a collection of
   /// Array and Scalar values and an optional SelectionVector indicating that
   /// there is an unmaterialized filter that either must be materialized, or (if
   /// the kernel supports it) pushed down into the kernel implementation.
   ///
   /// ExecBatch is semantically similar to RecordBatch in that in a SQL context
   /// it represents a collection of records, but constant "columns" are
   /// represented by Scalar values rather than having to be converted into arrays
   /// with repeated values.
   ///
   /// TODO: Datum uses arrow/util/variant.h which may be a bit heavier-weight
   /// than is desirable for this class. Microbenchmarks would help determine for
   /// sure. See ARROW-8928.
   
   /// \addtogroup acero-internals
   /// @{
   
   struct ARROW_EXPORT ExecBatch {
   ```
   
   However, some code implicitly suggests that it might also contain a chunked array, see:
   
   ```
     Datum WrapResults(const std::vector<Datum>& inputs,
                       const std::vector<Datum>& outputs) override {
       // If execution yielded multiple chunks (because large arrays were split
       // based on the ExecContext parameters, then the result is a ChunkedArray
       if (HaveChunkedArray(inputs) || outputs.size() > 1) {
         return ToChunkedArray(outputs, output_type_);
       } else {
         // Outputs have just one element
         return outputs[0];
       }
     }
   ```
   
   And:
   
   ```
   ExecBatch ExecBatch::Slice(int64_t offset, int64_t length) const {
     ExecBatch out = *this;
     for (auto& value : out.values) {
       if (value.is_scalar()) {
         // keep value as is
       } else if (value.is_array()) {
         value = value.array()->Slice(offset, length);
       } else if (value.is_chunked_array()) {
         value = value.chunked_array()->Slice(offset, length);
       } else {
         ARROW_DCHECK(false);
       }
     }
     out.length = std::min(length, this->length - offset);
     return out;
   }
   ```
   
   If the input contains a chunked array, a problem might occur here:
   
   ```
     bool all_scalar = true;
     for (size_t i = 0; i < arguments.size(); ++i) {
       ARROW_ASSIGN_OR_RAISE(
           arguments[i], ExecuteScalarExpression(call->arguments[i], input, exec_context));
       if (arguments[i].is_array()) {
         all_scalar = false;
       }
     }
   
     int64_t input_length;
     if (!arguments.empty() && all_scalar) {
       // all inputs are scalar, so use a 1-long batch to avoid
       // computing input.length equivalent outputs
       input_length = 1;
     } else {
       input_length = input.length;
     }
   ```
   
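   The `arguments[i].is_array()` check should instead be based on `is_scalar()` (that is, clear `all_scalar` whenever `!arguments[i].is_scalar()`) if non-array, non-scalar values such as chunked arrays can appear here.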
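   
   To illustrate the concern, here is a minimal standalone sketch that does not use Arrow itself. The `Kind` enum and the helpers `AllScalarByIsArray`, `AllScalarByIsScalar`, and `InputLength` are hypothetical stand-ins for the datum kind and the logic quoted above, not real Arrow APIs. It compares the current `is_array()`-style check with an `is_scalar()`-style check when one argument is a chunked array:
   
   ```cpp
   #include <cstdint>
   #include <vector>
   
   // Hypothetical stand-in for the kind of a Datum; not the real Arrow type.
   enum class Kind { Scalar, Array, ChunkedArray };
   
   // Mirrors the quoted check: only a plain Array clears all_scalar.
   bool AllScalarByIsArray(const std::vector<Kind>& args) {
     bool all_scalar = true;
     for (Kind k : args) {
       if (k == Kind::Array) all_scalar = false;
     }
     return all_scalar;
   }
   
   // Suggested check: anything that is not a Scalar clears all_scalar.
   bool AllScalarByIsScalar(const std::vector<Kind>& args) {
     bool all_scalar = true;
     for (Kind k : args) {
       if (k != Kind::Scalar) all_scalar = false;
     }
     return all_scalar;
   }
   
   // Same length selection as the quoted code: a 1-long batch is used
   // only when there are arguments and all of them are scalar.
   int64_t InputLength(bool has_args, bool all_scalar, int64_t batch_length) {
     return (has_args && all_scalar) ? 1 : batch_length;
   }
   ```
   
   With arguments `{Kind::Scalar, Kind::ChunkedArray}` and a batch length of 100, the `is_array()`-style check reports all arguments as scalar and selects an input length of 1, while the `is_scalar()`-style check keeps the input length at 100.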
   
   ### Component(s)
   
   C++

