lriggs opened a new issue, #50140:
URL: https://github.com/apache/arrow/issues/50140

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   # [C++][Gandiva] castVARCHAR(decimal128) can corrupt native memory and return invalid buffers.
   
   ## Describe the bug
   
   The Gandiva `castVARCHAR_decimal128_int64` function path can corrupt native
   memory and crash the host process (SIGSEGV) when the arena allocation for the
   output string fails, for example when a `CAST(decimal AS VARCHAR)` runs under
   memory pressure.
   
   There are three independent problems that combine to produce the crash:
   
   ### 1. `castVARCHAR` decimal128 entry is missing `kCanReturnErrors`
   
   In `function_registry_string.cc`, the `castVARCHAR` registry entry for
   `decimal128` is registered with only `NativeFunction::kNeedsContext`. Unlike
   the other error-producing cast/string functions, it does **not** set
   `NativeFunction::kCanReturnErrors`.
   
   Because of this, generated LLVM code assumes the function can never fail and
   skips the post-call error check. Any error the function reports via the
   context is silently ignored, and execution continues with whatever (invalid)
   buffer and length the function returned.
   
   ### 2. `gdv_fn_dec_to_string` reports a positive length on allocation failure
   
   In `gdv_function_stubs.cc`, `gdv_fn_dec_to_string` writes the output length
   *before* it checks whether the allocation succeeded:
   
   ```cpp
   *dec_str_len = static_cast<int32_t>(dec_str.length());   // positive length
   char* ret = reinterpret_cast<char*>(gdv_fn_context_arena_malloc(context, *dec_str_len));
   if (ret == nullptr) {
     // error is set, but *dec_str_len is still positive
     return nullptr;
   }
   ```
   
   When the allocation fails, the function returns `nullptr` while `*dec_str_len`
   still holds a positive value. The caller then copies from a null/invalid
   buffer using that positive length, i.e. effectively
   `memcpy(dst, nullptr, positive_len)`, which is undefined behavior and crashes.
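   
   One possible shape for a fix is to publish a positive length only after the
   allocation has succeeded. The sketch below is a minimal standalone model of
   that ordering, not a patch against `gdv_function_stubs.cc`:
   `ArenaMallocOrNull` and `DecToStringSketch` are hypothetical names, the
   allocator is plain `malloc` with a switch to force failure, and returning an
   empty string on failure is an assumption about the desired contract.
   
   ```cpp
   #include <cstdint>
   #include <cstdlib>
   #include <cstring>
   #include <string>
   
   // Hypothetical stand-in for the arena allocator; returns nullptr when
   // `fail` is set so the failure path can be exercised.
   char* ArenaMallocOrNull(int32_t size, bool fail) {
     return fail ? nullptr : static_cast<char*>(std::malloc(size));
   }
   
   // Sketch: the output length becomes positive only once the buffer exists.
   const char* DecToStringSketch(const std::string& dec_str, bool fail_alloc,
                                 int32_t* out_len) {
     const int32_t len = static_cast<int32_t>(dec_str.length());
     char* ret = ArenaMallocOrNull(len, fail_alloc);
     if (ret == nullptr) {
       *out_len = 0;  // never leave a positive length next to an unusable buffer
       return "";     // non-null, zero-length result (assumed contract)
     }
     std::memcpy(ret, dec_str.data(), len);
     *out_len = len;
     return ret;
   }
   ```
   
   With this ordering, a caller that ignores the error still sees a length of
   zero, so a later copy has nothing to read from the failed allocation.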
   
   ### 3. `castVARCHAR_decimal128_int64` does not validate its output length
   
   In `precompiled/decimal_wrapper.cc`, `castVARCHAR_decimal128_int64` computes
   the truncated length and dereferences/returns the buffer from
   `gdv_fn_dec_to_string` without:
   
   - validating that the requested output length (`out_len_param`) is non-negative, or
   - handling the case where the upstream allocation failed.
   
   A negative output length flows straight through into the output length used
   by the copy, which can produce a huge unsigned size when interpreted by the
   memory copy routine.
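   
   The two missing checks could be folded into a small length guard. The helper
   below is a hypothetical sketch (`SafeTruncatedLength` is not an existing
   Gandiva function, and the `-1` error sentinel is an assumption); it rejects a
   negative requested length and treats a null or empty upstream buffer as
   nothing to copy.
   
   ```cpp
   #include <algorithm>
   #include <cstdint>
   
   // Hypothetical helper: returns the number of bytes that are safe to copy,
   // or -1 when the requested length is invalid and an error should be raised.
   int32_t SafeTruncatedLength(int64_t requested_len, const char* src,
                               int32_t src_len) {
     if (requested_len < 0) {
       return -1;  // reject before the value can reach a size_t parameter
     }
     if (src == nullptr || src_len <= 0) {
       return 0;  // upstream allocation failed or produced nothing
     }
     // Truncate to the requested length, never exceeding the source length.
     return static_cast<int32_t>(
         std::min<int64_t>(requested_len, static_cast<int64_t>(src_len)));
   }
   ```
   
   For reference, converting an `int64_t` value of `-1` to `size_t` yields
   `SIZE_MAX`, which is why an unchecked negative length turns into an enormous
   copy size.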
   
   ### Component(s)
   
   C++, Gandiva

