zanmato1984 opened a new issue, #45254:
URL: https://github.com/apache/arrow/issues/45254

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   In #43389 I widened the offsets within the row table to 64 bits and updated 
the references to row offsets, but I seem to have missed two places:
   
https://github.com/apache/arrow/blob/ef0056860c9780410b9766539c1e02055be6591d/cpp/src/arrow/acero/swiss_join.cc#L442
   and
   
https://github.com/apache/arrow/blob/ef0056860c9780410b9766539c1e02055be6591d/cpp/src/arrow/acero/swiss_join.cc#L446
   (`num_bytes` is the accumulated size of all source row tables, so it can 
exceed `4GB`; the `static_cast` here apparently truncates that value.)
   
   Unfortunately our existing test 
https://github.com/apache/arrow/blob/ef0056860c9780410b9766539c1e02055be6591d/cpp/src/arrow/acero/hash_join_node_test.cc#L3367
 didn't catch this, because exposing the bug requires a matching row to be 
located in the region beyond 4GB, and that placement depends on the hash 
algorithm, which is opaque. We can confirm that the truncation actually 
happens by adding `DCHECK_LE(num_bytes, uint32_max)`.
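
To illustrate the kind of truncation described above, here is a minimal standalone sketch. It is not the code from `swiss_join.cc`: the helper names `AccumulateBytes`, `TruncatedOffset`, and `FitsInUint32` are hypothetical, and the only thing taken from the issue is the pattern of a 64-bit accumulated byte count being narrowed to 32 bits with `static_cast`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not from Arrow): sums the byte sizes of several
// source row tables into a 64-bit accumulator, like `num_bytes` above.
inline int64_t AccumulateBytes(const std::vector<int64_t>& source_sizes) {
  int64_t num_bytes = 0;
  for (int64_t size : source_sizes) {
    num_bytes += size;
  }
  return num_bytes;
}

// Hypothetical helper: the narrowing pattern the issue points at.
// Converting to uint32_t keeps only the low 32 bits (value modulo 2^32).
inline uint32_t TruncatedOffset(int64_t num_bytes) {
  return static_cast<uint32_t>(num_bytes);
}

// Hypothetical helper: the condition a check such as
// DCHECK_LE(num_bytes, <max uint32_t>) would assert.
inline bool FitsInUint32(int64_t num_bytes) {
  return num_bytes >= 0 &&
         num_bytes <=
             static_cast<int64_t>(std::numeric_limits<uint32_t>::max());
}
```

With three source tables of 2 GiB each, the accumulated size is 6 GiB, which does not fit in 32 bits; the narrowed value silently wraps to 2 GiB, so any offset derived from it would point at the wrong row without raising an error.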
   
   ### Component(s)
   
   C++


