zanmato1984 opened a new issue, #45254: URL: https://github.com/apache/arrow/issues/45254
### Describe the bug, including details regarding any error messages, version, and platform.

In #43389 I widened the offset within the row table to 64-bit and updated the references to the row offset, but I seem to have missed two places:

https://github.com/apache/arrow/blob/ef0056860c9780410b9766539c1e02055be6591d/cpp/src/arrow/acero/swiss_join.cc#L442

and

https://github.com/apache/arrow/blob/ef0056860c9780410b9766539c1e02055be6591d/cpp/src/arrow/acero/swiss_join.cc#L446

(`num_bytes` accumulates the sizes of the source row tables, so it can exceed `4GB`; the `static_cast` here then truncates it.)

Unfortunately, our existing test https://github.com/apache/arrow/blob/ef0056860c9780410b9766539c1e02055be6591d/cpp/src/arrow/acero/hash_join_node_test.cc#L3367 didn't catch this, because exposing the bug requires the matching row to be located in the area beyond 4GB, and that depends on the hash algorithm, which is opaque to the test.

We can confirm that the truncation actually happens by adding `DCHECK_LE(num_bytes, uint32_max)`.

### Component(s)

C++