This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.1 by this push:
new 4274a25f4d0f [SPARK-54625][SQL] UTF8String#reverse should check offset and length on copying
4274a25f4d0f is described below
commit 4274a25f4d0f910a4a718b152a5c6a27352ef963
Author: Kousuke Saruta <[email protected]>
AuthorDate: Sat Dec 6 07:21:21 2025 -0800
[SPARK-54625][SQL] UTF8String#reverse should check offset and length on copying
### What changes were proposed in this pull request?
This PR checks the offset and length when copying bytes in `UTF8String#reverse`.
For details, see
https://lists.apache.org/thread/d9pvkh3jbsq8lc33v75kmwq5wg57422h (readable only
by PMC members after login).
To avoid a performance regression, this PR chooses to check the offset and
length rather than validating the input UTF-8 string.
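The idea can be sketched outside Spark as follows. This is only an illustration, not the patched method: `ReverseSketch`, `declaredLength`, and `reverse` are hypothetical names, `declaredLength` is a simplified stand-in for `numBytesForFirstByte`, and the sketch works on a plain `byte[]` with `System.arraycopy` instead of `copyMemory`. Because `System.arraycopy` is bounds-checked, the sketch clamps the length to the bytes remaining after position `i`, which is stricter than the `Math.min(..., numBytes)` used in this patch.

```java
final class ReverseSketch {
  // Simplified stand-in for numBytesForFirstByte (an assumption, not Spark's table):
  // the sequence length declared by a UTF-8 lead byte.
  static int declaredLength(byte b) {
    int v = b & 0xFF;
    if (v < 0x80) return 1;               // 0xxxxxxx
    if (v >= 0xC0 && v < 0xE0) return 2;  // 110xxxxx
    if (v >= 0xE0 && v < 0xF0) return 3;  // 1110xxxx
    if (v >= 0xF0 && v < 0xF8) return 4;  // 11110xxx
    return 1;                             // continuation or invalid lead byte
  }

  // Reverses the order of characters while keeping each character's bytes in order.
  static byte[] reverse(byte[] src) {
    byte[] result = new byte[src.length];
    int i = 0; // position in bytes
    while (i < src.length) {
      // Never trust the declared length beyond what is actually left in the input.
      int len = Math.min(declaredLength(src[i]), src.length - i);
      // Non-negative because len <= src.length - i.
      int targetOffset = result.length - i - len;
      System.arraycopy(src, i, result, targetOffset, len);
      i += len;
    }
    return result;
  }
}
```

With valid input the clamp never triggers, so the result is the same as without it. With a truncated sequence such as the bytes `0x61 0xE3`, the lead byte `0xE3` declares three bytes but only one remains, so a single byte is copied and the output is still invalid UTF-8, only without an out-of-range copy.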
### Why are the changes needed?
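For safety. Without the check, a lead byte in an invalid UTF-8 string can
declare a length larger than the bytes remaining, which makes the target offset
of the copy negative.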
### Does this PR introduce _any_ user-facing change?
Yes, but it doesn't break compatibility.
### How was this patch tested?
The example queries mentioned in [this
thread](https://lists.apache.org/thread/d9pvkh3jbsq8lc33v75kmwq5wg57422h) work,
even though the results are broken.
All the operations defined in `UTF8String` are expected to work correctly
with valid UTF-8 strings, so broken results with invalid UTF-8 strings
are reasonable.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #53366 from sarutak/fix-utf8-reverse.
Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e2722b82ca68dc4d9dc843ee8747202087140b95)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../src/main/java/org/apache/spark/unsafe/types/UTF8String.java | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
index 87d004040c3a..96b103ae3388 100644
--- a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
+++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
@@ -1160,9 +1160,10 @@ public final class UTF8String implements Comparable<UTF8String>, Externalizable,
     int i = 0; // position in byte
     while (i < numBytes) {
-      int len = numBytesForFirstByte(getByte(i));
+      int len = Math.min(numBytesForFirstByte(getByte(i)), numBytes);
+      int targetOffset = Math.max(result.length - i - len, 0);
       copyMemory(this.base, this.offset + i, result,
-        BYTE_ARRAY_OFFSET + result.length - i - len, len);
+        BYTE_ARRAY_OFFSET + targetOffset, len);
       i += len;
     }