This is an automated email from the ASF dual-hosted git repository.
yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 05c87e51a5e5 [SPARK-48644][SQL] Do a length check and throw
COLLECTION_SIZE_LIMIT_EXCEEDED error in Hex.hex
05c87e51a5e5 is described below
commit 05c87e51a5e50d1c156211848693b66937f12a8f
Author: Kent Yao <[email protected]>
AuthorDate: Tue Jun 18 13:19:34 2024 +0800
[SPARK-48644][SQL] Do a length check and throw
COLLECTION_SIZE_LIMIT_EXCEEDED error in Hex.hex
### What changes were proposed in this pull request?
Hex-encoding a byte array needs a length check because the output array is
twice the size of the input. If the output length would exceed Int.MaxValue,
we now report the required length and the limit instead of throwing
NegativeArraySizeException.
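As a rough illustration of why the multiplication has to be widened to `Long` before the check, here is a minimal standalone sketch. The names `HexLengthCheck` and `checkedHexLength` are hypothetical and are not part of Spark. The real patch throws `QueryExecutionErrors.tooManyArrayElementsError` rather than returning an `Either`.

```scala
// Illustrative sketch only: `HexLengthCheck` and `checkedHexLength` are
// hypothetical names, not Spark APIs.
object HexLengthCheck {
  // Returns the output length needed to hex-encode `inputLength` bytes, or
  // Left(required length) when that length does not fit in an Int.
  def checkedHexLength(inputLength: Int): Either[Long, Int] = {
    val target = inputLength * 2L // widen to Long so the product cannot wrap
    if (target > Int.MaxValue) Left(target) else Right(target.toInt)
  }

  def main(args: Array[String]): Unit = {
    val n = Int.MaxValue / 2 + 1 // 1073741824 input bytes
    println(n * 2)               // Int arithmetic wraps to -2147483648
    println(checkedHexLength(n)) // Left(2147483648)
    println(checkedHexLength(4)) // Right(8)
  }
}
```

With plain `Int` arithmetic the product wraps to a negative number, which is what previously surfaced as a NegativeArraySizeException when allocating the array.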
### Why are the changes needed?
Improve error handling.
### Does this PR introduce _any_ user-facing change?
Yes. When a large string or binary value is converted to a hex string and the
result would exceed the maximum array length, a different error is raised.
### How was this patch tested?
Tested manually, without adding a unit test, because such a test case
consumes a lot of memory.
```
org.apache.spark.sql.catalyst.expressions.Hex.hex((" " * (Int.MaxValue / 2 + 1)).getBytes)
org.apache.spark.SparkIllegalArgumentException: [COLLECTION_SIZE_LIMIT_EXCEEDED.INITIALIZE] Can't create array with 2147483648 elements which exceeding the array size limit 2147483647, cannot initialize an array with specified parameters. SQLSTATE: 54000
  at org.apache.spark.sql.errors.QueryExecutionErrors$.tooManyArrayElementsError(QueryExecutionErrors.scala:2517)
  at org.apache.spark.sql.catalyst.expressions.Hex$.hex(mathExpressions.scala:1042)
  ... 42 elided
```
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #47001 from yaooqinn/SPARK-48644.
Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
.../apache/spark/sql/catalyst/expressions/mathExpressions.scala | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
index 20bedeb04098..5981b42aead8 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
@@ -1034,7 +1034,14 @@ object Hex {
def hex(bytes: Array[Byte]): UTF8String = {
val length = bytes.length
- val value = new Array[Byte](length * 2)
+ if (length == 0) {
+ return UTF8String.EMPTY_UTF8
+ }
+ val targetLength = length * 2L
+ if (targetLength > Int.MaxValue) {
+ throw QueryExecutionErrors.tooManyArrayElementsError(targetLength, Int.MaxValue)
+ }
+ val value = new Array[Byte](targetLength.toInt)
var i = 0
while (i < length) {
value(i * 2) = hexDigits((bytes(i) & 0xF0) >> 4)