This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 2e07662e24e2 [SPARK-46952][SQL] XML: Limit size of corrupt record
2e07662e24e2 is described below
commit 2e07662e24e243e7d1760ea063c9e88417bc873f
Author: Sandip Agarwala <[email protected]>
AuthorDate: Fri Feb 2 15:45:09 2024 +0900
[SPARK-46952][SQL] XML: Limit size of corrupt record
### What changes were proposed in this pull request?
Limit the size of the malformed XML string that gets stored in the corrupt
record column.
### Why are the changes needed?
A corrupt XML record can be arbitrarily large and may cause an out-of-memory (OOM) error.
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
Unit test
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #44994 from sandip-db/xml_limit_corrupt_record_size.
Authored-by: Sandip Agarwala <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
.../main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
index 2458d1772dab..674d5f63b039 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
@@ -145,7 +145,7 @@ class StaxXmlParser(
   def doParseColumn(xml: String,
       parseMode: ParseMode,
       xsdSchema: Option[Schema]): Option[InternalRow] = {
-    val xmlRecord = UTF8String.fromString(xml)
+    lazy val xmlRecord = UTF8String.fromString(xml)
     try {
       xsdSchema.foreach { schema =>
         schema.newValidator().validate(new StreamSource(new StringReader(xml)))
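
The one-line change above swaps `val` for `lazy val`, which defers the string conversion until `xmlRecord` is first read. The standalone sketch below illustrates that deferral only; it is not Spark code. `LazyValSketch`, `expensiveCopy`, and `parse` are hypothetical names, and the assumption that the record is read only on the failure path is an inference from the diff, not something the commit message states.

```scala
import java.nio.charset.StandardCharsets

object LazyValSketch {
  // Counts how many times the (hypothetical) expensive conversion actually runs.
  var conversions = 0

  // Hypothetical stand-in for UTF8String.fromString: copies the input into a byte array.
  def expensiveCopy(xml: String): Array[Byte] = {
    conversions += 1
    xml.getBytes(StandardCharsets.UTF_8)
  }

  // Hypothetical parser: Right(length) on success, Left(raw record) when the input is malformed.
  def parse(xml: String): Either[Array[Byte], Int] = {
    // Not evaluated here; the copy is made only if xmlRecord is read later.
    lazy val xmlRecord = expensiveCopy(xml)
    try {
      if (!xml.trim.startsWith("<")) throw new IllegalArgumentException("malformed record")
      Right(xml.length)
    } catch {
      // First (and only) read of xmlRecord: the conversion happens on this path.
      case _: IllegalArgumentException => Left(xmlRecord)
    }
  }

  def main(args: Array[String]): Unit = {
    parse("<a>1</a>")
    println(conversions) // 0: well-formed input never triggers the copy
    parse("not xml")
    println(conversions) // 1: the malformed input forces it once
  }
}
```

With a plain `val`, the counter would read 1 after the first call as well, since the copy would be made for every record whether or not it turned out to be malformed.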