[ https://issues.apache.org/jira/browse/SPARK-29806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971730#comment-16971730 ]
Hyukjin Kwon commented on SPARK-29806:
--------------------------------------
The {{multiLine}} option in the JSON source currently supports only a single
JSON object or a single JSON array.
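
Until that changes, one way to catch this case before loading is to count the top-level JSON values in each file and flag any file holding more than one. The following is only a sketch under stated assumptions: {{topLevelValueCount}} and {{suspectFiles}} are hypothetical helper names, Jackson is assumed to be on the classpath (it normally is in spark-shell), and {{wholeTextFiles}} reads each file fully into memory, so this only suits small inputs.

```scala
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import org.apache.spark.sql.SparkSession

// Hypothetical helper: counts the whitespace-separated top-level JSON values
// in a string. Malformed JSON makes the iterator throw instead of returning.
def topLevelValueCount(json: String): Int = {
  val mapper = new ObjectMapper()
  val parser = mapper.getFactory.createParser(json)
  try {
    val values = mapper.readValues(parser, classOf[JsonNode])
    var n = 0
    while (values.hasNext) { values.next(); n += 1 }
    n
  } finally parser.close()
}

// Hypothetical helper: returns the paths under `dir` whose content holds more
// than one top-level JSON value, i.e. files that look line-delimited.
def suspectFiles(spark: SparkSession, dir: String): Array[String] =
  spark.sparkContext
    .wholeTextFiles(dir) // (path, full file content) pairs
    .filter { case (_, content) => topLevelValueCount(content) > 1 }
    .keys
    .collect()
```

Any path returned by {{suspectFiles(spark, "/tmp/json")}} probably holds one record per line and is better read without {{multiLine}}.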
> Using multiline option for a JSON file which is not multiline results in
> silent truncation of data.
> ---------------------------------------------------------------------------------------------------
>
> Key: SPARK-29806
> URL: https://issues.apache.org/jira/browse/SPARK-29806
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Dilip Biswal
> Priority: Major
>
> The content of the input JSON file:
> {code:java}
> {"name":"John", "id":"100"}
> {"name":"Marry","id":"200"}{code}
> The above is a valid JSON file, but every record is on a single line. Reading
> this file with the multiLine option in FAILFAST mode results in data
> truncation without any error.
> {code:java}
> scala> spark.read.option("multiLine", true).option("mode",
> "FAILFAST").format("json").load("/tmp/json").show(false)
> +---+----+
> |id |name|
> +---+----+
> |100|John|
> +---+----+
> scala> spark.read.option("mode",
> "FAILFAST").format("json").load("/tmp/json").show(false)
> +---+-----+
> |id |name |
> +---+-----+
> |100|John |
> |200|Marry|
> +---+-----+{code}
> I think Spark should return an error in this case, especially in FAILFAST
> mode. This is likely a common user error, and we should not silently
> truncate data.