[ 
https://issues.apache.org/jira/browse/SPARK-29806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971730#comment-16971730
 ] 

Hyukjin Kwon commented on SPARK-29806:
--------------------------------------

The {{multiLine}} option in the JSON source currently supports only a single 
JSON object or a single JSON array per file.
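
As a rough illustration of that constraint, the sketch below counts the top-level JSON values in a file's text before it is read with {{multiLine}}. It is plain Python using only the standard json module, not Spark code. The function name count_top_level_json_values and the sample strings are hypothetical, and the check only approximates what Spark's parser accepts.

```python
import json


def count_top_level_json_values(text: str) -> int:
    """Count whitespace-separated top-level JSON values in text.

    Raises json.JSONDecodeError if any value is malformed.
    """
    decoder = json.JSONDecoder()
    pos, count, n = 0, 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < n and text[pos].isspace():
            pos += 1
        if pos >= n:
            return count
        # Parse one value starting at pos; the second item is the end index.
        _, pos = decoder.raw_decode(text, pos)
        count += 1


# Sample data mirroring the file from the issue: one record per line.
records = '{"name":"John", "id":"100"}\n{"name":"Marry","id":"200"}'

# Wrapping the records in a single array gives one top-level value.
wrapped = "[" + ",".join(l for l in records.splitlines() if l.strip()) + "]"

print(count_top_level_json_values(records))
print(count_top_level_json_values(wrapped))
```

A count greater than one suggests the file is line-delimited and should be read without {{multiLine}}, or rewritten as a single array as in the wrapped string.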

> Using multiline option for a JSON file which is not multiline results in 
> silent truncation of data.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-29806
>                 URL: https://issues.apache.org/jira/browse/SPARK-29806
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Dilip Biswal
>            Priority: Major
>
> The content of the input JSON file:
> {code:java}
> {"name":"John", "id":"100"}
> {"name":"Marry","id":"200"}{code}
> The above is a valid JSON file in which every record is on a single line. 
> But reading this file with the multiLine option in FAILFAST mode results in 
> data truncation without any error.
> {code:java}
> scala> spark.read.option("multiLine", true).option("mode", 
> "FAILFAST").format("json").load("/tmp/json").show(false)
> +---+----+
> |id |name|
> +---+----+
> |100|John|
> +---+----+
> scala> spark.read.option("mode", 
> "FAILFAST").format("json").load("/tmp/json").show(false)
> +---+-----+
> |id |name |
> +---+-----+
> |100|John |
> |200|Marry|
> +---+-----+{code}
> I think Spark should return an error in this case, especially in FAILFAST 
> mode. This can be a common user error, and we should not silently truncate 
> data.


