spark git commit: [SPARK-13748][PYSPARK][DOC] Add the description for explicitly setting None for a named argument for a Row

srowen Sat, 07 Jan 2017 04:53:25 -0800

Repository: spark
Updated Branches:
  refs/heads/master d60f6f62d -> 68ea290b3



[SPARK-13748][PYSPARK][DOC] Add the description for explicitly setting None for 
a named argument for a Row

## What changes were proposed in this pull request?

A dict may omit a key to indicate that its value is `None` or missing, as 
below:

``` python
spark.createDataFrame([{"x": 1}, {"y": 2}]).show()
```

```
+----+----+
|   x|   y|
+----+----+
|   1|null|
|null|   2|
+----+----+
```

However, this is not allowed for `Row`, as below:

``` python
spark.createDataFrame([Row(x=1), Row(y=2)]).show()
```

```
16/06/19 16:25:56 ERROR Executor: Exception in task 6.0 in stage 66.0 (TID 316)
java.lang.IllegalStateException: Input row doesn't have expected number of values required by the schema. 2 fields are required while 1 values are provided.
    at org.apache.spark.sql.execution.python.EvaluatePython$.fromJava(EvaluatePython.scala:147)
    at org.apache.spark.sql.SparkSession$$anonfun$7.apply(SparkSession.scala:656)
    at org.apache.spark.sql.SparkSession$$anonfun$7.apply(SparkSession.scala:656)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:247)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:780)
```

This behaviour seems correct, but it can confuse users, as the report in this 
JIRA shows.
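
One way to avoid the error is to give every record the same set of fields, with 
absent values set explicitly to `None`, before building `Row` objects. The 
following is a minimal sketch in plain Python, not part of this patch; the 
helper name `fill_missing` is hypothetical.

```python
def fill_missing(records):
    """Return copies of the dicts in `records` in which every key seen in
    any record is present, with absent values set explicitly to None.

    `fill_missing` is a hypothetical helper, not a PySpark API.
    """
    # Collect every field name; sort so all records share one field order.
    fields = sorted({key for rec in records for key in rec})
    # dict.get returns None for a key the record does not have.
    return [{name: rec.get(name) for name in fields} for rec in records]


rows = fill_missing([{"x": 1}, {"y": 2}])
print(rows)  # [{'x': 1, 'y': None}, {'x': None, 'y': 2}]
```

Each resulting dict could then be turned into a `Row` with `Row(**rec)`, so 
that every `Row` should carry the same named arguments, as the updated 
docstring asks.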

This PR adds an explanation of this to the `Row` class docstring.

## How was this patch tested?

N/A

Author: hyukjinkwon <[email protected]>

Closes #13771 from HyukjinKwon/SPARK-13748.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/68ea290b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/68ea290b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/68ea290b

Branch: refs/heads/master
Commit: 68ea290b3aa89b2a539d13ea2c18bdb5a651b2bf
Parents: d60f6f6
Author: hyukjinkwon <[email protected]>
Authored: Sat Jan 7 12:52:41 2017 +0000
Committer: Sean Owen <[email protected]>
Committed: Sat Jan 7 12:52:41 2017 +0000

----------------------------------------------------------------------
 python/pyspark/sql/types.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/68ea290b/python/pyspark/sql/types.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index 4a02312..26b54a7 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -1389,7 +1389,9 @@ class Row(tuple):
     ``key in row`` will search through row keys.
 
     Row can be used to create a row object by using named arguments,
-    the fields will be sorted by names.
+    the fields will be sorted by names. It is not allowed to omit
+    a named argument to represent the value is None or missing. This should be
+    explicitly set to None in this case.
 
     >>> row = Row(name="Alice", age=11)
     >>> row


