spark git commit: [SPARK-15342] [SQL] [PYSPARK] PySpark test for non ascii column name does not actually test with unicode column name

davies Wed, 18 May 2016 11:20:10 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-2.0 67c54721d -> d005f76e6



[SPARK-15342] [SQL] [PYSPARK] PySpark test for non ascii column name does not 
actually test with unicode column name

## What changes were proposed in this pull request?

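The PySpark SQL test `test_column_name_with_non_ascii` is meant to test a
non-ASCII column name, but it doesn't actually do so: under Python 2 the
unprefixed string literal is a byte string (`str`), not `unicode`. We need to
construct a unicode string explicitly using `unicode` under Python 2.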
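
As a rough illustration of the distinction (a sketch only, not part of the
patch; `to_text` is a hypothetical helper name), the following builds a text
string from UTF-8 bytes on either Python version. The bytes are written as
escapes so the example does not depend on the source file encoding:

```python
import sys


def to_text(raw, encoding="utf-8"):
    """Return a text (unicode) string on both Python 2 and Python 3.

    Hypothetical helper for illustration; it is not a PySpark API.
    """
    if sys.version_info[0] >= 3:
        text_type = str
    else:
        text_type = unicode  # noqa: F821 (this name exists only on Python 2)
    if isinstance(raw, text_type):
        return raw
    # A byte string must be decoded explicitly to become text.
    return raw.decode(encoding)


# UTF-8 bytes of the two-character column name used by the test.
column_name = to_text(b"\xe6\x95\xb0\xe9\x87\x8f")
```

On Python 2 the undecoded literal would have length 6 (bytes), while the
decoded text string has length 2 (characters), which is why the original test
never exercised a real unicode column name.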

## How was this patch tested?

Existing tests.

Author: Liang-Chi Hsieh <[email protected]>

Closes #13134 from viirya/correct-non-ascii-colname-pytest.

(cherry picked from commit 3d1e67f903ab3512fcad82b94b1825578f8117c9)
Signed-off-by: Davies Liu <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d005f76e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d005f76e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d005f76e

Branch: refs/heads/branch-2.0
Commit: d005f76e6c3a5a01153c0189e774b9717c1a51f9
Parents: 67c5472
Author: Liang-Chi Hsieh <[email protected]>
Authored: Wed May 18 11:18:33 2016 -0700
Committer: Davies Liu <[email protected]>
Committed: Wed May 18 11:19:08 2016 -0700

----------------------------------------------------------------------
 python/pyspark/sql/tests.py | 11 +++++++++--
 python/pyspark/sql/types.py |  3 ++-
 2 files changed, 11 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/d005f76e/python/pyspark/sql/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index e86f442..1790432 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -1044,8 +1044,15 @@ class SQLTests(ReusedPySparkTestCase):
         self.assertRaises(TypeError, lambda: df[{}])
 
     def test_column_name_with_non_ascii(self):
-        df = self.spark.createDataFrame([(1,)], ["数量"])
-        self.assertEqual(StructType([StructField("数量", LongType(), True)]), df.schema)
+        if sys.version >= '3':
+            columnName = "数量"
+            self.assertTrue(isinstance(columnName, str))
+        else:
+            columnName = unicode("数量", "utf-8")
+            self.assertTrue(isinstance(columnName, unicode))
+        schema = StructType([StructField(columnName, LongType(), True)])
+        df = self.spark.createDataFrame([(1,)], schema)
+        self.assertEqual(schema, df.schema)
         self.assertEqual("DataFrame[数量: bigint]", str(df))
         self.assertEqual([("数量", 'bigint')], df.dtypes)
         self.assertEqual(1, df.select("数量").first()[0])

http://git-wip-us.apache.org/repos/asf/spark/blob/d005f76e/python/pyspark/sql/types.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index 30ab130..7d8d023 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -27,7 +27,7 @@ from array import array
 
 if sys.version >= "3":
     long = int
-    unicode = str
+    basestring = unicode = str
 
 from py4j.protocol import register_input_converter
 from py4j.java_gateway import JavaClass
@@ -401,6 +401,7 @@ class StructField(DataType):
         False
         """
         assert isinstance(dataType, DataType), "dataType should be DataType"
+        assert isinstance(name, basestring), "field name should be string"
         if not isinstance(name, str):
             name = name.encode('utf-8')
         self.name = name


