Repository: spark
Updated Branches:
  refs/heads/branch-1.6 d482dced3 -> bad93d9f3


[SPARK-11902][ML] Unhandled case in VectorAssembler#transform

The transform method of VectorAssembler has an unhandled case: an input 
column whose type is not one of the supported types (DoubleType, 
NumericType, BooleanType or VectorUDT).

So, if you try to transform a column of StringType you get a cryptic 
"scala.MatchError: StringType".

This PR fixes this by throwing a SparkException with a descriptive message 
when an input column has an unsupported type.
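
As a rough illustration of the failure mode and the fix, here is a minimal 
sketch that does not depend on Spark. ColType, DoubleT, BooleanT, StringT, 
AssemblerException and MatchSketch are hypothetical stand-ins invented for 
this example; they are not Spark classes, and the real change is the 
two-line diff further down.

```scala
// Hypothetical stand-ins for Spark SQL data types (not the real classes).
trait ColType
case object DoubleT extends ColType
case object BooleanT extends ColType
case object StringT extends ColType

// Hypothetical stand-in for org.apache.spark.SparkException.
class AssemblerException(msg: String) extends Exception(msg)

object MatchSketch {
  // Before: no fallback case, so an unlisted type raises scala.MatchError.
  def describeBefore(t: ColType): String = t match {
    case DoubleT  => "numeric"
    case BooleanT => "boolean"
  }

  // After: a catch-all case turns the failure into a descriptive exception.
  def describeAfter(t: ColType): String = t match {
    case DoubleT  => "numeric"
    case BooleanT => "boolean"
    case otherType =>
      throw new AssemblerException(
        s"VectorAssembler does not support the $otherType type")
  }
}
```

In this sketch, MatchSketch.describeBefore(StringT) fails with a 
scala.MatchError, while MatchSketch.describeAfter(StringT) fails with an 
exception whose message names the offending type.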

Author: BenFradet <[email protected]>

Closes #9885 from BenFradet/SPARK-11902.

(cherry picked from commit 4be360d4ee6cdb4d06306feca38ddef5212608cf)
Signed-off-by: Xiangrui Meng <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bad93d9f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bad93d9f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bad93d9f

Branch: refs/heads/branch-1.6
Commit: bad93d9f3a24a7ee024541569c6f3de88aad2fda
Parents: d482dce
Author: BenFradet <[email protected]>
Authored: Sun Nov 22 22:05:01 2015 -0800
Committer: Xiangrui Meng <[email protected]>
Committed: Sun Nov 22 22:05:13 2015 -0800

----------------------------------------------------------------------
 .../org/apache/spark/ml/feature/VectorAssembler.scala    |  2 ++
 .../apache/spark/ml/feature/VectorAssemblerSuite.scala   | 11 +++++++++++
 2 files changed, 13 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/bad93d9f/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala
index 0feec05..801096f 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala
@@ -84,6 +84,8 @@ class VectorAssembler(override val uid: String)
             val numAttrs = group.numAttributes.getOrElse(first.getAs[Vector](index).size)
             Array.fill(numAttrs)(NumericAttribute.defaultAttr)
           }
+        case otherType =>
+          throw new SparkException(s"VectorAssembler does not support the $otherType type")
       }
     }
     val metadata = new AttributeGroup($(outputCol), attrs).toMetadata()

http://git-wip-us.apache.org/repos/asf/spark/blob/bad93d9f/mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala
----------------------------------------------------------------------
diff --git a/mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala
index fb21ab6..9c1c00f 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala
@@ -69,6 +69,17 @@ class VectorAssemblerSuite
     }
   }
 
+  test("transform should throw an exception in case of unsupported type") {
+    val df = sqlContext.createDataFrame(Seq(("a", "b", "c"))).toDF("a", "b", "c")
+    val assembler = new VectorAssembler()
+      .setInputCols(Array("a", "b", "c"))
+      .setOutputCol("features")
+    val thrown = intercept[SparkException] {
+      assembler.transform(df)
+    }
+    assert(thrown.getMessage contains "VectorAssembler does not support the StringType type")
+  }
+
   test("ML attributes") {
     val browser = NominalAttribute.defaultAttr.withValues("chrome", "firefox", "safari")
     val hour = NumericAttribute.defaultAttr.withMin(0.0).withMax(24.0)


