Repository: spark
Updated Branches:
  refs/heads/master 60336e3bc -> b3abf0b8d


[SPARK-7663] [MLLIB] Add requirement for word2vec model

JIRA issue [link](https://issues.apache.org/jira/browse/SPARK-7663).

We should check the vocabulary size of the word2vec model, to guard against an unexpectedly empty vocabulary.
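To make the failure mode concrete, below is a minimal Scala sketch of a count, filter, sort vocabulary step. It uses plain collections and no Spark, and it is an illustrative stand-in rather than MLlib's implementation. `WordCount`, `VocabSketch`, and `buildVocab` are hypothetical names. The counting and the `minCount` filter are assumptions; only the descending `sortWith((a, b) => a.cn > b.cn)` ordering comes from the code this commit touches. The sketch shows that once `minCount` exceeds every word's count, the vocabulary comes out empty.

```scala
// Hypothetical stand-in for a vocabulary entry: a word plus its corpus count `cn`.
final case class WordCount(word: String, cn: Int)

// Hypothetical helper; not part of the MLlib API.
object VocabSketch {
  /** Counts words, drops those seen fewer than `minCount` times,
    * and sorts the survivors by descending count. */
  def buildVocab(sentences: Seq[Seq[String]], minCount: Int): Array[WordCount] =
    sentences.flatten
      .groupBy(identity)
      .map { case (w, occurrences) => WordCount(w, occurrences.size) }
      .filter(_.cn >= minCount)
      .toArray
      .sortWith((a, b) => a.cn > b.cn)

  def main(args: Array[String]): Unit = {
    // Word counts in this corpus: a -> 3, b -> 2, c -> 1
    val corpus = Seq(Seq("a", "b", "a"), Seq("b", "c", "a"))
    println(buildVocab(corpus, minCount = 2).map(_.word).mkString(",")) // a,b
    // A minCount above every count filters out all words: the empty case.
    println(buildVocab(corpus, minCount = 5).length) // 0
  }
}
```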

CC srowen.

Author: Xusen Yin

Closes #6228 from yinxusen/SPARK-7663 and squashes the following commits:

21770c5 [Xusen Yin] check the vocab size
54ae63e [Xusen Yin] add requirement for word2vec model


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b3abf0b8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b3abf0b8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b3abf0b8

Branch: refs/heads/master
Commit: b3abf0b8d9bca13840eb759953d76905c2ba9b8a
Parents: 60336e3
Author: Xusen Yin
Authored: Wed May 20 10:41:18 2015 +0100
Committer: Sean Owen
Committed: Wed May 20 10:44:06 2015 +0100

----------------------------------------------------------------------
 .../src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala  | 3 +++
 1 file changed, 3 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/b3abf0b8/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
index 731f757..f65f782 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
@@ -158,6 +158,9 @@ class Word2Vec extends Serializable with Logging {
       .sortWith((a, b) => a.cn > b.cn)
     
     vocabSize = vocab.length
+    require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " +
+      "the setting of minCount, which could be large enough to remove all your words in sentences.")
+
     var a = 0
     while (a < vocabSize) {
       vocabHash += vocab(a).word -> a
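The guard and the indexing loop that follows it can be sketched in plain Scala as below. This is an illustration, not the MLlib code. `VocabGuardSketch` and `indexVocab` are hypothetical names, and the vocabulary is simplified to an `Array[String]`. The loop increment `a += 1` is also assumed, because the hunk's context ends before it. With the check in place, an empty vocabulary fails immediately with an `IllegalArgumentException` whose message starts with `requirement failed: `. Without it, the method would carry on with zero words.

```scala
import scala.collection.mutable

// Hypothetical helper mirroring the guarded indexing step; not part of the MLlib API.
object VocabGuardSketch {
  /** Builds a word -> index map, failing fast on an empty vocabulary. */
  def indexVocab(vocab: Array[String]): mutable.HashMap[String, Int] = {
    val vocabSize = vocab.length
    // Predef.require throws IllegalArgumentException("requirement failed: " + message).
    require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " +
      "the setting of minCount, which could be large enough to remove all your words in sentences.")
    val vocabHash = mutable.HashMap.empty[String, Int]
    var a = 0
    while (a < vocabSize) {
      vocabHash += vocab(a) -> a // assign each word its position in the vocabulary
      a += 1                     // assumed increment; not shown in the hunk
    }
    vocabHash
  }

  def main(args: Array[String]): Unit = {
    println(indexVocab(Array("a", "b"))("b")) // 1
    try {
      indexVocab(Array.empty[String])
      ()
    } catch {
      case e: IllegalArgumentException => println(e.getMessage)
    }
  }
}
```

Failing at this point keeps the error next to its cause (an over-aggressive `minCount`) and avoids a confusing failure further into training.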

