Repository: spark
Updated Branches:
  refs/heads/master 60336e3bc -> b3abf0b8d
[SPARK-7663] [MLLIB] Add requirement for word2vec model

JIRA issue [link](https://issues.apache.org/jira/browse/SPARK-7663).

We should check the vocabulary size of word2vec to prevent an unexpectedly empty model.

CC srowen.

Author: Xusen Yin <[email protected]>

Closes #6228 from yinxusen/SPARK-7663 and squashes the following commits:

21770c5 [Xusen Yin] check the vocab size
54ae63e [Xusen Yin] add requirement for word2vec model


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b3abf0b8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b3abf0b8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b3abf0b8

Branch: refs/heads/master
Commit: b3abf0b8d9bca13840eb759953d76905c2ba9b8a
Parents: 60336e3
Author: Xusen Yin <[email protected]>
Authored: Wed May 20 10:41:18 2015 +0100
Committer: Sean Owen <[email protected]>
Committed: Wed May 20 10:44:06 2015 +0100

----------------------------------------------------------------------
 .../src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala | 3 +++
 1 file changed, 3 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/b3abf0b8/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
index 731f757..f65f782 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala
@@ -158,6 +158,9 @@ class Word2Vec extends Serializable with Logging {
       .sortWith((a, b) => a.cn > b.cn)
 
     vocabSize = vocab.length
+    require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " +
+      "the setting of minCount, which could be large enough to remove all your words in sentences.")
+
     var a = 0
     while (a < vocabSize) {
       vocabHash += vocab(a).word -> a
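To illustrate what the new check guards against, here is a minimal plain-Scala sketch that mimics the vocabulary-building step on a local collection instead of an RDD. `VocabCheckSketch`, its `VocabWord` case class and `buildVocab` are hypothetical stand-ins, not the MLlib code. It counts words, drops those seen fewer than `minCount` times, sorts by frequency, and then applies the same `require` as the commit. Scala's `require` throws an `IllegalArgumentException` whose message is prefixed with "requirement failed: ".

```scala
// Hypothetical sketch: local Seq-based stand-in for Word2Vec's vocabulary step.
// The real MLlib implementation works on an RDD; only the require mirrors the commit.
object VocabCheckSketch {
  final case class VocabWord(word: String, cn: Int)

  def buildVocab(sentences: Seq[Seq[String]], minCount: Int): Array[VocabWord] = {
    val vocab = sentences.flatten
      .groupBy(identity)                                   // word -> all its occurrences
      .map { case (w, occurrences) => VocabWord(w, occurrences.size) }
      .filter(_.cn >= minCount)                            // drop words rarer than minCount
      .toArray
      .sortWith((a, b) => a.cn > b.cn)                     // most frequent first

    val vocabSize = vocab.length
    // Fail fast with a descriptive message instead of continuing with an empty vocabulary.
    require(vocabSize > 0, "The vocabulary size should be > 0. You may need to check " +
      "the setting of minCount, which could be large enough to remove all your words in sentences.")
    vocab
  }

  def main(args: Array[String]): Unit = {
    val sentences = Seq(Seq("a", "b", "a"), Seq("b", "c"))
    // "a" and "b" each occur twice, so both survive minCount = 2; "c" is dropped.
    println(buildVocab(sentences, minCount = 2).map(_.word).mkString(","))
    // No word occurs 5 times, so the require fails and its message is printed.
    try {
      buildVocab(sentences, minCount = 5)
      ()
    } catch {
      case e: IllegalArgumentException => println(e.getMessage)
    }
  }
}
```

Without such a check, an overly large `minCount` would only surface later as a less obvious failure. The `require` turns it into an immediate error that names the likely cause.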
