Repository: spark
Updated Branches:
  refs/heads/master 188b47e68 -> 6eda55f72


Added more information to Imputer

Often we want to treat values other than NaN (for example 0) as missing and impute them. This addition helps people find the `setMissingValue` option without reading the API docs.

Author: tengpeng <[email protected]>

Closes #19600 from tengpeng/patch-5.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6eda55f7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6eda55f7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6eda55f7

Branch: refs/heads/master
Commit: 6eda55f728a6f2e265ae12a7e01dae88e4172715
Parents: 188b47e
Author: tengpeng <[email protected]>
Authored: Mon Oct 30 07:24:55 2017 +0000
Committer: Sean Owen <[email protected]>
Committed: Mon Oct 30 07:24:55 2017 +0000

----------------------------------------------------------------------
 docs/ml-features.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/6eda55f7/docs/ml-features.md
----------------------------------------------------------------------
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 86a0e09..7264313 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -1373,7 +1373,9 @@ for more details on the API.
 The `Imputer` transformer completes missing values in a dataset, either using the mean or the
 median of the columns in which the missing values are located. The input columns should be of
 `DoubleType` or `FloatType`. Currently `Imputer` does not support categorical features and possibly
-creates incorrect values for columns containing categorical features.
+creates incorrect values for columns containing categorical features. Imputer can impute custom values
+other than 'NaN' by `.setMissingValue(custom_value)`. For example, `.setMissingValue(0)` will impute
+all occurrences of (0).
 
 **Note** all `null` values in the input columns are treated as missing, and so are also imputed.
 


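The patched paragraph says `Imputer` replaces missing entries with the mean or median of their column, that `setMissingValue` changes which value counts as missing, and that `null` is always treated as missing. The following is a minimal plain-Python sketch of those semantics only. It is not Spark's implementation: the helper name `impute_column` is hypothetical, `None` stands in for SQL `null`, and a column is assumed to be a simple list of floats.

```python
import math
from statistics import mean, median
from typing import List, Optional


def impute_column(
    values: List[Optional[float]],
    strategy: str = "mean",
    missing_value: float = float("nan"),
) -> List[float]:
    """Replace missing entries with the mean or median of the observed ones.

    Hypothetical helper, not Spark's API. It mirrors the documented behaviour
    that entries equal to `missing_value` (NaN by default) and `None`
    (standing in for SQL null) are both treated as missing.
    """

    def is_missing(v: Optional[float]) -> bool:
        # Nulls are always missing, whatever missing_value is set to.
        if v is None:
            return True
        # NaN never compares equal to itself, so test for it explicitly.
        if math.isnan(missing_value):
            return math.isnan(v)
        return v == missing_value

    # The surrogate is computed only from the non-missing entries.
    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("column has no non-missing values to compute a surrogate")

    if strategy == "mean":
        surrogate = mean(observed)
    elif strategy == "median":
        # Exact median; may differ slightly from Spark's approximate quantile.
        surrogate = median(observed)
    else:
        raise ValueError(f"unsupported strategy: {strategy!r}")

    return [surrogate if is_missing(v) else v for v in values]


if __name__ == "__main__":
    # Default: NaN and None are missing; mean of [1.0, 3.0] is 2.0.
    print(impute_column([1.0, float("nan"), 3.0, None]))  # [1.0, 2.0, 3.0, 2.0]
    # Like .setMissingValue(0): every 0.0 is replaced by the mean of [2.0, 4.0].
    print(impute_column([0.0, 2.0, 4.0, 0.0], missing_value=0.0))  # [3.0, 2.0, 4.0, 3.0]
```

In PySpark, the corresponding setting is the `missingValue` parameter of `pyspark.ml.feature.Imputer`, for example `Imputer(inputCols=["a"], outputCols=["a_out"]).setMissingValue(0.0)`. Handling `None` separately from the sentinel reflects the doc's note that nulls are imputed regardless of the configured missing value.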