This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 2d4a515  [SPARK-34080][ML][PYTHON][FOLLOW-UP] Update score function in UnivariateFeatureSelector document
2d4a515 is described below

commit 2d4a51568dde4baac6d660f378223a94b074fcec
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Wed Feb 10 09:24:25 2021 +0900

    [SPARK-34080][ML][PYTHON][FOLLOW-UP] Update score function in UnivariateFeatureSelector document
    
    ### What changes were proposed in this pull request?
    
    This follows up #31160 to update score function in the document.
    
    ### Why are the changes needed?
    
    Currently we use `f_classif`, `chi2`, `f_regression`, which sound to me like sklearn's naming. It is good to have them, but I think it would be nice to give the formal score function names alongside sklearn's ones.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    No, only doc change.
    
    Closes #31531 from viirya/SPARK-34080-minor.
    
    Authored-by: Liang-Chi Hsieh <[email protected]>
    Signed-off-by: HyukjinKwon <[email protected]>
    (cherry picked from commit 1fbd5764105e2c09caf4ab57a7095dd794307b02)
    Signed-off-by: HyukjinKwon <[email protected]>
---
 docs/ml-features.md                                              | 6 +++---
 .../main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala   | 2 +-
 .../org/apache/spark/ml/feature/UnivariateFeatureSelector.scala  | 9 ++++++---
 python/pyspark/ml/feature.py                                     | 9 ++++++---
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/docs/ml-features.md b/docs/ml-features.md
index 2bb8873..b36b076 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -1802,9 +1802,9 @@ User can set `featureType` and `labelType`, and Spark will pick the score functi
 ~~~
 featureType |  labelType |score function
 ------------|------------|--------------
-categorical |categorical | chi2
-continuous  |categorical | f_classif
-continuous  |continuous  | f_regression
+categorical |categorical | chi-squared (chi2)
+continuous  |categorical | ANOVATest (f_classif)
+continuous  |continuous  | F-value (f_regression)
 ~~~
 
 It supports five selection modes: `numTopFeatures`, `percentile`, `fpr`, `fdr`, `fwe`:
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala
index 198a886..fc6c615 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala
@@ -44,7 +44,7 @@ import org.apache.spark.sql.types.StructType
  * By default, the selection method is `numTopFeatures`, with the default number of top features
  * set to 50.
  */
-@deprecated("use UnivariateFeatureSelector instead", "3.1.0")
+@deprecated("use UnivariateFeatureSelector instead", "3.1.1")
 @Since("1.6.0")
 final class ChiSqSelector @Since("1.6.0") (@Since("1.6.0") override val uid: String)
   extends Selector[ChiSqSelectorModel] {
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.scala
index bfe1d5f..7fff159 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.scala
@@ -100,9 +100,12 @@ private[feature] trait UnivariateFeatureSelectorParams extends Params
  * The user can set `featureType` and labelType`, and Spark will pick the score function based on
  * the specified `featureType` and labelType`.
  * The following combination of `featureType` and `labelType` are supported:
- *  - `featureType` `categorical` and `labelType` `categorical`:  Spark uses chi2.
- *  - `featureType` `continuous` and `labelType` `categorical`:  Spark uses f_classif.
- *  - `featureType` `continuous` and `labelType` `continuous`:  Spark uses f_regression.
+ *  - `featureType` `categorical` and `labelType` `categorical`: Spark uses chi-squared,
+ *    i.e. chi2 in sklearn.
+ *  - `featureType` `continuous` and `labelType` `categorical`: Spark uses ANOVATest,
+ *    i.e. f_classif in sklearn.
+ *  - `featureType` `continuous` and `labelType` `continuous`: Spark uses F-value,
+ *    i.e. f_regression in sklearn.
  *
  * The `UnivariateFeatureSelector` supports different selection modes: `numTopFeatures`,
  * `percentile`, `fpr`, `fdr`, `fwe`.
diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index f9d22ba..4e8b8b4 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -5821,9 +5821,12 @@ class UnivariateFeatureSelector(JavaEstimator, _UnivariateFeatureSelectorParams,
 
     The following combination of `featureType` and `labelType` are supported:
 
-    - `featureType` `categorical` and `labelType` `categorical`, Spark uses chi2.
-    - `featureType` `continuous` and `labelType` `categorical`, Spark uses f_classif.
-    - `featureType` `continuous` and `labelType` `continuous`, Spark uses f_regression.
+    - `featureType` `categorical` and `labelType` `categorical`, Spark uses chi-squared,
+      i.e. chi2 in sklearn.
+    - `featureType` `continuous` and `labelType` `categorical`, Spark uses ANOVATest,
+      i.e. f_classif in sklearn.
+    - `featureType` `continuous` and `labelType` `continuous`, Spark uses F-value,
+      i.e. f_regression in sklearn.
 
     The `UnivariateFeatureSelector` supports different selection modes: `numTopFeatures`,
     `percentile`, `fpr`, `fdr`, `fwe`.
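
The mapping documented in this commit can be sketched as a small lookup table. This is only an illustration of the three documented combinations, not Spark's internal implementation: the names `SCORE_FUNCTIONS` and `pick_score_function` are hypothetical and do not exist in PySpark, and the error raised for other combinations is an assumption of this sketch.

```python
# Hypothetical lookup table mirroring the combinations listed in the updated docs.
# Keys are (featureType, labelType); values are (formal name, sklearn name).
SCORE_FUNCTIONS = {
    ("categorical", "categorical"): ("chi-squared", "chi2"),
    ("continuous", "categorical"): ("ANOVATest", "f_classif"),
    ("continuous", "continuous"): ("F-value", "f_regression"),
}


def pick_score_function(feature_type, label_type):
    """Return (formal name, sklearn name) for a documented combination.

    Raises ValueError for any combination the docs do not list
    (an assumption of this sketch, not documented Spark behavior).
    """
    key = (feature_type, label_type)
    if key not in SCORE_FUNCTIONS:
        raise ValueError(
            f"Combination not listed: featureType={feature_type!r}, "
            f"labelType={label_type!r}"
        )
    return SCORE_FUNCTIONS[key]


if __name__ == "__main__":
    for (feature_type, label_type), (formal, sklearn_name) in SCORE_FUNCTIONS.items():
        print(f"{feature_type:<11} | {label_type:<11} | {formal} ({sklearn_name})")
```

Keeping both names in one tuple reflects the intent of the change: the formal score function name is shown first, with the sklearn name kept alongside it for readers who know that naming.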

