This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new e857f43cde5a [MINOR][PS][DOC] Update pandas API on Spark option doc
e857f43cde5a is described below

commit e857f43cde5a00acc36d58a29eaa3cb5593161ef
Author: Takuya Ueshin <ues...@databricks.com>
AuthorDate: Sun May 4 10:35:28 2025 +0900

    [MINOR][PS][DOC] Update pandas API on Spark option doc

    ### What changes were proposed in this pull request?

    Updates pandas API on Spark option doc.

    ### Why are the changes needed?

    The descriptions for some options are outdated.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    The existing tests should pass.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.

    Closes #50777 from ueshin/doc.

    Authored-by: Takuya Ueshin <ues...@databricks.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 python/docs/source/tutorial/pandas_on_spark/options.rst | 15 ++++++++-------
 python/pyspark/pandas/config.py                         |  2 +-
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/python/docs/source/tutorial/pandas_on_spark/options.rst b/python/docs/source/tutorial/pandas_on_spark/options.rst
index 14164b771e3f..91f128eb351a 100644
--- a/python/docs/source/tutorial/pandas_on_spark/options.rst
+++ b/python/docs/source/tutorial/pandas_on_spark/options.rst
@@ -274,11 +274,11 @@ compute.max_rows                  1000        'compute.max_rows' sets
                                               is unset, the operation is executed by PySpark.
                                               Default is 1000.
 compute.shortcut_limit            1000        'compute.shortcut_limit' sets the limit for a
-                                              shortcut. It computes specified number of rows and
-                                              use its schema. When the dataframe length is larger
-                                              than this limit, pandas-on-Spark uses PySpark to
-                                              compute.
-compute.ops_on_diff_frames        False       This determines whether or not to operate between two
+                                              shortcut. It computes the specified number of rows
+                                              and uses its schema. When the dataframe length is
+                                              larger than this limit, pandas-on-Spark uses PySpark
+                                              to compute.
+compute.ops_on_diff_frames        True        This determines whether or not to operate between two
                                               different dataframes. For example, 'combine_frames'
                                               function internally performs a join operation which
                                               can be expensive in general. So, if
@@ -325,8 +325,9 @@ plotting.max_rows                 1000        'plotting.max_rows' sets
                                               used for plotting. Default is 1000.
 plotting.sample_ratio             None        'plotting.sample_ratio' sets the proportion of data
                                               that will be plotted for sample-based plots such as
-                                              `plot.line` and `plot.area`. This option defaults to
-                                              'plotting.max_rows' option.
+                                              `plot.line` and `plot.area`. If not set, it is
+                                              derived from 'plotting.max_rows', by calculating the
+                                              ratio of 'plotting.max_rows' to the total data size.
 plotting.backend                  'plotly'    Backend to use for plotting. Default is plotly.
                                               Supports any package that has a top-level `.plot`
                                               method. Known options are: [matplotlib, plotly].
diff --git a/python/pyspark/pandas/config.py b/python/pyspark/pandas/config.py
index 6ed4adf21ff4..64fbd006570e 100644
--- a/python/pyspark/pandas/config.py
+++ b/python/pyspark/pandas/config.py
@@ -112,7 +112,7 @@ class Option:
 #
 # NOTE: if you are fixing or adding an option here, make sure you execute `show_options()` and
 # copy & paste the results into show_options
-# 'docs/source/user_guide/pandas_on_spark/options.rst' as well.
+# 'python/docs/source/tutorial/pandas_on_spark/options.rst' as well.
 # See the examples below:
 # >>> from pyspark.pandas.config import show_options
 # >>> show_options()
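
For context, the options touched by this patch are exercised at runtime through the
public pandas-on-Spark option API (get_option / set_option / reset_option /
option_context). The following is a minimal sketch, not part of the patch, assuming a
working PySpark installation; the DataFrames and sample values are illustrative only.

    import pyspark.pandas as ps

    # Read an option's current value, e.g. the default discussed in the doc change.
    print(ps.get_option("compute.ops_on_diff_frames"))

    # Permit operations that combine two different DataFrames. As the doc notes,
    # this is backed by a join, which can be expensive in general.
    ps.set_option("compute.ops_on_diff_frames", True)
    psdf1 = ps.DataFrame({"a": [1, 2, 3]})
    psdf2 = ps.DataFrame({"a": [4, 5, 6]})
    print((psdf1 + psdf2).to_pandas())

    # Restore the default when finished.
    ps.reset_option("compute.ops_on_diff_frames")

    # Or scope a change to a single block with the context manager form.
    with ps.option_context("plotting.max_rows", 2000):
        print(ps.get_option("plotting.max_rows"))

The show_options() helper referenced in the config.py comment above prints the option
table that the patched options.rst is expected to mirror.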