This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.0 by this push:
new 068e44877a3f [SPARK-50997][DOCS] Remove `\t` character in `docs`
068e44877a3f is described below
commit 068e44877a3f37f6111b90142efb4d92f22d998d
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Sun Jan 26 16:48:37 2025 -0800
[SPARK-50997][DOCS] Remove `\t` character in `docs`
### What changes were proposed in this pull request?
This PR aims to remove `\t` characters in `docs`.
### Why are the changes needed?
This is a clean-up in order to be consistent in `docs`.
### Does this PR introduce _any_ user-facing change?
No, these are white-space character changes in docs.
### How was this patch tested?
Manual review.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #49682 from dongjoon-hyun/SPARK-50997.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 77096a29e4fe4ccaf1df8f90b5223e88b03251bf)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
docs/mllib-decision-tree.md                 | 14 +++++++-------
docs/mllib-ensembles.md                     | 12 ++++++------
docs/running-on-kubernetes.md               |  2 +-
docs/sql-ref-ansi-compliance.md             |  4 ++--
docs/sql-ref-syntax-qry-select-aggregate.md | 22 +++++++++++-----------
docs/sql-ref-syntax-qry-select-transform.md | 22 +++++++++++-----------
docs/streaming-kafka-0-10-integration.md    |  6 +++---
docs/streaming-programming-guide.md         |  6 +++---
docs/streaming/performance-tips.md          |  4 ++--
docs/web-ui.md                              | 30 +++++++++++++++---------------
10 files changed, 61 insertions(+), 61 deletions(-)
diff --git a/docs/mllib-decision-tree.md b/docs/mllib-decision-tree.md
index 0d9886315e28..601eb1513ca8 100644
--- a/docs/mllib-decision-tree.md
+++ b/docs/mllib-decision-tree.md
@@ -58,19 +58,19 @@ impurity measure for regression (variance).
<tbody>
<tr>
<td>Gini impurity</td>
- <td>Classification</td>
- <td>$\sum_{i=1}^{C} f_i(1-f_i)$</td><td>$f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.</td>
+ <td>Classification</td>
+ <td>$\sum_{i=1}^{C} f_i(1-f_i)$</td><td>$f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.</td>
</tr>
<tr>
<td>Entropy</td>
- <td>Classification</td>
- <td>$\sum_{i=1}^{C} -f_ilog(f_i)$</td><td>$f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.</td>
+ <td>Classification</td>
+ <td>$\sum_{i=1}^{C} -f_ilog(f_i)$</td><td>$f_i$ is the frequency of label $i$ at a node and $C$ is the number of unique labels.</td>
</tr>
<tr>
<td>Variance</td>
- <td>Regression</td>
- <td>$\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2$</td><td>$y_i$ is label for an instance,
- $N$ is the number of instances and $\mu$ is the mean given by $\frac{1}{N} \sum_{i=1}^N y_i$.</td>
+ <td>Regression</td>
+ <td>$\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2$</td><td>$y_i$ is label for an instance,
+ $N$ is the number of instances and $\mu$ is the mean given by $\frac{1}{N} \sum_{i=1}^N y_i$.</td>
</tr>
</tbody>
</table>
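For reference, the Gini impurity row in the table above is easy to evaluate by hand; here is a minimal Scala sketch, assuming hypothetical label frequencies $f_i$ at a node:

```scala
// Gini impurity: sum over the C labels of f_i * (1 - f_i).
// The frequencies below are hypothetical and must sum to 1.
val frequencies = Seq(0.5, 0.3, 0.2)
val gini = frequencies.map(f => f * (1 - f)).sum // 0.25 + 0.21 + 0.16 = 0.62
```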
diff --git a/docs/mllib-ensembles.md b/docs/mllib-ensembles.md
index 8f4e6b1088b3..a161293072bc 100644
--- a/docs/mllib-ensembles.md
+++ b/docs/mllib-ensembles.md
@@ -198,18 +198,18 @@ Notation: $N$ = number of instances. $y_i$ = label of instance $i$. $x_i$ = fea
<tbody>
<tr>
<td>Log Loss</td>
- <td>Classification</td>
- <td>$2 \sum_{i=1}^{N} \log(1+\exp(-2 y_i F(x_i)))$</td><td>Twice binomial negative log likelihood.</td>
+ <td>Classification</td>
+ <td>$2 \sum_{i=1}^{N} \log(1+\exp(-2 y_i F(x_i)))$</td><td>Twice binomial negative log likelihood.</td>
</tr>
<tr>
<td>Squared Error</td>
- <td>Regression</td>
- <td>$\sum_{i=1}^{N} (y_i - F(x_i))^2$</td><td>Also called L2 loss. Default loss for regression tasks.</td>
+ <td>Regression</td>
+ <td>$\sum_{i=1}^{N} (y_i - F(x_i))^2$</td><td>Also called L2 loss. Default loss for regression tasks.</td>
</tr>
<tr>
<td>Absolute Error</td>
- <td>Regression</td>
- <td>$\sum_{i=1}^{N} |y_i - F(x_i)|$</td><td>Also called L1 loss. Can be more robust to outliers than Squared Error.</td>
+ <td>Regression</td>
+ <td>$\sum_{i=1}^{N} |y_i - F(x_i)|$</td><td>Also called L1 loss. Can be more robust to outliers than Squared Error.</td>
</tr>
</tbody>
</table>
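Likewise, the Squared Error (L2) loss in the table above can be checked by hand; a minimal Scala sketch, assuming hypothetical labels $y_i$ and model predictions $F(x_i)$:

```scala
// Squared Error (L2) loss: sum over instances of (y_i - F(x_i))^2.
val labels      = Seq(1.0, -1.0, 1.0) // y_i (hypothetical)
val predictions = Seq(0.8, -0.6, 0.4) // F(x_i) (hypothetical)
val l2 = labels.zip(predictions).map { case (y, f) => (y - f) * (y - f) }.sum
// 0.04 + 0.16 + 0.36 = 0.56
```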
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index bd5e1956a627..79d0554a5ae2 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -1470,7 +1470,7 @@ See the [configuration page](configuration.html) for information on Spark config
<td><code>spark.kubernetes.executor.scheduler.name</code></td>
<td>(none)</td>
<td>
- Specify the scheduler name for each executor pod.
+ Specify the scheduler name for each executor pod.
</td>
<td>3.0.0</td>
</tr>
diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index 3b1138b9ee0e..45b8e9a4dcea 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -281,8 +281,8 @@ Note, arithmetic operations have special rules to calculate the least common typ
| Operation | Result precision | Result scale |
|------------|------------------------------------------|---------------------|
| e1 + e2 | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2) |
-| e1 - e2 | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2) |
-| e1 * e2 | p1 + p2 + 1 | s1 + s2 |
+| e1 - e2 | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2) |
+| e1 * e2 | p1 + p2 + 1 | s1 + s2 |
| e1 / e2 | p1 - s1 + s2 + max(6, s1 + p2 + 1) | max(6, s1 + p2 + 1) |
| e1 % e2 | min(p1 - s1, p2 - s2) + max(s1, s2) | max(s1, s2) |
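As a worked example of the rules above: for `e1` of type `DECIMAL(10, 2)` and `e2` of type `DECIMAL(5, 3)`, `e1 + e2` has precision `max(2, 3) + max(10 - 2, 5 - 3) + 1 = 12` and scale `max(2, 3) = 3`. A minimal Scala sketch, assuming an active `SparkSession` named `spark`:

```scala
// The addition result type follows the table above: DECIMAL(12, 3).
val df = spark.sql(
  "SELECT CAST(1.25 AS DECIMAL(10,2)) + CAST(2.125 AS DECIMAL(5,3)) AS s")
df.printSchema() // s: decimal(12,3)
```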
diff --git a/docs/sql-ref-syntax-qry-select-aggregate.md b/docs/sql-ref-syntax-qry-select-aggregate.md
index e0e294cc50c2..8cb371486335 100644
--- a/docs/sql-ref-syntax-qry-select-aggregate.md
+++ b/docs/sql-ref-syntax-qry-select-aggregate.md
@@ -94,23 +94,23 @@ SELECT * FROM basic_pays;
+-----------------+----------+------+
| employee_name|department|salary|
+-----------------+----------+------+
-| Anthony Bow|Accounting| 6627|
+| Anthony Bow|Accounting| 6627|
| Barry Jones| SCM| 10586|
-| Diane Murphy|Accounting| 8435|
-| Foon Yue Tseng| Sales| 6660|
+| Diane Murphy|Accounting| 8435|
+| Foon Yue Tseng| Sales| 6660|
| George Vanauf| Sales| 10563|
| Gerard Bondur|Accounting| 11472|
-| Gerard Hernandez| SCM| 6949|
-| Jeff Firrelli|Accounting| 8992|
-| Julie Firrelli| Sales| 9181|
+| Gerard Hernandez| SCM| 6949|
+| Jeff Firrelli|Accounting| 8992|
+| Julie Firrelli| Sales| 9181|
| Larry Bott| SCM| 11798|
-| Leslie Jennings| IT| 8113|
-| Leslie Thompson| IT| 5186|
+| Leslie Jennings| IT| 8113|
+| Leslie Thompson| IT| 5186|
| Loui Bondur| SCM| 10449|
-| Mary Patterson|Accounting| 9998|
+| Mary Patterson|Accounting| 9998|
| Pamela Castillo| SCM| 11303|
-| Steve Patterson| Sales| 9441|
-|William Patterson|Accounting| 8870|
+| Steve Patterson| Sales| 9441|
+|William Patterson|Accounting| 8870|
+-----------------+----------+------+
SELECT
diff --git a/docs/sql-ref-syntax-qry-select-transform.md b/docs/sql-ref-syntax-qry-select-transform.md
index 2ca69727a704..18d05db3ddf4 100644
--- a/docs/sql-ref-syntax-qry-select-transform.md
+++ b/docs/sql-ref-syntax-qry-select-transform.md
@@ -238,17 +238,17 @@ SELECT TRANSFORM(zip_code, name, age)
USING 'cat'
FROM person
WHERE zip_code > 94500;
-+-------+---------------------+
-| key| value|
-+-------+---------------------+
-| 94588| Anil K 27|
-| 94588| John V \N|
-| 94511| Aryan B. 18|
-| 94511| David K 42|
-| 94588| Zen Hui 50|
-| 94588| Dan Li 18|
-| 94511| Lalit B. \N|
-+-------+---------------------+
++-------+----------------+
+| key| value|
++-------+----------------+
+| 94588| Anil K 27|
+| 94588| John V \N|
+| 94511| Aryan B. 18|
+| 94511| David K 42|
+| 94588| Zen Hui 50|
+| 94588| Dan Li 18|
+| 94511| Lalit B. \N|
++-------+----------------+
```
### Related Statements
diff --git a/docs/streaming-kafka-0-10-integration.md b/docs/streaming-kafka-0-10-integration.md
index 0f5964786fbc..6af14c17476d 100644
--- a/docs/streaming-kafka-0-10-integration.md
+++ b/docs/streaming-kafka-0-10-integration.md
@@ -26,9 +26,9 @@ there are notable differences in usage.
### Linking
For Scala/Java applications using SBT/Maven project definitions, link your streaming application with the following artifact (see [Linking section](streaming-programming-guide.html#linking) in the main programming guide for further information).
- groupId = org.apache.spark
- artifactId = spark-streaming-kafka-0-10_{{site.SCALA_BINARY_VERSION}}
- version = {{site.SPARK_VERSION_SHORT}}
+ groupId = org.apache.spark
+ artifactId = spark-streaming-kafka-0-10_{{site.SCALA_BINARY_VERSION}}
+ version = {{site.SPARK_VERSION_SHORT}}
**Do not** manually add dependencies on `org.apache.kafka` artifacts (e.g. `kafka-clients`). The `spark-streaming-kafka-0-10` artifact has the appropriate transitive dependencies already, and different versions may be incompatible in hard to diagnose ways.
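For sbt users, the coordinates above translate to a one-line dependency, a sketch mirroring the `libraryDependencies` form shown later in this diff for `spark-streaming`; the `{{site.*}}` placeholders are resolved by the docs templating:

```scala
libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-10_{{site.SCALA_BINARY_VERSION}}" % "{{site.SPARK_VERSION_SHORT}}"
```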
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 3d39331eb15f..51d50d7e1bf8 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -414,7 +414,7 @@ Similar to Spark, Spark Streaming is available through Maven Central. To write y
<div class="codetabs">
<div data-lang="Maven" markdown="1">
- <dependency>
+ <dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_{{site.SCALA_BINARY_VERSION}}</artifactId>
<version>{{site.SPARK_VERSION}}</version>
@@ -423,7 +423,7 @@ Similar to Spark, Spark Streaming is available through Maven Central. To write y
</div>
<div data-lang="SBT" markdown="1">
- libraryDependencies += "org.apache.spark" % "spark-streaming_{{site.SCALA_BINARY_VERSION}}" % "{{site.SPARK_VERSION}}" % "provided"
+ libraryDependencies += "org.apache.spark" % "spark-streaming_{{site.SCALA_BINARY_VERSION}}" % "{{site.SPARK_VERSION}}" % "provided"
</div>
</div>
@@ -2191,7 +2191,7 @@ improve the performance of your application. At a high level, you need to consid
1. Reducing the processing time of each batch of data by efficiently using cluster resources.
2. Setting the right batch size such that the batches of data can be processed as fast as they
- are received (that is, data processing keeps up with the data ingestion).
+ are received (that is, data processing keeps up with the data ingestion).
## Reducing the Batch Processing Times
There are a number of optimizations that can be done in Spark to minimize the processing time of
diff --git a/docs/streaming/performance-tips.md b/docs/streaming/performance-tips.md
index 25b29ee7097b..7fdebddbbeee 100644
--- a/docs/streaming/performance-tips.md
+++ b/docs/streaming/performance-tips.md
@@ -43,9 +43,9 @@ val stream = spark.readStream
.load()
val query = stream.writeStream
.format("kafka")
- .option("topic", "out")
+ .option("topic", "out")
.option("checkpointLocation", "/tmp/checkpoint")
- .option("asyncProgressTrackingEnabled", "true")
+ .option("asyncProgressTrackingEnabled", "true")
.start()
```
diff --git a/docs/web-ui.md b/docs/web-ui.md
index c500860a201b..9173ddef81d3 100644
--- a/docs/web-ui.md
+++ b/docs/web-ui.md
@@ -80,15 +80,15 @@ This page displays the details of a specific job identified by its job ID.
</p>
* List of stages (grouped by state active, pending, completed, skipped, and failed)
- * Stage ID
- * Description of the stage
- * Submitted timestamp
- * Duration of the stage
- * Tasks progress bar
- * Input: Bytes read from storage in this stage
- * Output: Bytes written in storage in this stage
- * Shuffle read: Total shuffle bytes and records read, includes both data read locally and data read from remote executors
- * Shuffle write: Bytes and records written to disk in order to be read by a shuffle in a future stage
+ * Stage ID
+ * Description of the stage
+ * Submitted timestamp
+ * Duration of the stage
+ * Tasks progress bar
+ * Input: Bytes read from storage in this stage
+ * Output: Bytes written in storage in this stage
+ * Shuffle read: Total shuffle bytes and records read, includes both data read locally and data read from remote executors
+ * Shuffle write: Bytes and records written to disk in order to be read by a shuffle in a future stage
<p style="text-align: center;">
<img src="img/JobPageDetail3.png" title="DAG" alt="DAG">
@@ -479,12 +479,12 @@ The third section has the SQL statistics of the submitted operations.
* **Duration time** is the difference between close time and start time.
* **Statement** is the operation being executed.
* **State** of the process.
- * _Started_, first state, when the process begins.
- * _Compiled_, execution plan generated.
- * _Failed_, final state when the execution failed or finished with error.
- * _Canceled_, final state when the execution is canceled.
- * _Finished_ processing and waiting to fetch results.
- * _Closed_, final state when client closed the statement.
+ * _Started_, first state, when the process begins.
+ * _Compiled_, execution plan generated.
+ * _Failed_, final state when the execution failed or finished with error.
+ * _Canceled_, final state when the execution is canceled.
+ * _Finished_ processing and waiting to fetch results.
+ * _Closed_, final state when client closed the statement.
* **Detail** of the execution plan with parsed logical plan, analyzed logical plan, optimized logical plan and physical plan or errors in the SQL statement.
<p style="text-align: center;">
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]