[spark] branch branch-3.0 updated: [SPARK-31110][DOCS][SQL] refine sql doc for SELECT

dongjoon Wed, 11 Mar 2020 16:54:26 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new ffcc4a2  [SPARK-31110][DOCS][SQL] refine sql doc for SELECT
ffcc4a2 is described below

commit ffcc4a27041abe97991f4bd14d0b5abf3c50a542
Author: Wenchen Fan <[email protected]>
AuthorDate: Wed Mar 11 16:52:40 2020 -0700

    [SPARK-31110][DOCS][SQL] refine sql doc for SELECT
    
    ### What changes were proposed in this pull request?
    
    A few improvements to the sql ref SELECT doc:
    1. correct the syntax of SELECT query
    2. correct the default of null sort order
    3. correct the GROUP BY syntax
    4. several minor fixes
    
    ### Why are the changes needed?
    
    refine document
    
    ### Does this PR introduce any user-facing change?
    
    N/A
    
    ### How was this patch tested?
    
    N/A
    
    Closes #27866 from cloud-fan/doc.
    
    Authored-by: Wenchen Fan <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
    (cherry picked from commit 0f0ccdadb123d5839c34244e25a4ee17dde0fcdc)
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 docs/sql-ref-syntax-qry-select-clusterby.md     | 18 ++++----
 docs/sql-ref-syntax-qry-select-distribute-by.md | 18 ++++----
 docs/sql-ref-syntax-qry-select-groupby.md       | 48 ++++++++++-----------
 docs/sql-ref-syntax-qry-select-having.md        | 12 +++---
 docs/sql-ref-syntax-qry-select-limit.md         | 23 +++++++----
 docs/sql-ref-syntax-qry-select-orderby.md       | 24 +++++------
 docs/sql-ref-syntax-qry-select-sortby.md        | 28 ++++++-------
 docs/sql-ref-syntax-qry-select-where.md         | 10 ++---
 docs/sql-ref-syntax-qry-select.md               | 55 ++++++++++++++-----------
 docs/sql-ref-syntax-qry.md                      |  8 ++--
 10 files changed, 126 insertions(+), 118 deletions(-)

diff --git a/docs/sql-ref-syntax-qry-select-clusterby.md 
b/docs/sql-ref-syntax-qry-select-clusterby.md
index bb60e8b..8f1dc59 100644
--- a/docs/sql-ref-syntax-qry-select-clusterby.md
+++ b/docs/sql-ref-syntax-qry-select-clusterby.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -41,20 +41,20 @@ CLUSTER BY { expression [ , ... ] }
 ### Examples
 {% highlight sql %}
 CREATE TABLE person (name STRING, age INT);
-INSERT INTO person VALUES 
-    ('Zen Hui', 25), 
-    ('Anil B', 18), 
-    ('Shone S', 16), 
+INSERT INTO person VALUES
+    ('Zen Hui', 25),
+    ('Anil B', 18),
+    ('Shone S', 16),
     ('Mike A', 25),
-    ('John A', 18), 
+    ('John A', 18),
     ('Jack N', 16);
 
 -- Reduce the number of shuffle partitions to 2 to illustrate the behavior of 
`CLUSTER BY`.
 -- It's easier to see the clustering and sorting behavior with less number of 
partitions.
 SET spark.sql.shuffle.partitions = 2;
-                        
+
 -- Select the rows with no ordering. Please note that without any sort 
directive, the results
--- of the query is not deterministic. It's included here to show the 
difference in behavior 
+-- of the query is not deterministic. It's included here to show the 
difference in behavior
 -- of a query when `CLUSTER BY` is not used vs when it's used. The query below 
produces rows
 -- where age column is not sorted.
 SELECT age, name FROM person;
diff --git a/docs/sql-ref-syntax-qry-select-distribute-by.md 
b/docs/sql-ref-syntax-qry-select-distribute-by.md
index 5ade9c1..957df9c 100644
--- a/docs/sql-ref-syntax-qry-select-distribute-by.md
+++ b/docs/sql-ref-syntax-qry-select-distribute-by.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -20,7 +20,7 @@ license: |
 ---
 The <code>DISTRIBUTE BY</code> clause is used to repartition the data based
 on the input expressions. Unlike the [CLUSTER 
BY](sql-ref-syntax-qry-select-clusterby.html)
-clause, this does not sort the data within each partition. 
+clause, this does not sort the data within each partition.
 
 ### Syntax
 {% highlight sql %}
@@ -39,19 +39,19 @@ DISTRIBUTE BY { expression [ , ... ] }
 {% highlight sql %}
 CREATE TABLE person (name STRING, age INT);
 INSERT INTO person VALUES
-    ('Zen Hui', 25), 
-    ('Anil B', 18), 
-    ('Shone S', 16), 
+    ('Zen Hui', 25),
+    ('Anil B', 18),
+    ('Shone S', 16),
     ('Mike A', 25),
-    ('John A', 18), 
+    ('John A', 18),
     ('Jack N', 16);
 
 -- Reduce the number of shuffle partitions to 2 to illustrate the behavior of 
`DISTRIBUTE BY`.
 -- It's easier to see the clustering and sorting behavior with less number of 
partitions.
 SET spark.sql.shuffle.partitions = 2;
-                        
+
 -- Select the rows with no ordering. Please note that without any sort 
directive, the result
--- of the query is not deterministic. It's included here to just contrast it 
with the 
+-- of the query is not deterministic. It's included here to just contrast it 
with the
 -- behavior of `DISTRIBUTE BY`. The query below produces rows where age 
columns are not
 -- clustered together.
 SELECT age, name FROM person;
diff --git a/docs/sql-ref-syntax-qry-select-groupby.md 
b/docs/sql-ref-syntax-qry-select-groupby.md
index ab1c5d6..49a11ca 100644
--- a/docs/sql-ref-syntax-qry-select-groupby.md
+++ b/docs/sql-ref-syntax-qry-select-groupby.md
@@ -9,42 +9,43 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 ---
-The <code>GROUP BY</code> clause is used to group the rows based on a set of 
specified grouping expressions and compute aggregations on 
-the group of rows based on one or more specified aggregate functions. Spark 
also supports advanced aggregations to do multiple 
+The <code>GROUP BY</code> clause is used to group the rows based on a set of 
specified grouping expressions and compute aggregations on
+the group of rows based on one or more specified aggregate functions. Spark 
also supports advanced aggregations to do multiple
 aggregations for the same input record set via `GROUPING SETS`, `CUBE`, 
`ROLLUP` clauses.
 
 ### Syntax
 {% highlight sql %}
-GROUP BY [ GROUPING SETS grouping_sets ] group_expression [ , group_expression 
[ , ... ] ]
-    [ ( WITH ROLLUP | WITH CUBE | GROUPING SETS grouping_sets ) ) ]
+GROUP BY group_expression [ , group_expression [ , ... ] ]
+  [ { WITH ROLLUP | WITH CUBE | GROUPING SETS (grouping_set [ , ...]) } ]
+
+GROUP BY GROUPING SETS (grouping_set [ , ...])
 {% endhighlight %}
 
 ### Parameters
 <dl>
   <dt><code><em>GROUPING SETS</em></code></dt>
   <dd>
-    Groups the rows for each subset of the expressions specified in the 
grouping sets. For example, 
+    Groups the rows for each subset of the expressions specified in the 
grouping sets. For example,
     <code>GROUP BY GROUPING SETS (warehouse, product)</code> is semantically 
equivalent
     to union of results of <code>GROUP BY warehouse</code> and <code>GROUP BY 
product</code>. This clause
-    is shorthand for a <code>UNION ALL</code> where each leg of the 
<code>UNION ALL</code> 
+    is a shorthand for a <code>UNION ALL</code> where each leg of the 
<code>UNION ALL</code>
     operator performs aggregation of subset of the columns specified in the 
<code>GROUPING SETS</code> clause.
   </dd>
-  <dt><code><em>grouping_sets</em></code></dt>
+  <dt><code><em>grouping_set</em></code></dt>
   <dd>
-    Specifies one of more groupings based on which the <code>GROUP BY</code> 
clause performs aggregations. A grouping
-    set is specified by a list of comma-separated expressions in 
parentheses.<br><br>
+    A grouping set is specified by zero or more comma-separated expressions in 
parentheses.<br><br>
     <b>Syntax:</b>
       <code>
-        ( () | ( expression [ , ... ] ) )
+        ([expression [, ...]])
       </code>
   </dd>
   <dt><code><em>grouping_expression</em></code></dt>
@@ -55,17 +56,18 @@ GROUP BY [ GROUPING SETS grouping_sets ] group_expression [ 
, group_expression [
   </dd>
   <dt><code><em>ROLLUP</em></code></dt>
   <dd>
-    Specifies multiple levels of aggregations in a single statement. This 
clause is used to compute aggregations 
-    based on multiple grouping sets. <code>ROLLUP</code> is shorthand for 
<code>GROUPING SETS</code>. For example,
-    GROUP BY warehouse, product  WITH ROLLUP is equivalent to GROUP BY 
<code>warehouse, product</code> GROUPING SETS
-    <code> ((warehouse, product), (warehouse), ())</code>.
+    Specifies multiple levels of aggregations in a single statement. This 
clause is used to compute aggregations
+    based on multiple grouping sets. <code>ROLLUP</code> is a shorthand for 
<code>GROUPING SETS</code>. For example,
+    <code>GROUP BY warehouse, product WITH ROLLUP</code> is equivalent to 
<code>GROUP BY GROUPING SETS
+    ((warehouse, product), (warehouse), ())</code>.
     The N elements of a <code>ROLLUP</code> specification results in N+1 
<code>GROUPING SETS</code>.
   </dd>
   <dt><code><em>CUBE</em></code></dt>
   <dd>
-    <code>CUBE</code> clause is used to perform aggregations based on 
combination of grouping columns specified in the 
-    <code>GROUP BY</code> clause. For example, <code>GROUP BY warehouse, 
product  WITH CUBE</code> is equivalent 
-    to GROUP BY <code>warehouse, product</code> GROUPING SETS 
<code>((warehouse, product), (warehouse), (product), ())</code>.
+    <code>CUBE</code> clause is used to perform aggregations based on 
combination of grouping columns specified in the
+    <code>GROUP BY</code> clause. <code>CUBE</code> is a shorthand for 
<code>GROUPING SETS</code>. For example,
+    <code>GROUP BY warehouse, product WITH CUBE</code> is equivalent to 
<code>GROUP BY GROUPING SETS
+    ((warehouse, product), (warehouse), (product), ())</code>.
     The N elements of a <code>CUBE</code> specification results in 2^N 
<code>GROUPING SETS</code>.
   </dd>
 </dl>
@@ -73,7 +75,7 @@ GROUP BY [ GROUPING SETS grouping_sets ] group_expression [ , 
group_expression [
 ### Examples
 {% highlight sql %}
 CREATE TABLE dealer (id INT, city STRING, car_model STRING, quantity INT);
-INSERT INTO dealer VALUES 
+INSERT INTO dealer VALUES
     (100, 'Fremont', 'Honda Civic', 10),
     (100, 'Fremont', 'Honda Accord', 15),
     (100, 'Fremont', 'Honda CRV', 7),
@@ -107,7 +109,7 @@ SELECT id, sum(quantity) FROM dealer GROUP BY 1 ORDER BY 1;
 
 -- Multiple aggregations.
 -- 1. Sum of quantity per dealership.
--- 2. Max quantity per dealership. 
+-- 2. Max quantity per dealership.
 SELECT id, sum(quantity) AS sum, max(quantity) AS max FROM dealer GROUP BY id 
ORDER BY id;
 
   +---+---+---+
@@ -148,7 +150,7 @@ SELECT city, car_model, sum(quantity) AS sum FROM dealer
   |San Jose|Honda Civic |5  |
   +--------+------------+---+
 
--- Alternate syntax for `GROUPING SETS` in which both `GROUP BY` and `GROUPING 
SETS` 
+-- Alternate syntax for `GROUPING SETS` in which both `GROUP BY` and `GROUPING 
SETS`
 -- specifications are present.
 SELECT city, car_model, sum(quantity) AS sum FROM dealer
    GROUP BY city, car_model GROUPING SETS ((city, car_model), (city), 
(car_model), ())
@@ -200,7 +202,7 @@ SELECT city, car_model, sum(quantity) AS sum FROM dealer
 -- Group by processing with `CUBE` clause.
 -- Equivalent GROUP BY GROUPING SETS ((city, car_model), (city), (car_model), 
())
 SELECT city, car_model, sum(quantity) AS sum FROM dealer
-   GROUP BY city, car_model WITH CUBE 
+   GROUP BY city, car_model WITH CUBE
    ORDER BY city, car_model;
 
   +--------+------------+---+
diff --git a/docs/sql-ref-syntax-qry-select-having.md 
b/docs/sql-ref-syntax-qry-select-having.md
index 94d9be6..dee1e3c 100644
--- a/docs/sql-ref-syntax-qry-select-having.md
+++ b/docs/sql-ref-syntax-qry-select-having.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -33,7 +33,7 @@ HAVING boolean_expression
   <dt><code><em>boolean_expression</em></code></dt>
   <dd>
     Specifies any expression that evaluates to a result type 
<code>boolean</code>. Two or
-    more expressions may be combined together using the logical 
+    more expressions may be combined together using the logical
     operators ( <code>AND</code>, <code>OR</code> ).<br><br>
 
     <b>Note</b><br>
@@ -70,7 +70,7 @@ SELECT city, sum(quantity) AS sum FROM dealer GROUP BY city 
HAVING city = 'Fremo
 
 -- `HAVING` clause referring to aggregate function.
 SELECT city, sum(quantity) AS sum FROM dealer GROUP BY city HAVING 
sum(quantity) > 15;
- 
+
   +-------+---+
   |   city|sum|
   +-------+---+
@@ -100,7 +100,7 @@ SELECT city, sum(quantity) AS sum FROM dealer GROUP BY city 
HAVING max(quantity)
 
 -- `HAVING` clause referring to constant expression.
 SELECT city, sum(quantity) AS sum FROM dealer GROUP BY city HAVING 1 > 0 ORDER 
BY city;
-  
+
   +--------+---+
   |    city|sum|
   +--------+---+
@@ -116,7 +116,7 @@ SELECT sum(quantity) AS sum FROM dealer HAVING 
sum(quantity) > 10;
   +---+
   | 78|
   +---+
- 
+
 {% endhighlight %}
 
 ### Related Clauses
diff --git a/docs/sql-ref-syntax-qry-select-limit.md 
b/docs/sql-ref-syntax-qry-select-limit.md
index 06925e6..356930c 100644
--- a/docs/sql-ref-syntax-qry-select-limit.md
+++ b/docs/sql-ref-syntax-qry-select-limit.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -37,7 +37,7 @@ LIMIT { ALL | integer_expression }
   </dd>
   <dt><code><em>integer_expression</em></code></dt>
   <dd>
-    Specifies an expression that returns an integer. 
+    Specifies a foldable expression that returns an integer.
   </dd>
 </dl>
 
@@ -45,13 +45,13 @@ LIMIT { ALL | integer_expression }
 {% highlight sql %}
 CREATE TABLE person (name STRING, age INT);
 INSERT INTO person VALUES
-    ('Zen Hui', 25), 
-    ('Anil B', 18), 
-    ('Shone S', 16), 
+    ('Zen Hui', 25),
+    ('Anil B', 18),
+    ('Shone S', 16),
     ('Mike A', 25),
-    ('John A', 18), 
+    ('John A', 18),
     ('Jack N', 16);
-                        
+
 -- Select the first two rows.
 SELECT name, age FROM person ORDER BY name LIMIT 2;
 
@@ -76,7 +76,7 @@ SELECT name, age FROM person ORDER BY name LIMIT ALL;
   |Zen Hui|25 |
   +-------+---+
 
--- A function expression as an input to limit.
+-- A function expression as an input to LIMIT.
 SELECT name, age FROM person ORDER BY name LIMIT length('SPARK')
 
   +-------+---+
@@ -88,6 +88,11 @@ SELECT name, age FROM person ORDER BY name LIMIT 
length('SPARK')
   | Mike A| 25|
   |Shone S| 16|
   +-------+---+
+
+-- A non-foldable expression as an input to LIMIT is not allowed.
+SELECT name, age FROM person ORDER BY name LIMIT length(name)
+
+org.apache.spark.sql.AnalysisException: The limit expression must evaluate to 
a constant value ...
 {% endhighlight %}
 
 ### Related Clauses
diff --git a/docs/sql-ref-syntax-qry-select-orderby.md 
b/docs/sql-ref-syntax-qry-select-orderby.md
index c4b4ced..eb99dbb 100644
--- a/docs/sql-ref-syntax-qry-select-orderby.md
+++ b/docs/sql-ref-syntax-qry-select-orderby.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -20,7 +20,7 @@ license: |
 ---
 The <code>ORDER BY</code> clause is used to return the result rows in a sorted 
manner
 in the user specified order. Unlike the [SORT 
BY](sql-ref-syntax-qry-select-sortby.html)
-clause, this clause guarantees a total order in the output. 
+clause, this clause guarantees a total order in the output.
 
 ### Syntax
 {% highlight sql %}
@@ -47,20 +47,18 @@ ORDER BY { expression [ sort_direction | nulls_sort_oder ] 
[ , ... ] }
   </dd>
   <dt><code><em>nulls_sort_order</em></code></dt>
   <dd>
-    Optionally specifies whether NULL values are returned before/after 
non-NULL values, based on the 
-    sort direction. In Spark, NULL values are considered to be lower than any 
non-NULL values by default.
-    Therefore the ordering of NULL values depend on the sort direction. If 
<code>null_sort_order</code> is
-    not specified, then NULLs sort first if sort order is <code>ASC</code> and 
NULLS sort last if 
-    sort order is <code>DESC</code>.<br><br>
+    Optionally specifies whether NULL values are returned before/after 
non-NULL values. If
+    <code>null_sort_order</code> is not specified, then NULLs sort first if 
sort order is
+    <code>ASC</code> and NULLS sort last if sort order is 
<code>DESC</code>.<br><br>
     <ol>
-      <li> If <code>NULLS FIRST</code> (the default) is specified, then NULL 
values are returned first 
+      <li> If <code>NULLS FIRST</code> is specified, then NULL values are 
returned first
            regardless of the sort order.</li>
       <li>If <code>NULLS LAST</code> is specified, then NULL values are 
returned last regardless of
            the sort order. </li>
     </ol><br>
     <b>Syntax:</b>
     <code>
-       [ NULLS { FIRST | LAST } ] 
+       [ NULLS { FIRST | LAST } ]
     </code>
   </dd>
 </dl>
@@ -75,7 +73,7 @@ INSERT INTO person VALUES
     (400, 'Jerry', NULL),
     (500, 'Dan',  50);
 
--- Sort rows by age. By default rows are sorted in ascending manner.
+-- Sort rows by age. By default rows are sorted in ascending manner with NULL 
FIRST.
 SELECT name, age FROM person ORDER BY age;
 
   +-----+----+
@@ -101,9 +99,9 @@ SELECT name, age FROM person ORDER BY age NULLS LAST;
   |Jerry|null|
   +-----+----+
 
--- Sort rows by age in descending manner.
+-- Sort rows by age in descending manner, which defaults to NULL LAST.
 SELECT name, age FROM person ORDER BY age DESC;
- 
+
   +-----+----+
   |name |age |
   +-----+----+
diff --git a/docs/sql-ref-syntax-qry-select-sortby.md 
b/docs/sql-ref-syntax-qry-select-sortby.md
index 1818a69..9b52738 100644
--- a/docs/sql-ref-syntax-qry-select-sortby.md
+++ b/docs/sql-ref-syntax-qry-select-sortby.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -49,20 +49,18 @@ SORT BY { expression [ sort_direction | nulls_sort_order ] 
[ , ... ] }
   </dd>
   <dt><code><em>nulls_sort_order</em></code></dt>
   <dd>
-    Optionally specifies whether NULL values are returned before/after 
non-NULL values, based on the 
-    sort direction. In Spark, NULL values are considered to be lower than any 
non-NULL values by default.
-    Therefore the ordering of NULL values depend on the sort direction. If 
<code>null_sort_order</code> is
-    not specified, then NULLs sort first if sort order is <code>ASC</code> and 
NULLS sort last if 
-    sort order is <code>DESC</code>.<br><br>
+    Optionally specifies whether NULL values are returned before/after 
non-NULL values. If
+    <code>null_sort_order</code> is not specified, then NULLs sort first if 
sort order is
+    <code>ASC</code> and NULLS sort last if sort order is 
<code>DESC</code>.<br><br>
     <ol>
-      <li> If <code>NULLS FIRST</code> (the default) is specified, then NULL 
values are returned first 
+      <li> If <code>NULLS FIRST</code> is specified, then NULL values are 
returned first
            regardless of the sort order.</li>
       <li>If <code>NULLS LAST</code> is specified, then NULL values are 
returned last regardless of
            the sort order. </li>
     </ol><br>
     <b>Syntax:</b>
     <code>
-       [ NULLS { FIRST | LAST } ] 
+       [ NULLS { FIRST | LAST } ]
     </code>
   </dd>
 </dl>
@@ -71,15 +69,15 @@ SORT BY { expression [ sort_direction | nulls_sort_order ] 
[ , ... ] }
 {% highlight sql %}
 CREATE TABLE person (zip_code INT, name STRING, age INT);
 INSERT INTO person VALUES
-    (94588, 'Zen Hui', 50), 
-    (94588, 'Dan Li', 18), 
+    (94588, 'Zen Hui', 50),
+    (94588, 'Dan Li', 18),
     (94588, 'Anil K', 27),
     (94588, 'John V', NULL),
     (94511, 'David K', 42),
     (94511, 'Aryan B.', 18),
     (94511, 'Lalit B.', NULL);
 
--- Use `REPARTITION` hint to partition the data by `zip_code` to 
+-- Use `REPARTITION` hint to partition the data by `zip_code` to
 -- examine the `SORT BY` behavior. This is used in rest of the
 -- examples.
 
@@ -128,9 +126,9 @@ SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code 
FROM person SORT BY age
   |null|Lalit B.|94511   |
   +----+--------+--------+
 
--- Sort rows by age within each partition in descending manner.
+-- Sort rows by age within each partition in descending manner, which defaults 
to NULL LAST.
 SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code FROM person SORT BY 
age DESC;
- 
+
   +----+--------+--------+
   |age |name    |zip_code|
   +----+--------+--------+
@@ -143,7 +141,7 @@ SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code 
FROM person SORT BY age
   |null|Lalit B.|94511   |
   +----+--------+--------+
 
--- Sort rows by age within each partition in ascending manner keeping null 
values to be first.
+-- Sort rows by age within each partition in descending manner keeping null 
values to be first.
 SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code FROM person SORT BY 
age DESC NULLS FIRST;
 
   +----+--------+--------+
diff --git a/docs/sql-ref-syntax-qry-select-where.md 
b/docs/sql-ref-syntax-qry-select-where.md
index a493623..106053d 100644
--- a/docs/sql-ref-syntax-qry-select-where.md
+++ b/docs/sql-ref-syntax-qry-select-where.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -31,7 +31,7 @@ WHERE boolean_expression
   <dt><code><em>boolean_expression</em></code></dt>
   <dd>
     Specifies any expression that evaluates to a result type 
<code>boolean</code>. Two or
-    more expressions may be combined together using the logical 
+    more expressions may be combined together using the logical
     operators ( <code>AND</code>, <code>OR</code> ).
   </dd>
 </dl>
@@ -99,8 +99,8 @@ SELECT * FROM person WHERE age > (SELECT avg(age) FROM 
person);
   |300|Mike|80 |
   +---+----+---+
 
--- Correlated column reference in `WHERE` clause of subquery.
-SELECT * FROM person AS parent 
+-- Correlated Subquery in `WHERE` clause.
+SELECT * FROM person AS parent
 WHERE EXISTS (
               SELECT 1 FROM person AS child
               WHERE parent.id = child.id AND child.age IS NULL
diff --git a/docs/sql-ref-syntax-qry-select.md 
b/docs/sql-ref-syntax-qry-select.md
index 80b930f..e87c4a5 100644
--- a/docs/sql-ref-syntax-qry-select.md
+++ b/docs/sql-ref-syntax-qry-select.md
@@ -1,7 +1,7 @@
 ---
 layout: global
 title: SELECT
-displayTitle: SELECT 
+displayTitle: SELECT
 license: |
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -19,39 +19,43 @@ license: |
   limitations under the License.
 ---
 Spark supports a `SELECT` statement and conforms to the ANSI SQL standard. 
Queries are
-used to retrieve result sets from one or more tables. The following section 
+used to retrieve result sets from one or more tables. The following section
 describes the overall query syntax and the sub-sections cover different 
constructs
-of a query along with examples. 
+of a query along with examples.
 
 ### Syntax
 {% highlight sql %}
 [ WITH with_query [ , ... ] ]
+select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] 
select_statement, ... ]
+[ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
+[ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } ]
+[ CLUSTER BY { expression [ , ...] } ]
+[ DISTRIBUTE BY { expression [, ...] } ]
+[ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
+[ LIMIT { ALL | expression } ]
+{% endhighlight %}
+
+While `select_statement` is defined as
+{% highlight sql %}
 SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
   FROM { from_item [ , ...] }
   [ WHERE boolean_expression ]
   [ GROUP BY expression [ , ...] ]
   [ HAVING boolean_expression ]
-  [ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } 
]
-  [ SORT  BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ...] } 
]
-  [ CLUSTER BY { expression [ , ...] } ]
-  [ DISTRIBUTE BY { expression [, ...] } ]
-  { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
-  [ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
-  [ LIMIT { ALL | expression } ]
 {% endhighlight %}
 
 ### Parameters
 <dl>
   <dt><code><em>with_query</em></code></dt>
   <dd>
-    Specifies the common table expressions (CTEs) before the main 
<code>SELECT</code> query block.
-    These table expressions are allowed to be referenced later in the main 
query. This is useful to abstract
-    out repeated subquery blocks in the main query and improves readability of 
the query.
+    Specifies the common table expressions (CTEs) before the main query block.
+    These table expressions are allowed to be referenced later in the FROM 
clause. This is useful to abstract
+    out repeated subquery blocks in the FROM clause and improves readability 
of the query.
   </dd>
   <dt><code><em>hints</em></code></dt>
   <dd>
     Hints can be specified to help spark optimizer make better planning 
decisions. Currently spark supports hints
-    that influence selection of join strategies and repartitioning of the 
data. 
+    that influence selection of join strategies and repartitioning of the data.
   </dd>
   <dt><code><em>ALL</em></code></dt>
   <dd>
@@ -77,7 +81,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression 
[ , ... ] }
       <li>Join relation</li>
       <li>Table valued function</li>
       <li>Inlined table</li>
-      <li>Subquery</li>    
+      <li>Subquery</li>
     </ol>
   </dd>
   <dt><code><em>WHERE</em></code></dt>
@@ -87,12 +91,13 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { 
named_expression [ , ... ] }
   <dt><code><em>GROUP BY</em></code></dt>
   <dd>
     Specifies the expressions that are used to group the rows. This is used in 
conjunction with aggregate functions
-    (MIN, MAX, COUNT, SUM, AVG) to group rows based on the grouping 
expressions.
+    (MIN, MAX, COUNT, SUM, AVG, etc.) to group rows based on the grouping 
expressions and aggregate values in each group.
   </dd>
   <dt><code><em>HAVING</em></code></dt>
   <dd>
     Specifies the predicates by which the rows produced by GROUP BY are 
filtered. The HAVING clause is used to
-    filter rows after the grouping is performed.
+    filter rows after the grouping is performed. If HAVING is specified 
without GROUP BY, it indicates a GROUP BY
+    without grouping expressions (global aggregate).
   </dd>
   <dt><code><em>ORDER BY</em></code></dt>
   <dd>
@@ -108,17 +113,17 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { 
named_expression [ , ... ] }
   <dt><code><em>CLUSTER BY</em></code></dt>
   <dd>
     Specifies a set of expressions that is used to repartition and sort the 
rows. Using this clause has
-    the same effect of using <code>DISTRIBUTE BY</code> and <code>SORT 
BY</code> together. 
+    the same effect of using <code>DISTRIBUTE BY</code> and <code>SORT 
BY</code> together.
   </dd>
   <dt><code><em>DISTRIBUTE BY</em></code></dt>
   <dd>
-    Specifies a set of expressions by which the result rows are repartitioned. 
This parameter is mutually 
-    exclusive with <code>ORDER BY</code> and <code>CLUSTER BY</code> and can 
not be specified together. 
+    Specifies a set of expressions by which the result rows are repartitioned. 
This parameter is mutually
+    exclusive with <code>ORDER BY</code> and <code>CLUSTER BY</code> and can 
not be specified together.
   </dd>
   <dt><code><em>LIMIT</em></code></dt>
   <dd>
-    Specifies the maximum number of rows that can be returned by a statement 
or subquery. This clause 
-    is mostly used in the conjunction with <code>ORDER BY</code> to produce a 
deterministic result. 
+    Specifies the maximum number of rows that can be returned by a statement 
or subquery. This clause
+    is mostly used in the conjunction with <code>ORDER BY</code> to produce a 
deterministic result.
   </dd>
   <dt><code><em>boolean_expression</em></code></dt>
   <dd>
@@ -130,7 +135,7 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { 
named_expression [ , ... ] }
   </dd>
   <dt><code><em>named_window</em></code></dt>
   <dd>
-    Specifies aliases for one or more source window specifications. The source 
window specifications can 
+    Specifies aliases for one or more source window specifications. The source 
window specifications can
     be referenced in the widow definitions in the query.
   </dd>
 </dl>
diff --git a/docs/sql-ref-syntax-qry.md b/docs/sql-ref-syntax-qry.md
index cd7c0ff..37414ac 100644
--- a/docs/sql-ref-syntax-qry.md
+++ b/docs/sql-ref-syntax-qry.md
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -19,14 +19,14 @@ license: |
   limitations under the License.
 ---
 
-Spark supports <code>SELECT</code> statement that is  used to retrieve rows
+Spark supports <code>SELECT</code> statement that is used to retrieve rows
 from one or more tables according to the specified clauses. The full syntax
 and brief description of supported clauses are explained in
 [SELECT](sql-ref-syntax-qry-select.html) section. Spark also provides the
 ability to generate logical and physical plan for a given query using
 [EXPLAIN](sql-ref-syntax-qry-explain.html) statement.
 
- 
+
 - [WHERE Clause](sql-ref-syntax-qry-select-where.html)
 - [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
 - [HAVING Clause](sql-ref-syntax-qry-select-having.html)


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[spark] branch branch-3.0 updated: [SPARK-31110][DOCS][SQL] refine sql doc for SELECT

Reply via email to