szehon-ho commented on code in PR #10981:
URL: https://github.com/apache/iceberg/pull/10981#discussion_r1894535397


##########
format/spec.md:
##########
@@ -584,8 +589,8 @@ The schema of a manifest file is a struct called 
`manifest_entry` with the follo
 | _optional_ | _optional_ | _optional_ | **`110  null_value_counts`**      | 
`map<121: int, 122: long>`                                                  | 
Map from column id to number of null values in the column                       
                                                                                
                                                   |
 | _optional_ | _optional_ | _optional_ | **`137  nan_value_counts`**       | 
`map<138: int, 139: long>`                                                  | 
Map from column id to number of NaN values in the column                        
                                                                                
                                                   |
 | _optional_ | _optional_ | _optional_ | **`111  distinct_counts`**        | 
`map<123: int, 124: long>`                                                  | 
Map from column id to number of distinct values in the column; distinct counts 
must be derived using values in the file by counting or using sketches, but not 
using methods like merging existing distinct counts |
-| _optional_ | _optional_ | _optional_ | **`125  lower_bounds`**           | 
`map<126: int, 127: binary>`                                                | 
Map from column id to lower bound in the column serialized as binary [1]. Each 
value must be less than or equal to all non-null, non-NaN values in the column 
for the file [2]                                     |
-| _optional_ | _optional_ | _optional_ | **`128  upper_bounds`**           | 
`map<129: int, 130: binary>`                                                | 
Map from column id to upper bound in the column serialized as binary [1]. Each 
value must be greater than or equal to all non-null, non-Nan values in the 
column for the file [2]                                  |
+| _optional_ | _optional_ | _optional_ | **`125  lower_bounds`**           | 
`map<126: int, 127: binary>`                                                | 
Map from column id to lower bound in the column serialized as binary [1]. Each 
value must be less than or equal to all non-null, non-NaN values in the column 
for the file [2]. See [7] for`geometry` and [8] for `geography`.  |
+| _optional_ | _optional_ | _optional_ | **`128  upper_bounds`**           | 
`map<129: int, 130: binary>`                                                | 
Map from column id to upper bound in the column serialized as binary [1]. Each 
value must be greater than or equal to all non-null, non-Nan values in the 
column for the file [2]. See [9] for `geometry` and [10] for `geography`. |

Review Comment:
   I made another attempt, also tried to refactor out common language in the 
suggestion into common point to reduce from 5 to 3 points.
   
   ```
   7. `geometry` and `geography`: this is a point: X, Y, Z, and M are the lower 
/ upper bound of all component points of all geometries in file.
   8. `geometry`: For the X value only, the lower_bound's X value (xmin) may be 
greater than the upper_bound's X value (xmax), in which case a geometry in the 
file may match if it contains an X such that `x >= xmin` OR `x <= xmax`. This 
definition is agnostic to coordinate system.
   9. `geography`: For the X and Y values, the `lower_bound`'s X (westernmost) 
and Y (northernmost) values may be greater than the `upper_bound`'s X 
(easternmost) and Y (southernmost) values. The canonical ranges for these 
points in the coordinate system is [-180 180] for west-east and [-90 90] for 
the south-north.
   ```
   
   Let me know how it sounds?
   
   Also, do we need westernmost, northernmost, etc language for geography now 
that we have a mathematical definition?  I was wondering whether we can even 
re-use the geometry math definition for x, and extend it to y-axis, so that we 
have even less text.  One reason is we introduce westernmost, northernmost, but 
don't really define them.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to