szehon-ho commented on PR #10981: URL: https://github.com/apache/iceberg/pull/10981#issuecomment-2533103710
Update: there was a sync with @jiayuasu @flyrain @dmitrykoval @paleolimbot @rdblue and Menelaos, and the following was decided (meeting notes):

My summary is that we decided to have two types:

* `geometry(crs_id)` always uses linear edges but can have a geographic CRS
* `geography(crs_id, algorithm)` always uses geodesic edges defined by a geographic CRS
* A geography's algorithm approximates the edges and must be used consistently. A spherical approximation is considered a different algorithm.
* The `crs_id` is opaque, but could be `srid:<srid>` to select a specific SRID, or `projjson:<property-name>` to select a JSON CRS in a table property
* Neither Parquet nor Iceberg is responsible for providing CRS definitions, but may include them for convenience (if they can, considering copyright or other legal considerations)

Here are the specific points I think we decided on:

* Planar/linear edges are always associated with the geometry type. Geometry should always use linear edges.
* Parquet and Iceberg should have a geometry type because users already expect the linear behavior
* Geometry needs to support geographic CRS
* Geometry needs a CRS parameter, but not an edge parameter
* Geography never uses linear edges
* Geography edges are always interpreted as edges on the spheroid defined by the geographic CRS (geodesics)
* There is one exception: if the algorithm specified is spherical, then we are talking about geodesics (great circle arcs) on a sphere. I think it is important to notice (and specify/require) that if the algorithm is spherical, then the radius of the underlying sphere is assumed/expected to be the mean radius of the spheroid specified by the CRS, where the mean radius is always defined as (2 * major_axis_length + minor_axis_length) / 3.
* Geography bounding boxes must include the northmost/southmost points on edges
* Geography edge calculations use a particular algorithm, which may introduce either approximation errors (for instance, Vincenty) or may simplify the problem and introduce representation errors (e.g. spherical)
* The edge calculation algorithm must be a parameter of the geography type (e.g. spherical, andoyer, vincenty, etc.)
* The algorithm is set by what the writer creating the table can produce (vs. having a default in the format)
* Writers must not write if they cannot produce bounding boxes using the correct algorithm
* Engines should reject non-geographic CRS for geography columns
* We decided that coordinates should be limited to [-180, 180] (longitude) and [-90, 90] (latitude) for geography.

I am updating the PR accordingly.
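As a quick illustration of the mean-radius rule for the spherical algorithm, here is a minimal Python sketch. It assumes the two lengths in the formula are the semi-major and semi-minor axes of the spheroid (the notes only say "axis length"), and it uses the well-known WGS 84 values as sample input. The function name `mean_radius` is made up for this example and is not part of Iceberg or Parquet.

```python
def mean_radius(semi_major: float, semi_minor: float) -> float:
    """Mean radius as defined in the notes: (2 * major + minor) / 3.

    Assumption: both arguments are semi-axis lengths in the same unit.
    """
    return (2.0 * semi_major + semi_minor) / 3.0


# WGS 84 semi-axes in meters (sample input, not mandated by the notes).
WGS84_SEMI_MAJOR = 6378137.0
WGS84_SEMI_MINOR = 6356752.314245

if __name__ == "__main__":
    # Radius of the sphere a "spherical" geography would be evaluated on.
    print(f"{mean_radius(WGS84_SEMI_MAJOR, WGS84_SEMI_MINOR):.4f}")
```

For WGS 84 this comes out to roughly 6371008.77 m, the value commonly quoted as the Earth's mean radius.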
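The `crs_id` forms and the geography coordinate limits can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `describe_crs_id` and `in_geography_range` are hypothetical, and treating any unrecognized string as opaque is my reading of "the crs_id is opaque", not something the spec text is confirmed to say.

```python
from typing import Tuple


def describe_crs_id(crs_id: str) -> Tuple[str, str]:
    """Classify a crs_id string without interpreting the CRS itself.

    Returns ("srid", "<srid>"), ("projjson", "<property-name>"),
    or ("opaque", crs_id) for anything else (assumed fallback).
    """
    prefix, sep, rest = crs_id.partition(":")
    if sep and rest and prefix in ("srid", "projjson"):
        return (prefix, rest)
    return ("opaque", crs_id)


def in_geography_range(lon: float, lat: float) -> bool:
    """Check the limits from the notes: lon in [-180, 180], lat in [-90, 90]."""
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


if __name__ == "__main__":
    print(describe_crs_id("srid:4326"))
    print(describe_crs_id("projjson:my.crs.property"))
    print(in_geography_range(190.0, 0.0))
```

Keeping the classification separate from any CRS lookup matches the point that neither Parquet nor Iceberg is responsible for providing CRS definitions.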
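To show why geography bounding boxes must include the northmost/southmost points on edges, here is a rough Python sketch for the spherical algorithm only. It treats an edge as the minor great-circle arc between two points on a unit sphere and returns the latitude range the arc actually covers. The function names are hypothetical, and spheroidal algorithms such as vincenty or andoyer would need a different computation.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _to_unit(lon_deg: float, lat_deg: float) -> Vec:
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def _dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def edge_lat_range(lon1: float, lat1: float, lon2: float, lat2: float) -> Tuple[float, float]:
    """Latitude range (degrees) covered by the great-circle arc between two points."""
    a, b = _to_unit(lon1, lat1), _to_unit(lon2, lat2)
    lo, hi = min(lat1, lat2), max(lat1, lat2)
    n = _cross(a, b)
    n_len = math.sqrt(_dot(n, n))
    if n_len < 1e-12:
        return lo, hi  # identical or antipodal endpoints: arc is not well defined
    n = (n[0] / n_len, n[1] / n_len, n[2] / n_len)
    # Project the north pole onto the great-circle plane: this is the
    # northmost point of the full circle; its antipode is the southmost.
    p = (-n[2] * n[0], -n[2] * n[1], 1.0 - n[2] * n[2])
    p_len = math.sqrt(_dot(p, p))
    if p_len < 1e-12:
        return lo, hi  # the edge lies on the equator
    p = (p[0] / p_len, p[1] / p_len, p[2] / p_len)
    for sign in (1.0, -1.0):
        q = (sign * p[0], sign * p[1], sign * p[2])
        # q is on the arc if it lies between a and b along the rotation a -> b.
        if _dot(_cross(a, q), n) >= 0.0 and _dot(_cross(q, b), n) >= 0.0:
            lat = math.degrees(math.asin(max(-1.0, min(1.0, q[2]))))
            lo, hi = min(lo, lat), max(hi, lat)
    return lo, hi


if __name__ == "__main__":
    # Both endpoints are at 45 degrees north, but the arc bulges further north.
    print(edge_lat_range(-45.0, 45.0, 45.0, 45.0))
```

A box built only from the two endpoints of that sample edge would stop at latitude 45, while the arc itself reaches about 54.7 degrees, which is the gap the bounding-box rule is meant to close.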