Kontinuation commented on code in PR #1992: URL: https://github.com/apache/sedona/pull/1992#discussion_r2163440211
########## common/src/main/java/org/apache/sedona/common/S2Geography/S2Geography.java: ########## @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.sedona.common.S2Geography; + +import com.google.common.geometry.*; +import java.io.*; +import java.util.List; + +/** + * An abstract class represent S2Geography. Has 6 subtypes of geography: POINT, POLYLINE, POLYGON, + * GEOGRAPHY_COLLECTION, SHAPE_INDEX, ENCODED_SHAPE_INDEX. + */ +public abstract class S2Geography { + protected final GeographyKind kind; + + protected S2Geography(GeographyKind kind) { + this.kind = kind; + } + + public enum GeographyKind { + UNINITIALIZED(0), + POINT(1), + POLYLINE(2), + POLYGON(3), + GEOGRAPHY_COLLECTION(4), + SHAPE_INDEX(5), + ENCODED_SHAPE_INDEX(6), + CELL_CENTER(7); + + private final int kind; + + GeographyKind(int kind) { + this.kind = kind; + } + + /** Returns the integer tag for this kind. */ + public int getKind() { + return kind; + } + /** + * Look up the enum by its integer tag. + * + * @throws IllegalArgumentException if no matching kind exists. + */ + public static GeographyKind fromKind(int kind) { + for (GeographyKind k : values()) { + if (k.getKind() == kind) return k; + } + throw new IllegalArgumentException("Unknown GeographyKind: " + kind); + } + } + /** + * @return 0, 1, or 2 if all Shape()s that are returned will have the same dimension (i.e., they + * are all points, all lines, or all polygons). + */ + public abstract int dimension(); + + /** + * Usage of checking all shapes in side collection geography + * + * @return + */ + protected final int computeDimensionFromShapes() { + if (numShapes() == 0) return -1; + int dim = shape(0).dimension(); + for (int i = 1; i < numShapes(); ++i) { + if (dim != shape(i).dimension()) return -1; + } + return dim; + } + + /** + * @return The number of S2Shape objects needed to represent this Geography + */ + public abstract int numShapes(); + + /** + * Returns the given S2Shape (where 0 <= id < num_shapes()). The caller retains ownership of the + * S2Shape but the data pointed to by the object requires that the underlying Geography outlives + * the returned object. + * + * @param id (where 0 <= id < num_shapes()) + * @return the given S2Shape + */ + public abstract S2Shape shape(int id); + + /** + * Returns an S2Region that represents the object. The caller retains ownership of the S2Region + * but the data pointed to by the object requires that the underlying Geography outlives the + * returned object. + * + * @return S2Region + */ + public abstract S2Region region(); + + /** + * Adds an unnormalized set of S2CellIDs to `cell_ids`. This is intended to be faster than using + * Region().GetCovering() directly and to return a small number of cells that can be used to + * compute a possible intersection quickly. + */ + public void getCellUnionBound(List<S2CellId> cellIds) { + // Build a shape index of all shapes in this geography + S2ShapeIndex index = new S2ShapeIndex(); + for (int i = 0; i < numShapes(); i++) { + index.add(shape(i)); + } + // Create a region from the index and delegate covering + S2ShapeIndexRegion region = new S2ShapeIndexRegion(index); + region.getCellUnionBound(cellIds); + } + + // ─── Encoding / decoding machinery ──────────────────────────────────────────── + /** + * Serialize this geography to an encoder. This does not include any encapsulating information + * (e.g., which geography type or flags). Encode this geography into a stream as: 1) a 5-byte + * EncodeTag header (see EncodeTag encode / decode) 2) coveringSize × 8-byte cell-ids 3) the raw + * shape payload (point/polyline/polygon) via the built-in coder + * + * @param opts CodingHint.FAST / CodingHint.COMPACT / Include or omit the cell‐union covering + * prefix + */ + public abstract void encode(OutputStream os, EncodeOptions opts) throws IOException; + + // public abstract S2Geography decode(DataInputStream in) throws IOException; Review Comment: This still does not look correct to me. We should also implement `encodeTagged` here. `encodeTagged` constructs the `EncodeTag` object and write it into the output stream, then calls the abstract `encode` method. `encode` method are implemented by subclasses, they don't need to deal with `EncodeTag` in their `encode` implementations. `PointGeography` has an optimized representation when it encodes only one point in compact mode, so it overrides the default `encodeTagged` implementation provided by `S2Geography` parent class to to provide a more optimized implementation. Other S2Geography subclasses don't do this. They just implement `encode`, and let the parent class encode tags. The `decodeTagged` implementations I saw in the previous round of review look good, we should not remove them. ########## common/src/main/java/org/apache/sedona/common/S2Geography/S2Geography.java: ########## @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.sedona.common.S2Geography; + +import com.google.common.geometry.*; +import java.io.*; +import java.util.List; + +/** + * An abstract class represent S2Geography. Has 6 subtypes of geography: POINT, POLYLINE, POLYGON, + * GEOGRAPHY_COLLECTION, SHAPE_INDEX, ENCODED_SHAPE_INDEX. + */ +public abstract class S2Geography { + protected final GeographyKind kind; + + protected S2Geography(GeographyKind kind) { + this.kind = kind; + } + + public enum GeographyKind { + UNINITIALIZED(0), + POINT(1), + POLYLINE(2), + POLYGON(3), + GEOGRAPHY_COLLECTION(4), + SHAPE_INDEX(5), + ENCODED_SHAPE_INDEX(6), + CELL_CENTER(7); + + private final int kind; + + GeographyKind(int kind) { + this.kind = kind; + } + + /** Returns the integer tag for this kind. */ + public int getKind() { + return kind; + } + /** + * Look up the enum by its integer tag. + * + * @throws IllegalArgumentException if no matching kind exists. + */ + public static GeographyKind fromKind(int kind) { + for (GeographyKind k : values()) { + if (k.getKind() == kind) return k; + } + throw new IllegalArgumentException("Unknown GeographyKind: " + kind); + } + } + /** + * @return 0, 1, or 2 if all Shape()s that are returned will have the same dimension (i.e., they + * are all points, all lines, or all polygons). + */ + public abstract int dimension(); + + /** + * Usage of checking all shapes in side collection geography + * + * @return + */ + protected final int computeDimensionFromShapes() { + if (numShapes() == 0) return -1; + int dim = shape(0).dimension(); + for (int i = 1; i < numShapes(); ++i) { + if (dim != shape(i).dimension()) return -1; + } + return dim; + } + + /** + * @return The number of S2Shape objects needed to represent this Geography + */ + public abstract int numShapes(); + + /** + * Returns the given S2Shape (where 0 <= id < num_shapes()). The caller retains ownership of the + * S2Shape but the data pointed to by the object requires that the underlying Geography outlives + * the returned object. + * + * @param id (where 0 <= id < num_shapes()) + * @return the given S2Shape + */ + public abstract S2Shape shape(int id); + + /** + * Returns an S2Region that represents the object. The caller retains ownership of the S2Region + * but the data pointed to by the object requires that the underlying Geography outlives the + * returned object. + * + * @return S2Region + */ + public abstract S2Region region(); + + /** + * Adds an unnormalized set of S2CellIDs to `cell_ids`. This is intended to be faster than using + * Region().GetCovering() directly and to return a small number of cells that can be used to + * compute a possible intersection quickly. + */ + public void getCellUnionBound(List<S2CellId> cellIds) { + // Build a shape index of all shapes in this geography + S2ShapeIndex index = new S2ShapeIndex(); + for (int i = 0; i < numShapes(); i++) { + index.add(shape(i)); + } + // Create a region from the index and delegate covering + S2ShapeIndexRegion region = new S2ShapeIndexRegion(index); + region.getCellUnionBound(cellIds); + } + + // ─── Encoding / decoding machinery ──────────────────────────────────────────── + /** + * Serialize this geography to an encoder. This does not include any encapsulating information + * (e.g., which geography type or flags). Encode this geography into a stream as: 1) a 5-byte + * EncodeTag header (see EncodeTag encode / decode) 2) coveringSize × 8-byte cell-ids 3) the raw + * shape payload (point/polyline/polygon) via the built-in coder + * + * @param opts CodingHint.FAST / CodingHint.COMPACT / Include or omit the cell‐union covering + * prefix + */ + public abstract void encode(OutputStream os, EncodeOptions opts) throws IOException; + + // public abstract S2Geography decode(DataInputStream in) throws IOException; Review Comment: The overall structure will be identical to the original C++ implementation. In S2Geography, we implement `encodeTagged` and `decodeTagged`, and defines `encode` abstract methods for subclasses to implement: ```java public class S2Geography { public void encodeTagged(OutputStream os, EncodeOptions opts) throws IOException { DataOutputStream out = new DataOutputStream(os); // 1) Get covering if needed // ... // 2) Write tag header // ... // 3) Write the geography this.encode(out, opts); out.flush(); } public static S2Geography decodeTagged(InputStream is) throws IOException { DataInputStream in = new DataInputStream(is); // 1) decode the tag EncodeTag tag = ... // 2) dispatch to subclass's decode method according to tag.kind switch (tag.kind) { case GeographyKind::POINT: return PointGeography::decode(in, tag); case ...: } } public abstract void encode(DataOutputStream out, EncodeOptions opts) throws IOException; } ``` The C++ version defines `decode` as a non-static function, while the implementation I reviewed the last time defines it as a static method. Both approaches are fine. PointGeography implements `encode` and `decode`, it also overrides `encodeTagged` to implement an optimized encoding by omitting the points when possible. ```java public class PointGeography { @Override public void encodeTagged(OutputStream os, EncodeOptions opts) throws IOException { if (points.size() == 1 && opts.getCodingHint() == EncodeOptions.CodingHint.COMPACT) { // Optimized encoding which only uses covering to represent the point return; } // In other cases, fallback to the default encodeTagged implementation: super.encodeTagged(os, opts); } @Override public void encode(DataOutputStream out, EncodeOptions opts) throws IOException { // encode point list S2Point.Shape shp = S2Point.Shape.fromList(points); switch (opts.getCodingHint()) { case FAST: ... case COMPACT: ... } } public static PointGeography decode(DataInputStream in, EncodeTag tag) throws IOException { ... } } ``` For any other geography types without an optimized encoded representation, only `encode` and `decode` needs to be implemented: ```java public class PolygonGeography { @Override public void encode(DataOutputStream out, EncodeOptions opts) throws IOException { for (S2Polygon poly : polygons) { poly.encode(out); } } public static PolygonGeography decode(DataInputStream in, EncodeTag tag) throws IOException { ... } } ``` ########## common/src/main/java/org/apache/sedona/common/S2Geography/PolylineGeography.java: ########## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.sedona.common.S2Geography; + +import com.google.common.collect.ImmutableList; +import com.google.common.geometry.*; +import java.io.*; +import java.util.ArrayList; +import java.util.Collection; +import java.util.List; +import java.util.logging.Logger; + +/** A Geography representing zero or more polylines using S2Polyline. */ +public class PolylineGeography extends S2Geography { + private static final Logger logger = Logger.getLogger(PolylineGeography.class.getName()); + + private final List<S2Polyline> polylines; + + private static int sizeofInt() { + return Integer.BYTES; + } + + public PolylineGeography() { + super(GeographyKind.POLYLINE); + this.polylines = new ArrayList<>(); + } + + public PolylineGeography(S2Polyline polyline) { + super(GeographyKind.POLYLINE); + this.polylines = new ArrayList<>(); + this.polylines.add(polyline); + } + + public PolylineGeography(List<S2Polyline> polylines) { + super(GeographyKind.POLYLINE); + this.polylines = new ArrayList<>(polylines); + } + + @Override + public int dimension() { + return polylines.isEmpty() ? -1 : 1; + } + + @Override + public int numShapes() { + return polylines.size(); + } + + @Override + public S2Shape shape(int id) { + return polylines.get(id); + } + + @Override + public S2Region region() { + Collection<S2Region> polylineRegionCollection = new ArrayList<>(); + polylineRegionCollection.addAll(polylines); + S2RegionUnion union = new S2RegionUnion(polylineRegionCollection); + return union; + } + + @Override + public void getCellUnionBound(List<S2CellId> cellIds) { + // Fallback to default Geography logic via shape index region + super.getCellUnionBound(cellIds); + } + + public List<S2Polyline> getPolylines() { + return ImmutableList.copyOf(polylines); + } + + @Override + public void encode(OutputStream os, EncodeOptions opts) throws IOException { + // Wrap your stream in a little-endian DataOutput + DataOutputStream out = new DataOutputStream(os); + + // 1) Serialize any covering cells (if you have them) + List<S2CellId> cover = new ArrayList<>(); + if (opts.isIncludeCovering()) { + getCellUnionBound(cover); + if (cover.size() > 255) { + cover.clear(); + logger.warning("Covering size too large (> 255) — clear Covering"); + } + } + + // 2) Write tag header (unchanged) using leOut.writeByte(...) + EncodeTag tag = new EncodeTag(opts); + tag.setKind(GeographyKind.POLYLINE); + tag.setCoveringSize((byte) cover.size()); + tag.encode(out); + for (S2CellId c2 : cover) { + out.writeLong(c2.id()); + } + + // 3) **Critical**: write the number of polylines in little-endian + out.writeInt(polylines.size()); Review Comment: `DataOutputStream` always write multi-byte numbers as big-endian. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
