petern48 commented on code in PR #2784:
URL: https://github.com/apache/sedona/pull/2784#discussion_r2995744963
##########
python/tests/geopandas/test_geoseries.py:
##########
@@ -616,6 +616,32 @@ def test_to_arrow(self):
     def test_clip(self):
         pass
+    def test_clip_by_rect(self):
+        s = GeoSeries(
+            [
+                Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
+                LineString([(0, 0), (2, 2)]),
+                Point(0.5, 0.5),
+                Point(5, 5),
+                None,
+            ],
+        )
+        result = s.clip_by_rect(0, 0, 1, 1)
+        expected = gpd.GeoSeries(
+            [
+                Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
+                LineString([(0, 0), (2, 2)]),
+                Point(0.5, 0.5),
+                Point(5, 5),
+                None,
+            ]
+        ).clip_by_rect(0, 0, 1, 1)
Review Comment:
Currently, this is tested by comparing the results with whatever
`geopandas` outputs. We do that kind of comparison in the `match` test cases
(`test_match_geopandas_series.py`). In `test_geoseries.py`, we should instead
compare the outputs to hard-coded expected results, i.e. this test should
hard-code `expected` rather than computing it by calling `.clip_by_rect()` on a
`geopandas` series.
##########
python/sedona/spark/geopandas/geoseries.py:
##########
@@ -835,6 +835,16 @@ def dwithin(self, other, distance, align=None):
             default_val=False,
         )
+    def clip_by_rect(self, xmin, ymin, xmax, ymax) -> "GeoSeries":
+        rect = stc.ST_PolygonFromEnvelope(
+            float(xmin), float(ymin), float(xmax), float(ymax)
Review Comment:
Could we instead add an explicit check that raises an error with a clear
message if the input isn't valid, similar to how shapely does it
[here](https://github.com/shapely/shapely/blob/aa49ee09587b758ec34c14d96d00ceca58092254/shapely/constructive.py#L417-L418)?
(geopandas relies on shapely for the actual implementations, which is why the
code is in that repo.)
##########
python/sedona/spark/geopandas/base.py:
##########
@@ -3073,6 +3049,55 @@ def dwithin(self, other, distance, align=None):
"""
         return _delegate_to_geometry_column("dwithin", self, other, distance, align)
+    def clip_by_rect(self, xmin, ymin, xmax, ymax):
+        """Returns a ``GeoSeries`` of the portions of geometry within the
+        given rectangle.
+
+        The geometry is clipped to the rectangle defined by the given
+        coordinates. Geometries that do not intersect the rectangle are
+        returned as empty geometry collections.
+
+        Parameters
+        ----------
+        xmin : float
+            Minimum x value of the rectangle.
+        ymin : float
+            Minimum y value of the rectangle.
+        xmax : float
+            Maximum x value of the rectangle.
+        ymax : float
+            Maximum y value of the rectangle.
+
+        Returns
+        -------
+        GeoSeries
+
+        Examples
+        --------
+        >>> from sedona.spark.geopandas import GeoSeries
+        >>> from shapely.geometry import Polygon, LineString, Point
+        >>> s = GeoSeries(
+        ...     [
+        ...         Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
+        ...         LineString([(0, 0), (2, 2)]),
+        ...         Point(1, 1),
Review Comment:
Ideally, add a test case that demonstrates the behavior when the
intersection is empty. Again, I suggest just grabbing a case from the original
geopandas documentation:
https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.clip_by_rect.html
##########
python/tests/geopandas/test_match_geopandas_series.py:
##########
@@ -495,6 +495,19 @@ def test_to_arrow(self):
     def test_clip(self):
         pass
+    def test_clip_by_rect(self):
+        # Use rect (0.3, 0.3, 1.7, 1.7) so no test-geometry vertex or hole
+        # coordinate (0, 0.1, 0.2, 1, 2, …) lands on a rectangle boundary.
+        # This avoids boundary-handling differences between JTS and GEOS.
+        for geom in self.geoms:
+            if isinstance(geom[0], (LinearRing, GeometryCollection)):
+                continue
+            if not gpd.GeoSeries(geom).is_valid.all():
+                continue
Review Comment:
Could you explain why these are needed, i.e. what is output if we remove
them? Depending on the behavior, we might need to investigate further before
merging, or simply document that this is the case.
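One way to answer this is to temporarily replace the two `continue` guards with a loop that records every case that fails, and why, instead of silently skipping it. Below is a minimal, framework-agnostic sketch. `collect_failures` is a hypothetical helper; in the real test, `run` would compute both the Sedona and the geopandas `clip_by_rect` results for one entry of `self.geoms`, and `check` would wrap whatever comparison the test already uses.

```python
def collect_failures(cases, run, check):
    """Hypothetical diagnostic helper: run every case and report what breaks.

    cases: iterable of (label, input) pairs
    run:   callable(input) -> (actual, expected)
    check: callable(actual, expected) -> bool
    Returns a list of (label, reason) tuples instead of skipping cases.
    """
    failures = []
    for label, case in cases:
        try:
            actual, expected = run(case)
        except Exception as exc:  # record the error instead of aborting the loop
            failures.append((label, f"{type(exc).__name__}: {exc}"))
            continue
        if not check(actual, expected):
            failures.append((label, f"mismatch: {actual!r} != {expected!r}"))
    return failures
```

Printing the returned list with the guards removed would show whether the skipped `LinearRing`/`GeometryCollection` and invalid inputs raise errors or just produce different geometries, which is what decides between investigating further and documenting the difference.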
##########
python/sedona/spark/geopandas/base.py:
##########
@@ -3073,6 +3049,55 @@ def dwithin(self, other, distance, align=None):
"""
         return _delegate_to_geometry_column("dwithin", self, other, distance, align)
+    def clip_by_rect(self, xmin, ymin, xmax, ymax):
+        """Returns a ``GeoSeries`` of the portions of geometry within the
+        given rectangle.
+
+        The geometry is clipped to the rectangle defined by the given
+        coordinates. Geometries that do not intersect the rectangle are
+        returned as empty geometry collections.
+
Review Comment:
Add a note saying that results may differ from geopandas due to
implementation details.
Is there a reason you left out this line from the [original geopandas
docs](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.clip_by_rect.html)?
`Note: empty geometries or geometries that do not overlap with the specified
bounds will result in GEOMETRYCOLLECTION EMPTY.`
If this isn't true for our implementation, please document the behavior for
the case where the intersection is empty.
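If Sedona's output for empty or non-overlapping inputs really is `GEOMETRYCOLLECTION EMPTY`, both requested additions could go into a numpydoc `Notes` section (placed between `Returns` and `Examples`), roughly as sketched below. The wording is only a suggestion: the JTS vs. GEOS remark follows the comment in `test_match_geopandas_series.py`, and the empty-result type should be checked against Sedona's actual output before adopting the first paragraph.

```python
def clip_by_rect(self, xmin, ymin, xmax, ymax):
    """Returns a ``GeoSeries`` of the portions of geometry within the
    given rectangle.

    Notes
    -----
    Empty geometries or geometries that do not overlap with the specified
    bounds will result in ``GEOMETRYCOLLECTION EMPTY``.

    Results may differ from geopandas due to implementation details:
    Sedona clips with JTS, while geopandas relies on shapely (GEOS), so
    geometries that touch the rectangle boundary may be handled differently.
    """
    # Docstring-only sketch: the Parameters, Returns, and Examples sections
    # and the delegating body from the PR are omitted here.
    raise NotImplementedError
```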