Copilot commented on code in PR #2040:
URL: https://github.com/apache/sedona/pull/2040#discussion_r2180807845


##########
python/sedona/geopandas/geoseries.py:
##########
@@ -1023,6 +1252,42 @@ def from_shapely(
     def from_arrow(cls, arr, **kwargs) -> "GeoSeries":
         raise NotImplementedError("GeoSeries.from_arrow() is not implemented yet.")
 
+    @classmethod
+    def _create_from_select(
+        cls, select: str, data, schema, index, crs, **kwargs
+    ) -> "GeoSeries":
+
+        from pyspark.pandas.utils import default_session
+        from pyspark.pandas.internal import InternalField
+        import numpy as np
+
+        if isinstance(data, list) and not isinstance(data[0], (tuple, list)):
+            data = [(obj,) for obj in data]
+
+        select = f"{select} as geometry"
+
+        print(data)
+        print(select)

Review Comment:
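   Remove the `print(data)` and `print(select)` debug statements to avoid unintended console output; if insight into the input rows or the SQL expression is needed, use a logger instead (the suggestion below also requires `import logging` at the top of the module).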
   ```suggestion
           logger = logging.getLogger(__name__)
           logger.info(data)
           logger.info(select)
   ```
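A minimal sketch of the logging approach, assuming a module-level logger in place of the per-call `getLogger()` in the suggestion. `build_geometry_select` is a hypothetical stand-in for the part of `_create_from_select()` shown in the hunk, not Sedona code:

```python
import logging

# Created once at import time; logging.getLogger returns the same object for
# the same name, so every call shares this logger.
logger = logging.getLogger(__name__)


def build_geometry_select(select: str, data):
    """Hypothetical stand-in for the logging part of _create_from_select()."""
    select = f"{select} as geometry"
    # %-style arguments are formatted only if the record is actually emitted,
    # so these calls cost little when DEBUG output is disabled.
    logger.debug("data: %s", data)
    logger.debug("select: %s", select)
    return select
```

With this layout, whether the messages appear is the caller's decision (for example via `logging.basicConfig(level=logging.DEBUG)`) rather than something the library prints on every call.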



##########
python/sedona/geopandas/geoseries.py:
##########
@@ -1023,6 +1252,42 @@ def from_shapely(
     def from_arrow(cls, arr, **kwargs) -> "GeoSeries":
         raise NotImplementedError("GeoSeries.from_arrow() is not implemented yet.")
 
+    @classmethod
+    def _create_from_select(
+        cls, select: str, data, schema, index, crs, **kwargs
+    ) -> "GeoSeries":
+
+        from pyspark.pandas.utils import default_session
+        from pyspark.pandas.internal import InternalField
+        import numpy as np
+
+        if isinstance(data, list) and not isinstance(data[0], (tuple, list)):
+            data = [(obj,) for obj in data]
+
+        select = f"{select} as geometry"
+
+        print(data)
+        print(select)

Review Comment:
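   The debug `print(data)` and `print(select)` statements can clutter logs in production. Consider removing them or replacing them with logging calls at an appropriate level such as DEBUG (this assumes a module-level `logger = logging.getLogger(__name__)`).
   ```suggestion
           logger.debug("data: %s", data)
           logger.debug("select: %s", select)
   ```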
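To show why DEBUG is the appropriate level here, the sketch below demonstrates that debug records are dropped at a typical production level (WARNING) and only appear once a caller opts in. The logger name `sedona.geopandas.geoseries` is an assumption derived from the file path, and `ListHandler` and `log_select` are hypothetical helpers for the demonstration:

```python
import logging

# Assumed name: what logging.getLogger(__name__) would give inside geoseries.py.
logger = logging.getLogger("sedona.geopandas.geoseries")


class ListHandler(logging.Handler):
    """Collects emitted messages so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}:{record.getMessage()}")


def log_select(data, select):
    # Skip building a possibly large repr of data unless DEBUG is enabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("data: %r", data)
    logger.debug("select: %s", select)


handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the demo output away from the root logger

logger.setLevel(logging.WARNING)  # typical production setting
log_select([(1,)], "geom as geometry")
print(handler.messages)  # []

logger.setLevel(logging.DEBUG)  # opt in while debugging
log_select([(1,)], "geom as geometry")
print(handler.messages)  # ['DEBUG:data: [(1,)]', 'DEBUG:select: geom as geometry']
```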



##########
python/sedona/geopandas/geoseries.py:
##########
@@ -132,6 +129,20 @@ def __init__(
                     "allow_override=True)' to overwrite CRS or "
                     "'GeoSeries.to_crs(crs)' to reproject geometries. "
                 )
+            # This is a temporary workaround, since pyspark errors when creating a ps.Series from a ps.Series.
+            # This is NOT a scalable solution, since we call to_pandas() on the data, and it is hacky,
+            # but this should be resolved if/once https://github.com/apache/spark/pull/51300 is merged in.
+            # For now, we reset self._anchor = data to keep the geometry information (e.g. crs) that's lost in to_pandas()
+            super().__init__(
+                data=data.to_pandas(),
+                index=index,
+                dtype=dtype,
+                name=name,
+                copy=copy,
+                fastpath=fastpath,
+            )
+
+            self._anchor = data

Review Comment:
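   After the workaround, `self._col_label` still holds the label set by `super().__init__()` and is not updated to match the new anchor `data`. This may break geometry column labeling. Please reassign `self._col_label = index` (or whichever label actually identifies the column in `data`) immediately after `self._anchor = data`.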
   ```suggestion
               self._anchor = data
               self._col_label = index
   ```
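A toy model of the invariant this comment is about: a series that reads its values as `anchor[col_label]` breaks if the anchor is swapped but the label is not. `ToyFrame`, `ToySeries`, and `reanchor` are hypothetical classes written for illustration; they are not pyspark.pandas internals, only a simplified picture of the `_anchor`/`_col_label` pair the workaround touches:

```python
class ToyFrame:
    """Hypothetical stand-in for an anchor DataFrame: label -> column values."""

    def __init__(self, columns):
        self.columns = dict(columns)


class ToySeries:
    """Hypothetical stand-in for a series that reads anchor.columns[col_label]."""

    def __init__(self, anchor, col_label):
        self._anchor = anchor
        self._col_label = col_label

    def values(self):
        return self._anchor.columns[self._col_label]


def reanchor(series, new_anchor, new_label):
    """Replace the anchor and the label together so they cannot drift apart."""
    if new_label not in new_anchor.columns:
        raise KeyError(f"label {new_label!r} is not a column of the new anchor")
    series._anchor = new_anchor
    series._col_label = new_label


s = ToySeries(ToyFrame({0: ["POINT (1 1)"]}), 0)

# Swapping only the anchor leaves a stale label behind.
s._anchor = ToyFrame({"geometry": ["POINT (1 1)"]})
try:
    s.values()
except KeyError:
    print("stale label")

# Updating both keeps lookups working.
reanchor(s, ToyFrame({"geometry": ["POINT (1 1)"]}), "geometry")
print(s.values())  # ['POINT (1 1)']
```

Validating the label against the new anchor inside one helper turns a silent mismatch into an immediate, descriptive error.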



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
