zhangfengcdt commented on PR #2038:
URL: https://github.com/apache/sedona/pull/2038#issuecomment-3029059823

   > @zhangfengcdt I think we're mostly on the same page actually. The Spark 
index column (`__index_level_{}__`) I'm using does actually represent the index 
in geopandas. See the comment from the pyspark codebase 
[here](https://github.com/apache/spark/blob/master/python/pyspark/pandas/internal.py)
 below:
   > 
   > ```python
   > # A function to turn given numbers to Spark columns that represent pandas-on-Spark index.
   > SPARK_INDEX_NAME_FORMAT = "__index_level_{}__".format
   > SPARK_DEFAULT_INDEX_NAME = SPARK_INDEX_NAME_FORMAT(0)
   > ```
   > 
   > > However, if no index is used in the GeoSeries creation, then we don't 
need to support alignment
   > 
   > If no index is given, pandas on pyspark creates a default index, which we 
can use for `align=True`. This is what the current tests use, since we don't 
yet have index support.
   > 
   > Originally, I was proposing not to support `align=False`, where geopandas 
uses the "natural ordering" of the series instead of the given index. However, 
it looks like Pandas on PySpark does already have a [hidden natural ordering 
column](https://github.com/apache/spark/blob/a1e628574b7d9cdf89472fa550ecc41f8a871b98/python/pyspark/pandas/internal.py#L77-L79),
 so we can try using that.
   > 
   > Regardless, if the current default `align=True` logic sounds good to you, 
I'd rather merge this in now and revisit additional functionality 
(`align=False`) later when we add indexes (creating a separate issue of 
course). Does that make sense?
   
   Cool, that works for me.
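
For reference, here is a rough sketch of the two modes as I understand them from the discussion above. It is a plain-Python toy model, not geopandas, pandas-on-Spark, or Sedona code: the `binary_op` helper and the list-of-pairs representation of a series are hypothetical, and returning `None` for unmatched labels is a simplifying assumption.

```python
from operator import add


def binary_op(left, right, op, align=True):
    """Apply `op` to two toy series given as lists of (label, value) pairs.

    align=True  -> pair values by index label (union of labels).
    align=False -> pair values by position ("natural ordering").
    Hypothetical helper for illustration only.
    """
    if align:
        left_by_label = dict(left)
        right_by_label = dict(right)
        result = []
        for label in sorted(set(left_by_label) | set(right_by_label)):
            if label in left_by_label and label in right_by_label:
                value = op(left_by_label[label], right_by_label[label])
            else:
                # Assumption: a label missing on either side yields None.
                value = None
            result.append((label, value))
        return result

    # Positional pairing ignores labels entirely; keep the left labels.
    if len(left) != len(right):
        raise ValueError("positional pairing needs equal lengths")
    return [(l_label, op(l_val, r_val))
            for (l_label, l_val), (_, r_val) in zip(left, right)]


left = [(0, 1), (1, 2), (2, 3)]
right = [(1, 10), (2, 20), (3, 30)]

aligned = binary_op(left, right, add, align=True)
positional = binary_op(left, right, add, align=False)
```

With a default index on both sides, the labels are `0..n-1` in row order, so the two modes give the same pairing. They only diverge once a user-supplied index is involved, which is why deferring `align=False` until index support lands seems reasonable.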


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

