zhangfengcdt commented on PR #2038: URL: https://github.com/apache/sedona/pull/2038#issuecomment-3028695719
> @zhangfengcdt For these row-wise operations, the default align=True matches elements using Spark internal index column. I'm planning to never support `align=False` in any of these row_wise functions in Sedona because Spark dataframes do not naturally support a deterministic order to my understanding. It's possible to hack something up with an extra column in the cases where the user specifies an index, but I don't think it makes sense to support that to be honest. What do you think?

I think the API should match on the geopandas index column, not on the Spark DataFrame index. This is useful because it is the only way for the user to align other columns with the `intersects` results, and it seems to be a pretty common case. However, if no index is used when the GeoSeries is created, then we don't need to support alignment.

My thinking is that if an index was created and the user wants alignment, we should add the original index column to the SQL sent to Sedona, and the resulting PySpark Series should keep that index column. This way, end users can link the index to the `intersects` results.
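To make the index-carrying idea concrete, here is a minimal plain-Python sketch. It is not Sedona or geopandas code: the box tuples, `boxes_intersect`, and `intersects_aligned` are hypothetical stand-ins, and each list of `(index, box)` rows plays the role of a Spark DataFrame with an explicit index column and no guaranteed row order. It also only keeps indexes present on both sides, which is a simplification.

```python
# Hypothetical sketch: geometries are axis-aligned boxes (xmin, ymin, xmax, ymax).
# Each "series" is a list of (index, box) rows in arbitrary order, standing in
# for a Spark DataFrame that carries the user's index as a regular column.

def boxes_intersect(a, b):
    """Return True if two boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def intersects_aligned(left_rows, right_rows):
    """Match rows on the index column (not on position) and keep the index."""
    right_by_index = dict(right_rows)
    return {
        idx: boxes_intersect(box, right_by_index[idx])
        for idx, box in left_rows
        if idx in right_by_index  # simplification: inner match only
    }

# The two inputs list their rows in different orders on purpose.
left = [("a", (0, 0, 2, 2)), ("b", (5, 5, 6, 6)), ("c", (0, 0, 1, 1))]
right = [("c", (3, 3, 4, 4)), ("a", (1, 1, 3, 3)), ("b", (5, 5, 7, 7))]

# Another column keyed by the same index, e.g. a feature name.
names = {"a": "park", "b": "lake", "c": "road"}

result = intersects_aligned(left, right)
# Because the index survives, the user can link results to other columns.
labelled = {names[idx]: hit for idx, hit in result.items()}
```

Since the result is keyed by the original index rather than by row position, the mismatch in row order between the two inputs does not matter, which is the property the index column is meant to provide.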
