Re: [PR] [GH-2037] Implement _row_wise_operation + intersection, intersect [sedona]

via GitHub Wed, 02 Jul 2025 10:33:34 -0700


zhangfengcdt commented on PR #2038:
URL: https://github.com/apache/sedona/pull/2038#issuecomment-3028695719


   > @zhangfengcdt For these row-wise operations, the default align=True 
matches elements using Spark internal index column. I'm planning to never 
support `align=False` in any of these row_wise functions in Sedona because 
Spark dataframes do not naturally support a deterministic order to my 
understanding. It's possible to hack something up with an extra column in the 
cases where the user specifies an index, but I don't think it makes sense to 
support that to be honest. What do you think?
   
   I think the API is meant to match on the GeoPandas index (not the Spark 
DataFrame's internal index). This is actually useful because it is the only 
way the user can align other columns with the `intersects` results, and that 
seems to be a pretty common case. However, if no index is specified when the 
GeoSeries is created, then we don't need to support alignment.
   
   My thinking is that if an index was created and the user wants alignment, 
we should add the original index column to the SQL sent to Sedona, and the 
resulting PySpark Series should keep that index column. This way, end users 
can link the index back to the `intersects` results.
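
   To make the idea concrete, here is a minimal pure-Python sketch of matching 
rows on index labels and keeping those labels in the output. It is not 
Sedona or GeoPandas code: plain dicts keyed by label stand in for indexed 
GeoSeries, 1-D intervals stand in for geometries, and `row_wise_aligned` is a 
hypothetical helper. Returning `None` for labels present on only one side is 
an assumption of this sketch, not a statement about what either library does.

```python
# Hypothetical sketch: dicts keyed by index label stand in for indexed
# GeoSeries, and closed 1-D intervals stand in for geometries.

def intersects(a, b):
    """Return True if closed intervals a=(lo, hi) and b=(lo, hi) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def row_wise_aligned(left, right, op):
    """Apply op to rows matched by index label, not by position.

    The result is keyed by the same labels, so callers can link each
    value back to other columns. Labels found on only one side map to
    None (an assumption made for this sketch).
    """
    labels = sorted(set(left) | set(right))
    return {
        label: op(left[label], right[label])
        if label in left and label in right
        else None
        for label in labels
    }


# The two inputs list their labels in different orders on purpose.
left = {"a": (0, 2), "b": (5, 6), "c": (8, 9)}
right = {"c": (8.5, 10), "a": (3, 4), "d": (0, 1)}

result = row_wise_aligned(left, right, intersects)
print(result)
```

   Because the output is keyed by the original labels, the order in which rows 
come back from the engine does not matter for linking results to other 
columns.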
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
