Kontinuation commented on PR #2593: URL: https://github.com/apache/sedona/pull/2593#issuecomment-4150502758
I still have the same concern as before: the performance improvement comes mainly from batch serialization and deserialization (the newly added `from_sedona` and `to_sedona` functions). SedonaDB is not a necessary component for achieving this speedup, and depending on it increases the overall complexity. I tried implementing a batch UDF without SedonaDB in https://github.com/Kontinuation/sedona/commit/528a26a4b8e6ce5c50b9ef98dded3978db1a3957 and its performance is on par with the SedonaDB-based UDF. I think the SedonaDB-based UDF is still good to have, but we need a better understanding of its performance characteristics, of how we benefit from SedonaDB, and of whether we can achieve similar performance while introducing less complexity.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
