syun64 commented on issue #368: URL: https://github.com/apache/iceberg-python/issues/368#issuecomment-2020762590
Hi @brianfromoregon and @corleyma, from my understanding of PyIceberg and PySpark Iceberg, I'm not sure that allowing the two separate clients to participate in the same transaction will be possible any time soon. Currently, Transactions are designed as classes, and each one is available only to the specific client that is building it. This feature request implies that the transaction would be shared between the two separate clients, which would need either:

1. the Transaction class to be exchanged in a way that can be understood by both Spark and Python within the same machine (presumably the Spark driver), or
2. the Transaction to be sent to an intelligent Catalog backend that doesn't commit it immediately but stages it, so that the transaction can be looked up by a unique identifier and built upon by separate clients until it is committed.

Is there a specific use case you are thinking of that requires both PySpark-Iceberg and PyIceberg? We know PyIceberg is still evolving, but it is growing fast and we will reach rough feature parity in the near future. After that, the choice of client would really depend on the use case: does it require the built-in distributed capabilities of Spark, or do we want to perform simpler transactions through PyIceberg?

@Fokko - do you have any thoughts on this topic?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
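
To make the second option in the comment above more concrete, here is a minimal sketch of what a staging catalog backend might look like. It is a toy in-memory model, not PyIceberg, Spark, or Iceberg REST catalog code: `StagingCatalog`, `StagedTransaction`, `begin`, `stage`, and `commit` are all hypothetical names invented for illustration, and the updates are plain strings standing in for real table changes.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StagedTransaction:
    """Hypothetical record of a transaction that is staged but not yet committed."""
    table: str
    updates: list = field(default_factory=list)


class StagingCatalog:
    """Toy in-memory stand-in for a catalog backend that stages transactions.

    Assumption: this is not an existing Iceberg API; every name is hypothetical.
    """

    def __init__(self):
        self._staged = {}    # transaction id -> StagedTransaction
        self.committed = {}  # table name -> list of applied (client, update) pairs

    def begin(self, table: str) -> str:
        """Open a staged transaction and return its unique identifier."""
        txn_id = str(uuid.uuid4())
        self._staged[txn_id] = StagedTransaction(table)
        return txn_id

    def stage(self, txn_id: str, client: str, update: str) -> None:
        """Add an update to a staged transaction; raises KeyError if the id is unknown."""
        self._staged[txn_id].updates.append((client, update))

    def commit(self, txn_id: str) -> None:
        """Apply all staged updates at once and discard the staged transaction."""
        txn = self._staged.pop(txn_id)
        self.committed.setdefault(txn.table, []).extend(txn.updates)


# Two separate clients build on the same transaction via its identifier.
catalog = StagingCatalog()
txn_id = catalog.begin("db.events")
catalog.stage(txn_id, "pyspark", "append data files")
catalog.stage(txn_id, "pyiceberg", "set table properties")

# Nothing is visible on the table until the transaction is committed.
assert "db.events" not in catalog.committed
catalog.commit(txn_id)
```

The point of the sketch is only that the transaction state lives in the backend rather than in a client-side class, so any client holding the identifier can contribute to it. A real design would also have to handle conflicts, expiry of abandoned transactions, and authorization, none of which is modeled here.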