JonasJ-ap commented on PR #6449:
URL: https://github.com/apache/iceberg/pull/6449#issuecomment-1374232612

   > I read the demo one more time, thanks @JonasJ-ap, this is super helpful in 
understanding the whole experience. I think one way we can go with this PR is 
that, because we are using Delta Standalone, instead of following the same 
pattern of having a base class action and then a Spark extension, we can 
directly make Delta to Iceberg conversion work just within the 
`iceberg-delta-lake` module. Unit tests can be done end to end to test the 
conversion capability.
   > 
   > The code in Spark module can be completely removed, as user can simply get 
Hadoop configuration from Spark session and invoke the method in the 
`iceberg-delta-lake` module.
   > 
   > By doing this, it also completely removes the concern of adding Delta 
dependency to Iceberg Spark distribution.
   > 
   > Trino can be integrated in the same way to offer the conversion logic. 
@findepi
   > 
   > There could be improvements made in the future to parallelize the 
retrieval of delta log in different engines like Spark and Trino, but those 
could be added later as extension points in the `iceberg-delta-lake` module.
   > 
   > What do you think? @JonasJ-ap
   
   
   > I was thinking about this last night, and it feels like the current blocker is 
unit testing with Spark. But if we just use `iceberg-spark` as a test 
dependency, we can move all the tests currently in the Spark module to 
`iceberg-delta-lake` and satisfy the goal of unit-testing the implementation 
without the need to find another way to write Delta Lake tables outside Spark.
   
   Thank you very much for your suggestions. I agree that it is unnecessary to 
create a Spark action just to provide the Hadoop configuration. However, we may 
have some trouble if we include `iceberg-spark` for the unit tests, as the Java CI 
does not include `iceberg-spark` in the build. Instead, we can move the 
Spark-related tests to `integrationTest` to avoid this issue.
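
   To make the proposed layout concrete, here is a minimal sketch in which every name is hypothetical (`DeltaToIcebergConverter`, `DeltaLogReader` and `snapshot` are not the real `iceberg-delta-lake` API). The conversion logic lives in one engine-agnostic class, an engine only hands over its Hadoop configuration (modeled as a `Map<String, String>` so the sketch compiles without Hadoop), and the Delta log reader is a pluggable extension point that Spark or Trino could later replace with a parallel implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the proposed layout: conversion logic lives only in
// iceberg-delta-lake, and each engine just passes in its Hadoop configuration.
// None of these names are the actual iceberg-delta-lake API.
public final class DeltaConversionSketch {

  // Hypothetical extension point: a default single-threaded reader (for example
  // backed by Delta Standalone) could later be swapped for a parallel one.
  public interface DeltaLogReader {
    long latestVersion(String tableLocation, Map<String, String> conf);
  }

  // Engine-agnostic converter; conf stands in for org.apache.hadoop.conf.Configuration.
  public static final class DeltaToIcebergConverter {
    private final Map<String, String> conf;
    private final DeltaLogReader logReader;

    public DeltaToIcebergConverter(Map<String, String> conf, DeltaLogReader logReader) {
      this.conf = Collections.unmodifiableMap(new HashMap<>(conf));
      this.logReader = Objects.requireNonNull(logReader, "logReader");
    }

    // Only describes what would be converted; a real implementation would
    // replay the Delta log into Iceberg commits at this point.
    public String snapshot(String deltaLocation, String icebergTable) {
      long version = logReader.latestVersion(deltaLocation, conf);
      return icebergTable + " <- " + deltaLocation + "@v" + version;
    }
  }

  public static void main(String[] args) {
    // In Spark, this map would instead be spark.sparkContext().hadoopConfiguration().
    Map<String, String> conf = new HashMap<>();
    conf.put("fs.defaultFS", "file:///");
    // Stub reader standing in for Delta Standalone in this sketch.
    DeltaLogReader stubReader = (location, c) -> 3L;
    DeltaToIcebergConverter converter = new DeltaToIcebergConverter(conf, stubReader);
    System.out.println(converter.snapshot("/tmp/delta/events", "db.events"));
  }
}
```

   Running `main` prints `db.events <- /tmp/delta/events@v3`. The Spark side then shrinks to fetching the configuration from the session and calling the converter, so no Spark action and no Delta dependency in the Spark distribution are needed.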
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.