danielcweeks commented on PR #7194: URL: https://github.com/apache/iceberg/pull/7194#issuecomment-1490738762
> > this is introducing something that we don't do in the Iceberg project: Invoke an external provider to perform operations on behalf of the library.
>
> Yes I 100% agree with that. But first of all this is within the aws module which is an external provider integration. And secondly, without this PR, we will likely also contribute things like CloudWatch, SQS and SNS metrics reporter.
>
> So the questions maybe you could help share your opinions are that:
>
> 1. would it work if we contribute CloudWatch, SQS and SNS metrics reporters, such that these compaction actions could be triggered from that point, which is outside the Iceberg library
> 2. would it be okay with the community if we publish a separated open source repository for metrics reporters like this, such that they will be a part of the EMR distribution, and users can plug it into a catalog if necessary.
> 3. we could also develop a metrics reporter for a specific engine like Spark, and we contact the Spark endpoint to submit such a job instead of doing it through an external service provider's API. Would that be a viable approach?

I think we should clarify that the `iceberg-aws` module was originally introduced to integrate the storage aspects of AWS S3 with Iceberg (similar to GCP, etc.), since cloud providers are ubiquitous for data warehousing and that aligns well with the requirements to use Iceberg. The existence of the module isn't justification to include anything AWS related. We should focus on things that are highly aligned and generalizable.

For items 1 and 2 above, I would lean towards publishing them separately since they are pluggable and can be integrated with other AWS libraries and tools.

For number 3, I think invocations and automatic callbacks into systems take us out of the format+integration space and start down the path of defining the runtime activity, which is where we've historically said we draw the line on what should or should not be part of Iceberg.
This is very much a slippery slope because if we allow vendor solutions or hooks like this (regardless of how convenient/optional it may be) we would need to be open to the same contributions from all vendors (cloud or otherwise).

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
