steveloughran commented on PR #12264: URL: https://github.com/apache/iceberg/pull/12264#issuecomment-2721144184
@mmgaggle I'm setting up the s3a tests to test through iceberg and parquet, so we can validate features and performance optimisations through our code.

Initially, https://github.com/apache/hadoop/pull/7316 has gone in for the bulk delete API of #10233 (please can someone review/merge this!). It will then act as a regression test of the s3a connector, as well as being an easy way to test local iceberg/parquet builds against arbitrary stores through our test harness.

That test harness uses the hadoop IOStatistics API to make assertions about the actual number of remote S3 calls made. This lets you identify regressions in the actual amount of network IO which takes place. Everyone cares about this.

Even with this, you should have a test harness which
* can be targeted at production S3 stores
* contains a good set of operations, both low level FileIO and higher level API calls
* has many of those tests abstracted up to work with all FileIO implementations
* provides really good diagnostics on test failures.

If someone starts that, I'd be happy to help. What I'm not going to do is say "here are the tests you need". I did try to do that with spark and the spark-hadoop-cloud module, but there was no interest in full integration tests. I'd only do it for iceberg as part of a collaborative effort with others.
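To make the idea of asserting on remote call counts concrete, here is a minimal, self-contained Java sketch. It does not use the hadoop IOStatistics API or the s3a connector: `CountingStore`, `measure`, `assertCallCount` and the counter names are all hypothetical stand-ins, chosen only to show the pattern of snapshotting counters before and after an operation and asserting on the difference.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch only: assert on how many "remote" calls an operation makes. */
final class RemoteCallCountSketch {

  /** Hypothetical in-memory store that counts every simulated remote request. */
  static final class CountingStore {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    private void record(String name) {
      counters.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
    }

    void put(String key, byte[] data) {
      record("put_request"); // illustrative counter name, not a real statistic
      objects.put(key, data);
    }

    void delete(String key) {
      record("delete_request"); // one request per key
      objects.remove(key);
    }

    void bulkDelete(List<String> keys) {
      record("bulk_delete_request"); // one request for the whole batch
      for (String key : keys) {
        objects.remove(key);
      }
    }

    /** Immutable-by-copy view of the counters at this instant. */
    Map<String, Long> snapshot() {
      Map<String, Long> copy = new HashMap<>();
      counters.forEach((name, value) -> copy.put(name, value.get()));
      return copy;
    }
  }

  /** Runs the operation and returns only the counters that changed, with their deltas. */
  static Map<String, Long> measure(CountingStore store, Runnable operation) {
    Map<String, Long> before = store.snapshot();
    operation.run();
    Map<String, Long> delta = new HashMap<>();
    store.snapshot().forEach((name, value) -> {
      long change = value - before.getOrDefault(name, 0L);
      if (change != 0) {
        delta.put(name, change);
      }
    });
    return delta;
  }

  /** Fails with the full delta map in the message, so a regression is easy to diagnose. */
  static void assertCallCount(Map<String, Long> delta, String name, long expected) {
    long actual = delta.getOrDefault(name, 0L);
    if (actual != expected) {
      throw new AssertionError(
          name + ": expected " + expected + " but was " + actual + "; all deltas: " + delta);
    }
  }

  public static void main(String[] args) {
    CountingStore store = new CountingStore();
    List<String> keys = List.of("data/a", "data/b", "data/c");
    for (String key : keys) {
      store.put(key, new byte[0]);
    }
    // The setup puts happen before measurement, so they do not appear in the delta.
    Map<String, Long> delta = measure(store, () -> store.bulkDelete(keys));
    assertCallCount(delta, "bulk_delete_request", 1);
    assertCallCount(delta, "delete_request", 0);
    System.out.println("remote calls made by the operation: " + delta);
  }
}
```

If a later change made the bulk path fall back to per-key deletes, the second assertion would fail and the message would show every counter that moved, which is the kind of diagnostic the list above asks for.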