sdd opened a new pull request, #497: URL: https://github.com/apache/iceberg-rust/pull/497
This PR adds some performance testing capabilities. It includes the following features:

* A docker-compose environment that includes containers for Minio, Spark, HAProxy and the Iceberg REST Catalog
* Uses HAProxy to simulate real-world latency and bandwidth constraints of connections to services like S3
* Includes scripting to create an Iceberg table in the performance testing environment and populate it with data from the widely-used NYC Taxi dataset
* Adds a justfile for ease of creating, initialising, starting, stopping and tearing down the performance testing environment
* Adds some Criterion benchmarks that use the performance testing environment to test the performance of `TableScan.plan_files` in four different representative scenarios

This is still a work in progress, especially the support code around working with docker-compose. I've been using this on macOS with OrbStack, so some work will probably be needed to ensure compatibility with Linux hosts / docker / podman. I see that @alexyin1 has been working on Podman support in https://github.com/apache/iceberg-rust/pull/489. I'll work to make sure that our combined efforts are aligned.

Unlike the previous docker-based integration tests, at the moment the tests in here require the developer to manually run tasks from the justfile in order to set up / start / stop the docker environment. I chose this approach because of the longer setup times due to needing to download and insert data. I'm open to suggestions on better approaches.

TODO: I've not yet included the scripting to retrieve the source data for NYC Taxi; I will add it over the next couple of days. I wanted to get this PR in early to get some feedback.

I'll use this suite to measure performance changes on the concurrent table scan PR as well as a couple of other read-performance-related changes that I'm in the process of turning into PRs.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org