wombatu-kun opened a new pull request, #16540: URL: https://github.com/apache/iceberg/pull/16540
## Summary Closes #16530. The `iceberg-azure` integration tests (`TestADLSFileIO`, `TestADLSInputStream`, `TestADLSOutputStream`) start a shared Azurite container in `AzuriteTestBase.@BeforeAll`. They intermittently fail at startup with a `ContainerFetchException` caused by a `NotFoundException` (HTTP 404) while testcontainers pulls `mcr.microsoft.com/azure-storage/azurite:3.35.0`. The image tag is already pinned, so this is a transient registry/CDN failure (rate-limiting or CDN inconsistency), not a missing image — re-running usually succeeds, which is exactly what makes these tests flaky. ## What changed `AzuriteContainer` now overrides `start()` to retry transient image-fetch failures before giving up: up to 5 attempts with exponential backoff (2s → 4s → 8s → 16s). Each retry re-invokes `start()`, which re-triggers the image pull, because testcontainers' `RemoteDockerImage` does not cache a failed resolution. Notably, testcontainers' own `withStartupAttempts(...)` does not help here: in testcontainers 2.0.5 the image is fetched (`getDockerImageName()` → `RemoteDockerImage`) *before* the `startupAttempts` retry loop in `GenericContainer.doStart()`, so a fetch failure short-circuits that loop. Retrying `start()` itself is what actually re-attempts the pull. The retry is deliberately narrow: it only retries when the failure (or its cause chain) is a `ContainerFetchException`. Genuine launch failures — e.g. a wait-strategy timeout surfaced as a `ContainerLaunchException` without a fetch cause — are rethrown immediately so real problems are never masked. ## Tests - New `TestAzuriteContainer` exercises the retry helper without Docker: succeeds without retry, retries until success, rethrows after exhausting attempts, does not retry non-fetch launch failures, and retries a fetch failure wrapped in a `ContainerLaunchException`. - Confirmed the existing Azurite integration tests still pass through the overridden `start()` (`TestADLSFileIO`, `TestADLSInputStream`, `TestADLSOutputStream`) with Docker. ## Note This reduces the flakiness by tolerating transient fetch hiccups; it cannot mask a registry outage that persists across all retries. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
