geruh commented on code in PR #3007:
URL: https://github.com/apache/iceberg-python/pull/3007#discussion_r2780331652
##########
Makefile:
##########
@@ -133,6 +133,9 @@ test-gcs: ## Run tests marked with @pytest.mark.gcs
 	sh ./dev/run-gcs-server.sh
 	$(TEST_RUNNER) pytest tests/ -m gcs $(PYTEST_ARGS)
Review Comment:
nit: `test_partitioned_write` downloads a ~38MB taxi dataset from CloudFront
on every CI run. It's the only CI job that depends on an external CDN, so if
the CDN goes down this job fails. Might be worth a comment here noting the
network dependency, or caching the dataset.
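A rough sketch of the caching idea (untested; the cache path and key are made
up, and the test would need to reuse the file instead of re-downloading when
it already exists):
```yaml
# Hypothetical cache step for the CI job that runs this target.
# Assumes the test can be pointed at a pre-downloaded file at a
# well-known path.
- name: Cache taxi dataset
  uses: actions/cache@v4
  with:
    path: /tmp/taxi-dataset.parquet   # assumed download location
    key: taxi-dataset-v1              # bump the suffix to force a refresh
```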
##########
.github/workflows/python-ci.yml:
##########
@@ -178,6 +178,32 @@ jobs:
           path: .coverage*
           include-hidden-files: true
+  benchmark-test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
+        with:
+          python-version: '3.12'
+      - name: Install UV
+        uses: astral-sh/setup-uv@v7
+        with:
+          enable-cache: true
+      - name: Install system dependencies
+        run: sudo apt-get update && sudo apt-get install -y libkrb5-dev # for kerberos
+      - name: Install
+        run: make install
+      - name: Run benchmarks
+        run: make test-benchmark
Review Comment:
These files are never produced (I confirmed locally): the tests use
`timeit`/`tracemalloc` directly, not `pytest-benchmark`, so `.benchmarks/` and
`benchmark-*.json` are never generated, and this step will always be a no-op.
Should we remove it, or switch to `pytest-benchmark` so it actually generates
output?
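If we switch, something like this could replace the `Run benchmarks` step
(sketch only; the test path and marker are guesses, and the tests themselves
would need porting to the `benchmark` fixture):
```yaml
- name: Run benchmarks
  # --benchmark-json writes a machine-readable results file, and
  # --benchmark-autosave populates .benchmarks/, so the artifact
  # upload step would actually have something to pick up.
  run: uv run pytest tests/benchmark -m benchmark --benchmark-autosave --benchmark-json=benchmark.json
```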
##########
.github/workflows/python-ci.yml:
##########
@@ -178,6 +178,32 @@ jobs:
           path: .coverage*
           include-hidden-files: true
+  benchmark-test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-python@v6
+        with:
+          python-version: '3.12'
+      - name: Install UV
+        uses: astral-sh/setup-uv@v7
+        with:
+          enable-cache: true
+      - name: Install system dependencies
+        run: sudo apt-get update && sudo apt-get install -y libkrb5-dev # for kerberos
+      - name: Install
+        run: make install
Review Comment:
`test_partitioned_write` currently prints average runtime but has no
threshold or assertion, so it's effectively a smoke test that catches crashes,
not regressions. Worth adding a baseline assertion like the memory tests have,
or at least capturing the output somewhere reviewers can compare between runs.
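At minimum, the output could be captured and uploaded (sketch; the file and
artifact names here are made up):
```yaml
- name: Run benchmarks
  # tee keeps the timings visible in the job log while also saving them
  run: make test-benchmark | tee benchmark-output.txt
- name: Upload benchmark output
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-output
    path: benchmark-output.txt
```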
##########
.github/workflows/python-ci.yml:
##########
@@ -178,6 +178,32 @@ jobs:
           path: .coverage*
           include-hidden-files: true
Review Comment:
Could we run the benchmarks against main as a baseline in the same job? e.g.
checkout main → run benchmarks → checkout PR → run benchmarks → print a
comparison. Without some form of before/after, we're running benchmarks but not
detecting regressions, which is what issue #27 was asking for.
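Rough sketch of what I mean (untested; assumes the benchmark output is plain
stdout and that `make install` is safe to run twice):
```yaml
- uses: actions/checkout@v6
  with:
    fetch-depth: 0                    # need main's history for the baseline
- name: Benchmark main (baseline)
  run: |
    git checkout origin/main
    make install
    make test-benchmark | tee /tmp/bench-main.txt
- name: Benchmark PR
  run: |
    git checkout -                    # back to the PR merge commit
    make install
    make test-benchmark | tee /tmp/bench-pr.txt
- name: Compare
  # plain diff for now; a real gate would parse the timings and fail on a
  # threshold, but even this makes before/after visible in the job log
  run: diff -u /tmp/bench-main.txt /tmp/bench-pr.txt || true
```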
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]