geruh commented on code in PR #3007:
URL: https://github.com/apache/iceberg-python/pull/3007#discussion_r2780331652


##########
Makefile:
##########
@@ -133,6 +133,9 @@ test-gcs: ## Run tests marked with @pytest.mark.gcs
        sh ./dev/run-gcs-server.sh
        $(TEST_RUNNER) pytest tests/ -m gcs $(PYTEST_ARGS)

Review Comment:
   nit: `test_partitioned_write` downloads a ~38MB taxi dataset from CloudFront 
on every CI run. It's the only CI job that depends on an external CDN — if it 
goes down this job fails. Might be worth a comment here noting the network 
dependency, or consider caching it.
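
   One way to cache it, as a sketch (the cache `path` and `key` here are assumptions; they would need to match wherever the test actually stores the downloaded file):

   ```yaml
   # Hypothetical caching step, to be added before the benchmark run.
   # actions/cache restores the dataset on a key hit, so the CloudFront
   # download only happens when the cache is cold.
   - name: Cache taxi dataset
     uses: actions/cache@v4
     with:
       path: ~/.cache/iceberg-taxi    # assumed download location
       key: taxi-dataset-v1           # bump to invalidate the cache
   ```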



##########
.github/workflows/python-ci.yml:
##########
@@ -178,6 +178,32 @@ jobs:
         path: .coverage*
         include-hidden-files: true
 
+  benchmark-test:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v6
+    - uses: actions/setup-python@v6
+      with:
+        python-version: '3.12'
+    - name: Install UV
+      uses: astral-sh/setup-uv@v7
+      with:
+        enable-cache: true
+    - name: Install system dependencies
+      run: sudo apt-get update && sudo apt-get install -y libkrb5-dev # for kerberos
+    - name: Install
+      run: make install
+    - name: Run benchmarks
+      run: make test-benchmark

Review Comment:
   These files are never produced — I confirmed locally. The tests use 
`timeit`/`tracemalloc` directly, not `pytest-benchmark`, so `.benchmarks/` and 
`benchmark-*.json` are never generated. This step will always be a no-op. 
Should we remove it or switch to `pytest-benchmark` so it actually generates 
output?
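
   If we go the `pytest-benchmark` route, the change would look roughly like this. This is a sketch, not the actual test: `build_and_write_table` is a placeholder for the work the existing test times with `timeit`.

   ```python
   # Sketch of converting a timeit-style benchmark to pytest-benchmark,
   # so that `pytest --benchmark-json=benchmark-results.json` actually
   # produces the artifact the CI upload step expects.
   # `build_and_write_table` is a hypothetical stand-in for the real
   # partitioned-write workload.

   def build_and_write_table() -> int:
       # Placeholder workload; the real test would write a partitioned table.
       return sum(i * i for i in range(10_000))

   def test_partitioned_write(benchmark):
       # pytest-benchmark's `benchmark` fixture calls the function
       # repeatedly and records timing stats for the JSON report.
       result = benchmark(build_and_write_table)
       assert result > 0
   ```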



##########
.github/workflows/python-ci.yml:
##########
@@ -178,6 +178,32 @@ jobs:
         path: .coverage*
         include-hidden-files: true
 
+  benchmark-test:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v6
+    - uses: actions/setup-python@v6
+      with:
+        python-version: '3.12'
+    - name: Install UV
+      uses: astral-sh/setup-uv@v7
+      with:
+        enable-cache: true
+    - name: Install system dependencies
+      run: sudo apt-get update && sudo apt-get install -y libkrb5-dev # for kerberos
+    - name: Install
+      run: make install

Review Comment:
   `test_partitioned_write` currently prints average runtime but has no 
threshold or assertion — it's effectively a smoke test that catches crashes, 
not regressions. Worth adding a baseline assertion like the memory tests have, 
or at least capturing the output somewhere reviewers can compare between runs.
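
   A minimal baseline assertion could look like this. The workload is a placeholder for the real partitioned write, and the 5.0 s ceiling is an assumed number that would need tuning from prior runs:

   ```python
   import timeit

   # Sketch: turn the print-only benchmark into one that fails on gross
   # regressions. Both the workload and the threshold below are assumptions.

   def average_runtime(workload, runs: int = 5) -> float:
       """Mean wall-clock time of `workload` over `runs` calls."""
       return timeit.timeit(workload, number=runs) / runs

   def test_partitioned_write_regression():
       elapsed = average_runtime(lambda: sum(i * i for i in range(10_000)))
       print(f"average runtime: {elapsed:.4f}s")
       # Generous ceiling so CI noise doesn't flake; tighten once a
       # baseline is established.
       assert elapsed < 5.0
   ```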



##########
.github/workflows/python-ci.yml:
##########
@@ -178,6 +178,32 @@ jobs:
         path: .coverage*
         include-hidden-files: true
 

Review Comment:
   Could we run the benchmarks against main as a baseline in the same job? e.g. 
checkout main → run benchmarks → checkout PR → run benchmarks → print 
comparison. Without some form of before/after, we're running benchmarks but not 
detecting regressions — which is what issue #27 was asking for.
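
   Roughly, as a sketch (file names and the plain `diff` comparison are just illustrative; a real setup would probably parse and compare the numbers):

   ```yaml
   # Hypothetical step: benchmark main first, then the PR head, in one job.
   - name: Benchmark main vs PR
     run: |
       git fetch origin main
       git checkout origin/main
       make test-benchmark | tee bench-main.txt
       git checkout ${{ github.sha }}
       make test-benchmark | tee bench-pr.txt
       # Crude comparison for reviewers; never fails the job.
       diff bench-main.txt bench-pr.txt || true
   ```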



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

