ingaleniranjan365 opened a new pull request, #16743:
URL: https://github.com/apache/iceberg/pull/16743

   ## Performance issue
   
   **File**: `core/src/main/java/org/apache/iceberg/SchemaUpdate.java:178`
   
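   `deletes` was declared as `List<Integer>`, so `deletes.contains(id)` is an O(n) linear scan, where n is the number of deleted field IDs. During `apply()` this check runs once for every field in the schema, so the total cost is O(fields × n). In the worst case n is proportional to the number of fields, so on a wide schema with many deleted columns the cost approaches O(fields²).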
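To illustrate the cost, here is a small standalone sketch. It is not Iceberg code; the class `ListContainsCost` and the "delete every even field ID" workload are hypothetical. It counts the element comparisons a linear `contains` scan performs when every field ID is checked once against a `List` of deleted IDs:

```java
import java.util.ArrayList;
import java.util.List;

public class ListContainsCost {
    // Mirrors ArrayList.contains(): compares the target against elements from
    // index 0 until a match is found, and returns how many comparisons were made.
    static int linearScanComparisons(List<Integer> list, int target) {
        int comparisons = 0;
        for (int value : list) {
            comparisons++;
            if (value == target) {
                break;
            }
        }
        return comparisons;
    }

    // Total comparisons when every field ID in 1..fieldCount is checked once
    // against the list of deleted IDs (hypothetical model of the apply() loop).
    static long totalComparisons(int fieldCount, List<Integer> deletes) {
        long total = 0;
        for (int id = 1; id <= fieldCount; id++) {
            total += linearScanComparisons(deletes, id);
        }
        return total;
    }

    public static void main(String[] args) {
        int fieldCount = 1_000;
        List<Integer> deletes = new ArrayList<>();
        for (int id = 2; id <= fieldCount; id += 2) {
            deletes.add(id); // hypothetical workload: every even field ID is deleted
        }
        System.out.println(totalComparisons(fieldCount, deletes)); // prints 375250
    }
}
```

With 1,000 fields and 500 deleted IDs the list version makes 375,250 comparisons, whereas a `HashSet` would perform 1,000 expected-constant-time lookups.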
   
   ## Fix
   
   Changed the type of `deletes` from `List<Integer>` to `Set<Integer>` (4 occurrences: the declaration, the `HashSet` construction, and two call sites). `HashSet.contains()` runs in expected O(1) time. All other behaviour is unchanged: `deletes` is append-only and is never iterated in order, so only the cost of membership checks changes.
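A minimal sketch of why the swap is behaviour-preserving, assuming the filtering step in `apply()` can be modelled as a plain membership check. `DeletesMembershipSketch` and `keepUndeleted` are hypothetical names, and the sketch uses `java.util.HashSet` directly rather than whatever factory the PR uses:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeletesMembershipSketch {
    // Hypothetical stand-in for the apply() filtering step: keeps the field IDs
    // not marked as deleted. Accepts any Collection; only the cost of contains()
    // differs between a List and a HashSet.
    static List<Integer> keepUndeleted(List<Integer> fieldIds, Collection<Integer> deletes) {
        List<Integer> kept = new ArrayList<>();
        for (int id : fieldIds) {
            if (!deletes.contains(id)) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> fieldIds = List.of(1, 2, 3, 4, 5, 6);

        // Before: deletes as a List, so each contains() is a linear scan.
        List<Integer> deletesList = new ArrayList<>(List.of(2, 5));
        // After: deletes as a HashSet, so each contains() is an expected O(1) hash lookup.
        Set<Integer> deletesSet = new HashSet<>(deletesList);

        List<Integer> before = keepUndeleted(fieldIds, deletesList);
        List<Integer> after = keepUndeleted(fieldIds, deletesSet);
        System.out.println(before + " " + after + " " + before.equals(after)); // prints [1, 3, 4, 6] [1, 3, 4, 6] true
    }
}
```

Both collections answer `contains()` identically; a `HashSet` additionally collapses duplicate IDs, which does not change any membership result.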
   
   ## Evidence
   
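   - Before: `List.contains()` is O(n) per call (n = number of deleted field IDs) and is called once per field during `apply()`, giving O(fields × n) total, or O(fields²) in the worst case.
   - After: `HashSet.contains()` is expected O(1) per call, giving O(fields) total.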
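As a worked bound, a sketch assuming each of the F fields is checked exactly once during `apply()` and D field IDs are marked for deletion (D ≤ F):

```math
\begin{aligned}
C_{\text{List}} &\le F \cdot D \le F^2 && \text{(a miss scans all } D \text{ entries)} \\
C_{\text{HashSet}} &= F \cdot O(1) = O(F) && \text{(expected, one hash lookup per field)}
\end{aligned}
```

For a hypothetical schema with F = 2,000 fields and D = 1,000 deletes, the list version can perform up to 2,000,000 element comparisons, versus 2,000 expected-constant-time hash lookups.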
   
   ## Validation
   
   - Test harness: `./gradlew :iceberg-core:test --tests "org.apache.iceberg.TestSchemaUpdate"`
   - Tests pass after fix: ✅
   - Fix scope: domain-free, independent, 1 file / 6 lines
   
   🌀 Magic applied with [Wibey VS Code Extension](https://wibey.walmart.com/code) 🪄


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
For additional commands, e-mail: [email protected]
