tomlarkworthy opened a new pull request, #2773:
URL: https://github.com/apache/iceberg-python/pull/2773

    I've successfully created a proof-of-concept demonstrating that PyIceberg 
already supports writing equality delete files via transactions, even though 
the read path is not yet implemented.
   
     What I Discovered
   
     1. No tests use actual equality_ids values - All existing tests either set 
it to [] or None
     2. The write infrastructure is complete and working - All necessary 
components exist:
       - DataFileContent.EQUALITY_DELETES enum
       - equality_ids field in DataFile
       - Snapshot tracking for equality deletes
       - Manifest serialization
     3. The key is using the transaction API directly:
     with table.transaction() as txn:
         update_snapshot = txn.update_snapshot()
         with update_snapshot.fast_append() as append_files:
             append_files.append_data_file(delete_file)  # Works for delete 
files!
   
     Files Created
   
     1. test_equality_delete_poc.py - Detailed standalone test with verbose 
output
     2. test_add_equality_delete.py - Clean pytest suite with 2 passing tests:
       - Single equality delete file
       - Multiple delete files with different equality_ids
     3. example_add_equality_delete.py - Complete working examples showing:
       - Basic usage (single column)
       - Composite keys (multiple columns)
       - Multiple delete files in one transaction
     4. EQUALITY_DELETE_POC_SUMMARY.md - Comprehensive documentation
   
     Test Results
   
     All tests pass successfully:
     test_add_equality_delete.py::test_add_equality_delete_file_via_transaction 
PASSED
     
test_add_equality_delete.py::test_add_multiple_equality_delete_files_with_different_equality_ids
 PASSED
     ====== 2 passed in 1.06s ======
   
     Key Takeaways
   
     - ✅ You can write equality delete files today using the transaction API
     - ✅ Single column deletes: equality_ids=[1]
     - ✅ Composite key deletes: equality_ids=[1, 2]
     - ✅ Multiple delete files can be added in one transaction
     - ✅ Metadata tracking works correctly (snapshot summaries, manifests)
     - ❌ Reading is blocked - raises ValueError when scanning tables with 
equality deletes
   
     The write path is production-ready. Users who generate equality delete 
files externally can add them to PyIceberg tables now, though they'll need 
other tools (like Spark) to read those tables.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to