andormarkus opened a new pull request, #1742: URL: https://github.com/apache/iceberg-python/pull/1742
This PR adds a new API method `write_parquet()` to the `Table` class, which allows writing a PyArrow table to Parquet files in Iceberg-compatible format without committing them to the table metadata. This provides a way to decouple the write and commit process, which is particularly useful in high-concurrency scenarios.

## Key features

- `write_parquet(df)` writes Parquet files compatible with Iceberg table format
- Returns a list of file paths to the written files
- Files can later be committed using `add_files()` API
- Helps manage concurrency by separating write operations from metadata commits

## Use case

This is especially useful for high-concurrency ingestion scenarios where multiple writers could be writing data to an Iceberg table simultaneously. By separating the write and commit phases, applications can implement a queue system where the commit process (which requires a lock) is handled separately from the data writing phase:

```python
# Write data but don't commit
file_paths = table.write_parquet(df)

# Later, commit the files to make them visible in queries
table.add_files(file_paths=file_paths)
```

## Documentation

Added comprehensive documentation to the API docs, including explanations and examples of how to use the new method alongside the existing add_files API.

## Seeking guidance

I would appreciate guidance from project maintainers on:

1. Which test cases would be most appropriate for this new API
2. Is there a preferred location or approach for testing this functionality?
3. Should we add tests that specifically verify the interaction between write_parquet() and add_files()?
4. Are there any performance considerations or edge cases that should be covered in testing?
5. Any further documentation or API changes before this is ready for review
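
## Sketch: queue-based write/commit split

To make the queue system mentioned under "Use case" more concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not part of this PR: `write_parquet_stub` and `add_files_stub` are hypothetical stand-ins for `table.write_parquet(df)` and `table.add_files(file_paths=...)`, and the file paths are made up. Several writer threads produce file paths concurrently, while a single committer thread drains the queue and performs the serialized commit step.

```python
import queue
import threading


def write_parquet_stub(writer_id: int, batch_id: int) -> list:
    """Hypothetical stand-in for table.write_parquet(df).

    Pretends to write one data file and returns its path without committing.
    """
    return [f"data/writer-{writer_id}-batch-{batch_id}.parquet"]


def add_files_stub(committed: list, file_paths: list) -> None:
    """Hypothetical stand-in for table.add_files(file_paths=...).

    Only the single committer thread calls this, so commits never contend.
    """
    committed.extend(file_paths)


def writer(writer_id: int, n_batches: int, pending: queue.Queue) -> None:
    # Write phase: runs concurrently in many threads, no commit involved.
    for batch_id in range(n_batches):
        pending.put(write_parquet_stub(writer_id, batch_id))


def committer(pending: queue.Queue, committed: list) -> None:
    # Commit phase: one consumer drains the queue until it sees the sentinel.
    while True:
        file_paths = pending.get()
        if file_paths is None:
            break
        add_files_stub(committed, file_paths)


def run(n_writers: int = 4, n_batches: int = 3) -> list:
    pending = queue.Queue()
    committed = []

    commit_thread = threading.Thread(target=committer, args=(pending, committed))
    commit_thread.start()

    writers = [
        threading.Thread(target=writer, args=(i, n_batches, pending))
        for i in range(n_writers)
    ]
    for t in writers:
        t.start()
    for t in writers:
        t.join()

    pending.put(None)  # sentinel: all writers are done
    commit_thread.join()
    return sorted(committed)


if __name__ == "__main__":
    for path in run():
        print(path)
```

In a real deployment the queue would more likely be an external, durable one, and the committer could batch several path lists into a single `add_files()` call to reduce the number of metadata commits; both choices are outside what this PR specifies.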