Re: [PR] parallelize `add_files` [iceberg-python]

2025-03-03 Thread via GitHub
Fokko merged PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717

Re: [PR] parallelize `add_files` [iceberg-python]

2025-03-03 Thread via GitHub
Fokko commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1977256602 ## pyiceberg/io/pyarrow.py: ## @@ -2464,38 +2464,37 @@ def _check_pyarrow_schema_compatible( _check_schema_compatible(requested_schema, provided_schema) -

Re: [PR] parallelize `add_files` [iceberg-python]

2025-02-27 Thread via GitHub
amitgilad3 commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1974076443 ## tests/integration/test_add_files.py: ## @@ -229,6 +229,35 @@ def test_add_files_to_unpartitioned_table_raises_has_field_ids( tbl.add_files(file_p

Re: [PR] parallelize `add_files` [iceberg-python]

2025-02-27 Thread via GitHub
vtk9 commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1974029646 ## pyiceberg/io/pyarrow.py: ## @@ -2464,38 +2464,37 @@ def _check_pyarrow_schema_compatible( _check_schema_compatible(requested_schema, provided_schema) -d

Re: [PR] parallelize `add_files` [iceberg-python]

2025-02-27 Thread via GitHub
vtk9 commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1974027542 ## pyiceberg/io/pyarrow.py: ## @@ -2466,36 +2466,41 @@ def _check_pyarrow_schema_compatible( def parquet_files_to_data_files(io: FileIO, table_metadata: TableMet

Re: [PR] parallelize `add_files` [iceberg-python]

2025-02-27 Thread via GitHub
vtk9 commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1974026972 ## tests/integration/test_add_files.py: ## @@ -229,6 +229,35 @@ def test_add_files_to_unpartitioned_table_raises_has_field_ids( tbl.add_files(file_paths=f

Re: [PR] parallelize `add_files` [iceberg-python]

2025-02-27 Thread via GitHub
Fokko commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1973903275 ## pyiceberg/io/pyarrow.py: ## @@ -2466,36 +2466,41 @@ def _check_pyarrow_schema_compatible( def parquet_files_to_data_files(io: FileIO, table_metadata: TableMe

Re: [PR] parallelize `add_files` [iceberg-python]

2025-02-27 Thread via GitHub
Fokko commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1973893895 ## pyiceberg/io/pyarrow.py: ## @@ -2464,38 +2464,37 @@ def _check_pyarrow_schema_compatible( _check_schema_compatible(requested_schema, provided_schema) -

Re: [PR] parallelize `add_files` [iceberg-python]

2025-02-26 Thread via GitHub
vtk9 commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1972652148 ## tests/integration/test_add_files.py: ## @@ -229,6 +229,35 @@ def test_add_files_to_unpartitioned_table_raises_has_field_ids( tbl.add_files(file_paths=f

Re: [PR] parallelize `add_files` [iceberg-python]

2025-02-26 Thread via GitHub
vtk9 commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1972655425 ## pyiceberg/io/pyarrow.py: ## @@ -2464,38 +2464,37 @@ def _check_pyarrow_schema_compatible( _check_schema_compatible(requested_schema, provided_schema) -d

Re: [PR] parallelize `add_files` [iceberg-python]

2025-02-26 Thread via GitHub
vtk9 commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1972653853 ## pyiceberg/io/pyarrow.py: ## @@ -2464,38 +2464,37 @@ def _check_pyarrow_schema_compatible( _check_schema_compatible(requested_schema, provided_schema) -d

Re: [PR] parallelize `add_files` [iceberg-python]

2025-02-25 Thread via GitHub
Fokko commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1970024778 ## pyiceberg/io/pyarrow.py: ## @@ -2464,38 +2464,37 @@ def _check_pyarrow_schema_compatible( _check_schema_compatible(requested_schema, provided_schema) -

Re: [PR] parallelize `add_files` [iceberg-python]

2025-02-25 Thread via GitHub
Fokko commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1970035297 ## pyiceberg/io/pyarrow.py: ## @@ -2464,38 +2464,37 @@ def _check_pyarrow_schema_compatible( _check_schema_compatible(requested_schema, provided_schema) -

Re: [PR] parallelize `add_files` [iceberg-python]

2025-02-25 Thread via GitHub
amitgilad3 commented on code in PR #1717: URL: https://github.com/apache/iceberg-python/pull/1717#discussion_r1969511896 ## tests/integration/test_add_files.py: ## @@ -229,6 +229,35 @@ def test_add_files_to_unpartitioned_table_raises_has_field_ids( tbl.add_files(file_p

[PR] parallelize `add_files` [iceberg-python]

2025-02-24 Thread via GitHub
vtk9 opened a new pull request, #1717: URL: https://github.com/apache/iceberg-python/pull/1717
- `parquet_files_to_data_files` changed to `parquet_file_to_data_files`, which processes a single Parquet file and returns a `DataFile`
- `_parquet_files_to_data_files` uses internal ExecutorFact
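The PR description above splits the work into a per-file function plus a wrapper that fans the files out over an executor. A minimal sketch of that pattern follows, assuming a plain `concurrent.futures.ThreadPoolExecutor` in place of the internal executor helper the description names (its name is truncated above). `DataFile` here is a hypothetical stand-in, not pyiceberg's class, and the per-file worker does not actually read a Parquet footer; it only illustrates the shape of the fan-out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    # Hypothetical stand-in for pyiceberg's DataFile; only two fields are modeled.
    file_path: str
    file_format: str


def parquet_file_to_data_file(file_path: str) -> DataFile:
    # Hypothetical per-file worker. A real implementation would open the file
    # through the table's FileIO and read the Parquet footer (schema, row
    # count, column statistics); this sketch skips that I/O entirely.
    return DataFile(file_path=file_path, file_format="PARQUET")


def parquet_files_to_data_files(file_paths: List[str], max_workers: int = 4) -> List[DataFile]:
    # Fan the per-file work out over a thread pool. Executor.map returns
    # results in input order, so the output lines up with file_paths even
    # though the individual files are processed concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(parquet_file_to_data_file, file_paths))
```

Threads (rather than processes) are a reasonable fit under the assumption that the per-file cost is dominated by I/O such as fetching footers from object storage; the ordering guarantee of `Executor.map` keeps the resulting data files deterministic for the caller.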