Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-05 Thread via GitHub
Fokko merged PR #358: URL: https://github.com/apache/iceberg-python/pull/358 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
HonahX commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477467681 ## pyiceberg/io/pyarrow.py: ## @@ -1745,14 +1747,42 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: key_metadata=None,

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
jonashaag commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477447145 ## pyiceberg/io/pyarrow.py: ## @@ -1745,14 +1747,42 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: key_metadata=Non

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
Fokko commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477443042 ## pyiceberg/io/pyarrow.py: ## @@ -1745,14 +1747,42 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: key_metadata=None,

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
jonashaag commented on PR #358: URL: https://github.com/apache/iceberg-python/pull/358#issuecomment-1925911513 Sweet, ready to merge from my POV

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
jonashaag commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477435123 ## pyiceberg/io/pyarrow.py: ## @@ -1745,14 +1747,41 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: key_metadata=Non

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
Fokko commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477434938 ## tests/integration/test_writes.py: ## @@ -489,6 +492,50 @@ def test_data_files(spark: SparkSession, session_catalog: Catalog, arrow_table_w assert [row.delet

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
Fokko commented on PR #358: URL: https://github.com/apache/iceberg-python/pull/358#issuecomment-1925904312 > Sorry I don't feel comfortable writing documentation because I still lack a lot of Iceberg understanding and terminology. Could you do that part please? Sure thing, no problem

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
jonashaag commented on PR #358: URL: https://github.com/apache/iceberg-python/pull/358#issuecomment-1925898463 Sorry I don't feel comfortable writing documentation because I still lack a lot of Iceberg understanding and terminology. Could you do that part please?

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
jonashaag commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477429231 ## pyiceberg/io/pyarrow.py: ## @@ -1745,14 +1747,41 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: key_metadata=Non

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
jonashaag commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477429043 ## pyiceberg/io/pyarrow.py: ## @@ -1745,14 +1747,41 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: key_metadata=Non

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
Fokko commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477428773 ## pyiceberg/io/pyarrow.py: ## @@ -1745,14 +1747,41 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: key_metadata=None,

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
jonashaag commented on PR #358: URL: https://github.com/apache/iceberg-python/pull/358#issuecomment-1925891201 I've changed the properties to be table properties and added handling for some other Parquet properties

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477330747 ## tests/integration/test_writes.py: ## @@ -489,6 +492,58 @@ def test_data_files(spark: SparkSession, session_catalog: Catalog, arrow_table_w assert [row.dele

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477316186 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration:

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-04 Thread via GitHub
jonashaag commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477236956 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477128027 ## pyiceberg/catalog/rest.py: ## @@ -450,6 +450,10 @@ def create_table( iceberg_schema = self._convert_schema_if_needed(schema) iceberg_schema = a

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477128678 ## pyiceberg/catalog/rest.py: ## @@ -450,6 +450,10 @@ def create_table( iceberg_schema = self._convert_schema_if_needed(schema) iceberg_schema = a

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477127850 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration:

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
amogh-jahagirdar commented on PR #358: URL: https://github.com/apache/iceberg-python/pull/358#issuecomment-1925449043
> For the case (no compression specified) the tests currently pass locally but they shouldn't as we never set zstd as the default

The default parquet compression is Z

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477123601 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration:

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
jonashaag commented on PR #358: URL: https://github.com/apache/iceberg-python/pull/358#issuecomment-1925434691 Can you start CI @syun64?

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
jonashaag commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477115866 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
jonashaag commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477115723 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on PR #358: URL: https://github.com/apache/iceberg-python/pull/358#issuecomment-1925432345 @jonashaag thank you for raising the issue and putting this PR together so quickly! We are very excited to group this fix in with the impending 0.6.0 release. I've left some comments

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477114478 ## tests/integration/test_writes.py: ## @@ -489,6 +492,50 @@ def test_data_files(spark: SparkSession, session_catalog: Catalog, arrow_table_w assert [row.dele

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477114250 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration:

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477113970 ## tests/integration/test_writes.py: ## @@ -489,6 +492,50 @@ def test_data_files(spark: SparkSession, session_catalog: Catalog, arrow_table_w assert [row.dele

Re: [PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
syun64 commented on code in PR #358: URL: https://github.com/apache/iceberg-python/pull/358#discussion_r1477113742 ## pyiceberg/io/pyarrow.py: ## @@ -1720,13 +1720,22 @@ def write_file(table: Table, tasks: Iterator[WriteTask]) -> Iterator[DataFile]: except StopIteration:

[PR] Use `write.parquet.compression-{codec,level}` [iceberg-python]

2024-02-03 Thread via GitHub
jonashaag opened a new pull request, #358: URL: https://github.com/apache/iceberg-python/pull/358 I had to change the `metadata_collector` code due to https://github.com/dask/dask/issues/7977. For the `` case (no compression specified) the tests currently pass locally but they should