[GitHub] [iceberg] Gschiavon commented on issue #5946: Not able to run spark procedure rewrite_data_files

2022-12-30 Thread GitBox
Gschiavon commented on issue #5946: URL: https://github.com/apache/iceberg/issues/5946#issuecomment-1367851261 A quick workaround is to add the package via `--packages` when running spark-submit, like this: `--packages org.apache.iceberg:iceberg-spark-runtime-3.

[GitHub] [iceberg] Fokko commented on a diff in pull request #6500: Aws: Cosmetic change and simplify statusCode check in GlueTableOperations

2022-12-30 Thread GitBox
Fokko commented on code in PR #6500: URL: https://github.com/apache/iceberg/pull/6500#discussion_r1059366839 ## aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java: ## @@ -184,9 +185,7 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) {

[GitHub] [iceberg] cccs-eric commented on pull request #6497: Python: Move `adlfs` import inline

2022-12-30 Thread GitBox
cccs-eric commented on PR #6497: URL: https://github.com/apache/iceberg/pull/6497#issuecomment-1367932332 @Fokko yeah, after I wrote that message, I started with a fresh venv and couldn't make it work without installing `pyiceberg[s3fs]`. I remember seeing other places (tests, docs, ...) i

[GitHub] [iceberg] rdblue merged pull request #6504: Python: Add tests

2022-12-30 Thread GitBox
rdblue merged PR #6504: URL: https://github.com/apache/iceberg/pull/6504 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apac

[GitHub] [iceberg] rdblue commented on a diff in pull request #6501: Python: Use PyArrow buffer

2022-12-30 Thread GitBox
rdblue commented on code in PR #6501: URL: https://github.com/apache/iceberg/pull/6501#discussion_r1059492469 ## python/pyiceberg/io/pyarrow.py: ## @@ -151,7 +160,7 @@ def open(self) -> InputStream: an AWS error code 15 """ try: -in

[GitHub] [iceberg] rdblue merged pull request #6497: Python: Move `adlfs` import inline

2022-12-30 Thread GitBox
rdblue merged PR #6497: URL: https://github.com/apache/iceberg/pull/6497

[GitHub] [iceberg] rdblue commented on pull request #6495: Python-legacy: Fix CI

2022-12-30 Thread GitBox
rdblue commented on PR #6495: URL: https://github.com/apache/iceberg/pull/6495#issuecomment-1368051275 Looks good to me. Thanks for fixing it. When do you think it will be time to remove the legacy code? I think we're about to the point where we can do everything the legacy code can.

[GitHub] [iceberg] rdblue merged pull request #6495: Python-legacy: Fix CI

2022-12-30 Thread GitBox
rdblue merged PR #6495: URL: https://github.com/apache/iceberg/pull/6495

[GitHub] [iceberg] rdblue commented on a diff in pull request #6485: API: New KMS Client Interface

2022-12-30 Thread GitBox
rdblue commented on code in PR #6485: URL: https://github.com/apache/iceberg/pull/6485#discussion_r1059493412 ## api/src/main/java/org/apache/iceberg/encryption/KmsClient.java: ## @@ -22,7 +22,8 @@ import java.nio.ByteBuffer; import java.util.Map; -/** A minimum client inter

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059495752 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059496480 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059496896 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059497186 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059497682 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +465,198 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] krvikash commented on a diff in pull request #6500: Aws: Cosmetic change in GlueTableOperations

2022-12-30 Thread GitBox
krvikash commented on code in PR #6500: URL: https://github.com/apache/iceberg/pull/6500#discussion_r1059499681 ## aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java: ## @@ -184,9 +185,7 @@ protected void doCommit(TableMetadata base, TableMetadata metadata)

[GitHub] [iceberg] Fokko commented on pull request #6495: Python-legacy: Fix CI

2022-12-30 Thread GitBox
Fokko commented on PR #6495: URL: https://github.com/apache/iceberg/pull/6495#issuecomment-1368078960 Thanks for merging it. It feels to me that it isn't up to me to remove the old code. Now and then I still look at the old code to get some inspiration, but I agree that most of the function

[GitHub] [iceberg] Fokko commented on a diff in pull request #6501: Python: Use PyArrow buffer

2022-12-30 Thread GitBox
Fokko commented on code in PR #6501: URL: https://github.com/apache/iceberg/pull/6501#discussion_r1059512957 ## python/pyiceberg/io/pyarrow.py: ## @@ -151,7 +160,7 @@ def open(self) -> InputStream: an AWS error code 15 """ try: -inp

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059515855 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] Fokko commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
Fokko commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059515919 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarrow

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059516096 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] Fokko opened a new issue, #6505: Infer Iceberg schema from the Parquet file

2022-12-30 Thread GitBox
Fokko opened a new issue, #6505: URL: https://github.com/apache/iceberg/issues/6505 ### Feature Request / Improvement In PyIceberg we rely on fetching the schema from the Parquet metadata. If this is not available (because the Parquet file is written by something other than an Iceberg

[GitHub] [iceberg] Fokko commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
Fokko commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059516392 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarrow

[GitHub] [iceberg] Fokko commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
Fokko commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059516697 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarrow

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059516840 ## python/pyiceberg/schema.py: ## @@ -1046,3 +1055,79 @@ def _project_map(map_type: MapType, value_result: IcebergType) -> MapType: value_type=value_re

[GitHub] [iceberg] Fokko commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
Fokko commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059517229 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarrow

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059517783 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059518245 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] Fokko commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
Fokko commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059518886 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarrow

[GitHub] [iceberg] Fokko commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
Fokko commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059519176 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +468,170 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarrow

[GitHub] [iceberg] rdblue commented on a diff in pull request #6490: Python: Replace Pydantic with StructRecord

2022-12-30 Thread GitBox
rdblue commented on code in PR #6490: URL: https://github.com/apache/iceberg/pull/6490#discussion_r1059522065 ## python/pyiceberg/manifest.py: ## @@ -76,137 +66,283 @@ def __repr__(self) -> str: return f"FileFormat.{self.name}" -class DataFile(IcebergBaseModel): -

[GitHub] [iceberg] Fokko commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
Fokko commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1059522716 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +465,198 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarrow

[GitHub] [iceberg] rdblue commented on a diff in pull request #6490: Python: Replace Pydantic with StructRecord

2022-12-30 Thread GitBox
rdblue commented on code in PR #6490: URL: https://github.com/apache/iceberg/pull/6490#discussion_r1059524353 ## python/pyiceberg/manifest.py: ## @@ -76,137 +66,283 @@ def __repr__(self) -> str: return f"FileFormat.{self.name}" -class DataFile(IcebergBaseModel): -

[GitHub] [iceberg] rdblue commented on a diff in pull request #6490: Python: Replace Pydantic with StructRecord

2022-12-30 Thread GitBox
rdblue commented on code in PR #6490: URL: https://github.com/apache/iceberg/pull/6490#discussion_r1059524606 ## python/pyiceberg/typedef.py: ## @@ -85,16 +86,24 @@ class Record(StructProtocol): def __init__(self, *data: Union[Any, StructProtocol]) -> None: self.

[GitHub] [iceberg] rdblue commented on a diff in pull request #6490: Python: Replace Pydantic with StructRecord

2022-12-30 Thread GitBox
rdblue commented on code in PR #6490: URL: https://github.com/apache/iceberg/pull/6490#discussion_r1059524781 ## python/tests/expressions/test_visitors.py: ## @@ -827,85 +840,91 @@ def manifest_no_stats() -> ManifestFile: return _to_manifest_file() +def _PartitionFieldS

[GitHub] [iceberg] Fokko merged pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
Fokko merged PR #6437: URL: https://github.com/apache/iceberg/pull/6437

[GitHub] [iceberg] Fokko commented on pull request #6437: Python: Projection by Field ID

2022-12-30 Thread GitBox
Fokko commented on PR #6437: URL: https://github.com/apache/iceberg/pull/6437#issuecomment-1368105150 Thanks for the review @rdblue

[GitHub] [iceberg] github-actions[bot] commented on issue #5183: Allow to configure Avro block size

2022-12-30 Thread GitBox
github-actions[bot] commented on issue #5183: URL: https://github.com/apache/iceberg/issues/5183#issuecomment-1368130469 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs.

[GitHub] [iceberg] github-actions[bot] commented on issue #5000: Proposal: FlinkSQL supports partition transform by computed columns

2022-12-30 Thread GitBox
github-actions[bot] commented on issue #5000: URL: https://github.com/apache/iceberg/issues/5000#issuecomment-1368130488 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

[GitHub] [iceberg] rdblue opened a new pull request, #6506: Python: Refactor Avro read path to use a partner visitor

2022-12-30 Thread GitBox
rdblue opened a new pull request, #6506: URL: https://github.com/apache/iceberg/pull/6506 This refactors the Avro read path to use the newly introduced `SchemaWithPartnerVisitor`. The purpose of this is to make the resolver a little more standard and make it easy to inject types that implem

[GitHub] [iceberg] rdblue commented on a diff in pull request #6506: Python: Refactor Avro read path to use a partner visitor

2022-12-30 Thread GitBox
rdblue commented on code in PR #6506: URL: https://github.com/apache/iceberg/pull/6506#discussion_r1059559551 ## python/pyiceberg/avro/resolver.py: ## @@ -57,80 +61,91 @@ def resolve(file_schema: Union[Schema, IcebergType], read_schema: Union[Schema, Raises: Not