Re: [PR] [WIP] Add Data Files from Parquet Files [iceberg-python]

2024-03-14 Thread via GitHub
syun64 commented on code in PR #506: URL: https://github.com/apache/iceberg-python/pull/506#discussion_r1524922200 ## pyiceberg/table/__init__.py: ## @@ -1147,6 +1150,26 @@ def overwrite(self, df: pa.Table, overwrite_filter: BooleanExpression = ALWAYS_T for

Re: [PR] [WIP] Add Data Files from Parquet Files [iceberg-python]

2024-03-14 Thread via GitHub
syun64 commented on PR #506: URL: https://github.com/apache/iceberg-python/pull/506#issuecomment-1997482530
> @syun64 I'm all for it if it works, but I see a lot of issues with inferring it from the Hive path.
Yeah. I don't personally need migration procedures to add files from Hive

Re: [PR] [WIP] Add Data Files from Parquet Files [iceberg-python]

2024-03-14 Thread via GitHub
syun64 commented on code in PR #506: URL: https://github.com/apache/iceberg-python/pull/506#discussion_r1524886550 ## pyiceberg/table/__init__.py: ## @@ -1147,6 +1150,26 @@ def overwrite(self, df: pa.Table, overwrite_filter: BooleanExpression = ALWAYS_T for

Re: [PR] [WIP] Add Data Files from Parquet Files [iceberg-python]

2024-03-14 Thread via GitHub
syun64 commented on code in PR #506: URL: https://github.com/apache/iceberg-python/pull/506#discussion_r1524886152 ## pyiceberg/table/__init__.py: ## @@ -1147,6 +1150,26 @@ def overwrite(self, df: pa.Table, overwrite_filter: BooleanExpression = ALWAYS_T for

Re: [PR] [WIP] Add Data Files from Parquet Files [iceberg-python]

2024-03-14 Thread via GitHub
syun64 commented on code in PR #506: URL: https://github.com/apache/iceberg-python/pull/506#discussion_r1524887033 ## pyiceberg/table/__init__.py: ## @@ -1147,6 +1150,26 @@ def overwrite(self, df: pa.Table, overwrite_filter: BooleanExpression = ALWAYS_T for

Re: [PR] [WIP] Add Data Files from Parquet Files [iceberg-python]

2024-03-14 Thread via GitHub
Fokko commented on code in PR #506: URL: https://github.com/apache/iceberg-python/pull/506#discussion_r1524358528 ## pyiceberg/table/__init__.py: ## @@ -1147,6 +1150,26 @@ def overwrite(self, df: pa.Table, overwrite_filter: BooleanExpression = ALWAYS_T for

Re: [PR] [WIP] Add Data Files from Parquet Files [iceberg-python]

2024-03-13 Thread via GitHub
Fokko commented on PR #506: URL: https://github.com/apache/iceberg-python/pull/506#issuecomment-1996646115 @syun64 I'm all for it if it works, but I see a lot of issues with inferring it from the Hive path.

Re: [PR] [WIP] Add Data Files from Parquet Files [iceberg-python]

2024-03-13 Thread via GitHub
syun64 commented on PR #506: URL: https://github.com/apache/iceberg-python/pull/506#issuecomment-1995003905 > So both approaches have pros and cons. One thing I would like to avoid is having to rely on Hive directly; this will make sure that we can generalize it to also import gene

Re: [PR] [WIP] Add Data Files from Parquet Files [iceberg-python]

2024-03-13 Thread via GitHub
Fokko commented on PR #506: URL: https://github.com/apache/iceberg-python/pull/506#issuecomment-1994769768 So both approaches have pros and cons. One thing I would like to avoid is having to rely on Hive directly; this will make sure that we can generalize it to also import generic

Re: [PR] [WIP] Add Data Files from Parquet Files [iceberg-python]

2024-03-13 Thread via GitHub
syun64 commented on PR #506: URL: https://github.com/apache/iceberg-python/pull/506#issuecomment-1994561440 > We will replace file_path based partition inference with parquet metadata footer based partition inference. Currently we only support IdentityPartitions, and we can infer the partit
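For readers unfamiliar with the file_path based partition inference discussed in this thread, here is a minimal sketch of the general idea, assuming Hive-style `key=value` directory segments. The function name `infer_hive_partition` and the example path are hypothetical and are not taken from the PR or from PyIceberg's API.

```python
from urllib.parse import unquote


def infer_hive_partition(file_path: str) -> dict[str, str]:
    """Collect key=value directory segments from a Hive-style path.

    Hypothetical helper for illustration only; not part of PyIceberg.
    """
    partition: dict[str, str] = {}
    # Skip the last segment (the file name); only directories carry partition values.
    for segment in file_path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            # Hive percent-encodes special characters in partition directories.
            partition[unquote(key)] = unquote(value)
    return partition


example = "s3://bucket/warehouse/db/tbl/year=2024/month=03/part-0000.parquet"
print(infer_hive_partition(example))
```

One limitation visible in the sketch is that every inferred value is a string, so the original column types are not recoverable from the path alone. That may be one reason to prefer reading the values from Parquet footer metadata, as the comment proposes.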

Re: [PR] [WIP] Add Data Files from Parquet Files [iceberg-python]

2024-03-12 Thread via GitHub
syun64 commented on PR #506: URL: https://github.com/apache/iceberg-python/pull/506#issuecomment-1991884158
Updates from offline discussions:
1. The task of creating the correct Iceberg Table Schema with the desired Partition Spec, from an external table (like Hive) is out of scope of thi

[PR] [WIP] Add Data Files from Parquet Files [iceberg-python]

2024-03-07 Thread via GitHub
syun64 opened a new pull request, #506: URL: https://github.com/apache/iceberg-python/pull/506
PyIceberg's version of the add_files Spark migration procedure. Some early ideas on its implementation:
- instead of staying with the input interface for Spark's Procedure, we could just allo