[GitHub] [iceberg] yuangjiang opened a new issue, #6236: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 3072 bytes

2022-11-21 Thread GitBox
yuangjiang opened a new issue, #6236: URL: https://github.com/apache/iceberg/issues/6236 ### Apache Iceberg version main (development) ### Query engine Spark ### Please describe the bug 🐞 Iceberg spark cannot create a table using jdbc catalog, prompting that

[GitHub] [iceberg] yuangjiang commented on issue #6236: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 3072 bytes

2022-11-21 Thread GitBox
yuangjiang commented on issue #6236: URL: https://github.com/apache/iceberg/issues/6236#issuecomment-1321628781 My submit command is as follows bin/spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.0.0 \ --conf spark.sql.catalog.prod=org.apache.iceberg.sp

[GitHub] [iceberg] hameizi opened a new pull request, #6237: Core: Fix check is delete file and data file overlap

2022-11-21 Thread GitBox
hameizi opened a new pull request, #6237: URL: https://github.com/apache/iceberg/pull/6237 There is no overlap between the delete file and the data file only when `deleteLower` and `deleteUpper` are both less than `dataLower`, or `deleteLower` and `deleteUpper` are both greater than `dataUpper`.

[GitHub] [iceberg] hameizi commented on pull request #6237: Core: Fix check is delete file and data file overlap

2022-11-21 Thread GitBox
hameizi commented on PR #6237: URL: https://github.com/apache/iceberg/pull/6237#issuecomment-1321652265 @rdblue can you take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

[GitHub] [iceberg] yuangjiang commented on issue #6236: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 3072 bytes

2022-11-21 Thread GitBox
yuangjiang commented on issue #6236: URL: https://github.com/apache/iceberg/issues/6236#issuecomment-1321699076 CREATE TABLE `iceberg_namespace_properties` ( `catalog_name` varchar(255) NOT NULL, `namespace` varchar(255) NOT NULL, `property_key` varchar(255) NOT NULL, `pro

[GitHub] [iceberg] nastra commented on pull request #6238: Nessie: Make UpdateableReference public

2022-11-21 Thread GitBox
nastra commented on PR #6238: URL: https://github.com/apache/iceberg/pull/6238#issuecomment-1321831628 I think it would actually be better if `NessieIcebergClient` had a reusable commit operation rather than exposing this class here. I talked to @ajantha-bhat and he'll add it.

[GitHub] [iceberg] nastra closed pull request #6238: Nessie: Make UpdateableReference public

2022-11-21 Thread GitBox
nastra closed pull request #6238: Nessie: Make UpdateableReference public URL: https://github.com/apache/iceberg/pull/6238

[GitHub] [iceberg] Fokko merged pull request #6212: Replace ImmutableMap.Builder.build() with buildOrThrow()

2022-11-21 Thread GitBox
Fokko merged PR #6212: URL: https://github.com/apache/iceberg/pull/6212

[GitHub] [iceberg] Fokko merged pull request #6228: Python: Minor fixes to expression types

2022-11-21 Thread GitBox
Fokko merged PR #6228: URL: https://github.com/apache/iceberg/pull/6228

[GitHub] [iceberg] zhongyujiang commented on pull request #6237: Core: Fix check is delete file and data file overlap

2022-11-21 Thread GitBox
zhongyujiang commented on PR #6237: URL: https://github.com/apache/iceberg/pull/6237#issuecomment-1321994998 I think the current judgment has already dealt with this situation, IIUC, deletes and data will **not overlap** if: `(dataLower > deleteUpper) || (deleteLower > dataUpper)` So t
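
The point made in this comment can be checked with a small sketch. The code below is not Iceberg's implementation: the function names are hypothetical, and the bounds are assumed to be plain integers with `lower <= upper` for each file, whereas Iceberg compares bounds with type-specific comparators. It compares the condition quoted above with the condition as worded in the PR description.

```python
def no_overlap_current(data_lower, data_upper, delete_lower, delete_upper):
    # Condition quoted in the review comment.
    return (data_lower > delete_upper) or (delete_lower > data_upper)


def no_overlap_proposed(data_lower, data_upper, delete_lower, delete_upper):
    # Condition as worded in the PR description: both delete bounds fall
    # below the data range, or both fall above it.
    both_below = delete_lower < data_lower and delete_upper < data_lower
    both_above = delete_lower > data_upper and delete_upper > data_upper
    return both_below or both_above


# Exhaustively compare the two conditions on small, well-formed ranges
# (lower <= upper for both the data file and the delete file).
for data_lower in range(5):
    for data_upper in range(data_lower, 5):
        for delete_lower in range(5):
            for delete_upper in range(delete_lower, 5):
                args = (data_lower, data_upper, delete_lower, delete_upper)
                assert no_overlap_current(*args) == no_overlap_proposed(*args)
```

Under the stated assumption the two conditions agree on every well-formed pair of ranges, which is consistent with the comment that the existing check already covers this case.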

[GitHub] [iceberg] pvary commented on pull request #6175: Hive: Add UGI to the key in CachedClientPool

2022-11-21 Thread GitBox
pvary commented on PR #6175: URL: https://github.com/apache/iceberg/pull/6175#issuecomment-1322002592 @flyrain: The Catalog is a strange animal. We basically expect it to be short-lived and easy/cheap to drop and recreate. I had a similar discussion [1] on another PR around this. I Hive c

[GitHub] [iceberg] Fokko commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
Fokko commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1027642501 ## python/pyiceberg/table/__init__.py: ## @@ -138,7 +157,10 @@ def __eq__(self, other: Any) -> bool: ) -class TableScan: +S = TypeVar("S", bound="TableScan",

[GitHub] [iceberg] SHuixo commented on issue #6104: Rewrite iceberg small files with flink succeeds but no snapshot is generated (V2 - upsert model)

2022-11-21 Thread GitBox
SHuixo commented on issue #6104: URL: https://github.com/apache/iceberg/issues/6104#issuecomment-1322093088 > Yes, we have to wait it to be merged. Good, looking forward to the merge of this new rockdb feature. > Had a look about your exception log. The reason is the cdc contai

[GitHub] [iceberg] Fokko commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
Fokko commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028076839 ## python/pyiceberg/table/__init__.py: ## @@ -199,16 +223,143 @@ def use_ref(self, name: str): raise ValueError(f"Cannot scan unknown ref={name}") -def s

[GitHub] [iceberg] Fokko opened a new pull request, #6239: Docs: Select the right Spark catalog

2022-11-21 Thread GitBox
Fokko opened a new pull request, #6239: URL: https://github.com/apache/iceberg/pull/6239 I copy pasted the example, but still had to select the catalog.

[GitHub] [iceberg] Fokko merged pull request #6034: Python: GlueCatalog Full Implementation

2022-11-21 Thread GitBox
Fokko merged PR #6034: URL: https://github.com/apache/iceberg/pull/6034

[GitHub] [iceberg] asheeshgarg commented on issue #6003: Vectorized Read

2022-11-21 Thread GitBox
asheeshgarg commented on issue #6003: URL: https://github.com/apache/iceberg/issues/6003#issuecomment-1322337039 @nastra I have removed the spark dependency and just added raw org.apache.iceberg iceberg-common 1.0.0 org.apache.i

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028276873 ## python/pyiceberg/table/__init__.py: ## @@ -138,7 +157,10 @@ def __eq__(self, other: Any) -> bool: ) -class TableScan: +S = TypeVar("S", bound="TableScan"

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028278238 ## python/pyiceberg/table/__init__.py: ## @@ -199,16 +223,143 @@ def use_ref(self, name: str): raise ValueError(f"Cannot scan unknown ref={name}") -def

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028282275 ## python/pyiceberg/table/__init__.py: ## @@ -199,16 +223,143 @@ def use_ref(self, name: str): raise ValueError(f"Cannot scan unknown ref={name}") -def

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028282699 ## python/pyiceberg/table/__init__.py: ## @@ -199,16 +223,143 @@ def use_ref(self, name: str): raise ValueError(f"Cannot scan unknown ref={name}") -def

[GitHub] [iceberg] nastra commented on issue #6003: Vectorized Read

2022-11-21 Thread GitBox
nastra commented on issue #6003: URL: https://github.com/apache/iceberg/issues/6003#issuecomment-1322348315 Can you try `iceberg-hive-runtime` rather than `iceberg-hive-metastore` (you'd also need to remove `iceberg-parquet` as that's shaded inside `iceberg-hive-runtime`)?

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028283685 ## python/pyiceberg/manifest.py: ## @@ -141,6 +141,14 @@ def read_manifest_entry(input_file: InputFile) -> Iterator[ManifestEntry]: yield ManifestEntry(**d

[GitHub] [iceberg] rdblue commented on a diff in pull request #6232: Python: Disallow Any generics

2022-11-21 Thread GitBox
rdblue commented on code in PR #6232: URL: https://github.com/apache/iceberg/pull/6232#discussion_r1028308314 ## python/pyiceberg/expressions/literals.py: ## @@ -207,7 +207,7 @@ def __init__(self, value: int): super().__init__(value, int) @singledispatchmethod -

[GitHub] [iceberg] asheeshgarg commented on issue #6003: Vectorized Read

2022-11-21 Thread GitBox
asheeshgarg commented on issue #6003: URL: https://github.com/apache/iceberg/issues/6003#issuecomment-1322426914 @nastra org.apache.iceberg iceberg-common 1.0.0 org.apache.iceberg iceberg-core 1.0.0

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #6240: Nessie: Refactor NessieTableOperations#doCommit

2022-11-21 Thread GitBox
ajantha-bhat opened a new pull request, #6240: URL: https://github.com/apache/iceberg/pull/6240 Move core logic from `NessieTableOperations#doCommit` to `NessieIcebergClient#commitTable` because the Trino Nessie catalog integration (https://github.com/trinodb/trino/pull/11701) doesn't use Iceberg

[GitHub] [iceberg] ajantha-bhat commented on pull request #6240: Nessie: Refactor NessieTableOperations#doCommit

2022-11-21 Thread GitBox
ajantha-bhat commented on PR #6240: URL: https://github.com/apache/iceberg/pull/6240#issuecomment-1322441438 cc: @nastra

[GitHub] [iceberg] nastra commented on issue #6003: Vectorized Read

2022-11-21 Thread GitBox
nastra commented on issue #6003: URL: https://github.com/apache/iceberg/issues/6003#issuecomment-1322441996 NoClassDefFoundError generally means that this dependency existed at compilation time but doesn't exist at runtime, so you're missing the right dependency for that. Can you provid

[GitHub] [iceberg] nastra commented on a diff in pull request #6240: Nessie: Refactor NessieTableOperations#doCommit

2022-11-21 Thread GitBox
nastra commented on code in PR #6240: URL: https://github.com/apache/iceberg/pull/6240#discussion_r1028365461 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -46,15 +50,7 @@ import org.projectnessie.error.NessieNamespaceNotFoundException; impo

[GitHub] [iceberg] asheeshgarg commented on issue #6003: Vectorized Read

2022-11-21 Thread GitBox
asheeshgarg commented on issue #6003: URL: https://github.com/apache/iceberg/issues/6003#issuecomment-1322444351 @nastra is there a complete list of dependencies that I can use for a pure Java API program? The above list is complete; I have added org.apache.thrift libthrift

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6239: Docs: Select the right Spark catalog

2022-11-21 Thread GitBox
ajantha-bhat commented on code in PR #6239: URL: https://github.com/apache/iceberg/pull/6239#discussion_r1028370385 ## docs/aws.md: ## @@ -68,6 +68,7 @@ done # start Spark SQL client shell spark-sql --packages $DEPENDENCIES \ +--conf spark.sql.defaultCatalog=my_catalog \

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6240: Nessie: Refactor NessieTableOperations#doCommit

2022-11-21 Thread GitBox
ajantha-bhat commented on code in PR #6240: URL: https://github.com/apache/iceberg/pull/6240#discussion_r1028371605 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -46,15 +50,7 @@ import org.projectnessie.error.NessieNamespaceNotFoundException;

[GitHub] [iceberg] mderoy opened a new issue, #6241: Updating the HiveCatalog Conf with setConf does not also reset FileIO

2022-11-21 Thread GitBox
mderoy opened a new issue, #6241: URL: https://github.com/apache/iceberg/issues/6241 ### Apache Iceberg version 0.14.0 ### Query engine _No response_ ### Please describe the bug 🐞 The Iceberg HiveCatalog class lets you set a configuration via ``` @Ov

[GitHub] [iceberg] mderoy commented on issue #6241: Updating the HiveCatalog Conf with setConf does not also reset FileIO

2022-11-21 Thread GitBox
mderoy commented on issue #6241: URL: https://github.com/apache/iceberg/issues/6241#issuecomment-1322538070 I understand this is mostly "user error" but I needed to read the source to get to the root of the problem which was not obvious. I'd like it if the interface could enforce thisbu

[GitHub] [iceberg] Fokko opened a new pull request, #6242: API: Restore the type of the identity transform

2022-11-21 Thread GitBox
Fokko opened a new pull request, #6242: URL: https://github.com/apache/iceberg/pull/6242 This caused some regression for the Iceberg 1.1.0 release: ``` 2022-11-21T12:05:46.6549795Z [ERROR] io.trino.plugin.iceberg.TestIcebergSystemTables.testManifestsTable Time elapsed: 0.701 s <

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028437459 ## python/pyiceberg/table/__init__.py: ## @@ -16,24 +16,43 @@ # under the License. from __future__ import annotations +from abc import ABC, abstractmethod +from data

[GitHub] [iceberg] rdblue commented on pull request #6242: API: Restore the type of the identity transform

2022-11-21 Thread GitBox
rdblue commented on PR #6242: URL: https://github.com/apache/iceberg/pull/6242#issuecomment-1322541809 Good catch. Thanks, @Fokko!

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028440440 ## python/pyiceberg/table/__init__.py: ## @@ -138,7 +157,10 @@ def __eq__(self, other: Any) -> bool: ) -class TableScan: +S = TypeVar("S", bound="TableScan"

[GitHub] [iceberg] Fokko opened a new issue, #6243: Python: BoundType and BoundPredicate should match type

2022-11-21 Thread GitBox
Fokko opened a new issue, #6243: URL: https://github.com/apache/iceberg/issues/6243 ### Feature Request / Improvement When we bind an unbound expression, we'll convert the literal values to the type of the field that it is being bound to. If we instantiate a BoundPredicate, the type shou

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028472043 ## python/pyiceberg/table/__init__.py: ## @@ -199,16 +223,143 @@ def use_ref(self, name: str): raise ValueError(f"Cannot scan unknown ref={name}") -def

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028473168 ## python/pyiceberg/table/__init__.py: ## @@ -199,16 +223,143 @@ def use_ref(self, name: str): raise ValueError(f"Cannot scan unknown ref={name}") -def

[GitHub] [iceberg] Fokko commented on a diff in pull request #6232: Python: Disallow Any generics

2022-11-21 Thread GitBox
Fokko commented on code in PR #6232: URL: https://github.com/apache/iceberg/pull/6232#discussion_r1028473547 ## python/pyiceberg/expressions/literals.py: ## @@ -207,7 +207,7 @@ def __init__(self, value: int): super().__init__(value, int) @singledispatchmethod -

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028476399 ## python/pyiceberg/table/__init__.py: ## @@ -16,24 +16,43 @@ # under the License. from __future__ import annotations +from abc import ABC, abstractmethod +from data

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028477122 ## python/pyiceberg/table/__init__.py: ## @@ -199,16 +223,143 @@ def use_ref(self, name: str): raise ValueError(f"Cannot scan unknown ref={name}") -def

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028477438 ## python/pyiceberg/table/__init__.py: ## @@ -199,16 +223,143 @@ def use_ref(self, name: str): raise ValueError(f"Cannot scan unknown ref={name}") -def

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028478168 ## python/pyproject.toml: ## @@ -84,6 +84,7 @@ build-backend = "poetry.core.masonry.api" [tool.poetry.extras] pyarrow = ["pyarrow"] +duckdb = ["duckdb"] Review Comm

[GitHub] [iceberg] Fokko merged pull request #6232: Python: Disallow Any generics

2022-11-21 Thread GitBox
Fokko merged PR #6232: URL: https://github.com/apache/iceberg/pull/6232

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028489268 ## python/pyiceberg/table/__init__.py: ## @@ -199,16 +223,143 @@ def use_ref(self, name: str): raise ValueError(f"Cannot scan unknown ref={name}") -def

[GitHub] [iceberg] rdblue commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028501655 ## python/pyiceberg/table/__init__.py: ## @@ -16,24 +16,43 @@ # under the License. from __future__ import annotations +from abc import ABC, abstractmethod +from data

[GitHub] [iceberg] ahshahid commented on issue #6198: colStats flag in TableContext remains false except in situation where delete files are present

2022-11-21 Thread GitBox
ahshahid commented on issue #6198: URL: https://github.com/apache/iceberg/issues/6198#issuecomment-1322664170 I am trying to reproduce the behaviour which I saw

[GitHub] [iceberg] asheeshgarg commented on issue #6003: Vectorized Read

2022-11-21 Thread GitBox
asheeshgarg commented on issue #6003: URL: https://github.com/apache/iceberg/issues/6003#issuecomment-1322683140 After a couple of jar iterations I am able to read the catalog. Now, when reading the data, I am getting Exception in thread "main" java.lang.ExceptionInInitializerError at org.a

[GitHub] [iceberg] JonasJ-ap commented on pull request #5331: WIP: Adding support for Delta to Iceberg migration

2022-11-21 Thread GitBox
JonasJ-ap commented on PR #5331: URL: https://github.com/apache/iceberg/pull/5331#issuecomment-1322684304 Hi @ericlgoodman. My name is Rushan Jiang, a CS undergrad at CMU. I am interested in learning and contributing to this migration support. I saw you did not update this PR for some time.

[GitHub] [iceberg] Fokko commented on a diff in pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
Fokko commented on code in PR #6233: URL: https://github.com/apache/iceberg/pull/6233#discussion_r1028552098 ## python/pyiceberg/table/__init__.py: ## @@ -199,16 +223,144 @@ def use_ref(self, name: str): raise ValueError(f"Cannot scan unknown ref={name}") -def s

[GitHub] [iceberg] rdblue merged pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue merged PR #6233: URL: https://github.com/apache/iceberg/pull/6233

[GitHub] [iceberg] rdblue commented on pull request #6233: Python: Implement DataScan.plan_files

2022-11-21 Thread GitBox
rdblue commented on PR #6233: URL: https://github.com/apache/iceberg/pull/6233#issuecomment-1322733138 Thanks, @Fokko! I'll follow up with testing.

[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

2022-11-21 Thread GitBox
rdblue commented on code in PR #6128: URL: https://github.com/apache/iceberg/pull/6128#discussion_r1028599283 ## python/pyiceberg/expressions/__init__.py: ## @@ -601,36 +640,65 @@ def __eq__(self, other): def __repr__(self) -> str: return f"{str(self.__class__.__na

[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

2022-11-21 Thread GitBox
rdblue commented on code in PR #6128: URL: https://github.com/apache/iceberg/pull/6128#discussion_r1028602287 ## python/pyiceberg/transforms.py: ## @@ -644,8 +748,88 @@ def can_transform(self, _: IcebergType) -> bool: def result_type(self, source: IcebergType) -> IcebergTyp

[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

2022-11-21 Thread GitBox
rdblue commented on code in PR #6128: URL: https://github.com/apache/iceberg/pull/6128#discussion_r1028604963 ## python/pyiceberg/transforms.py: ## @@ -644,8 +751,89 @@ def can_transform(self, _: IcebergType) -> bool: def result_type(self, source: IcebergType) -> IcebergTyp

[GitHub] [iceberg] rdblue commented on a diff in pull request #6128: Python: Projection

2022-11-21 Thread GitBox
rdblue commented on code in PR #6128: URL: https://github.com/apache/iceberg/pull/6128#discussion_r1028605806 ## python/pyiceberg/transforms.py: ## @@ -511,6 +590,31 @@ def preserves_order(self) -> bool: def source_type(self) -> IcebergType: return self._source_typ

[GitHub] [iceberg] rdblue commented on pull request #6128: Python: Projection

2022-11-21 Thread GitBox
rdblue commented on PR #6128: URL: https://github.com/apache/iceberg/pull/6128#issuecomment-1322801754 I fixed a formatting issue and tests are passing! Overall this looks great with only a couple nits left, so I'll merge it. Thanks for getting this in, @Fokko!

[GitHub] [iceberg] rdblue merged pull request #6128: Python: Projection

2022-11-21 Thread GitBox
rdblue merged PR #6128: URL: https://github.com/apache/iceberg/pull/6128

[GitHub] [iceberg] rdblue commented on pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-21 Thread GitBox
rdblue commented on PR #6069: URL: https://github.com/apache/iceberg/pull/6069#issuecomment-1322809905 @dhruv-pratap, we've been working on the list lately and I think the remaining items are: (4) add `MetricsEvalVisitor` to prune by column stats, (5) add `ResidualEvalVisitor` to produce re

[GitHub] [iceberg] dhruv-pratap commented on pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-21 Thread GitBox
dhruv-pratap commented on PR #6069: URL: https://github.com/apache/iceberg/pull/6069#issuecomment-1322841113 > @dhruv-pratap, we've been working on the list lately and I think the remaining items are: (4) add `MetricsEvalVisitor` to prune by column stats, (5) add `ResidualEvalVisitor` to pr

[GitHub] [iceberg] emkornfield commented on issue #644: Views on top of Iceberg tables

2022-11-21 Thread GitBox
emkornfield commented on issue #644: URL: https://github.com/apache/iceberg/issues/644#issuecomment-1322849306 What is the current state of Views in general? It seems like there has been development effort here, but I didn't see a vote on the mailing list officially adopting the specificati

[GitHub] [iceberg] luoyuxia commented on issue #6104: Rewrite iceberg small files with flink succeeds but no snapshot is generated (V2 - upsert model)

2022-11-21 Thread GitBox
luoyuxia commented on issue #6104: URL: https://github.com/apache/iceberg/issues/6104#issuecomment-1322899747 > This means that in the CDC data that is streaming to Iceberg, don't have a viable data compression scheme for data streams that contain delete operations at this stage? Yes

[GitHub] [iceberg] hameizi commented on pull request #6237: Core: Fix check is delete file and data file overlap

2022-11-21 Thread GitBox
hameizi commented on PR #6237: URL: https://github.com/apache/iceberg/pull/6237#issuecomment-1322908600 > I think the current judgment has already dealt with this situation, IIUC, deletes and data will **not overlap** if: `(dataLower > deleteUpper) || (deleteLower > dataUpper)` So they **wil

[GitHub] [iceberg] hameizi closed pull request #6237: Core: Fix check is delete file and data file overlap

2022-11-21 Thread GitBox
hameizi closed pull request #6237: Core: Fix check is delete file and data file overlap URL: https://github.com/apache/iceberg/pull/6237

[GitHub] [iceberg] SHuixo commented on issue #6104: Rewrite iceberg small files with flink succeeds but no snapshot is generated (V2 - upsert model)

2022-11-21 Thread GitBox
SHuixo commented on issue #6104: URL: https://github.com/apache/iceberg/issues/6104#issuecomment-1322917998 Yes, we tried it and found it worked. The following two log messages are the results of the compression test. > [compact-data-when-stream-write.log](https://github.com/apac

[GitHub] [iceberg] SHuixo closed issue #6104: Rewrite iceberg small files with flink succeeds but no snapshot is generated (V2 - upsert model)

2022-11-21 Thread GitBox
SHuixo closed issue #6104: Rewrite iceberg small files with flink succeeds but no snapshot is generated (V2 - upsert model) URL: https://github.com/apache/iceberg/issues/6104

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6217: Core: Fix deletion of old metadata files when METADATA_DELETE_AFTER_COMMIT_ENABLED is set

2022-11-21 Thread GitBox
amogh-jahagirdar commented on code in PR #6217: URL: https://github.com/apache/iceberg/pull/6217#discussion_r1028798630 ## core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java: ## @@ -414,16 +414,15 @@ private void deleteRemovedMetadataFiles(TableMetadata bas

[GitHub] [iceberg] arunb2w opened a new issue, #6244: Iceberg metadata not stored properly

2022-11-21 Thread GitBox
arunb2w opened a new issue, #6244: URL: https://github.com/apache/iceberg/issues/6244 I tried creating an sample iceberg table with below schema ``` CREATE TABLE glue_dev.db.datatype_test ( id bigint, data string, category string

[GitHub] [iceberg] lirui-apache commented on pull request #5206: Core: Defer reading Avro metadata until ManifestFile is read

2022-11-21 Thread GitBox
lirui-apache commented on PR #5206: URL: https://github.com/apache/iceberg/pull/5206#issuecomment-1323119574 We encountered authorization failures reading manifest files after applying this PR, and thought it might be related. Since the worker pool in use is by default a global static pool,

[GitHub] [iceberg] pvary commented on issue #6241: Updating the HiveCatalog Conf with setConf does not also reset FileIO

2022-11-21 Thread GitBox
pvary commented on issue #6241: URL: https://github.com/apache/iceberg/issues/6241#issuecomment-1323146232 With the current `Catalog` implementations it is probably better to recreate the catalogs anyway, as they are mostly a thin wrapper above the HMSClientPool.

[GitHub] [iceberg] nastra commented on issue #6003: Vectorized Read

2022-11-21 Thread GitBox
nastra commented on issue #6003: URL: https://github.com/apache/iceberg/issues/6003#issuecomment-1323178266 This problem is specific to Arrow itself when running with JDK9+, because Arrow's `MemoryUtil` requires access to the mentioned Java module. See https://arrow.apache.org/docs/java/ins

[GitHub] [iceberg] psnilesh opened a new issue, #6245: Fix for issue #2796 is missing from 0.14.1 and 1.0.x releases

2022-11-21 Thread GitBox
psnilesh opened a new issue, #6245: URL: https://github.com/apache/iceberg/issues/6245 ### Apache Iceberg version 1.0.0 (latest release) ### Query engine _No response_ ### Please describe the bug 🐞 Issue is same as https://github.com/apache/iceberg/issues/27

[GitHub] [iceberg] nastra commented on pull request #6169: AWS,Core: Add S3 REST Signer client + REST Spec

2022-11-21 Thread GitBox
nastra commented on PR #6169: URL: https://github.com/apache/iceberg/pull/6169#issuecomment-1323218731 I have moved the S3 Signer REST Spec to the `iceberg-aws` module.

[GitHub] [iceberg] ajantha-bhat commented on issue #6245: Fix for issue #2796 is missing from 0.14.1 and 1.0.x releases

2022-11-21 Thread GitBox
ajantha-bhat commented on issue #6245: URL: https://github.com/apache/iceberg/issues/6245#issuecomment-1323226313 The 0.14.1 and 1.0.0 releases were not cut from the master branch; they were based on the 0.14.0 branch. The fix that you are expecting will be available in the upcoming 1.1.0 release.

[GitHub] [iceberg] nastra commented on pull request #6240: Nessie: Refactor NessieTableOperations#doCommit

2022-11-21 Thread GitBox
nastra commented on PR #6240: URL: https://github.com/apache/iceberg/pull/6240#issuecomment-1323227993 cc: @dimas-b / @snazy can you guys review this one please?

[GitHub] [iceberg] psnilesh commented on issue #6245: Fix for issue #2796 is missing from 0.14.1 and 1.0.x releases

2022-11-21 Thread GitBox
psnilesh commented on issue #6245: URL: https://github.com/apache/iceberg/issues/6245#issuecomment-1323228826 Thank you. When can I expect 1.1.0?

[GitHub] [iceberg] ajantha-bhat commented on issue #6244: Iceberg metadata not stored properly

2022-11-21 Thread GitBox
ajantha-bhat commented on issue #6244: URL: https://github.com/apache/iceberg/issues/6244#issuecomment-1323230663 > Does that mean iceberg is not storing the metadata correctly? lower bounds and upper bounds are stored as byte arrays in the manifest files. Hence, what you are seeing i

[GitHub] [iceberg] ajantha-bhat commented on issue #6244: Iceberg metadata not stored properly

2022-11-21 Thread GitBox
ajantha-bhat commented on issue #6244: URL: https://github.com/apache/iceberg/issues/6244#issuecomment-1323232753 As a workaround, I believe https://github.com/hililiwei/iceberg-tools#manifest2json can convert them and show them. cc: @hililiwei
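
The comments on issue #6244 note that lower and upper bounds are stored as byte arrays in the manifest files, which is why they look unreadable when dumped directly. The following is a minimal sketch of how such a value could be decoded by hand, assuming the bounds use little-endian encoding for numeric types and UTF-8 for strings. The function name `decode_bound` and the type-name strings are hypothetical and chosen for illustration; they are not part of any Iceberg API, and only a handful of primitive types are covered.

```python
import struct

# Assumption: numeric bounds are little-endian fixed-width values.
# Only a small subset of primitive types is handled in this sketch.
_NUMERIC_FORMATS = {
    "int": "<i",     # 4-byte signed integer
    "long": "<q",    # 8-byte signed integer
    "float": "<f",   # 4-byte IEEE 754
    "double": "<d",  # 8-byte IEEE 754
}


def decode_bound(type_name: str, raw: bytes):
    """Decode one lower/upper bound value (hypothetical helper)."""
    if type_name == "string":
        # Assumption: strings are stored as plain UTF-8 bytes.
        return raw.decode("utf-8")
    if type_name == "boolean":
        # Assumption: a single zero byte means False, anything else True.
        return raw != b"\x00"
    fmt = _NUMERIC_FORMATS.get(type_name)
    if fmt is None:
        raise ValueError(f"unsupported type in this sketch: {type_name}")
    # struct.unpack raises struct.error if the byte length does not match.
    (value,) = struct.unpack(fmt, raw)
    return value
```

For example, `decode_bound("int", b"\x2a\x00\x00\x00")` yields the integer 42 under these assumptions. The column type has to come from the table schema, since the raw bytes carry no type information of their own.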

[GitHub] [iceberg] ajantha-bhat commented on issue #6245: Fix for issue #2796 is missing from 0.14.1 and 1.0.x releases

2022-11-21 Thread GitBox
ajantha-bhat commented on issue #6245: URL: https://github.com/apache/iceberg/issues/6245#issuecomment-1323233889 It is already open for voting. So within 3 to 4 days, the release will be available.

[GitHub] [iceberg] Fokko merged pull request #6211: Allow dropping a column used by old SortOrders but not current SortOrder

2022-11-21 Thread GitBox
Fokko merged PR #6211: URL: https://github.com/apache/iceberg/pull/6211

[GitHub] [iceberg] Fokko commented on issue #6204: Allow dropping a column used by old SortOrders

2022-11-21 Thread GitBox
Fokko commented on issue #6204: URL: https://github.com/apache/iceberg/issues/6204#issuecomment-1323246249 Thanks for double checking this 👍🏻

[GitHub] [iceberg] Fokko closed issue #6204: Allow dropping a column used by old SortOrders

2022-11-21 Thread GitBox
Fokko closed issue #6204: Allow dropping a column used by old SortOrders URL: https://github.com/apache/iceberg/issues/6204

[GitHub] [iceberg] ajantha-bhat commented on pull request #6240: Nessie: Refactor NessieTableOperations#doCommit

2022-11-21 Thread GitBox
ajantha-bhat commented on PR #6240: URL: https://github.com/apache/iceberg/pull/6240#issuecomment-1323253673 @nastra: Review from Iceberg side is easy. But I would like to know, can Trino really use it? Looks to me that `client#commitTable` needs the below parameters. ``` commi