Re: [PR] Flink: Fix TestIcebergSourceWithWatermarkExtractor flakiness [iceberg]

2023-12-18 Thread via GitHub
pvary merged PR #9309: URL: https://github.com/apache/iceberg/pull/9309 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apach

Re: [PR] Flink: Fix TestIcebergSourceWithWatermarkExtractor flakiness [iceberg]

2023-12-18 Thread via GitHub
pvary commented on PR #9309: URL: https://github.com/apache/iceberg/pull/9309#issuecomment-1859744822 @stevenzwu: Flink v1.16 does not have `ALLOW_UNALIGNED_SOURCE_SPLITS` yet, so I had to remove this setting. I still hope that the fix will be enough on its own. Merged the changes. Is the

Re: [PR] Add name-mapping [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #212: URL: https://github.com/apache/iceberg-python/pull/212#discussion_r1429696862 ## pyiceberg/table/name_mapping.py: ## @@ -0,0 +1,203 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. Se

[I] How to improve write speed for data in the same partition? [iceberg]

2023-12-18 Thread via GitHub
dealing with significantly large partition data (e.g., 20GB in a single partition), the write speed becomes very slow. What settings can be configured to increase concurrency and improve write speed in such scenarios? ![screenshot-20231218-164250](https://github.com/apache/iceberg/assets

Re: [PR] Add name-mapping [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #212: URL: https://github.com/apache/iceberg-python/pull/212#discussion_r1429705956 ## pyiceberg/table/name_mapping.py: ## @@ -0,0 +1,203 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. Se

Re: [PR] Add name-mapping [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #212: URL: https://github.com/apache/iceberg-python/pull/212#discussion_r1429709164 ## pyiceberg/table/name_mapping.py: ## @@ -0,0 +1,203 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. Se

Re: [I] bug: ManifestList parsing should not require `partition_type`. [iceberg-rust]

2023-12-18 Thread via GitHub
ZENOTME commented on issue #121: URL: https://github.com/apache/iceberg-rust/issues/121#issuecomment-1859832146 Thanks! I will send a PR to fix it later.

[PR] Build: Bump actions/labeler from 4 to 5 [iceberg]

2023-12-18 Thread via GitHub
panbingkun opened a new pull request, #9331: URL: https://github.com/apache/iceberg/pull/9331 Bumps [actions/labeler](https://github.com/actions/labeler) from 4 to 5. Release notes https://github.com/actions/labeler/releases/tag/v5.0.0 https://github.com/apache/iceberg/assets/152

Re: [PR] Add name-mapping [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #212: URL: https://github.com/apache/iceberg-python/pull/212#discussion_r1429740418 ## pyiceberg/table/name_mapping.py: ## @@ -0,0 +1,203 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. Se

Re: [PR] Add name-mapping [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #212: URL: https://github.com/apache/iceberg-python/pull/212#discussion_r1429741449 ## pyiceberg/table/name_mapping.py: ## @@ -0,0 +1,203 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. Se

Re: [PR] Build: Bump actions/labeler from 4 to 5 [iceberg]

2023-12-18 Thread via GitHub
panbingkun commented on PR #9331: URL: https://github.com/apache/iceberg/pull/9331#issuecomment-1859859240 The above modifications are based on the following document: https://github.com/actions/labeler/tree/main?tab=readme-ov-file#usage Some features have been validated in private

Re: [PR] Add name-mapping [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #212: URL: https://github.com/apache/iceberg-python/pull/212#discussion_r1429743306 ## pyiceberg/table/name_mapping.py: ## @@ -0,0 +1,203 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. Se

Re: [PR] Add name-mapping [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #212: URL: https://github.com/apache/iceberg-python/pull/212#discussion_r1429744874 ## pyiceberg/table/name_mapping.py: ## @@ -0,0 +1,203 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. Se

Re: [PR] Build: Bump actions/labeler from 4 to 5 [iceberg]

2023-12-18 Thread via GitHub
panbingkun commented on PR #9320: URL: https://github.com/apache/iceberg/pull/9320#issuecomment-1859864397 A new PR about it: https://github.com/apache/iceberg/pull/9331

Re: [PR] Core: Fixed certain operations failing to add new data files during retries [iceberg]

2023-12-18 Thread via GitHub
jasonf20 commented on PR #9230: URL: https://github.com/apache/iceberg/pull/9230#issuecomment-1859879328 Hi @rdblue. Thanks for the info. Good to know about the manifest compaction conflict case. I was looking for a way the list could be partially cleared and this answers that. I

[I] Slowness when loading table from S3 [iceberg-python]

2023-12-18 Thread via GitHub
itaise opened a new issue, #220: URL: https://github.com/apache/iceberg-python/issues/220 ### Apache Iceberg version 0.3.0 ### Please describe the bug 🐞 Hi, I am trying to read a table schema. For our use case, we need only field names and types. The operation of

Re: [PR] Add name-mapping [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #212: URL: https://github.com/apache/iceberg-python/pull/212#discussion_r1429778796 ## tests/table/test_name_mapping.py: ## @@ -0,0 +1,291 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. S

Re: [PR] Core: Look up targeted position deletes by path [iceberg]

2023-12-18 Thread via GitHub
szehon-ho commented on code in PR #9251: URL: https://github.com/apache/iceberg/pull/9251#discussion_r1429779656 ## core/src/main/java/org/apache/iceberg/DeleteFileIndex.java: ## @@ -582,93 +513,187 @@ private Iterable>> deleteManifestRea } } - // a group of indexed
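
The PR title describes looking up "targeted" position deletes by data file path. Below is a toy Python sketch of that idea, not the `DeleteFileIndex` code. It assumes that a position delete file counts as targeting a single data file when the lower and upper bounds of its `file_path` column are equal. `PositionDeleteFile`, `ToyDeleteIndex` and their fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PositionDeleteFile:
    location: str
    # Hypothetical fields: lower/upper bounds of the delete file's file_path column.
    min_path: Optional[str]
    max_path: Optional[str]


class ToyDeleteIndex:
    """Toy index: targeted deletes are keyed by path, the rest are kept in a list."""

    def __init__(self, deletes: List[PositionDeleteFile]) -> None:
        self.by_path: Dict[str, List[PositionDeleteFile]] = {}
        self.untargeted: List[PositionDeleteFile] = []
        for d in deletes:
            if d.min_path is not None and d.min_path == d.max_path:
                # References exactly one data file, so index it under that path.
                self.by_path.setdefault(d.min_path, []).append(d)
            else:
                self.untargeted.append(d)

    def for_data_file(self, path: str) -> List[PositionDeleteFile]:
        # Dictionary lookup for targeted deletes. Untargeted ones remain
        # candidates that a real index would still filter, e.g. by bounds.
        return self.by_path.get(path, []) + self.untargeted
```

The benefit assumed here is that per-file deletes no longer need to be scanned for every data file. Only delete files that span several paths stay in the candidate list.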

Re: [PR] Make connect_timeout configurable in IO [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #218: URL: https://github.com/apache/iceberg-python/pull/218#discussion_r1429803361 ## pyiceberg/io/fsspec.py: ## @@ -127,6 +128,9 @@ def _s3(properties: Properties) -> AbstractFileSystem: if proxy_uri := properties.get(S3_PROXY_URI):

Re: [PR] Make connect_timeout configurable in IO [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #218: URL: https://github.com/apache/iceberg-python/pull/218#discussion_r1429804685 ## pyiceberg/io/pyarrow.py: ## @@ -330,6 +331,9 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste if proxy_uri := sel

Re: [I] An exception occurred while writing iceberg data through Spark: org. apache. iceberg. exceptions. CommitFailedException: metadata location has changed [iceberg]

2023-12-18 Thread via GitHub
AllenWee1106 commented on issue #9178: URL: https://github.com/apache/iceberg/issues/9178#issuecomment-1860003242 @Zhangg7723 ![screenshot-20231218-180612](https://github.com/apache/iceberg/assets/146182256/a42dd52d-7bf8-42ae-90d7-e7a5bb0606c2) Thank you. Is the method set in the

[PR] Spark: Add support for Iceberg views [iceberg]

2023-12-18 Thread via GitHub
nastra opened a new pull request, #9332: URL: https://github.com/apache/iceberg/pull/9332 (no comment)

[PR] fix: fix parse partitions in manifest_list [iceberg-rust]

2023-12-18 Thread via GitHub
ZENOTME opened a new pull request, #122: URL: https://github.com/apache/iceberg-rust/pull/122 fix #121

Re: [PR] Build: Apply spotless for scala code [iceberg]

2023-12-18 Thread via GitHub
ajantha-bhat commented on PR #8023: URL: https://github.com/apache/iceberg/pull/8023#issuecomment-1860064933 Just linking the conclusion here as I was searching for it. https://lists.apache.org/thread/sv70lr0bwl9jmxtzvho2ml5xcrcpzf3b

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1429891452 ## pyiceberg/io/pyarrow.py: ## @@ -1565,13 +1564,54 @@ def fill_parquet_file_metadata( del upper_bounds[field_id] del null_value_counts[field_id] -

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1429892223 ## pyiceberg/io/pyarrow.py: ## @@ -1565,13 +1564,54 @@ def fill_parquet_file_metadata( del upper_bounds[field_id] del null_value_counts[field_id] -

Re: [I] An exception occurred while writing iceberg data through Spark: org. apache. iceberg. exceptions. CommitFailedException: metadata location has changed [iceberg]

2023-12-18 Thread via GitHub
Zhangg7723 commented on issue #9178: URL: https://github.com/apache/iceberg/issues/9178#issuecomment-1860081404 > @Zhangg7723 ![screenshot-20231218-180612](https://private-user-images.githubusercontent.com/146182256/291238715-a42dd52d-7bf8-42ae-90d7-e7a5bb0606c2.png?

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1429902142 ## pyiceberg/io/pyarrow.py: ## @@ -1565,13 +1564,54 @@ def fill_parquet_file_metadata( del upper_bounds[field_id] del null_value_counts[field_id] -

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2023-12-18 Thread via GitHub
pvary commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1429933857 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/MapRangePartitioner.java: ## @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1429938368 ## pyiceberg/io/pyarrow.py: ## @@ -1565,13 +1564,54 @@ def fill_parquet_file_metadata( del upper_bounds[field_id] del null_value_counts[field_id] -

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1429939138 ## pyiceberg/io/pyarrow.py: ## @@ -1565,13 +1564,54 @@ def fill_parquet_file_metadata( del upper_bounds[field_id] del null_value_counts[field_id] -

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1429946393 ## pyiceberg/io/pyarrow.py: ## @@ -1565,13 +1564,54 @@ def fill_parquet_file_metadata( del upper_bounds[field_id] del null_value_counts[field_id] -

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1429947177 ## pyiceberg/io/pyarrow.py: ## @@ -1565,13 +1564,54 @@ def fill_parquet_file_metadata( del upper_bounds[field_id] del null_value_counts[field_id] -

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2023-12-18 Thread via GitHub
pvary commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1429955804 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/MapRangePartitioner.java: ## @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1429956178 ## pyiceberg/manifest.py: ## @@ -897,9 +902,11 @@ def prepare_manifest(self, manifest_file: ManifestFile) -> ManifestFile: class ManifestListWriterV2(ManifestListW

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1429960654 ## pyiceberg/table/__init__.py: ## @@ -209,6 +221,48 @@ def set_properties(self, **updates: str) -> Transaction: """ return self._append_updates(SetP

Re: [PR] BugFix: ORC reader is not closed when SortedMerge iterator is used for positional deletes [iceberg]

2023-12-18 Thread via GitHub
deniskuzZ commented on PR #9301: URL: https://github.com/apache/iceberg/pull/9301#issuecomment-1860221881 @pvary, thanks for the review!

Re: [I] An exception occurred while writing iceberg data through Spark: org. apache. iceberg. exceptions. CommitFailedException: metadata location has changed [iceberg]

2023-12-18 Thread via GitHub
ismailsimsek commented on issue #9178: URL: https://github.com/apache/iceberg/issues/9178#issuecomment-1860252682 @AllenWee1106 could you also try a higher `commit.retry.num-retries` (default is 4) to see if this reduces the conflicts? This is another [table property](https://iceb
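
The sketch below shows why more retries can help with "metadata location has changed". It is a toy model of optimistic concurrency, not Iceberg's commit code. `ToyCatalog` and `commit_with_retries` are hypothetical, and the compare-and-swap on a single metadata location is a simplifying assumption. On each retry the writer re-reads the current metadata and tries again. A conflict surfaces as an exception only after `num_retries` retries, counted after the first attempt.

```python
from dataclasses import dataclass
from typing import Callable


class CommitFailedException(Exception):
    """Raised when the current metadata location no longer matches the writer's base."""


@dataclass
class ToyCatalog:
    # Hypothetical stand-in for a catalog: it only tracks the current
    # metadata location and replaces it with a compare-and-swap.
    metadata_location: str = "v0.metadata.json"

    def swap(self, expected: str, new: str) -> None:
        # Fail if another writer committed after we read `expected`.
        if self.metadata_location != expected:
            raise CommitFailedException(
                f"metadata location has changed: {expected} -> {self.metadata_location}"
            )
        self.metadata_location = new


def commit_with_retries(
    catalog: ToyCatalog,
    new_location: Callable[[str], str],
    num_retries: int = 4,
    before_swap: Callable[[int], None] = lambda attempt: None,
) -> int:
    """Attempt the commit once, then retry up to num_retries times.

    Returns the number of attempts used; re-raises the last conflict when retries run out.
    """
    for attempt in range(num_retries + 1):
        base = catalog.metadata_location  # refresh: re-read the current table state
        before_swap(attempt)  # test hook to simulate a concurrent writer
        try:
            catalog.swap(base, new_location(base))
            return attempt + 1
        except CommitFailedException:
            if attempt == num_retries:
                raise
    raise ValueError("num_retries must be >= 0")
```

In this toy, three back-to-back conflicting writers are absorbed by the default of 4 retries, but would fail a writer allowed only 2. On a real table the property can be raised from Spark SQL with `ALTER TABLE ... SET TBLPROPERTIES ('commit.retry.num-retries'='10')`.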

Re: [PR] Spark: Add support for Iceberg views [iceberg]

2023-12-18 Thread via GitHub
nastra commented on code in PR #9332: URL: https://github.com/apache/iceberg/pull/9332#discussion_r1430021109 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/CreateViewAnalysis.scala: ## @@ -0,0 +1,272 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] Spark: Add support for Iceberg views [iceberg]

2023-12-18 Thread via GitHub
nastra commented on code in PR #9332: URL: https://github.com/apache/iceberg/pull/9332#discussion_r1430026625 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala: ## @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Build: Bump actions/labeler from 4 to 5 [iceberg]

2023-12-18 Thread via GitHub
ajantha-bhat commented on PR #9320: URL: https://github.com/apache/iceberg/pull/9320#issuecomment-1860267364 Closing in favour of https://github.com/apache/iceberg/pull/9331. Last time, we reverted this version bump as a quick fix to unblock PR builders.

Re: [PR] Build: Bump actions/labeler from 4 to 5 [iceberg]

2023-12-18 Thread via GitHub
dependabot[bot] commented on PR #9320: URL: https://github.com/apache/iceberg/pull/9320#issuecomment-1860267634 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let

Re: [PR] Spark: Add support for Iceberg views [iceberg]

2023-12-18 Thread via GitHub
nastra commented on code in PR #9332: URL: https://github.com/apache/iceberg/pull/9332#discussion_r1430028232 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala: ## @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Build: Bump actions/labeler from 4 to 5 [iceberg]

2023-12-18 Thread via GitHub
ajantha-bhat closed pull request #9320: Build: Bump actions/labeler from 4 to 5 URL: https://github.com/apache/iceberg/pull/9320

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2023-12-18 Thread via GitHub
pvary commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1430030696 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/MapRangePartitioner.java: ## @@ -0,0 +1,288 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Spark: Add support for Iceberg views [iceberg]

2023-12-18 Thread via GitHub
nastra commented on code in PR #9332: URL: https://github.com/apache/iceberg/pull/9332#discussion_r1430033876 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2ViewExec.scala: ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache So

Re: [PR] Build: Bump nessie from 0.74.0 to 0.75.0 [iceberg]

2023-12-18 Thread via GitHub
dependabot[bot] commented on PR #9313: URL: https://github.com/apache/iceberg/pull/9313#issuecomment-1860285884 Sorry, only users with push access can use that command.

Re: [PR] Build: Bump nessie from 0.74.0 to 0.75.0 [iceberg]

2023-12-18 Thread via GitHub
ajantha-bhat commented on PR #9313: URL: https://github.com/apache/iceberg/pull/9313#issuecomment-1860284457 Looks like the env is flaky (only the Spark 3.3 test failed). `TestStoragePartitionedJoins > testJoinsWithDaysOnDateColumn FAILED ` mkdir failed ``` Suppressed: java

Re: [PR] Build: Bump nessie from 0.74.0 to 0.75.0 [iceberg]

2023-12-18 Thread via GitHub
ajantha-bhat commented on PR #9313: URL: https://github.com/apache/iceberg/pull/9313#issuecomment-1860285720 @dependabot rebase

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2023-12-18 Thread via GitHub
pvary commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1430040455 ## jmh.gradle: ## @@ -21,10 +21,15 @@ if (jdkVersion != '8' && jdkVersion != '11' && jdkVersion != '17') { throw new GradleException("The JMH benchamrks must be run w

[PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2023-12-18 Thread via GitHub
BsoBird opened a new pull request, #9333: URL: https://github.com/apache/iceberg/pull/9333 We found that under some boundary conditions, HadoopCatalogTable will suffer from data file loss. The problem is in BaseFileRewriteAction::doReplace. When an exception is caught, if it is an unre

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2023-12-18 Thread via GitHub
pvary commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1430041035 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestMapRangePartitioner.java: ## @@ -0,0 +1,511 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Build: Bump nessie from 0.74.0 to 0.75.0 [iceberg]

2023-12-18 Thread via GitHub
ajantha-bhat commented on PR #9313: URL: https://github.com/apache/iceberg/pull/9313#issuecomment-1860299155 > Sorry, only users with push access can use that command. Please ask dependabot to rebase to retrigger the build. cc: @nastra, @Fokko I can't close and reopen the

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2023-12-18 Thread via GitHub
pvary commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1430046432 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestMapRangePartitioner.java: ## @@ -0,0 +1,511 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2023-12-18 Thread via GitHub
BsoBird commented on PR #9333: URL: https://github.com/apache/iceberg/pull/9333#issuecomment-1860311383 @RussellSpitzer Hi, can you check this?

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2023-12-18 Thread via GitHub
pvary commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1430068605 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestMapRangePartitioner.java: ## @@ -0,0 +1,511 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2023-12-18 Thread via GitHub
pvary commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1430070376 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestMapRangePartitioner.java: ## @@ -0,0 +1,511 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Flink: implement range partitioner for map data statistics [iceberg]

2023-12-18 Thread via GitHub
pvary commented on code in PR #9321: URL: https://github.com/apache/iceberg/pull/9321#discussion_r1430070789 ## flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/TestMapRangePartitioner.java: ## @@ -0,0 +1,511 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2023-12-18 Thread via GitHub
RussellSpitzer commented on code in PR #9333: URL: https://github.com/apache/iceberg/pull/9333#discussion_r1430077254 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -129,7 +130,7 @@ public TableMetadata refresh() { @Override public void

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2023-12-18 Thread via GitHub
RussellSpitzer commented on code in PR #9333: URL: https://github.com/apache/iceberg/pull/9333#discussion_r1430079206 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -146,29 +147,37 @@ public void commit(TableMetadata base, TableMetadata metada

Re: [PR] Build: Bump nessie from 0.74.0 to 0.75.0 [iceberg]

2023-12-18 Thread via GitHub
nastra commented on PR #9313: URL: https://github.com/apache/iceberg/pull/9313#issuecomment-1860370409 @dependabot rebase

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2023-12-18 Thread via GitHub
BsoBird commented on code in PR #9333: URL: https://github.com/apache/iceberg/pull/9333#discussion_r1430084523 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -129,7 +130,7 @@ public TableMetadata refresh() { @Override public void commit

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2023-12-18 Thread via GitHub
BsoBird commented on PR #9333: URL: https://github.com/apache/iceberg/pull/9333#issuecomment-1860377763 @RussellSpitzer I have a question: since the versionId we defined is of type Integer, it looks like it could overflow. Could that cause problems when it overflows?
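
The overflow question can be made concrete by emulating Java's 32-bit `int` arithmetic in Python. This sketch assumes the version is incremented as a plain Java `int`. `java_int` and `next_version` are hypothetical helpers, not Iceberg code. Incrementing past `Integer.MAX_VALUE` silently wraps to a negative number, whereas a `long` would keep counting. Reaching that point takes 2^31 commits, roughly 68 years at one commit per second.

```python
INT_MIN = -(2**31)
INT_MAX = 2**31 - 1


def java_int(x: int) -> int:
    """Wrap an arbitrary Python int into Java's 32-bit two's-complement int range."""
    return (x - INT_MIN) % 2**32 + INT_MIN


def next_version(version: int) -> int:
    # Mirrors `version + 1` evaluated on a Java int: wraps instead of growing.
    return java_int(version + 1)
```

For example, `next_version(INT_MAX)` yields `INT_MIN`. Any logic that compares versions numerically or builds `v<N>.metadata.json` names would then see a negative version.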

Re: [PR] Core: Remove deprecated method from BaseMetadataTable [iceberg]

2023-12-18 Thread via GitHub
ajantha-bhat commented on code in PR #9298: URL: https://github.com/apache/iceberg/pull/9298#discussion_r1430095953 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java: ## @@ -948,6 +950,17 @@ public static org.apache.spark.sql.catalyst.TableIdentifier to

Re: [PR] Flink: Support watermark alignment of source splits [iceberg]

2023-12-18 Thread via GitHub
pvary commented on PR #9308: URL: https://github.com/apache/iceberg/pull/9308#issuecomment-1860397381 > These messages are executed in the same thread as the fetch method, so in this case we have to return from the fetch loop, even with empty results. This is somewhat concerning to me consi

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2023-12-18 Thread via GitHub
BsoBird commented on PR #9333: URL: https://github.com/apache/iceberg/pull/9333#issuecomment-1860413716 If the int overflow doesn't cause any problems with the versionId-related logic, I agree that the int->long change should be reverted.

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1430114586 ## pyiceberg/table/__init__.py: ## @@ -830,6 +884,49 @@ def history(self) -> List[SnapshotLogEntry]: def update_schema(self, allow_incompatible_changes: bool = Fa

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1430115587 ## pyiceberg/table/__init__.py: ## @@ -830,6 +884,49 @@ def history(self) -> List[SnapshotLogEntry]: def update_schema(self, allow_incompatible_changes: bool = Fa

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1430117518 ## pyiceberg/table/__init__.py: ## @@ -1904,3 +2001,144 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -1
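
The hunk context above shows only the final sign normalization of `_generate_snapshot_id`. The following is an illustrative sketch of how such a function can produce a non-negative 64-bit ID. It assumes the earlier lines fold a random UUID into a signed 64-bit integer by XOR-ing its two halves. That folding step is an assumption, not copied from the PR, and `generate_snapshot_id` is a hypothetical name.

```python
import uuid


def generate_snapshot_id() -> int:
    """Derive a non-negative snapshot ID from a random UUID (illustrative sketch)."""
    rnd = uuid.uuid4().bytes  # 16 bytes
    # Fold the 128-bit UUID into 64 bits by XOR-ing its two 8-byte halves.
    folded = bytes(a ^ b for a, b in zip(rnd[0:8], rnd[8:16]))
    snapshot_id = int.from_bytes(folded, byteorder="little", signed=True)
    # Same normalization as the line shown in the diff: flip negative values.
    return snapshot_id if snapshot_id >= 0 else snapshot_id * -1
```

One edge case of this normalization: if the folded value is exactly -2^63, negating it gives 2^63, which does not fit in a signed 64-bit long. The probability is negligible, but it is worth knowing when the ID is later written as a `long`.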

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1430117206 ## pyiceberg/table/__init__.py: ## @@ -209,6 +221,48 @@ def set_properties(self, **updates: str) -> Transaction: """ return self._append_updates(SetP

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1430119429 ## pyiceberg/table/__init__.py: ## @@ -1904,3 +2001,144 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -1

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1430119930 ## pyiceberg/table/__init__.py: ## @@ -1904,3 +2001,144 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -1

Re: [PR] Build: Bump actions/labeler from 4 to 5 [iceberg]

2023-12-18 Thread via GitHub
ajantha-bhat commented on code in PR #9331: URL: https://github.com/apache/iceberg/pull/9331#discussion_r1430121422 ## .github/labeler.yml: ## @@ -17,71 +17,171 @@ # under the License. # # Pull Request Labeler Github Action Configuration: https://github.com/marketplace/actio

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1430121458 ## pyiceberg/table/__init__.py: ## @@ -1904,3 +2001,144 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -1

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1430138494 ## pyiceberg/table/__init__.py: ## @@ -1904,3 +2001,144 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -1

Re: [PR] fix: fix parse partitions in manifest_list [iceberg-rust]

2023-12-18 Thread via GitHub
ZENOTME commented on PR #122: URL: https://github.com/apache/iceberg-rust/pull/122#issuecomment-1860504066 cc @Fokko @liurenjie1024 @Xuanwo @JanKaul

Re: [I] Issue with CALL parsing [iceberg]

2023-12-18 Thread via GitHub
MojoML commented on issue #8343: URL: https://github.com/apache/iceberg/issues/8343#issuecomment-1860673870 +1

Re: [PR] Parquet: Add system config for unsafe Parquet ID fallback. [iceberg]

2023-12-18 Thread via GitHub
amogh-jahagirdar commented on code in PR #9324: URL: https://github.com/apache/iceberg/pull/9324#discussion_r1430236452 ## core/src/main/java/org/apache/iceberg/SystemConfigs.java: ## @@ -72,6 +72,19 @@ private SystemConfigs() {} 8, Integer::parseUnsignedIn

Re: [PR] Write support [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1430244434 ## pyiceberg/table/__init__.py: ## @@ -1904,3 +2001,144 @@ def _generate_snapshot_id() -> int: snapshot_id = snapshot_id if snapshot_id >= 0 else snapshot_id * -1

Re: [I] Slowness when loading table from S3 [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on issue #220: URL: https://github.com/apache/iceberg-python/issues/220#issuecomment-1860766872 @itaise Thanks for reaching out. We're constantly improving on performance, and I would suggest bumping to a later version of PyIceberg. In 0.5.x we use Cython to parse the Avro f

Re: [I] Does spark integrated iceberg support concurrent insert overwrite [iceberg]

2023-12-18 Thread via GitHub
zz22394 commented on issue #2633: URL: https://github.com/apache/iceberg/issues/2633#issuecomment-1860773854 Short answer: Yes. Ref: # How does Iceberg handle multiple concurrent writes? https://www.dremio.com/apache-iceberg-faq/#h-how-does-iceberg-handle-multiple-concurrent-

Re: [I] Does spark integrated iceberg support concurrent insert overwrite [iceberg]

2023-12-18 Thread via GitHub
zz22394 commented on issue #2633: URL: https://github.com/apache/iceberg/issues/2633#issuecomment-1860777197 https://iceberg.apache.org/docs/latest/spark-writes/ **MERGE INTO** is recommended instead of **INSERT OVERWRITE** because Iceberg can replace only the affected data files

Re: [PR] Apply Name mapping [iceberg-python]

2023-12-18 Thread via GitHub
syun64 commented on PR #219: URL: https://github.com/apache/iceberg-python/pull/219#issuecomment-1860863136 Thank you for the context, @rdblue. I will remove the fallback logic from `pyarrow_to_schema` and ignore setting the identifier_field_ids property in `_ApplyNameMapping`.
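
In the Iceberg table spec, a name mapping (the `schema.name-mapping.default` table property) assigns field IDs to columns in data files written without them. It does this by matching column names, including aliases, against a JSON list of `{"field-id", "names", "fields"}` objects. The sketch below flattens such a mapping into a lookup table. It is not pyiceberg's `NameMapping` API, and `index_name_mapping` is a hypothetical helper.

```python
import json
from typing import Dict, List, Tuple


def index_name_mapping(mapping_json: str) -> Dict[Tuple[str, ...], int]:
    """Flatten a name mapping into {name path: field ID}, following every alias."""
    index: Dict[Tuple[str, ...], int] = {}

    def visit(fields: List[dict], prefix: Tuple[str, ...]) -> None:
        for f in fields:
            for name in f.get("names", []):
                path = prefix + (name,)
                if "field-id" in f:  # field-id is optional in the spec
                    index[path] = f["field-id"]
                # Nested fields are reachable under each alias of their parent.
                visit(f.get("fields", []), path)

    visit(json.loads(mapping_json), ())
    return index
```

A reader could then resolve a file column such as `location.latitude` with `index.get(("location", "latitude"))`. Columns with no entry stay unmapped and would be skipped by the projection.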

Re: [PR] Core: Add param to limit manifest parallel reader queue size [iceberg]

2023-12-18 Thread via GitHub
nastra commented on code in PR #7844: URL: https://github.com/apache/iceberg/pull/7844#discussion_r1430380074 ## core/src/main/java/org/apache/iceberg/SystemConfigs.java: ## @@ -53,6 +53,17 @@ private SystemConfigs() {} Math.max(2, Runtime.getRuntime().availableProces
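
The PR title describes capping the queue used by the parallel manifest reader. A bounded producer/consumer queue is the usual way to cap memory in this pattern, because producers block once the buffer is full. The sketch below uses Python's `queue.Queue(maxsize=...)` to show it. The PR itself is Java, and `parallel_read` and `max_queue_size` are hypothetical names, not the `SystemConfigs` entry added there.

```python
import queue
import threading
from typing import Iterable, Iterator, List


def parallel_read(sources: List[Iterable[int]], max_queue_size: int) -> Iterator[int]:
    """Read sources on worker threads, buffering at most max_queue_size items."""
    if max_queue_size <= 0:
        raise ValueError("max_queue_size must be positive (0 means unbounded in queue.Queue)")
    buf: "queue.Queue[object]" = queue.Queue(maxsize=max_queue_size)
    done = object()  # sentinel each worker enqueues when it is finished

    def worker(src: Iterable[int]) -> None:
        for item in src:
            buf.put(item)  # blocks while the queue is full, bounding memory
        buf.put(done)

    threads = [threading.Thread(target=worker, args=(s,), daemon=True) for s in sources]
    for t in threads:
        t.start()

    finished = 0
    while finished < len(threads):
        item = buf.get()
        if item is done:
            finished += 1
        else:
            yield item
```

Output order across sources is not deterministic. A smaller queue trades throughput for a lower memory ceiling when the consumer is slower than the readers.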

Re: [PR] Spark: Add tests for select using tag and branch identifier [iceberg]

2023-12-18 Thread via GitHub
nastra commented on code in PR #9286: URL: https://github.com/apache/iceberg/pull/9286#discussion_r1430387289 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java: ## @@ -234,23 +234,31 @@ public void testVersionAsOf() { } @Test - public void t

Re: [PR] Spark: Add tests for select using tag and branch identifier [iceberg]

2023-12-18 Thread via GitHub
nastra commented on code in PR #9286: URL: https://github.com/apache/iceberg/pull/9286#discussion_r1430388075 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java: ## @@ -283,23 +291,31 @@ public void testUseSnapshotIdForTagReferenceAsOf() { } @

Re: [PR] Spark: Add tests for select using tag and branch identifier [iceberg]

2023-12-18 Thread via GitHub
nastra commented on code in PR #9286: URL: https://github.com/apache/iceberg/pull/9286#discussion_r1430388985 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java: ## @@ -283,23 +291,31 @@ public void testUseSnapshotIdForTagReferenceAsOf() { } @

Re: [PR] shutdown scheduler [iceberg]

2023-12-18 Thread via GitHub
nastra commented on code in PR #9150: URL: https://github.com/apache/iceberg/pull/9150#discussion_r1430418619 ## core/src/main/java/org/apache/iceberg/util/LockManagers.java: ## @@ -154,6 +154,14 @@ public void initialize(Map properties) { CatalogProperties.LOCK_H

Re: [PR] Spark: Add support for Iceberg views [iceberg]

2023-12-18 Thread via GitHub
rdblue commented on code in PR #9332: URL: https://github.com/apache/iceberg/pull/9332#discussion_r1430438547 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/CreateViewAnalysis.scala: ## @@ -0,0 +1,272 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] Flink: Fix TestIcebergSourceWithWatermarkExtractor flakiness [iceberg]

2023-12-18 Thread via GitHub
stevenzwu commented on PR #9309: URL: https://github.com/apache/iceberg/pull/9309#issuecomment-1861098805 @pvary you can monitor the actions/workflows for Flink CI. https://github.com/apache/iceberg/actions/workflows/flink-ci.yml

Re: [PR] Spark: Add support for Iceberg views [iceberg]

2023-12-18 Thread via GitHub
rdblue commented on code in PR #9332: URL: https://github.com/apache/iceberg/pull/9332#discussion_r1430442941 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala: ## @@ -122,37 +147,132 @@ class Iceb

Re: [PR] Spark: Add support for Iceberg views [iceberg]

2023-12-18 Thread via GitHub
rdblue commented on code in PR #9332: URL: https://github.com/apache/iceberg/pull/9332#discussion_r1430447699 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala: ## @@ -122,37 +147,132 @@ class Iceb

[I] Remove deprecated import [iceberg-python]

2023-12-18 Thread via GitHub
Fokko opened a new issue, #221: URL: https://github.com/apache/iceberg-python/issues/221 ### Feature Request / Improvement I noticed this warning in the logs today: ``` ../../Library/Caches/pypoetry/virtualenvs/pyiceberg--PPis2RJ-py3.11/lib/python3.11/site-packages/pyspark/b

[I] Remove MockAWSResponse and pyiceberg.io.fsspec.FsspecFileIO from testing [iceberg-python]

2023-12-18 Thread via GitHub
sebpretzer opened a new issue, #222: URL: https://github.com/apache/iceberg-python/issues/222 ### Feature Request / Improvement Hi Team, I was attempting to mess around with `PyIceberg` for my data pipelines. I was struggling to implement the same test framework due to dependen

Re: [I] Remove MockAWSResponse and pyiceberg.io.fsspec.FsspecFileIO from testing [iceberg-python]

2023-12-18 Thread via GitHub
Fokko commented on issue #222: URL: https://github.com/apache/iceberg-python/issues/222#issuecomment-1861185357 Hey @sebpretzer, thanks for raising this PR. I'd love to get rid of those patches, since it's hard to tell what's going on in them. Looking forward to the PR!

Re: [PR] Core: remove statistic files in CatalogUtil:dropTableData [iceberg]

2023-12-18 Thread via GitHub
dramaticlly commented on code in PR #9305: URL: https://github.com/apache/iceberg/pull/9305#discussion_r1430486073 ## core/src/main/java/org/apache/iceberg/CatalogUtil.java: ## @@ -117,6 +117,11 @@ public static void dropTableData(FileIO io, TableMetadata metadata) { I

Re: [PR] Core: Add param to limit manifest parallel reader queue size [iceberg]

2023-12-18 Thread via GitHub
rdblue commented on PR #7844: URL: https://github.com/apache/iceberg/pull/7844#issuecomment-1861228366 This class cannot use a blocking queue with the worker pool, so I'm -1 on this change. The problem is that planning uses a shared threadpool. Using a blocking queue would cause task

Re: [PR] Core: Add param to limit manifest parallel reader queue size [iceberg]

2023-12-18 Thread via GitHub
rdblue commented on code in PR #7844: URL: https://github.com/apache/iceberg/pull/7844#discussion_r1430496740 ## core/src/main/java/org/apache/iceberg/SystemConfigs.java: ## @@ -42,6 +42,17 @@ private SystemConfigs() {} Math.max(2, Runtime.getRuntime().availableProces

Re: [PR] Core: Add param to limit manifest parallel reader queue size [iceberg]

2023-12-18 Thread via GitHub
rdblue commented on code in PR #7844: URL: https://github.com/apache/iceberg/pull/7844#discussion_r1430498570 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -53,7 +57,8 @@ public CloseableIterator iterator() { private final Iterator tasks; p
