Re: [I] Parallel Table.append [iceberg-python]

2024-02-18 Thread via GitHub
bigluck commented on issue #428: URL: https://github.com/apache/iceberg-python/issues/428#issuecomment-1951883375 Thanks @kevinjqliu Last week, I didn't test the code on my MBP; I did all the tests directly on the EC2 instance. BTW it seems to use all the cores on my M2 Max:

Re: [I] Tracking: Reading iceberg tables. [iceberg-rust]

2024-02-18 Thread via GitHub
sdd commented on issue #123: URL: https://github.com/apache/iceberg-rust/issues/123#issuecomment-1951870583 If you are aiming just to have table reads _working_ first, and optimizing them afterwards, then #124 is not completely necessary to do at this stage? -- This is an automated messag

Re: [PR] Aliyun: Add security token to OSS client properties [iceberg]

2024-02-18 Thread via GitHub
nastra merged PR #9671: URL: https://github.com/apache/iceberg/pull/9671 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apac

Re: [PR] Migrate Write sub-classes in spark-extensions to JUnit5 and AssertJ style [iceberg]

2024-02-18 Thread via GitHub
nastra commented on code in PR #9670: URL: https://github.com/apache/iceberg/pull/9670#discussion_r1494119575 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java: ## @@ -1181,7 +1170,7 @@ public synchronized void testDeleteWithSnapsho

Re: [I] ValidationException: Missing required files to delete [iceberg]

2024-02-18 Thread via GitHub
fengguangyuan commented on issue #9741: URL: https://github.com/apache/iceberg/issues/9741#issuecomment-1951823635 Hi there. I believe this is a protection for the correctness of the existing data rather than a bug. > Basic logic of parallel writes: possibly read the same data, but

Re: [PR] Flink: Incrementally rewrite data files in streaming. [iceberg]

2024-02-18 Thread via GitHub
lurnagao commented on PR #3323: URL: https://github.com/apache/iceberg/pull/3323#issuecomment-1951754853 Hello, may I ask if the rewrite is asynchronous or synchronous

Re: [I] Not able to run spark procedure rewrite_data_files [iceberg]

2024-02-18 Thread via GitHub
suryaprabhakark commented on issue #5946: URL: https://github.com/apache/iceberg/issues/5946#issuecomment-1951663572 I faced the same issue with Spark 3.3, but 3.2 is working fine. Not sure what the issue is, though. I had to apply the solution @Gschiavon suggested. Tested locally and Dataproc a

Re: [PR] Build: Bump pytest from 7.4.4 to 8.0.1 [iceberg-python]

2024-02-18 Thread via GitHub
dependabot[bot] commented on PR #439: URL: https://github.com/apache/iceberg-python/pull/439#issuecomment-1951554162 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version

Re: [PR] Build: Bump pytest from 7.4.4 to 8.0.1 [iceberg-python]

2024-02-18 Thread via GitHub
Fokko commented on PR #439: URL: https://github.com/apache/iceberg-python/pull/439#issuecomment-1951554119 This will be fixed by https://github.com/apache/iceberg-python/pull/393

Re: [PR] Build: Bump pytest from 7.4.4 to 8.0.1 [iceberg-python]

2024-02-18 Thread via GitHub
Fokko closed pull request #439: Build: Bump pytest from 7.4.4 to 8.0.1 URL: https://github.com/apache/iceberg-python/pull/439

Re: [PR] docstring: Fix missing commit [iceberg-python]

2024-02-18 Thread via GitHub
Fokko commented on PR #432: URL: https://github.com/apache/iceberg-python/pull/432#issuecomment-1951553324 @kevinjqliu I would love that. I searched for that in the past but the tooling was quite thin at the time.

Re: [PR] Update NameMapping on update_schema() [iceberg-python]

2024-02-18 Thread via GitHub
Fokko commented on code in PR #441: URL: https://github.com/apache/iceberg-python/pull/441#discussion_r1493898782 ## pyiceberg/table/__init__.py: ## @@ -1932,6 +1928,13 @@ def commit(self) -> None: else: updates = (SetCurrentSchemaUpdate(schema_id=

Re: [I] File leaking in RemoveSnapshots API [iceberg]

2024-02-18 Thread via GitHub
github-actions[bot] closed issue #822: File leaking in RemoveSnapshots API URL: https://github.com/apache/iceberg/issues/822

Re: [I] can not create iceberg table by hive catalog in emr with maridb [iceberg]

2024-02-18 Thread via GitHub
github-actions[bot] commented on issue #798: URL: https://github.com/apache/iceberg/issues/798#issuecomment-1951496824 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Vectorized reads- eagerly decode parquet dictionary encoded data for fixed width types [iceberg]

2024-02-18 Thread via GitHub
github-actions[bot] commented on issue #835: URL: https://github.com/apache/iceberg/issues/835#issuecomment-1951496864 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Vectorized reads- eagerly decode parquet dictionary encoded data for fixed width types [iceberg]

2024-02-18 Thread via GitHub
github-actions[bot] closed issue #835: Vectorized reads- eagerly decode parquet dictionary encoded data for fixed width types URL: https://github.com/apache/iceberg/issues/835

Re: [I] Vectorized reads - explore replacing DateDayVector and TimestampMicroTZVector with IntVector and BigIntVector respectively [iceberg]

2024-02-18 Thread via GitHub
github-actions[bot] commented on issue #834: URL: https://github.com/apache/iceberg/issues/834#issuecomment-1951496848 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Vectorized reads - explore replacing DateDayVector and TimestampMicroTZVector with IntVector and BigIntVector respectively [iceberg]

2024-02-18 Thread via GitHub
github-actions[bot] closed issue #834: Vectorized reads - explore replacing DateDayVector and TimestampMicroTZVector with IntVector and BigIntVector respectively URL: https://github.com/apache/iceberg/issues/834

Re: [I] File leaking in RemoveSnapshots API [iceberg]

2024-02-18 Thread via GitHub
github-actions[bot] commented on issue #822: URL: https://github.com/apache/iceberg/issues/822#issuecomment-1951496835 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] can not create iceberg table by hive catalog in emr with maridb [iceberg]

2024-02-18 Thread via GitHub
github-actions[bot] closed issue #798: can not create iceberg table by hive catalog in emr with maridb URL: https://github.com/apache/iceberg/issues/798

[PR] Update NameMapping on update_schema() [iceberg-python]

2024-02-18 Thread via GitHub
syun64 opened a new pull request, #441: URL: https://github.com/apache/iceberg-python/pull/441 Similar to the [Java implementation](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SchemaUpdate.java#L464), we should update the existing name_mapping on update

Re: [PR] API: Extend FileIO and add EncryptingFileIO. [iceberg]

2024-02-18 Thread via GitHub
rdblue commented on code in PR #9592: URL: https://github.com/apache/iceberg/pull/9592#discussion_r1493844182 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java: ## @@ -184,25 +181,15 @@ protected InputFile getInputFile(String location) { priv

Re: [PR] API: Extend FileIO and add EncryptingFileIO. [iceberg]

2024-02-18 Thread via GitHub
rdblue commented on code in PR #9592: URL: https://github.com/apache/iceberg/pull/9592#discussion_r1493843963 ## core/src/main/java/org/apache/iceberg/ManifestFiles.java: ## @@ -345,34 +345,24 @@ private static ManifestFile copyManifestInternal( return writer.toManifestFile

Re: [PR] API: Extend FileIO and add EncryptingFileIO. [iceberg]

2024-02-18 Thread via GitHub
rdblue commented on code in PR #9592: URL: https://github.com/apache/iceberg/pull/9592#discussion_r1493843678 ## core/src/main/java/org/apache/iceberg/io/ContentCache.java: ## @@ -32,16 +32,15 @@ import org.apache.iceberg.exceptions.NotFoundException; import org.apache.iceberg

Re: [PR] API: Extend FileIO and add EncryptingFileIO. [iceberg]

2024-02-18 Thread via GitHub
rdblue commented on code in PR #9592: URL: https://github.com/apache/iceberg/pull/9592#discussion_r1493843026 ## core/src/main/java/org/apache/iceberg/io/ContentCache.java: ## @@ -232,80 +237,63 @@ public long getLength() { @Override public SeekableInputStream newStrea

Re: [PR] API: Extend FileIO and add EncryptingFileIO. [iceberg]

2024-02-18 Thread via GitHub
rdblue commented on code in PR #9592: URL: https://github.com/apache/iceberg/pull/9592#discussion_r1493842868 ## core/src/test/java/org/apache/iceberg/hadoop/TestCatalogUtilDropTable.java: ## @@ -201,6 +188,25 @@ public void shouldNotDropDataFilesIfGcNotEnabled() { .con

Re: [I] Parallel Table.append [iceberg-python]

2024-02-18 Thread via GitHub
kevinjqliu commented on issue #428: URL: https://github.com/apache/iceberg-python/issues/428#issuecomment-1951397075 Also, @bigluck, while running the code to generate the data using faker, I opened `htop` and saw that it was using 6 CPUs. I'm using an M1 Mac

Re: [I] Parallel Table.append [iceberg-python]

2024-02-18 Thread via GitHub
kevinjqliu commented on issue #428: URL: https://github.com/apache/iceberg-python/issues/428#issuecomment-1951395305 It seems like there's an upper bound to the size of the RecordBatch produced by `to_batches`. I tried setting `max_chunksize` from `16 MB` to `256 MB`. All the batches produc

Re: [I] Parallel Table.append [iceberg-python]

2024-02-18 Thread via GitHub
kevinjqliu commented on issue #428: URL: https://github.com/apache/iceberg-python/issues/428#issuecomment-1951390652 I took the above code and did some investigation. Here's the notebook to see it in action https://colab.research.google.com/drive/12O4ARckCwJqP2U6L4WxREZbyPv_AmWC2?usp=sha

Re: [PR] chore(deps): Update derive_builder requirement from 0.13.0 to 0.20.0 [iceberg-rust]

2024-02-18 Thread via GitHub
Xuanwo commented on PR #203: URL: https://github.com/apache/iceberg-rust/pull/203#issuecomment-1951295846 Why bumping from 0.13 to 0.20?...

[PR] chore(deps): Update derive_builder requirement from 0.13.0 to 0.20.0 [iceberg-rust]

2024-02-18 Thread via GitHub
dependabot[bot] opened a new pull request, #203: URL: https://github.com/apache/iceberg-rust/pull/203 Updates the requirements on [derive_builder](https://github.com/colin-kiegel/rust-derive-builder) to permit the latest version. Release notes Sourced from https://github.com/colin

Re: [I] "Manifest is missing" ValidationException when there have Concurrent applications to rewrite manifests [iceberg]

2024-02-18 Thread via GitHub
amitgilad3 commented on issue #3466: URL: https://github.com/apache/iceberg/issues/3466#issuecomment-1951146740 Thanks @372242283 - I know this is still an issue, but I saw a PR and was wondering whether it was abandoned; if so, I would like to work on fixing this issue