Re: [PR] Spark3.4,3.5,Api,Hive: Fix using NullType in View. [iceberg]
github-actions[bot] closed pull request #11728: Spark3.4,3.5,Api,Hive: Fix using NullType in View. URL: https://github.com/apache/iceberg/pull/11728 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark: Don't skip tests in TestSelect for SparkSessionCatalog [iceberg]
github-actions[bot] commented on PR #11824: URL: https://github.com/apache/iceberg/pull/11824#issuecomment-2601090276 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@iceberg.apache.org list. Thank you for your contributions.
Re: [I] Allow adding synthetic partition for existing data in table [iceberg]
github-actions[bot] closed issue #10658: Allow adding synthetic partition for existing data in table URL: https://github.com/apache/iceberg/issues/10658
Re: [I] MergeSchema doesn't work if missing columns are used for Write Ordering. [iceberg]
github-actions[bot] commented on issue #10751: URL: https://github.com/apache/iceberg/issues/10751#issuecomment-2601090164 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] uppercase table name not supported [iceberg]
github-actions[bot] commented on issue #10758: URL: https://github.com/apache/iceberg/issues/10758#issuecomment-2601090182 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
Re: [I] Allow adding synthetic partition for existing data in table [iceberg]
github-actions[bot] commented on issue #10658: URL: https://github.com/apache/iceberg/issues/10658#issuecomment-2601090122 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.
Re: [PR] Spark3.4,3.5,Api,Hive: Fix using NullType in View. [iceberg]
github-actions[bot] commented on PR #11728: URL: https://github.com/apache/iceberg/pull/11728#issuecomment-2601090226 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
Re: [I] ManifestReader is not properly closed in BaseTableScan [iceberg]
maswin commented on issue #104: URL: https://github.com/apache/iceberg/issues/104#issuecomment-2601106962 We even see this in the `1.4.1` version:

```
2025-01-14T20:42:05.211Z WARN Finalizer org.apache.iceberg.hadoop.HadoopStreams Unclosed output stream created by:
org.apache.iceberg.hadoop.HadoopStreams$HadoopPositionOutputStream.<init>(HadoopStreams.java:152)
org.apache.iceberg.hadoop.HadoopStreams.wrap(HadoopStreams.java:66)
org.apache.iceberg.hadoop.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:85)
org.apache.iceberg.avro.AvroFileAppender.<init>(AvroFileAppender.java:56)
org.apache.iceberg.avro.Avro$WriteBuilder.build(Avro.java:191)
org.apache.iceberg.ManifestWriter$V1Writer.newAppender(ManifestWriter.java:315)
org.apache.iceberg.ManifestWriter.<init>(ManifestWriter.java:58)
org.apache.iceberg.ManifestWriter.<init>(ManifestWriter.java:34)
org.apache.iceberg.ManifestWriter$V1Writer.<init>(ManifestWriter.java:293)
org.apache.iceberg.ManifestFiles.write(ManifestFiles.java:166)
org.apache.iceberg.SnapshotProducer.newManifestWriter(SnapshotProducer.java:529)
org.apache.iceberg.MergingSnapshotProducer$DataFileMergeManager.newManifestWriter(MergingSnapshotProducer.java:1082)
org.apache.iceberg.ManifestMergeManager.createManifest(ManifestMergeManager.java:171)
org.apache.iceberg.ManifestMergeManager.lambda$mergeGroup$1(ManifestMergeManager.java:156)
org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:69)
org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
java.base/java.lang.Thread.run(Thread.java:1583)
```

```
2025-01-14T20:41:49.408Z WARN Finalizer org.apache.iceberg.hadoop.HadoopStreams Unclosed input stream created by:
org.apache.iceberg.hadoop.HadoopStreams$HadoopSeekableInputStream.<init>(HadoopStreams.java:91)
org.apache.iceberg.hadoop.HadoopStreams.wrap(HadoopStreams.java:55)
org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:183)
org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:100)
org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:76)
org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:188)
org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:187)
org.apache.iceberg.ManifestMergeManager.createManifest(ManifestMergeManager.java:176)
org.apache.iceberg.ManifestMergeManager.lambda$mergeGroup$1(ManifestMergeManager.java:156)
org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:69)
org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
java.base/java.lang.Thread.run(Thread.java:1583)
```
Re: [PR] Build: Bump org.assertj:assertj-core from 3.27.2 to 3.27.3 [iceberg]
Fokko commented on PR #12002: URL: https://github.com/apache/iceberg/pull/12002#issuecomment-2600973267 @dependabot rebase
Re: [PR] Build: Bump io.netty:netty-buffer from 4.1.116.Final to 4.1.117.Final [iceberg]
Fokko commented on PR #11999: URL: https://github.com/apache/iceberg/pull/11999#issuecomment-2600973211 @dependabot rebase
Re: [PR] Build: Bump org.apache.datasketches:datasketches-java from 6.1.1 to 6.2.0 [iceberg]
Fokko commented on PR #12000: URL: https://github.com/apache/iceberg/pull/12000#issuecomment-2600973225 @dependabot rebase
Re: [PR] Build: Bump org.xerial:sqlite-jdbc from 3.47.2.0 to 3.48.0.0 [iceberg]
Fokko commented on PR #12001: URL: https://github.com/apache/iceberg/pull/12001#issuecomment-2600973240 @dependabot rebase
Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.52.0 to 26.53.0 [iceberg]
Fokko commented on PR #12003: URL: https://github.com/apache/iceberg/pull/12003#issuecomment-2600973288 @dependabot rebase
Re: [PR] Build: Bump software.amazon.awssdk:bom from 2.29.50 to 2.30.2 [iceberg]
Fokko commented on PR #11998: URL: https://github.com/apache/iceberg/pull/11998#issuecomment-2600973190 @dependabot rebase
Re: [PR] Build: Bump boto3 from 1.35.93 to 1.36.1 [iceberg-python]
Fokko merged PR #1536: URL: https://github.com/apache/iceberg-python/pull/1536
Re: [PR] Add `view_exists` method to REST Catalog [iceberg-python]
shiv-io commented on PR #1242: URL: https://github.com/apache/iceberg-python/pull/1242#issuecomment-260097 Makes sense, @sungwy -- thanks! Added the test, let me know if that looks good.
Re: [I] please add requires-python to pyproject.toml [iceberg-rust]
kevinjqliu commented on issue #896: URL: https://github.com/apache/iceberg-rust/issues/896#issuecomment-2600976947 Good idea! This applies to [bindings/python/pyproject.toml](https://github.com/apache/iceberg-rust/blob/main/bindings/python/pyproject.toml), which we use for `pyiceberg_core`. And it should stay in sync with pyiceberg's Python versions: https://github.com/apache/iceberg-python/blob/fa1bd85ee83a2de13eaaad91abc40ca83eae6c4e/pyproject.toml#L52
Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]
kevinjqliu commented on PR #1534: URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2600991160

Thanks @mattmartin14 for the PR! And thanks @bitsondatadev for the tips on working in OSS. I certainly had to learn a lot of these over the years. A couple of things I think we can address first.

1. Support for MERGE INTO / Upsert

This has been a much anticipated and requested feature in the community. Issue #402 has been tracking it with many eyes on it. I think we still need to figure out the best approach to support this feature. Like you mentioned in the description, `MERGE INTO` is a query engine feature. PyIceberg itself is a client library to support the Iceberg Python ecosystem. PyIceberg aims to provide the necessary Iceberg building blocks so that other engines/programs can interact with Iceberg tables easily. As we're building out more and more engine-like features, it becomes harder to support more complex and data-intensive workloads such as MERGE INTO. We have been able to use pyarrow for query processing, but it has its own limitations. For more compute-intensive workloads, such as the Bucket and Truncate transforms, we were able to leverage Rust (iceberg-rust) to handle the computation. Looking at #402, I don't see any concrete plans on how we can support MERGE INTO. I've added this as an agenda item on the [monthly pyiceberg sync](https://docs.google.com/document/d/1oMKodaZJrOJjPfc8PDVAoTdl02eGQKHlhwuggiw7s9U/edit?tab=t.0#heading=h.rxx2wa3o215y) and will post the update. Please join us if you have time!

2. Taking on DataFusion as a dependency

I'm very interested in exploring DataFusion and ways we can leverage it for this project. As I mentioned above, we currently use pyarrow to handle most of the compute. It'll be interesting to evaluate DataFusion as an alternative. DataFusion has its own ecosystem of expression API, dataframe API, and runtime, all of which are good complements to PyIceberg. It has integrations with the Rust side as well, something I have started exploring in https://github.com/apache/iceberg-rust/issues/865. That said, I think we need a wider discussion and alignment on how to integrate with DataFusion. It's a good time to start thinking about it! I've added this as another discussion item on the monthly sync.

3. Performance concerns

Compute-intensive workloads are generally a bottleneck in Python. I am excited for future pyiceberg <> iceberg-rust integration where we can leverage Rust to perform those computations.

> The composite key code builds an overwrite filter, and once that filter gets too lengthy (in my testing more than 200 rows), the visitor "OR" function in pyiceberg hits a recursion depth error.

This is an interesting observation, and I think I've seen someone else run into this issue before. We'd want to address this separately. This is something we might want to explore using DataFusion's expression API to replace our own parser.
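The recursion-depth failure mode mentioned in that comment is generic to any expression builder that folds many predicates into a left-leaning chain of binary `Or` nodes. The sketch below uses plain tuples as stand-ins for expression objects (it is not the PyIceberg API) to show why pairwise merging into a balanced tree keeps visitor recursion depth logarithmic instead of linear:

```python
from functools import reduce

def or_chain(predicates):
    """Left fold into nested binary ORs: depth grows linearly with input size."""
    return reduce(lambda acc, p: ("or", acc, p), predicates)

def or_tree(predicates):
    """Pairwise merge into a balanced tree: depth grows logarithmically."""
    nodes = list(predicates)
    while len(nodes) > 1:
        nodes = [
            ("or", nodes[i], nodes[i + 1]) if i + 1 < len(nodes) else nodes[i]
            for i in range(0, len(nodes), 2)
        ]
    return nodes[0]

def depth(node):
    """Nesting depth of the OR structure (0 for a leaf predicate)."""
    if isinstance(node, tuple) and node[0] == "or":
        return 1 + max(depth(node[1]), depth(node[2]))
    return 0

# Hypothetical equality predicates, one per row of a composite-key overwrite.
preds = [("eq", "id", i) for i in range(512)]
print(depth(or_chain(preds)))  # 511: a recursive visitor can exhaust the stack
print(depth(or_tree(preds)))   # 9: shallow enough for any realistic row count
```

A visitor walking the chained form recurses once per predicate, which is why filters beyond a few hundred rows can trip Python's default recursion limit of 1000; the balanced form stays shallow regardless of row count.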
Re: [PR] Docs: Location Provider Documentation [iceberg-python]
kevinjqliu commented on code in PR #1537: URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921620305

## mkdocs/docs/configuration.md:

@@ -195,6 +198,86 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya

+## Location Providers
+
+Iceberg works with the concept of a LocationProvider that determines file paths for a table's data. PyIceberg

Review Comment:

```suggestion
Apache Iceberg uses the concept of a `LocationProvider` to manage file paths for a table's data. In PyIceberg, the `LocationProvider` module is designed to be pluggable, allowing customization for specific use cases. The `LocationProvider` for a table can be specified through table properties. PyIceberg defaults to the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider), which generates file paths that are optimized for object storage
```

## mkdocs/docs/configuration.md:

@@ -54,15 +54,18 @@ Iceberg tables support table properties to configure table behavior.

### Write options

```diff
-| Key| Options | Default | Description |
-| -- | - | --- | --- |
-| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd| Sets the Parquet compression coddec. |
-| `write.parquet.compression-level` | Integer | null| Parquet compression level for the codec. If not set, it is up to PyIceberg |
-| `write.parquet.row-group-limit`| Number of rows| 1048576 | The upper bound of the number of entries within a single row group |
-| `write.parquet.page-size-bytes`| Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk |
-| `write.parquet.page-row-limit` | Number of rows| 2 | Set a target threshold for the maximum number of rows within a column chunk |
-| `write.parquet.dict-size-bytes`| Size in bytes | 2MB | Set the dictionary page size limit per row group |
-| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. |
+| Key                                    | Options                           | Default | Description |
+|----------------------------------------|-----------------------------------|---------|-------------|
+| `write.parquet.compression-codec`      | `{uncompressed,zstd,gzip,snappy}` | zstd    | Sets the Parquet compression coddec. |
+| `write.parquet.compression-level`      | Integer                           | null    | Parquet compression level for the codec. If not set, it is up to PyIceberg |
+| `write.parquet.row-group-limit`        | Number of rows                    | 1048576 | The upper bound of the number of entries within a single row group |
+| `write.parquet.page-size-bytes`        | Size in bytes                     | 1MB     | Set a target threshold for the approximate encoded size of data pages within a column chunk |
+| `write.parquet.page-row-limit`         | Number of rows                    | 2       | Set a target threshold for the maximum number of rows within a column chunk |
+| `write.parquet.dict-size-bytes`        | Size in bytes                     | 2MB     | Set the dictionary page size limit per row group |
+| `write.metadata.previous-versions-max` | Integer                           | 100     | The max number of previous version metadata files to keep before deleting after commit. |
+| `write.object-storage.enabled`         | Boolean                           | True    | Enables the [ObjectStoreLocationProvider](configuration.md#objectsto
```
Re: [PR] Refactor to write APIs to default to `main` branch [iceberg-python]
kevinjqliu closed pull request #312: Refactor to write APIs to default to `main` branch URL: https://github.com/apache/iceberg-python/pull/312
[PR] feat(catalog): Have Load use "type" property and "name" for config [iceberg-go]
zeroshade opened a new pull request, #260: URL: https://github.com/apache/iceberg-go/pull/260 As brought up in https://github.com/apache/iceberg-go/pull/244#discussion_r1911257805, this PR implements using a "type" property when loading catalogs and looking up catalog configurations using the provided "name", only falling back to the URI scheme when necessary.
Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2 from 1.32.8 to 1.33.0 [iceberg-go]
zeroshade merged PR #259: URL: https://github.com/apache/iceberg-go/pull/259
Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.72.2 to 1.73.2 [iceberg-go]
zeroshade merged PR #255: URL: https://github.com/apache/iceberg-go/pull/255
Re: [PR] build(deps): bump google.golang.org/api from 0.216.0 to 0.217.0 [iceberg-go]
zeroshade merged PR #257: URL: https://github.com/apache/iceberg-go/pull/257
Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/service/glue from 1.105.1 to 1.105.3 [iceberg-go]
zeroshade merged PR #258: URL: https://github.com/apache/iceberg-go/pull/258
Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.28.11 to 1.29.1 [iceberg-go]
zeroshade merged PR #256: URL: https://github.com/apache/iceberg-go/pull/256
Re: [PR] Add Python version 3.13 to test matrix. [iceberg-python]
kevinjqliu commented on PR #1377: URL: https://github.com/apache/iceberg-python/pull/1377#issuecomment-2601016186 Blocked on Ray's Python 3.13 support: https://github.com/ray-project/ray/issues/49738. We can run `poetry update` after.
[PR] Build: Nightly build for Iceberg REST fixtures [iceberg]
Fokko opened a new pull request, #12008: URL: https://github.com/apache/iceberg/pull/12008 While trying to downstream V3 into PyIceberg/Iceberg-Rust/etc, I think it would be good to rebuild the REST fixtures every night.
Re: [PR] Build: Nightly build for Iceberg REST fixtures [iceberg]
kevinjqliu commented on code in PR #12008: URL: https://github.com/apache/iceberg/pull/12008#discussion_r1921636817

## .github/workflows/publish-iceberg-rest-fixture-docker.yml:

@@ -20,9 +20,8 @@ name: Build and Push 'iceberg-rest-fixture' Docker Image

```diff
 on:
-  push:
-    tags:
-      - 'apache-iceberg-[0-9]+.[0-9]+.[0-9]+'
```

Review Comment: let's keep both, we still need the tag for https://github.com/apache/iceberg/pull/12008/files#diff-24cba6867a8ec1ac782a5dbfb5ec71f84ed7377564afeada5661647f2b480879L47-L51 This way we have a tag for each release and another for `latest`
[PR] [infra] regenerate poetry lock [iceberg-python]
kevinjqliu opened a new pull request, #1538: URL: https://github.com/apache/iceberg-python/pull/1538 Since we bumped Poetry to `2.0.1` in #1525, we have not regenerated the poetry lock file. It looks like Poetry adds a lot of additional information to the lock file. Let's regenerate the lock file on `main`. This PR runs `poetry lock` on a clean install:

```
pip uninstall poetry
make install
poetry lock
```
Re: [PR] Build: Bump org.assertj:assertj-core from 3.27.2 to 3.27.3 [iceberg]
Fokko merged PR #12002: URL: https://github.com/apache/iceberg/pull/12002
Re: [PR] Build: Bump org.xerial:sqlite-jdbc from 3.47.2.0 to 3.48.0.0 [iceberg]
Fokko merged PR #12001: URL: https://github.com/apache/iceberg/pull/12001
Re: [PR] Build: Bump io.netty:netty-buffer from 4.1.116.Final to 4.1.117.Final [iceberg]
Fokko merged PR #11999: URL: https://github.com/apache/iceberg/pull/11999
Re: [PR] Build: Bump org.apache.datasketches:datasketches-java from 6.1.1 to 6.2.0 [iceberg]
Fokko merged PR #12000: URL: https://github.com/apache/iceberg/pull/12000
Re: [PR] [infra] regenerate poetry lock [iceberg-python]
Fokko merged PR #1538: URL: https://github.com/apache/iceberg-python/pull/1538
Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.52.0 to 26.53.0 [iceberg]
Fokko merged PR #12003: URL: https://github.com/apache/iceberg/pull/12003
Re: [PR] Build: Nighly build for Iceberg REST fixtures [iceberg]
kevinjqliu commented on code in PR #12008: URL: https://github.com/apache/iceberg/pull/12008#discussion_r1921636978 ## .github/workflows/publish-iceberg-rest-fixture-docker.yml: ##
@@ -20,9 +20,8 @@ name: Build and Push 'iceberg-rest-fixture' Docker Image
 on:
-  push:
-    tags:
-      - 'apache-iceberg-[0-9]+.[0-9]+.[0-9]+'
Review Comment: we don't have any tag releases yet since this code was added after the last tag https://github.com/apache/iceberg/tags
Re: [PR] Docs: Location Provider Documentation [iceberg-python]
smaheshwar-pltr commented on code in PR #1537: URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921651044 ## mkdocs/docs/configuration.md: ##
@@ -54,15 +54,18 @@ Iceberg tables support table properties to configure table behavior.
 ### Write options
-| Key | Options | Default | Description |
-| -- | - | --- | --- |
-| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. |
-| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. If not set, it is up to PyIceberg |
-| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group |
-| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk |
-| `write.parquet.page-row-limit` | Number of rows | 2 | Set a target threshold for the maximum number of rows within a column chunk |
-| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group |
-| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. |
+| Key | Options | Default | Description |
+|--|---|-|-|
+| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. |
+| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. If not set, it is up to PyIceberg |
+| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group |
+| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk |
+| `write.parquet.page-row-limit` | Number of rows | 2 | Set a target threshold for the maximum number of rows within a column chunk |
+| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group |
+| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. |
+| `write.object-storage.enabled` | Boolean | True | Enables the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider) that adds a hash component to file paths |
+| `write.object-storage.partitioned-paths` | Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled |
+| `write.py-location-provider.impl` | String of form `module.ClassName` | null | Optional, [custom LocationProvider](configuration.md#loading-a-custom-locationprovider) implementation |
Review Comment: https://github.com/user-attachments/assets/ff4a70f9-ad5d-4d78-a665-5bd7b6283c7c Don't love how this looks. I prefer what it is now: https://github.com/user-attachments/assets/d1789914-b8d6-449d-8700-4ebee432f5ef I've changed the section li
Re: [PR] Docs: Location Provider Documentation [iceberg-python]
smaheshwar-pltr commented on code in PR #1537: URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921651470 ## mkdocs/docs/configuration.md: ##
@@ -195,6 +198,86 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya
+## Location Providers
+
+Iceberg works with the concept of a LocationProvider that determines file paths for a table's data. PyIceberg
+introduces a pluggable LocationProvider module; the LocationProvider used may be specified on a per-table basis via
+table properties. PyIceberg defaults to the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider),
+which generates file paths that are optimized for object storage.
+
+### SimpleLocationProvider
+
+The SimpleLocationProvider places file names underneath a `data` directory in the table's storage location. For example,
+a non-partitioned table might have a data file with location:
+
+```txt
+s3://bucket/ns/table/data/-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-1.parquet
+```
+
+When data is partitioned, files under a given partition are grouped into a subdirectory, with that partition key
Review Comment: Hopefully [this](https://github.com/apache/iceberg-python/pull/1537/commits/76f397b35abaa1555ede59ad5c5a4fce8c5f1374#diff-497e037708cc64870c6ba9372f6064a69ca1e74d65d6195dcee5a44851e8b47dR221) and [this](https://github.com/apache/iceberg-python/pull/1537/commits/76f397b35abaa1555ede59ad5c5a4fce8c5f1374#diff-497e037708cc64870c6ba9372f6064a69ca1e74d65d6195dcee5a44851e8b47dR241) are what you meant
Re: [PR] Docs: Location Provider Documentation [iceberg-python]
smaheshwar-pltr commented on code in PR #1537: URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921651630 ## mkdocs/docs/configuration.md: ##
@@ -195,6 +198,86 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya
+## Location Providers
+
+Iceberg works with the concept of a LocationProvider that determines file paths for a table's data. PyIceberg
+introduces a pluggable LocationProvider module; the LocationProvider used may be specified on a per-table basis via
+table properties. PyIceberg defaults to the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider),
+which generates file paths that are optimized for object storage.
+
+### SimpleLocationProvider
+
+The SimpleLocationProvider places file names underneath a `data` directory in the table's storage location. For example,
+a non-partitioned table might have a data file with location:
+
+```txt
+s3://bucket/ns/table/data/-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-1.parquet
+```
+
+When data is partitioned, files under a given partition are grouped into a subdirectory, with that partition key
+and value as the directory name. For example, a table partitioned over a string column `category` might have a data file
+with location:
+
+```txt
+s3://bucket/ns/table/data/category=orders/-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-1.parquet
+```
+
+The SimpleLocationProvider is enabled for a table by explicitly setting its `write.object-storage.enabled` table
+property to `False`.
+
+### ObjectStoreLocationProvider
+
+When several files are stored under the same prefix, cloud object stores such as S3 often [throttle requests on prefixes](https://repost.aws/knowledge-center/http-5xx-errors-s3),
+resulting in slowdowns.
+
+The ObjectStoreLocationProvider counteracts this by injecting deterministic hashes, in the form of binary directories,
+into file paths, to distribute files across a larger number of object store prefixes.
+
+Paths contain partitions just before the file name and a `data` directory beneath the table's location, in a similar
+manner to the [SimpleLocationProvider](configuration.md#simplelocationprovider). For example, a table partitioned over a string
+column `category` might have a data file with location: (note the additional binary directories)
+
+```txt
+s3://bucket/ns/table/data/0101/0110/1001/10110010/category=orders/-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-1.parquet
+```
+
+The `write.object-storage.enabled` table property determines whether the ObjectStoreLocationProvider is enabled for a
+table. It is used by default.
+
+#### Partition Exclusion
+
+When the ObjectStoreLocationProvider is used, the table property `write.object-storage.partitioned-paths`, which
+defaults to `True`, can be set to `False` as an additional optimization for object stores. This omits partition keys and
+values from data file paths *entirely* to further reduce key size. With it disabled, the same data file above would
+instead be written to: (note the absence of `category=orders`)
+
+```txt
+s3://bucket/ns/table/data/1101/0100/1011/00111010-0-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-1.parquet
+```
Review Comment: I have the False case just above ("the same data file above" here) - or do you mean making that more explicit?
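The binary hash directories described in the quoted docs are easy to picture with a small sketch. The following Python is purely illustrative — the function name, the SHA-256 choice, and the exact bit grouping are assumptions for the example, not PyIceberg's actual implementation. It shows how a deterministic hash of the file name can be rendered as `0101/0110/1001/10110010`-style directories, and how the partition directory can be dropped entirely when partitioned paths are disabled:

```python
import hashlib

def object_store_path(table_location, file_name, partition=None):
    """Illustrative sketch of hash-prefixed object store paths (hypothetical, not PyIceberg's real algorithm)."""
    # Deterministic hash of the file name, rendered as 24 bits of binary.
    digest = hashlib.sha256(file_name.encode()).digest()
    bits = format(int.from_bytes(digest[:3], "big"), "024b")
    # Group the first 20 bits as 4/4/4/8 to mirror the example paths above.
    hash_dirs = "/".join([bits[0:4], bits[4:8], bits[8:12], bits[12:20]])
    parts = [table_location, "data", hash_dirs]
    if partition is not None:
        # Omitted entirely when partitioned paths are disabled.
        parts.append(partition)
    parts.append(file_name)
    return "/".join(parts)

print(object_store_path("s3://bucket/ns/table", "abc-1.parquet", "category=orders"))
```

Because the hash depends only on the file name, the same file always maps to the same prefix, while different files fan out across up to 2^20 distinct object store prefixes.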
Re: [PR] Docs: Location Provider Documentation [iceberg-python]
smaheshwar-pltr commented on code in PR #1537: URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921652049 ## mkdocs/docs/configuration.md: ##
@@ -195,6 +198,86 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya
+## Location Providers
+
+Iceberg works with the concept of a LocationProvider that determines file paths for a table's data. PyIceberg
Review Comment: I've changed to backticks around `LocationProvider` and its implementations throughout. I keep them as Location Provider (e.g. Object Store Location Provider, without backticks) in section headings though for readability (to not have code-like terms in headings).
Re: [PR] Docs: Location Provider Documentation [iceberg-python]
smaheshwar-pltr commented on code in PR #1537: URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921652324 ## mkdocs/docs/configuration.md: ##
@@ -54,15 +54,18 @@ Iceberg tables support table properties to configure table behavior.
 ### Write options
-| Key | Options | Default | Description |
-| -- | - | --- | --- |
-| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. |
-| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. If not set, it is up to PyIceberg |
-| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group |
-| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk |
-| `write.parquet.page-row-limit` | Number of rows | 2 | Set a target threshold for the maximum number of rows within a column chunk |
-| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group |
-| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. |
+| Key | Options | Default | Description |
+|--|---|-|-|
+| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. |
+| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. If not set, it is up to PyIceberg |
+| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group |
+| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk |
+| `write.parquet.page-row-limit` | Number of rows | 2 | Set a target threshold for the maximum number of rows within a column chunk |
+| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group |
+| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. |
+| `write.object-storage.enabled` | Boolean | True | Enables the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider) that adds a hash component to file paths |
+| `write.object-storage.partitioned-paths` | Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled |
+| `write.py-location-provider.impl` | String of form `module.ClassName` | null | Optional, [custom LocationProvider](configuration.md#loading-a-custom-locationprovider) implementation |
Review Comment: (The above screenshot also shows how code/backticks hyperlinks look, I think they're fine. This is now relevant because of https://github.com/apache/iceberg-python/pull/1537#discussion_r1921652049.)
Re: [PR] Docs: Location Provider Documentation [iceberg-python]
smaheshwar-pltr commented on code in PR #1537: URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921652515 ## mkdocs/docs/configuration.md: ##
@@ -54,15 +54,18 @@ Iceberg tables support table properties to configure table behavior.
 ### Write options
-| Key | Options | Default | Description |
-| -- | - | --- | --- |
-| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. |
-| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. If not set, it is up to PyIceberg |
-| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group |
-| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk |
-| `write.parquet.page-row-limit` | Number of rows | 2 | Set a target threshold for the maximum number of rows within a column chunk |
-| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group |
-| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. |
+| Key | Options | Default | Description |
+|--|---|-|-|
+| `write.parquet.compression-codec` | `{uncompressed,zstd,gzip,snappy}` | zstd | Sets the Parquet compression coddec. |
+| `write.parquet.compression-level` | Integer | null | Parquet compression level for the codec. If not set, it is up to PyIceberg |
+| `write.parquet.row-group-limit` | Number of rows | 1048576 | The upper bound of the number of entries within a single row group |
+| `write.parquet.page-size-bytes` | Size in bytes | 1MB | Set a target threshold for the approximate encoded size of data pages within a column chunk |
+| `write.parquet.page-row-limit` | Number of rows | 2 | Set a target threshold for the maximum number of rows within a column chunk |
+| `write.parquet.dict-size-bytes` | Size in bytes | 2MB | Set the dictionary page size limit per row group |
+| `write.metadata.previous-versions-max` | Integer | 100 | The max number of previous version metadata files to keep before deleting after commit. |
+| `write.object-storage.enabled` | Boolean | True | Enables the [ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider) that adds a hash component to file paths |
+| `write.object-storage.partitioned-paths` | Boolean | True | Controls whether [partition values are included in file paths](configuration.md#partition-exclusion) when object storage is enabled |
Review Comment: I've checked all hyperlinks on the current version and they work as intended
Re: [PR] Core: List namespaces/tables when testing identifier with a dot [iceberg]
smaheshwar-pltr commented on code in PR #11991: URL: https://github.com/apache/iceberg/pull/11991#discussion_r1921682719 ## open-api/src/test/java/org/apache/iceberg/rest/RESTCompatibilityKitCatalogTests.java: ##
@@ -90,4 +90,10 @@ protected boolean overridesRequestedLocation() {
 RESTCompatibilityKitSuite.RCK_OVERRIDES_REQUESTED_LOCATION, false);
 }
+
+ @Override
+ protected boolean supportsNamesWithDot() {
+   // underlying JDBC catalog doesn't support namespaces with a dot
+   return false;
Review Comment: (Semi-nit) The REST catalog that people test with the RCK *could* support namespaces with a dot though, right? Should we maybe make this configurable via

```java
return PropertyUtil.propertyAsBoolean(
    restCatalog.properties(),
    RESTCompatibilityKitSuite.RCK_SUPPORTS_NAMES_WITH_DOT,
    false);
```

like elsewhere in this class?
[I] Flaky test `TestRewritePositionDeleteFilesAction` initializationError [iceberg]
manuzhang opened a new issue, #12009: URL: https://github.com/apache/iceberg/issues/12009

### Apache Iceberg version

main (development)

### Query engine

Spark

### Please describe the bug 🐞

https://github.com/apache/iceberg/actions/runs/12857234112/job/35845760586

```
TestRewritePositionDeleteFilesAction > initializationError FAILED
    java.io.IOException: Failed to bind to 0.0.0.0/0.0.0.0:44503
        at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:344)
        at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:304)
        at org.eclipse.jetty.server.Server.lambda$doStart$0(Server.java:402)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
        at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at org.eclipse.jetty.server.Server.doStart(Server.java:398)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
        at org.apache.iceberg.rest.RESTCatalogServer.start(RESTCatalogServer.java:116)
        at org.apache.iceberg.rest.RESTServerExtension.beforeAll(RESTServerExtension.java:62)
        at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllCallbacks$13(ClassBasedTestDescriptor.java:396)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.invokeBeforeAllCallbacks(ClassBasedTestDescriptor.java:396)
        at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:212)
        at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:85)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:153)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:146)
        at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:144)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:143)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:100)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
        at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:160)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:146)
        at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:144)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:143)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:100)
        at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
        at org.junit.platform.engine.
```
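The bind failure above suggests a port collision between concurrently running test processes. A common mitigation for this class of flakiness — sketched below in Python purely for illustration; whether the Iceberg test harness adopts it is a separate question — is to bind to port 0 and let the OS assign a free ephemeral port:

```python
import socket

def find_free_port():
    # Binding to port 0 lets the OS pick an unused ephemeral port.
    # A race is still possible after close(), but it is far less likely
    # than hard-coding a port that a parallel test run may already hold.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("0.0.0.0", 0))
        return sock.getsockname()[1]

print(find_free_port())
```

The Jetty server in the stack trace would then be started on the returned port instead of a fixed one.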
Re: [PR] Core: List namespaces/tables when testing identifier with a dot [iceberg]
nastra commented on code in PR #11991: URL: https://github.com/apache/iceberg/pull/11991#discussion_r1921923905 ## open-api/src/test/java/org/apache/iceberg/rest/RESTCompatibilityKitCatalogTests.java: ##
@@ -90,4 +90,10 @@ protected boolean overridesRequestedLocation() {
 RESTCompatibilityKitSuite.RCK_OVERRIDES_REQUESTED_LOCATION, false);
 }
+
+ @Override
+ protected boolean supportsNamesWithDot() {
+   // underlying JDBC catalog doesn't support namespaces with a dot
+   return false;
Review Comment: fair point, done
Re: [PR] Spark: Don't skip tests in TestSelect for SparkSessionCatalog [iceberg]
nastra merged PR #11824: URL: https://github.com/apache/iceberg/pull/11824
Re: [PR] Docs: add apache amoro(incubating) with iceberg (#11965) [iceberg]
nastra commented on code in PR #11966: URL: https://github.com/apache/iceberg/pull/11966#discussion_r1921916823 ## docs/docs/amoro.md: ##
@@ -0,0 +1,90 @@
+---
+title: "Apache Amoro"
+---
+
+# Apache Amoro With Iceberg
+
+**[Apache Amoro(incubating)](https://amoro.apache.org/docs/latest/)** is a Lakehouse management system built on open data lake formats. Working with compute engines including Flink, Spark, and Trino, Amoro brings pluggable and
+**[Table Maintenance](https://amoro.apache.org/docs/latest/self-optimizing/)** features for Lakehouse to provide out-of-the-box data warehouse experience, and helps data platforms or products easily build infra-decoupled, stream-and-batch-fused and lake-native architecture.
+
+# Auto Self-optimizing
+
+Lakehouse is characterized by its openness and loose coupling, with data and files maintained by users through various engines. While this
+architecture appears to be well-suited for T+1 scenarios, as more attention is paid to applying Lakehouse to streaming data warehouses and real-time
+analysis scenarios, challenges arise. For example:
+
+- Streaming writes bring a massive amount of fragment files
+- CDC ingestion and streaming updates generate excessive redundant data
+- Using the new data lake format leads to orphan files and expired snapshots.
+
+These issues can significantly affect the performance and cost of data analysis. Therefore, Amoro has introduced a Self-optimizing mechanism to
+create an out-of-the-box Streaming Lakehouse management service that is as user-friendly as a traditional database or data warehouse. The new table
+format is used for this purpose. Self-optimizing involves various procedures such as file compaction, deduplication, and sorting.
+
+The architecture and working mechanism of Self-optimizing are shown in the figure below:
+
+The Optimizer is a component responsible for executing Self-optimizing tasks. It is a resident process managed by AMS. AMS is responsible for
Review Comment: I don't see a mention that AMS stands for Amoro Meta Store in the text. Could you please add that the first time AMS is mentioned?
Re: [PR] Docs: add apache amoro(incubating) with iceberg (#11965) [iceberg]
czy006 commented on code in PR #11966: URL: https://github.com/apache/iceberg/pull/11966#discussion_r1921936611 ## docs/docs/amoro.md: ##
@@ -0,0 +1,90 @@
+---
+title: "Apache Amoro"
+---
+
+# Apache Amoro With Iceberg
+
+**[Apache Amoro(incubating)](https://amoro.apache.org/docs/latest/)** is a Lakehouse management system built on open data lake formats. Working with compute engines including Flink, Spark, and Trino, Amoro brings pluggable and
+**[Table Maintenance](https://amoro.apache.org/docs/latest/self-optimizing/)** features for Lakehouse to provide out-of-the-box data warehouse experience, and helps data platforms or products easily build infra-decoupled, stream-and-batch-fused and lake-native architecture.
+
+# Auto Self-optimizing
+
+Lakehouse is characterized by its openness and loose coupling, with data and files maintained by users through various engines. While this
+architecture appears to be well-suited for T+1 scenarios, as more attention is paid to applying Lakehouse to streaming data warehouses and real-time
+analysis scenarios, challenges arise. For example:
+
+- Streaming writes bring a massive amount of fragment files
+- CDC ingestion and streaming updates generate excessive redundant data
+- Using the new data lake format leads to orphan files and expired snapshots.
+
+These issues can significantly affect the performance and cost of data analysis. Therefore, Amoro has introduced a Self-optimizing mechanism to
+create an out-of-the-box Streaming Lakehouse management service that is as user-friendly as a traditional database or data warehouse. The new table
+format is used for this purpose. Self-optimizing involves various procedures such as file compaction, deduplication, and sorting.
+
+The architecture and working mechanism of Self-optimizing are shown in the figure below:
+
+The Optimizer is a component responsible for executing Self-optimizing tasks. It is a resident process managed by AMS. AMS is responsible for
Review Comment: already added
Re: [PR] Build: Nighly build for Iceberg REST fixtures [iceberg]
Fokko commented on code in PR #12008: URL: https://github.com/apache/iceberg/pull/12008#discussion_r1921943161

## .github/workflows/publish-iceberg-rest-fixture-docker.yml:

@@ -20,9 +20,8 @@ name: Build and Push 'iceberg-rest-fixture' Docker Image

on:
-  push:
-    tags:
-      - 'apache-iceberg-[0-9]+.[0-9]+.[0-9]+'

Review Comment: Ah, good catch!
Re: [PR] Build: Nighly build for Iceberg REST fixtures [iceberg]
Fokko commented on code in PR #12008: URL: https://github.com/apache/iceberg/pull/12008#discussion_r1921943661

## .github/workflows/publish-iceberg-rest-fixture-docker.yml:

@@ -20,9 +20,8 @@ name: Build and Push 'iceberg-rest-fixture' Docker Image

on:
-  push:
-    tags:
-      - 'apache-iceberg-[0-9]+.[0-9]+.[0-9]+'
+  schedule:
+    - cron: '0 2 * * *' # run at 2 AM UTC

Review Comment:

```suggestion
  push:
    tags:
      - 'apache-iceberg-[0-9]+.[0-9]+.[0-9]+'
  schedule:
    - cron: '0 2 * * *' # run at 2 AM UTC
```
Re: [I] Flaky Spark tests due to initializationError [iceberg]
manuzhang commented on issue #12009: URL: https://github.com/apache/iceberg/issues/12009#issuecomment-2601264016

Failure from another test: https://github.com/apache/iceberg/actions/runs/12838153809/job/35803224974#step:7:3982

```
TestMigrateTableAction > initializationError FAILED
    java.io.IOException: Failed to bind to 0.0.0.0/0.0.0.0:35439
        at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:344)
        at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:304)
        at org.eclipse.jetty.server.Server.lambda$doStart$0(Server.java:402)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
        at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at org.eclipse.jetty.server.Server.doStart(Server.java:398)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
        at org.apache.iceberg.rest.RESTCatalogServer.start(RESTCatalogServer.java:116)
        at org.apache.iceberg.rest.RESTServerExtension.beforeAll(RESTServerExtension.java:62)
        at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllCallbacks$13(ClassBasedTestDescriptor.java:396)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.invokeBeforeAllCallbacks(ClassBasedTestDescriptor.java:396)
        at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:212)
        at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:85)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:153)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:146)
        at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:144)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:143)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:100)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
        at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:160)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:146)
        at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:144)
        at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:143)
        at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:100)
        at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
        at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.ja
```
Re: [PR] WIP: Add headers for type/field/schema [iceberg-cpp]
lidavidm commented on PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#issuecomment-2601202575

Just to make sure @gaborkaszab @wgtmac: are we ok with the Arrow-style type representation here? (Types are represented by a class hierarchy, erased behind smart pointers; nested types store the fields in the type object, not the field object.) The alternatives are:

- cuDF style: type objects are minimal, just a type ID, and there is no hierarchy, only the base `DataType` class, which is not type-erased and therefore needs no smart pointer. As a trade-off, nested fields have to be extracted from the field, and the base `DataType` has to carry fields for all possible parameterized types.
- arrow-java style: there is still a hierarchy of type objects, but child fields are stored on the `Field`, not the `DataType`. This avoids a conceptual dependency cycle between `DataType` and `Field`.
- variant style: like arrow-java, but instead of type erasure and a type hierarchy (or conversely, like cuDF but without the redundant fields), we have a `std::variant` over all possibilities. This avoids boxing, but the object is larger (e.g. in a vector).
Re: [PR] WIP: Add headers for type/field/schema [iceberg-cpp]
lidavidm commented on PR #31: URL: https://github.com/apache/iceberg-cpp/pull/31#issuecomment-2601203055 Arrow-style does let you do a bunch of compile-time metaprogramming (e.g. see arrow::TypeTraits)
Re: [PR] chore(deps): Bump arrow-schema from 53.3.0 to 53.4.0 [iceberg-rust]
Xuanwo merged PR #900: URL: https://github.com/apache/iceberg-rust/pull/900
Re: [PR] chore(deps): Bump opendal from 0.51.0 to 0.51.1 [iceberg-rust]
Xuanwo commented on PR #898: URL: https://github.com/apache/iceberg-rust/pull/898#issuecomment-2601205731 Thank you @kevinjqliu for reviewing this!
Re: [PR] chore(deps): Bump opendal from 0.51.0 to 0.51.1 [iceberg-rust]
Xuanwo merged PR #898: URL: https://github.com/apache/iceberg-rust/pull/898
Re: [PR] Spark: Don't skip tests in TestSelect for SparkSessionCatalog [iceberg]
manuzhang commented on PR #11824: URL: https://github.com/apache/iceberg/pull/11824#issuecomment-2601217390 @nastra @Fokko please help review this PR.
Re: [PR] chore(deps): Bump async-trait from 0.1.84 to 0.1.85 [iceberg-rust]
Xuanwo merged PR #897: URL: https://github.com/apache/iceberg-rust/pull/897
Re: [PR] chore(deps): Bump aws-sdk-s3tables from 1.3.0 to 1.4.0 [iceberg-rust]
Xuanwo merged PR #899: URL: https://github.com/apache/iceberg-rust/pull/899
Re: [I] Delete orphan files [iceberg-python]
omkenge commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2601440980 Hello @ndrluis, I think #1285 is now merged; can I start working on this issue?
Re: [PR] chore(deps): Bump arrow-arith from 53.3.0 to 53.4.0 [iceberg-rust]
liurenjie1024 commented on PR #901: URL: https://github.com/apache/iceberg-rust/pull/901#issuecomment-2601470708 I think we should skip this upgrade, as it requires upgrading the MSRV?
Re: [PR] test: Introduce datafusion engine for executing sqllogictest. [iceberg-rust]
liurenjie1024 commented on PR #895: URL: https://github.com/apache/iceberg-rust/pull/895#issuecomment-2601473466

> This follows the [datafusion/datafusion/sqllogictest](https://github.com/apache/datafusion/blob/e9a77e0ea3e30b7f2718c9cea1fed023dca1f646/datafusion/sqllogictest/src/engines/datafusion_engine/runner.rs#L122-L124) tests suite. Do you think its worth mentioning that [conversion.rs](https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/src/engines/conversion.rs#L26) and [normalize.rs](https://github.com/apache/datafusion/blob/e9a77e0ea3e30b7f2718c9cea1fed023dca1f646/datafusion/sqllogictest/src/engines/datafusion_engine/normalize.rs) are both copied over from datafusion/datafusion/sqllogictest?

I've synced with @Fokko and I'll add a notice in the LICENSE, as we did in [pyiceberg](https://github.com/apache/iceberg-python/blob/f0346472e4301f2ea3679e0793bb8623f2bb80f1/LICENSE#L206).

> Is it possible to take [datafusion_sqllogictest](https://github.com/apache/datafusion/blob/e9a77e0ea3e30b7f2718c9cea1fed023dca1f646/datafusion/sqllogictest/Cargo.toml#L33) as a dependency instead of copying over the code?

Currently that's not possible, because it isn't published as a crate.
Re: [PR] feat: support scan nested type(struct, map, list) [iceberg-rust]
ZENOTME commented on PR #882: URL: https://github.com/apache/iceberg-rust/pull/882#issuecomment-2601477027

> I think it's a step moving forward, but I think this pr didn't handle nested struct type well, see https://github.com/apache/iceberg-rust/issues/405

Hi @liurenjie1024, could you elaborate on which part this PR misses? This PR is not intended to complete #405. It only supports nested types, not projection of nested fields within structs.
Re: [PR] feat: support scan nested type(struct, map, list) [iceberg-rust]
ZENOTME commented on code in PR #882: URL: https://github.com/apache/iceberg-rust/pull/882#discussion_r1921861153

## crates/iceberg/src/arrow/schema.rs:

@@ -43,7 +43,9 @@ use crate::spec::{
 use crate::{Error, ErrorKind};

 /// When iceberg map type convert to Arrow map type, the default map field name is "key_value".
-pub(crate) const DEFAULT_MAP_FIELD_NAME: &str = "key_value";
+pub const DEFAULT_MAP_FIELD_NAME: &str = "key_value";

Review Comment: The reason we need to make these public is to construct the metadata of the write record batch, as in: https://github.com/apache/iceberg-rust/blob/c44311aa88c505ce8ddce22bed7448a77213e563/crates/integration_tests/tests/scan_all_type.rs#L277.
Re: [PR] feat: support scan nested type(struct, map, list) [iceberg-rust]
ZENOTME commented on code in PR #882: URL: https://github.com/apache/iceberg-rust/pull/882#discussion_r1921861930

## crates/iceberg/src/spec/datatypes.rs:

@@ -226,8 +228,10 @@ pub enum PrimitiveType {
     /// Timestamp in microsecond precision, with timezone
     Timestamptz,
     /// Timestamp in nanosecond precision, without timezone
+    #[serde(rename = "timestamp_ns")]

Review Comment: [scan_all_type.rs](https://github.com/apache/iceberg-rust/pull/882/files/c44311aa88c505ce8ddce22bed7448a77213e563#diff-c15373e1fa9c9e75b55e5e7e6677b5bb2595974062520e1ae3325205c65802ee) found this bug, and I fixed it here. I can separate it out of this PR.