Re: [PR] Spark3.4,3.5,Api,Hive: Fix using NullType in View. [iceberg]

2025-01-19 Thread via GitHub


github-actions[bot] closed pull request #11728: Spark3.4,3.5,Api,Hive: Fix 
using NullType in View.
URL: https://github.com/apache/iceberg/pull/11728


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Spark: Don't skip tests in TestSelect for SparkSessionCatalog [iceberg]

2025-01-19 Thread via GitHub


github-actions[bot] commented on PR #11824:
URL: https://github.com/apache/iceberg/pull/11824#issuecomment-2601090276

   This pull request has been marked as stale due to 30 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@iceberg.apache.org list. Thank you for your 
contributions.





Re: [I] Allow adding synthetic partition for existing data in table [iceberg]

2025-01-19 Thread via GitHub


github-actions[bot] closed issue #10658: Allow adding synthetic partition for 
existing data in table
URL: https://github.com/apache/iceberg/issues/10658





Re: [I] MergeSchema doesn't work if missing columns is used for Write Ordering. [iceberg]

2025-01-19 Thread via GitHub


github-actions[bot] commented on issue #10751:
URL: https://github.com/apache/iceberg/issues/10751#issuecomment-2601090164

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in the next 14 days if no 
further activity occurs. To permanently prevent this issue from being considered 
stale, add the 'not-stale' label; commenting on the issue is preferred when 
possible.





Re: [I] uppercase table name not supported [iceberg]

2025-01-19 Thread via GitHub


github-actions[bot] commented on issue #10758:
URL: https://github.com/apache/iceberg/issues/10758#issuecomment-2601090182

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in the next 14 days if no 
further activity occurs. To permanently prevent this issue from being considered 
stale, add the 'not-stale' label; commenting on the issue is preferred when 
possible.





Re: [I] Allow adding synthetic partition for existing data in table [iceberg]

2025-01-19 Thread via GitHub


github-actions[bot] commented on issue #10658:
URL: https://github.com/apache/iceberg/issues/10658#issuecomment-2601090122

   This issue has been closed because it has not received any activity in the 
last 14 days since being marked as 'stale'.





Re: [PR] Spark3.4,3.5,Api,Hive: Fix using NullType in View. [iceberg]

2025-01-19 Thread via GitHub


github-actions[bot] commented on PR #11728:
URL: https://github.com/apache/iceberg/pull/11728#issuecomment-2601090226

   This pull request has been closed due to lack of activity. This is not a 
judgement on the merit of the PR in any way; it is just a way of keeping the PR 
queue manageable. If you think that is incorrect, or the pull request requires 
review, you can revive the PR at any time.





Re: [I] ManifestReader is not properly closed in BaseTableScan [iceberg]

2025-01-19 Thread via GitHub


maswin commented on issue #104:
URL: https://github.com/apache/iceberg/issues/104#issuecomment-2601106962

   We even see this in version `1.4.1`:
   
   ```
   2025-01-14T20:42:05.211Z WARN Finalizer org.apache.iceberg.hadoop.HadoopStreams Unclosed output stream created by:
   org.apache.iceberg.hadoop.HadoopStreams$HadoopPositionOutputStream.<init>(HadoopStreams.java:152)
   org.apache.iceberg.hadoop.HadoopStreams.wrap(HadoopStreams.java:66)
   org.apache.iceberg.hadoop.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:85)
   org.apache.iceberg.avro.AvroFileAppender.<init>(AvroFileAppender.java:56)
   org.apache.iceberg.avro.Avro$WriteBuilder.build(Avro.java:191)
   org.apache.iceberg.ManifestWriter$V1Writer.newAppender(ManifestWriter.java:315)
   org.apache.iceberg.ManifestWriter.<init>(ManifestWriter.java:58)
   org.apache.iceberg.ManifestWriter.<init>(ManifestWriter.java:34)
   org.apache.iceberg.ManifestWriter$V1Writer.<init>(ManifestWriter.java:293)
   org.apache.iceberg.ManifestFiles.write(ManifestFiles.java:166)
   org.apache.iceberg.SnapshotProducer.newManifestWriter(SnapshotProducer.java:529)
   org.apache.iceberg.MergingSnapshotProducer$DataFileMergeManager.newManifestWriter(MergingSnapshotProducer.java:1082)
   org.apache.iceberg.ManifestMergeManager.createManifest(ManifestMergeManager.java:171)
   org.apache.iceberg.ManifestMergeManager.lambda$mergeGroup$1(ManifestMergeManager.java:156)
   org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
   org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:69)
   org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
   java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
   java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
   java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
   java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
   java.base/java.lang.Thread.run(Thread.java:1583)
   ```
   
   ```
   2025-01-14T20:41:49.408Z WARN Finalizer org.apache.iceberg.hadoop.HadoopStreams Unclosed input stream created by:
   org.apache.iceberg.hadoop.HadoopStreams$HadoopSeekableInputStream.<init>(HadoopStreams.java:91)
   org.apache.iceberg.hadoop.HadoopStreams.wrap(HadoopStreams.java:55)
   org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:183)
   org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:100)
   org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:76)
   org.apache.iceberg.io.CloseableIterable$7$1.<init>(CloseableIterable.java:188)
   org.apache.iceberg.io.CloseableIterable$7.iterator(CloseableIterable.java:187)
   org.apache.iceberg.ManifestMergeManager.createManifest(ManifestMergeManager.java:176)
   org.apache.iceberg.ManifestMergeManager.lambda$mergeGroup$1(ManifestMergeManager.java:156)
   org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
   org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:69)
   org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
   java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
   java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
   java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
   java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
   java.base/java.lang.Thread.run(Thread.java:1583)
   ```
   
   
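The frames in the traces show manifest readers and writers whose underlying Hadoop streams were never closed, so the finalizer reports them. The general shape of the bug and its fix can be sketched in Python (illustrative only; this toy `CloseableIterable` is not the Iceberg class of the same name): an iterable that owns a stream leaks it unless the consumer closes it, and context-manager scoping makes the close unconditional.

```python
import io

class CloseableIterable:
    """Iterable that owns a stream; the consumer must close it (illustrative)."""

    def __init__(self, stream):
        self.stream = stream
        self.closed = False

    def __iter__(self):
        return iter(self.stream.read().splitlines())

    def close(self):
        self.stream.close()
        self.closed = True

    # Context-manager support makes "forgot to close" hard to write.
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

# Leak: iterating without closing leaves the stream open, which is
# what the finalizer warnings above are reporting.
leaked = CloseableIterable(io.StringIO("a\nb"))
list(leaked)
assert not leaked.closed

# Fix: consume inside a `with` block so the stream is always closed.
with CloseableIterable(io.StringIO("a\nb")) as rows:
    assert list(rows) == ["a", "b"]
assert rows.closed
```

The same pattern applies to the Java side: every iterator obtained from a stream-owning iterable needs a corresponding close, typically via try-with-resources.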





Re: [PR] Build: Bump org.assertj:assertj-core from 3.27.2 to 3.27.3 [iceberg]

2025-01-19 Thread via GitHub


Fokko commented on PR #12002:
URL: https://github.com/apache/iceberg/pull/12002#issuecomment-2600973267

   @dependabot rebase





Re: [PR] Build: Bump io.netty:netty-buffer from 4.1.116.Final to 4.1.117.Final [iceberg]

2025-01-19 Thread via GitHub


Fokko commented on PR #11999:
URL: https://github.com/apache/iceberg/pull/11999#issuecomment-2600973211

   @dependabot rebase





Re: [PR] Build: Bump org.apache.datasketches:datasketches-java from 6.1.1 to 6.2.0 [iceberg]

2025-01-19 Thread via GitHub


Fokko commented on PR #12000:
URL: https://github.com/apache/iceberg/pull/12000#issuecomment-2600973225

   @dependabot rebase





Re: [PR] Build: Bump org.xerial:sqlite-jdbc from 3.47.2.0 to 3.48.0.0 [iceberg]

2025-01-19 Thread via GitHub


Fokko commented on PR #12001:
URL: https://github.com/apache/iceberg/pull/12001#issuecomment-2600973240

   @dependabot rebase





Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.52.0 to 26.53.0 [iceberg]

2025-01-19 Thread via GitHub


Fokko commented on PR #12003:
URL: https://github.com/apache/iceberg/pull/12003#issuecomment-2600973288

   @dependabot rebase





Re: [PR] Build: Bump software.amazon.awssdk:bom from 2.29.50 to 2.30.2 [iceberg]

2025-01-19 Thread via GitHub


Fokko commented on PR #11998:
URL: https://github.com/apache/iceberg/pull/11998#issuecomment-2600973190

   @dependabot rebase





Re: [PR] Build: Bump boto3 from 1.35.93 to 1.36.1 [iceberg-python]

2025-01-19 Thread via GitHub


Fokko merged PR #1536:
URL: https://github.com/apache/iceberg-python/pull/1536





Re: [PR] Add `view_exists` method to REST Catalog [iceberg-python]

2025-01-19 Thread via GitHub


shiv-io commented on PR #1242:
URL: https://github.com/apache/iceberg-python/pull/1242#issuecomment-260097

   Makes sense, @sungwy -- thanks! Added the test, let me know if that looks 
good





Re: [I] please add requires-python to pyproject.toml [iceberg-rust]

2025-01-19 Thread via GitHub


kevinjqliu commented on issue #896:
URL: https://github.com/apache/iceberg-rust/issues/896#issuecomment-2600976947

   Good idea! 
   This applies to 
[bindings/python/pyproject.toml](https://github.com/apache/iceberg-rust/blob/main/bindings/python/pyproject.toml), 
which we use for `pyiceberg_core`. 
   It should also stay in sync with PyIceberg's Python versions: 
https://github.com/apache/iceberg-python/blob/fa1bd85ee83a2de13eaaad91abc40ca83eae6c4e/pyproject.toml#L52
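For reference, `requires-python` is a PEP 440 specifier string such as `>=3.9,<3.13`. A minimal sketch of how such a bound constrains interpreter versions (real tools use the `packaging` library, which implements the full PEP 440 rules; the ranges below are hypothetical, not the projects' actual values):

```python
def supports(requires: str, version: tuple) -> bool:
    """Minimal evaluator for `requires-python` specifiers like '>=3.9,<3.13'.

    Illustrative only: this handles plain comparison clauses, nothing more.
    """
    ops = {
        ">=": lambda v, b: v >= b,
        "<=": lambda v, b: v <= b,
        "==": lambda v, b: v == b,
        ">": lambda v, b: v > b,
        "<": lambda v, b: v < b,
    }
    for clause in requires.split(","):
        clause = clause.strip()
        for op in (">=", "<=", "==", ">", "<"):  # two-char operators first
            if clause.startswith(op):
                bound = tuple(int(p) for p in clause[len(op):].split("."))
                if not ops[op](version, bound):
                    return False
                break
    return True

# Hypothetical ranges; the real values live in each project's pyproject.toml.
assert supports(">=3.9,<3.13", (3, 12))
assert not supports(">=3.9,<3.13", (3, 13))
```

If both `pyproject.toml` files carry the same specifier string, a wheel built from the bindings cannot be installed on an interpreter that PyIceberg itself does not support.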





Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

2025-01-19 Thread via GitHub


kevinjqliu commented on PR #1534:
URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2600991160

   Thanks @mattmartin14 for the PR! And thanks @bitsondatadev for the tips on 
working in OSS. I certainly had to learn a lot of these over the years. 
   
   A couple things I think we can address first. 
   
   1. Support for MERGE INTO / Upsert
   
   This has been a long-anticipated and frequently requested feature in the 
community. Issue #402 has been tracking it, with many eyes on it. I think we 
still need to figure out the best approach to supporting this feature. 
   
   As you mentioned in the description, `MERGE INTO` is a query-engine 
feature. PyIceberg itself is a client library supporting the Iceberg Python 
ecosystem: it aims to provide the necessary Iceberg building blocks so 
that other engines and programs can interact with Iceberg tables easily. 
   
   As we build out more and more engine-like features, it becomes harder 
to support complex, data-intensive workloads such as MERGE INTO. We 
have been able to use pyarrow for query processing, but it has its own 
limitations. For more compute-intensive workloads, such as the Bucket and 
Truncate transforms, we were able to leverage Rust (iceberg-rust) to handle the 
computation.
   
   Looking at #402, I don’t see any concrete plans for how we can support MERGE 
INTO. I’ve added this as an agenda item for the [monthly pyiceberg 
sync](https://docs.google.com/document/d/1oMKodaZJrOJjPfc8PDVAoTdl02eGQKHlhwuggiw7s9U/edit?tab=t.0#heading=h.rxx2wa3o215y)
 and will post the update. Please join us if you have time! 
   
   2. Taking on Datafusion as a dependency
   
   I’m very interested in exploring DataFusion and ways we can leverage it for 
this project. As I mentioned above, we currently use pyarrow to handle most of 
the compute, and it will be interesting to evaluate DataFusion as an alternative. 
DataFusion has its own ecosystem of expression API, dataframe API, and runtime, 
all of which are good complements to pyiceberg. It also has integrations on the 
Rust side, something I have started exploring in 
https://github.com/apache/iceberg-rust/issues/865
   
   That said, I think we need a wider discussion and alignment on how to 
integrate with DataFusion. It’s a good time to start thinking about it! I’ve 
added this as another discussion item for the monthly sync. 
   
   3. Performance concerns
   
   Compute-intensive workloads are generally a bottleneck in Python. I am 
excited for the future pyiceberg <> iceberg-rust integration, where we can 
leverage Rust to perform those computations. 
   
   > The composite key code builds an overwrite filter, and once that filter 
gets too lengthy (in my testing more than 200 rows), the visitor “OR” function 
in pyiceberg hits a recursion depth error.
   
   This is an interesting observation, and I think I’ve seen someone else run 
into this issue before; we’d want to address it separately. This is something 
we might want to explore: using DataFusion’s expression API to replace our own 
parser.
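For context on that recursion error: representing `a OR b OR c OR ...` as a left-deep chain makes a recursive visitor use one stack frame per row, while a pairwise (balanced) reduction keeps the depth logarithmic in the row count. A toy sketch with plain tuples (not pyiceberg's expression classes):

```python
import functools
import itertools

def or_(a, b):
    # Toy OR node; pyiceberg uses real expression classes instead.
    return ("or", a, b)

def visit_depth(expr):
    """A naive recursive visitor, like the one the comment describes failing."""
    if isinstance(expr, tuple) and expr[0] == "or":
        return 1 + max(visit_depth(expr[1]), visit_depth(expr[2]))
    return 0

def chain_or(terms):
    """Left-deep chain: tree depth grows linearly with the number of terms."""
    return functools.reduce(or_, terms)

def balanced_or(terms):
    """Pairwise reduction: tree depth grows logarithmically, recursion-safe."""
    terms = list(terms)
    while len(terms) > 1:
        it = iter(terms)
        terms = [or_(a, b) if b is not None else a
                 for a, b in itertools.zip_longest(it, it)]
    return terms[0]

rows = [f"key={i}" for i in range(200)]
print(visit_depth(chain_or(rows)))     # 199: one frame per row
print(visit_depth(balanced_or(rows)))  # 8: log2(200) rounded up
```

With a few thousand rows, the left-deep chain exceeds Python's default recursion limit while the balanced tree stays shallow, which matches the ~200-row failure threshold reported in the PR description.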
   





Re: [PR] Docs: Location Provider Documentation [iceberg-python]

2025-01-19 Thread via GitHub


kevinjqliu commented on code in PR #1537:
URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921620305


##
mkdocs/docs/configuration.md:
##
@@ -195,6 +198,86 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya
 
 
 
+## Location Providers
+
+Iceberg works with the concept of a LocationProvider that determines file 
paths for a table's data. PyIceberg

Review Comment:
   ```suggestion
   Apache Iceberg uses the concept of a `LocationProvider` to manage file paths 
for a table's data. In PyIceberg, the `LocationProvider` module is designed to 
be pluggable, allowing customization for specific use cases. The 
`LocationProvider` for a table can be specified through table properties.
   
   PyIceberg defaults to the 
[ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider),
   which generates file paths that are optimized for object storage
   ```



##
mkdocs/docs/configuration.md:
##
@@ -54,15 +54,18 @@ Iceberg tables support table properties to configure table behavior.
 
 ### Write options
 
-| Key                                    | Options                           | Default | Description                                                                                 |
-| -------------------------------------- | --------------------------------- | ------- | ------------------------------------------------------------------------------------------- |
-| `write.parquet.compression-codec`      | `{uncompressed,zstd,gzip,snappy}` | zstd    | Sets the Parquet compression codec.                                                         |
-| `write.parquet.compression-level`      | Integer                           | null    | Parquet compression level for the codec. If not set, it is up to PyIceberg                  |
-| `write.parquet.row-group-limit`        | Number of rows                    | 1048576 | The upper bound of the number of entries within a single row group                          |
-| `write.parquet.page-size-bytes`        | Size in bytes                     | 1MB     | Set a target threshold for the approximate encoded size of data pages within a column chunk |
-| `write.parquet.page-row-limit`         | Number of rows                    | 2       | Set a target threshold for the maximum number of rows within a column chunk                 |
-| `write.parquet.dict-size-bytes`        | Size in bytes                     | 2MB     | Set the dictionary page size limit per row group                                            |
-| `write.metadata.previous-versions-max` | Integer                           | 100     | The max number of previous version metadata files to keep before deleting after commit.     |
+| Key                                    | Options                           | Default | Description                                                                                 |
+|----------------------------------------|-----------------------------------|---------|---------------------------------------------------------------------------------------------|
+| `write.parquet.compression-codec`      | `{uncompressed,zstd,gzip,snappy}` | zstd    | Sets the Parquet compression codec.                                                         |
+| `write.parquet.compression-level`      | Integer                           | null    | Parquet compression level for the codec. If not set, it is up to PyIceberg                  |
+| `write.parquet.row-group-limit`        | Number of rows                    | 1048576 | The upper bound of the number of entries within a single row group                          |
+| `write.parquet.page-size-bytes`        | Size in bytes                     | 1MB     | Set a target threshold for the approximate encoded size of data pages within a column chunk |
+| `write.parquet.page-row-limit`         | Number of rows                    | 2       | Set a target threshold for the maximum number of rows within a column chunk                 |
+| `write.parquet.dict-size-bytes`        | Size in bytes                     | 2MB     | Set the dictionary page size limit per row group                                            |
+| `write.metadata.previous-versions-max` | Integer                           | 100     | The max number of previous version metadata files to keep before deleting after commit.     |
+| `write.object-storage.enabled`         | Boolean                           | True    | Enables the [ObjectStoreLocationProvider](configuration.md#objectsto

Re: [PR] Refactor to write APIs to default to `main` branch [iceberg-python]

2025-01-19 Thread via GitHub


kevinjqliu closed pull request #312: Refactor to write APIs to default to 
`main` branch 
URL: https://github.com/apache/iceberg-python/pull/312





[PR] feat(catalog): Have Load use "type" property and "name" for config [iceberg-go]

2025-01-19 Thread via GitHub


zeroshade opened a new pull request, #260:
URL: https://github.com/apache/iceberg-go/pull/260

   As brought up in 
https://github.com/apache/iceberg-go/pull/244#discussion_r1911257805, this PR 
implements using a "type" property when loading catalogs and looking up catalog 
configurations by the provided "name", falling back to the URI scheme only when 
necessary.
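The lookup order described above can be sketched as follows (Python pseudocode for illustration; `CATALOG_CONFIGS` and `SCHEME_FALLBACK` are hypothetical names, not iceberg-go's actual API):

```python
from urllib.parse import urlparse

# Hypothetical config store keyed by catalog name, standing in for the
# configuration that Load resolves by the provided "name".
CATALOG_CONFIGS = {
    "prod": {"type": "rest", "uri": "https://rest.example.com"},
    "legacy": {"uri": "thrift://hive-metastore:9083"},  # no explicit type
}

# Hypothetical scheme-to-catalog mapping used only as a fallback.
SCHEME_FALLBACK = {"https": "rest", "http": "rest", "thrift": "hive"}

def resolve_catalog_type(name: str) -> str:
    config = CATALOG_CONFIGS[name]
    # Prefer the explicit "type" property...
    if "type" in config:
        return config["type"]
    # ...and only fall back to the URI scheme when it is absent.
    return SCHEME_FALLBACK[urlparse(config["uri"]).scheme]

print(resolve_catalog_type("prod"))    # "rest", from the explicit property
print(resolve_catalog_type("legacy"))  # "hive", from the scheme fallback
```

Making the explicit property win keeps catalogs whose URI scheme is ambiguous (e.g. `https` serving either REST or another HTTP-based catalog) configurable without guesswork.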





Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2 from 1.32.8 to 1.33.0 [iceberg-go]

2025-01-19 Thread via GitHub


zeroshade merged PR #259:
URL: https://github.com/apache/iceberg-go/pull/259





Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.72.2 to 1.73.2 [iceberg-go]

2025-01-19 Thread via GitHub


zeroshade merged PR #255:
URL: https://github.com/apache/iceberg-go/pull/255





Re: [PR] build(deps): bump google.golang.org/api from 0.216.0 to 0.217.0 [iceberg-go]

2025-01-19 Thread via GitHub


zeroshade merged PR #257:
URL: https://github.com/apache/iceberg-go/pull/257





Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/service/glue from 1.105.1 to 1.105.3 [iceberg-go]

2025-01-19 Thread via GitHub


zeroshade merged PR #258:
URL: https://github.com/apache/iceberg-go/pull/258





Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.28.11 to 1.29.1 [iceberg-go]

2025-01-19 Thread via GitHub


zeroshade merged PR #256:
URL: https://github.com/apache/iceberg-go/pull/256





Re: [PR] Add Python version 3.13 to test matrix. [iceberg-python]

2025-01-19 Thread via GitHub


kevinjqliu commented on PR #1377:
URL: https://github.com/apache/iceberg-python/pull/1377#issuecomment-2601016186

   Blocked on Ray supporting Python 3.13: https://github.com/ray-project/ray/issues/49738
   
   We can run `poetry update` afterwards. 





[PR] Build: Nightly build for Iceberg REST fixtures [iceberg]

2025-01-19 Thread via GitHub


Fokko opened a new pull request, #12008:
URL: https://github.com/apache/iceberg/pull/12008

   While trying to downstream V3 into PyIceberg/Iceberg-Rust/etc, I think it 
would be good to rebuild the REST fixtures every night.





Re: [PR] Build: Nightly build for Iceberg REST fixtures [iceberg]

2025-01-19 Thread via GitHub


kevinjqliu commented on code in PR #12008:
URL: https://github.com/apache/iceberg/pull/12008#discussion_r1921636817


##
.github/workflows/publish-iceberg-rest-fixture-docker.yml:
##
@@ -20,9 +20,8 @@
 name: Build and Push 'iceberg-rest-fixture' Docker Image
 
 on:
-  push:
-tags:
-  - 'apache-iceberg-[0-9]+.[0-9]+.[0-9]+'

Review Comment:
   let's keep both, we still need the tag for 
   
https://github.com/apache/iceberg/pull/12008/files#diff-24cba6867a8ec1ac782a5dbfb5ec71f84ed7377564afeada5661647f2b480879L47-L51
   
   This way we have a tag for each release and another for `latest` 
   






[PR] [infra] regenerate poetry lock [iceberg-python]

2025-01-19 Thread via GitHub


kevinjqliu opened a new pull request, #1538:
URL: https://github.com/apache/iceberg-python/pull/1538

   Since we bumped Poetry to `2.0.1` in #1525, we have not regenerated the 
poetry lock file. It looks like the new Poetry adds a lot of additional 
information to the lock file, so let's regenerate it on `main`.
   
   This PR runs `poetry lock` on a clean install
   ```
   pip uninstall poetry
   make install
   poetry lock
   ```





Re: [PR] Build: Bump org.assertj:assertj-core from 3.27.2 to 3.27.3 [iceberg]

2025-01-19 Thread via GitHub


Fokko merged PR #12002:
URL: https://github.com/apache/iceberg/pull/12002





Re: [PR] Build: Bump org.xerial:sqlite-jdbc from 3.47.2.0 to 3.48.0.0 [iceberg]

2025-01-19 Thread via GitHub


Fokko merged PR #12001:
URL: https://github.com/apache/iceberg/pull/12001





Re: [PR] Build: Bump io.netty:netty-buffer from 4.1.116.Final to 4.1.117.Final [iceberg]

2025-01-19 Thread via GitHub


Fokko merged PR #11999:
URL: https://github.com/apache/iceberg/pull/11999





Re: [PR] Build: Bump org.apache.datasketches:datasketches-java from 6.1.1 to 6.2.0 [iceberg]

2025-01-19 Thread via GitHub


Fokko merged PR #12000:
URL: https://github.com/apache/iceberg/pull/12000





Re: [PR] [infra] regenerate poetry lock [iceberg-python]

2025-01-19 Thread via GitHub


Fokko merged PR #1538:
URL: https://github.com/apache/iceberg-python/pull/1538





Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.52.0 to 26.53.0 [iceberg]

2025-01-19 Thread via GitHub


Fokko merged PR #12003:
URL: https://github.com/apache/iceberg/pull/12003





Re: [PR] Build: Nighly build for Iceberg REST fixtures [iceberg]

2025-01-19 Thread via GitHub


kevinjqliu commented on code in PR #12008:
URL: https://github.com/apache/iceberg/pull/12008#discussion_r1921636978


##
.github/workflows/publish-iceberg-rest-fixture-docker.yml:
##
@@ -20,9 +20,8 @@
 name: Build and Push 'iceberg-rest-fixture' Docker Image
 
 on:
-  push:
-tags:
-  - 'apache-iceberg-[0-9]+.[0-9]+.[0-9]+'

Review Comment:
   we don't have any tagged releases yet, since this code was added after the last 
tag: https://github.com/apache/iceberg/tags 






Re: [PR] Docs: Location Provider Documentation [iceberg-python]

2025-01-19 Thread via GitHub


smaheshwar-pltr commented on code in PR #1537:
URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921651044


##
mkdocs/docs/configuration.md:
##
@@ -54,15 +54,18 @@ Iceberg tables support table properties to configure table 
behavior.
 
 ### Write options
 
-| Key| Options   | 
Default | Description   
  |
-| -- | - | 
--- | 
---
 |
-| `write.parquet.compression-codec`  | `{uncompressed,zstd,gzip,snappy}` | 
zstd| Sets the Parquet compression coddec.  
  |
-| `write.parquet.compression-level`  | Integer   | 
null| Parquet compression level for the codec. If not set, it is up to 
PyIceberg  |
-| `write.parquet.row-group-limit`| Number of rows| 
1048576 | The upper bound of the number of entries within a single row group
  |
-| `write.parquet.page-size-bytes`| Size in bytes | 
1MB | Set a target threshold for the approximate encoded size of data pages 
within a column chunk |
-| `write.parquet.page-row-limit` | Number of rows| 
2   | Set a target threshold for the maximum number of rows within a column 
chunk |
-| `write.parquet.dict-size-bytes`| Size in bytes | 
2MB | Set the dictionary page size limit per row group  
  |
-| `write.metadata.previous-versions-max` | Integer   | 
100 | The max number of previous version metadata files to keep before 
deleting after commit. |
+| Key  | Options   
| Default | Description 
|
+|--|---|-|-|
+| `write.parquet.compression-codec`| `{uncompressed,zstd,gzip,snappy}` 
| zstd| Sets the Parquet compression coddec.
|
+| `write.parquet.compression-level`| Integer   
| null| Parquet compression level for the codec. If not set, it is up to 
PyIceberg  |
+| `write.parquet.row-group-limit`  | Number of rows
| 1048576 | The upper bound of the number of entries within a single row group  
|
+| `write.parquet.page-size-bytes`  | Size in bytes 
| 1MB | Set a target threshold for the approximate encoded size of data 
pages within a column chunk |
+| `write.parquet.page-row-limit`   | Number of rows
| 2   | Set a target threshold for the maximum number of rows within a 
column chunk |
+| `write.parquet.dict-size-bytes`  | Size in bytes 
| 2MB | Set the dictionary page size limit per row group
|
+| `write.metadata.previous-versions-max`   | Integer   
| 100 | The max number of previous version metadata files to keep before 
deleting after commit. |
+| `write.object-storage.enabled`   | Boolean   
| True| Enables the 
[ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider) 
that adds a hash component to file paths|
+| `write.object-storage.partitioned-paths` | Boolean   
| True| Controls whether [partition values are included in file 
paths](configuration.md#partition-exclusion) when object storage is enabled |
+| `write.py-location-provider.impl`| String of form `module.ClassName` 
| null| Optional, [custom 
LocationProvider](configuration.md#loading-a-custom-locationprovider) 
implementation  |

Review Comment:
   https://github.com/user-attachments/assets/ff4a70f9-ad5d-4d78-a665-5bd7b6283c7c";
 />

   Don't love how this looks. I prefer what it is now:
   
   https://github.com/user-attachments/assets/d1789914-b8d6-449d-8700-4ebee432f5ef";
 />
   
   I've changed the section li

Re: [PR] Docs: Location Provider Documentation [iceberg-python]

2025-01-19 Thread via GitHub


smaheshwar-pltr commented on code in PR #1537:
URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921651470


##
mkdocs/docs/configuration.md:
##
@@ -195,6 +198,86 @@ PyIceberg uses 
[S3FileSystem](https://arrow.apache.org/docs/python/generated/pya
 
 
 
+## Location Providers
+
+Iceberg works with the concept of a LocationProvider that determines file 
paths for a table's data. PyIceberg
+introduces a pluggable LocationProvider module; the LocationProvider used may 
be specified on a per-table basis via
+table properties. PyIceberg defaults to the 
[ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider),
+which generates file paths that are optimized for object storage.
+
+### SimpleLocationProvider
+
+The SimpleLocationProvider places file names underneath a `data` directory in 
the table's storage location. For example,
+a non-partitioned table might have a data file with location:
+
+```txt
+s3://bucket/ns/table/data/-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-1.parquet
+```
+
+When data is partitioned, files under a given partition are grouped into a 
subdirectory, with that partition key

Review Comment:
   Hopefully 
[this](https://github.com/apache/iceberg-python/pull/1537/commits/76f397b35abaa1555ede59ad5c5a4fce8c5f1374#diff-497e037708cc64870c6ba9372f6064a69ca1e74d65d6195dcee5a44851e8b47dR221)
 and 
[this](https://github.com/apache/iceberg-python/pull/1537/commits/76f397b35abaa1555ede59ad5c5a4fce8c5f1374#diff-497e037708cc64870c6ba9372f6064a69ca1e74d65d6195dcee5a44851e8b47dR241)
 are what you meant






Re: [PR] Docs: Location Provider Documentation [iceberg-python]

2025-01-19 Thread via GitHub


smaheshwar-pltr commented on code in PR #1537:
URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921651630


##
mkdocs/docs/configuration.md:
##
@@ -195,6 +198,86 @@ PyIceberg uses 
[S3FileSystem](https://arrow.apache.org/docs/python/generated/pya
 
 
 
+## Location Providers
+
+Iceberg works with the concept of a LocationProvider that determines file 
paths for a table's data. PyIceberg
+introduces a pluggable LocationProvider module; the LocationProvider used may 
be specified on a per-table basis via
+table properties. PyIceberg defaults to the 
[ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider),
+which generates file paths that are optimized for object storage.
+
+### SimpleLocationProvider
+
+The SimpleLocationProvider places file names underneath a `data` directory in 
the table's storage location. For example,
+a non-partitioned table might have a data file with location:
+
+```txt
+s3://bucket/ns/table/data/-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-1.parquet
+```
+
+When data is partitioned, files under a given partition are grouped into a 
subdirectory, with that partition key
+and value as the directory name. For example, a table partitioned over a 
string column `category` might have a data file
+with location:
+
+```txt
+s3://bucket/ns/table/data/category=orders/-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-1.parquet
+```
+
+The SimpleLocationProvider is enabled for a table by explicitly setting its 
`write.object-storage.enabled` table
+property to `False`.
+
+### ObjectStoreLocationProvider
+
+When several files are stored under the same prefix, cloud object stores such 
as S3 often [throttle requests on 
prefixes](https://repost.aws/knowledge-center/http-5xx-errors-s3),
+resulting in slowdowns.
+
+The ObjectStoreLocationProvider counteracts this by injecting deterministic 
hashes, in the form of binary directories,
+into file paths, to distribute files across a larger number of object store 
prefixes.
+
+Paths contain partitions just before the file name and a `data` directory 
beneath the table's location, in a similar
+manner to the 
[SimpleLocationProvider](configuration.md#simplelocationprovider). For example, 
a table partitioned over a string
+column `category` might have a data file with location: (note the additional 
binary directories)
+
+```txt
+s3://bucket/ns/table/data/0101/0110/1001/10110010/category=orders/-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-1.parquet
+```
+
+The `write.object-storage.enabled` table property determines whether the 
ObjectStoreLocationProvider is enabled for a
+table. It is used by default.
+
+ Partition Exclusion
+
+When the ObjectStoreLocationProvider is used, the table property 
`write.object-storage.partitioned-paths`, which
+defaults to `True`, can be set to `False` as an additional optimization for 
object stores. This omits partition keys and
+values from data file paths *entirely* to further reduce key size. With it 
disabled, the same data file above would
+instead be written to: (note the absence of `category=orders`)
+
+```txt
+s3://bucket/ns/table/data/1101/0100/1011/00111010-0-0-5affc076-96a4-48f2-9cd2-d5efbc9f0c94-1.parquet
+```

Review Comment:
   I have the False case just above ("the same data file above" here) - or do 
you mean making that more explicit?






Re: [PR] Docs: Location Provider Documentation [iceberg-python]

2025-01-19 Thread via GitHub


smaheshwar-pltr commented on code in PR #1537:
URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921652049


##
mkdocs/docs/configuration.md:
##
@@ -195,6 +198,86 @@ PyIceberg uses 
[S3FileSystem](https://arrow.apache.org/docs/python/generated/pya
 
 
 
+## Location Providers
+
+Iceberg works with the concept of a LocationProvider that determines file 
paths for a table's data. PyIceberg

Review Comment:
   I've changed to backticks around `LocationProvider` and its implementations 
throughout. I keep them as Location Provider (e.g. Object Store Location 
Provider, without backticks) in section headings though for readability (to not 
have code-like terms in headings).






Re: [PR] Docs: Location Provider Documentation [iceberg-python]

2025-01-19 Thread via GitHub


smaheshwar-pltr commented on code in PR #1537:
URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921652324


##
mkdocs/docs/configuration.md:
##
@@ -54,15 +54,18 @@ Iceberg tables support table properties to configure table 
behavior.
 
 ### Write options
 
-| Key| Options   | 
Default | Description   
  |
-| -- | - | 
--- | 
---
 |
-| `write.parquet.compression-codec`  | `{uncompressed,zstd,gzip,snappy}` | 
zstd| Sets the Parquet compression coddec.  
  |
-| `write.parquet.compression-level`  | Integer   | 
null| Parquet compression level for the codec. If not set, it is up to 
PyIceberg  |
-| `write.parquet.row-group-limit`| Number of rows| 
1048576 | The upper bound of the number of entries within a single row group
  |
-| `write.parquet.page-size-bytes`| Size in bytes | 
1MB | Set a target threshold for the approximate encoded size of data pages 
within a column chunk |
-| `write.parquet.page-row-limit` | Number of rows| 
2   | Set a target threshold for the maximum number of rows within a column 
chunk |
-| `write.parquet.dict-size-bytes`| Size in bytes | 
2MB | Set the dictionary page size limit per row group  
  |
-| `write.metadata.previous-versions-max` | Integer   | 
100 | The max number of previous version metadata files to keep before 
deleting after commit. |
+| Key  | Options   
| Default | Description 
|
+|--|---|-|-|
+| `write.parquet.compression-codec`| `{uncompressed,zstd,gzip,snappy}` 
| zstd| Sets the Parquet compression coddec.
|
+| `write.parquet.compression-level`| Integer   
| null| Parquet compression level for the codec. If not set, it is up to 
PyIceberg  |
+| `write.parquet.row-group-limit`  | Number of rows
| 1048576 | The upper bound of the number of entries within a single row group  
|
+| `write.parquet.page-size-bytes`  | Size in bytes 
| 1MB | Set a target threshold for the approximate encoded size of data 
pages within a column chunk |
+| `write.parquet.page-row-limit`   | Number of rows
| 2   | Set a target threshold for the maximum number of rows within a 
column chunk |
+| `write.parquet.dict-size-bytes`  | Size in bytes 
| 2MB | Set the dictionary page size limit per row group
|
+| `write.metadata.previous-versions-max`   | Integer   
| 100 | The max number of previous version metadata files to keep before 
deleting after commit. |
+| `write.object-storage.enabled`   | Boolean   
| True| Enables the 
[ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider) 
that adds a hash component to file paths|
+| `write.object-storage.partitioned-paths` | Boolean   
| True| Controls whether [partition values are included in file 
paths](configuration.md#partition-exclusion) when object storage is enabled |
+| `write.py-location-provider.impl`| String of form `module.ClassName` 
| null| Optional, [custom 
LocationProvider](configuration.md#loading-a-custom-locationprovider) 
implementation  |

Review Comment:
   (The above screenshot also shows how code/backtick hyperlinks look; I think 
they're fine. This is now relevant because of 
https://github.com/apache/iceberg-python/pull/1537#discussion_r1921652049.)




Re: [PR] Docs: Location Provider Documentation [iceberg-python]

2025-01-19 Thread via GitHub


smaheshwar-pltr commented on code in PR #1537:
URL: https://github.com/apache/iceberg-python/pull/1537#discussion_r1921652515


##
mkdocs/docs/configuration.md:
##
@@ -54,15 +54,18 @@ Iceberg tables support table properties to configure table 
behavior.
 
 ### Write options
 
-| Key| Options   | 
Default | Description   
  |
-| -- | - | 
--- | 
---
 |
-| `write.parquet.compression-codec`  | `{uncompressed,zstd,gzip,snappy}` | 
zstd| Sets the Parquet compression coddec.  
  |
-| `write.parquet.compression-level`  | Integer   | 
null| Parquet compression level for the codec. If not set, it is up to 
PyIceberg  |
-| `write.parquet.row-group-limit`| Number of rows| 
1048576 | The upper bound of the number of entries within a single row group
  |
-| `write.parquet.page-size-bytes`| Size in bytes | 
1MB | Set a target threshold for the approximate encoded size of data pages 
within a column chunk |
-| `write.parquet.page-row-limit` | Number of rows| 
2   | Set a target threshold for the maximum number of rows within a column 
chunk |
-| `write.parquet.dict-size-bytes`| Size in bytes | 
2MB | Set the dictionary page size limit per row group  
  |
-| `write.metadata.previous-versions-max` | Integer   | 
100 | The max number of previous version metadata files to keep before 
deleting after commit. |
+| Key  | Options   
| Default | Description 
|
+|--|---|-|-|
+| `write.parquet.compression-codec`| `{uncompressed,zstd,gzip,snappy}` 
| zstd| Sets the Parquet compression coddec.
|
+| `write.parquet.compression-level`| Integer   
| null| Parquet compression level for the codec. If not set, it is up to 
PyIceberg  |
+| `write.parquet.row-group-limit`  | Number of rows
| 1048576 | The upper bound of the number of entries within a single row group  
|
+| `write.parquet.page-size-bytes`  | Size in bytes 
| 1MB | Set a target threshold for the approximate encoded size of data 
pages within a column chunk |
+| `write.parquet.page-row-limit`   | Number of rows
| 2   | Set a target threshold for the maximum number of rows within a 
column chunk |
+| `write.parquet.dict-size-bytes`  | Size in bytes 
| 2MB | Set the dictionary page size limit per row group
|
+| `write.metadata.previous-versions-max`   | Integer   
| 100 | The max number of previous version metadata files to keep before 
deleting after commit. |
+| `write.object-storage.enabled`   | Boolean   
| True| Enables the 
[ObjectStoreLocationProvider](configuration.md#objectstorelocationprovider) 
that adds a hash component to file paths|
+| `write.object-storage.partitioned-paths` | Boolean   
| True| Controls whether [partition values are included in file 
paths](configuration.md#partition-exclusion) when object storage is enabled |

Review Comment:
   I've checked all hyperlinks on the current version and they work as intended




Re: [PR] Core: List namespaces/tables when testing identifier with a dot [iceberg]

2025-01-19 Thread via GitHub


smaheshwar-pltr commented on code in PR #11991:
URL: https://github.com/apache/iceberg/pull/11991#discussion_r1921682719


##
open-api/src/test/java/org/apache/iceberg/rest/RESTCompatibilityKitCatalogTests.java:
##
@@ -90,4 +90,10 @@ protected boolean overridesRequestedLocation() {
 RESTCompatibilityKitSuite.RCK_OVERRIDES_REQUESTED_LOCATION,
 false);
   }
+
+  @Override
+  protected boolean supportsNamesWithDot() {
+// underlying JDBC catalog doesn't support namespaces with a dot
+return false;

Review Comment:
   (Semi-nit) The REST catalog that people test with the RCK *could* support 
namespaces with a dot though, right? Should we maybe make this configurable via
   
   ```java
   return PropertyUtil.propertyAsBoolean(
   restCatalog.properties(),
   RESTCompatibilityKitSuite.RCK_SUPPORTS_NAMES_WITH_DOT,
   false);
   ```
   
   like elsewhere in this class?






[I] Flaky test `TestRewritePositionDeleteFilesAction` initializationError [iceberg]

2025-01-19 Thread via GitHub


manuzhang opened a new issue, #12009:
URL: https://github.com/apache/iceberg/issues/12009

   ### Apache Iceberg version
   
   main (development)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   https://github.com/apache/iceberg/actions/runs/12857234112/job/35845760586
   ```
   TestRewritePositionDeleteFilesAction > initializationError FAILED
   java.io.IOException: Failed to bind to 0.0.0.0/0.0.0.0:44503
   at 
org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:344)
   at 
org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:304)
   at org.eclipse.jetty.server.Server.lambda$doStart$0(Server.java:402)
   at 
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
   at 
java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
   at 
java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
   at 
java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
   at 
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
   at 
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
   at 
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
   at 
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
   at 
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   at 
java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
   at org.eclipse.jetty.server.Server.doStart(Server.java:398)
   at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
   at 
org.apache.iceberg.rest.RESTCatalogServer.start(RESTCatalogServer.java:116)
   at 
org.apache.iceberg.rest.RESTServerExtension.beforeAll(RESTServerExtension.java:62)
   at 
org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllCallbacks$13(ClassBasedTestDescriptor.java:396)
   at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   at 
org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.invokeBeforeAllCallbacks(ClassBasedTestDescriptor.java:396)
   at 
org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:212)
   at 
org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:85)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:153)
   at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:146)
   at 
org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:144)
   at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:143)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:100)
   at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
   at 
org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:160)
   at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:146)
   at 
org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:144)
   at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:143)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:100)
   at 
org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
   at 
org.junit.platform.engine.
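   Fixed-port collisions like the `Failed to bind` error above are commonly avoided in test fixtures by binding to port 0, letting the OS pick a free ephemeral port. A generic sketch of the pattern, not the change made (or necessarily needed) in Iceberg's `RESTCatalogServer`:

   ```python
   import socket

   def pick_free_port() -> int:
       # Bind to port 0 so the OS assigns an unused ephemeral port; the
       # returned port number can then be passed to the server under test.
       with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
           s.bind(("127.0.0.1", 0))
           return s.getsockname()[1]
   ```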

Re: [PR] Core: List namespaces/tables when testing identifier with a dot [iceberg]

2025-01-19 Thread via GitHub


nastra commented on code in PR #11991:
URL: https://github.com/apache/iceberg/pull/11991#discussion_r1921923905


##
open-api/src/test/java/org/apache/iceberg/rest/RESTCompatibilityKitCatalogTests.java:
##
@@ -90,4 +90,10 @@ protected boolean overridesRequestedLocation() {
 RESTCompatibilityKitSuite.RCK_OVERRIDES_REQUESTED_LOCATION,
 false);
   }
+
+  @Override
+  protected boolean supportsNamesWithDot() {
+// underlying JDBC catalog doesn't support namespaces with a dot
+return false;

Review Comment:
   fair point, done






Re: [PR] Spark: Don't skip tests in TestSelect for SparkSessionCatalog [iceberg]

2025-01-19 Thread via GitHub


nastra merged PR #11824:
URL: https://github.com/apache/iceberg/pull/11824





Re: [PR] Docs: add apache amoro(incubating) with iceberg (#11965) [iceberg]

2025-01-19 Thread via GitHub


nastra commented on code in PR #11966:
URL: https://github.com/apache/iceberg/pull/11966#discussion_r1921916823


##
docs/docs/amoro.md:
##
@@ -0,0 +1,90 @@
+---
+title: "Apache Amoro"
+---
+
+
+# Apache Amoro With Iceberg
+
+**[Apache Amoro(incubating)](https://amoro.apache.org/docs/latest/)** is a 
Lakehouse management system built on open data lake formats. Working with 
compute engines including Flink, Spark, and Trino, Amoro brings pluggable and
+**[Table Maintenance](https://amoro.apache.org/docs/latest/self-optimizing/)** 
features for Lakehouse to provide out-of-the-box data warehouse experience, and 
helps data platforms or products easily build infra-decoupled, 
stream-and-batch-fused and lake-native architecture.
+
+
+# Auto Self-optimizing
+
+Lakehouse is characterized by its openness and loose coupling, with data and 
files maintained by users through various engines. While this
+architecture appears to be well-suited for T+1 scenarios, as more attention is 
paid to applying Lakehouse to streaming data warehouses and real-time
+analysis scenarios, challenges arise. For example:
+
+- Streaming writes bring a massive amount of fragment files
+- CDC ingestion and streaming updates generate excessive redundant data
+- Using the new data lake format leads to orphan files and expired snapshots.
+
+These issues can significantly affect the performance and cost of data 
analysis. Therefore, Amoro has introduced a Self-optimizing mechanism to
+create an out-of-the-box Streaming Lakehouse management service that is as 
user-friendly as a traditional database or data warehouse. The new table
+format is used for this purpose. Self-optimizing involves various procedures 
such as file compaction, deduplication, and sorting.
+
+The architecture and working mechanism of Self-optimizing are shown in the 
figure below:
+
+![Self-optimizing 
architecture](https://github.com/apache/amoro/blob/master/docs/images/concepts/self-optimizing_arch.png)
+
+The Optimizer is a component responsible for executing Self-optimizing tasks. 
It is a resident process managed by AMS. AMS is responsible for

Review Comment:
   I don't see a mention in the text that AMS stands for Amoro Meta Store. 
Could you please spell it out at the first mention of AMS?






Re: [PR] Docs: add apache amoro(incubating) with iceberg (#11965) [iceberg]

2025-01-19 Thread via GitHub


czy006 commented on code in PR #11966:
URL: https://github.com/apache/iceberg/pull/11966#discussion_r1921936611


##
docs/docs/amoro.md:
##
@@ -0,0 +1,90 @@
+---
+title: "Apache Amoro"
+---
+
+
+# Apache Amoro With Iceberg
+
+**[Apache Amoro(incubating)](https://amoro.apache.org/docs/latest/)** is a 
Lakehouse management system built on open data lake formats. Working with 
compute engines including Flink, Spark, and Trino, Amoro brings pluggable and
+**[Table Maintenance](https://amoro.apache.org/docs/latest/self-optimizing/)** 
features for Lakehouse to provide out-of-the-box data warehouse experience, and 
helps data platforms or products easily build infra-decoupled, 
stream-and-batch-fused and lake-native architecture.
+
+
+# Auto Self-optimizing
+
+Lakehouse is characterized by its openness and loose coupling, with data and 
files maintained by users through various engines. While this
+architecture appears to be well-suited for T+1 scenarios, as more attention is 
paid to applying Lakehouse to streaming data warehouses and real-time
+analysis scenarios, challenges arise. For example:
+
+- Streaming writes bring a massive amount of fragment files
+- CDC ingestion and streaming updates generate excessive redundant data
+- Using the new data lake format leads to orphan files and expired snapshots.
+
+These issues can significantly affect the performance and cost of data 
analysis. Therefore, Amoro has introduced a Self-optimizing mechanism to
+create an out-of-the-box Streaming Lakehouse management service that is as 
user-friendly as a traditional database or data warehouse. The new table
+format is used for this purpose. Self-optimizing involves various procedures 
such as file compaction, deduplication, and sorting.
+
+The architecture and working mechanism of Self-optimizing are shown in the 
figure below:
+
+![Self-optimizing 
architecture](https://github.com/apache/amoro/blob/master/docs/images/concepts/self-optimizing_arch.png)
+
+The Optimizer is a component responsible for executing Self-optimizing tasks. 
It is a resident process managed by AMS. AMS is responsible for

Review Comment:
   already added






Re: [PR] Build: Nighly build for Iceberg REST fixtures [iceberg]

2025-01-19 Thread via GitHub


Fokko commented on code in PR #12008:
URL: https://github.com/apache/iceberg/pull/12008#discussion_r1921943161


##
.github/workflows/publish-iceberg-rest-fixture-docker.yml:
##
@@ -20,9 +20,8 @@
 name: Build and Push 'iceberg-rest-fixture' Docker Image
 
 on:
-  push:
-tags:
-  - 'apache-iceberg-[0-9]+.[0-9]+.[0-9]+'

Review Comment:
   Ah, good catch!






Re: [PR] Build: Nighly build for Iceberg REST fixtures [iceberg]

2025-01-19 Thread via GitHub


Fokko commented on code in PR #12008:
URL: https://github.com/apache/iceberg/pull/12008#discussion_r1921943661


##
.github/workflows/publish-iceberg-rest-fixture-docker.yml:
##
@@ -20,9 +20,8 @@
 name: Build and Push 'iceberg-rest-fixture' Docker Image
 
 on:
-  push:
-tags:
-  - 'apache-iceberg-[0-9]+.[0-9]+.[0-9]+'
+  schedule:
+- cron: '0 2 * * *' # run at 2 AM UTC

Review Comment:
   ```suggestion
 push:
   tags:
 - 'apache-iceberg-[0-9]+.[0-9]+.[0-9]+'
 schedule:
   - cron: '0 2 * * *' # run at 2 AM UTC
   ```






Re: [I] Flaky Spark tests due to initializationError [iceberg]

2025-01-19 Thread via GitHub


manuzhang commented on issue #12009:
URL: https://github.com/apache/iceberg/issues/12009#issuecomment-2601264016

   Failure from another test 
https://github.com/apache/iceberg/actions/runs/12838153809/job/35803224974#step:7:3982
   
   ```
   TestMigrateTableAction > initializationError FAILED
   java.io.IOException: Failed to bind to 0.0.0.0/0.0.0.0:35439
   at 
org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:344)
   at 
org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:304)
   at org.eclipse.jetty.server.Server.lambda$doStart$0(Server.java:402)
   at 
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
   at 
java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
   at 
java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
   at 
java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
   at 
java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
   at 
java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
   at 
java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
   at 
java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
   at 
java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   at 
java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
   at org.eclipse.jetty.server.Server.doStart(Server.java:398)
   at 
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
   at 
org.apache.iceberg.rest.RESTCatalogServer.start(RESTCatalogServer.java:116)
   at 
org.apache.iceberg.rest.RESTServerExtension.beforeAll(RESTServerExtension.java:62)
   at 
org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllCallbacks$13(ClassBasedTestDescriptor.java:396)
   at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   at 
org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.invokeBeforeAllCallbacks(ClassBasedTestDescriptor.java:396)
   at 
org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:212)
   at 
org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.before(ClassBasedTestDescriptor.java:85)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:153)
   at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:146)
   at 
org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:144)
   at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:143)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:100)
   at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
   at 
org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:160)
   at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:146)
   at 
org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:144)
   at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:143)
   at 
org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:100)
   at 
org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
   at 
org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.ja

Re: [PR] WIP: Add headers for type/field/schema [iceberg-cpp]

2025-01-19 Thread via GitHub


lidavidm commented on PR #31:
URL: https://github.com/apache/iceberg-cpp/pull/31#issuecomment-2601202575

   Just to make sure @gaborkaszab @wgtmac: are we ok with the Arrow-style type 
representation here? (Types are represented by a class hierarchy, erased behind 
smart pointers; nested types store the fields in the type object, not the field 
object)
   
   The alternatives are:
   - cuDF style: type objects are minimal, just a type ID, and there is no 
hierarchy, just the base `DataType` class which is not type-erased and 
therefore there is no need for a smart pointer. As a trade-off, nested fields 
have to be extracted from the field and the base DataType has to have fields 
for all possible parameterized types
   - arrow-java style: there is still a hierarchy of type objects, but child 
fields are stored on the Field, not the DataType. Avoids a conceptual 
dependency cycle between DataType and Field
   - variant style: like arrow-java, but instead of type erasure and a type 
hierarchy (or conversely, like cuDF but avoids redundant fields), we have a 
`std::variant` with all possibilities. Avoids boxing but the object is larger 
(in e.g. a vector)





Re: [PR] WIP: Add headers for type/field/schema [iceberg-cpp]

2025-01-19 Thread via GitHub


lidavidm commented on PR #31:
URL: https://github.com/apache/iceberg-cpp/pull/31#issuecomment-2601203055

   Arrow-style does let you do a bunch of compile-time metaprogramming (e.g. 
see arrow::TypeTraits)





Re: [PR] chore(deps): Bump arrow-schema from 53.3.0 to 53.4.0 [iceberg-rust]

2025-01-19 Thread via GitHub


Xuanwo merged PR #900:
URL: https://github.com/apache/iceberg-rust/pull/900





Re: [PR] chore(deps): Bump opendal from 0.51.0 to 0.51.1 [iceberg-rust]

2025-01-19 Thread via GitHub


Xuanwo commented on PR #898:
URL: https://github.com/apache/iceberg-rust/pull/898#issuecomment-2601205731

   Thank you @kevinjqliu for reviewing this!





Re: [PR] chore(deps): Bump opendal from 0.51.0 to 0.51.1 [iceberg-rust]

2025-01-19 Thread via GitHub


Xuanwo merged PR #898:
URL: https://github.com/apache/iceberg-rust/pull/898





Re: [PR] Spark: Don't skip tests in TestSelect for SparkSessionCatalog [iceberg]

2025-01-19 Thread via GitHub


manuzhang commented on PR #11824:
URL: https://github.com/apache/iceberg/pull/11824#issuecomment-2601217390

   @nastra @Fokko please help review this PR.





Re: [PR] chore(deps): Bump async-trait from 0.1.84 to 0.1.85 [iceberg-rust]

2025-01-19 Thread via GitHub


Xuanwo merged PR #897:
URL: https://github.com/apache/iceberg-rust/pull/897





Re: [PR] chore(deps): Bump aws-sdk-s3tables from 1.3.0 to 1.4.0 [iceberg-rust]

2025-01-19 Thread via GitHub


Xuanwo merged PR #899:
URL: https://github.com/apache/iceberg-rust/pull/899





Re: [I] Delete orphan files [iceberg-python]

2025-01-19 Thread via GitHub


omkenge commented on issue #1200:
URL: 
https://github.com/apache/iceberg-python/issues/1200#issuecomment-2601440980

   Hello @ndrluis,
   I think #1285 is now merged. Can I start working on this issue?





Re: [PR] chore(deps): Bump arrow-arith from 53.3.0 to 53.4.0 [iceberg-rust]

2025-01-19 Thread via GitHub


liurenjie1024 commented on PR #901:
URL: https://github.com/apache/iceberg-rust/pull/901#issuecomment-2601470708

   Should we skip this upgrade, since it requires upgrading the MSRV?





Re: [PR] test: Introduce datafusion engine for executing sqllogictest. [iceberg-rust]

2025-01-19 Thread via GitHub


liurenjie1024 commented on PR #895:
URL: https://github.com/apache/iceberg-rust/pull/895#issuecomment-2601473466

   > This follows the [datafusion/datafusion/sqllogictest](https://github.com/apache/datafusion/blob/e9a77e0ea3e30b7f2718c9cea1fed023dca1f646/datafusion/sqllogictest/src/engines/datafusion_engine/runner.rs#L122-L124) test suite.
   > Do you think it's worth mentioning that [conversion.rs](https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/src/engines/conversion.rs#L26) and [normalize.rs](https://github.com/apache/datafusion/blob/e9a77e0ea3e30b7f2718c9cea1fed023dca1f646/datafusion/sqllogictest/src/engines/datafusion_engine/normalize.rs) are both copied over from datafusion/datafusion/sqllogictest?
   
   I've synced with @Fokko and I'll add some notice in LICENSE as what we did 
in 
[pyiceberg](https://github.com/apache/iceberg-python/blob/f0346472e4301f2ea3679e0793bb8623f2bb80f1/LICENSE#L206).
   
   > Is it possible to take 
[datafusion_sqllogictest](https://github.com/apache/datafusion/blob/e9a77e0ea3e30b7f2718c9cea1fed023dca1f646/datafusion/sqllogictest/Cargo.toml#L33)
 as a dependency instead of copying over the code?
   
   Currently impossible because it's not published as crates.





Re: [PR] feat: support scan nested type(struct, map, list) [iceberg-rust]

2025-01-19 Thread via GitHub


ZENOTME commented on PR #882:
URL: https://github.com/apache/iceberg-rust/pull/882#issuecomment-2601477027

   > I think it's a step moving forward, but I think this pr didn't handle 
nested struct type well, see https://github.com/apache/iceberg-rust/issues/405
   
Hi @liurenjie1024, could you elaborate on which part this PR misses? This PR is 
not intended to complete #405; it only supports nested types, not projection of 
nested fields within structs. 





Re: [PR] feat: support scan nested type(struct, map, list) [iceberg-rust]

2025-01-19 Thread via GitHub


ZENOTME commented on code in PR #882:
URL: https://github.com/apache/iceberg-rust/pull/882#discussion_r1921861153


##
crates/iceberg/src/arrow/schema.rs:
##
@@ -43,7 +43,9 @@ use crate::spec::{
 use crate::{Error, ErrorKind};
 
 /// When iceberg map type convert to Arrow map type, the default map field 
name is "key_value".
-pub(crate) const DEFAULT_MAP_FIELD_NAME: &str = "key_value";
+pub const DEFAULT_MAP_FIELD_NAME: &str = "key_value";

Review Comment:
   The reason we need to make them public is to construct the metadata of the 
record batch being written, as in: 
https://github.com/apache/iceberg-rust/blob/c44311aa88c505ce8ddce22bed7448a77213e563/crates/integration_tests/tests/scan_all_type.rs#L277.
 






Re: [PR] feat: support scan nested type(struct, map, list) [iceberg-rust]

2025-01-19 Thread via GitHub


ZENOTME commented on code in PR #882:
URL: https://github.com/apache/iceberg-rust/pull/882#discussion_r1921861930


##
crates/iceberg/src/spec/datatypes.rs:
##
@@ -226,8 +228,10 @@ pub enum PrimitiveType {
 /// Timestamp in microsecond precision, with timezone
 Timestamptz,
 /// Timestamp in nanosecond precision, without timezone
+#[serde(rename = "timestamp_ns")]

Review Comment:
   
[scan_all_type.rs](https://github.com/apache/iceberg-rust/pull/882/files/c44311aa88c505ce8ddce22bed7448a77213e563#diff-c15373e1fa9c9e75b55e5e7e6677b5bb2595974062520e1ae3325205c65802ee)
 found this bug, and I fixed it here. I can separate the fix out of this PR.


