Re: [I] Cherrypick the data rows [deleted or old values] from a past snapshot [iceberg]

2025-02-14 Thread via GitHub
Shekharrajak commented on issue #12271: URL: https://github.com/apache/iceberg/issues/12271#issuecomment-2660797404 Thanks @RussellSpitzer for sharing! re-adding files will update the manifest and data will be query-able ? Can you please share your solution or APIs ? -- This is an automa

Re: [I] Enhance iceberg-go to Support Nessie API for All Catalog Operations [iceberg-go]

2025-02-14 Thread via GitHub
shubham-tomar commented on issue #291: URL: https://github.com/apache/iceberg-go/issues/291#issuecomment-2660766077 Hi @zeroshade, i am facing issue while creating table This is how i am loading catalog ``` config.URI := "http://localhost:19120/iceberg"; config.WarehouseLocati

Re: [PR] [WIP] Ignore UnknownType in General Parquet Writer [iceberg]

2025-02-14 Thread via GitHub
HonahX commented on code in PR #12177: URL: https://github.com/apache/iceberg/pull/12177#discussion_r1956840017 ## parquet/src/main/java/org/apache/iceberg/parquet/TypeToMessageType.java: ## @@ -56,6 +56,10 @@ public class TypeToMessageType { LogicalTypeAnnotation.timesta

Re: [I] [feature] Add all column projection logic [iceberg-python]

2025-02-14 Thread via GitHub
gabeiglio commented on issue #1636: URL: https://github.com/apache/iceberg-python/issues/1636#issuecomment-2660597246 Work on default-values here `[PR](https://github.com/apache/iceberg-python/pull/1644)` -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [I] Discussion: support append files [iceberg-go]

2025-02-14 Thread via GitHub
zeroshade commented on issue #287: URL: https://github.com/apache/iceberg-go/issues/287#issuecomment-2660576094 This is definitely on our roadmap, any assistance would be appreciated. Currently I'm not able to give a direct ETA on this being available yet -- This is an automated message f

Re: [I] Trying to create a table [iceberg-go]

2025-02-14 Thread via GitHub
zeroshade commented on issue #302: URL: https://github.com/apache/iceberg-go/issues/302#issuecomment-2660583912 Hmm, looks like `blobFileIO` is missing `WriteFile(name string, p []byte) error`. I'll fix this tomorrow; simple oversight. -- This is an automated message from the Apache G

Re: [PR] Materialized View Spec [iceberg]

2025-02-14 Thread via GitHub
bennychow commented on code in PR #11041: URL: https://github.com/apache/iceberg/pull/11041#discussion_r1956953050 ## format/view-spec.md: ## @@ -160,6 +179,56 @@ Each entry in `version-log` is a struct with the following fields: | _required_ | `timestamp-ms` | Timestamp when

Re: [PR] Added description of CLI usage in README [iceberg-go]

2025-02-14 Thread via GitHub
zeroshade commented on code in PR #301: URL: https://github.com/apache/iceberg-go/pull/301#discussion_r1956952796 ## README.md: ## @@ -82,6 +82,42 @@ $ cd iceberg-go/cmd/iceberg && go build . * Plan to add [Apache Arrow](https://pkg.go.dev/github.com/apache/arrow-go/) support

Re: [PR] Core: Try create Iceberg metadata table for Jdbc catalog in initialization [iceberg]

2025-02-14 Thread via GitHub
github-actions[bot] closed pull request #11427: Core: Try create Iceberg metadata table for Jdbc catalog in initialization URL: https://github.com/apache/iceberg/pull/11427 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [I] FlinkSchemaUtil.toSchema should return Schema or ResolvedSchema instead of deprecated TableSchema [iceberg]

2025-02-14 Thread via GitHub
github-actions[bot] commented on issue #10950: URL: https://github.com/apache/iceberg/issues/10950#issuecomment-2660553581 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [PR] Spark: Relativize in-memory paths for data file and rewritable delete file locations [iceberg]

2025-02-14 Thread via GitHub
github-actions[bot] commented on PR #11525: URL: https://github.com/apache/iceberg/pull/11525#issuecomment-2660553629 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Core: Try create Iceberg metadata table for Jdbc catalog in initialization [iceberg]

2025-02-14 Thread via GitHub
github-actions[bot] commented on PR #11427: URL: https://github.com/apache/iceberg/pull/11427#issuecomment-2660553607 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [PR] Spark: Relativize in-memory paths for data file and rewritable delete file locations [iceberg]

2025-02-14 Thread via GitHub
github-actions[bot] closed pull request #11525: Spark: Relativize in-memory paths for data file and rewritable delete file locations URL: https://github.com/apache/iceberg/pull/11525 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

Re: [PR] Spark 3.5: Support RewriteManifestsProcedure with a target size parameter [iceberg]

2025-02-14 Thread via GitHub
github-actions[bot] commented on PR #11959: URL: https://github.com/apache/iceberg/pull/11959#issuecomment-2660553689 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

[I] Problems using rewriteTablePath action on local filesystem tables [iceberg]

2025-02-14 Thread via GitHub
sfc-gh-sozer opened a new issue, #12277: URL: https://github.com/apache/iceberg/issues/12277 ### Apache Iceberg version 1.8.0 (latest release) ### Query engine Spark ### Please describe the bug 🐞 I appreciate the new rewriteTablePath method to support migrat

Re: [PR] List data and metadata directories instead of table root [iceberg]

2025-02-14 Thread via GitHub
karuppayya commented on PR #12278: URL: https://github.com/apache/iceberg/pull/12278#issuecomment-2660448261 cc: @aokolnychyi @RussellSpitzer @szehon-ho @anuragmantri @dramaticlly -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Scan metrics [iceberg]

2025-02-14 Thread via GitHub
eshishki closed pull request #12276: Scan metrics URL: https://github.com/apache/iceberg/pull/12276 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issu

Re: [PR] List data and metadata directories instead of table root [iceberg]

2025-02-14 Thread via GitHub
karuppayya commented on PR #12278: URL: https://github.com/apache/iceberg/pull/12278#issuecomment-2660441981 This change doesn't solve the following cases: 1. Tables that have the same location root 2. Tables whose root is the data or metadata directory of a different table

[PR] List data and metadata directories instead of table root [iceberg]

2025-02-14 Thread via GitHub
karuppayya opened a new pull request, #12278: URL: https://github.com/apache/iceberg/pull/12278 ### Issue Tables t1 and t2 use a common prefix for their table location `/path/to/shared_root`. t1 has its data and metadata dir -> `/path/to/shared_root/t1` t2 has its data and metad

Re: [PR] AWS, AZURE: Move docker-based tests to integration test source [iceberg]

2025-02-14 Thread via GitHub
anuragmantri commented on PR #12274: URL: https://github.com/apache/iceberg/pull/12274#issuecomment-2660435251 The moved classes include `TestS3FileIO` (see https://github.com/apache/iceberg/issues/12237) and `ADLSFileIOTest`. These seem like unit tests that need to run with `gradle test`.

[PR] Scan metrics [iceberg]

2025-02-14 Thread via GitHub
eshishki opened a new pull request, #12276: URL: https://github.com/apache/iceberg/pull/12276 spark-sql ()> SET spark.sql.cli.print.header=true; key value spark.sql.cli.print.header true Time taken: 0.783 seconds, Fetched 1 row(s) spark-sql ()> call my_catalog.system.scan_

[PR] Spark-3.5: Add unit tests for ColumnarBatchUtil [iceberg]

2025-02-14 Thread via GitHub
anuragmantri opened a new pull request, #12275: URL: https://github.com/apache/iceberg/pull/12275 Fixes: https://github.com/apache/iceberg/issues/12054 `ColumnarBatchUtil` class was added as part of delete logic refactor https://github.com/apache/iceberg/pull/11933. This PR adds unit

Re: [PR] Spark-3.5: Add unit tests for ColumnarBatchUtil [iceberg]

2025-02-14 Thread via GitHub
anuragmantri commented on code in PR #12275: URL: https://github.com/apache/iceberg/pull/12275#discussion_r1956798756 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/data/vectorized/TestColumnarBatchUtil.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Softwa

Re: [I] [feat] add missing metadata tables [iceberg-python]

2025-02-14 Thread via GitHub
soumya-ghosh commented on issue #1053: URL: https://github.com/apache/iceberg-python/issues/1053#issuecomment-2660391908 @kevinjqliu awaiting your thoughts on above 4 comments. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[I] Trying to create a table [iceberg-go]

2025-02-14 Thread via GitHub
ebrotz opened a new issue, #302: URL: https://github.com/apache/iceberg-go/issues/302 ### Question I'm trying to create a table on a local MinIO instance and I'm just getting a `filesystem IO does not support writing` I'm using a postgres catalog and already have some entries in ther

Re: [PR] AWS, AZURE: Move docker-based tests to integration test source [iceberg]

2025-02-14 Thread via GitHub
anuragmantri commented on code in PR #12274: URL: https://github.com/apache/iceberg/pull/12274#discussion_r1956788320 ## .baseline/checkstyle/checkstyle-suppressions.xml: ## @@ -23,16 +23,16 @@ for your changes to take effect in its Checkstyle integration. --> - -

[PR] AWS, AZURE: Move docker-based tests to integration test source [iceberg]

2025-02-14 Thread via GitHub
anuragmantri opened a new pull request, #12274: URL: https://github.com/apache/iceberg/pull/12274 Fixes: https://github.com/apache/iceberg/issues/12236 During the testing of new releases, some community members observed test failures for certain Docker-based tests. I have encountered

Re: [PR] AWS, AZURE: Move docker-based tests to integration test source [iceberg]

2025-02-14 Thread via GitHub
anuragmantri commented on code in PR #12274: URL: https://github.com/apache/iceberg/pull/12274#discussion_r1956753811 ## baseline.gradle: ## @@ -78,7 +78,7 @@ subprojects { tasks.withType(JavaCompile).configureEach { options.errorprone.errorproneArgs.addAll (

[I] Partition spec mismatch when 'compatibility.snapshot-id-inheritance.enabled' is true [iceberg]

2025-02-14 Thread via GitHub
sfc-gh-yijli opened a new issue, #12273: URL: https://github.com/apache/iceberg/issues/12273 ### Apache Iceberg version None ### Query engine Spark ### Please describe the bug 🐞 The behavior of `add_files` procedure in Spark is affected by table property `c

Re: [I] [feature request] Support reading equality delete files [iceberg-python]

2025-02-14 Thread via GitHub
sfc-gh-mrojas commented on issue #1210: URL: https://github.com/apache/iceberg-python/issues/1210#issuecomment-2660215518 @Zyiqin-Miranda is there any progress on supporting equality deletes in pyiceberg ? -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Support reading initial-defaults [iceberg-python]

2025-02-14 Thread via GitHub
gabeiglio commented on PR #1644: URL: https://github.com/apache/iceberg-python/pull/1644#issuecomment-2660194994 No problem! I was trying to get a test case for this by evolving the schema of a table and adding a new field with some initial-default value, but i think we have to wait for V3

Re: [PR] Docs: Add rewrite-table-path in spark procedure [iceberg]

2025-02-14 Thread via GitHub
szehon-ho commented on code in PR #12115: URL: https://github.com/apache/iceberg/pull/12115#discussion_r1956628329 ## docs/docs/spark-procedures.md: ## @@ -972,4 +972,100 @@ CALL catalog_name.system.compute_table_stats(table => 'my_table', snapshot_id => Collect statistics of

Re: [PR] Support Remove Branch or Tag APIs [iceberg-python]

2025-02-14 Thread via GitHub
Fokko closed pull request #822: Support Remove Branch or Tag APIs URL: https://github.com/apache/iceberg-python/pull/822 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

Re: [PR] Implement update for `remove-snapshot-ref` action [iceberg-python]

2025-02-14 Thread via GitHub
Fokko merged PR #1598: URL: https://github.com/apache/iceberg-python/pull/1598 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceber

Re: [PR] Implement update for `remove-snapshot-ref` action [iceberg-python]

2025-02-14 Thread via GitHub
Fokko commented on PR #1598: URL: https://github.com/apache/iceberg-python/pull/1598#issuecomment-2660130311 @grihabor Sorry for not following up here, thanks for adding this, let's get this in 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, plea

Re: [PR] Support reading initial-defaults [iceberg-python]

2025-02-14 Thread via GitHub
Fokko commented on PR #1644: URL: https://github.com/apache/iceberg-python/pull/1644#issuecomment-2660123523 > Since initial-default projection happens after filtering in _task_to_record_batches Im wondering if this will yield the correct results given a pyarrow_filter for this field.

Re: [PR] Fix: `SqlCatalog` list_namespaces() should return only sub-namespaces [iceberg-python]

2025-02-14 Thread via GitHub
Fokko commented on code in PR #1629: URL: https://github.com/apache/iceberg-python/pull/1629#discussion_r1956631975 ## tests/catalog/test_sql.py: ## @@ -1117,17 +1117,30 @@ def test_create_namespace_with_empty_identifier(catalog: SqlCatalog, empty_names lazy_fixture("c

Re: [PR] Spark: Remove closing of IO in SerializableTable* [iceberg]

2025-02-14 Thread via GitHub
mgmarino commented on PR #12129: URL: https://github.com/apache/iceberg/pull/12129#issuecomment-2660113119 Ok, I actually found some things to help me out, documenting for later (can't look at this just right now). There's a test that explicitly removes data from the cache of the bloc

Re: [PR] Fix: `SqlCatalog` list_namespaces() should return only sub-namespaces [iceberg-python]

2025-02-14 Thread via GitHub
Fokko commented on code in PR #1629: URL: https://github.com/apache/iceberg-python/pull/1629#discussion_r1956607961 ## pyiceberg/catalog/sql.py: ## @@ -610,15 +610,26 @@ def list_namespaces(self, namespace: Union[str, Identifier] = ()) -> List[Identi table_stmt = sele

Re: [PR] Fix: `SqlCatalog` list_namespaces() should return only sub-namespaces [iceberg-python]

2025-02-14 Thread via GitHub
Fokko commented on code in PR #1629: URL: https://github.com/apache/iceberg-python/pull/1629#discussion_r1956606924 ## pyiceberg/catalog/sql.py: ## @@ -610,15 +610,26 @@ def list_namespaces(self, namespace: Union[str, Identifier] = ()) -> List[Identi table_stmt = sele

Re: [PR] Support `wasb://` and `wasbs://` [iceberg-python]

2025-02-14 Thread via GitHub
Fokko commented on PR #1663: URL: https://github.com/apache/iceberg-python/pull/1663#issuecomment-2660069707 There is also an open issue on the `adlfs` side: https://github.com/fsspec/adlfs/issues/403 Regarding https://github.com/fsspec/adlfs/pull/493, is the protocol identical? --

Re: [PR] S3: Disable strong integrity checksums [iceberg]

2025-02-14 Thread via GitHub
Fokko commented on code in PR #12264: URL: https://github.com/apache/iceberg/pull/12264#discussion_r1956601588 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3RequestUtil.java: ## @@ -149,4 +151,10 @@ static void configurePermission( Function aclSetter) { aclSetter.

Re: [I] Link LEARN MORE vom https://iceberg.apache.org/about/ runs into Not Found [iceberg]

2025-02-14 Thread via GitHub
RussellSpitzer closed issue #12265: Link LEARN MORE vom https://iceberg.apache.org/about/ runs into Not Found URL: https://github.com/apache/iceberg/issues/12265 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] Minor: update Learn More to point to spark quickstart [iceberg]

2025-02-14 Thread via GitHub
RussellSpitzer commented on PR #12272: URL: https://github.com/apache/iceberg/pull/12272#issuecomment-2660033318 Thank you @danicafine and @Fokko for review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] Minor: update Learn More to point to spark quickstart [iceberg]

2025-02-14 Thread via GitHub
RussellSpitzer commented on code in PR #12272: URL: https://github.com/apache/iceberg/pull/12272#discussion_r1956581520 ## site/docs/about.md: ## @@ -22,7 +22,7 @@ Iceberg is a high-performance format for huge analytic tables. Iceberg brings th - + Re

Re: [PR] Minor: update Learn More to point to spark quickstart [iceberg]

2025-02-14 Thread via GitHub
RussellSpitzer merged PR #12272: URL: https://github.com/apache/iceberg/pull/12272 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ic

Re: [PR] Minor: update Learn More to point to spark quickstart [iceberg]

2025-02-14 Thread via GitHub
Fokko commented on code in PR #12272: URL: https://github.com/apache/iceberg/pull/12272#discussion_r1956565398 ## site/docs/about.md: ## @@ -22,7 +22,7 @@ Iceberg is a high-performance format for huge analytic tables. Iceberg brings th - + Review Comm

Re: [PR] Spark: Remove closing of IO in SerializableTable* [iceberg]

2025-02-14 Thread via GitHub
mgmarino commented on PR #12129: URL: https://github.com/apache/iceberg/pull/12129#issuecomment-2660002476 Hi @Fokko, yes, it could be that the underlying issue is the same (i.e. Spark moving the SerializedTable to disk and closing the IO that is still in use). I will see if I can f

Re: [PR] Spark: Remove closing of IO in SerializableTable* [iceberg]

2025-02-14 Thread via GitHub
Fokko commented on PR #12129: URL: https://github.com/apache/iceberg/pull/12129#issuecomment-2659989552 Thanks for raising this @mgmarino. I think this is related to another issue I fixed recently: https://github.com/apache/iceberg/pull/11858 Would it be possible to add a test to illu

Re: [PR] Minor: update Learn More to point to spark quickstart [iceberg]

2025-02-14 Thread via GitHub
Fokko commented on code in PR #12272: URL: https://github.com/apache/iceberg/pull/12272#discussion_r1956553063 ## site/docs/about.md: ## @@ -22,7 +22,7 @@ Iceberg is a high-performance format for huge analytic tables. Iceberg brings th - + Review Comm

Re: [PR] Spark: Structured Streaming read limit support follow-up [iceberg]

2025-02-14 Thread via GitHub
wypoon commented on PR #12260: URL: https://github.com/apache/iceberg/pull/12260#issuecomment-2659965202 Thanks @singhpk234. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Spark: Structured Streaming read limit support follow-up [iceberg]

2025-02-14 Thread via GitHub
wypoon commented on code in PR #12260: URL: https://github.com/apache/iceberg/pull/12260#discussion_r1956540735 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java: ## @@ -309,6 +312,49 @@ private static StreamingOffset determineStarting

Re: [PR] Spark: Remove closing of IO in SerializableTable* [iceberg]

2025-02-14 Thread via GitHub
mgmarino commented on PR #12129: URL: https://github.com/apache/iceberg/pull/12129#issuecomment-2659943896 @nastra should I ask in the dev mailing list to try for feedback? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [I] [feature] Add support for `write.data.path` and `write.metadata.path` [iceberg-python]

2025-02-14 Thread via GitHub
kevinjqliu closed issue #1492: [feature] Add support for `write.data.path` and `write.metadata.path` URL: https://github.com/apache/iceberg-python/issues/1492 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [I] [feature] Add support for `write.data.path` and `write.metadata.path` [iceberg-python]

2025-02-14 Thread via GitHub
kevinjqliu commented on issue #1492: URL: https://github.com/apache/iceberg-python/issues/1492#issuecomment-2659907096 All done! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] Add support for `write.metadata.path` [iceberg-python]

2025-02-14 Thread via GitHub
kevinjqliu commented on PR #1642: URL: https://github.com/apache/iceberg-python/pull/1642#issuecomment-2659906743 Thanks @geruh for the contribution! And thanks @smaheshwar-pltr @Fokko for the review! -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] CORE: return false when view exists endpoint isn't supported [iceberg]

2025-02-14 Thread via GitHub
danielcweeks commented on code in PR #12259: URL: https://github.com/apache/iceberg/pull/12259#discussion_r1956504678 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -1239,7 +1239,9 @@ public List listViews(SessionContext context, Namespace namespa

Re: [PR] Add support for `write.metadata.path` [iceberg-python]

2025-02-14 Thread via GitHub
kevinjqliu merged PR #1642: URL: https://github.com/apache/iceberg-python/pull/1642 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@i

Re: [PR] Core: use ReachableFileCleanup when table has discontinuous snapshots [iceberg]

2025-02-14 Thread via GitHub
MavsLee commented on PR #12261: URL: https://github.com/apache/iceberg/pull/12261#issuecomment-2659899475 cc @amogh-jahagirdar @flyrain @RussellSpitzer pls help review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] Upsert: Reuse existing expression to detect rows to be inserted [iceberg-python]

2025-02-14 Thread via GitHub
kevinjqliu merged PR #1662: URL: https://github.com/apache/iceberg-python/pull/1662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@i

Re: [PR] Add support for `write.metadata.path` [iceberg-python]

2025-02-14 Thread via GitHub
kevinjqliu commented on PR #1642: URL: https://github.com/apache/iceberg-python/pull/1642#issuecomment-2659867627 Thanks for the great suggestions, i added them and double check the docs rendering ![Screenshot 2025-02-14 at 9 14 56 AM](https://github.com/user-attachments/assets/ecdeb

Re: [PR] Add support for `write.metadata.path` [iceberg-python]

2025-02-14 Thread via GitHub
kevinjqliu commented on code in PR #1642: URL: https://github.com/apache/iceberg-python/pull/1642#discussion_r1956481300 ## mkdocs/docs/configuration.md: ## @@ -203,12 +204,16 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya ## Location Pro

[PR] Minor: update Learn More to point to spark quickstart [iceberg]

2025-02-14 Thread via GitHub
danicafine opened a new pull request, #12272: URL: https://github.com/apache/iceberg/pull/12272 Fixes #12265 Button now points to /spark-quickstart (same as 'Learn More' on the homepage). https://github.com/user-attachments/assets/f1448df0-7890-4689-875d-b40d0e68a7e4

Re: [PR] OpenAPI: Add overwrite option when registering an iceberg table [iceberg]

2025-02-14 Thread via GitHub
nastra commented on code in PR #12239: URL: https://github.com/apache/iceberg/pull/12239#discussion_r1956389984 ## open-api/rest-catalog-open-api.yaml: ## @@ -3463,6 +3463,10 @@ components: type: string metadata-location: type: string +over

Re: [I] Snowflake managed Open Catalog and Azure ADLS2 [iceberg-python]

2025-02-14 Thread via GitHub
christophediprima commented on issue #1606: URL: https://github.com/apache/iceberg-python/issues/1606#issuecomment-2659726962 I opened two PR: https://github.com/apache/iceberg-python/pull/1663 https://github.com/fsspec/adlfs/pull/493 -- This is an automated message from the Ap

Re: [I] Add properties support for HadoopTables.load() [iceberg]

2025-02-14 Thread via GitHub
RussellSpitzer commented on issue #12251: URL: https://github.com/apache/iceberg/issues/12251#issuecomment-2659693667 I'm very interested to know how manifest caching is helping you, To my knowledge it's generally disabled for just about everyone. Do you have a long lived processes that jus

Re: [PR] Spark: Rewrite V2 deletes to V3 DVs [iceberg]

2025-02-14 Thread via GitHub
nastra commented on code in PR #12250: URL: https://github.com/apache/iceberg/pull/12250#discussion_r1956228490 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeletesRewrite.java: ## @@ -213,45 +217,61 @@ static class PositionDeletesWriterFactory

Re: [PR] Spark: Rewrite V2 deletes to V3 DVs [iceberg]

2025-02-14 Thread via GitHub
nastra commented on code in PR #12250: URL: https://github.com/apache/iceberg/pull/12250#discussion_r1956220825 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewritePositionDeleteFilesSparkAction.java: ## @@ -404,8 +408,31 @@ private void validateAndInitOpti

Re: [PR] implement a new iceberg data type: protected_type [iceberg-python]

2025-02-14 Thread via GitHub
yigal-rozenberg closed pull request #1594: implement a new iceberg data type: protected_type URL: https://github.com/apache/iceberg-python/pull/1594 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] implement a new iceberg data type: protected_type [iceberg-python]

2025-02-14 Thread via GitHub
yigal-rozenberg commented on PR #1594: URL: https://github.com/apache/iceberg-python/pull/1594#issuecomment-2659379279 I agree, this merge request is premature, and might be required down the road once we tie things together with Apache Parquet and Arrow projects. the Iceberg-Java interface

Re: [PR] Add support for `write.metadata.path` [iceberg-python]

2025-02-14 Thread via GitHub
smaheshwar-pltr commented on code in PR #1642: URL: https://github.com/apache/iceberg-python/pull/1642#discussion_r1956168492 ## pyiceberg/table/locations.py: ## @@ -64,6 +71,35 @@ def new_data_location(self, data_file_name: str, partition_key: Optional[Partiti str

Re: [PR] Add support for `write.metadata.path` [iceberg-python]

2025-02-14 Thread via GitHub
smaheshwar-pltr commented on code in PR #1642: URL: https://github.com/apache/iceberg-python/pull/1642#discussion_r1956148860 ## mkdocs/docs/configuration.md: ## @@ -203,12 +204,16 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya ## Locatio

Re: [PR] Add support for `write.metadata.path` [iceberg-python]

2025-02-14 Thread via GitHub
smaheshwar-pltr commented on code in PR #1642: URL: https://github.com/apache/iceberg-python/pull/1642#discussion_r1956153130 ## mkdocs/docs/configuration.md: ## @@ -203,12 +204,16 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya ## Locatio

[I] Cherrypick the data rows [deleted or old values] from a past snapshot [iceberg]

2025-02-14 Thread via GitHub
Shekharrajak opened a new issue, #12271: URL: https://github.com/apache/iceberg/issues/12271 ### Feature Request / Improvement Hello team, Is there any way to pick the specific partition or data rows from the old snapshots to main snapshot ? Example: When we del

Re: [PR] OpenAPI: Add RemoveSchemas REST update type [iceberg]

2025-02-14 Thread via GitHub
amogh-jahagirdar merged PR #12022: URL: https://github.com/apache/iceberg/pull/12022 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@

Re: [PR] OpenAPI: Add RemoveSchemas REST update type [iceberg]

2025-02-14 Thread via GitHub
amogh-jahagirdar commented on PR #12022: URL: https://github.com/apache/iceberg/pull/12022#issuecomment-2659113436 Thanks @gaborkaszab , thanks @advancedxy @rdblue @flyrain @nastra @stevenzwu for reviewing! I'll go ahead and merge since the vote passed -- This is an automated message from

Re: [PR] Add support for `write.metadata.path` [iceberg-python]

2025-02-14 Thread via GitHub
Fokko commented on code in PR #1642: URL: https://github.com/apache/iceberg-python/pull/1642#discussion_r1956008322 ## mkdocs/docs/configuration.md: ## @@ -203,12 +204,16 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya ## Location Provider

Re: [PR] Bump to Iceberg Java 1.8.0 [iceberg-python]

2025-02-14 Thread via GitHub
Fokko merged PR #1633: URL: https://github.com/apache/iceberg-python/pull/1633 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceber

Re: [PR] Core: Bulk deletion in RemoveSnapshots [iceberg]

2025-02-14 Thread via GitHub
gaborkaszab commented on code in PR #11837: URL: https://github.com/apache/iceberg/pull/11837#discussion_r1955942576 ## core/src/main/java/org/apache/iceberg/BulkDeleteConsumer.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] Apply residuals when reading a table [iceberg-python]

2025-02-14 Thread via GitHub
Fokko commented on code in PR #1654: URL: https://github.com/apache/iceberg-python/pull/1654#discussion_r1955971167 ## pyiceberg/io/pyarrow.py: ## @@ -1342,9 +1342,8 @@ def _get_column_projection_values( def _task_to_record_batches( fs: FileSystem, task: FileScanTask,

Re: [I] Add properties support for HadoopTables.load() [iceberg]

2025-02-14 Thread via GitHub
qqchang2nd commented on issue #12251: URL: https://github.com/apache/iceberg/issues/12251#issuecomment-2658960764 You're right that HadoopTables isn't a catalog - it's a lower-level implementation for managing Iceberg tables directly on HDFS without a catalog. Let me explain our use c

Re: [PR] Docker: Pin QEMU version temporarily [iceberg]

2025-02-14 Thread via GitHub
nastra merged PR #12262: URL: https://github.com/apache/iceberg/pull/12262 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.ap

[I] After WRITE ORDERED BY is set in the Iceberg table, Spark will generate an additional layer of Job tasks and no shuffle is generated. Is this a bug? [iceberg]

2025-02-14 Thread via GitHub
SGITLOGIN opened a new issue, #12268: URL: https://github.com/apache/iceberg/issues/12268 ### Apache Iceberg version 1.6.1 ### Query engine Spark ### Please describe the bug 🐞 ### Create table spark.sql(""" CREATE TABLE test.iceberg_impression_log_05 (

Re: [PR] API: Deprecate NestedType.of in favor of builder [iceberg]

2025-02-14 Thread via GitHub
nastra merged PR #12227: URL: https://github.com/apache/iceberg/pull/12227 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.ap

Re: [PR] API: Deprecate NestedType.of in favor of builder [iceberg]

2025-02-14 Thread via GitHub
nastra commented on PR #12227: URL: https://github.com/apache/iceberg/pull/12227#issuecomment-2658878696 LGTM, thanks @rdblue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] Core: Bulk deletion in RemoveSnapshots [iceberg]

2025-02-14 Thread via GitHub
gaborkaszab commented on code in PR #11837: URL: https://github.com/apache/iceberg/pull/11837#discussion_r1955843549 ## core/src/main/java/org/apache/iceberg/BulkDeleteConsumer.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] Core: Bulk deletion in RemoveSnapshots [iceberg]

2025-02-14 Thread via GitHub
gaborkaszab commented on PR #11837: URL: https://github.com/apache/iceberg/pull/11837#issuecomment-2658767328 Thanks for taking a look @pvary, @amogh-jahagirdar and @steveloughran! I'll be offline for a couple of days but will take a deeper look after that. Another thing we discussed

Re: [PR] Core: Bulk deletion in RemoveSnapshots [iceberg]

2025-02-14 Thread via GitHub
gaborkaszab commented on code in PR #11837: URL: https://github.com/apache/iceberg/pull/11837#discussion_r1955828374 ## core/src/main/java/org/apache/iceberg/BulkDeleteConsumer.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] Core: add variant builder implementation [iceberg]

2025-02-14 Thread via GitHub
XBaith commented on code in PR #11857: URL: https://github.com/apache/iceberg/pull/11857#discussion_r1955818227 ## core/src/main/java/org/apache/iceberg/variants/VariantBuilderBase.java: ## @@ -0,0 +1,424 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] Core: Bulk deletion in RemoveSnapshots [iceberg]

2025-02-14 Thread via GitHub
gaborkaszab commented on code in PR #11837: URL: https://github.com/apache/iceberg/pull/11837#discussion_r1955819540 ## core/src/main/java/org/apache/iceberg/BulkDeleteConsumer.java: ## @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] update component version [iceberg]

2025-02-14 Thread via GitHub
qixian-jiajia closed pull request #12267: update component version URL: https://github.com/apache/iceberg/pull/12267 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscr

Re: [PR] update component version [iceberg]

2025-02-14 Thread via GitHub
qixian-jiajia commented on PR #12267: URL: https://github.com/apache/iceberg/pull/12267#issuecomment-2658690918 agree -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

[PR] update component version [iceberg]

2025-02-14 Thread via GitHub
qixian-jiajia opened a new pull request, #12267: URL: https://github.com/apache/iceberg/pull/12267 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscri

Re: [PR] Add Variant custom logical type for Avro [iceberg]

2025-02-14 Thread via GitHub
XBaith commented on code in PR #12238: URL: https://github.com/apache/iceberg/pull/12238#discussion_r1955779484 ## core/src/main/java/org/apache/iceberg/avro/BuildAvroProjection.java: ## @@ -265,6 +265,11 @@ public Schema map(Schema map, Supplier value) { } } + @Overr

Re: [PR] Spark: Fix assertion checks [iceberg]

2025-02-14 Thread via GitHub
nastra commented on code in PR #12255: URL: https://github.com/apache/iceberg/pull/12255#discussion_r1955706957 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewritePositionDeleteFilesAction.java: ## @@ -1075,49 +1075,49 @@ private void checkResult(
