Re: [PR] feat: Implement list_views Method and __is_view Utility Function [iceberg-python]

2024-10-25 Thread via GitHub
omkenge closed pull request #1239: feat: Implement list_views Method and __is_view Utility Function URL: https://github.com/apache/iceberg-python/pull/1239 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] Add list view for hive catalog [iceberg-python]

2024-10-25 Thread via GitHub
omkenge closed pull request #1249: Add list view for hive catalog URL: https://github.com/apache/iceberg-python/pull/1249

[PR] Add list view for hive catalog [iceberg-python]

2024-10-25 Thread via GitHub
omkenge opened a new pull request, #1249: URL: https://github.com/apache/iceberg-python/pull/1249 (no comment)

Re: [PR] feat: Implement list_views Method and __is_view Utility Function [iceberg-python]

2024-10-25 Thread via GitHub
omkenge commented on PR #1239: URL: https://github.com/apache/iceberg-python/pull/1239#issuecomment-2439396859 Hello team, let's close this PR. I will add list_view for the Hive catalog in a new PR.

Re: [PR] Bump junit from 5.10.1 to 5.11.1 [iceberg]

2024-10-25 Thread via GitHub
findepi commented on PR #11262: URL: https://github.com/apache/iceberg/pull/11262#issuecomment-2439395678 thank you @tomtongue for your work on this!

Re: [PR] Flink 1.20: Update Flink to use planned Avro reads [iceberg]

2024-10-25 Thread via GitHub
pvary commented on code in PR #11386: URL: https://github.com/apache/iceberg/pull/11386#discussion_r1817698482 ## flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkAvroReaderWriter.java: ## @@ -91,7 +91,7 @@ private void writeAndValidate(Schema schema, List

Re: [PR] Flink 1.20: Update Flink to use planned Avro reads [iceberg]

2024-10-25 Thread via GitHub
jbonofre commented on PR #11386: URL: https://github.com/apache/iceberg/pull/11386#issuecomment-2439382344 I fixed the issue on `ValueReaders` about strings.

Re: [PR] Core: Update TableMetadataParser to ensure all streams closed [iceberg]

2024-10-25 Thread via GitHub
findepi commented on PR #11220: URL: https://github.com/apache/iceberg/pull/11220#issuecomment-2439372774 Merged, thanks!

Re: [PR] Core: Update TableMetadataParser to ensure all streams closed [iceberg]

2024-10-25 Thread via GitHub
findepi merged PR #11220: URL: https://github.com/apache/iceberg/pull/11220

Re: [PR] Doc: Update rewrite data files spark procedure [iceberg]

2024-10-25 Thread via GitHub
singhpk234 commented on code in PR #11396: URL: https://github.com/apache/iceberg/pull/11396#discussion_r1817628531 ## docs/docs/spark-procedures.md: ## @@ -402,7 +403,8 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile | `rewrite-all` | false

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
huaxingao commented on code in PR #11390: URL: https://github.com/apache/iceberg/pull/11390#discussion_r1817603690 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java: ## @@ -81,14 +84,15 @@ private CloseableIterable newParquetIterable(

Re: [PR] abort the whole table transaction if any updates in the transaction has failed [iceberg-python]

2024-10-25 Thread via GitHub
stevie9868 commented on PR #1246: URL: https://github.com/apache/iceberg-python/pull/1246#issuecomment-2439181222 > Ah, do you have `_autocommit` set to `True`? Since both delete and fast_append ultimately call transaction's `_apply` to queue up the updates, having `_autocommit` set to `Tru

Re: [PR] abort the whole table transaction if any updates in the transaction has failed [iceberg-python]

2024-10-25 Thread via GitHub
stevie9868 commented on PR #1246: URL: https://github.com/apache/iceberg-python/pull/1246#issuecomment-2439178061 > Thanks for the PR @stevie9868. This sounds like an important bug to address. > > Do you know if this bug only applies to the `overwrite` function or all functions in Tr

Re: [PR] feat: more builders and writing manifests [iceberg-go]

2024-10-25 Thread via GitHub
dwilson1988 commented on code in PR #177: URL: https://github.com/apache/iceberg-go/pull/177#discussion_r1817404112 ## manifest.go: ## @@ -567,6 +570,97 @@ func ReadManifestList(in io.Reader) ([]ManifestFile, error) { return out, dec.Error() } +// WriteManifestListV2

Re: [PR] Spec: Fix table of content generation [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on PR #11067: URL: https://github.com/apache/iceberg/pull/11067#issuecomment-2438751173 Thanks @ajantha-bhat, @danielcweeks, @rdblue, @manuzhang and @amogh-jahagirdar for the review!

Re: [PR] abort the whole table transaction if any updates in the transaction has failed [iceberg-python]

2024-10-25 Thread via GitHub
stevie9868 commented on PR #1246: URL: https://github.com/apache/iceberg-python/pull/1246#issuecomment-2439167367 Thanks for the detailed walkthrough! I believe if [self.update_snapshot(snapshot_properties=snapshot_properties).fast_append()](https://github.com/apache/iceberg-python/blo

Re: [PR] Add `view_exists` method to REST Catalog [iceberg-python]

2024-10-25 Thread via GitHub
shiv-io commented on PR #1242: URL: https://github.com/apache/iceberg-python/pull/1242#issuecomment-2439162592 @sungwy I used the [tabulario/iceberg-rest](https://hub.docker.com/r/tabulario/iceberg-rest) image to spin up a REST catalog server locally to test with.

[PR] Bump werkzeug from 3.0.4 to 3.0.6 [iceberg-python]

2024-10-25 Thread via GitHub
dependabot[bot] opened a new pull request, #1248: URL: https://github.com/apache/iceberg-python/pull/1248 Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.0.4 to 3.0.6. Release notes: Sourced from [werkzeug's releases](https://github.com/pallets/werkzeug/releases).

Re: [I] Implement rolling manifest-writers [iceberg-python]

2024-10-25 Thread via GitHub
github-actions[bot] commented on issue #596: URL: https://github.com/apache/iceberg-python/issues/596#issuecomment-2439072396 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] Add `view_exists` method to REST Catalog [iceberg-python]

2024-10-25 Thread via GitHub
sungwy commented on PR #1242: URL: https://github.com/apache/iceberg-python/pull/1242#issuecomment-2439077800 Hi @shiv-io - thank you for putting together this PR! > When I tested catalog.view_exists('default.bar') with a local REST catalog, I got the following exception. This also o

Re: [I] Implement rolling manifest-writers [iceberg-python]

2024-10-25 Thread via GitHub
github-actions[bot] closed issue #596: Implement rolling manifest-writers URL: https://github.com/apache/iceberg-python/issues/596

Re: [PR] Core: Snapshot `summary` map must have `operation` key [iceberg]

2024-10-25 Thread via GitHub
amogh-jahagirdar merged PR #11354: URL: https://github.com/apache/iceberg/pull/11354

Re: [PR] Dynamically support Spark native engine in Iceberg [iceberg]

2024-10-25 Thread via GitHub
github-actions[bot] commented on PR #9721: URL: https://github.com/apache/iceberg/pull/9721#issuecomment-2439070558 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [I] Add a table API to compute partition stats. [iceberg]

2024-10-25 Thread via GitHub
github-actions[bot] commented on issue #10105: URL: https://github.com/apache/iceberg/issues/10105#issuecomment-2439070912 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] read from Iceberg table throw java.lang.ArrayIndexOutOfBoundsException: 3 [iceberg]

2024-10-25 Thread via GitHub
github-actions[bot] commented on issue #10103: URL: https://github.com/apache/iceberg/issues/10103#issuecomment-2439070899 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] Spark procedure to compute partition stats. [iceberg]

2024-10-25 Thread via GitHub
github-actions[bot] commented on issue #10106: URL: https://github.com/apache/iceberg/issues/10106#issuecomment-2439070945 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [PR] Dynamically support Spark native engine in Iceberg [iceberg]

2024-10-25 Thread via GitHub
github-actions[bot] closed pull request #9721: Dynamically support Spark native engine in Iceberg URL: https://github.com/apache/iceberg/pull/9721

[I] Block writing to sorted tables [iceberg-python]

2024-10-25 Thread via GitHub
kevinjqliu opened a new issue, #1247: URL: https://github.com/apache/iceberg-python/issues/1247 ### Apache Iceberg version None ### Please describe the bug 🐞 Verify that we disallow writing an unsorted data frame to a table with sort order.
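The check this issue asks for can be sketched in plain Python. The sketch makes several assumptions:

- The incoming data is modelled as a list of dicts.
- The table's sort order is modelled as a list of `(column, ascending)` pairs.
- `is_sorted` and `check_write_allowed` are hypothetical helpers, not PyIceberg APIs.
- Null ordering is ignored.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
SortKey = Tuple[str, bool]  # (column name, ascending?) -- hypothetical representation


def is_sorted(rows: Sequence[Row], sort_order: Sequence[SortKey]) -> bool:
    """Return True if every adjacent pair of rows respects the sort order.

    Assumption: values are mutually comparable and never None (null ordering is ignored).
    """

    def compare(a: Row, b: Row) -> int:
        for column, ascending in sort_order:
            if a[column] == b[column]:
                continue  # tie on this key, fall through to the next key
            less = a[column] < b[column]
            # -1: a may precede b under this key; 1: a must come after b
            return -1 if less == ascending else 1
        return 0  # all keys equal

    return all(compare(a, b) <= 0 for a, b in zip(rows, rows[1:]))


def check_write_allowed(rows: List[Row], sort_order: Sequence[SortKey]) -> None:
    """Hypothetical guard: reject unsorted data for a table that declares a sort order."""
    if sort_order and not is_sorted(rows, sort_order):
        raise ValueError("data is not sorted according to the table's sort order")
```

An unsorted table (empty sort order) accepts any input, which mirrors the issue's scope. Only tables with a sort order are guarded.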

Re: [PR] feat: more builders and writing manifests [iceberg-go]

2024-10-25 Thread via GitHub
dwilson1988 commented on code in PR #177: URL: https://github.com/apache/iceberg-go/pull/177#discussion_r1817398466 ## manifest.go: ## @@ -567,6 +570,97 @@ func ReadManifestList(in io.Reader) ([]ManifestFile, error) { return out, dec.Error() } +// WriteManifestListV2

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
huaxingao commented on code in PR #11390: URL: https://github.com/apache/iceberg/pull/11390#discussion_r1817447914 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java: ## @@ -81,14 +84,15 @@ private CloseableIterable newParquetIterable(

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
viirya commented on code in PR #11390: URL: https://github.com/apache/iceberg/pull/11390#discussion_r1817444036 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java: ## @@ -81,14 +84,15 @@ private CloseableIterable newParquetIterable( Spa

Re: [PR] abort the whole table transaction if any updates in the transaction has failed [iceberg-python]

2024-10-25 Thread via GitHub
kevinjqliu commented on PR #1246: URL: https://github.com/apache/iceberg-python/pull/1246#issuecomment-2439024843 Ah, do you have `_autocommit` set to `True`? Since both delete and fast_append ultimately call transaction's `_apply` to queue up the updates, having `_autocommit` set to `Tr

Re: [PR] feat: Implement list_views Method and __is_view Utility Function [iceberg-python]

2024-10-25 Thread via GitHub
omkenge commented on PR #1239: URL: https://github.com/apache/iceberg-python/pull/1239#issuecomment-2438960683 Hi @kevinjqliu, you are right, so there is no need to add list_view for the AWS Glue catalog. But can I work on the Hive catalog get_views method?

Re: [PR] abort the whole table transaction if any updates in the transaction has failed [iceberg-python]

2024-10-25 Thread via GitHub
kevinjqliu commented on PR #1246: URL: https://github.com/apache/iceberg-python/pull/1246#issuecomment-2439020328 > We have encountered a data loss issue when using pyIceberg to perform an overwrite operation. Typically, an overwrite operation involves creating both a delete snapshot and an
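One way a delete can land while the matching append fails can be shown with a toy transaction. It queues updates and commits them together only on a clean exit, and it drops them all if any update raises. `ToyTable`, `ToyTransaction` and its `_apply` method are hypothetical names loosely echoing this discussion. This is not PyIceberg's `Transaction` implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table: just the names of committed snapshots."""

    snapshots: List[str] = field(default_factory=list)


class ToyTransaction:
    """Toy transaction: queue updates, commit all of them on success, none on failure.

    With autocommit=True each update is committed as soon as it is applied,
    so a later failure cannot undo the earlier commit.
    """

    def __init__(self, table: ToyTable, autocommit: bool = False) -> None:
        self._table = table
        self._autocommit = autocommit
        self._pending: List[str] = []

    def _apply(self, update: Callable[[], str]) -> None:
        # Build the update; if building raises, nothing is queued for it.
        self._pending.append(update())
        if self._autocommit:
            self.commit()

    def commit(self) -> None:
        self._table.snapshots.extend(self._pending)
        self._pending.clear()

    def __enter__(self) -> "ToyTransaction":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self.commit()
        else:
            # Abort the whole transaction: discard every queued update.
            self._pending.clear()
        return False  # do not swallow the exception
```

With `autocommit=False`, a failing second update leaves the table untouched. With `autocommit=True`, the first ("delete") update is already committed before the failure.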

Re: [PR] abort the whole table transaction if any updates in the transaction has failed [iceberg-python]

2024-10-25 Thread via GitHub
kevinjqliu commented on PR #1246: URL: https://github.com/apache/iceberg-python/pull/1246#issuecomment-2439020695 Please let me know if the above makes sense

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
huaxingao commented on code in PR #11390: URL: https://github.com/apache/iceberg/pull/11390#discussion_r1817436902 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java: ## @@ -125,4 +129,28 @@ private CloseableIterable newOrcIterable( .

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
viirya commented on code in PR #11390: URL: https://github.com/apache/iceberg/pull/11390#discussion_r1817429106 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java: ## @@ -125,4 +129,28 @@ private CloseableIterable newOrcIterable( .wit

Re: [PR] AWS: Refresh vended credentials [iceberg]

2024-10-25 Thread via GitHub
singhpk234 commented on code in PR #11389: URL: https://github.com/apache/iceberg/pull/11389#discussion_r1815553135 ## aws/src/main/java/org/apache/iceberg/aws/s3/VendedCredentialsProvider.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
huaxingao commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2438965356 @pvary Thank you for your suggestion! You're correct that adding such a test would help prevent future changes from inadvertently affecting this behavior without notice. Currently, Sp

Re: [PR] abort the whole table transaction if any updates in the transaction has failed [iceberg-python]

2024-10-25 Thread via GitHub
kevinjqliu commented on code in PR #1246: URL: https://github.com/apache/iceberg-python/pull/1246#discussion_r1817422310 ## pyiceberg/table/__init__.py: ## @@ -231,9 +233,13 @@ def __enter__(self) -> Transaction: """Start a transaction to update the table.""" r

Re: [PR] Core: Snapshot `summary` map must have `operation` key [iceberg]

2024-10-25 Thread via GitHub
kevinjqliu commented on code in PR #11354: URL: https://github.com/apache/iceberg/pull/11354#discussion_r1817229656 ## core/src/test/java/org/apache/iceberg/TestSnapshotJson.java: ## @@ -74,21 +72,23 @@ public void testToJsonWithOperation() throws IOException { Immu

Re: [PR] abort the whole table transaction if any updates in the transaction has failed [iceberg-python]

2024-10-25 Thread via GitHub
kevinjqliu commented on PR #1246: URL: https://github.com/apache/iceberg-python/pull/1246#issuecomment-2438957024 Thanks for the PR @stevie9868. This sounds like an important bug to address. PS I reran the CI

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
dramaticlly commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2438914981 > @huaxingao it's a good find, I'm just wondering, where do we add _pos to the schema? Can we just not do it there? Just curious if it's possible. I think it might be from here h

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
huaxingao commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2438938087 @szehon-ho I think we still need the `_pos` in the `requiredSchema` to build [`posAccessor`](https://github.com/apache/iceberg/blob/main/data/src/main/java/org/apache/iceberg/data/Dele

Re: [PR] Doc: Update rewrite data files spark procedure [iceberg]

2024-10-25 Thread via GitHub
dramaticlly commented on code in PR #11396: URL: https://github.com/apache/iceberg/pull/11396#discussion_r1817405662 ## docs/docs/spark-procedures.md: ## @@ -402,7 +403,8 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile | `rewrite-all` | fals

Re: [PR] Doc: Update rewrite data files spark procedure [iceberg]

2024-10-25 Thread via GitHub
himadripal commented on code in PR #11396: URL: https://github.com/apache/iceberg/pull/11396#discussion_r1817404590 ## docs/docs/spark-procedures.md: ## @@ -402,7 +403,8 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile | `rewrite-all` | false

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-25 Thread via GitHub
emkornfield commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1817239389 ## format/puffin-spec.md: ## @@ -123,6 +123,54 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct va

Re: [PR] API: Add Variant data type [iceberg]

2024-10-25 Thread via GitHub
aihuaxu commented on code in PR #11324: URL: https://github.com/apache/iceberg/pull/11324#discussion_r1817394294 ## api/src/test/java/org/apache/iceberg/TestHelpers.java: ## @@ -402,6 +406,101 @@ public int hashCode() { } } + /** A VariantLike implementation for testi

Re: [PR] feat: more builders and writing manifests [iceberg-go]

2024-10-25 Thread via GitHub
zeroshade commented on code in PR #177: URL: https://github.com/apache/iceberg-go/pull/177#discussion_r1817390906 ## manifest.go: ## @@ -567,6 +570,97 @@ func ReadManifestList(in io.Reader) ([]ManifestFile, error) { return out, dec.Error() } +// WriteManifestListV2 w

Re: [PR] Core: Snapshot `summary` map must have `operation` key [iceberg]

2024-10-25 Thread via GitHub
amogh-jahagirdar commented on PR #11354: URL: https://github.com/apache/iceberg/pull/11354#issuecomment-2438901117 Thanks @kevinjqliu, I caught up on the discussion and this looks right to me! Thanks @nastra and @RussellSpitzer for the reviews.

Re: [PR] Exclude reading pos_ column if it's not in the scan list [iceberg]

2024-10-25 Thread via GitHub
szehon-ho commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2438907910 @huaxingao it's a good find, I'm just wondering, where do we add _pos to the schema? Can we just not do it there? Just curious if it's possible.

Re: [PR] Spec: add variant type [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1817325374 ## format/spec.md: ## @@ -444,6 +449,9 @@ Sorting floating-point numbers should produce the following behavior: `-NaN` < ` A data or delete file is associa

Re: [PR] Fix ADLSLocation file parsing [iceberg]

2024-10-25 Thread via GitHub
danielcweeks commented on PR #11395: URL: https://github.com/apache/iceberg/pull/11395#issuecomment-2438788687 Thanks @mrcnc , though overall it's really unfortunate that we have notably different behavior between S3 and ADLS in the URI handling. S3 allows for query params (though they're

Re: [PR] [KafkaConnect] Fix RecordConverter for UUID and Fixed Types [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on PR #11346: URL: https://github.com/apache/iceberg/pull/11346#issuecomment-2438754792 Thanks @singhpk234 for the PR and @jbonofre, @bryanck and @ajantha-bhat for the review!

Re: [PR] feat: Implement list_views Method and __is_view Utility Function [iceberg-python]

2024-10-25 Thread via GitHub
omkenge commented on PR #1239: URL: https://github.com/apache/iceberg-python/pull/1239#issuecomment-2438763695 Hi @kevinjqliu, you are correct that AWS Glue does not support Iceberg views. It's feasible to implement a list_views function in PyIceberg with the AWS Glue Catalog, even thoug

Re: [PR] Spec v3: Add deletion vectors to the table spec [iceberg]

2024-10-25 Thread via GitHub
emkornfield commented on code in PR #11240: URL: https://github.com/apache/iceberg/pull/11240#discussion_r1817237625 ## format/spec.md: ## @@ -585,13 +589,19 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo | _optional_ | _optional_ | _option

Re: [PR] Spec: add variant type [iceberg]

2024-10-25 Thread via GitHub
aihuaxu commented on PR #10831: URL: https://github.com/apache/iceberg/pull/10831#issuecomment-2438757395 > @aihuaxu, I think there are a couple of things missing: > > * The Avro appendix should be updated to state that a Variant is stored as a Record with two fields, a required binar

Re: [PR] [KafkaConnect] Fix RecordConverter for UUID and Fixed Types [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer merged PR #11346: URL: https://github.com/apache/iceberg/pull/11346

Re: [PR] Core: Snapshot `summary` map must have `operation` key [iceberg]

2024-10-25 Thread via GitHub
kevinjqliu commented on PR #11354: URL: https://github.com/apache/iceberg/pull/11354#issuecomment-2438753142 > Yes I meant checking there is no top level "operation" field { snapshot { operation: {} // <-- Did we do this before? summary: {} } @RussellSpitzer I don't thin

Re: [PR] Spec: Fix table of content generation [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer merged PR #11067: URL: https://github.com/apache/iceberg/pull/11067

Re: [PR] Core: Snapshot `summary` map must have `operation` key [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on PR #11354: URL: https://github.com/apache/iceberg/pull/11354#issuecomment-2438745450 > @RussellSpitzer On write, the `operation` field is added to the `summary` map https://github.com/apache/iceberg/pull/11354/files#diff-7ed51a90c01ae74858022052b57c6b39544e1fa5f4

Re: [PR] Spec: add variant type [iceberg]

2024-10-25 Thread via GitHub
aihuaxu commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1816076539 ## format/spec.md: ## @@ -1025,28 +1033,29 @@ Values should be stored in Parquet using the types and logical type annotations Lists must use the [3-level represe

Re: [PR] Spec: add variant type [iceberg]

2024-10-25 Thread via GitHub
aihuaxu commented on code in PR #10831: URL: https://github.com/apache/iceberg/pull/10831#discussion_r1817306749 ## format/spec.md: ## @@ -1297,54 +1308,56 @@ Example This serialization scheme is for storing single values as individual binary values in the lower and upper bo

Re: [PR] Spark 3.5: Fix NotSerializableException when migrating Spark tables [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on code in PR #11157: URL: https://github.com/apache/iceberg/pull/11157#discussion_r1817300773 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java: ## @@ -711,7 +719,7 @@ public static void importSparkPartitions( spec,

Re: [PR] Core: Snapshot `summary` map must have `operation` key [iceberg]

2024-10-25 Thread via GitHub
kevinjqliu commented on PR #11354: URL: https://github.com/apache/iceberg/pull/11354#issuecomment-2438725062 @RussellSpitzer On write, the `operation` field is added to the `summary` map https://github.com/apache/iceberg/pull/11354/files#diff-7ed51a90c01ae74858022052b57c6b39544e1fa5f4bf
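The rule in this PR's title can be sketched as a small validation step. The sketch assumes a snapshot arrives as a JSON string and that its `summary` object, when present, must carry an `operation` key. `validate_snapshot_summary` is a hypothetical helper, not the Java `SnapshotParser` API. Treating a snapshot with no summary at all as valid (returning an empty map) is also an assumption.

```python
import json
from typing import Any, Dict


def validate_snapshot_summary(snapshot_json: str) -> Dict[str, str]:
    """Hypothetical check: a snapshot's summary map must contain 'operation'."""
    snapshot: Dict[str, Any] = json.loads(snapshot_json)
    summary = snapshot.get("summary")
    if summary is None:
        # Assumption: a snapshot without any summary is tolerated here.
        return {}
    if "operation" not in summary:
        raise ValueError("snapshot summary map must contain an 'operation' key")
    return dict(summary)
```

Because `operation` lives inside the `summary` map rather than as a top-level field, the check only inspects `summary`. A stray top-level `operation` would be ignored.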

Re: [PR] Spark 3.5: Fix NotSerializableException when migrating Spark tables [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on code in PR #11157: URL: https://github.com/apache/iceberg/pull/11157#discussion_r1817289632 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java: ## @@ -92,6 +98,8 @@ import org.apache.spark.sql.catalyst.plans.logical.Logica

Re: [PR] Spark 3.5: Fix NotSerializableException when migrating Spark tables [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on code in PR #11157: URL: https://github.com/apache/iceberg/pull/11157#discussion_r1817288548 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkTableUtil.java: ## @@ -92,6 +98,8 @@ import org.apache.spark.sql.catalyst.plans.logical.Logica

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-25 Thread via GitHub
emkornfield commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1817242810 ## format/puffin-spec.md: ## @@ -123,6 +123,57 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct va

Re: [PR] Spark 3.5: Don't change table distribution when only altering local order [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on PR #10774: URL: https://github.com/apache/iceberg/pull/10774#issuecomment-2438645577 Yes I think we are good here! Thanks @manuzhang for the patch and @szehon-ho for the review

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-25 Thread via GitHub
emkornfield commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1817243291 ## format/puffin-spec.md: ## @@ -123,6 +123,57 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct va

Re: [PR] Spark 3.5: Don't change table distribution when only altering local order [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer merged PR #10774: URL: https://github.com/apache/iceberg/pull/10774

Re: [PR] Spec v3: Add deletion vectors to the table spec [iceberg]

2024-10-25 Thread via GitHub
emkornfield commented on code in PR #11240: URL: https://github.com/apache/iceberg/pull/11240#discussion_r1817231952 ## format/spec.md: ## @@ -454,35 +457,40 @@ The schema of a manifest file is a struct called `manifest_entry` with the follo `data_file` is a struct with the

Re: [PR] Spec v3: Add deletion vectors to the table spec [iceberg]

2024-10-25 Thread via GitHub
emkornfield commented on code in PR #11240: URL: https://github.com/apache/iceberg/pull/11240#discussion_r1817227764 ## format/spec.md: ## @@ -841,19 +855,45 @@ Notes: ## Delete Formats -This section details how to encode row-level deletes in Iceberg delete files. Row-leve

Re: [PR] feat: Implement list_views Method and __is_view Utility Function [iceberg-python]

2024-10-25 Thread via GitHub
kevinjqliu commented on PR #1239: URL: https://github.com/apache/iceberg-python/pull/1239#issuecomment-2438625062 Does AWS Glue catalog currently support Iceberg views? I could not find any documentation on Iceberg view support. On the Java side, GlueCatalog does not currently support Iceberg

Re: [PR] Spec v3: Add deletion vectors to the table spec [iceberg]

2024-10-25 Thread via GitHub
emkornfield commented on code in PR #11240: URL: https://github.com/apache/iceberg/pull/11240#discussion_r1817225777 ## format/spec.md: ## @@ -841,19 +855,45 @@ Notes: ## Delete Formats -This section details how to encode row-level deletes in Iceberg delete files. Row-leve

Re: [PR] feat: Implement list_views Method and __is_view Utility Function [iceberg-python]

2024-10-25 Thread via GitHub
kevinjqliu commented on PR #1239: URL: https://github.com/apache/iceberg-python/pull/1239#issuecomment-2438622549 Sorry about the confusion, I didn't see that this PR is specific to glue. My comment on the integration test is for adding iceberg views in general. I don't think we have suffic

Re: [PR] fix: do not sort indices for `ProjectionMask::leaves` [iceberg-rust]

2024-10-25 Thread via GitHub
sdd commented on PR #682: URL: https://github.com/apache/iceberg-rust/pull/682#issuecomment-2438401322 @liurenjie1024 or @Xuanwo are you able to re-run the tests on this? I can't explain why the test would fail the way that it has in CI and when I reproduce the change from this PR on `main`

Re: [PR] abort the whole table transaction if any updates in the transaction has failed [iceberg-python]

2024-10-25 Thread via GitHub
stevie9868 commented on PR #1246: URL: https://github.com/apache/iceberg-python/pull/1246#issuecomment-2438355903 @HonahX Thanks for unblocking the testing actions! But it looks like the curl command in Python CI/lint-and-test 3.10 times out.

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on code in PR #11216: URL: https://github.com/apache/iceberg/pull/11216#discussion_r1817032978 ## core/src/main/java/org/apache/iceberg/data/PartitionStatsRecord.java: ## @@ -0,0 +1,170 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [I] Nessie Iceberg REST catalog and writing to localstack raises `OSError: When initiating multiple part upload` [iceberg-python]

2024-10-25 Thread via GitHub
allilou commented on issue #1087: URL: https://github.com/apache/iceberg-python/issues/1087#issuecomment-2438337665 > > I updated my docker-compose.yaml to use extra_hosts and it worked. Closing this issue. > > I'm facing the same error, can you please give a snippet how you add the

Re: [PR] Core: Remove one comment from FastAppend [iceberg]

2024-10-25 Thread via GitHub
gaborkaszab commented on code in PR #10995: URL: https://github.com/apache/iceberg/pull/10995#discussion_r1816970746 ## core/src/test/java/org/apache/iceberg/TestFastAppend.java: ## @@ -252,11 +252,36 @@ public void testFailure() { assertThat(new File(newManifest.path())).d

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on code in PR #11216: URL: https://github.com/apache/iceberg/pull/11216#discussion_r1816957914 ## data/src/test/java/org/apache/iceberg/data/TestPartitionStatsHandler.java: ## @@ -0,0 +1,569 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on code in PR #11216: URL: https://github.com/apache/iceberg/pull/11216#discussion_r1816945414 ## core/src/test/java/org/apache/iceberg/TestTables.java: ## @@ -93,6 +93,26 @@ public static TestTable create( return new TestTable(ops, name, reporter);

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on code in PR #11216: URL: https://github.com/apache/iceberg/pull/11216#discussion_r1816936025 ## core/src/main/java/org/apache/iceberg/PartitionStats.java: ## @@ -249,4 +250,45 @@ public void set(int pos, T value) { throw new UnsupportedOperat

Re: [PR] Data: Add partition stats writer and reader [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on code in PR #11216: URL: https://github.com/apache/iceberg/pull/11216#discussion_r1816930724 ## core/src/main/java/org/apache/iceberg/PartitionStats.java: ## @@ -249,4 +250,45 @@ public void set(int pos, T value) { throw new UnsupportedOperat

Re: [PR] Spark: add property to disable client-side purging in spark [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on code in PR #11317: URL: https://github.com/apache/iceberg/pull/11317#discussion_r1816909771 ## core/src/main/java/org/apache/iceberg/CatalogProperties.java: ## @@ -78,6 +78,15 @@ private CatalogProperties() {} public static final boolean IO_MANIF

Re: [PR] Spark 3.5: Fix NotSerializableException when migrating partitioned Spark tables [iceberg]

2024-10-25 Thread via GitHub
manuzhang commented on PR #11157: URL: https://github.com/apache/iceberg/pull/11157#issuecomment-2438088192 Yes, it's used in `listPartitions` while the title was not accurate. Migrating unpartitioned Spark tables has the same issue.

Re: [I] Javadoc issues [iceberg]

2024-10-25 Thread via GitHub
jbonofre commented on issue #10378: URL: https://github.com/apache/iceberg/issues/10378#issuecomment-2438057961 @RussellSpitzer sure thing ! I will ! Thanks !

Re: [PR] AWS: Refresh vended credentials [iceberg]

2024-10-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #11389: URL: https://github.com/apache/iceberg/pull/11389#discussion_r1816847378 ## aws/src/main/java/org/apache/iceberg/aws/s3/VendedCredentialsProvider.java: ## @@ -0,0 +1,144 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] AWS: Refresh vended credentials [iceberg]

2024-10-25 Thread via GitHub
amogh-jahagirdar commented on code in PR #11389: URL: https://github.com/apache/iceberg/pull/11389#discussion_r1815090267 ## aws/src/main/java/org/apache/iceberg/aws/s3/VendedCredentialsProvider.java: ## @@ -0,0 +1,144 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] AWS: Refresh vended credentials [iceberg]

2024-10-25 Thread via GitHub
singhpk234 commented on code in PR #11389: URL: https://github.com/apache/iceberg/pull/11389#discussion_r1815553135 ## aws/src/main/java/org/apache/iceberg/aws/s3/VendedCredentialsProvider.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Spark: add property to disable client-side purging in spark [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on code in PR #11317: URL: https://github.com/apache/iceberg/pull/11317#discussion_r1816807747 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java: ## @@ -365,24 +368,35 @@ public boolean purgeTable(Identifier ident) { Str

Re: [PR] Fix ADLSLocation file parsing [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on PR #11395: URL: https://github.com/apache/iceberg/pull/11395#issuecomment-2438030667 LGTM. @danielcweeks This adds in that test I was looking for where URI would fail, although looks like we have a bug in the current implementation anyway.

Re: [PR] Spark 3.5: Fix NotSerializableException when migrating partitioned Spark tables [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on PR #11157: URL: https://github.com/apache/iceberg/pull/11157#issuecomment-2437997523 > `ExecutorService` is used to parallelize reading files to build manifests on the Spark executors for Spark table migration procedures (`add_files`, `migrate`, `snapshot`).

Re: [PR] AWS: Refresh vended credentials [iceberg]

2024-10-25 Thread via GitHub
nastra closed pull request #11389: AWS: Refresh vended credentials URL: https://github.com/apache/iceberg/pull/11389

Re: [PR] Core: Track data files by spec id instead of full PartitionSpec [iceberg]

2024-10-25 Thread via GitHub
amogh-jahagirdar commented on PR #11323: URL: https://github.com/apache/iceberg/pull/11323#issuecomment-2437961434 The change looks good to me, I'll go ahead and merge since @rdblue comment was addressed. Thanks for the improvement @nastra , and for the reviews @singhpk234 @rdblue!

Re: [I] Javadoc issues [iceberg]

2024-10-25 Thread via GitHub
RussellSpitzer commented on issue #10378: URL: https://github.com/apache/iceberg/issues/10378#issuecomment-2438002646 @jbonofre Did you want to work on this?

Re: [PR] feat: Add 'Create Namespace' command to CLI [iceberg-go]

2024-10-25 Thread via GitHub
zeroshade commented on code in PR #179: URL: https://github.com/apache/iceberg-go/pull/179#discussion_r1816797304 ## cmd/iceberg/main.go: ## @@ -70,6 +71,7 @@ type Config struct { Uuid bool `docopt:"uuid"` Location bool `docopt:"location"` Propsbo

Re: [PR] Core: Track data files by spec id instead of full PartitionSpec [iceberg]

2024-10-25 Thread via GitHub
amogh-jahagirdar merged PR #11323: URL: https://github.com/apache/iceberg/pull/11323
