Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-12 Thread via GitHub
syun64 commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676751929 ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2083,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: Value

Re: [PR] Core:support redis and http lock-manager [iceberg]

2024-07-12 Thread via GitHub
BsoBird commented on code in PR #10688: URL: https://github.com/apache/iceberg/pull/10688#discussion_r1676751334 ## build.gradle: ## @@ -358,6 +358,7 @@ project(':iceberg-core') { implementation libs.jackson.databind implementation libs.caffeine implementation lib

[I] The unit test for class TestFlinkIcebergSink cannot be executed [iceberg]

2024-07-12 Thread via GitHub
dzzxjl opened a new issue, #10694: URL: https://github.com/apache/iceberg/issues/10694 ### Query engine Flink ### Question The unit test for class TestFlinkIcebergSink cannot be executed. In IDEA, the default unit test command is: `:iceberg-flink:iceberg-flink-1.17:test

Re: [PR] Core:support redis and http lock-manager [iceberg]

2024-07-12 Thread via GitHub
BsoBird commented on PR #10688: URL: https://github.com/apache/iceberg/pull/10688#issuecomment-2226738178 @danielcweeks @rdblue I know what you mean, Sir. 1.About abandoning the catalog implementation that relies on lockManager. I think this is too radical. I agree that it would b

Re: [PR] Core:support redis and http lock-manager [iceberg]

2024-07-12 Thread via GitHub
BsoBird commented on code in PR #10688: URL: https://github.com/apache/iceberg/pull/10688#discussion_r1676690715 ## build.gradle: ## @@ -358,6 +358,7 @@ project(':iceberg-core') { implementation libs.jackson.databind implementation libs.caffeine implementation lib

Re: [PR] Standardize AWS credential names [iceberg-python]

2024-07-12 Thread via GitHub
jayceslesar commented on PR #922: URL: https://github.com/apache/iceberg-python/pull/922#issuecomment-2226714861 went to generate the mkdocs and spawned https://github.com/apache/iceberg-python/issues/923 but I think the approach looks good. The only way to make less repeatable would be to

[I] Move mkdocs action/workflow into `docs` group [iceberg-python]

2024-07-12 Thread via GitHub
jayceslesar opened a new issue, #923: URL: https://github.com/apache/iceberg-python/issues/923 ### Feature Request / Improvement I was trying to set up mkdocs on this repo and found the existing setup a little unintuitive -- in most cases I use mkdocs as a `docs` extra in whatever de

Re: [PR] Standardize AWS credential names [iceberg-python]

2024-07-12 Thread via GitHub
HonahX commented on code in PR #922: URL: https://github.com/apache/iceberg-python/pull/922#discussion_r1676596155 ## pyiceberg/io/__init__.py: ## @@ -46,6 +48,10 @@ logger = logging.getLogger(__name__) +AWS_REGION = "client.region" Review Comment: I chose `client.` for

Re: [I] [Bug] Load the proper AWS credential for glue/dynamodb catalog [iceberg-python]

2024-07-12 Thread via GitHub
HonahX commented on issue #892: URL: https://github.com/apache/iceberg-python/issues/892#issuecomment-2226614416 Hi @kevinjqliu @jayceslesar. Thanks for the issue and valuable discussion. I would like to give a try with my proposal in https://github.com/apache/iceberg-python/issues/570#issu

Re: [I] How to move Iceberg table from one location to another [iceberg]

2024-07-12 Thread via GitHub
anuragmantri commented on issue #3142: URL: https://github.com/apache/iceberg/issues/3142#issuecomment-2226580307 There is now a PR in review which rewrites metadata with new location and also does some other checks. Please take a look at the PR for some ideas https://github.com/apach

[PR] Standardize AWS credential names [iceberg-python]

2024-07-12 Thread via GitHub
HonahX opened a new pull request, #922: URL: https://github.com/apache/iceberg-python/pull/922 There has been many discussions and concerns over the current behavior of loading AWS credential for glue/dynamo catalog: #892, #515, #570 This PR tries to standardize the property names of

Re: [PR] Encryption integration and test [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #5544: URL: https://github.com/apache/iceberg/pull/5544#discussion_r1676567998 ## core/src/main/java/org/apache/iceberg/TableMetadataParser.java: ## @@ -274,10 +291,12 @@ public static TableMetadata read(FileIO io, String path) { } public s

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676567628 ## core/src/main/java/org/apache/iceberg/SnapshotParser.java: ## @@ -155,7 +213,11 @@ static Snapshot fromJson(JsonNode node) { operation, summary,

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676566942 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -269,7 +323,11 @@ public Snapshot apply() { operation(), summary(base),

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676566569 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -257,10 +273,48 @@ public Snapshot apply() { .run(index -> manifestFiles[index] = manif

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676566153 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -257,10 +273,48 @@ public Snapshot apply() { .run(index -> manifestFiles[index] = manif

Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-12 Thread via GitHub
syun64 commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676566490 ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2083,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: Value

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676563921 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -257,10 +273,48 @@ public Snapshot apply() { .run(index -> manifestFiles[index] = manif

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676563382 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -237,10 +244,19 @@ public Snapshot apply() { OutputFile manifestList = manifestListPath();

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676561882 ## core/src/main/java/org/apache/iceberg/encryption/EncryptionUtil.java: ## @@ -71,30 +70,35 @@ public static KeyManagementClient createKmsClient(Map catalogPro }

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676561247 ## core/src/main/java/org/apache/iceberg/BaseSnapshot.java: ## @@ -143,7 +201,24 @@ private void cacheManifests(FileIO fileIO) { if (allManifests == null) {

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676560690 ## api/src/main/java/org/apache/iceberg/ManifestListFile.java: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contrib

Re: [PR] Spark 3.3/3.4: support read of partition metadata column when table is over 1k [iceberg]

2024-07-12 Thread via GitHub
szehon-ho commented on PR #10641: URL: https://github.com/apache/iceberg/pull/10641#issuecomment-2226523211 Merged, thanks @dramaticlly -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] Spark 3.3/3.4: support read of partition metadata column when table is over 1k [iceberg]

2024-07-12 Thread via GitHub
szehon-ho merged PR #10641: URL: https://github.com/apache/iceberg/pull/10641

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676555890 ## core/src/main/java/org/apache/iceberg/BaseSnapshot.java: ## @@ -62,14 +70,56 @@ class BaseSnapshot implements Snapshot { Map summary, Integer schemaId,

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676552264 ## core/src/main/java/org/apache/iceberg/encryption/StandardEncryptionManager.java: ## @@ -92,13 +94,45 @@ public ByteBuffer wrapKey(ByteBuffer secretKey) { public

Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-12 Thread via GitHub
HonahX commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676549791 ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2083,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: Value

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676550818 ## core/src/main/java/org/apache/iceberg/SnapshotParser.java: ## @@ -172,6 +234,7 @@ static Snapshot fromJson(JsonNode node) { } } + // Tests only Review Com

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676546177 ## core/src/main/java/org/apache/iceberg/SnapshotParser.java: ## @@ -147,6 +169,42 @@ static Snapshot fromJson(JsonNode node) { if (node.has(MANIFEST_LIST)) {

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676544528 ## core/src/main/java/org/apache/iceberg/SnapshotParser.java: ## @@ -147,6 +169,42 @@ static Snapshot fromJson(JsonNode node) { if (node.has(MANIFEST_LIST)) {

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676543810 ## core/src/main/java/org/apache/iceberg/SnapshotParser.java: ## @@ -147,6 +169,42 @@ static Snapshot fromJson(JsonNode node) { if (node.has(MANIFEST_LIST)) {

Re: [PR] Spark 3.3/3.4: support read of partition metadata column when table is over 1k [iceberg]

2024-07-12 Thread via GitHub
dramaticlly commented on PR #10641: URL: https://github.com/apache/iceberg/pull/10641#issuecomment-2226487110 @szehon-ho

Re: [PR] [Docs] Add examples for DataFrame branch writes [iceberg]

2024-07-12 Thread via GitHub
anuragmantri commented on code in PR #10644: URL: https://github.com/apache/iceberg/pull/10644#discussion_r1676517338 ## docs/docs/spark-writes.md: ## @@ -228,6 +232,24 @@ SET spark.wap.branch = audit-branch INSERT INTO prod.db.table VALUES (3, 'c'); ``` +### Via DataFrames

Re: [PR] Encryption integration and test [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #5544: URL: https://github.com/apache/iceberg/pull/5544#discussion_r1676515597 ## core/src/main/java/org/apache/iceberg/TableMetadataParser.java: ## @@ -123,6 +127,7 @@ public static void internalWrite( TableMetadata metadata, OutputFile out

Re: [PR] Encryption integration and test [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #5544: URL: https://github.com/apache/iceberg/pull/5544#discussion_r1676514642 ## api/src/main/java/org/apache/iceberg/encryption/EncryptingFileIO.java: ## @@ -109,14 +111,19 @@ public InputFile newInputFile(ManifestFile manifest) { } } -

Re: [PR] Encryption integration and test [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #5544: URL: https://github.com/apache/iceberg/pull/5544#discussion_r1676512526 ## .palantir/revapi.yml: ## @@ -1018,6 +1018,17 @@ acceptedBreaks: old: "method void org.apache.iceberg.PositionDeletesTable.PositionDeletesBatchScan::(org.apach

Re: [PR] Encryption integration and test [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #5544: URL: https://github.com/apache/iceberg/pull/5544#discussion_r1676510489 ## .palantir/revapi.yml: ## @@ -1018,6 +1018,17 @@ acceptedBreaks: old: "method void org.apache.iceberg.PositionDeletesTable.PositionDeletesBatchScan::(org.apach

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676510282 ## core/src/main/java/org/apache/iceberg/SnapshotParser.java: ## @@ -106,6 +124,10 @@ public static String toJson(Snapshot snapshot, boolean pretty) { } static

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676501189 ## core/src/main/java/org/apache/iceberg/SnapshotParser.java: ## @@ -172,6 +234,7 @@ static Snapshot fromJson(JsonNode node) { } } + // Tests only Review Com

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676495226 ## core/src/main/java/org/apache/iceberg/BaseSnapshot.java: ## @@ -143,7 +201,24 @@ private void cacheManifests(FileIO fileIO) { if (allManifests == null) {

Re: [PR] [Docs] Add examples for DataFrame branch writes [iceberg]

2024-07-12 Thread via GitHub
szehon-ho commented on code in PR #10644: URL: https://github.com/apache/iceberg/pull/10644#discussion_r1676490679 ## docs/docs/spark-writes.md: ## @@ -228,6 +232,24 @@ SET spark.wap.branch = audit-branch INSERT INTO prod.db.table VALUES (3, 'c'); ``` +### Via DataFrames + +

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676486826 ## core/src/main/java/org/apache/iceberg/encryption/EncryptionUtil.java: ## @@ -71,30 +70,35 @@ public static KeyManagementClient createKmsClient(Map catalogPro }

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676486225 ## core/src/main/java/org/apache/iceberg/encryption/EncryptionUtil.java: ## @@ -71,30 +70,35 @@ public static KeyManagementClient createKmsClient(Map catalogPro }

Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-12 Thread via GitHub
syun64 commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676463963 ## pyiceberg/io/pyarrow.py: ## @@ -1450,14 +1451,17 @@ def field_partner(self, partner_struct: Optional[pa.Array], field_id: int, _: st except ValueEr

Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-12 Thread via GitHub
syun64 commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676465583 ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2083,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: Value

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676478216 ## core/src/main/java/org/apache/iceberg/BaseSnapshot.java: ## @@ -62,14 +70,56 @@ class BaseSnapshot implements Snapshot { Map summary, Integer schemaId,

Re: [PR] Repair manifest action [iceberg]

2024-07-12 Thread via GitHub
danielcweeks commented on PR #10445: URL: https://github.com/apache/iceberg/pull/10445#issuecomment-2226375695 > It looks similar to my attempt in #2608 @szehon-ho would you have any issue if we proceed with this PR? I think there's overlap between the two, but this one addresses two

Re: [PR] [Docs] Add examples for DataFrame branch writes [iceberg]

2024-07-12 Thread via GitHub
anuragmantri commented on code in PR #10644: URL: https://github.com/apache/iceberg/pull/10644#discussion_r1676475608 ## docs/docs/spark-writes.md: ## @@ -228,6 +232,24 @@ SET spark.wap.branch = audit-branch INSERT INTO prod.db.table VALUES (3, 'c'); ``` +### Via DataFrames

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676470278 ## api/src/main/java/org/apache/iceberg/ManifestListFile.java: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contrib

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676469711 ## api/src/main/java/org/apache/iceberg/ManifestListFile.java: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contrib

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676467771 ## api/src/main/java/org/apache/iceberg/ManifestListFile.java: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contrib

Re: [PR] Allow writing dataframes that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-12 Thread via GitHub
syun64 commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676467035 ## tests/integration/test_writes/test_writes.py: ## @@ -964,18 +964,38 @@ def test_sanitize_character_partitioned(catalog: Catalog) -> None: assert len(tbl.sc

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676465401 ## api/src/main/java/org/apache/iceberg/ManifestListFile.java: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contrib

Re: [PR] Allow writing dataframes that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-12 Thread via GitHub
syun64 commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676465583 ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2083,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: Value

Re: [PR] Allow writing dataframes that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-12 Thread via GitHub
syun64 commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676463963 ## pyiceberg/io/pyarrow.py: ## @@ -1450,14 +1451,17 @@ def field_partner(self, partner_struct: Optional[pa.Array], field_id: int, _: st except ValueEr

Re: [PR] Manifest list encryption [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1676462895 ## api/src/main/java/org/apache/iceberg/Snapshot.java: ## @@ -162,6 +162,16 @@ default Iterable removedDeleteFiles(FileIO io) { */ String manifestListLocation();

[PR] Update _check_compatible_schema to support subset of schema [iceberg-python]

2024-07-12 Thread via GitHub
syun64 opened a new pull request, #921: URL: https://github.com/apache/iceberg-python/pull/921 (no comment)

Re: [PR] Deprecate to_requested_schema [iceberg-python]

2024-07-12 Thread via GitHub
HonahX merged PR #918: URL: https://github.com/apache/iceberg-python/pull/918

Re: [PR] Spark: Add SparkSQLProperty to control split-size [iceberg]

2024-07-12 Thread via GitHub
sumedhsakdeo commented on PR #10336: URL: https://github.com/apache/iceberg/pull/10336#issuecomment-2226308996 Thank you so much @szehon-ho for your contribution to spark side. OPTIONS is indeed the right way to achieve this functionality. I was wondering if UPDATE and DELETE support is als

Re: [I] glue.endpoint config implementation? [iceberg-python]

2024-07-12 Thread via GitHub
HonahX closed issue #414: glue.endpoint config implementation? URL: https://github.com/apache/iceberg-python/issues/414

Re: [PR] Glue endpoint config variable, continue #530 [iceberg-python]

2024-07-12 Thread via GitHub
HonahX merged PR #920: URL: https://github.com/apache/iceberg-python/pull/920

Re: [I] [Bug] Load the proper AWS credential for glue/dynamodb catalog [iceberg-python]

2024-07-12 Thread via GitHub
jayceslesar commented on issue #892: URL: https://github.com/apache/iceberg-python/issues/892#issuecomment-2226257388 maybe its fine to just break compatibility leading up to 1.0.0? What does the java iceberg do when looking for s3 credentials?

Re: [PR] [Docs] Add examples for DataFrame branch writes [iceberg]

2024-07-12 Thread via GitHub
anuragmantri commented on code in PR #10644: URL: https://github.com/apache/iceberg/pull/10644#discussion_r1676383706 ## docs/docs/spark-writes.md: ## @@ -332,6 +332,30 @@ The writer must enable the `mergeSchema` option. ```scala data.writeTo("prod.db.sample").option("mergeSch

Re: [PR] Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit [iceberg]

2024-07-12 Thread via GitHub
grantatspothero commented on code in PR #10523: URL: https://github.com/apache/iceberg/pull/10523#discussion_r1676380801 ## core/src/main/java/org/apache/iceberg/FastAppend.java: ## @@ -192,6 +192,11 @@ protected void cleanUncommitted(Set committed) { } } + @Override

Re: [PR] Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit [iceberg]

2024-07-12 Thread via GitHub
grantatspothero commented on code in PR #10523: URL: https://github.com/apache/iceberg/pull/10523#discussion_r1676375211 ## core/src/main/java/org/apache/iceberg/FastAppend.java: ## @@ -198,6 +198,14 @@ protected void cleanUncommitted(Set committed) { } } + @Override

Re: [PR] support PyArrow timestamptz with Etc/UTC [iceberg-python]

2024-07-12 Thread via GitHub
HonahX commented on PR #910: URL: https://github.com/apache/iceberg-python/pull/910#issuecomment-2226229012 Merged! Thanks for the great work from @syun64 and reviews from @Fokko @kevinjqliu

Re: [I] Cannot cast a datetime type with a timezone into a timestampz type. [iceberg-python]

2024-07-12 Thread via GitHub
HonahX closed issue #863: Cannot cast a datetime type with a timezone into a timestampz type. URL: https://github.com/apache/iceberg-python/issues/863

Re: [PR] support PyArrow timestamptz with Etc/UTC [iceberg-python]

2024-07-12 Thread via GitHub
HonahX merged PR #910: URL: https://github.com/apache/iceberg-python/pull/910

Re: [I] Chnage the description in Table metadata spec about the cardinality/mapping between snapshot and puffin [iceberg]

2024-07-12 Thread via GitHub
karuppayya commented on issue #10693: URL: https://github.com/apache/iceberg/issues/10693#issuecomment-2226179436 cc: @findepi

Re: [PR] Allow writing dataframes that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-12 Thread via GitHub
syun64 commented on code in PR #829: URL: https://github.com/apache/iceberg-python/pull/829#discussion_r1676337956 ## pyiceberg/table/__init__.py: ## @@ -484,10 +484,6 @@ def append(self, df: pa.Table, snapshot_properties: Dict[str, str] = EMPTY_DICT) _check_schema_com

Re: [I] provide night build [iceberg-python]

2024-07-12 Thread via GitHub
kevinjqliu commented on issue #734: URL: https://github.com/apache/iceberg-python/issues/734#issuecomment-2226162839 Closing in favor of #872

Re: [I] provide night build [iceberg-python]

2024-07-12 Thread via GitHub
kevinjqliu closed issue #734: provide night build URL: https://github.com/apache/iceberg-python/issues/734

Re: [PR] Allow writing dataframes that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-12 Thread via GitHub
kevinjqliu commented on code in PR #829: URL: https://github.com/apache/iceberg-python/pull/829#discussion_r1676323868 ## pyiceberg/table/__init__.py: ## @@ -484,10 +484,6 @@ def append(self, df: pa.Table, snapshot_properties: Dict[str, str] = EMPTY_DICT) _check_schema

Re: [PR] Core: Prevent dropping column which is referenced by active partition… [iceberg]

2024-07-12 Thread via GitHub
amogh-jahagirdar commented on PR #10352: URL: https://github.com/apache/iceberg/pull/10352#issuecomment-2226155639 Sorry about the delay on this, got busy and forgot I had this open! I've seen more related issue reports to this, so I'm going to prioritize it.

Re: [I] How to move Iceberg table from one location to another [iceberg]

2024-07-12 Thread via GitHub
namangoel31 commented on issue #3142: URL: https://github.com/apache/iceberg/issues/3142#issuecomment-2226154660 @cccs-jc, how do you determine the schema for writing to avro? I'm not able to get anything useful.

Re: [PR] Core:support redis and http lock-manager [iceberg]

2024-07-12 Thread via GitHub
danielcweeks commented on PR #10688: URL: https://github.com/apache/iceberg/pull/10688#issuecomment-2226150038 I don't think we should be adding new `LockManager` implementations. The general discussion has been that we want to deprecate catalog implementations that rely on external lockin

Re: [PR] Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit [iceberg]

2024-07-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #10523: URL: https://github.com/apache/iceberg/pull/10523#discussion_r1676308188 ## core/src/main/java/org/apache/iceberg/FastAppend.java: ## @@ -198,6 +198,14 @@ protected void cleanUncommitted(Set committed) { } } + @Overrid

Re: [PR] Core:support redis and http lock-manager [iceberg]

2024-07-12 Thread via GitHub
rdblue commented on code in PR #10688: URL: https://github.com/apache/iceberg/pull/10688#discussion_r1676303612 ## build.gradle: ## @@ -358,6 +358,7 @@ project(':iceberg-core') { implementation libs.jackson.databind implementation libs.caffeine implementation libs

Re: [PR] Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit [iceberg]

2024-07-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #10523: URL: https://github.com/apache/iceberg/pull/10523#discussion_r1676297839 ## core/src/main/java/org/apache/iceberg/FastAppend.java: ## @@ -198,6 +198,14 @@ protected void cleanUncommitted(Set committed) { } } + @Overrid

Re: [PR] Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit [iceberg]

2024-07-12 Thread via GitHub
amogh-jahagirdar commented on PR #10523: URL: https://github.com/apache/iceberg/pull/10523#issuecomment-2226118154 Thanks @grantatspothero the overall approach makes sense and this time it is closely dependent on the internal state of `FastAppend` which combined with the new tests should ma

Re: [PR] Core: Allow SnapshotProducer to skip uncommitted manifest cleanup after commit [iceberg]

2024-07-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #10523: URL: https://github.com/apache/iceberg/pull/10523#discussion_r1676113425 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -565,6 +570,10 @@ protected boolean canInheritSnapshotId() { return canInheritSnap

Re: [PR] #10668 - Support case-insensitivity for column names in PartitionSpec [iceberg]

2024-07-12 Thread via GitHub
dramaticlly commented on PR #10678: URL: https://github.com/apache/iceberg/pull/10678#issuecomment-2226008338 > > @sl255051 appreciate you are taking the stub for the PR. > > But I am wondering why do you think column name case insensitivity is the right behavior when building PartitionSp

Re: [PR] Core: Prevent dropping column which is referenced by active partition… [iceberg]

2024-07-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #10352: URL: https://github.com/apache/iceberg/pull/10352#discussion_r1676213273 ## core/src/main/java/org/apache/iceberg/SchemaUpdate.java: ## @@ -533,6 +537,34 @@ private static Schema applyChanges( } } +Map> specToDel

Re: [PR] #10668 - Support case-insensitivity for column names in PartitionSpec [iceberg]

2024-07-12 Thread via GitHub
sl255051 commented on PR #10678: URL: https://github.com/apache/iceberg/pull/10678#issuecomment-2225993288 > @sl255051 appreciate you are taking the stub for the PR. > > But I am wondering why do you think column name case insensitivity is the right behavior when building PartitionSpe

Re: [PR] Docs: Update defaults for distribution mode [iceberg]

2024-07-12 Thread via GitHub
szehon-ho commented on code in PR #10575: URL: https://github.com/apache/iceberg/pull/10575#discussion_r1676206823 ## docs/docs/configuration.md: ## @@ -67,7 +67,7 @@ Iceberg tables support table properties to configure table behavior, like the de | write.metadata.metrics.colu

Re: [PR] Support Spark Column Stats [iceberg]

2024-07-12 Thread via GitHub
findepi commented on code in PR #10659: URL: https://github.com/apache/iceberg/pull/10659#discussion_r1676177423 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java: ## @@ -175,7 +181,25 @@ public Statistics estimateStatistics() { protected Statis

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-12 Thread via GitHub
findepi commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1676163120 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -192,4 +209,65 @@ public synchronized T next() { return queue.poll(); } } +

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-12 Thread via GitHub
findepi commented on code in PR #10691: URL: https://github.com/apache/iceberg/pull/10691#discussion_r1676170877 ## core/src/main/java/org/apache/iceberg/util/ParallelIterable.java: ## @@ -88,7 +91,18 @@ private ParallelIterator( @Override public void close() {

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-12 Thread via GitHub
findepi commented on PR #10691: URL: https://github.com/apache/iceberg/pull/10691#issuecomment-2225908686 @stevenzwu thanks for your comments! > Curious if you have done any performance testing. echo to another comment. wondering if the default queue size of 10K would affect the thro

Re: [PR] Core: Limit memory used by ParallelIterable [iceberg]

2024-07-12 Thread via GitHub
findepi commented on PR #10691: URL: https://github.com/apache/iceberg/pull/10691#issuecomment-2225902783 > > Can't the caller set a lower limit then, by calling the new constructor overload? > > Yes, that's possible but then you already have to inherit quite a few classes to overloa
