Re: [PR] Exclude reading _pos column if it's not in the scan list [iceberg]

2024-10-31 Thread via GitHub
szehon-ho commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2451380014 Hm then can we just add _pos to requiredSchema (with a comment)? Probably cleaner with a flag to ReadConf but not sure if it's feasible. fyi @aokolnychyi -- This is an

Re: [I] bug: FileScanTask project_field_ids order could be inconsistent with the RecordBatch schema [iceberg-rust]

2024-10-31 Thread via GitHub
chenzl25 commented on issue #627: URL: https://github.com/apache/iceberg-rust/issues/627#issuecomment-2451280714 I think this issue has been resolved by type promotion -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [I] bug: FileScanTask project_field_ids order could be inconsistent with the RecordBatch schema [iceberg-rust]

2024-10-31 Thread via GitHub
chenzl25 closed issue #627: bug: FileScanTask project_field_ids order could be inconsistent with the RecordBatch schema URL: https://github.com/apache/iceberg-rust/issues/627

Re: [PR] Exclude reading _pos column if it's not in the scan list [iceberg]

2024-10-31 Thread via GitHub
huaxingao commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2451241576 @szehon-ho Thanks for the comment. We actually also use the [requiredSchema](https://github.com/apache/iceberg/blob/fda2b3a5706fd580b0371e8a7c4b31d536eac0a3/spark/v3.5/spark/src

[I] feat: Support Parquet modular encryption [iceberg-rust]

2024-10-31 Thread via GitHub
adamreeve opened a new issue, #686: URL: https://github.com/apache/iceberg-rust/issues/686 Hi The Java Iceberg implementation is adding support for using native Parquet modular encryption, which is being developed as part of version 3 of the Iceberg specification: https://github.com/

Re: [I] Core: checkpoint validation in BaseOverwriteFiles [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] closed issue #9718: Core: checkpoint validation in BaseOverwriteFiles URL: https://github.com/apache/iceberg/issues/9718

Re: [PR] Exclude reading _pos column if it's not in the scan list [iceberg]

2024-10-31 Thread via GitHub
szehon-ho commented on PR #11390: URL: https://github.com/apache/iceberg/pull/11390#issuecomment-2451120638 Sorry I still wanted to see if it can be done earlier, what do you think https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/Batc

Re: [I] how to integrations object storage ceph ? [iceberg]

2024-10-31 Thread via GitHub
ravileg commented on issue #7158: URL: https://github.com/apache/iceberg/issues/7158#issuecomment-2451116433 > I am looking for a solution to use Ceph with Iceberg. Currently I use MinIO, but we are looking for an alternative solution to replace MinIO. Could you share technical detail how to co

Re: [PR] Add ManifestFile Stats in snapshot summary. [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on PR #10246: URL: https://github.com/apache/iceberg/pull/10246#issuecomment-2451055953 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [I] Connect to multiple Azure accounts [iceberg-python]

2024-10-31 Thread via GitHub
github-actions[bot] commented on issue #636: URL: https://github.com/apache/iceberg-python/issues/636#issuecomment-2451058448 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Connect to multiple Azure accounts [iceberg-python]

2024-10-31 Thread via GitHub
github-actions[bot] closed issue #636: Connect to multiple Azure accounts URL: https://github.com/apache/iceberg-python/issues/636

Re: [I] Documentation page returning 404 [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on issue #10249: URL: https://github.com/apache/iceberg/issues/10249#issuecomment-2451056046 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [I] How to reinitialize/refresh iceberg catalog object in spark catalog on an ongoing spark session [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on issue #10227: URL: https://github.com/apache/iceberg/issues/10227#issuecomment-2451055771 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [PR] Don't let `o.a.i.util.Tasks` log unnecessary stack traces [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on PR #10206: URL: https://github.com/apache/iceberg/pull/10206#issuecomment-2451055754 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Spark: Use compressed trie for storing set of files to remove on driver for orphan files [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on PR #10229: URL: https://github.com/apache/iceberg/pull/10229#issuecomment-2451055827 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Parquet: page skipping using filtered row groups for non-vectorized read [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on PR #10228: URL: https://github.com/apache/iceberg/pull/10228#issuecomment-2451055791 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Nessie: Make handleExceptionsForCommits public in NessieUtil [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on PR #10248: URL: https://github.com/apache/iceberg/pull/10248#issuecomment-2451056013 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [I] Flink/Azure job graph serialization fails when used with storage account shared key authentication [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on issue #10245: URL: https://github.com/apache/iceberg/issues/10245#issuecomment-2451055891 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [PR] API, Core: Add Snapshot#dataManifests/deleteManifests APIs which more efficiently filter by added snapshot ID [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on PR #10244: URL: https://github.com/apache/iceberg/pull/10244#issuecomment-2451055858 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Build: Move build configurations to project dirs [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on PR #10097: URL: https://github.com/apache/iceberg/pull/10097#issuecomment-2451055518 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If

Re: [I] Iceberg Hidden Partitioning and Spark SQL Wide Transformation Optimization [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on issue #10187: URL: https://github.com/apache/iceberg/issues/10187#issuecomment-2451055638 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occur

Re: [PR] Fix incorrect metrics calculation for iceberg table due to column name transformation with special characters. [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on PR #10204: URL: https://github.com/apache/iceberg/pull/10204#issuecomment-2451055726 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [I] Improve read times and reduce size of metadata.json by storing schemas in external files [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] closed issue #9734: Improve read times and reduce size of metadata.json by storing schemas in external files URL: https://github.com/apache/iceberg/issues/9734

Re: [PR] OpenAPI: Add AppendDataFile models to openapi spec for fine grained metadata commits [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on PR #10202: URL: https://github.com/apache/iceberg/pull/10202#issuecomment-2451055680 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pul

Re: [PR] Build: Move build configurations to project dirs [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] closed pull request #10097: Build: Move build configurations to project dirs URL: https://github.com/apache/iceberg/pull/10097

Re: [I] How to insert overwrite with a single commit [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] closed issue #9720: How to insert overwrite with a single commit URL: https://github.com/apache/iceberg/issues/9720

Re: [I] Core: checkpoint validation in BaseOverwriteFiles [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on issue #9718: URL: https://github.com/apache/iceberg/issues/9718#issuecomment-2451055248 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] How to insert overwrite with a single commit [iceberg]

2024-10-31 Thread via GitHub
github-actions[bot] commented on issue #9720: URL: https://github.com/apache/iceberg/issues/9720#issuecomment-2451055266 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Max number of columns [iceberg]

2024-10-31 Thread via GitHub
puchengy commented on issue #9220: URL: https://github.com/apache/iceberg/issues/9220#issuecomment-2451035447 @ajantha-bhat Hi, where are you seeing it is starting from 10k? All I see is it is starting from 1k https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/

Re: [PR] Bump version to 0.8.0 [iceberg-python]

2024-10-31 Thread via GitHub
HonahX merged PR #1276: URL: https://github.com/apache/iceberg-python/pull/1276

Re: [PR] Bump version to 0.8.0 [iceberg-python]

2024-10-31 Thread via GitHub
HonahX commented on PR #1276: URL: https://github.com/apache/iceberg-python/pull/1276#issuecomment-2451020886 @Fokko Thanks!

Re: [PR] Core: Add validation for table commit properties [iceberg]

2024-10-31 Thread via GitHub
dramaticlly commented on code in PR #11437: URL: https://github.com/apache/iceberg/pull/11437#discussion_r1825258756 ## core/src/main/java/org/apache/iceberg/util/PropertyUtil.java: ## @@ -100,6 +102,30 @@ public static String propertyAsString( return defaultValue; } +

[PR] Bump pandas from 2.0.3 to 2.2.3 [iceberg-python]

2024-10-31 Thread via GitHub
dependabot[bot] opened a new pull request, #1282: URL: https://github.com/apache/iceberg-python/pull/1282 Bumps [pandas](https://github.com/pandas-dev/pandas) from 2.0.3 to 2.2.3. Release notes: sourced from pandas's releases (https://github.com/pandas-dev/pandas/releases). P

Re: [PR] Core: Add validation for table commit properties [iceberg]

2024-10-31 Thread via GitHub
dramaticlly commented on code in PR #11437: URL: https://github.com/apache/iceberg/pull/11437#discussion_r1825260078 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -486,6 +489,10 @@ public int propertyAsInt(String property, int defaultValue) { return Pr

Re: [PR] Core: Add validation for table commit properties [iceberg]

2024-10-31 Thread via GitHub
dramaticlly commented on code in PR #11437: URL: https://github.com/apache/iceberg/pull/11437#discussion_r1825257015 ## core/src/main/java/org/apache/iceberg/util/PropertyUtil.java: ## @@ -100,6 +102,30 @@ public static String propertyAsString( return defaultValue; } +

[PR] Bump mypy-boto3-glue from 1.35.25 to 1.35.53 [iceberg-python]

2024-10-31 Thread via GitHub
dependabot[bot] opened a new pull request, #1281: URL: https://github.com/apache/iceberg-python/pull/1281 Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.35.25 to 1.35.53. Commits See full diff in https://github.com/youtype/mypy_boto3_builder/commi

Re: [PR] Core: Add validation for table commit properties [iceberg]

2024-10-31 Thread via GitHub
dramaticlly commented on code in PR #11437: URL: https://github.com/apache/iceberg/pull/11437#discussion_r1825256804 ## core/src/main/java/org/apache/iceberg/TableProperties.java: ## @@ -95,6 +95,13 @@ private TableProperties() {} public static final String COMMIT_TOTAL_RETRY

[PR] Bump pyspark from 3.5.2 to 3.5.3 [iceberg-python]

2024-10-31 Thread via GitHub
dependabot[bot] opened a new pull request, #1280: URL: https://github.com/apache/iceberg-python/pull/1280 Bumps [pyspark](https://github.com/apache/spark) from 3.5.2 to 3.5.3. Commits: 32232e9 (https://github.com/apache/spark/commit/32232e9ed33bb16b93ad58cfde8b82e0f07c0970) Prep

Re: [PR] Core: log retry sleep time [iceberg]

2024-10-31 Thread via GitHub
RussellSpitzer commented on PR #11413: URL: https://github.com/apache/iceberg/pull/11413#issuecomment-2450866674 No, the RC was cut yesterday.

Re: [PR] Manifest list encryption [iceberg]

2024-10-31 Thread via GitHub
RussellSpitzer commented on PR #7770: URL: https://github.com/apache/iceberg/pull/7770#issuecomment-2450883445 @rdblue Did you have any more comments on this one? I can do another pass as well, but I'd like to finish this up soon.

[I] [bug] read from multiple s3 regions [iceberg-python]

2024-10-31 Thread via GitHub
kevinjqliu opened a new issue, #1279: URL: https://github.com/apache/iceberg-python/issues/1279 ### Apache Iceberg version None ### Please describe the bug 🐞 ### Problem I want to read files from multiple s3 regions. For example, my metadata files are in `us-west-2` b

Re: [PR] Core: log retry sleep time [iceberg]

2024-10-31 Thread via GitHub
sullis commented on PR #11413: URL: https://github.com/apache/iceberg/pull/11413#issuecomment-2450863716 @RussellSpitzer will this be included in Iceberg 1.7.0?

Re: [I] [bug] read from multiple s3 regions [iceberg-python]

2024-10-31 Thread via GitHub
kevinjqliu commented on issue #1279: URL: https://github.com/apache/iceberg-python/issues/1279#issuecomment-2450862469 Maybe similar issue for GCS/Azure, since we only cached 1 instance of each FileSystem
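
Editor's sketch (not part of the thread): the comment above suggests that caching a single instance per FileSystem type is what gets in the way of multi-region reads. A minimal illustration in plain Python follows. It assumes a hypothetical `FakeFileSystem` stand-in and a hypothetical `FileSystemCache` helper (neither is a PyIceberg API), and the second region `us-east-1` is made up for the example. The cache is keyed by scheme and region, so each region gets its own client.

```python
from typing import Dict, Optional, Tuple


class FakeFileSystem:
    """Hypothetical stand-in for a region-bound object-store client."""

    def __init__(self, scheme: str, region: Optional[str]) -> None:
        self.scheme = scheme
        self.region = region


class FileSystemCache:
    """Hypothetical cache: one filesystem per (scheme, region) pair.

    Keying by scheme alone would hand the same client, bound to a single
    region, to every caller regardless of where the file lives.
    """

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, Optional[str]], FakeFileSystem] = {}

    def get(self, scheme: str, region: Optional[str] = None) -> FakeFileSystem:
        key = (scheme, region)
        if key not in self._cache:
            # Create the client lazily, the first time this pair is requested.
            self._cache[key] = FakeFileSystem(scheme, region)
        return self._cache[key]


cache = FileSystemCache()
metadata_fs = cache.get("s3", "us-west-2")  # region named in the issue
data_fs = cache.get("s3", "us-east-1")      # assumed second region
```

How the region would be discovered for a given path (per bucket, per table property, or otherwise) is not stated in the thread; the sketch only shows the cache-key idea.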

Re: [PR] Ignore schema merge updates from long -> int [iceberg]

2024-10-31 Thread via GitHub
RussellSpitzer commented on code in PR #11419: URL: https://github.com/apache/iceberg/pull/11419#discussion_r1825187149 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestDataFrameWriterV2.java: ## @@ -226,4 +226,40 @@ public void testWriteWithCaseSensitiveOpt

Re: [PR] Core: log retry sleep time [iceberg]

2024-10-31 Thread via GitHub
RussellSpitzer commented on PR #11413: URL: https://github.com/apache/iceberg/pull/11413#issuecomment-2450851798 Thanks @sullis for the pr and @jbonofre for the fix

Re: [PR] Core: log retry sleep time [iceberg]

2024-10-31 Thread via GitHub
RussellSpitzer merged PR #11413: URL: https://github.com/apache/iceberg/pull/11413

Re: [I] Block writing to sorted tables [iceberg-python]

2024-10-31 Thread via GitHub
Fokko commented on issue #1247: URL: https://github.com/apache/iceberg-python/issues/1247#issuecomment-2450831332 I don't think we should block the writes as that's pretty aggressive. What we do today when a table has a sort order, we write the data, but set the sort-order to none:

[PR] Pass table-token to subsequent requests [iceberg-python]

2024-10-31 Thread via GitHub
Fokko opened a new pull request, #1278: URL: https://github.com/apache/iceberg-python/pull/1278 See open-api spec: https://github.com/apache/iceberg/blob/ea61ee46db17d94f22a5ef11fd913146557bdce7/open-api/rest-catalog-open-api.yaml#L927-L929 Resolves #1113

Re: [PR] Support `Table.to_arrow_batch_reader` to return RecordBatchReader instead of a fully materialized Arrow Table [iceberg-python]

2024-10-31 Thread via GitHub
corleyma commented on code in PR #786: URL: https://github.com/apache/iceberg-python/pull/786#discussion_r1637425676 ## pyiceberg/io/pyarrow.py: ## @@ -1795,15 +1873,19 @@ def write_file(io: FileIO, table_metadata: TableMetadata, tasks: Iterator[WriteT def write_parquet(

Re: [PR] Doc: Update rewrite data files spark procedure [iceberg]

2024-10-31 Thread via GitHub
szehon-ho merged PR #11396: URL: https://github.com/apache/iceberg/pull/11396

Re: [PR] Docs: warn `parallelism > 1` doesn't work for migration procedures [iceberg]

2024-10-31 Thread via GitHub
RussellSpitzer commented on PR #11417: URL: https://github.com/apache/iceberg/pull/11417#issuecomment-2450790986 Thanks @manuzhang We'll figure out that fix before the next release I promise. Thanks @Fokko for review

Re: [PR] Docs: warn `parallelism > 1` doesn't work for migration procedures [iceberg]

2024-10-31 Thread via GitHub
RussellSpitzer merged PR #11417: URL: https://github.com/apache/iceberg/pull/11417

Re: [PR] Doc: Update rewrite data files spark procedure [iceberg]

2024-10-31 Thread via GitHub
szehon-ho commented on PR #11396: URL: https://github.com/apache/iceberg/pull/11396#issuecomment-2450763796 Merged, thanks @dramaticlly , also @RussellSpitzer @himadripal @huaxingao @singhpk234 for reviews !

Re: [PR] Core: delete temp metadata file when version already exists [iceberg]

2024-10-31 Thread via GitHub
leesf commented on PR #11350: URL: https://github.com/apache/iceberg/pull/11350#issuecomment-2449690582 @Fokko I think the PR is good to merge, thanks.

Re: [PR] Add REST Catalog tests to Spark 3.5 integration test [iceberg]

2024-10-31 Thread via GitHub
haizhou-zhao commented on code in PR #11093: URL: https://github.com/apache/iceberg/pull/11093#discussion_r1825102384 ## open-api/src/testFixtures/java/org/apache/iceberg/rest/RCKUtils.java: ## @@ -107,4 +116,18 @@ static void purgeCatalogTestEntries(RESTCatalog catalog) {

Re: [I] Support Vended Credentials for Azure Data Lake Store [iceberg-python]

2024-10-31 Thread via GitHub
sfc-gh-tbenroeck commented on issue #1146: URL: https://github.com/apache/iceberg-python/issues/1146#issuecomment-2450669506 I created a custom FileIO fix as a temporary workaround and I've submitted [Polaris #418](https://github.com/apache/polaris/issues/) ``` catalog = load_catalog(

Re: [PR] Add REST Catalog tests to Spark 3.5 integration test [iceberg]

2024-10-31 Thread via GitHub
haizhou-zhao commented on code in PR #11093: URL: https://github.com/apache/iceberg/pull/11093#discussion_r1825101193 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestBaseWithCatalog.java: ## @@ -59,18 +86,30 @@ protected static Object[][] parameters() { }

Re: [PR] Spark: Add view support to SparkSessionCatalog [iceberg]

2024-10-31 Thread via GitHub
danielcweeks commented on code in PR #11388: URL: https://github.com/apache/iceberg/pull/11388#discussion_r1825020356 ## open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java: ## @@ -37,12 +38,19 @@ public class RESTCatalogServer { private static fina

Re: [PR] Add REST Catalog tests to Spark 3.5 integration test [iceberg]

2024-10-31 Thread via GitHub
haizhou-zhao commented on code in PR #11093: URL: https://github.com/apache/iceberg/pull/11093#discussion_r1825097791 ## open-api/src/testFixtures/java/org/apache/iceberg/rest/RCKUtils.java: ## @@ -107,4 +116,18 @@ static void purgeCatalogTestEntries(RESTCatalog catalog) {

Re: [I] Expose PyIceberg table as PyArrow Dataset [iceberg-python]

2024-10-31 Thread via GitHub
corleyma commented on issue #30: URL: https://github.com/apache/iceberg-python/issues/30#issuecomment-2450705382 @kevinjqliu alas it's not as simple for iceberg because of the need to do field id-based projection to handle schema evolution. Somewhat relatedly: from what I remember,
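
Editor's sketch (not part of the thread): the point above is that Iceberg resolves columns by field ID rather than by name, so a reader has to map each file's columns onto the current table schema. A minimal illustration in plain Python follows. The schemas, the row, and the helper `project_by_field_id` are all hypothetical and only show why a name-based lookup would miss a renamed column.

```python
from typing import Any, Dict

# Hypothetical schemas, each mapping field ID -> column name.
file_schema: Dict[int, str] = {1: "id", 2: "name"}  # as written in the data file
table_schema: Dict[int, str] = {1: "id", 2: "full_name", 3: "email"}  # 2 renamed, 3 added


def project_by_field_id(
    row: Dict[str, Any],
    file_schema: Dict[int, str],
    table_schema: Dict[int, str],
) -> Dict[str, Any]:
    """Project a row read with the file schema onto the current table schema."""
    projected: Dict[str, Any] = {}
    for field_id, current_name in table_schema.items():
        name_in_file = file_schema.get(field_id)
        if name_in_file is None:
            # Column was added after this file was written: fill with null.
            projected[current_name] = None
        else:
            # Same field ID, possibly under an older name in the file.
            projected[current_name] = row[name_in_file]
    return projected


old_row = {"id": 7, "name": "Ada"}
new_row = project_by_field_id(old_row, file_schema, table_schema)
```

A lookup by name alone would find no `full_name` column in the old file, which is the kind of mismatch the comment alludes to when it says a plain dataset wrapper is not enough.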

Re: [PR] Add REST Catalog tests to Spark 3.5 integration test [iceberg]

2024-10-31 Thread via GitHub
haizhou-zhao commented on code in PR #11093: URL: https://github.com/apache/iceberg/pull/11093#discussion_r1825086332 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestBaseWithCatalog.java: ## @@ -36,16 +41,46 @@ import org.apache.iceberg.catalog.SupportsNamespaces

Re: [I] Runtime jars are not including module's license and notice [iceberg]

2024-10-31 Thread via GitHub
manuzhang commented on issue #11431: URL: https://github.com/apache/iceberg/issues/11431#issuecomment-2449176257 Do you mean `sources` jar like https://repository.apache.org/content/repositories/orgapacheiceberg-1175/org/apache/iceberg/iceberg-spark-runtime-3.5_2.13/1.7.0/iceberg-spark-runti

Re: [PR] Spark: Add view support to SparkSessionCatalog [iceberg]

2024-10-31 Thread via GitHub
danielcweeks commented on code in PR #11388: URL: https://github.com/apache/iceberg/pull/11388#discussion_r1825021901 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/BaseCatalog.java: ## @@ -35,7 +35,9 @@ abstract class BaseCatalog ProcedureCatalog,

Re: [PR] API: Add RemoveUnusedSpecs in Table [iceberg]

2024-10-31 Thread via GitHub
amogh-jahagirdar closed pull request #10755: API: Add RemoveUnusedSpecs in Table URL: https://github.com/apache/iceberg/pull/10755

[PR] API: Add RemoveUnusedSpecs in Table [iceberg]

2024-10-31 Thread via GitHub
advancedxy opened a new pull request, #10755: URL: https://github.com/apache/iceberg/pull/10755 This is a continuation of #3462; all credit should go to @RussellSpitzer. Previously there was no way to remove partition specs from a table once they were added. To fix this we

Re: [PR] API: Add RemoveUnusedSpecs in Table [iceberg]

2024-10-31 Thread via GitHub
amogh-jahagirdar commented on PR #10755: URL: https://github.com/apache/iceberg/pull/10755#issuecomment-2450573869 @advancedxy I updated the PR to your branch https://github.com/advancedxy/iceberg/pull/1 in case there was still agreement on adding all of this metadata cleanup as part of sna

Re: [PR] Add REST Catalog tests to Spark 3.5 integration test [iceberg]

2024-10-31 Thread via GitHub
danielcweeks commented on code in PR #11093: URL: https://github.com/apache/iceberg/pull/11093#discussion_r1824745679 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestBaseWithCatalog.java: ## @@ -59,18 +86,30 @@ protected static Object[][] parameters() { }

[PR] Bump version to 0.8.0 [iceberg-python]

2024-10-31 Thread via GitHub
Fokko opened a new pull request, #1276: URL: https://github.com/apache/iceberg-python/pull/1276 (no comment)

Re: [PR] Doc: Update rewrite data files spark procedure [iceberg]

2024-10-31 Thread via GitHub
szehon-ho commented on code in PR #11396: URL: https://github.com/apache/iceberg/pull/11396#discussion_r1824948163 ## docs/docs/spark-procedures.md: ## @@ -402,7 +403,13 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile | `rewrite-all` | false

Re: [PR] Doc: Update rewrite data files spark procedure [iceberg]

2024-10-31 Thread via GitHub
szehon-ho commented on code in PR #11396: URL: https://github.com/apache/iceberg/pull/11396#discussion_r1824947630 ## docs/docs/spark-procedures.md: ## @@ -402,7 +403,13 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile | `rewrite-all` | false

Re: [PR] Core: Add validation for table commit properties [iceberg]

2024-10-31 Thread via GitHub
szehon-ho commented on code in PR #11437: URL: https://github.com/apache/iceberg/pull/11437#discussion_r1824969871 ## core/src/main/java/org/apache/iceberg/util/PropertyUtil.java: ## @@ -100,6 +102,30 @@ public static String propertyAsString( return defaultValue; } +

Re: [PR] Spark: Add view support to SparkSessionCatalog [iceberg]

2024-10-31 Thread via GitHub
danielcweeks commented on code in PR #11388: URL: https://github.com/apache/iceberg/pull/11388#discussion_r1824960092 ## open-api/src/testFixtures/java/org/apache/iceberg/rest/RCKUtils.java: ## @@ -76,15 +76,21 @@ static Map environmentCatalogConfig() { HashMap:

Re: [PR] Doc: Update rewrite data files spark procedure [iceberg]

2024-10-31 Thread via GitHub
dramaticlly commented on code in PR #11396: URL: https://github.com/apache/iceberg/pull/11396#discussion_r1824919971 ## docs/docs/spark-procedures.md: ## @@ -447,9 +453,9 @@ Using the same defaults as bin-pack to determine which files to rewrite. CALL catalog_name.system.rewri

Re: [I] Expose PyIceberg table as PyArrow Dataset [iceberg-python]

2024-10-31 Thread via GitHub
kevinjqliu commented on issue #30: URL: https://github.com/apache/iceberg-python/issues/30#issuecomment-2450455606 Reference to delta table's `to_pyarrow_dataset` implementation https://github.com/delta-io/delta-rs/blob/3f355d87119661fc7cf28877b620b589277ba1d1/python/deltalake/table.py#L

Re: [I] javax.net.ssl.SSLException: Connection reset on S3 w/ S3FileIO and Apache HTTP client [iceberg]

2024-10-31 Thread via GitHub
SandeepSinghGahir commented on issue #10340: URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2450412161 Hi, I just found out in the milestones that v1.7 will no longer support Java 8. However, AWS glue 4.0 only supports Java 8. Therefore, we won't be able to use v1.7. I al

Re: [PR] Add list_views for hive catalog [iceberg-python]

2024-10-31 Thread via GitHub
kevinjqliu commented on code in PR #1251: URL: https://github.com/apache/iceberg-python/pull/1251#discussion_r1824793043 ## pyiceberg/catalog/hive.py: ## @@ -407,7 +407,30 @@ def register_table(self, identifier: Union[str, Identifier], metadata_location: raise NotImple

Re: [PR] Core: Store schema and spec in TaskContext to avoid unnecessary deserialization (#11235) [iceberg]

2024-10-31 Thread via GitHub
singhpk234 commented on PR #11280: URL: https://github.com/apache/iceberg/pull/11280#issuecomment-2450322800 This sounds interesting, is it just the ser-de that causes this ? what about the increase in memory pressure to hold this in memory ?

Re: [PR] Doc: Update rewrite data files spark procedure [iceberg]

2024-10-31 Thread via GitHub
szehon-ho commented on code in PR #11396: URL: https://github.com/apache/iceberg/pull/11396#discussion_r1817399329 ## docs/docs/spark-procedures.md: ## @@ -393,6 +393,7 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile | `max-concurrent-file-g

Re: [PR] Doc: Update rewrite data files spark procedure [iceberg]

2024-10-31 Thread via GitHub
szehon-ho commented on code in PR #11396: URL: https://github.com/apache/iceberg/pull/11396#discussion_r1824797640 ## docs/docs/spark-procedures.md: ## @@ -402,7 +403,12 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile | `rewrite-all` | false

Re: [I] Support for timestamp downcasting when loading data to iceberg tables [iceberg-python]

2024-10-31 Thread via GitHub
kevinjqliu commented on issue #1045: URL: https://github.com/apache/iceberg-python/issues/1045#issuecomment-2450338545 The documentation is at https://py.iceberg.apache.org/configuration/#nanoseconds-support Do you think there's a better place to signal this to the users?

Re: [PR] Deprecate for 0.8.0 release [iceberg-python]

2024-10-31 Thread via GitHub
kevinjqliu commented on code in PR #1269: URL: https://github.com/apache/iceberg-python/pull/1269#discussion_r1824783872 ## mkdocs/docs/configuration.md: ## @@ -341,7 +341,7 @@ catalog: !!! warning "Deprecated Properties" -`profile_name`, `region_name`, `botocore_sessio

Re: [PR] fix: list_tables method in glue catalog now only return tables. [iceberg-python]

2024-10-31 Thread via GitHub
kevinjqliu commented on PR #1258: URL: https://github.com/apache/iceberg-python/pull/1258#issuecomment-2450290813 Thanks for the contribution @omkenge !

Re: [PR] Add REST Catalog tests to Spark 3.5 integration test [iceberg]

2024-10-31 Thread via GitHub
danielcweeks commented on PR #11093: URL: https://github.com/apache/iceberg/pull/11093#issuecomment-2450262762 Hey @haizhou-zhao this is looking really close. Some minor comments on how I think we can improve the Extension handling to better isolate the test, but other than that, I think i

Re: [PR] Add REST Catalog tests to Spark 3.5 integration test [iceberg]

2024-10-31 Thread via GitHub
danielcweeks commented on code in PR #11093: URL: https://github.com/apache/iceberg/pull/11093#discussion_r1824749990 ## open-api/src/testFixtures/java/org/apache/iceberg/rest/RCKUtils.java: ## @@ -76,15 +78,22 @@ static Map environmentCatalogConfig() { HashMap:

Re: [PR] Add REST Catalog tests to Spark 3.5 integration test [iceberg]

2024-10-31 Thread via GitHub
danielcweeks commented on code in PR #11093: URL: https://github.com/apache/iceberg/pull/11093#discussion_r1824745679 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/TestBaseWithCatalog.java: ## @@ -59,18 +86,30 @@ protected static Object[][] parameters() { }

Re: [PR] Add REST Catalog tests to Spark 3.5 integration test [iceberg]

2024-10-31 Thread via GitHub
danielcweeks commented on code in PR #11093: URL: https://github.com/apache/iceberg/pull/11093#discussion_r1824709276 ## open-api/src/testFixtures/java/org/apache/iceberg/rest/RCKUtils.java: ## @@ -107,4 +116,18 @@ static void purgeCatalogTestEntries(RESTCatalog catalog) {

Re: [PR] Replace `numpy` usage and remove from `pyproject.toml` [iceberg-python]

2024-10-31 Thread via GitHub
Fokko merged PR #1272: URL: https://github.com/apache/iceberg-python/pull/1272

Re: [I] [RewriteDataFiles] add option from-snapshot to support minor compaction [iceberg]

2024-10-31 Thread via GitHub
mkegelCognism commented on issue #10824: URL: https://github.com/apache/iceberg/issues/10824#issuecomment-2449900384 @xianyouQ any news around this? Did you implement a workaround for this or has this been implemented?

Re: [I] Runtime jars are not including module's license and notice [iceberg]

2024-10-31 Thread via GitHub
Fokko commented on issue #11431: URL: https://github.com/apache/iceberg/issues/11431#issuecomment-2449806988 No problem at all. These things are important for the release, so it is better to double-check than to ship non-ASF-compliant releases.

Re: [PR] open-api: Build runtime jar for test fixture [iceberg]

2024-10-31 Thread via GitHub
ajantha-bhat commented on PR #11279: URL: https://github.com/apache/iceberg/pull/11279#issuecomment-2449765015 Hadoop common might still need some dependency during runtime. Let me test end to end.

Re: [PR] (AWS) Docs: List all AWS S3 properties from all language impl. [iceberg]

2024-10-31 Thread via GitHub
Neuw84 commented on code in PR #11383: URL: https://github.com/apache/iceberg/pull/11383#discussion_r1824257182 ## docs/docs/aws.md: ## @@ -717,13 +724,21 @@ install_dependencies () { install_dependencies $LIB_PATH $ICEBERG_MAVEN_URL $ICEBERG_VERSION "${ICEBERG_PACKAGES[@]}"

Re: [PR] open-api: Build runtime jar for test fixture [iceberg]

2024-10-31 Thread via GitHub
ajantha-bhat commented on PR #11279: URL: https://github.com/apache/iceberg/pull/11279#issuecomment-2449642232 @Fokko: Thanks for the review. We can merge this and I can rebase the docker PR.

Re: [PR] (AWS) Docs: List all AWS S3 properties from all language impl. [iceberg]

2024-10-31 Thread via GitHub
Neuw84 commented on code in PR #11383: URL: https://github.com/apache/iceberg/pull/11383#discussion_r1824255200 ## docs/docs/aws.md: ## @@ -669,6 +669,13 @@ Users can use catalog properties to override the defaults. For example, to confi --conf spark.sql.catalog.my_catalog.htt

Re: [I] Runtime jars are not including module's license and notice [iceberg]

2024-10-31 Thread via GitHub
ajantha-bhat commented on issue #11431: URL: https://github.com/apache/iceberg/issues/11431#issuecomment-2449626642 Thanks. The problem seems to be with Archive Utility.app on the Mac, which was overwriting identical files without a prompt. I can confirm it is not the issu

Re: [PR] Spark: support rewrite on specified target branch [iceberg]

2024-10-31 Thread via GitHub
amitgilad3 commented on PR #8797: URL: https://github.com/apache/iceberg/pull/8797#issuecomment-2449684887 Hey @jackye1995, I addressed all your comments, but for some reason one test fails and I am not able to reproduce it locally. Any chance you could assist with this?

Re: [I] Runtime jars are not including module's license and notice [iceberg]

2024-10-31 Thread via GitHub
ajantha-bhat closed issue #11431: Runtime jars are not including module's license and notice URL: https://github.com/apache/iceberg/issues/11431

Re: [I] Unsupported Spark Creating Views Operation for s3_catalog [iceberg]

2024-10-31 Thread via GitHub
nastra closed issue #11440: Unsupported Spark Creating Views Operation for s3_catalog URL: https://github.com/apache/iceberg/issues/11440

Re: [I] Unsupported Spark Creating Views Operation for s3_catalog [iceberg]

2024-10-31 Thread via GitHub
nastra commented on issue #11440: URL: https://github.com/apache/iceberg/issues/11440#issuecomment-2449609037 This is because your `s3_catalog` is of type `hadoop`, and that catalog type doesn't support Iceberg views.
