[PR] feat(table): Implement converting Iceberg schema and types to Arrow [iceberg-go]

2024-10-09 Thread via GitHub
zeroshade opened a new pull request, #168: URL: https://github.com/apache/iceberg-go/pull/168 #155 implemented the conversion of Arrow schemas to Iceberg which will be needed for reading data from Parquet files or otherwise. This PR implements the reverse, converting Iceberg schemas and typ

Re: [I] Support commit retries [iceberg-python]

2024-10-09 Thread via GitHub
kevinjqliu commented on issue #269: URL: https://github.com/apache/iceberg-python/issues/269#issuecomment-2402909492 As a workaround, to manually retry commits, update the table metadata by using `table = table.refresh()` before calling `commit()` again -- This is an
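
The refresh-then-retry workaround in this comment can be sketched as a small generic loop. This is only an illustration under stated assumptions, not PyIceberg's API: `commit_with_retry`, `CommitConflict`, and the two callables are hypothetical names. In real code `refresh` would wrap something like `table = table.refresh()`, and `attempt_commit` would rebuild and commit the change against the refreshed metadata.

```python
import time


class CommitConflict(Exception):
    """Hypothetical stand-in for whatever error signals a stale-metadata commit."""


def commit_with_retry(refresh, attempt_commit, max_attempts=3, backoff_seconds=0.0):
    """Call attempt_commit(); on a conflict, call refresh() and try again.

    refresh and attempt_commit are caller-supplied callables (assumptions of
    this sketch). The last conflict is re-raised once max_attempts is exhausted.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_commit()
        except CommitConflict as err:
            last_error = err
            if attempt == max_attempts:
                break
            refresh()  # pick up the latest table metadata before retrying
            time.sleep(backoff_seconds * attempt)  # linear backoff between attempts
    raise last_error
```

The change itself has to be re-applied on top of the refreshed metadata inside `attempt_commit`; retrying an update that was built from the stale state would conflict again.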

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1793737541 ## .github/workflows/java-ci.yml: ## @@ -108,7 +108,7 @@ jobs: runs-on: ubuntu-22.04 strategy: matrix: -jvm: [11, 17, 21] +jvm

Re: [PR] Flink: FlinkSink & IcebergSink desynchronized tests alignment [iceberg]

2024-10-09 Thread via GitHub
rodmeneses commented on PR #11249: URL: https://github.com/apache/iceberg/pull/11249#issuecomment-2403483118 @pvary @stevenzwu could you guys please start the CI pipelines on this PR? Thanks -- This is an automated message from the Apache Git Service. To respond to the message, plea

Re: [PR] Flink: Tests alignment for the Flink Sink v2-based implemenation (IcebergSink) [iceberg]

2024-10-09 Thread via GitHub
rodmeneses commented on PR #11219: URL: https://github.com/apache/iceberg/pull/11219#issuecomment-2403492186 Hi @arkadius I have started working on backporting the RANGE distribution to the IcebergSink. The unit tests in my code will benefit from the new marker interface you are introdu

[PR] Bump pypa/cibuildwheel from 2.21.1 to 2.21.3 [iceberg-python]

2024-10-09 Thread via GitHub
dependabot[bot] opened a new pull request, #1224: URL: https://github.com/apache/iceberg-python/pull/1224 Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.21.1 to 2.21.3. Release notes sourced from https://github.com/pypa/cibuildwheel/releases: pypa/cibuildwh

Re: [PR] Bump pypa/cibuildwheel from 2.21.1 to 2.21.2 [iceberg-python]

2024-10-09 Thread via GitHub
dependabot[bot] commented on PR #1216: URL: https://github.com/apache/iceberg-python/pull/1216#issuecomment-2403520507 Superseded by #1224.

Re: [PR] Bump pypa/cibuildwheel from 2.21.1 to 2.21.2 [iceberg-python]

2024-10-09 Thread via GitHub
dependabot[bot] closed pull request #1216: Bump pypa/cibuildwheel from 2.21.1 to 2.21.2 URL: https://github.com/apache/iceberg-python/pull/1216

Re: [PR] feat(table): add conversion from Arrow Schema to Iceberg [iceberg-go]

2024-10-09 Thread via GitHub
nastra merged PR #155: URL: https://github.com/apache/iceberg-go/pull/155

Re: [PR] build(deps): bump github.com/hamba/avro/v2 from 2.23.0 to 2.26.0 [iceberg-go]

2024-10-09 Thread via GitHub
dependabot[bot] commented on PR #159: URL: https://github.com/apache/iceberg-go/pull/159#issuecomment-2402835620 Looks like github.com/hamba/avro/v2 is up-to-date now, so this is no longer needed.

[PR] build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.27.39 to 1.27.43 [iceberg-go]

2024-10-09 Thread via GitHub
dependabot[bot] opened a new pull request, #167: URL: https://github.com/apache/iceberg-go/pull/167 Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.27.39 to 1.27.43. Commits https://github.com/aws/aws-sdk-go-v2/commit/0cbb5aa17f9078cb45

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.27.39 to 1.27.41 [iceberg-go]

2024-10-09 Thread via GitHub
dependabot[bot] closed pull request #162: build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.27.39 to 1.27.41 URL: https://github.com/apache/iceberg-go/pull/162

[PR] build(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.61.2 to 1.65.2 [iceberg-go]

2024-10-09 Thread via GitHub
dependabot[bot] opened a new pull request, #166: URL: https://github.com/apache/iceberg-go/pull/166 Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.61.2 to 1.65.2. Commits https://github.com/aws/aws-sdk-go-v2/commit/0cbb5aa17f9078cb

Re: [PR] build(deps): bump github.com/hamba/avro/v2 from 2.23.0 to 2.26.0 [iceberg-go]

2024-10-09 Thread via GitHub
dependabot[bot] closed pull request #159: build(deps): bump github.com/hamba/avro/v2 from 2.23.0 to 2.26.0 URL: https://github.com/apache/iceberg-go/pull/159

[PR] build(deps): bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.37 to 1.17.41 [iceberg-go]

2024-10-09 Thread via GitHub
dependabot[bot] opened a new pull request, #165: URL: https://github.com/apache/iceberg-go/pull/165 Bumps [github.com/aws/aws-sdk-go-v2/credentials](https://github.com/aws/aws-sdk-go-v2) from 1.17.37 to 1.17.41. Commits https://github.com/aws/aws-sdk-go-v2/commit/0cbb5aa17f907

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.37 to 1.17.39 [iceberg-go]

2024-10-09 Thread via GitHub
dependabot[bot] commented on PR #161: URL: https://github.com/apache/iceberg-go/pull/161#issuecomment-2402837089 Superseded by #165.

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.37 to 1.17.39 [iceberg-go]

2024-10-09 Thread via GitHub
dependabot[bot] closed pull request #161: build(deps): bump github.com/aws/aws-sdk-go-v2/credentials from 1.17.37 to 1.17.39 URL: https://github.com/apache/iceberg-go/pull/161

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.27.39 to 1.27.41 [iceberg-go]

2024-10-09 Thread via GitHub
dependabot[bot] commented on PR #162: URL: https://github.com/apache/iceberg-go/pull/162#issuecomment-2402837888 Superseded by #167.

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.61.2 to 1.65.0 [iceberg-go]

2024-10-09 Thread via GitHub
dependabot[bot] commented on PR #164: URL: https://github.com/apache/iceberg-go/pull/164#issuecomment-2402837310 Superseded by #166.

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.61.2 to 1.65.0 [iceberg-go]

2024-10-09 Thread via GitHub
dependabot[bot] closed pull request #164: build(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.61.2 to 1.65.0 URL: https://github.com/apache/iceberg-go/pull/164

Re: [PR] OpenAPI: Define REST Catalog models for Row-Level Updates [iceberg]

2024-10-09 Thread via GitHub
jackye1995 commented on code in PR #11287: URL: https://github.com/apache/iceberg/pull/11287#discussion_r1793868108 ## open-api/rest-catalog-open-api.py: ## @@ -896,19 +896,6 @@ class SetPartitionStatisticsUpdate(BaseUpdate): ) -class TableRequirement(BaseModel): Revie

Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-10-09 Thread via GitHub
sumedhsakdeo commented on code in PR #11130: URL: https://github.com/apache/iceberg/pull/11130#discussion_r1793865228 ## format/spec.md: ## @@ -298,16 +298,101 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU The set of metadata columns is:

Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11130: URL: https://github.com/apache/iceberg/pull/11130#discussion_r1794003825 ## format/spec.md: ## @@ -298,16 +298,101 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU The set of metadata columns i

Re: [PR] Spec v3: Add deletion vectors to the table spec [iceberg]

2024-10-09 Thread via GitHub
rdblue commented on code in PR #11240: URL: https://github.com/apache/iceberg/pull/11240#discussion_r1794307274 ## format/spec.md: ## @@ -841,14 +842,38 @@ Notes: ## Delete Formats -This section details how to encode row-level deletes in Iceberg delete files. Row-level del

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
huaxingao commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1794321464 ## gradle/libs.versions.toml: ## @@ -47,6 +47,7 @@ flink120 = { strictly = "1.20.0"} google-libraries-bom = "26.47.0" guava = "33.3.0-jre" hadoop2 = "2.7.3" +had

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
huaxingao commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1794322243 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/source/TestCompressionSettings.java: ## @@ -108,14 +108,14 @@ public static Object[][] parameters() {

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
huaxingao commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1794321692 ## spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -30,7 +30,7 @@ import org.

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
huaxingao commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1794322082 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -138,6 +138,7 @@ public class TestRewriteDataFilesAction e

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
huaxingao commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1794321900 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestSystemFunctionPushDownInRowLevelOperations.java: ## @@ -260,7 +267,12 @@ privat

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1793776491 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/source/TestStructuredStreaming.java: ## @@ -302,4 +303,12 @@ private MemoryStream newMemoryStream(i

Re: [PR] Core: fix NPE with HadoopFileIO because FileIOParser doesn't serialize Hadoop configuration [iceberg]

2024-10-09 Thread via GitHub
stevenzwu commented on PR #10926: URL: https://github.com/apache/iceberg/pull/10926#issuecomment-2402802516 > @stevenzwu for Option 1, wouldn't https://github.com/apache/iceberg/pull/10926/files#r1718243019 also solve the issue with the NPE without introducing a Hadoop dependency on the Par

Re: [PR] feat(table): add conversion from Arrow Schema to Iceberg [iceberg-go]

2024-10-09 Thread via GitHub
zeroshade commented on code in PR #155: URL: https://github.com/apache/iceberg-go/pull/155#discussion_r1793848993 ## table/scanner_test.go: ## @@ -54,13 +54,10 @@ func TestScanner(t *testing.T) { {"test_partitioned_by_years", iceberg.LessThan(iceberg.Reference("

Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11130: URL: https://github.com/apache/iceberg/pull/11130#discussion_r1794079437 ## format/spec.md: ## @@ -298,16 +298,101 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU The set of metadata columns i

Re: [PR] Arrow: add support for null vectors [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #10953: URL: https://github.com/apache/iceberg/pull/10953#discussion_r1794072935 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java: ## @@ -140,12 +141,21 @@ public static class ConstantVectorHolder extends Vector

Re: [PR] Iceberg Kafka Connect :: Writer Per Topic Partition Design [iceberg]

2024-10-09 Thread via GitHub
kumarpritam863 commented on PR #11290: URL: https://github.com/apache/iceberg/pull/11290#issuecomment-2402743316 @bryanck Thanks for the quick response. We were also able to optimise memory usage and performance by adding some configurations like: "iceberg.tables.write-props.write.par

Re: [PR] Config for deciding whether to use Iceberg Time type [iceberg]

2024-10-09 Thread via GitHub
kumarpritam863 commented on PR #11174: URL: https://github.com/apache/iceberg/pull/11174#issuecomment-2402751955 @bryanck thanks for the quick response. Kafka Connect SMTs are a bit slow, and adding one just for this conversion would unnecessarily lower performance. Also as there is n

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1793766774 ## spark/v4.0/spark/src/main/scala/org/apache/spark/sql/stats/ThetaSketchAgg.scala: ## @@ -119,3 +122,12 @@ case class ThetaSketchAgg( compactSketch.toByt

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1793767329 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java: ## @@ -138,6 +138,7 @@ public class TestRewriteDataFilesAct

Re: [PR] Spec: Add v3 types and type promotion [iceberg]

2024-10-09 Thread via GitHub
rdblue commented on code in PR #10955: URL: https://github.com/apache/iceberg/pull/10955#discussion_r1794205253 ## format/spec.md: ## @@ -1089,6 +1118,7 @@ The types below are not currently valid for bucketing, and so are not hashed. Ho | Primitive type | Hash specificat

[PR] Remove spring-boot dependency [iceberg]

2024-10-09 Thread via GitHub
jbonofre opened a new pull request, #11291: URL: https://github.com/apache/iceberg/pull/11291 This PR removes the spring-boot dependency from Iceberg and implements Aliyun OSS Mock using the JDK HTTP server.

Re: [PR] Spec: Add v3 types and type promotion [iceberg]

2024-10-09 Thread via GitHub
rdblue commented on code in PR #10955: URL: https://github.com/apache/iceberg/pull/10955#discussion_r1794202328 ## format/spec.md: ## @@ -1089,6 +1118,7 @@ The types below are not currently valid for bucketing, and so are not hashed. Ho | Primitive type | Hash specificat

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-09 Thread via GitHub
rdblue commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1794211865 ## format/puffin-spec.md: ## @@ -123,6 +123,44 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct values,

Re: [PR] More accurate estimate on parquet row groups size [iceberg]

2024-10-09 Thread via GitHub
jinyangli34 commented on PR #11258: URL: https://github.com/apache/iceberg/pull/11258#issuecomment-2403412692 Ran the benchmark again, with `NUM_RECORDS` increased from 1M to 5M. Tested 4 groups: **main**: main branch without the change in this PR; **PR**: this PR; **PR+2**: two more getBuffe

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-09 Thread via GitHub
rdblue commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1794213504 ## format/puffin-spec.md: ## @@ -123,6 +123,44 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct values,

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-09 Thread via GitHub
rdblue commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1794214522 ## format/puffin-spec.md: ## @@ -123,6 +123,44 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct values,

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-09 Thread via GitHub
rdblue commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1794227286 ## format/puffin-spec.md: ## @@ -123,6 +123,44 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct values,

Re: [PR] [For testing only] Testing the BaseIncrementalChangelogScan implementation from #9888 [iceberg]

2024-10-09 Thread via GitHub
wypoon closed pull request #10954: [For testing only] Testing the BaseIncrementalChangelogScan implementation from #9888 URL: https://github.com/apache/iceberg/pull/10954

Re: [I] Support commit retries [iceberg-python]

2024-10-09 Thread via GitHub
maxlucuta commented on issue #269: URL: https://github.com/apache/iceberg-python/issues/269#issuecomment-2402969232 I have also experienced being unable to write to tables in highly distributed environments. Refreshing the table in isolation, in addition to adding some retry logic, did not

Re: [PR] open-api: Build runtime jar for test fixture [iceberg]

2024-10-09 Thread via GitHub
danielcweeks commented on code in PR #11279: URL: https://github.com/apache/iceberg/pull/11279#discussion_r1793937666 ## build.gradle: ## @@ -1006,6 +1009,37 @@ project(':iceberg-open-api') { recommend.set(true) } check.dependsOn('validateRESTCatalogSpec') + + // Cre

Re: [PR] Arrow: add support for null vectors [iceberg]

2024-10-09 Thread via GitHub
slessard commented on code in PR #10953: URL: https://github.com/apache/iceberg/pull/10953#discussion_r1793922257 ## arrow/src/test/java/org/apache/iceberg/arrow/vectorized/ArrowReaderTest.java: ## @@ -262,6 +265,142 @@ public void testReadColumnFilter2() throws Exception {

Re: [PR] Spec: Support geo type [iceberg]

2024-10-09 Thread via GitHub
Kontinuation commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1793717159 ## format/spec.md: ## @@ -1286,6 +1291,7 @@ This serialization scheme is for storing single values as individual binary valu | **`struct`** | N

Re: [PR] AWS: Introduce opt-in S3LocationProvider which is optimized for S3 performance [iceberg]

2024-10-09 Thread via GitHub
danielcweeks commented on PR #2: URL: https://github.com/apache/iceberg/pull/2#issuecomment-2402888017 > What do you think about this approach @danielcweeks: > > > Is there an optimal number of directories and depth? maybe we can just create those and put rest of the entropy i

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-09 Thread via GitHub
rdblue commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1794238145 ## format/puffin-spec.md: ## @@ -123,6 +123,44 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct values,

Re: [PR] Spec: Add v3 types and type promotion [iceberg]

2024-10-09 Thread via GitHub
rdblue merged PR #10955: URL: https://github.com/apache/iceberg/pull/10955

Re: [PR] Puffin: Add delete-vector-v1 blob type [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11238: URL: https://github.com/apache/iceberg/pull/11238#discussion_r1794250448 ## format/puffin-spec.md: ## @@ -123,6 +123,49 @@ The blob metadata for this blob may include following properties: - `ndv`: estimate of number of distinct

Re: [PR] Spec: Support geo type [iceberg]

2024-10-09 Thread via GitHub
Kontinuation commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1793695769 ## format/spec.md: ## @@ -1102,6 +1105,7 @@ Hash results are not dependent on decimal scale, which is part of the type, not 4. UUIDs are encoded using big endi

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1793770367 ## spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/source/TestCompressionSettings.java: ## @@ -108,14 +108,14 @@ public static Object[][] parameters()

Re: [PR] OpenAPI: Define REST Catalog models for Row-Level Updates [iceberg]

2024-10-09 Thread via GitHub
geruh commented on code in PR #11287: URL: https://github.com/apache/iceberg/pull/11287#discussion_r1793870830 ## open-api/rest-catalog-open-api.py: ## @@ -896,19 +896,6 @@ class SetPartitionStatisticsUpdate(BaseUpdate): ) -class TableRequirement(BaseModel): Review Com

Re: [PR] OpenAPI: Define REST Catalog models for Row-Level Updates [iceberg]

2024-10-09 Thread via GitHub
jackye1995 commented on code in PR #11287: URL: https://github.com/apache/iceberg/pull/11287#discussion_r1793872507 ## open-api/rest-catalog-open-api.yaml: ## @@ -3082,6 +3132,47 @@ components: default-sort-order-id: type: integer +AssertOverwriteRows:

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1793744239 ## gradle/libs.versions.toml: ## @@ -137,6 +139,7 @@ hadoop2-common = { module = "org.apache.hadoop:hadoop-common", version.ref = "ha hadoop2-hdfs = { module

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1793743752 ## gradle/libs.versions.toml: ## @@ -47,6 +47,7 @@ flink120 = { strictly = "1.20.0"} google-libraries-bom = "26.47.0" guava = "33.3.0-jre" hadoop2 = "2.7.3"

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1793748247 ## spark/v4.0/build.gradle: ## @@ -33,6 +34,7 @@ configure(sparkProjects) { force "com.fasterxml.jackson.module:jackson-module-scala_${scalaVersion}:

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1793751529 ## spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -30,7 +30,7 @@ import

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11257: URL: https://github.com/apache/iceberg/pull/11257#discussion_r1793758286 ## spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestSystemFunctionPushDownInRowLevelOperations.java: ## @@ -260,7 +267,12 @@ p

Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11130: URL: https://github.com/apache/iceberg/pull/11130#discussion_r1794135310 ## format/spec.md: ## @@ -298,16 +298,101 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU The set of metadata columns i

Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #11130: URL: https://github.com/apache/iceberg/pull/11130#discussion_r1794135680 ## format/spec.md: ## @@ -298,16 +298,101 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU The set of metadata columns i

Re: [PR] feat: Safer PartitionSpec & SchemalessPartitionSpec [iceberg-rust]

2024-10-09 Thread via GitHub
Xuanwo commented on PR #645: URL: https://github.com/apache/iceberg-rust/pull/645#issuecomment-2402617523 Hi, @c-thiel, thanks a lot for working on this. This issue does seem complex. I didn't join the discussion before, so I will take some time to go through all the references and then rev

Re: [PR] Arrow: add support for null vectors [iceberg]

2024-10-09 Thread via GitHub
RussellSpitzer commented on code in PR #10953: URL: https://github.com/apache/iceberg/pull/10953#discussion_r1794025905 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java: ## @@ -140,12 +141,21 @@ public static class ConstantVectorHolder extends Vector

Re: [PR] Spec: Adds Row Lineage [iceberg]

2024-10-09 Thread via GitHub
sumedhsakdeo commented on code in PR #11130: URL: https://github.com/apache/iceberg/pull/11130#discussion_r1794016589 ## format/spec.md: ## @@ -298,16 +298,101 @@ Iceberg tables must not use field ids greater than 2147483447 (`Integer.MAX_VALU The set of metadata columns is:

[PR] Bump getdaft from 0.3.2 to 0.3.6 [iceberg-python]

2024-10-09 Thread via GitHub
dependabot[bot] opened a new pull request, #1225: URL: https://github.com/apache/iceberg-python/pull/1225 Bumps [getdaft](https://github.com/Eventual-Inc/Daft) from 0.3.2 to 0.3.6. Release notes sourced from getdaft's releases (https://github.com/Eventual-Inc/Daft/releases).

Re: [PR] Bump getdaft from 0.3.2 to 0.3.5 [iceberg-python]

2024-10-09 Thread via GitHub
dependabot[bot] commented on PR #1214: URL: https://github.com/apache/iceberg-python/pull/1214#issuecomment-2403555295 Superseded by #1225.

Re: [PR] Bump getdaft from 0.3.2 to 0.3.5 [iceberg-python]

2024-10-09 Thread via GitHub
dependabot[bot] closed pull request #1214: Bump getdaft from 0.3.2 to 0.3.5 URL: https://github.com/apache/iceberg-python/pull/1214

Re: [PR] Spec v3: Add deletion vectors to the table spec [iceberg]

2024-10-09 Thread via GitHub
rdblue commented on code in PR #11240: URL: https://github.com/apache/iceberg/pull/11240#discussion_r1794325342 ## format/spec.md: ## @@ -841,14 +842,38 @@ Notes: ## Delete Formats -This section details how to encode row-level deletes in Iceberg delete files. Row-level del

Re: [PR] Initial Support for Spark 4.0 preview [iceberg]

2024-10-09 Thread via GitHub
huaxingao commented on PR #11257: URL: https://github.com/apache/iceberg/pull/11257#issuecomment-2403559725 @RussellSpitzer Thanks for your review! I have addressed the comments and switched back to Preview1, along with reverting a few changes I made for Preview2/snapshot. I switched back t

Re: [PR] Arrow: add support for null vectors [iceberg]

2024-10-09 Thread via GitHub
slessard commented on code in PR #10953: URL: https://github.com/apache/iceberg/pull/10953#discussion_r1794325498 ## arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorHolder.java: ## @@ -140,12 +141,21 @@ public static class ConstantVectorHolder extends VectorHolder

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-09 Thread via GitHub
nastra commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1793063470 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -71,6 +72,7 @@ public String partition() { private final PartitionSet deleteFilePartition

Re: [PR] Data loss in the Incremental Co-operative Mode of Rebalancing [iceberg]

2024-10-09 Thread via GitHub
kumarpritam863 commented on PR #11289: URL: https://github.com/apache/iceberg/pull/11289#issuecomment-2401889072 @bryanck @fqaiser94 could you please review this? Thanks.

[PR] fix(arrow): Use new ParquetMetaDataReader instead [iceberg-rust]

2024-10-09 Thread via GitHub
Xuanwo opened a new pull request, #661: URL: https://github.com/apache/iceberg-rust/pull/661 This PR uses the new `ParquetMetaDataReader` instead of the old `metadata_load` API.

Re: [PR] fix(arrow): Use new ParquetMetaDataReader instead [iceberg-rust]

2024-10-09 Thread via GitHub
Xuanwo commented on PR #661: URL: https://github.com/apache/iceberg-rust/pull/661#issuecomment-2401903861 Hi, @liurenjie1024, would you like to take a quick look at this PR? This will fix the CI on our main branch.

Re: [PR] REST: Docker file for Rest catalog adapter image [iceberg]

2024-10-09 Thread via GitHub
kevinjqliu commented on code in PR #11283: URL: https://github.com/apache/iceberg/pull/11283#discussion_r1793598016 ## docker/iceberg-rest-adapter-image/Dockerfile: ## @@ -0,0 +1,44 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor licens

Re: [PR] Add clarifying docs to transform result types [iceberg-python]

2024-10-09 Thread via GitHub
kevinjqliu merged PR #1211: URL: https://github.com/apache/iceberg-python/pull/1211

Re: [PR] REST: Docker file for Rest catalog adapter image [iceberg]

2024-10-09 Thread via GitHub
kevinjqliu commented on code in PR #11283: URL: https://github.com/apache/iceberg/pull/11283#discussion_r1792157759 ## docker/iceberg-rest-adapter-image/Dockerfile: ## @@ -0,0 +1,44 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor licens

Re: [PR] Add clarifying docs to transform result types [iceberg-python]

2024-10-09 Thread via GitHub
kevinjqliu commented on PR #1211: URL: https://github.com/apache/iceberg-python/pull/1211#issuecomment-2402460430 Merged! Thanks @kevinzwang for the contribution!

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-09 Thread via GitHub
nastra commented on PR #11158: URL: https://github.com/apache/iceberg/pull/11158#issuecomment-2402473805 > Discussed with @nastra offline, overall feel like the change is good but I think it's worth running some benchmarks as a sanity check. There really shouldn't be much of a change afterw

Re: [PR] REST: Docker file for Rest catalog adapter image [iceberg]

2024-10-09 Thread via GitHub
ajantha-bhat commented on code in PR #11283: URL: https://github.com/apache/iceberg/pull/11283#discussion_r1793038900 ## docker/iceberg-rest-adapter-image/README.md: ## @@ -0,0 +1,87 @@ + + +# Iceberg rest adapter image + +For converting different catalog implementations into a

Re: [PR] feat: add gcp oauth support [iceberg-rust]

2024-10-09 Thread via GitHub
Xuanwo merged PR #654: URL: https://github.com/apache/iceberg-rust/pull/654

[PR] feat: Derive PartialEq for FileScanTask [iceberg-rust]

2024-10-09 Thread via GitHub
Xuanwo opened a new pull request, #660: URL: https://github.com/apache/iceberg-rust/pull/660 This PR derives `PartialEq` for `FileScanTask` so users can compare them when needed.

Re: [PR] chore(deps): Bump crate-ci/typos from 1.24.6 to 1.25.0 [iceberg-rust]

2024-10-09 Thread via GitHub
liurenjie1024 closed pull request #658: chore(deps): Bump crate-ci/typos from 1.24.6 to 1.25.0 URL: https://github.com/apache/iceberg-rust/pull/658

Re: [PR] chore(deps): Bump crate-ci/typos from 1.24.6 to 1.25.0 [iceberg-rust]

2024-10-09 Thread via GitHub
dependabot[bot] commented on PR #658: URL: https://github.com/apache/iceberg-rust/pull/658#issuecomment-2401567568 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [PR] REST: Docker file for Rest catalog adapter image [iceberg]

2024-10-09 Thread via GitHub
ajantha-bhat commented on code in PR #11283: URL: https://github.com/apache/iceberg/pull/11283#discussion_r1793036463 ## docker/iceberg-rest-adapter-image/Dockerfile: ## @@ -0,0 +1,44 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor lice

Re: [PR] REST: Docker file for Rest catalog adapter image [iceberg]

2024-10-09 Thread via GitHub
ajantha-bhat commented on code in PR #11283: URL: https://github.com/apache/iceberg/pull/11283#discussion_r1793036021 ## docker/iceberg-rest-adapter-image/Dockerfile: ## @@ -0,0 +1,44 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor lice

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-09 Thread via GitHub
nastra commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1793045220 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -372,8 +367,14 @@ private boolean manifestHasDeletedFiles( for (ManifestEntry entry :

[PR] Workers gets stuck as there is no-coordinator for emitting Start_Commit request in Incremental Cooperative Rebalancing[ICR] Mode [iceberg]

2024-10-09 Thread via GitHub
kumarpritam863 opened a new pull request, #11288: URL: https://github.com/apache/iceberg/pull/11288 **PROBLEMS:** The **open** and **close** methods of a sink task receive only the **delta** of **partitions**. For example, in **open(Collection partitions)**, partitions are not the complete

Re: [PR] Workers gets stuck as there is no-coordinator for emitting Start_Commit request in Incremental Cooperative Rebalancing[ICR] Mode [iceberg]

2024-10-09 Thread via GitHub
kumarpritam863 commented on PR #11288: URL: https://github.com/apache/iceberg/pull/11288#issuecomment-2401874390 @bryanck @fqaiser94 could you please review this PR? Thanks.

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-09 Thread via GitHub
nastra commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1793462877 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -533,4 +531,51 @@ private Pair metricsEvaluator return metricsEvaluators.get(partitio

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-09 Thread via GitHub
nastra commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1793489743 ## core/src/main/java/org/apache/iceberg/FastAppend.java: ## @@ -215,7 +213,7 @@ private List writeNewManifests() throws IOException { } if (newManifests

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-09 Thread via GitHub
nastra commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1793063470 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -71,6 +72,7 @@ public String partition() { private final PartitionSet deleteFilePartition

Re: [PR] Core: Switch usage to DataFileSet / DeleteFileSet [iceberg]

2024-10-09 Thread via GitHub
nastra commented on code in PR #11158: URL: https://github.com/apache/iceberg/pull/11158#discussion_r1793049678 ## core/src/main/java/org/apache/iceberg/ManifestFilterManager.java: ## @@ -71,6 +72,7 @@ public String partition() { private final PartitionSet deleteFilePartition
