Re: [PR] Flink: flink/*: replaced .size() > 0 with isEmpty() [iceberg]
nastra commented on PR #8819: URL: https://github.com/apache/iceberg/pull/8819#issuecomment-1761008540 @PickBas thanks for working on this. I think it would be great to combine all changes in a single PR rather than opening 1-file PRs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Derive 'operation' from view version [iceberg]
nastra commented on code in PR #8678: URL: https://github.com/apache/iceberg/pull/8678#discussion_r1357853053 ## core/src/main/java/org/apache/iceberg/view/BaseMetastoreViewCatalog.java: ## @@ -161,7 +162,7 @@ private View create(ViewOperations ops) { .defaultNamespace(defaultNamespace) .defaultCatalog(defaultCatalog) .timestampMillis(System.currentTimeMillis()) - .putSummary("operation", "create") + .putAllSummary(EnvironmentContext.get()) Review Comment: in https://iceberg.apache.org/view-spec/#summary we mention that the summary might contain the engine name/version, so I figured it makes sense to add this info from `EnvironmentContext` here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Derive 'operation' from view version [iceberg]
nastra commented on code in PR #8678: URL: https://github.com/apache/iceberg/pull/8678#discussion_r1357855073 ## core/src/test/java/org/apache/iceberg/view/ViewCatalogTests.java: ## @@ -99,7 +100,7 @@ public void basicCreateView() { .timestampMillis(view.currentVersion().timestampMillis()) .versionId(1) .schemaId(0) -.putSummary("operation", "create") +.summary(view.currentVersion().summary()) Review Comment: we could also use `EnvironmentContext.get()` here for comparison, but I'm not sure if this might later cause test issues when view support is added for other engines -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: suport read/write Manifest [iceberg-rust]
liurenjie1024 commented on code in PR #79: URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1357853771 ## crates/iceberg/src/spec/partition.rs: ## @@ -54,6 +56,24 @@ impl PartitionSpec { pub fn builder() -> PartitionSpecBuilder { PartitionSpecBuilder::default() } + +/// Construct the partition type from schema +pub fn partition_type(&self, schema: &Schema) -> Result { +let mut fields = Vec::with_capacity(self.fields.len()); +for field in &self.fields { +let field = schema.field_by_id(field.source_id).ok_or_else(|| { Review Comment: I think this is incorrect? The partition type should be determined by type after transformation. ## crates/iceberg/src/spec/values.rs: ## @@ -966,6 +978,547 @@ mod timestamptz { } } +mod serde { Review Comment: I would suggest to move this into another pr and resolve them first, otherwise this pr would get too large to review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: flink/*: replaced .size() > 0 with isEmpty() [iceberg]
PickBas commented on PR #8819: URL: https://github.com/apache/iceberg/pull/8819#issuecomment-1761042424 > @PickBas thanks for working on this. I think it would be great to combine all changes in a single PR rather than opening 1-file PRs @nastra Thank you for your feedback. I've done 1 PR per module so we could keep the changes traceable and manageable, as @Fokko requested in PR #8813. I may do it another way: keep PRs open for spark and core modules, since they include most of the changes, and combine the rest of the modules into a single PR. If that works for both you and @Fokko, I will surely do that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: suport read/write Manifest [iceberg-rust]
ZENOTME commented on code in PR #79: URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1357920808 ## crates/iceberg/src/spec/values.rs: ## @@ -966,6 +978,547 @@ mod timestamptz { } } +mod serde { Review Comment: If `RawLiteral` looks well, I can seperate it to another PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: suport read/write Manifest [iceberg-rust]
ZENOTME commented on code in PR #79: URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1357921009 ## crates/iceberg/src/spec/partition.rs: ## @@ -54,6 +56,24 @@ impl PartitionSpec { pub fn builder() -> PartitionSpecBuilder { PartitionSpecBuilder::default() } + +/// Construct the partition type from schema +pub fn partition_type(&self, schema: &Schema) -> Result { +let mut fields = Vec::with_capacity(self.fields.len()); +for field in &self.fields { +let field = schema.field_by_id(field.source_id).ok_or_else(|| { Review Comment: Yes. My bad.🥵 ## crates/iceberg/src/spec/partition.rs: ## @@ -54,6 +56,24 @@ impl PartitionSpec { pub fn builder() -> PartitionSpecBuilder { PartitionSpecBuilder::default() } + +/// Construct the partition type from schema +pub fn partition_type(&self, schema: &Schema) -> Result { +let mut fields = Vec::with_capacity(self.fields.len()); +for field in &self.fields { +let field = schema.field_by_id(field.source_id).ok_or_else(|| { Review Comment: Yes. My bad.🥵 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: add builder to TableMetadata interface [iceberg-rust]
liurenjie1024 commented on code in PR #62: URL: https://github.com/apache/iceberg-rust/pull/62#discussion_r1357880510 ## crates/iceberg/src/spec/table_metadata.rs: ## @@ -93,22 +116,278 @@ pub struct TableMetadata { /// previous metadata file location should be added to the list. /// Tables can be configured to remove oldest metadata log entries and /// keep a fixed-size log of the most recent entries after a commit. +#[builder(default, setter(custom))] metadata_log: Vec, /// A list of sort orders, stored as full sort order objects. +#[builder(default, setter(custom))] sort_orders: HashMap, /// Default sort order id of the table. Note that this could be used by /// writers, but is not used when reading because reads use the specs /// stored in manifest files. +#[builder(default = "DEFAULT_SORT_ORDER_ID", setter(custom))] default_sort_order_id: i64, ///A map of snapshot references. The map keys are the unique snapshot reference /// names in the table, and the map values are snapshot reference objects. /// There is always a main branch reference pointing to the current-snapshot-id /// even if the refs map is null. +#[builder(default = "Self::default_ref()", setter(custom))] refs: HashMap, } +// We define a from implementation from builder Error to Iceberg Error +impl From for Error { Review Comment: Move this to `error` module? ## crates/iceberg/src/spec/table_metadata.rs: ## @@ -93,22 +116,278 @@ pub struct TableMetadata { /// previous metadata file location should be added to the list. /// Tables can be configured to remove oldest metadata log entries and /// keep a fixed-size log of the most recent entries after a commit. +#[builder(default, setter(custom))] metadata_log: Vec, /// A list of sort orders, stored as full sort order objects. +#[builder(default, setter(custom))] sort_orders: HashMap, /// Default sort order id of the table. Note that this could be used by /// writers, but is not used when reading because reads use the specs /// stored in manifest files. +#[builder(default = "DEFAULT_SORT_ORDER_ID", setter(custom))] default_sort_order_id: i64, ///A map of snapshot references. The map keys are the unique snapshot reference /// names in the table, and the map values are snapshot reference objects. /// There is always a main branch reference pointing to the current-snapshot-id /// even if the refs map is null. +#[builder(default = "Self::default_ref()", setter(custom))] refs: HashMap, } +// We define a from implementation from builder Error to Iceberg Error +impl From for Error { +fn from(ufe: UninitializedFieldError) -> Error { +Error::new(ErrorKind::DataInvalid, ufe.to_string()) Review Comment: ```suggestion Error::new(ErrorKind::DataInvalid, "Some fields of table metadata not inited") .with_source(ufe) ``` ## crates/iceberg/src/spec/table_metadata.rs: ## @@ -93,22 +116,278 @@ pub struct TableMetadata { /// previous metadata file location should be added to the list. /// Tables can be configured to remove oldest metadata log entries and /// keep a fixed-size log of the most recent entries after a commit. +#[builder(default, setter(custom))] metadata_log: Vec, /// A list of sort orders, stored as full sort order objects. +#[builder(default, setter(custom))] sort_orders: HashMap, /// Default sort order id of the table. Note that this could be used by /// writers, but is not used when reading because reads use the specs /// stored in manifest files. +#[builder(default = "DEFAULT_SORT_ORDER_ID", setter(custom))] default_sort_order_id: i64, ///A map of snapshot references. The map keys are the unique snapshot reference /// names in the table, and the map values are snapshot reference objects. /// There is always a main branch reference pointing to the current-snapshot-id /// even if the refs map is null. +#[builder(default = "Self::default_ref()", setter(custom))] refs: HashMap, } +// We define a from implementation from builder Error to Iceberg Error +impl From for Error { +fn from(ufe: UninitializedFieldError) -> Error { +Error::new(ErrorKind::DataInvalid, ufe.to_string()) +} +} + +impl TableMetadataBuilder { +/// Get current time in ms +fn current_time_ms() -> i64 { +UNIX_EPOCH +.elapsed() +.unwrap() +.as_millis() +.try_into() +.unwrap() +} + +fn default_ref() -> HashMap { +HashMap::from([( +"main".to_string(), +SnapshotReference { +snapshot_id: -1, +retention: SnapshotRetention::Branch { +min_snapshots_to_keep: None, +
Re: [I] Flaky test: TestRemoveOrphanFilesAction3 > orphanedFileRemovedWithParallelTasks [iceberg]
amogh-jahagirdar commented on issue #8824: URL: https://github.com/apache/iceberg/issues/8824#issuecomment-1761134284 Yeah, I'd probably start with making this https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java#L273 a `ConcurrentHashMap.newKeySet();`. Even though the implementation inserts unique elements (file paths), there are no thread safety guarantees on normal hash sets. I think it's possible there could be some weird corruptions of internal state on the underlying table/buckets in the set which lead to missing elements. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Aws: aws/*: replaced .size() > 0 with isEmpty() [iceberg]
PickBas closed pull request #8823: Aws: aws/*: replaced .size() > 0 with isEmpty() URL: https://github.com/apache/iceberg/pull/8823 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Mr: mr/*: replaced .size() > 0 with isEmpty() [iceberg]
PickBas closed pull request #8820: Mr: mr/*: replaced .size() > 0 with isEmpty() URL: https://github.com/apache/iceberg/pull/8820 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Delta-lake: delta-lake/*: replaced .size() > 0 with isEmpty() [iceberg]
PickBas closed pull request #8818: Delta-lake: delta-lake/*: replaced .size() > 0 with isEmpty() URL: https://github.com/apache/iceberg/pull/8818 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Aliyun: aliyun/*: replaced .size() > 0 with isEmpty() [iceberg]
PickBas closed pull request #8821: Aliyun: aliyun/*: replaced .size() > 0 with isEmpty() URL: https://github.com/apache/iceberg/pull/8821 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Hive3: hive3/*: replaced .size() > 0 with isEmpty() [iceberg]
PickBas closed pull request #8817: Hive3: hive3/*: replaced .size() > 0 with isEmpty() URL: https://github.com/apache/iceberg/pull/8817 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Parquet: parquet/*: replaced .size() > 0 with isEmpty() [iceberg]
PickBas closed pull request #8816: Parquet: parquet/*: replaced .size() > 0 with isEmpty() URL: https://github.com/apache/iceberg/pull/8816 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Data: data/*: replaced .size() > 0 with isEmpty() [iceberg]
PickBas closed pull request #8815: Data: data/*: replaced .size() > 0 with isEmpty() URL: https://github.com/apache/iceberg/pull/8815 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Api: api/*: replaced .size() > 0 with isEmpty() [iceberg]
PickBas closed pull request #8822: Api: api/*: replaced .size() > 0 with isEmpty() URL: https://github.com/apache/iceberg/pull/8822 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink, Aliyun, MR, Delta-lake, Hive3, Parquet, Data: replaced .size() > 0 with isEmpty() [iceberg]
PickBas commented on PR #8819: URL: https://github.com/apache/iceberg/pull/8819#issuecomment-1761167713 @amogh-jahagirdar @nastra @Fokko Done. Now the rest of the modules are included in this pull request. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]
chenwyi2 commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-1761169778 Hi @stevenzwu @kengtin this PR can be create too many small files when parition with dt,hout,minute and bucekt(id), suppose paralisim is 120 and bucke number is 8, then 15 writes can write into same one bucket, but there is problem, data from the previous few hours can be into one commit because of data latency, there can be 15000 and more data files if changed partition is up to 1000, can we use complete parition name instead of just bucket? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink, Aliyun, MR, Delta-lake, Hive3, Parquet, Data: replaced .size() > 0 with isEmpty() [iceberg]
Fokko commented on PR #8819: URL: https://github.com/apache/iceberg/pull/8819#issuecomment-1761169780 > If that works for both you and @Fokko, I will surely do that. I'm open to everything. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink, Aliyun, MR, Delta-lake, Hive3, Parquet, Data: replaced .size() > 0 with isEmpty() [iceberg]
Fokko commented on code in PR #8819: URL: https://github.com/apache/iceberg/pull/8819#discussion_r1357986438 ## aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogNamespace.java: ## @@ -105,7 +105,7 @@ public void testNamespaceExists() { public void testListNamespace() { String namespace = createNamespace(); List namespaceList = glueCatalog.listNamespaces(); -Assert.assertTrue(namespaceList.size() > 0); +Assert.assertFalse(namespaceList.isEmpty()); Review Comment: Nice, much easier to read 👍 ## aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogNamespace.java: ## @@ -105,7 +105,7 @@ public void testNamespaceExists() { public void testListNamespace() { String namespace = createNamespace(); List namespaceList = glueCatalog.listNamespaces(); -Assert.assertTrue(namespaceList.size() > 0); +Assert.assertFalse(namespaceList.isEmpty()); Review Comment: Nice, much easier to read 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink, Aliyun, MR, Delta-lake, Hive3, Parquet, Data: replaced .size() > 0 with isEmpty() [iceberg]
nastra commented on code in PR #8819: URL: https://github.com/apache/iceberg/pull/8819#discussion_r1358006481 ## aws/src/integration/java/org/apache/iceberg/aws/glue/TestGlueCatalogNamespace.java: ## @@ -105,7 +105,7 @@ public void testNamespaceExists() { public void testListNamespace() { String namespace = createNamespace(); List namespaceList = glueCatalog.listNamespaces(); -Assert.assertTrue(namespaceList.size() > 0); +Assert.assertFalse(namespaceList.isEmpty()); Review Comment: nit: we might want to switch all Junit4-style assertions to AssertJ in the AWS integration test module (in a separate PR) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]
nastra commented on PR #8808: URL: https://github.com/apache/iceberg/pull/8808#issuecomment-1761210176 @fengjiajie thanks for working on this. Could you please add a test that reproduces the issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat(tables): add basic table implementation [iceberg-go]
nastra commented on code in PR #11: URL: https://github.com/apache/iceberg-go/pull/11#discussion_r1358030118 ## table/metadata.go: ## @@ -0,0 +1,401 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package table + +import ( + "encoding/json" + "errors" + "fmt" + "io" + + "github.com/apache/iceberg-go" + + "github.com/google/uuid" +) + +// Metadata for an iceberg table as specified in the Iceberg spec +// +// https://iceberg.apache.org/spec/#iceberg-table-spec +type Metadata interface { + // Version indicates the version of this metadata, 1 for V1, 2 for V2, etc. + Version() int + // TableUUID returns a UUID that identifies the table, generated when the + // table is created. Implementations must throw an exception if a table's + // UUID does not match the expected UUID after refreshing metadata. + TableUUID() uuid.UUID + // Location is the table's base location. This is used by writers to determine + // where to store data files, manifest files, and table metadata files. + Location() string + // LastUpdatedMillis is the timestamp in milliseconds from the unix epoch when + // the table was last updated. Each table metadata file should update this + // field just before writing. + LastUpdatedMillis() int64 + // LastColumn returns the highest assigned column ID for the table. + // This is used to ensure fields are always assigned an unused ID when + // evolving schemas. + LastColumn() int + // Schemas returns the list of schemas, stored as objects with their + // schema-id. + Schemas() []*iceberg.Schema + // CurrentSchema returns the table's current schema. + CurrentSchema() *iceberg.Schema + // PartitionSpecs returns the list of all partition specs in the table. + PartitionSpecs() []iceberg.PartitionSpec + // PartitionSpec returns the current partition spec that the table is using. + PartitionSpec() iceberg.PartitionSpec + // DefaultPartitionSpec is the ID of the current spec that writers should + // use by default. + DefaultPartitionSpec() int + // LastPartitionSpecID is the highest assigned partition field ID across + // all partition specs for the table. This is used to ensure partition + // fields are always assigned an unused ID when evolving specs. + LastPartitionSpecID() *int + // Snapshots returns the list of valid snapshots. Valid snapshots are + // snapshots for which all data files exist in the file system. A data + // file must not be deleted from the file system until the last snapshot + // in which it was listed is garbage collected. + Snapshots() []Snapshot + // SnapshotByID find and return a specific snapshot by its ID. Returns + // nil if the ID is not found in the list of snapshots. + SnapshotByID(int64) *Snapshot + // SnapshotByName searches the list of snapshots for a snapshot with a given + // ref name. Returns nil if there's no ref with this name for a snapshot. + SnapshotByName(name string) *Snapshot Review Comment: it should be either 0 or 1 snapshot that is returned. In my head I was thinking about `snapshotsById` that we use in the Java impl, so having singular here is fine I think -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat(tables): add basic table implementation [iceberg-go]
nastra commented on code in PR #11: URL: https://github.com/apache/iceberg-go/pull/11#discussion_r1358030652 ## table/table.go: ## @@ -0,0 +1,97 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package table + +import ( + "reflect" + + "github.com/apache/iceberg-go" + "github.com/apache/iceberg-go/io" + "golang.org/x/exp/slices" +) + +type Identifier = []string + +type Table struct { + identifier Identifier + metadata Metadata + metadataLocation string + fs io.IO +} + +func (t Table) Equals(other Table) bool { + return slices.Equal(t.identifier, other.identifier) && + t.metadataLocation == other.metadataLocation && + reflect.DeepEqual(t.metadata, other.metadata) +} + +func (t Table) Identifier() Identifier { return t.identifier } +func (t Table) Metadata() Metadata { return t.metadata } +func (t Table) MetadataLocation() string { return t.metadataLocation } +func (t Table) FS() io.IO{ return t.fs } + +func (t Table) Schema() *iceberg.Schema { return t.metadata.CurrentSchema() } +func (t Table) Spec() iceberg.PartitionSpec { return t.metadata.PartitionSpec() } +func (t Table) SortOrder() SortOrder { return t.metadata.SortOrder() } +func (t Table) Properties() iceberg.Properties { return t.metadata.Properties() } +func (t Table) Location() string { return t.metadata.Location() } +func (t Table) CurrentSnapshot() *Snapshot { return t.metadata.CurrentSnapshot() } +func (t Table) SnapshotByID(id int64) *Snapshot { return t.metadata.SnapshotByID(id) } +func (t Table) SnapshotByName(name string) *Snapshot { return t.metadata.SnapshotByName(name) } Review Comment: ok thanks for clarifying -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat(tables): add basic table implementation [iceberg-go]
nastra commented on code in PR #11: URL: https://github.com/apache/iceberg-go/pull/11#discussion_r1358032220 ## table/table.go: ## @@ -0,0 +1,97 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +package table + +import ( + "reflect" + + "github.com/apache/iceberg-go" + "github.com/apache/iceberg-go/io" + "golang.org/x/exp/slices" +) + +type Identifier = []string + +type Table struct { + identifier Identifier + metadata Metadata + metadataLocation string + fs io.IO +} + +func (t Table) Equals(other Table) bool { + return slices.Equal(t.identifier, other.identifier) && + t.metadataLocation == other.metadataLocation && + reflect.DeepEqual(t.metadata, other.metadata) Review Comment: I think we can go ahead and merge this PR, but this is something to keep in mind for a follow-up -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat(tables): add basic table implementation [iceberg-go]
nastra merged PR #11: URL: https://github.com/apache/iceberg-go/pull/11 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add spec_id back to data file [iceberg-python]
Fokko commented on code in PR #63: URL: https://github.com/apache/iceberg-python/pull/63#discussion_r1358033938 ## pyiceberg/manifest.py: ## @@ -182,6 +182,7 @@ def __repr__(self) -> str: doc="Splittable offsets", ), NestedField(field_id=140, name="sort_order_id", field_type=IntegerType(), required=False, doc="Sort order ID"), +NestedField(field_id=141, name="spec_id", field_type=IntegerType(), required=False, doc="Partition spec ID"), Review Comment: I don't think we want to add the fields here since the `spec_id` isn't read/written. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] [BUG] string row filter ignore 2nd (and onwards) And [iceberg-python]
amogh-jahagirdar commented on issue #64: URL: https://github.com/apache/iceberg-python/issues/64#issuecomment-1761229740 Hm yeah I can repro this with a simple test in `table/test_init.py` ``` scan = table.scan(row_filter="x=1 AND y=1 AND z=1") assert scan.row_filter == And(EqualTo("x", 1), EqualTo("y", 1), EqualTo("z", 1)) ``` this fails the assertion because the row filter ends up being And(EqualTo("x", 1), EqualTo("y", 1)) for some reason. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] [BUG] string row filter ignore 2nd (and onwards) And [iceberg-python]
amogh-jahagirdar commented on issue #64: URL: https://github.com/apache/iceberg-python/issues/64#issuecomment-1761230308 Looking into this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] [BUG] string row filter ignore 2nd (and onwards) And [iceberg-python]
amogh-jahagirdar commented on issue #64: URL: https://github.com/apache/iceberg-python/issues/64#issuecomment-1761250090 https://github.com/apache/iceberg-python/blob/main/pyiceberg/expressions/parser.py#L236 this is where it'll skip anything after. I think this needs to combine all results in a single And (same applies for Or). But I need to do some testing with different combinations, not sure what how PyParse actually passes in the ParseResults when it visits the expression -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]
nastra commented on code in PR #8808: URL: https://github.com/apache/iceberg/pull/8808#discussion_r1358102450 ## flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java: ## @@ -98,6 +100,62 @@ public void testTwoLevelList() throws IOException { } } + @Test + public void testReadBinaryFieldAsString() throws IOException { +Schema schemaForWriteBinary = new Schema(optional(1, "strbytes", Types.BinaryType.get())); +org.apache.avro.Schema avroSchema = AvroSchemaUtil.convert(schemaForWriteBinary.asStruct()); + +File testFile = temp.newFile(); +Assert.assertTrue(testFile.delete()); Review Comment: please use AssertJ-style assertions for newly added tests (https://iceberg.apache.org/contribute/#assertj). This makes it easier to migrate from JUnit4 to JUnit5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Fix compiler warnings [iceberg]
nastra merged PR #8763: URL: https://github.com/apache/iceberg/pull/8763 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Python: Add support for Python 3.12 [iceberg-python]
Fokko commented on PR #35: URL: https://github.com/apache/iceberg-python/pull/35#issuecomment-1761321623 @steinsgateted can you pull in the main branch? It looks like that 3.12 is available: https://github.com/actions/python-versions/releases -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink, Aliyun, MR, Delta-lake, Hive3, Parquet, Data: replaced .size() > 0 with isEmpty() [iceberg]
Fokko merged PR #8819: URL: https://github.com/apache/iceberg/pull/8819 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink, Aliyun, MR, Delta-lake, Hive3, Parquet, Data: replaced .size() > 0 with isEmpty() [iceberg]
Fokko commented on PR #8819: URL: https://github.com/apache/iceberg/pull/8819#issuecomment-1761324612 Thanks @PickBas for picking this up 🙌 and @nastra and @ajantha-bhat for the review 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
amogh-jahagirdar opened a new pull request, #65: URL: https://github.com/apache/iceberg-python/pull/65 Fixes #64 . If there are multiple and/or conditions currently, our expression parser will ignore anything after the second predicate. This change fixes the issue by forwarding the remaining predicates as the argument to `*rest: BooleanExpression` for And and Or. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Run dependabot daily [iceberg-python]
Fokko opened a new pull request, #66: URL: https://github.com/apache/iceberg-python/pull/66 I would love to run dependabot daily instead of weekly. Now we're in our own repository, we'll introduce less noise. This would have helped us to identify the issue with Pydantic earlier. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
amogh-jahagirdar commented on code in PR #65: URL: https://github.com/apache/iceberg-python/pull/65#discussion_r1358113965 ## pyiceberg/expressions/parser.py: ## @@ -233,10 +233,14 @@ def handle_not(result: ParseResults) -> Not: def handle_and(result: ParseResults) -> And: +if len(result[0]) > 2: +return And(result[0][0], result[0][1], *result[0][2:]) return And(result[0][0], result[0][1]) Review Comment: Uh maybe not the most pythonic...I guess a ternary which sets the *args option if it's > 2 otherwise AlwaysTrue (for And) and then for Or would be AlwaysFalse so it just considers the first 2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
Fokko commented on PR #65: URL: https://github.com/apache/iceberg-python/pull/65#issuecomment-1761329387 This is a serious one, thanks for reporting @puchengy -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
Fokko commented on code in PR #65: URL: https://github.com/apache/iceberg-python/pull/65#discussion_r1358122959 ## pyiceberg/expressions/parser.py: ## @@ -233,10 +233,14 @@ def handle_not(result: ParseResults) -> Not: def handle_and(result: ParseResults) -> And: +if len(result[0]) > 2: +return And(result[0][0], result[0][1], *result[0][2:]) return And(result[0][0], result[0][1]) def handle_or(result: ParseResults) -> Or: +if len(result[0]) > 2: +return Or(result[0][0], result[0][1], *result[0][2:]) return Or(result[0][0], result[0][1]) Review Comment: ```suggestion return Or(*result[0]) ``` ## pyiceberg/expressions/parser.py: ## @@ -233,10 +233,14 @@ def handle_not(result: ParseResults) -> Not: def handle_and(result: ParseResults) -> And: +if len(result[0]) > 2: +return And(result[0][0], result[0][1], *result[0][2:]) return And(result[0][0], result[0][1]) Review Comment: What do you think of: ```suggestion return And(*result[0]) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: suport read/write Manifest [iceberg-rust]
liurenjie1024 commented on code in PR #79: URL: https://github.com/apache/iceberg-rust/pull/79#discussion_r1358133729 ## crates/iceberg/src/spec/values.rs: ## @@ -966,6 +978,547 @@ mod timestamptz { } } +mod serde { Review Comment: I'm ok with this approach, cc @JanKaul @Xuanwo -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
amogh-jahagirdar commented on code in PR #65: URL: https://github.com/apache/iceberg-python/pull/65#discussion_r1358137792 ## pyiceberg/expressions/parser.py: ## @@ -233,10 +233,14 @@ def handle_not(result: ParseResults) -> Not: def handle_and(result: ParseResults) -> And: +if len(result[0]) > 2: +return And(result[0][0], result[0][1], *result[0][2:]) return And(result[0][0], result[0][1]) Review Comment: I realized I can just pass in *result[0] -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: First version of rest catalog. [iceberg-rust]
liurenjie1024 commented on code in PR #78: URL: https://github.com/apache/iceberg-rust/pull/78#discussion_r1357791341 ## crates/iceberg/Cargo.toml: ## @@ -41,20 +41,24 @@ either = "1" futures = "0.3" itertools = "0.11" lazy_static = "1" +log = "^0.4" murmur3 = "0.5.2" once_cell = "1" opendal = "0.40" ordered-float = "4.0.0" +reqwest = { version = "^0.11", features = ["json"] } Review Comment: > The problem with the features is that they are add-only, making it difficult for users to disable them. Sorry, I don't get your point, would you give a concrete example? One concern with separate crate approach is that it makes loading catalog dynamically like Python difficult. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
amogh-jahagirdar commented on code in PR #65: URL: https://github.com/apache/iceberg-python/pull/65#discussion_r1358138359 ## pyiceberg/expressions/parser.py: ## @@ -233,10 +233,14 @@ def handle_not(result: ParseResults) -> Not: def handle_and(result: ParseResults) -> And: +if len(result[0]) > 2: +return And(result[0][0], result[0][1], *result[0][2:]) return And(result[0][0], result[0][1]) Review Comment: Haha just came to the realization I can do this and don't need to differentiate based on the lengthpushed up the change. Way better -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
amogh-jahagirdar commented on PR #65: URL: https://github.com/apache/iceberg-python/pull/65#issuecomment-1761366348 Yes thanks @puchengy for reporting this, please also take a look a this fix when you get a chance! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Make `location` in `TableCreation` optional [iceberg-rust]
Xuanwo commented on issue #67: URL: https://github.com/apache/iceberg-rust/issues/67#issuecomment-1761422712 I'm going to make this change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Spark: spark/*: replaced .size() > 0 with isEmpty() [iceberg]
amogh-jahagirdar merged PR #8814: URL: https://github.com/apache/iceberg/pull/8814 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Add View support for REST catalog [iceberg]
nastra commented on code in PR #7913: URL: https://github.com/apache/iceberg/pull/7913#discussion_r1358250961 ## core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java: ## @@ -374,4 +385,107 @@ static TableMetadata commit(TableOperations ops, UpdateTableRequest request) { return ops.current(); } + + public static BaseView baseView(View view) { +Preconditions.checkArgument(view instanceof BaseView, "View must be a BaseView"); +return (BaseView) view; + } + + public static ListTablesResponse listViews(ViewCatalog catalog, Namespace namespace) { +return ListTablesResponse.builder().addAll(catalog.listViews(namespace)).build(); + } + + public static LoadViewResponse createView( + ViewCatalog catalog, Namespace namespace, CreateViewRequest request) { +request.validate(); + +ViewMetadata viewMetadata = request.metadata(); +ViewBuilder viewBuilder = +catalog +.buildView(TableIdentifier.of(namespace, request.name())) +.withSchema(viewMetadata.schema()) +.withProperties(viewMetadata.properties()) + .withDefaultNamespace(viewMetadata.currentVersion().defaultNamespace()) + .withDefaultCatalog(viewMetadata.currentVersion().defaultCatalog()); +viewMetadata.currentVersion().representations().stream() +.filter(r -> r instanceof SQLViewRepresentation) +.map(r -> (SQLViewRepresentation) r) +.forEach(r -> viewBuilder.withQuery(r.dialect(), r.sql())); +View view = viewBuilder.create(); + +return viewResponse(view); + } + + private static LoadViewResponse viewResponse(View view) { +ViewMetadata metadata = baseView(view).operations().current(); +return ImmutableLoadViewResponse.builder() +.metadata(metadata) +.metadataLocation(metadata.metadataFileLocation()) +.build(); + } + + public static LoadViewResponse loadView(ViewCatalog catalog, TableIdentifier viewIdentifier) { +View view = catalog.loadView(viewIdentifier); +return viewResponse(view); + } + + public static LoadViewResponse updateView( + ViewCatalog catalog, TableIdentifier ident, UpdateTableRequest request) { +View view = catalog.loadView(ident); +ViewMetadata metadata = commit(baseView(view).operations(), request); + +return ImmutableLoadViewResponse.builder() +.metadata(metadata) +.metadataLocation(metadata.metadataFileLocation()) +.build(); + } + + public static void renameView(ViewCatalog catalog, RenameTableRequest request) { +catalog.renameView(request.source(), request.destination()); + } + + public static void dropView(ViewCatalog catalog, TableIdentifier viewIdentifier) { +boolean dropped = catalog.dropView(viewIdentifier); +if (!dropped) { + throw new NoSuchViewException("View does not exist: %s", viewIdentifier); +} + } + + static ViewMetadata commit(ViewOperations ops, UpdateTableRequest request) { +AtomicBoolean isRetry = new AtomicBoolean(false); +try { + Tasks.foreach(ops) + .retry(COMMIT_NUM_RETRIES_DEFAULT) + .exponentialBackoff( + COMMIT_MIN_RETRY_WAIT_MS_DEFAULT, + COMMIT_MAX_RETRY_WAIT_MS_DEFAULT, + COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT, + 2.0 /* exponential */) + .onlyRetryOn(CommitFailedException.class) + .run( + taskOps -> { +ViewMetadata base = isRetry.get() ? taskOps.refresh() : taskOps.current(); +isRetry.set(true); + +// apply changes +ViewMetadata.Builder metadataBuilder = ViewMetadata.buildFrom(base); +request.updates().forEach(update -> update.applyTo(metadataBuilder)); Review Comment: > 1.) For creating a view, there probably needs to be a requirement similar to table's AssertTableDoesNotExist but for views, AssertViewDoesNotExist. Similar principle for ReplaceView and verifying the UUID as expected. For tables I believe `UpdateRequirement.AssertTableDoesNotExist` is mainly used for transactional cases, whereas for views we don't have transaction support. I think what would make sense is to have an assertion on the UUID of the view. > 2.) I don't see a metadata update for AddViewRepresentation but I'd imagine that should validate that the SQL dialect does not already exist for the specified version. I forgot where we landed on that discussion, We do want to prevent that right? We do have `MetadataUpdate.AddViewVersion` for this and we do validate now that there's no duplicate dialect (which was fixed as part of #7880), so we should be good here. > 3.) A requirement for the schema ID to make sure that is unchanged from the base metadata? Typically this would be done via `UpdateRequirement.AssertCurrentSchemaID` when there's a `MetadataUpdate.SetCurrentSchema` update, but for Views we don't send a `Metada
[PR] Build: add gradle configuration to enforce reproducible build [iceberg]
jbonofre opened a new pull request, #8826: URL: https://github.com/apache/iceberg/pull/8826 Close #8825 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.15 [iceberg]
jbonofre commented on PR #8238: URL: https://github.com/apache/iceberg/pull/8238#issuecomment-1761561894 I propose to close this PR in favor of #8830 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Upgrade to spring-web 5.3.30 [iceberg]
jbonofre opened a new pull request, #8828: URL: https://github.com/apache/iceberg/pull/8828 Close #8827 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Build: enforce reproducible build [iceberg]
jbonofre opened a new issue, #8825: URL: https://github.com/apache/iceberg/issues/8825 ### Feature Request / Improvement Reproducible builds is a development practice that create an independently-verifiable path from source to binary code. A build is reproducible if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts. Fortunately, Gradle supports reproducible build with a couple of configuration on `AbstractArchiveTask`. ### Query engine None -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]
nastra commented on code in PR #8808: URL: https://github.com/apache/iceberg/pull/8808#discussion_r1358326874 ## flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java: ## @@ -90,17 +92,72 @@ public void testTwoLevelList() throws IOException { .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, type)) .build()) { Iterator rows = reader.iterator(); - Assert.assertTrue("Should have at least one row", rows.hasNext()); + assertThat(rows.hasNext()).as("Should have at least one row").isTrue(); RowData rowData = rows.next(); - Assert.assertArrayEquals(rowData.getArray(0).getBinary(0), expectedByte); - Assert.assertArrayEquals(rowData.getBinary(1), expectedByte); - Assert.assertFalse("Should not have more than one row", rows.hasNext()); + assertThat(expectedByte).isEqualTo(rowData.getArray(0).getBinary(0)); + assertThat(expectedByte).isEqualTo(rowData.getBinary(1)); + assertThat(rows.hasNext()).as("Should not have more than one row").isFalse(); +} + } + + @Test + public void testReadBinaryFieldAsString() throws IOException { +Schema schemaForWriteBinary = new Schema(optional(1, "strbytes", Types.BinaryType.get())); +org.apache.avro.Schema avroSchema = AvroSchemaUtil.convert(schemaForWriteBinary.asStruct()); + +File testFile = temp.newFile(); +assertThat(testFile.delete()).isTrue(); + +ParquetWriter writer = +AvroParquetWriter.builder(new Path(testFile.toURI())) +.withDataModel(GenericData.get()) +.withSchema(avroSchema) +.build(); + +String expectedString = "hello"; + +GenericRecordBuilder recordBuilder = new GenericRecordBuilder(avroSchema); +ByteBuffer expectedBinary = ByteBuffer.wrap(expectedString.getBytes(StandardCharsets.UTF_8)); +recordBuilder.set("strbytes", expectedBinary); +GenericData.Record expectedRecord = recordBuilder.build(); + +writer.write(expectedRecord); +writer.close(); + +// read as string +Schema schemaForReadBinaryAsString = +new Schema(optional(1, "strbytes", Types.StringType.get())); +try (CloseableIterable reader = +Parquet.read(Files.localInput(testFile)) +.project(schemaForReadBinaryAsString) +.createReaderFunc( +type -> FlinkParquetReaders.buildReader(schemaForReadBinaryAsString, type)) +.build()) { + Iterator rows = reader.iterator(); + assertThat(rows.hasNext()).as("Should have at least one row").isTrue(); Review Comment: this should typically be `assertThat(rows).as("...").hasNext()`. The reason for that is that AssertJ provides additional context when an assertion fails and such fluent-style checks are usually preferred over the `.isTrue()`/`.isFalse()` checks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]
nastra commented on code in PR #8808: URL: https://github.com/apache/iceberg/pull/8808#discussion_r1358327260 ## flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java: ## @@ -90,17 +92,72 @@ public void testTwoLevelList() throws IOException { .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, type)) .build()) { Iterator rows = reader.iterator(); - Assert.assertTrue("Should have at least one row", rows.hasNext()); + assertThat(rows.hasNext()).as("Should have at least one row").isTrue(); RowData rowData = rows.next(); - Assert.assertArrayEquals(rowData.getArray(0).getBinary(0), expectedByte); - Assert.assertArrayEquals(rowData.getBinary(1), expectedByte); - Assert.assertFalse("Should not have more than one row", rows.hasNext()); + assertThat(expectedByte).isEqualTo(rowData.getArray(0).getBinary(0)); + assertThat(expectedByte).isEqualTo(rowData.getBinary(1)); + assertThat(rows.hasNext()).as("Should not have more than one row").isFalse(); +} + } + + @Test + public void testReadBinaryFieldAsString() throws IOException { +Schema schemaForWriteBinary = new Schema(optional(1, "strbytes", Types.BinaryType.get())); +org.apache.avro.Schema avroSchema = AvroSchemaUtil.convert(schemaForWriteBinary.asStruct()); + +File testFile = temp.newFile(); +assertThat(testFile.delete()).isTrue(); + +ParquetWriter writer = +AvroParquetWriter.builder(new Path(testFile.toURI())) +.withDataModel(GenericData.get()) +.withSchema(avroSchema) +.build(); + +String expectedString = "hello"; + +GenericRecordBuilder recordBuilder = new GenericRecordBuilder(avroSchema); +ByteBuffer expectedBinary = ByteBuffer.wrap(expectedString.getBytes(StandardCharsets.UTF_8)); +recordBuilder.set("strbytes", expectedBinary); +GenericData.Record expectedRecord = recordBuilder.build(); + +writer.write(expectedRecord); +writer.close(); + +// read as string +Schema schemaForReadBinaryAsString = +new Schema(optional(1, "strbytes", Types.StringType.get())); +try (CloseableIterable reader = +Parquet.read(Files.localInput(testFile)) +.project(schemaForReadBinaryAsString) +.createReaderFunc( +type -> FlinkParquetReaders.buildReader(schemaForReadBinaryAsString, type)) +.build()) { + Iterator rows = reader.iterator(); + assertThat(rows.hasNext()).as("Should have at least one row").isTrue(); + RowData rowData = rows.next(); + assertThat(rowData.getString(0)).isInstanceOf(BinaryStringData.class); + assertThat(rowData.getString(0).toString()).isEqualTo(expectedString); + assertThat(rows.hasNext()).as("Should not have more than one row").isFalse(); Review Comment: this would be `assertThat(rows).as("...").isExhausted()` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Flink: Read parquet BINARY column as String for expected [iceberg]
nastra commented on code in PR #8808: URL: https://github.com/apache/iceberg/pull/8808#discussion_r1358331014 ## flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java: ## @@ -90,17 +92,72 @@ public void testTwoLevelList() throws IOException { .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, type)) .build()) { Iterator rows = reader.iterator(); - Assert.assertTrue("Should have at least one row", rows.hasNext()); + assertThat(rows.hasNext()).as("Should have at least one row").isTrue(); RowData rowData = rows.next(); - Assert.assertArrayEquals(rowData.getArray(0).getBinary(0), expectedByte); - Assert.assertArrayEquals(rowData.getBinary(1), expectedByte); - Assert.assertFalse("Should not have more than one row", rows.hasNext()); + assertThat(expectedByte).isEqualTo(rowData.getArray(0).getBinary(0)); + assertThat(expectedByte).isEqualTo(rowData.getBinary(1)); + assertThat(rows.hasNext()).as("Should not have more than one row").isFalse(); +} + } + + @Test + public void testReadBinaryFieldAsString() throws IOException { +Schema schemaForWriteBinary = new Schema(optional(1, "strbytes", Types.BinaryType.get())); +org.apache.avro.Schema avroSchema = AvroSchemaUtil.convert(schemaForWriteBinary.asStruct()); + +File testFile = temp.newFile(); +assertThat(testFile.delete()).isTrue(); + +ParquetWriter writer = +AvroParquetWriter.builder(new Path(testFile.toURI())) +.withDataModel(GenericData.get()) +.withSchema(avroSchema) +.build(); + +String expectedString = "hello"; + +GenericRecordBuilder recordBuilder = new GenericRecordBuilder(avroSchema); +ByteBuffer expectedBinary = ByteBuffer.wrap(expectedString.getBytes(StandardCharsets.UTF_8)); +recordBuilder.set("strbytes", expectedBinary); +GenericData.Record expectedRecord = recordBuilder.build(); + +writer.write(expectedRecord); +writer.close(); Review Comment: rather than calling `close()` manually, I think it would be better to use `try-with-resources` here: ``` GenericRecordBuilder recordBuilder = new GenericRecordBuilder(avroSchema); ByteBuffer expectedBinary = ByteBuffer.wrap(expectedString.getBytes(StandardCharsets.UTF_8)); recordBuilder.set("strbytes", expectedBinary); GenericData.Record expectedRecord = recordBuilder.build(); try (ParquetWriter writer = AvroParquetWriter.builder(new Path(testFile.toURI())) .withDataModel(GenericData.get()) .withSchema(avroSchema) .build()) { writer.write(expectedRecord); } ``` ## flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java: ## @@ -90,17 +92,72 @@ public void testTwoLevelList() throws IOException { .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, type)) .build()) { Iterator rows = reader.iterator(); - Assert.assertTrue("Should have at least one row", rows.hasNext()); + assertThat(rows.hasNext()).as("Should have at least one row").isTrue(); RowData rowData = rows.next(); - Assert.assertArrayEquals(rowData.getArray(0).getBinary(0), expectedByte); - Assert.assertArrayEquals(rowData.getBinary(1), expectedByte); - Assert.assertFalse("Should not have more than one row", rows.hasNext()); + assertThat(expectedByte).isEqualTo(rowData.getArray(0).getBinary(0)); + assertThat(expectedByte).isEqualTo(rowData.getBinary(1)); + assertThat(rows.hasNext()).as("Should not have more than one row").isFalse(); +} + } + + @Test + public void testReadBinaryFieldAsString() throws IOException { +Schema schemaForWriteBinary = new Schema(optional(1, "strbytes", Types.BinaryType.get())); +org.apache.avro.Schema avroSchema = AvroSchemaUtil.convert(schemaForWriteBinary.asStruct()); + +File testFile = temp.newFile(); +assertThat(testFile.delete()).isTrue(); + +ParquetWriter writer = +AvroParquetWriter.builder(new Path(testFile.toURI())) +.withDataModel(GenericData.get()) +.withSchema(avroSchema) +.build(); + +String expectedString = "hello"; + +GenericRecordBuilder recordBuilder = new GenericRecordBuilder(avroSchema); +ByteBuffer expectedBinary = ByteBuffer.wrap(expectedString.getBytes(StandardCharsets.UTF_8)); +recordBuilder.set("strbytes", expectedBinary); +GenericData.Record expectedRecord = recordBuilder.build(); + +writer.write(expectedRecord); +writer.close(); Review Comment: rather than calling `close()` manually, I think it would be better to use `try-with-resources` here: ``` GenericRecordBuilder recordBuilder = new GenericRecordBuilder(avroSchema); ByteBuffer expectedBinary = ByteBuffer.wrap(e
Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.17 [iceberg]
jbonofre commented on PR #8788: URL: https://github.com/apache/iceberg/pull/8788#issuecomment-1761560686 I propose to close this PR in favor of #8830 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.15 [iceberg]
ajantha-bhat closed pull request #8238: Build: Bump jetty from 9.4.43.v20210629 to 11.0.15 URL: https://github.com/apache/iceberg/pull/8238 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.15 [iceberg]
ajantha-bhat commented on PR #8238: URL: https://github.com/apache/iceberg/pull/8238#issuecomment-1761663164 > I propose to close this PR in favor of https://github.com/apache/iceberg/pull/8830 Agree. Thanks for raising the PR to support the last JDK8 version. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.15 [iceberg]
dependabot[bot] commented on PR #8238: URL: https://github.com/apache/iceberg/pull/8238#issuecomment-1761663265 OK, I won't notify you again about this release, but will get in touch when a new version is available. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.17 [iceberg]
dependabot[bot] commented on PR #8788: URL: https://github.com/apache/iceberg/pull/8788#issuecomment-1761667373 OK, I won't notify you again about this release, but will get in touch when a new version is available. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.17 [iceberg]
ajantha-bhat commented on PR #8788: URL: https://github.com/apache/iceberg/pull/8788#issuecomment-1761667236 I propose to close this PR in favor of https://github.com/apache/iceberg/pull/8830 Agree. Thanks for raising the PR to support the last JDK8 version. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump jetty from 9.4.43.v20210629 to 11.0.17 [iceberg]
ajantha-bhat closed pull request #8788: Build: Bump jetty from 9.4.43.v20210629 to 11.0.17 URL: https://github.com/apache/iceberg/pull/8788 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump org.springframework:spring-web from 5.3.9 to 6.0.13 [iceberg]
ajantha-bhat closed pull request #8811: Build: Bump org.springframework:spring-web from 5.3.9 to 6.0.13 URL: https://github.com/apache/iceberg/pull/8811 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump org.springframework:spring-web from 5.3.9 to 6.0.13 [iceberg]
dependabot[bot] commented on PR #8811: URL: https://github.com/apache/iceberg/pull/8811#issuecomment-1761669389 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file. If you change your mind, just re-open this PR and I'll resolve any conflicts on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump org.springframework:spring-web from 5.3.9 to 6.0.13 [iceberg]
ajantha-bhat commented on PR #8811: URL: https://github.com/apache/iceberg/pull/8811#issuecomment-1761669307 Closing in the favour of https://github.com/apache/iceberg/pull/8828 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] push down min/max/count to iceberg [iceberg]
huaxingao commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761680652 @atifiu File statistics are not accurate and can't be used any more if you use filters. For example, you have table (col int), the max of col is 100, and the min is 0, so the statistics file is ``` max min 1001 ``` If you have `SELECT MAX(col) FROM table`, we can check the statistics file and simple return 100, but if you have `SELECT MAX(col) FROM table WHERE col < 70`, we can't use the statistics file any more. We only know that the `MAX(col)` is smaller than 70, but we have no idea what value it is, so have to compute. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] push down min/max/count to iceberg [iceberg]
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761688659 @huaxingao so you meant to say that with filters whether on partitioned or non partitioned column(s), aggregate pushdown will not work ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
puchengy commented on PR #65: URL: https://github.com/apache/iceberg-python/pull/65#issuecomment-1761695595 @amogh-jahagirdar The test result LGTM. However, I am not familiar with the implementation, please feel free to go ahead and merge. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Upgrade to spring-web 5.3.30 [iceberg]
nastra merged PR #8828: URL: https://github.com/apache/iceberg/pull/8828 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Upgrade to Jetty 9.4.53.v20231009 [iceberg]
nastra closed issue #8829: Upgrade to Jetty 9.4.53.v20231009 URL: https://github.com/apache/iceberg/issues/8829 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Upgrade to Jetty 9.4.53.v20231009 [iceberg]
nastra merged PR #8830: URL: https://github.com/apache/iceberg/pull/8830 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Upgrade to spring-web 5.3.30 [iceberg]
nastra closed issue #8827: Upgrade to spring-web 5.3.30 URL: https://github.com/apache/iceberg/issues/8827 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] push down min/max/count to iceberg [iceberg]
huaxingao commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761909655 If filters are on partitioned columns, aggregate pushdown should work. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] push down min/max/count to iceberg [iceberg]
atifiu commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1761916077 It's not working. Either with between, > or <. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Replace `.size() > 0` with `!.isEmpty()` [iceberg]
amogh-jahagirdar merged PR #8813: URL: https://github.com/apache/iceberg/pull/8813 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: Add View support for REST catalog [iceberg]
nastra commented on code in PR #7913: URL: https://github.com/apache/iceberg/pull/7913#discussion_r1358620356 ## core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java: ## @@ -374,4 +385,107 @@ static TableMetadata commit(TableOperations ops, UpdateTableRequest request) { return ops.current(); } + + public static BaseView baseView(View view) { +Preconditions.checkArgument(view instanceof BaseView, "View must be a BaseView"); +return (BaseView) view; + } + + public static ListTablesResponse listViews(ViewCatalog catalog, Namespace namespace) { +return ListTablesResponse.builder().addAll(catalog.listViews(namespace)).build(); + } + + public static LoadViewResponse createView( + ViewCatalog catalog, Namespace namespace, CreateViewRequest request) { +request.validate(); + +ViewMetadata viewMetadata = request.metadata(); +ViewBuilder viewBuilder = +catalog +.buildView(TableIdentifier.of(namespace, request.name())) +.withSchema(viewMetadata.schema()) +.withProperties(viewMetadata.properties()) + .withDefaultNamespace(viewMetadata.currentVersion().defaultNamespace()) + .withDefaultCatalog(viewMetadata.currentVersion().defaultCatalog()); +viewMetadata.currentVersion().representations().stream() +.filter(r -> r instanceof SQLViewRepresentation) +.map(r -> (SQLViewRepresentation) r) +.forEach(r -> viewBuilder.withQuery(r.dialect(), r.sql())); +View view = viewBuilder.create(); + +return viewResponse(view); + } + + private static LoadViewResponse viewResponse(View view) { +ViewMetadata metadata = baseView(view).operations().current(); +return ImmutableLoadViewResponse.builder() +.metadata(metadata) +.metadataLocation(metadata.metadataFileLocation()) +.build(); + } + + public static LoadViewResponse loadView(ViewCatalog catalog, TableIdentifier viewIdentifier) { +View view = catalog.loadView(viewIdentifier); +return viewResponse(view); + } + + public static LoadViewResponse updateView( + ViewCatalog catalog, TableIdentifier ident, UpdateTableRequest request) { +View view = catalog.loadView(ident); +ViewMetadata metadata = commit(baseView(view).operations(), request); + +return ImmutableLoadViewResponse.builder() +.metadata(metadata) +.metadataLocation(metadata.metadataFileLocation()) +.build(); + } + + public static void renameView(ViewCatalog catalog, RenameTableRequest request) { +catalog.renameView(request.source(), request.destination()); + } + + public static void dropView(ViewCatalog catalog, TableIdentifier viewIdentifier) { +boolean dropped = catalog.dropView(viewIdentifier); +if (!dropped) { + throw new NoSuchViewException("View does not exist: %s", viewIdentifier); +} + } + + static ViewMetadata commit(ViewOperations ops, UpdateTableRequest request) { +AtomicBoolean isRetry = new AtomicBoolean(false); +try { + Tasks.foreach(ops) + .retry(COMMIT_NUM_RETRIES_DEFAULT) + .exponentialBackoff( + COMMIT_MIN_RETRY_WAIT_MS_DEFAULT, + COMMIT_MAX_RETRY_WAIT_MS_DEFAULT, + COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT, + 2.0 /* exponential */) + .onlyRetryOn(CommitFailedException.class) + .run( + taskOps -> { +ViewMetadata base = isRetry.get() ? taskOps.refresh() : taskOps.current(); +isRetry.set(true); + +// apply changes +ViewMetadata.Builder metadataBuilder = ViewMetadata.buildFrom(base); +request.updates().forEach(update -> update.applyTo(metadataBuilder)); Review Comment: I've opened https://github.com/apache/iceberg/pull/8831 to introduce `AssertUUID` (and deprecate `AssertTableUUID`) so that we can use it both for tables and views. Alternatively, we could also introduce `AssertViewUUID` with the same functionality as `AssertTableUUID` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
rdblue commented on code in PR #65: URL: https://github.com/apache/iceberg-python/pull/65#discussion_r1358624936 ## pyiceberg/expressions/parser.py: ## @@ -233,11 +233,11 @@ def handle_not(result: ParseResults) -> Not: def handle_and(result: ParseResults) -> And: -return And(result[0][0], result[0][1]) +return And(*result[0]) Review Comment: Why does the parser produce more than 2 child expressions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
rdblue commented on code in PR #65: URL: https://github.com/apache/iceberg-python/pull/65#discussion_r1358624936 ## pyiceberg/expressions/parser.py: ## @@ -233,11 +233,11 @@ def handle_not(result: ParseResults) -> Not: def handle_and(result: ParseResults) -> And: -return And(result[0][0], result[0][1]) +return And(*result[0]) Review Comment: Why does the parser produce more than 2 child expressions? Is this a result of using `infix_notation` that I didn't realize at the time? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
rdblue commented on PR #65: URL: https://github.com/apache/iceberg-python/pull/65#issuecomment-1761946877 Thanks for jumping on the fix, @amogh-jahagirdar! @Fokko, should we release 0.5.1 with this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
Fokko commented on PR #65: URL: https://github.com/apache/iceberg-python/pull/65#issuecomment-1761997037 > @Fokko, should we release 0.5.1 with this patch? Yes, I think that's a good idea -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Expression: Fix for when multiple and/or expressions are specified via string [iceberg-python]
Fokko merged PR #65: URL: https://github.com/apache/iceberg-python/pull/65 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] [BUG] string row filter ignore 2nd (and onwards) And [iceberg-python]
Fokko closed issue #64: [BUG] string row filter ignore 2nd (and onwards) And URL: https://github.com/apache/iceberg-python/issues/64 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add spec_id back to data file [iceberg-python]
Fokko commented on PR #63: URL: https://github.com/apache/iceberg-python/pull/63#issuecomment-1762014542 @puchengy can you fix the CI? We need to make this part of 0.5.1 since the `spec_id `was there before (as you pointed out on Slack :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] AvroWriter Issue: Incorrect iceberg_to_avro Schema Conversion for Decimal, Fixed, and UUID [iceberg-python]
Fokko closed issue #14: AvroWriter Issue: Incorrect iceberg_to_avro Schema Conversion for Decimal, Fixed, and UUID URL: https://github.com/apache/iceberg-python/issues/14 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Fix Iceberg to Avro Schema Conversion: Fixed, Decimal, UUID [iceberg-python]
Fokko merged PR #53: URL: https://github.com/apache/iceberg-python/pull/53 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add spec_id back to data file [iceberg-python]
puchengy commented on PR #63: URL: https://github.com/apache/iceberg-python/pull/63#issuecomment-1762024206 @Fokko Will do that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add spec_id back to data file [iceberg-python]
Fokko commented on PR #63: URL: https://github.com/apache/iceberg-python/pull/63#issuecomment-1762031697 @puchengy `make install && make lint` should fix it. It also looks like the integration test is failing because FastAvro is missing the `spec_id` (and this is correct because it isn't written, the test should be updated). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add spec_id back to data file [iceberg-python]
puchengy commented on PR #63: URL: https://github.com/apache/iceberg-python/pull/63#issuecomment-1762044854 @Fokko thanks I will address that today or tomorrow. If this becomes a blocker of the release, feel free to take it over. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add spec_id back to data file [iceberg-python]
Fokko commented on PR #63: URL: https://github.com/apache/iceberg-python/pull/63#issuecomment-1762064930 @puchengy let me give it a try -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Add spec_id back to data file [iceberg-python]
Fokko commented on PR #63: URL: https://github.com/apache/iceberg-python/pull/63#issuecomment-1762074280 @puchengy I think this fixes it: https://github.com/puchengy/iceberg-python/pull/1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] push down min/max/count to iceberg [iceberg]
huaxingao commented on PR #6252: URL: https://github.com/apache/iceberg/pull/6252#issuecomment-1762149585 @atifiu I suspect somehow your partition filter isn't completely pushed down. In this [PR](https://github.com/apache/iceberg/pull/6524), we will discard filters that can be completely evaluated using partition metadata. Could you check the EXPLAIN and see if there is still filter? If there is filter, then somehow the partition filter is not completely pushed down and Spark has to filter again, and statistics is not accurate in this case, and we can't push down aggregates. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Update to the latest version [iceberg-python]
Fokko opened a new pull request, #67: URL: https://github.com/apache/iceberg-python/pull/67 Check if the dependencies are okay -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] allow override env-variables in load_catalog [iceberg-python]
Fokko merged PR #45: URL: https://github.com/apache/iceberg-python/pull/45 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Bump version to 0.5.1 [iceberg-python]
Fokko opened a new pull request, #68: URL: https://github.com/apache/iceberg-python/pull/68 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Check for empty responses [iceberg-python]
Fokko opened a new pull request, #69: URL: https://github.com/apache/iceberg-python/pull/69 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Make `next_sequence_number` private [iceberg-python]
Fokko merged PR #62: URL: https://github.com/apache/iceberg-python/pull/62 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Fix fixed type [iceberg-python]
Fokko opened a new pull request, #70: URL: https://github.com/apache/iceberg-python/pull/70 Should be `FIXED_LEN_BYTE_ARRAY` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: manifest list writer [iceberg-rust]
barronw commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1358866184 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -940,4 +1017,104 @@ mod test { r#"[{"manifest_path":"s3a://icebergdata/demo/s1/t1/metadata/05ffe08b-810f-49b3-a8f4-e88fc99b254a-m0.avro","manifest_length":6926,"partition_spec_id":0,"content":0,"sequence_number":1,"min_sequence_number":1,"added_snapshot_id":377075049360453639,"added_data_files_count":1,"existing_data_files_count":0,"deleted_data_files_count":0,"added_rows_count":3,"existing_rows_count":0,"deleted_rows_count":0,"partitions":[{"contains_null":false,"contains_nan":false,"lower_bound":[1,0,0,0,0,0,0,0],"upper_bound":[1,0,0,0,0,0,0,0]}],"key_metadata":null}]"# ); } + +#[tokio::test] +async fn test_manifest_list_writer_v1() { +let expected_manifest_list = ManifestList { +entries: vec![ManifestListEntry { +manifest_path: "/opt/bitnami/spark/warehouse/db/table/metadata/10d28031-9739-484c-92db-cdf2975cead4-m0.avro".to_string(), +manifest_length: 5806, +partition_spec_id: 0, +content: ManifestContentType::Data, +sequence_number: 0, +min_sequence_number: 0, +added_snapshot_id: 1646658105718557341, +added_data_files_count: Some(3), +existing_data_files_count: Some(0), +deleted_data_files_count: Some(0), +added_rows_count: Some(3), +existing_rows_count: Some(0), +deleted_rows_count: Some(0), +partitions: vec![], +key_metadata: vec![], +}] +}; + +let temp_dir = TempDir::new("manifest_list_v1").unwrap(); +let path = temp_dir.path().join("manifest_list_v1.avro"); +let io = FileIOBuilder::new_fs_io().build().unwrap(); +let output_file = io.new_output(path.to_str().unwrap()).unwrap(); + +let mut metadata = HashMap::new(); +metadata.insert(String::from("format-version"), String::from("1")); +let mut writer = +ManifestListWriter::new(output_file, crate::spec::FormatVersion::V1, metadata); +writer +.add_manifests(expected_manifest_list.entries.clone().into_iter()) +.unwrap(); +writer.close().await.unwrap(); + +let bs = fs::read(path).unwrap(); +let manifest_list = ManifestList::parse_with_version( +&bs, +crate::spec::FormatVersion::V1, +&StructType::new(vec![]), Review Comment: Updated -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] feat: manifest list writer [iceberg-rust]
barronw commented on code in PR #76: URL: https://github.com/apache/iceberg-rust/pull/76#discussion_r1358866633 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -69,6 +73,25 @@ impl ManifestList { &self.entries } +/// Get the v1 schema of the manifest list entry. +pub(crate) fn v1_schema() -> Schema { Review Comment: I moved the schema definitions into `_const_fields`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org