Re: [PR] mr: Fix issues 10639 [iceberg]
pvary commented on PR #10661: URL: https://github.com/apache/iceberg/pull/10661#issuecomment-2226801595

> compile jdk8, run jdk11, but we are actively working on jdk11 compile now, that should be in Hive-4.1.0 release

When is Hive 4.1.0 planned? You might want to find and comment on the deprecation thread for Hive when it is ready: https://lists.apache.org/thread/bn2c480wdbzr089p88n1003zhw0nj9kv
Re: [PR] mr: Fix issues 10639 [iceberg]
lurnagao-dahua commented on code in PR #10661: URL: https://github.com/apache/iceberg/pull/10661#discussion_r1674923132

## mr/src/test/java/org/apache/iceberg/mr/TestIcebergInputFormats.java:

@@ -381,6 +386,56 @@ public void testCustomCatalog() throws IOException {
     testInputFormat.create(builder.conf()).validate(expectedRecords);
   }

+  @TestTemplate
+  public void testWorkerPool() throws Exception {
+    Table table = helper.createUnpartitionedTable();
+    List<Record> records = helper.generateRandomRecords(1, 0L);
+    helper.appendToTable(null, records);

Review Comment: Thank you for your guidance. I have further optimized the unit test cases! Could you please take another look when you have time? I would be very grateful.
Re: [PR] refine: move binary serialize in literal to datum [iceberg-rust]
ZENOTME commented on PR #456: URL: https://github.com/apache/iceberg-rust/pull/456#issuecomment-2226828202

cc @liurenjie1024 @Xuanwo
Re: [PR] View Spec implementation [iceberg-rust]
c-thiel commented on code in PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676794013

## crates/iceberg/testdata/view_metadata/ViewMetadataV2Valid.json:

@@ -0,0 +1,58 @@
+{

Review Comment: Ok, thanks Eduard for the feedback. My preferred way of going forward would be:

1. Merge this PR, which just contains the structs & (de)serialization (very much like `TableMetadata` in its current state).
2. Figure out a good way for the `TableMetadataBuilder`, including partition binding. We have a first shot ready that we can create a PR for.
3. Use the same pattern we use for the `TableMetadataBuilder` for the much lighter `ViewMetadataBuilder`, which then includes all the tests Eduard mentioned.

I am aware that a lot of things are missing in the builder. It was a deliberate decision from me to get views up to speed with tables first, and then in a second step extend both table and view features to the Java level. This way we have a manageable PR here that just takes care of (ser)de, and we can build on it in a next step.

@ZENOTME, @nastra would that be OK for you?
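[Editor's note: the "structs & (de)serialization" approach mentioned in step 1 is visible in the quoted `ViewMetadata` code further down this thread, where serde's `try_from`/`into` container attributes route (de)serialization through a versioned enum. Below is a minimal, self-contained sketch of that pattern; the type names and fields are illustrative, not the PR's actual definitions, and the `serde`/`serde_json` crates are assumed.]

```rust
use serde::{Deserialize, Serialize};

// Versioned wire format: each format version can get its own variant/layout.
#[derive(Serialize, Deserialize, Clone)]
#[serde(untagged)]
enum MetadataEnum {
    V1 { format_version: u8, location: String },
}

// The in-memory struct round-trips through the enum; validation happens once,
// in TryFrom, so the struct can be assumed valid after construction.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(try_from = "MetadataEnum", into = "MetadataEnum")]
struct Metadata {
    location: String,
}

impl TryFrom<MetadataEnum> for Metadata {
    type Error = String;

    fn try_from(value: MetadataEnum) -> Result<Self, Self::Error> {
        match value {
            MetadataEnum::V1 { format_version, location } => {
                // Reject unsupported format versions at construction time.
                if format_version != 1 {
                    return Err(format!("unsupported format version {format_version}"));
                }
                Ok(Metadata { location })
            }
        }
    }
}

impl From<Metadata> for MetadataEnum {
    fn from(m: Metadata) -> Self {
        MetadataEnum::V1 { format_version: 1, location: m.location }
    }
}

fn main() {
    let json = r#"{"format_version": 1, "location": "s3://bucket/view"}"#;
    let metadata: Metadata = serde_json::from_str(json).expect("valid metadata");
    println!("round-trip: {}", serde_json::to_string(&metadata).unwrap());
}
```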
Re: [PR] View Spec implementation [iceberg-rust]
c-thiel commented on code in PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676794680

## crates/iceberg/src/spec/view_metadata.rs:

@@ -0,0 +1,682 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Defines the [view metadata](https://iceberg.apache.org/view-spec/#view-metadata).
+//! The main struct here is [ViewMetadata] which defines the data for a view.
+
+use serde::{Deserialize, Serialize};
+use serde_repr::{Deserialize_repr, Serialize_repr};
+use std::cmp::Ordering;
+use std::fmt::{Display, Formatter};
+use std::{collections::HashMap, sync::Arc};
+use uuid::Uuid;
+
+use super::{
+    view_version::{ViewVersion, ViewVersionRef},
+    SchemaId, SchemaRef,
+};
+use crate::catalog::ViewCreation;
+use crate::error::Result;
+
+use _serde::ViewMetadataEnum;
+
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+
+/// Reference to [`ViewMetadata`].
+pub type ViewMetadataRef = Arc<ViewMetadata>;
+
+#[derive(Debug, PartialEq, Deserialize, Eq, Clone)]
+#[serde(try_from = "ViewMetadataEnum", into = "ViewMetadataEnum")]
+/// Fields for the version 1 of the view metadata.
+///
+/// We assume that this data structure is always valid, so we will panic when invalid error happens.
+/// We check the validity of this data structure when constructing.
+pub struct ViewMetadata {
+    /// Integer Version for the format.
+    pub(crate) format_version: ViewFormatVersion,
+    /// A UUID that identifies the view, generated when the view is created.
+    pub(crate) view_uuid: Uuid,
+    /// The view's base location; used to create metadata file locations
+    pub(crate) location: String,
+    /// ID of the current version of the view (version-id)
+    pub(crate) current_version_id: i64,
+    /// A list of known versions of the view
+    pub(crate) versions: HashMap<i64, ViewVersionRef>,
+    /// A list of version log entries with the timestamp and version-id for every
+    /// change to current-version-id
+    pub(crate) version_log: Vec<ViewVersionLog>,

Review Comment: Hm, in the Iceberg spec it's called versions & version-log: https://iceberg.apache.org/view-spec/#view-metadata. I implemented it the same way `TableMetadata` is currently implemented: naming the struct field identically to the spec, but then creating a history accessor.

TableMetadata: https://github.com/c-thiel/iceberg-rust/blob/ca9de89ac9d95683c8fe9191f72ab922dc4c7672/crates/iceberg/src/spec/table_metadata.rs#L208-L211
ViewMetadata: https://github.com/c-thiel/iceberg-rust/blob/44630160be1bcf48249c31006b76a7150a029619/crates/iceberg/src/spec/view_metadata.rs#L151-L156

As the `versions` field is not public, it actually implements almost the same interface as Java. @ZENOTME or some other Rust dev, it would be great to get some opinion on this.
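[Editor's note: the field-naming pattern described in the comment above — private fields named after the spec, with Java-style accessors on top — can be sketched as follows. The types here are simplified placeholders, not the PR's actual definitions.]

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug)]
struct ViewVersion {
    version_id: i64,
}
type ViewVersionRef = Arc<ViewVersion>;

#[derive(Debug, Clone)]
struct ViewVersionLog {
    version_id: i64,
    timestamp_ms: i64,
}

struct ViewMetadata {
    // Field names mirror the spec ("versions", "version-log") and stay private...
    current_version_id: i64,
    versions: HashMap<i64, ViewVersionRef>,
    version_log: Vec<ViewVersionLog>,
}

impl ViewMetadata {
    // ...while the public accessors expose the Java-style interface.
    pub fn history(&self) -> &[ViewVersionLog] {
        &self.version_log
    }

    pub fn version_by_id(&self, version_id: i64) -> Option<&ViewVersionRef> {
        self.versions.get(&version_id)
    }

    pub fn current_version(&self) -> &ViewVersionRef {
        self.versions
            .get(&self.current_version_id)
            .expect("current version id set, but not found in view versions")
    }
}

fn main() {
    let mut versions = HashMap::new();
    versions.insert(1, Arc::new(ViewVersion { version_id: 1 }));
    let metadata = ViewMetadata {
        current_version_id: 1,
        versions,
        version_log: vec![ViewVersionLog { version_id: 1, timestamp_ms: 0 }],
    };
    assert_eq!(metadata.history().len(), 1);
    assert_eq!(metadata.current_version().version_id, 1);
    assert!(metadata.version_by_id(1).is_some());
}
```

Because the fields are crate-private, callers outside the crate only ever see the accessors, which is what makes the interface line up with the Java implementation.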
Re: [PR] Core: support http lock-manager [iceberg]
snazy commented on PR #10688: URL: https://github.com/apache/iceberg/pull/10688#issuecomment-2226834838

It seems there are a lot of individual points that should be discussed. @BsoBird do you mind adding a point to the "Discussions" section on the [agenda for the next community sync on July 31st](https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit)?

Let me comment on some of the points. TL;DR: I think it's not an easy problem to solve.

> balance this with the use of lockManager (which in itself is not a difficult task).

Locking is one of the most difficult problems in distributed computing. (And actually in non-distributed computing, too.) Just a few things that come to mind (the real list is much longer):

* Evil users lock all your tables (-> authorization)
* Retry/back-off -> piling up waiting requests -> threading issues
* Crashed clients leave stuff locked
* "all the things" about race conditions

> ... a large number of users use GlueCatalog ... want to access both iceberg and non-iceberg tables in Glue.

That's IMO a fair point.

> 2. You might think that users could solve this problem by implementing rest-catalog. However, many users do not have the ability to implement rest-catalog. Also, rest-catalog is still evolving, and many of the specifications have not yet been finalised.

I suspect this wouldn't be an issue if Glue had a rest-catalog?

> 3. Regarding the fact that all clients need to be involved in the locking in order for it to work. I think this is a problem that users need to solve themselves.

Similar to my statement above, users should not be forced into implementing/providing a proper distributed locking mechanism. Also, isolation levels play a role here. Some use cases are probably fine with "dirty reads", some with "read committed", and others require full "serializable" guarantees.
Re: [PR] View Spec implementation [iceberg-rust]
c-thiel commented on code in PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676795328 ## crates/iceberg/src/spec/view_metadata.rs: ## @@ -0,0 +1,682 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! Defines the [view metadata](https://iceberg.apache.org/view-spec/#view-metadata). +//! The main struct here is [ViewMetadata] which defines the data for a view. + +use serde::{Deserialize, Serialize}; +use serde_repr::{Deserialize_repr, Serialize_repr}; +use std::cmp::Ordering; +use std::fmt::{Display, Formatter}; +use std::{collections::HashMap, sync::Arc}; +use uuid::Uuid; + +use super::{ +view_version::{ViewVersion, ViewVersionRef}, +SchemaId, SchemaRef, +}; +use crate::catalog::ViewCreation; +use crate::error::Result; + +use _serde::ViewMetadataEnum; + +use chrono::{DateTime, MappedLocalTime, TimeZone, Utc}; + +/// Reference to [`ViewMetadata`]. +pub type ViewMetadataRef = Arc; + +#[derive(Debug, PartialEq, Deserialize, Eq, Clone)] +#[serde(try_from = "ViewMetadataEnum", into = "ViewMetadataEnum")] +/// Fields for the version 1 of the view metadata. +/// +/// We assume that this data structure is always valid, so we will panic when invalid error happens. +/// We check the validity of this data structure when constructing. +pub struct ViewMetadata { +/// Integer Version for the format. +pub(crate) format_version: ViewFormatVersion, +/// A UUID that identifies the view, generated when the view is created. +pub(crate) view_uuid: Uuid, +/// The view's base location; used to create metadata file locations +pub(crate) location: String, +/// ID of the current version of the view (version-id) +pub(crate) current_version_id: i64, +/// A list of known versions of the view +pub(crate) versions: HashMap, +/// A list of version log entries with the timestamp and version-id for every +/// change to current-version-id +pub(crate) version_log: Vec, +/// A list of schemas, stored as objects with schema-id. +pub(crate) schemas: HashMap, +/// A string to string map of view properties. +/// Properties are used for metadata such as comment and for settings that +/// affect view maintenance. This is not intended to be used for arbitrary metadata. +pub(crate) properties: HashMap, +} + +impl ViewMetadata { +/// Returns format version of this metadata. +#[inline] +pub fn format_version(&self) -> ViewFormatVersion { +self.format_version +} + +/// Returns uuid of current view. +#[inline] +pub fn uuid(&self) -> Uuid { +self.view_uuid +} + +/// Returns view location. +#[inline] +pub fn location(&self) -> &str { +self.location.as_str() +} + +/// Returns the current version id. +#[inline] +pub fn current_version_id(&self) -> i64 { +self.current_version_id +} + +/// Returns all view versions. 
+#[inline] +pub fn versions(&self) -> impl Iterator { +self.versions.values() +} + +/// Lookup a view version by id. +#[inline] +pub fn version_by_id(&self, version_id: i64) -> Option<&ViewVersionRef> { +self.versions.get(&version_id) +} + +/// Returns the current view version. +#[inline] +pub fn current_version(&self) -> &ViewVersionRef { +self.versions +.get(&self.current_version_id) +.expect("Current version id set, but not found in view versions") +} + +/// Returns schemas +#[inline] +pub fn schemas_iter(&self) -> impl Iterator { +self.schemas.values() +} + +/// Lookup schema by id. +#[inline] +pub fn schema_by_id(&self, schema_id: SchemaId) -> Option<&SchemaRef> { +self.schemas.get(&schema_id) +} + +/// Get current schema +#[inline] +pub fn current_schema(&self) -> &SchemaRef { +let schema_id = self.current_version().schema_id(); +self.schema_by_id(schema_id) +.expect("Current schema id set, but not found in view metadata") +} + +/// Returns properties of the view. +#[inline] +pub fn properties(&self) -> &HashMap { +&self.properties +} + +/// A
Re: [PR] View Spec implementation [iceberg-rust]
c-thiel commented on code in PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676795561 ## crates/iceberg/src/spec/view_metadata.rs: ## @@ -0,0 +1,682 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +//! Defines the [view metadata](https://iceberg.apache.org/view-spec/#view-metadata). +//! The main struct here is [ViewMetadata] which defines the data for a view. + +use serde::{Deserialize, Serialize}; +use serde_repr::{Deserialize_repr, Serialize_repr}; +use std::cmp::Ordering; +use std::fmt::{Display, Formatter}; +use std::{collections::HashMap, sync::Arc}; +use uuid::Uuid; + +use super::{ +view_version::{ViewVersion, ViewVersionRef}, +SchemaId, SchemaRef, +}; +use crate::catalog::ViewCreation; +use crate::error::Result; + +use _serde::ViewMetadataEnum; + +use chrono::{DateTime, MappedLocalTime, TimeZone, Utc}; + +/// Reference to [`ViewMetadata`]. +pub type ViewMetadataRef = Arc; + +#[derive(Debug, PartialEq, Deserialize, Eq, Clone)] +#[serde(try_from = "ViewMetadataEnum", into = "ViewMetadataEnum")] +/// Fields for the version 1 of the view metadata. +/// +/// We assume that this data structure is always valid, so we will panic when invalid error happens. +/// We check the validity of this data structure when constructing. +pub struct ViewMetadata { +/// Integer Version for the format. +pub(crate) format_version: ViewFormatVersion, +/// A UUID that identifies the view, generated when the view is created. +pub(crate) view_uuid: Uuid, +/// The view's base location; used to create metadata file locations +pub(crate) location: String, +/// ID of the current version of the view (version-id) +pub(crate) current_version_id: i64, +/// A list of known versions of the view +pub(crate) versions: HashMap, +/// A list of version log entries with the timestamp and version-id for every +/// change to current-version-id +pub(crate) version_log: Vec, +/// A list of schemas, stored as objects with schema-id. +pub(crate) schemas: HashMap, +/// A string to string map of view properties. +/// Properties are used for metadata such as comment and for settings that +/// affect view maintenance. This is not intended to be used for arbitrary metadata. +pub(crate) properties: HashMap, +} + +impl ViewMetadata { +/// Returns format version of this metadata. +#[inline] +pub fn format_version(&self) -> ViewFormatVersion { +self.format_version +} + +/// Returns uuid of current view. +#[inline] +pub fn uuid(&self) -> Uuid { +self.view_uuid +} + +/// Returns view location. +#[inline] +pub fn location(&self) -> &str { +self.location.as_str() +} + +/// Returns the current version id. +#[inline] +pub fn current_version_id(&self) -> i64 { +self.current_version_id +} + +/// Returns all view versions. 
+#[inline] +pub fn versions(&self) -> impl Iterator { +self.versions.values() +} + +/// Lookup a view version by id. +#[inline] +pub fn version_by_id(&self, version_id: i64) -> Option<&ViewVersionRef> { +self.versions.get(&version_id) +} + +/// Returns the current view version. +#[inline] +pub fn current_version(&self) -> &ViewVersionRef { +self.versions +.get(&self.current_version_id) +.expect("Current version id set, but not found in view versions") +} + +/// Returns schemas +#[inline] +pub fn schemas_iter(&self) -> impl Iterator { +self.schemas.values() +} + +/// Lookup schema by id. +#[inline] +pub fn schema_by_id(&self, schema_id: SchemaId) -> Option<&SchemaRef> { +self.schemas.get(&schema_id) +} + +/// Get current schema +#[inline] +pub fn current_schema(&self) -> &SchemaRef { +let schema_id = self.current_version().schema_id(); +self.schema_by_id(schema_id) +.expect("Current schema id set, but not found in view metadata") +} + +/// Returns properties of the view. +#[inline] +pub fn properties(&self) -> &HashMap { +&self.properties +} + +/// A
Re: [PR] View Spec implementation [iceberg-rust]
c-thiel commented on code in PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676796496 ## crates/iceberg/src/spec/view_version.rs: ## @@ -0,0 +1,347 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +/*! + * View Versions! +*/ +use crate::error::Result; +use chrono::{DateTime, MappedLocalTime, TimeZone, Utc}; +use serde::{Deserialize, Serialize}; +use std::collections::HashMap; +use std::sync::Arc; +use typed_builder::TypedBuilder; + +use super::view_metadata::ViewVersionLog; +use crate::catalog::NamespaceIdent; +use crate::spec::{SchemaId, SchemaRef, ViewMetadata}; +use crate::{Error, ErrorKind}; +use _serde::ViewVersionV1; + +/// Reference to [`ViewVersion`]. +pub type ViewVersionRef = Arc; + +#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, TypedBuilder)] +#[serde(from = "ViewVersionV1", into = "ViewVersionV1")] +#[builder(field_defaults(setter(prefix = "with_")))] +/// A view versions represents the definition of a view at a specific point in time. +pub struct ViewVersion { +/// A unique long ID +version_id: i64, +/// ID of the schema for the view version +schema_id: SchemaId, +/// Timestamp when the version was created (ms from epoch) +timestamp_ms: i64, +/// A string to string map of summary metadata about the version +summary: HashMap, +/// A list of representations for the view definition. +representations: ViewRepresentations, +/// Catalog name to use when a reference in the SELECT does not contain a catalog +#[builder(default = None)] +default_catalog: Option, +/// Namespace to use when a reference in the SELECT is a single identifier +default_namespace: NamespaceIdent, +} + +impl ViewVersion { +/// Get the version id of this view version. +#[inline] +pub fn version_id(&self) -> i64 { +self.version_id +} + +/// Get the schema id of this view version. 
+#[inline] +pub fn schema_id(&self) -> SchemaId { +self.schema_id +} + +/// Get the timestamp of when the view version was created +#[inline] +pub fn timestamp(&self) -> MappedLocalTime> { +Utc.timestamp_millis_opt(self.timestamp_ms) +} + +/// Get the timestamp of when the view version was created in milliseconds since epoch +#[inline] +pub fn timestamp_ms(&self) -> i64 { +self.timestamp_ms +} + +/// Get summary of the view version +#[inline] +pub fn summary(&self) -> &HashMap { +&self.summary +} + +/// Get this views representations +#[inline] +pub fn representations(&self) -> &ViewRepresentations { +&self.representations +} + +/// Get the default catalog for this view version +#[inline] +pub fn default_catalog(&self) -> Option<&String> { +self.default_catalog.as_ref() +} + +/// Get the default namespace to use when a reference in the SELECT is a single identifier +#[inline] +pub fn default_namespace(&self) -> &NamespaceIdent { +&self.default_namespace +} + +/// Get the schema of this snapshot. +pub fn schema(&self, view_metadata: &ViewMetadata) -> Result { +let r = view_metadata +.schema_by_id(self.schema_id()) +.ok_or_else(|| { +Error::new( +ErrorKind::DataInvalid, +format!("Schema with id {} not found", self.schema_id()), +) +}) +.cloned(); +r +} + +pub(crate) fn log(&self) -> ViewVersionLog { +ViewVersionLog::new(self.version_id, self.timestamp_ms) +} +} + +/// A list of view representations. +pub type ViewRepresentations = Vec; Review Comment: Normally not. This is looking forward to the ViewMetadataBuilder where we want to implement methods on those. The motivation is that multiple `ViewRepresentations` are more than just a Vec - there are many rules in constructing it (i.e. no duplicate dialects). We shouldn't just expose the Vec interface externally because then users could just add an item to the list without going through any of our validation functions (which are not there yet, will come with the V
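[Editor's note: the argument above — that `ViewRepresentations` should be more than a `Vec` alias so that construction rules can be enforced — boils down to the classic newtype pattern. A compact, hedged sketch follows; it is not the PR's final API.]

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ViewRepresentation {
    pub dialect: String,
    pub sql: String,
}

// Newtype instead of `pub type ViewRepresentations = Vec<ViewRepresentation>`:
// the inner Vec stays private, so every mutation must go through methods
// that can enforce invariants such as "no duplicate dialects".
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ViewRepresentations(Vec<ViewRepresentation>);

impl ViewRepresentations {
    pub fn iter(&self) -> impl Iterator<Item = &ViewRepresentation> {
        self.0.iter()
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }

    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }
}

fn main() {
    // Construction happens inside the crate (e.g. via a validating builder);
    // external users get read access but cannot push into the Vec directly.
    let reps = ViewRepresentations(vec![ViewRepresentation {
        dialect: "spark".to_string(),
        sql: "SELECT 1".to_string(),
    }]);
    assert_eq!(reps.len(), 1);
    for rep in reps.iter() {
        println!("{}: {}", rep.dialect, rep.sql);
    }
}
```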
Re: [PR] View Spec implementation [iceberg-rust]
c-thiel commented on code in PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676800028 ## crates/iceberg/src/spec/view_version.rs: ## @@ -0,0 +1,347 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +/*! + * View Versions! +*/ +use crate::error::Result; +use chrono::{DateTime, MappedLocalTime, TimeZone, Utc}; +use serde::{Deserialize, Serialize}; +use std::collections::HashMap; +use std::sync::Arc; +use typed_builder::TypedBuilder; + +use super::view_metadata::ViewVersionLog; +use crate::catalog::NamespaceIdent; +use crate::spec::{SchemaId, SchemaRef, ViewMetadata}; +use crate::{Error, ErrorKind}; +use _serde::ViewVersionV1; + +/// Reference to [`ViewVersion`]. +pub type ViewVersionRef = Arc; + +#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, TypedBuilder)] +#[serde(from = "ViewVersionV1", into = "ViewVersionV1")] +#[builder(field_defaults(setter(prefix = "with_")))] +/// A view versions represents the definition of a view at a specific point in time. +pub struct ViewVersion { +/// A unique long ID +version_id: i64, +/// ID of the schema for the view version +schema_id: SchemaId, +/// Timestamp when the version was created (ms from epoch) +timestamp_ms: i64, +/// A string to string map of summary metadata about the version +summary: HashMap, +/// A list of representations for the view definition. +representations: ViewRepresentations, +/// Catalog name to use when a reference in the SELECT does not contain a catalog +#[builder(default = None)] +default_catalog: Option, +/// Namespace to use when a reference in the SELECT is a single identifier +default_namespace: NamespaceIdent, +} + +impl ViewVersion { +/// Get the version id of this view version. +#[inline] +pub fn version_id(&self) -> i64 { +self.version_id +} + +/// Get the schema id of this view version. 
+#[inline] +pub fn schema_id(&self) -> SchemaId { +self.schema_id +} + +/// Get the timestamp of when the view version was created +#[inline] +pub fn timestamp(&self) -> MappedLocalTime> { +Utc.timestamp_millis_opt(self.timestamp_ms) +} + +/// Get the timestamp of when the view version was created in milliseconds since epoch +#[inline] +pub fn timestamp_ms(&self) -> i64 { +self.timestamp_ms +} + +/// Get summary of the view version +#[inline] +pub fn summary(&self) -> &HashMap { +&self.summary +} + +/// Get this views representations +#[inline] +pub fn representations(&self) -> &ViewRepresentations { +&self.representations +} + +/// Get the default catalog for this view version +#[inline] +pub fn default_catalog(&self) -> Option<&String> { +self.default_catalog.as_ref() +} + +/// Get the default namespace to use when a reference in the SELECT is a single identifier +#[inline] +pub fn default_namespace(&self) -> &NamespaceIdent { +&self.default_namespace +} + +/// Get the schema of this snapshot. +pub fn schema(&self, view_metadata: &ViewMetadata) -> Result { +let r = view_metadata +.schema_by_id(self.schema_id()) +.ok_or_else(|| { +Error::new( +ErrorKind::DataInvalid, +format!("Schema with id {} not found", self.schema_id()), +) +}) +.cloned(); +r +} + +pub(crate) fn log(&self) -> ViewVersionLog { +ViewVersionLog::new(self.version_id, self.timestamp_ms) +} +} + +/// A list of view representations. +pub type ViewRepresentations = Vec; Review Comment: Fix: https://github.com/c-thiel/iceberg-rust/commit/2b427e511dbeb3dfa6292af6983ee8d0b8bc48ad -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Core: support http lock-manager [iceberg]
BsoBird commented on PR #10688: URL: https://github.com/apache/iceberg/pull/10688#issuecomment-2226845799

@snazy Hello. I'm so glad you could get back to me. I'm just saying out loud the first things that come to mind:

1. Is everything done using rest-catalog? In many cases, users have their own server rooms, they are not using cloud products, and they do not intend to expose iceberg directly to untrusted users. In a network-isolated environment, there seems to be no need to introduce a separate rest-catalog and add complexity.
2. For more complex access environments, rest-catalog is effective. I agree.
3. Currently, the use of lockManager is tied to the implementation of a certain catalog. Similar to aws-iceberg, we don't actually force the user to use lockManager. The user knows what he's doing when he chooses a catalog. Basically, the catalog's author has solved the concurrency problems and race conditions in the code of a catalog that uses lockManager. After all, he wouldn't need to use lockManager if he hadn't solved these problems.
4. Regarding the security of lockManager: this is something that should actually be considered within some kind of extension module. For example, iceberg-aws provides some implementations of lockManager, and the security of such a lockManager must be considered by them. iceberg-core shouldn't care about such issues. After all, safety standards are not uniform; the details are varied.

From what I've seen so far, the opinion of many people in the community is that the lock-manager should be extended and maintained by users if they want it, just like the aws-iceberg module. In iceberg-core, the implementation of lock-manager should only stop at InMemLockManager. But isn't that too little? Wouldn't it be nice to simply extend something like http-lock-manager for users? After all, it's small and doesn't introduce any other dependencies.
Re: [PR] View Spec implementation [iceberg-rust]
c-thiel commented on code in PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676801198 ## crates/iceberg/src/spec/view_version.rs: ## @@ -0,0 +1,347 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +/*! + * View Versions! +*/ +use crate::error::Result; +use chrono::{DateTime, MappedLocalTime, TimeZone, Utc}; +use serde::{Deserialize, Serialize}; +use std::collections::HashMap; +use std::sync::Arc; +use typed_builder::TypedBuilder; + +use super::view_metadata::ViewVersionLog; +use crate::catalog::NamespaceIdent; +use crate::spec::{SchemaId, SchemaRef, ViewMetadata}; +use crate::{Error, ErrorKind}; +use _serde::ViewVersionV1; + +/// Reference to [`ViewVersion`]. +pub type ViewVersionRef = Arc; + +#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, TypedBuilder)] +#[serde(from = "ViewVersionV1", into = "ViewVersionV1")] +#[builder(field_defaults(setter(prefix = "with_")))] +/// A view versions represents the definition of a view at a specific point in time. +pub struct ViewVersion { +/// A unique long ID +version_id: i64, +/// ID of the schema for the view version +schema_id: SchemaId, +/// Timestamp when the version was created (ms from epoch) +timestamp_ms: i64, +/// A string to string map of summary metadata about the version +summary: HashMap, +/// A list of representations for the view definition. +representations: ViewRepresentations, +/// Catalog name to use when a reference in the SELECT does not contain a catalog +#[builder(default = None)] +default_catalog: Option, +/// Namespace to use when a reference in the SELECT is a single identifier +default_namespace: NamespaceIdent, +} + +impl ViewVersion { +/// Get the version id of this view version. +#[inline] +pub fn version_id(&self) -> i64 { +self.version_id +} + +/// Get the schema id of this view version. 
+#[inline] +pub fn schema_id(&self) -> SchemaId { +self.schema_id +} + +/// Get the timestamp of when the view version was created +#[inline] +pub fn timestamp(&self) -> MappedLocalTime> { +Utc.timestamp_millis_opt(self.timestamp_ms) +} + +/// Get the timestamp of when the view version was created in milliseconds since epoch +#[inline] +pub fn timestamp_ms(&self) -> i64 { +self.timestamp_ms +} + +/// Get summary of the view version +#[inline] +pub fn summary(&self) -> &HashMap { +&self.summary +} + +/// Get this views representations +#[inline] +pub fn representations(&self) -> &ViewRepresentations { +&self.representations +} + +/// Get the default catalog for this view version +#[inline] +pub fn default_catalog(&self) -> Option<&String> { +self.default_catalog.as_ref() +} + +/// Get the default namespace to use when a reference in the SELECT is a single identifier +#[inline] +pub fn default_namespace(&self) -> &NamespaceIdent { +&self.default_namespace +} + +/// Get the schema of this snapshot. +pub fn schema(&self, view_metadata: &ViewMetadata) -> Result { +let r = view_metadata +.schema_by_id(self.schema_id()) +.ok_or_else(|| { +Error::new( +ErrorKind::DataInvalid, +format!("Schema with id {} not found", self.schema_id()), +) +}) +.cloned(); +r +} + +pub(crate) fn log(&self) -> ViewVersionLog { +ViewVersionLog::new(self.version_id, self.timestamp_ms) +} +} + +/// A list of view representations. +pub type ViewRepresentations = Vec; + +/// A builder for [`ViewRepresentations`]. +pub struct ViewRepresentationsBuilder(ViewRepresentations); + +impl ViewRepresentationsBuilder { +/// Create a new builder. +pub fn new() -> Self { +Self(Vec::new()) +} + +/// Add a representation to the list. +/// +/// SQL representations dialects must be unique. If a representation with the same Review Comment: No, it did not. I fixed it here: https://github.com/c-thiel/iceberg-rust/commit/bf95eb7c5c458e0
Re: [PR] View Spec implementation [iceberg-rust]
c-thiel commented on code in PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676802744 ## crates/iceberg/src/spec/view_version.rs: ## @@ -0,0 +1,347 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +/*! + * View Versions! +*/ +use crate::error::Result; +use chrono::{DateTime, MappedLocalTime, TimeZone, Utc}; +use serde::{Deserialize, Serialize}; +use std::collections::HashMap; +use std::sync::Arc; +use typed_builder::TypedBuilder; + +use super::view_metadata::ViewVersionLog; +use crate::catalog::NamespaceIdent; +use crate::spec::{SchemaId, SchemaRef, ViewMetadata}; +use crate::{Error, ErrorKind}; +use _serde::ViewVersionV1; + +/// Reference to [`ViewVersion`]. +pub type ViewVersionRef = Arc; + +#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, TypedBuilder)] +#[serde(from = "ViewVersionV1", into = "ViewVersionV1")] +#[builder(field_defaults(setter(prefix = "with_")))] +/// A view versions represents the definition of a view at a specific point in time. +pub struct ViewVersion { +/// A unique long ID +version_id: i64, +/// ID of the schema for the view version +schema_id: SchemaId, +/// Timestamp when the version was created (ms from epoch) +timestamp_ms: i64, +/// A string to string map of summary metadata about the version +summary: HashMap, +/// A list of representations for the view definition. +representations: ViewRepresentations, +/// Catalog name to use when a reference in the SELECT does not contain a catalog +#[builder(default = None)] +default_catalog: Option, +/// Namespace to use when a reference in the SELECT is a single identifier +default_namespace: NamespaceIdent, +} + +impl ViewVersion { +/// Get the version id of this view version. +#[inline] +pub fn version_id(&self) -> i64 { +self.version_id +} + +/// Get the schema id of this view version. 
+#[inline] +pub fn schema_id(&self) -> SchemaId { +self.schema_id +} + +/// Get the timestamp of when the view version was created +#[inline] +pub fn timestamp(&self) -> MappedLocalTime> { +Utc.timestamp_millis_opt(self.timestamp_ms) +} + +/// Get the timestamp of when the view version was created in milliseconds since epoch +#[inline] +pub fn timestamp_ms(&self) -> i64 { +self.timestamp_ms +} + +/// Get summary of the view version +#[inline] +pub fn summary(&self) -> &HashMap { +&self.summary +} + +/// Get this views representations +#[inline] +pub fn representations(&self) -> &ViewRepresentations { +&self.representations +} + +/// Get the default catalog for this view version +#[inline] +pub fn default_catalog(&self) -> Option<&String> { +self.default_catalog.as_ref() +} + +/// Get the default namespace to use when a reference in the SELECT is a single identifier +#[inline] +pub fn default_namespace(&self) -> &NamespaceIdent { +&self.default_namespace +} + +/// Get the schema of this snapshot. +pub fn schema(&self, view_metadata: &ViewMetadata) -> Result { +let r = view_metadata +.schema_by_id(self.schema_id()) +.ok_or_else(|| { +Error::new( +ErrorKind::DataInvalid, +format!("Schema with id {} not found", self.schema_id()), +) +}) +.cloned(); +r +} + +pub(crate) fn log(&self) -> ViewVersionLog { +ViewVersionLog::new(self.version_id, self.timestamp_ms) +} +} + +/// A list of view representations. +pub type ViewRepresentations = Vec; + +/// A builder for [`ViewRepresentations`]. +pub struct ViewRepresentationsBuilder(ViewRepresentations); + +impl ViewRepresentationsBuilder { +/// Create a new builder. +pub fn new() -> Self { +Self(Vec::new()) +} + +/// Add a representation to the list. +/// +/// SQL representations dialects must be unique. If a representation with the same +/// dialect already exists, it will be overwritten. +pub fn add_or_overwrite_representation(mut self, represen
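[Editor's note: the quoted doc comment pins down the intended semantics of the (truncated) `add_or_overwrite_representation` method — dialects must be unique, and adding a representation with an existing dialect overwrites it. Below is a sketch of what such a method could look like; the body is an assumption, only the name and the documented behavior come from the PR.]

```rust
#[derive(Debug, Clone)]
struct ViewRepresentation {
    dialect: String,
    sql: String,
}

struct ViewRepresentationsBuilder(Vec<ViewRepresentation>);

impl ViewRepresentationsBuilder {
    fn new() -> Self {
        Self(Vec::new())
    }

    // Keep dialects unique: replace an existing entry with the same dialect,
    // otherwise append. (Case handling of dialect names is elided here.)
    fn add_or_overwrite_representation(mut self, representation: ViewRepresentation) -> Self {
        if let Some(existing) = self
            .0
            .iter_mut()
            .find(|r| r.dialect == representation.dialect)
        {
            *existing = representation;
        } else {
            self.0.push(representation);
        }
        self
    }

    fn build(self) -> Vec<ViewRepresentation> {
        self.0
    }
}

fn main() {
    let reps = ViewRepresentationsBuilder::new()
        .add_or_overwrite_representation(ViewRepresentation {
            dialect: "spark".to_string(),
            sql: "SELECT 1".to_string(),
        })
        .add_or_overwrite_representation(ViewRepresentation {
            dialect: "spark".to_string(),
            sql: "SELECT 2".to_string(),
        })
        .build();
    assert_eq!(reps.len(), 1); // the second "spark" representation overwrote the first
    assert_eq!(reps[0].sql, "SELECT 2");
}
```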
Re: [PR] View Spec implementation [iceberg-rust]
c-thiel commented on PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#issuecomment-2226849477

@nastra I hope I addressed all your issues regarding functionality. If you are happy with the implementations, feel free to close our discussions :).

@ZENOTME it would now be great to get some feedback on the Rust side of things. Please check my discussions with Eduard above. There are a few open points regarding naming and also the introduction of types. It would be great to get a review from you or some other Rust maintainer for this.
Re: [PR] Core: Prevent dropping column which is referenced by active partition… [iceberg]
advancedxy commented on code in PR #10352: URL: https://github.com/apache/iceberg/pull/10352#discussion_r1676818347

## core/src/main/java/org/apache/iceberg/SchemaUpdate.java:

@@ -533,6 +537,34 @@ private static Schema applyChanges(
     }
   }

+    Map> specToDeletes = Maps.newHashMap();

Review Comment:

> One aspect is on ordering of these. There are two ways of going about this:

I don't think the ordering of getting these two functionalities into master really matters that much, as long as they are both merged before the next release (assuming we are talking about the Iceberg 1.7 release). I imagine customers should use officially released versions.

> Here's a trivial case: Imagine a user just creates the table and they don't write any data. Then they realize they want to drop a partition column, but the procedure will fail unexpectedly whereas before it would work.

If preventing dropping an active partition source field and RemoveUnusedPartitionSpec both land, it's still possible to drop the column in the just-created table, with some additional but necessary steps:

1. First remove the unwanted partition field, which will create a new PartitionSpec.
2. Call RemoveUnusedPartitionSpec, which should be able to remove the previous, wrongly created partition spec.
3. Remove the unwanted partition source field.

> First get in the RemovedUnusedPartitionSpec procedure and then prevent dropping if it's part of a spec.

The downside of this is that it may take more time to get the whole API in. I think we can parallelize these two PRs if others all agree that's the right direction.

BTW, I can help work on https://github.com/apache/iceberg/pull/3462/files to get it merged in case @RussellSpitzer is busy and cannot work on it right now.
Re: [PR] Core: Prevent dropping column which is referenced by active partition… [iceberg]
advancedxy commented on PR #10352: URL: https://github.com/apache/iceberg/pull/10352#issuecomment-2226883163

> Sorry about the delay on this, got busy and forgot I had this open! I've seen more related issue reports to this, so I'm going to prioritize it.

Well understood, and thanks for your effort on this.
[PR] Set boolean values in table properties to lowercase string [iceberg-python]
soumya-ghosh opened a new pull request, #924: URL: https://github.com/apache/iceberg-python/pull/924

(no comment)
Re: [I] Set properties boolean value to lowercase string [iceberg-python]
soumya-ghosh commented on issue #919: URL: https://github.com/apache/iceberg-python/issues/919#issuecomment-2226888124

@Fokko @kevinjqliu PR https://github.com/apache/iceberg-python/pull/924 is ready for review.
Re: [PR] Add Snowflake catalog [iceberg-python]
prabodh1194 commented on PR #687: URL: https://github.com/apache/iceberg-python/pull/687#issuecomment-2226959566

@Fokko, considering that Snowflake's REST catalog docs are still not released to the public, we can only wait for them. However, I believe this PR will continue to be relevant, considering this catalog has been GA for quite some time and won't go away any time soon.
Re: [PR] Core: support http lock-manager [iceberg]
BsoBird commented on code in PR #10688: URL: https://github.com/apache/iceberg/pull/10688#discussion_r1675607925 ## core/src/main/java/org/apache/iceberg/lock/ServerSideHttpLockManager.java: ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.lock; + +import java.net.URI; +import java.net.URISyntaxException; +import java.nio.charset.StandardCharsets; +import java.util.Base64; +import java.util.Map; +import org.apache.http.HttpResponse; +import org.apache.http.StatusLine; +import org.apache.http.client.HttpClient; +import org.apache.http.client.methods.HttpGet; +import org.apache.http.client.protocol.RequestAcceptEncoding; +import org.apache.http.client.protocol.ResponseContentEncoding; +import org.apache.http.client.utils.URIBuilder; +import org.apache.http.impl.client.DefaultHttpClient; +import org.apache.http.params.BasicHttpParams; +import org.apache.http.params.CoreConnectionPNames; +import org.apache.http.params.HttpParams; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.util.LockManagers; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * The user is required to provide an API interface with distributed locking functionality as + * agreed. + */ +public class ServerSideHttpLockManager extends LockManagers.BaseLockManager { Review Comment: In this implementation, we only use a very simple GET request and do not consider anything related to authentication.This is because I don't see the need to protect this distributed locking service.After all, it only does locking/unlocking. I don't think it's too difficult for users to provide a rest-api service that meets the requirements. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Core: support http lock-manager [iceberg]
BsoBird commented on code in PR #10688: URL: https://github.com/apache/iceberg/pull/10688#discussion_r1676850504 ## core/src/main/java/org/apache/iceberg/lock/ServerSideHttpLockManager.java: ## @@ -0,0 +1,176 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.iceberg.lock; + +import java.net.URI; +import java.net.URISyntaxException; +import java.nio.charset.StandardCharsets; +import java.util.Base64; +import java.util.Map; +import org.apache.http.HttpResponse; +import org.apache.http.StatusLine; +import org.apache.http.client.HttpClient; +import org.apache.http.client.methods.HttpGet; +import org.apache.http.client.protocol.RequestAcceptEncoding; +import org.apache.http.client.protocol.ResponseContentEncoding; +import org.apache.http.client.utils.URIBuilder; +import org.apache.http.impl.client.DefaultHttpClient; +import org.apache.http.params.CoreConnectionPNames; +import org.apache.http.util.EntityUtils; +import org.apache.iceberg.common.DynConstructors; +import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap; +import org.apache.iceberg.util.LockManagers; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * The user is required to provide an API interface with distributed locking functionality as + * agreed. + */ +public class ServerSideHttpLockManager extends LockManagers.BaseLockManager { + private static final Logger LOG = LoggerFactory.getLogger(ServerSideHttpLockManager.class); + private String httpUrl = null; + private static final String OPERATOR = "operator"; + private static final String LOCK = "lock"; + private static final String UNLOCK = "unlock"; + private static final String ENTITY_ID = "entityId"; + private static final String OWNER_ID = "ownerId"; + private static final int REQUEST_SUCCESS = 200; + public static final String REQUEST_URL = "lock.http.conf.request.url"; + public static final String REQUEST_AUTH = "lock.http.conf.request.auth.impl"; + private HttpClient httpClient = null; + private HttpAuthentication httpAuthentication = null; Review Comment: Users can add custom auth implementations when initiating tasks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
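[Editor's note: for readers trying to picture the contract, the constants quoted above (`operator` set to `lock`/`unlock`, `entityId`, `ownerId`, and 200 as the success status) imply that any HTTP service answering a simple GET can serve as the lock backend. Below is a hedged client-side sketch in Rust using the `reqwest` crate (with its `blocking` feature); the endpoint URL and identifiers are made up, and the actual implementation in this PR is the Java class quoted above.]

```rust
use reqwest::blocking::Client;

// Try to acquire (or release, with operator = "unlock") a lock for `entity_id`
// on behalf of `owner_id`; the server signals success with HTTP 200.
fn try_lock(client: &Client, base_url: &str, entity_id: &str, owner_id: &str) -> reqwest::Result<bool> {
    let response = client
        .get(base_url)
        .query(&[
            ("operator", "lock"),
            ("entityId", entity_id),
            ("ownerId", owner_id),
        ])
        .send()?;
    Ok(response.status().as_u16() == 200)
}

fn main() -> reqwest::Result<()> {
    let client = Client::new();
    // Hypothetical lock endpoint; in practice this is the user-provided service.
    let acquired = try_lock(&client, "http://localhost:8080/lock", "db.table", "job-42")?;
    println!("lock acquired: {acquired}");
    Ok(())
}
```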
Re: [PR] Standardize AWS credential names [iceberg-python]
syun64 commented on code in PR #922: URL: https://github.com/apache/iceberg-python/pull/922#discussion_r1676852897

## pyiceberg/io/__init__.py:

@@ -320,6 +331,13 @@ def _infer_file_io_from_scheme(path: str, properties: Properties) -> Optional[Fi
     return None

+def _get_first_property_value(properties: Properties, property_names: Tuple[str, ...]) -> Optional[Any]:

Review Comment: Nit: Duplicate function defined here and in `pyiceberg/catalog/__init__.py` could be consolidated to one.

## pyiceberg/catalog/__init__.py:

@@ -838,6 +855,18 @@ def _get_default_warehouse_location(self, database_name: str, table_name: str) -

     raise ValueError("No default path is set, please specify a location when creating a table")

+    def _get_first_property_value(self, property_names: Tuple[str, ...]) -> Optional[Any]:
+        for property_name in property_names:
+            if property_value := self.properties.get(property_name):
+                if property_name in DEPRECATED_PROPERTY_NAMES:
+                    deprecated(
+                        deprecated_in="0.7.0",
+                        removed_in="0.8.0",
+                        help_message=f"The property {property_name} is deprecated. Please use properties start with aws., glue., and dynamo. instead",

Review Comment:

```suggestion
                        help_message=f"The property {property_name} is deprecated. Please use properties that start with aws., glue., and dynamo. instead",
```
Re: [PR] Build: don't include slf4j-api in bundled JARs [iceberg]
bryanck commented on PR #10665: URL: https://github.com/apache/iceberg/pull/10665#issuecomment-2226992307 Thanks for the contribution @devinrsmith ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Repair manifest action [iceberg]
szehon-ho commented on PR #10445: URL: https://github.com/apache/iceberg/pull/10445#issuecomment-2227007559 Yeah, I think that makes sense; let's do it in a way that we can add more functionality later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Repair manifest action [iceberg]
szehon-ho commented on PR #10445: URL: https://github.com/apache/iceberg/pull/10445#issuecomment-2227037227 @danielcweeks @tabmatfournier while we are discussing, do you guys think it makes sense later to integrate my functionality into this Repair action? It's been a while, but IIRC it was about fixing the manifest metadata (i.e., file sizes and metrics). File sizes would fix a bug (https://github.com/apache/iceberg/issues/1980) that at the time seemed important, but it's probably rare. Re-calculating metrics based on new table configs is a more common case. I've seen a lot of users with huge metadata that OOMs during planning, who then realize they need to tune the table to reduce the metrics collected, and so far there's no mechanism to do this. Another option for these would be an additional flag on RewriteManifests, but I think there were some opinions in #2608 that it should be a separate action, as "rewrite" indicates rewrite-as-is. If we agree, I can add these functions in a follow-up. Maybe it makes sense to have a bit array of options, as it seems Repair will have many options then. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Standardize AWS credential names [iceberg-python]
Fokko commented on code in PR #922: URL: https://github.com/apache/iceberg-python/pull/922#discussion_r1676882347 ## pyiceberg/io/__init__.py: ## @@ -320,6 +331,13 @@ def _infer_file_io_from_scheme(path: str, properties: Properties) -> Optional[Fi return None +def _get_first_property_value(properties: Properties, property_names: Tuple[str, ...]) -> Optional[Any]: Review Comment: I find this PR very hard to read with all the constants. I would like to make a suggestion: how about changing the signature to: ```suggestion def _get_first_property_value(properties: Properties, *property_names: str) -> Optional[Any]: ``` And avoid the additional constant with the grouped values: ```python _get_first_property_value(self.properties, S3_ACCESS_KEY_ID_PROPERTIES) ``` Instead, write: ```python _get_first_property_value(self.properties, S3_ACCESS_KEY_ID, AWS_ACCESS_KEY_ID) ``` The `S3_ACCESS_KEY_ID_PROPERTIES` constant seems to be used just once. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
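For concreteness, a sketch of the varargs form suggested above; `DEPRECATED_PROPERTY_NAMES` is a placeholder for the set defined in the PR, and `warnings.warn` stands in for pyiceberg's deprecation helper:

```python
import warnings
from typing import Any, Optional

from pyiceberg.typedef import Properties

# Placeholder; the PR defines the real set of deprecated property names.
DEPRECATED_PROPERTY_NAMES = {"old.example-property"}


def _get_first_property_value(properties: Properties, *property_names: str) -> Optional[Any]:
    # Walk the candidate names in order and return the first value that is set
    for property_name in property_names:
        if property_value := properties.get(property_name):
            if property_name in DEPRECATED_PROPERTY_NAMES:
                warnings.warn(f"The property {property_name} is deprecated", DeprecationWarning)
            return property_value
    return None


# Call sites then list the candidate names inline, preferred name first:
# _get_first_property_value(properties, S3_ACCESS_KEY_ID, AWS_ACCESS_KEY_ID)
```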
Re: [PR] Standardize AWS credential names [iceberg-python]
Fokko commented on code in PR #922: URL: https://github.com/apache/iceberg-python/pull/922#discussion_r1676882383 ## pyiceberg/io/__init__.py: ## @@ -46,6 +48,10 @@ logger = logging.getLogger(__name__) +AWS_REGION = "client.region" Review Comment: Nice, that's a good find! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]
kevinjqliu commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676880923 ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: ValueError: If the schemas are not compatible. """ -name_mapping = table_schema.name_mapping -try: -task_schema = pyarrow_to_schema( -other_schema, name_mapping=name_mapping, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us -) -except ValueError as e: -other_schema = _pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) -additional_names = set(other_schema.column_names) - set(table_schema.column_names) -raise ValueError( -f"PyArrow table contains more columns: {', '.join(sorted(additional_names))}. Update the schema first (hint, use union_by_name)." -) from e - -if table_schema.as_struct() != task_schema.as_struct(): -from rich.console import Console -from rich.table import Table as RichTable +task_schema = assign_fresh_schema_ids( +_pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) +) -console = Console(record=True) +extra_fields = task_schema.field_names - table_schema.field_names +missing_fields = table_schema.field_names - task_schema.field_names +fields_in_both = task_schema.field_names.intersection(table_schema.field_names) + +from rich.console import Console +from rich.table import Table as RichTable + +console = Console(record=True) + +rich_table = RichTable(show_header=True, header_style="bold") +rich_table.add_column("Field Name") +rich_table.add_column("Category") +rich_table.add_column("Table field") +rich_table.add_column("Dataframe field") + +def print_nullability(required: bool) -> str: +return "required" if required else "optional" + +for field_name in fields_in_both: Review Comment: just want to check my understanding, this works for nested fields because nested fields are "flattened" by `. field_names` and then fetched by `.find_field`. For example: a `df` schema like ``` task_schema = pa.field( "person", pa.struct([ pa.field("name", pa.string(), nullable=True), ]), nullable=True, ) ``` `task_schema.field_names` will produce `{"person", "person.name"}`. `task_schema.find_field("person")` and `task_schema.find_field("person.name")` will fetch the corresponding fields ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: ValueError: If the schemas are not compatible. """ -name_mapping = table_schema.name_mapping -try: -task_schema = pyarrow_to_schema( -other_schema, name_mapping=name_mapping, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us -) -except ValueError as e: -other_schema = _pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) -additional_names = set(other_schema.column_names) - set(table_schema.column_names) -raise ValueError( -f"PyArrow table contains more columns: {', '.join(sorted(additional_names))}. Update the schema first (hint, use union_by_name)." 
-) from e - -if table_schema.as_struct() != task_schema.as_struct(): -from rich.console import Console -from rich.table import Table as RichTable +task_schema = assign_fresh_schema_ids( +_pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) +) -console = Console(record=True) +extra_fields = task_schema.field_names - table_schema.field_names +missing_fields = table_schema.field_names - task_schema.field_names +fields_in_both = task_schema.field_names.intersection(table_schema.field_names) + +from rich.console import Console +from rich.table import Table as RichTable + +console = Console(record=True) + +rich_table = RichTable(show_header=True, header_style="bold") +rich_table.add_column("Field Name") +rich_table.add_column("Category") +rich_table.add_column("Table field") +rich_table.add_column("Dataframe field") + +def print_nullability(required: bool) -> str: +return "required" if required else "optional" + +for field_name in fields_in_both: +lhs = table_schema.find_field(field_name) +rhs = task_schema.find_field(field_name) +# Check nullability +if lhs.required != rhs.required: Review Comment: to make this a bit more complicated... an optional field can be written to by a required field. the schema
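To make the flattening behavior described in the first comment above concrete, here is a small self-contained sketch against pyiceberg's `Schema` API (the `person` schema is made up for illustration):

```python
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, StringType, StructType

# A schema with a nested struct field: person.name
schema = Schema(
    NestedField(
        field_id=1,
        name="person",
        field_type=StructType(
            NestedField(field_id=2, name="name", field_type=StringType(), required=False),
        ),
        required=False,
    ),
)

# Nested fields are addressable by their flattened, dotted path
print(schema.find_field("person"))       # 1: person: optional struct<...>
print(schema.find_field("person.name"))  # 2: name: optional string
```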
Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]
kevinjqliu commented on PR #921: URL: https://github.com/apache/iceberg-python/pull/921#issuecomment-2227047760 @pdpark does this solution resolve your use case? https://github.com/apache/iceberg-python/pull/829#issuecomment-2218116050 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Set boolean values in table properties to lowercase string [iceberg-python]
Fokko merged PR #924: URL: https://github.com/apache/iceberg-python/pull/924 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Set properties boolean value to lowercase string [iceberg-python]
kevinjqliu commented on issue #919: URL: https://github.com/apache/iceberg-python/issues/919#issuecomment-2227049015 Resolved in #924 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Create table properties does not support boolean value [iceberg-python]
kevinjqliu closed issue #895: Create table properties does not support boolean value URL: https://github.com/apache/iceberg-python/issues/895 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Set properties boolean value to lowercase string [iceberg-python]
kevinjqliu closed issue #919: Set properties boolean value to lowercase string URL: https://github.com/apache/iceberg-python/issues/919 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Create table properties does not support boolean value [iceberg-python]
kevinjqliu commented on issue #895: URL: https://github.com/apache/iceberg-python/issues/895#issuecomment-2227049076 Resolved in #924 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[I] Calling load_table().scan().to_arrow() emits error on empty "names" field in name mapping [iceberg-python]
spock-abadai opened a new issue, #925: URL: https://github.com/apache/iceberg-python/issues/925 ### Apache Iceberg version main (development) ### Please describe the bug On one of my Iceberg tables, when I load a table and scan it, during the parsing of the name mapping in the table properties, pydantic issues the following `ValidationError`: ``` def parse_mapping_from_json(mapping: str) -> NameMapping: > return NameMapping.model_validate_json(mapping) E pydantic_core._pydantic_core.ValidationError: 1 validation error for NameMapping E 9.names E Value error, At least one mapped name must be provided for the field [type=value_error, input_value=[], input_type=list] E For further information visit https://errors.pydantic.dev/2.8/v/value_error ``` This seems to be a result of the code in `table/name_mapping.py` in the method `check_at_least_one`, which (if I understand correctly) checks that all fields in the name mapping have at least one `name`. However, if I'm reading the [Iceberg spec](https://iceberg.apache.org/spec/#column-projection) correctly, it states that: [screenshot of the relevant section of the spec] I'm not 100% sure what scenario led to this, but I can say that the name mapping we have indeed has a field with id 10 that has an empty list of names. This field existed at one point in the schema but it seems like it was removed. In any case, it doesn't seem like requiring that the list of names contain at least one value is in line with the spec (and it seems that situations where this isn't the case do happen). Note that the said Iceberg table was never created, written to or modified using `pyiceberg` (only using Spark and Trino). `pyiceberg` is only used to read. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
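A minimal reproduction of the failure (the one-field mapping JSON is synthetic, mirroring the reported field with an empty `names` list):

```python
from pyiceberg.table.name_mapping import parse_mapping_from_json

# A mapping entry whose "names" list is empty, as described in the report
mapping_json = '[{"field-id": 10, "names": []}]'

# On current main this raises pydantic's ValidationError:
# "At least one mapped name must be provided for the field"
parse_mapping_from_json(mapping_json)
```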
Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]
syun64 commented on PR #921: URL: https://github.com/apache/iceberg-python/pull/921#issuecomment-2227049610 Thank you for the review comments @kevinjqliu! > Porting over Fokko's comment from the other PR [#829 (review)](https://github.com/apache/iceberg-python/pull/829#pullrequestreview-2164061387) Do you know if the current change supports "re-aligning" the table? I think this test from your PR covers that case where the fields are re-aligned according to the Iceberg Table spec: https://github.com/apache/iceberg-python/pull/921/files#diff-7f3dd1244d08ce27c003cd091da10aa049f7bb0c7d5397acb4ec69767036accdR982 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]
syun64 commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676884492 ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: ValueError: If the schemas are not compatible. """ -name_mapping = table_schema.name_mapping -try: -task_schema = pyarrow_to_schema( -other_schema, name_mapping=name_mapping, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us -) -except ValueError as e: -other_schema = _pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) -additional_names = set(other_schema.column_names) - set(table_schema.column_names) -raise ValueError( -f"PyArrow table contains more columns: {', '.join(sorted(additional_names))}. Update the schema first (hint, use union_by_name)." -) from e - -if table_schema.as_struct() != task_schema.as_struct(): -from rich.console import Console -from rich.table import Table as RichTable +task_schema = assign_fresh_schema_ids( +_pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) +) -console = Console(record=True) +extra_fields = task_schema.field_names - table_schema.field_names +missing_fields = table_schema.field_names - task_schema.field_names +fields_in_both = task_schema.field_names.intersection(table_schema.field_names) + +from rich.console import Console +from rich.table import Table as RichTable + +console = Console(record=True) + +rich_table = RichTable(show_header=True, header_style="bold") +rich_table.add_column("Field Name") +rich_table.add_column("Category") +rich_table.add_column("Table field") +rich_table.add_column("Dataframe field") + +def print_nullability(required: bool) -> str: +return "required" if required else "optional" + +for field_name in fields_in_both: +lhs = table_schema.find_field(field_name) +rhs = task_schema.find_field(field_name) +# Check nullability +if lhs.required != rhs.required: Review Comment: Thank you for pointing this out! This might be simpler to deal with than type promotion, so let me give this a go -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
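The rule under discussion ("an optional field can be written to by a required field") reduces to a one-line check; a sketch with illustrative names:

```python
def nullability_compatible(table_field_required: bool, df_field_required: bool) -> bool:
    # A required dataframe column can always be written into an optional table
    # field; an optional column cannot be written into a required field.
    return df_field_required or not table_field_required


assert nullability_compatible(table_field_required=False, df_field_required=True)      # required -> optional: ok
assert not nullability_compatible(table_field_required=True, df_field_required=False)  # optional -> required: not ok
```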
Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]
syun64 commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676885325 ## pyiceberg/io/pyarrow.py: ## @@ -1450,14 +1451,17 @@ def field_partner(self, partner_struct: Optional[pa.Array], field_id: int, _: st except ValueError: return None -if isinstance(partner_struct, pa.StructArray): -return partner_struct.field(name) -elif isinstance(partner_struct, pa.Table): -return partner_struct.column(name).combine_chunks() -elif isinstance(partner_struct, pa.RecordBatch): -return partner_struct.column(name) -else: -raise ValueError(f"Cannot find {name} in expected partner_struct type {type(partner_struct)}") +try: +if isinstance(partner_struct, pa.StructArray): +return partner_struct.field(name) +elif isinstance(partner_struct, pa.Table): +return partner_struct.column(name).combine_chunks() +elif isinstance(partner_struct, pa.RecordBatch): +return partner_struct.column(name) +else: +raise ValueError(f"Cannot find {name} in expected partner_struct type {type(partner_struct)}") +except KeyError: Review Comment: Yes, that's right - the `ArrowProjectionVisitor` is responsible for detecting that the `field_partner` is `None` and then checking if the table field is also optional before filling it in with a null array. This change is necessary so that the `ArrowAccessor` doesn't throw an exception if the field can't be found in the arrow component, and enables `ArrowProjectionVisitor` to make use of a code pathway it wasn't able to make use of before: https://github.com/apache/iceberg-python/blob/b11cdb54b1a05cce0ade34af4ce81a94c34b2650/pyiceberg/io/pyarrow.py#L1388-L1395 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
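A simplified sketch of the behavior described above (not pyiceberg's actual visitor code): when no partner column exists and the table field is optional, produce a null column of the right length instead of raising.

```python
from typing import Optional

import pyarrow as pa


def project_column(
    partner: Optional[pa.Array], required: bool, num_rows: int, arrow_type: pa.DataType
) -> pa.Array:
    if partner is not None:
        return partner
    if not required:
        # The field is missing from the dataframe but optional in the table:
        # fill it with nulls rather than failing.
        return pa.nulls(num_rows, type=arrow_type)
    raise ValueError("Missing required field in the dataframe")


print(project_column(None, required=False, num_rows=3, arrow_type=pa.string()))  # 3 nulls
```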
Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]
syun64 commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676885362 ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: ValueError: If the schemas are not compatible. """ -name_mapping = table_schema.name_mapping -try: -task_schema = pyarrow_to_schema( -other_schema, name_mapping=name_mapping, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us -) -except ValueError as e: -other_schema = _pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) -additional_names = set(other_schema.column_names) - set(table_schema.column_names) -raise ValueError( -f"PyArrow table contains more columns: {', '.join(sorted(additional_names))}. Update the schema first (hint, use union_by_name)." -) from e - -if table_schema.as_struct() != task_schema.as_struct(): -from rich.console import Console -from rich.table import Table as RichTable +task_schema = assign_fresh_schema_ids( +_pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) +) -console = Console(record=True) +extra_fields = task_schema.field_names - table_schema.field_names +missing_fields = table_schema.field_names - task_schema.field_names +fields_in_both = task_schema.field_names.intersection(table_schema.field_names) + +from rich.console import Console +from rich.table import Table as RichTable + +console = Console(record=True) + +rich_table = RichTable(show_header=True, header_style="bold") +rich_table.add_column("Field Name") +rich_table.add_column("Category") +rich_table.add_column("Table field") +rich_table.add_column("Dataframe field") + +def print_nullability(required: bool) -> str: +return "required" if required else "optional" + +for field_name in fields_in_both: Review Comment: That's consistent with my understanding -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]
kevinjqliu commented on PR #921: URL: https://github.com/apache/iceberg-python/pull/921#issuecomment-2227052294 My understanding of the "re-aligning" problem is that it is about the alignment between the Parquet file schema and the Iceberg schema, similar to what Fokko mentioned here: https://github.com/apache/iceberg-python/pull/829#issuecomment-2219874876 I think this PR currently only addresses schema compatibility, and it is still possible to write Parquet files with an out-of-order schema. The `test_table_write_out_of_order_schema` test shows we are allowing an out-of-order schema on write, but we want to sort the schema order before writing. We can probably punt this to a subsequent PR as an optimization. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]
Fokko commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676911616 ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: ValueError: If the schemas are not compatible. """ -name_mapping = table_schema.name_mapping -try: -task_schema = pyarrow_to_schema( -other_schema, name_mapping=name_mapping, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us -) -except ValueError as e: -other_schema = _pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) -additional_names = set(other_schema.column_names) - set(table_schema.column_names) -raise ValueError( -f"PyArrow table contains more columns: {', '.join(sorted(additional_names))}. Update the schema first (hint, use union_by_name)." -) from e - -if table_schema.as_struct() != task_schema.as_struct(): -from rich.console import Console -from rich.table import Table as RichTable +task_schema = assign_fresh_schema_ids( Review Comment: Nit: This naming is still from when we only used it on the read path; we should probably make it more generic. Maybe `provided_schema` and `requested_schema`? Open for suggestions! ## tests/integration/test_add_files.py: ## @@ -501,14 +501,11 @@ def test_add_files_fails_on_schema_mismatch(spark: SparkSession, session_catalog ) expected = """Mismatch in fields:
-┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
-┃    ┃ Table field              ┃ Dataframe field          ┃
-┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
-│ ✅ │ 1: foo: optional boolean │ 1: foo: optional boolean │
-│ ✅ │ 2: bar: optional string  │ 2: bar: optional string  │
-│ ❌ │ 3: baz: optional int     │ 3: baz: optional string  │
-│ ✅ │ 4: qux: optional date    │ 4: qux: optional date    │
-└────┴──────────────────────────┴──────────────────────────┘
+┏━━━━┳━━━━┳━━━━┳━━━━┓
Review Comment: Is it just me, or is the left easier to read? ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: ValueError: If the schemas are not compatible. """ -name_mapping = table_schema.name_mapping -try: -task_schema = pyarrow_to_schema( -other_schema, name_mapping=name_mapping, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us -) -except ValueError as e: -other_schema = _pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) -additional_names = set(other_schema.column_names) - set(table_schema.column_names) -raise ValueError( -f"PyArrow table contains more columns: {', '.join(sorted(additional_names))}. Update the schema first (hint, use union_by_name)." -) from e - -if table_schema.as_struct() != task_schema.as_struct(): -from rich.console import Console -from rich.table import Table as RichTable +task_schema = assign_fresh_schema_ids( Review Comment: I'm debating with myself whether the API is the most extensible here. I think we should re-use `_check_schema_compatible(requested_schema: Schema, provided_schema: Schema)` instead of reimplementing the logic in Arrow here. This nicely splits `pyarrow_to_schema` and `_check_schema_compatible`. ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2083,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: ValueError: If the schemas are not compatible.
""" -name_mapping = table_schema.name_mapping -try: -task_schema = pyarrow_to_schema( -other_schema, name_mapping=name_mapping, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us -) -except ValueError as e: -other_schema = _pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) -additional_names = set(other_schema.column_names) - set(table_schema.column_names) -raise ValueError( -f"PyArrow table contains more columns: {', '.join(sorted(additional_names))}. Update the schema first (hint, use union_by_name)." -) from e - -if table_schema.as_struct() != task_schema.as_struct(): -from rich.console import Console -from rich.table import Table as RichTable +task_schema = assign_fresh_schema_ids( +_pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) +) -console = Console(record=True) +extra_fields = task_schema.field_names - table_schema.field_names +missing_fields = tabl
Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]
Fokko commented on PR #921: URL: https://github.com/apache/iceberg-python/pull/921#issuecomment-2227093971 Just to expand on top of https://github.com/apache/iceberg-python/pull/921#issuecomment-2227052294 This is the situation where you already have two Iceberg schemas with the IDs assigned. I think an easier approach to solve this problem is to first convert the Arrow schema to an Iceberg schema (similar to the logic with the name-mapping), and then compare the two Iceberg schemas. The process would look like (see the sketch after this message): - Convert the Arrow schema into Iceberg using name-mapping, similar to [here](https://github.com/apache/iceberg-python/blob/3f44dfe711e96beda6aa8622cf5b0baffa6eb0f2/pyiceberg/io/pyarrow.py#L2082-L2086) - Compare the two schemas using a `SchemaWithPartnerVisitor` to check if they are compatible. If not, we should generate a pretty error. - If they are compatible, we should be able to just push the schema through `to_requested_schema` to align/cast the fields. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
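A rough sketch of that three-step flow; `pyarrow_to_schema` is the helper quoted earlier in this thread, while `check_compatible` and `cast_to_requested` are passed in as stand-ins for the reworked `_check_schema_compatible` and `to_requested_schema`, whose final signatures are still being discussed:

```python
import pyarrow as pa

from pyiceberg.io.pyarrow import pyarrow_to_schema
from pyiceberg.schema import Schema


def align_for_write(table_schema: Schema, df: pa.Table, check_compatible, cast_to_requested) -> pa.Table:
    # 1. Convert the Arrow schema to an Iceberg schema, reusing the table's
    #    name mapping so both sides carry the same field IDs.
    provided_schema = pyarrow_to_schema(df.schema, name_mapping=table_schema.name_mapping)
    # 2. Compare the two Iceberg schemas; on mismatch, raise a readable error.
    check_compatible(table_schema, provided_schema)
    # 3. Align and cast the Arrow columns to the table schema's order and types.
    return cast_to_requested(table_schema, provided_schema, df)
```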
Re: [I] Calling load_table().scan().to_arrow() emits error on empty "names" field in name mapping [iceberg-python]
Fokko commented on issue #925: URL: https://github.com/apache/iceberg-python/issues/925#issuecomment-2227097367 @spock-abadai Thanks for reporting this. I agree that this seems to be incorrect. Are you interested in providing a PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] Calling load_table().scan().to_arrow() emits error on empty "names" field in name mapping [iceberg-python]
spock-abadai commented on issue #925: URL: https://github.com/apache/iceberg-python/issues/925#issuecomment-2227109344 > @spock-abadai Thanks for reporting this. I agree that this seems to be incorrect. Are you interested in providing a PR? Sure, see #927 for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Concurrent table scans [iceberg-rust]
sdd commented on code in PR #373: URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1676960798 ## crates/iceberg/src/scan.rs: ## @@ -389,12 +333,158 @@ impl FileScanStreamContext { file_io, bound_filter, case_sensitive, +sender, +manifest_evaluator_cache: ManifestEvaluatorCache::new(), +partition_filter_cache: PartitionFilterCache::new(), +expression_evaluator_cache: Arc::new(Mutex::new(ExpressionEvaluatorCache::new())), }) } -/// Returns a reference to the [`BoundPredicate`] filter. -fn bound_filter(&self) -> Option<&BoundPredicate> { -self.bound_filter.as_ref() +async fn run(&mut self, manifest_list: ManifestList) -> Result<()> { +let file_io = self.file_io.clone(); +let sender = self.sender.clone(); + +// This whole Vec-and-for-loop approach feels sub-optimally structured. +// I've tried structuring this in multiple ways but run into +// issues with ownership. Ideally I'd like to structure this +// with a functional programming approach: extracting +// sections 1, 2, and 3 out into different methods on Self, +// and then use some iterator combinators to orchestrate it all. +// Section 1 is pretty trivially refactorable into a static method +// that can be used in a closure that can be used with Iterator::filter. +// Similarly, section 3 seems easily factor-able into a method that can +// be passed into Iterator::map. +// Section 2 turns out trickier - we want to exit the entire `run` method early +// if the eval fails, and filter out any manifest_files from the iterator / stream +// if the eval succeeds but returns true. We bump into ownership issues due +// to needing to pass mut self as the caches need to be able to mutate. +// Section 3 runs into ownership issues when trying to refactor its closure to be +// a static or non-static method. + +// 1 +let filtered_manifest_files = manifest_list +.entries() +.iter() +.filter(Self::reject_unsupported_content_types); + +// 2 +let mut filtered_manifest_files2 = vec![]; +for manifest_file in filtered_manifest_files { +if !self.apply_evaluator(manifest_file)? { +continue; +} + +filtered_manifest_files2.push(manifest_file); +} + +// 3 +let filtered_manifest_files = filtered_manifest_files2.into_iter().map(|manifest_file| { +Ok(ManifestFileProcessingContext { +manifest_file, +file_io: file_io.clone(), +sender: sender.clone(), +filter: &self.bound_filter, +expression_evaluator_cache: self.expression_evaluator_cache.clone(), +}) +}); + +futures::stream::iter(filtered_manifest_files) +.try_for_each_concurrent( Review Comment: Aah. You're right. `try_for_each_concurrent` is concurrent but not parallel. What I have should still be an improvement on what we have already, as we'll at least be able to have multiple fileIO file requests in-flight at the same time, but this is not fully parallelised. I'll work on making this parallel as well as concurrent. Some initial experimentation shows that it is tricky to please the borrow checker, but I'll keep trying. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]
syun64 commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676966897 ## tests/integration/test_add_files.py: ## @@ -501,14 +501,11 @@ def test_add_files_fails_on_schema_mismatch(spark: SparkSession, session_catalog ) expected = """Mismatch in fields:
-┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
-┃    ┃ Table field              ┃ Dataframe field          ┃
-┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
-│ ✅ │ 1: foo: optional boolean │ 1: foo: optional boolean │
-│ ✅ │ 2: bar: optional string  │ 2: bar: optional string  │
-│ ❌ │ 3: baz: optional int     │ 3: baz: optional string  │
-│ ✅ │ 4: qux: optional date    │ 4: qux: optional date    │
-└────┴──────────────────────────┴──────────────────────────┘
+┏━━━━┳━━━━┳━━━━┳━━━━┓
Review Comment: I opted for this approach because I wanted to group the extra fields in the dataframe into the table as well. But if we are taking the approach of using the `name_mapping` to generate the Iceberg Schema with consistent IDs after first checking that there are no extra fields, I think we can go back to the old way -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]
syun64 commented on code in PR #921: URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676967257 ## pyiceberg/io/pyarrow.py: ## @@ -2079,36 +2083,63 @@ def _check_schema_compatible(table_schema: Schema, other_schema: pa.Schema, down Raises: ValueError: If the schemas are not compatible. """ -name_mapping = table_schema.name_mapping -try: -task_schema = pyarrow_to_schema( -other_schema, name_mapping=name_mapping, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us -) -except ValueError as e: -other_schema = _pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) -additional_names = set(other_schema.column_names) - set(table_schema.column_names) -raise ValueError( -f"PyArrow table contains more columns: {', '.join(sorted(additional_names))}. Update the schema first (hint, use union_by_name)." -) from e - -if table_schema.as_struct() != task_schema.as_struct(): -from rich.console import Console -from rich.table import Table as RichTable +task_schema = assign_fresh_schema_ids( +_pyarrow_to_schema_without_ids(other_schema, downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us) +) -console = Console(record=True) +extra_fields = task_schema.field_names - table_schema.field_names +missing_fields = table_schema.field_names - task_schema.field_names +fields_in_both = task_schema.field_names.intersection(table_schema.field_names) + +from rich.console import Console +from rich.table import Table as RichTable + +console = Console(record=True) + +rich_table = RichTable(show_header=True, header_style="bold") +rich_table.add_column("Field Name") +rich_table.add_column("Category") +rich_table.add_column("Table field") +rich_table.add_column("Dataframe field") + +def print_nullability(required: bool) -> str: +return "required" if required else "optional" + +for field_name in fields_in_both: +lhs = table_schema.find_field(field_name) +rhs = task_schema.find_field(field_name) +# Check nullability +if lhs.required != rhs.required: +rich_table.add_row( +field_name, +"Nullability", +f"{print_nullability(lhs.required)} {str(lhs.field_type)}", +f"{print_nullability(rhs.required)} {str(rhs.field_type)}", +) +# Check if type is consistent +if any( +(isinstance(lhs.field_type, container_type) and isinstance(rhs.field_type, container_type)) +for container_type in {StructType, MapType, ListType} Review Comment: Hmmm isn't the nullability check for parents and child nodes covered from L2111 to L2117? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Allow empty `names` in mapped field of Name Mapping [iceberg-python]
kevinjqliu commented on code in PR #927: URL: https://github.com/apache/iceberg-python/pull/927#discussion_r1676974835 ## tests/table/test_name_mapping.py: ## @@ -247,11 +247,6 @@ def test_mapping_lookup_by_name(table_name_mapping_nested: NameMapping) -> None: table_name_mapping_nested.find("boom") -def test_invalid_mapped_field() -> None: Review Comment: nit: can we add a test to check for `permits an empty list of names in the default name mapping` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
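A sketch of the requested test, assuming `MappedField` keeps its keyword arguments once the validator is removed in #927:

```python
from pyiceberg.table.name_mapping import MappedField


def test_mapped_field_permits_empty_names() -> None:
    # After removing the check_at_least_one validator, an empty names list
    # should validate cleanly, matching the column-projection spec.
    field = MappedField(field_id=10, names=[])
    assert field.names == []
```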
[PR] Core: Kryo serialization error with DataFile after copying [iceberg]
fengjiajie opened a new pull request, #10695: URL: https://github.com/apache/iceberg/pull/10695 In my use case with Iceberg 1.3, I have a Flink `function-1` that outputs a `DataStream`, which is then processed by the next function. The simplified code for `function-1` is as follows: ```java // Inside function-1: Map<Integer, Long> columnSizes = new HashMap<>(); columnSizes.put(1, 234L); DataFile dataFile = DataFiles.builder(icebergTable.spec()) .withMetrics(new Metrics(123L, columnSizes, ...)) ... .build(); // Move file to new path, then rebuild DataFile DataFile newDataFile = DataFiles.builder(icebergTable.spec()) .copy(dataFile) .withPath("file:///new_path") .build(); ``` If I return `dataFile`, Flink's Kryo framework can deserialize it correctly in the next function. However, if I return `newDataFile` (reconstructed with `copy`), Kryo fails with the following exception: ``` Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:176) ... at org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253) ... 4 more Caused by: com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: columnSizes (org.apache.iceberg.GenericDataFile) at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125) ... Caused by: java.lang.UnsupportedOperationException at java.util.Collections$UnmodifiableMap.put(Collections.java:1459) ... ``` This issue arises in Iceberg 1.15 but not in 1.13. The root cause lies in the `toReadableMap` method of `org.apache.iceberg.BaseFile`: ```java // Iceberg 1.13: private static <K, V> Map<K, V> toReadableMap(Map<K, V> map) { return map instanceof SerializableMap ? ((SerializableMap<K, V>) map).immutableMap() : map; } // Iceberg 1.15: private static <K, V> Map<K, V> toReadableMap(Map<K, V> map) { if (map == null) { return null; } else if (map instanceof SerializableMap) { return ((SerializableMap<K, V>) map).immutableMap(); } else { return Collections.unmodifiableMap(map); } } ``` In Iceberg 1.15, `toReadableMap` wraps the map with `Collections.unmodifiableMap`, so Kryo's attempt to `put` entries into the map during deserialization fails with `UnsupportedOperationException`. While using `unmodifiableMap` seems correct, the `copy` operation might need to reconstruct these maps as regular mutable maps to avoid this issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Build: Bump net.snowflake:snowflake-jdbc from 3.16.1 to 3.17.0 [iceberg]
dependabot[bot] opened a new pull request, #10696: URL: https://github.com/apache/iceberg/pull/10696 Bumps [net.snowflake:snowflake-jdbc](https://github.com/snowflakedb/snowflake-jdbc) from 3.16.1 to 3.17.0. Release notes and changelog for 3.17.0 (and prior releases): https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
[PR] Build: Bump nessie from 0.92.0 to 0.92.1 [iceberg]
dependabot[bot] opened a new pull request, #10697: URL: https://github.com/apache/iceberg/pull/10697 Bumps `nessie` from 0.92.0 to 0.92.1. Updates `org.projectnessie.nessie:nessie-client` from 0.92.0 to 0.92.1 Updates `org.projectnessie.nessie:nessie-jaxrs-testextension` from 0.92.0 to 0.92.1 Updates `org.projectnessie.nessie:nessie-versioned-storage-inmemory-tests` from 0.92.0 to 0.92.1 Updates `org.projectnessie.nessie:nessie-versioned-storage-testextension` from 0.92.0 to 0.92.1 Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Build: Bump org.assertj:assertj-core from 3.26.0 to 3.26.3 [iceberg]
dependabot[bot] opened a new pull request, #10698: URL: https://github.com/apache/iceberg/pull/10698 Bumps [org.assertj:assertj-core](https://github.com/assertj/assertj) from 3.26.0 to 3.26.3. Release notes (sourced from the assertj releases page, v3.26.3): Binary compatibility: the release is binary compatible with the previous minor version, but binary incompatible with the previous patch version. Breaking changes (Core): replace assertThat(Temporal) with assertThatTemporal(Temporal) (#3519). Bug fixes (Core): fix Javadoc rendering on FactoryBasedNavigableListAssert::assertThat; allow ComparingNormalizedFields instances to be reused across different assertions (#3493). Dependency upgrades: Byte Buddy 1.14.18 (#3531), JUnit BOM 5.10.3 (#3525), Guava 33.2.1-jre (#3499). Contributors: @genuss. All changes: https://github.com/assertj/assertj/compare/assertj-build-3.26.0...assertj-build-3.26.3 Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
[//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- Dependabot commands and options You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it you
[PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.43.0 [iceberg]
dependabot[bot] opened a new pull request, #10699: URL: https://github.com/apache/iceberg/pull/10699

Bumps [com.google.cloud:libraries-bom](https://github.com/googleapis/java-cloud-bom) from 26.28.0 to 26.43.0.

Release notes, sourced from [com.google.cloud:libraries-bom's releases](https://github.com/googleapis/java-cloud-bom/releases):

v26.43.0

GCP Libraries BOM 26.43.0. Here are the differences from the previous version (26.42.0).

New Addition
- com.google.cloud:google-cloud-gdchardwaremanagement:0.1.0

The group ID of the following artifacts is com.google.cloud.

Notable Changes

google-cloud-bigquery 2.41.0 (prev: 2.40.3)
- Add columnNameCharacterMap to LoadJobConfiguration ([#3356](https://redirect.github.com/googleapis/java-bigquery/issues/3356)) ([2f3cbe3](https://github.com/googleapis/java-bigquery/commit/2f3cbe39619bcc93cb7d504417accd84b418dd41))
- Add MetadataCacheMode to ExternalTableDefinition ([#3351](https://redirect.github.com/googleapis/java-bigquery/issues/3351)) ([2814dc4](https://github.com/googleapis/java-bigquery/commit/2814dc49dfdd5671257b6a9933a5dd381d889dd1))
- Add clustering value to ListTables result ([#3359](https://redirect.github.com/googleapis/java-bigquery/issues/3359)) ([5d52bc9](https://github.com/googleapis/java-bigquery/commit/5d52bc9f4ef93f84200335685901c6ac0256b769))

google-cloud-bigtable 2.40.0 (prev: 2.39.5)
- Add String type with Utf8Raw encoding to Bigtable API ([#2191](https://redirect.github.com/googleapis/java-bigtable/issues/2191)) ([e7f03fc](https://github.com/googleapis/java-bigtable/commit/e7f03fc7d252a7ff6c76a8e6e0a9e6ad3dcbd9d5))
- Add getServiceName() to EnhancedBigTableStubSettings ([#2256](https://redirect.github.com/googleapis/java-bigtable/issues/2256)) ([da703db](https://github.com/googleapis/java-bigtable/commit/da703db25f6702b263dbd8ded0cb0fd3422efe31))
- Remove grpclb ([#2033](https://redirect.github.com/googleapis/java-bigtable/issues/2033)) ([7355375](https://github.com/googleapis/java-bigtable/commit/735537571a147bfdd2a986664ff7905c8f5dc3db))

google-cloud-firestore 3.22.0 (prev: 3.21.4)
- Add bulk delete api ([#1704](https://redirect.github.com/googleapis/java-firestore/issues/1704)) ([5ef6254](https://github.com/googleapis/java-firestore/commit/5ef625456afbee781951d2a5c6a9c2548feea92e))

google-cloud-logging 3.19.0 (prev: 3.18.0)
- logging: OpenTelemetry trace/span ID integration for Java logging library ([#1596](https://redirect.github.com/googleapis/java-logging/issues/1596)) ([67db829](https://github.com/googleapis/java-logging/commit/67db829621fd1c4a876d158fe1afb4927821fa54))

google-cloud-pubsub 1.131.0 (prev: 1.130.0)
- Add use_topic_schema for Cloud Storage Subscriptions ([#2082](https://redirect.github.com/googleapis/java-pubsub/issues/2082)) ([11d67d4](https://github.com/googleapis/java-pubsub/commit/11d67d44152ccca008dda071683d9932c59af41d))

google-cloud-spanner 6.71.0 (prev: 6.69.0)
- Add field order_by in spanner.proto ([#3064](https://redirect.github.com/googleapis/java-spanner/issues/3064)) ([52ee196](https://github.com/googleapis/java-spanner/commit/52ee1967ee3a37fb0482ad8b51c6e77e28b79844))
- Do not end transaction span when rolling back to savepoint ([#3167](https://redirect.github.com/googleapis/java-spanner/issues/3167)) ([8ec0cf2](https://github.com/googleapis/java-spanner/commit/8ec0cf2032dece545c9e4d8a794b80d06550b710))
- Remove unused DmlBatch span ([#3147](https://redirect.github.com/googleapis/java-spanner/issues/3147)) ([f7891c1](https://github.com/googleapis/java-spanner/commit/f7891c1ca42727c775cdbe91bff8d55191a3d799))

google-cloud-spanner-jdbc 2.20.1 (prev: 2.19.3)
- Add OpenTelemetry tracing ([#1568](https://redirect.github.com/googleapis/java-spanner-jdbc/issues/1568)) ([1485a04](https://github.com/googleapis/java-spanner-jdbc/commit/1485a04272c270851468254bebffe4f7d846f17c))

Other libraries
- [aiplatform] add enum value MALFORMED_FUNCTION_CALL to .google.cloud.aiplatform.v1beta1.content.Candidate.FinishReason ([666a3ac](https://github.com/googleapis/google-cloud-java/commit/666a3ac8cd0cb45da3555a1bef8dfe3edf57d257))
- [aiplatform] add MALFORMED_FUNCTION_CALL to FinishReason ([666a3ac](https://github.com/googleapis/google-cloud-java/commit/666a3ac8cd0cb45da3555a1bef8dfe3edf57d257))
- [batch] add a install_ops_agent field to InstancePolicyOrTemplate for Ops Agent support ([564c369](https://github.com/googleapis/google-cloud-java/commit/564c3692bc8d22f4f5acdcbfb60be0582bd08d53))
- [container] A new message HugepagesConfig is added ([4be1db5](https://github.com/googleapis/google-cloud-java/commit/4be1db5cfca842d85e167b8ed033ebcbcbd758dd))
- [container] A new method_signature parent is added to method ListOperations in service ClusterManager ([4be1db5](https://github.com/googleapis/google-cloud-java/commit/4be1db5cfca842d85e167b8ed033ebcbcbd758dd))
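For consumers, a BOM bump like this only changes which versions of the managed `com.google.cloud` artifacts are resolved; the features listed above ship in the individual libraries. As one illustration, here is a rough sketch of the new BigQuery column-name character map from #3356. The `setColumnNameCharacterMap` setter name and the "V2" value are assumptions inferred from the release note, not verified against the library, so treat this as a hypothetical usage rather than the confirmed API:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

class LoadWithColumnNameCharacterMap {

  void loadCsv() {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    LoadJobConfiguration config =
        LoadJobConfiguration.newBuilder(
                TableId.of("my_dataset", "my_table"), "gs://my-bucket/data.csv")
            .setFormatOptions(FormatOptions.csv())
            // Assumed setter from release note #3356; the value is illustrative
            // and controls how unsupported characters in column names are handled.
            .setColumnNameCharacterMap("V2")
            .build();

    // Submit the load job with the configuration above.
    bigquery.create(JobInfo.of(config));
  }
}
```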
Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.42.0 [iceberg]
dependabot[bot] commented on PR #10556: URL: https://github.com/apache/iceberg/pull/10556#issuecomment-2227195410 Superseded by #10699. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.42.0 [iceberg]
dependabot[bot] closed pull request #10556: Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.42.0 URL: https://github.com/apache/iceberg/pull/10556 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[PR] Build: Bump software.amazon.awssdk:bom from 2.26.16 to 2.26.20 [iceberg]
dependabot[bot] opened a new pull request, #10700: URL: https://github.com/apache/iceberg/pull/10700

Bumps software.amazon.awssdk:bom from 2.26.16 to 2.26.20.

[Dependabot compatibility score](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

---

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Allow empty `names` in mapped field of Name Mapping [iceberg-python]
spock-abadai commented on code in PR #927: URL: https://github.com/apache/iceberg-python/pull/927#discussion_r1677034401 ## tests/table/test_name_mapping.py: ## @@ -247,11 +247,6 @@ def test_mapping_lookup_by_name(table_name_mapping_nested: NameMapping) -> None: table_name_mapping_nested.find("boom") -def test_invalid_mapped_field() -> None: Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Allow empty `names` in mapped field of Name Mapping [iceberg-python]
spock-abadai commented on PR #927: URL: https://github.com/apache/iceberg-python/pull/927#issuecomment-2227208143

> Thanks for raising this PR @spock-abadai
>
> I think we should update line 40 as well:
>
> https://github.com/apache/iceberg-python/blob/3f44dfe711e96beda6aa8622cf5b0baffa6eb0f2/pyiceberg/table/name_mapping.py#L40

Good catch, fixed.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [PR] Allow empty `names` in mapped field of Name Mapping [iceberg-python]
Fokko commented on PR #927: URL: https://github.com/apache/iceberg-python/pull/927#issuecomment-2227217603 @spock-abadai Thanks again for working on this. I think we're almost there. Can you run `make lint` to fix the formatting issue? Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org