Re: [PR] mr:Fix issues 10639 [iceberg]

2024-07-13 Thread via GitHub


pvary commented on PR #10661:
URL: https://github.com/apache/iceberg/pull/10661#issuecomment-2226801595

   > compile jdk8, run jdk11, but we are actively working on jdk11 compile now, 
that should be in Hive-4.1.0 release
   
   When is Hive 4.1.0 planned?
   You might want to find and comment on the deprecation thread for Hive when 
it is ready:
   https://lists.apache.org/thread/bn2c480wdbzr089p88n1003zhw0nj9kv
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] mr:Fix issues 10639 [iceberg]

2024-07-13 Thread via GitHub


lurnagao-dahua commented on code in PR #10661:
URL: https://github.com/apache/iceberg/pull/10661#discussion_r1674923132


##
mr/src/test/java/org/apache/iceberg/mr/TestIcebergInputFormats.java:
##
@@ -381,6 +386,56 @@ public void testCustomCatalog() throws IOException {
 testInputFormat.create(builder.conf()).validate(expectedRecords);
   }
 
+  @TestTemplate
+  public void testWorkerPool() throws Exception {
+    Table table = helper.createUnpartitionedTable();
+    List<Record> records = helper.generateRandomRecords(1, 0L);
+    helper.appendToTable(null, records);

Review Comment:
   Thank you for your guidance. I have further optimized the unit test cases!
   Could you please review again when you have time?
   I would be very grateful.






Re: [PR] refine: move binary serialize in literal to datum [iceberg-rust]

2024-07-13 Thread via GitHub


ZENOTME commented on PR #456:
URL: https://github.com/apache/iceberg-rust/pull/456#issuecomment-2226828202

   cc @liurenjie1024 @Xuanwo 





Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676794013


##
crates/iceberg/testdata/view_metadata/ViewMetadataV2Valid.json:
##
@@ -0,0 +1,58 @@
+{

Review Comment:
   Ok, thanks Eduard for the feedback.
   
   My preferred way of going forward would be:
   
   1. Merge this PR, which just contains the structs & (de)serialization (very much like `TableMetadata` in its current state)
   2. Figure out a good way for the `TableMetadataBuilder`, including partition binding. We have a first shot ready that we can create a PR for.
   3. Use the same pattern we use for the `TableMetadataBuilder` for the much lighter `ViewMetadataBuilder`, which then includes all the tests Eduard mentioned.
   
   I am aware that a lot of things are missing in the builder. It was a deliberate decision on my part to get views up to speed with tables first, and then in a second step extend both table and view features to the Java level.
   
   @ZENOTME , @nastra would that be OK for you?
   






Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676794680


##
crates/iceberg/src/spec/view_metadata.rs:
##
@@ -0,0 +1,682 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Defines the [view metadata](https://iceberg.apache.org/view-spec/#view-metadata).
+//! The main struct here is [ViewMetadata] which defines the data for a view.
+
+use serde::{Deserialize, Serialize};
+use serde_repr::{Deserialize_repr, Serialize_repr};
+use std::cmp::Ordering;
+use std::fmt::{Display, Formatter};
+use std::{collections::HashMap, sync::Arc};
+use uuid::Uuid;
+
+use super::{
+view_version::{ViewVersion, ViewVersionRef},
+SchemaId, SchemaRef,
+};
+use crate::catalog::ViewCreation;
+use crate::error::Result;
+
+use _serde::ViewMetadataEnum;
+
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+
+/// Reference to [`ViewMetadata`].
+pub type ViewMetadataRef = Arc<ViewMetadata>;
+
+#[derive(Debug, PartialEq, Deserialize, Eq, Clone)]
+#[serde(try_from = "ViewMetadataEnum", into = "ViewMetadataEnum")]
+/// Fields for the version 1 of the view metadata.
+///
+/// We assume that this data structure is always valid, so we will panic when invalid error happens.
+/// We check the validity of this data structure when constructing.
+pub struct ViewMetadata {
+/// Integer Version for the format.
+pub(crate) format_version: ViewFormatVersion,
+/// A UUID that identifies the view, generated when the view is created.
+pub(crate) view_uuid: Uuid,
+/// The view's base location; used to create metadata file locations
+pub(crate) location: String,
+/// ID of the current version of the view (version-id)
+pub(crate) current_version_id: i64,
+/// A list of known versions of the view
+pub(crate) versions: HashMap<i64, ViewVersionRef>,
+/// A list of version log entries with the timestamp and version-id for every
+/// change to current-version-id
+pub(crate) version_log: Vec<ViewVersionLog>,

Review Comment:
   Hm, in the Iceberg spec it's called versions & version-log: https://iceberg.apache.org/view-spec/#view-metadata.
   I implemented it the same way `TableMetadata` is currently implemented: naming the struct field identically to the spec, but then creating a history accessor.
   
   TableMetadata:
   
https://github.com/c-thiel/iceberg-rust/blob/ca9de89ac9d95683c8fe9191f72ab922dc4c7672/crates/iceberg/src/spec/table_metadata.rs#L208-L211
   
   ViewMetadata:
   
https://github.com/c-thiel/iceberg-rust/blob/44630160be1bcf48249c31006b76a7150a029619/crates/iceberg/src/spec/view_metadata.rs#L151-L156
   
   As the `versions` field is not public, it actually implements almost the same interface as Java.
   
   @ZENOTME or some other Rust dev, it would be great to get some opinions on this.
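
   The pattern described above — a private field named after the spec, exposed through a Java-style accessor — can be sketched roughly as follows. This is an illustrative simplification, not the PR's code: `ViewMetadataSketch` and the field types are stand-ins.

```rust
// Sketch of the naming pattern: the private struct field keeps the spec
// name `version-log` (as `version_log`), while a `history()` accessor
// exposes it under the Java-style name. Types are simplified stand-ins.

#[derive(Debug, Clone, PartialEq)]
pub struct ViewVersionLog {
    pub version_id: i64,
    pub timestamp_ms: i64,
}

pub struct ViewMetadataSketch {
    // Field named after the spec's `version-log`; not public, so callers
    // must go through the accessor below.
    version_log: Vec<ViewVersionLog>,
}

impl ViewMetadataSketch {
    /// Java-style accessor name, backed by the spec-named private field.
    pub fn history(&self) -> &[ViewVersionLog] {
        &self.version_log
    }
}
```

   Because the field stays private, renaming the accessor later would not be a breaking change to the serialized form — the spec name only appears in (de)serialization.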






Re: [PR] Core:support http lock-manager [iceberg]

2024-07-13 Thread via GitHub


snazy commented on PR #10688:
URL: https://github.com/apache/iceberg/pull/10688#issuecomment-2226834838

   It seems there are a lot of individual points that should be discussed. @BsoBird do you mind adding a point to the "Discussions" section on the [agenda for the next community sync on July 31st](https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit)?
   
   Let me comment on some of the points. TL;DR: I think it's not an easy problem to solve.
   
   > balance this with the use of lockManager (which in itself is not a difficult task).
   
   Locking is one of the most difficult problems in distributed computing. (And actually in non-distributed computing, too.) Just a few things that come to mind (the real list is much longer):
   
   * Evil users lock all your tables (-> authorization)
   * Retry/back-off -> piling up waiting requests -> threading issues
   * Crashed clients leave stuff locked
   * "all the things" about race conditions
   
   > ... a large number of users use GlueCatalog ... want to access both iceberg and non-iceberg tables in Glue.
   
   That's IMO a fair point.
   
   > 2.You might think that users could solve this problem by implementing rest-catalog. However, many users do not have the ability to implement rest-catalog.Also, rest-catalog is still evolving, and many of the specifications have not yet been finalised.
   
   I suspect this wouldn't be an issue if Glue had a REST catalog?
   
   > 3.Regarding the fact that all clients need to be involved in the locking in order for it to work. I think this is a problem that users need to solve themselves.
   
   Similar to my statement above, users should not be forced into implementing/providing a proper distributed locking mechanism.
   
   Also, isolation levels play a role here. Some use cases are probably fine with "dirty reads", some with "read committed", and others require full "serializable" guarantees.





Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676795328


##
crates/iceberg/src/spec/view_metadata.rs:
##
@@ -0,0 +1,682 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Defines the [view metadata](https://iceberg.apache.org/view-spec/#view-metadata).
+//! The main struct here is [ViewMetadata] which defines the data for a view.
+
+use serde::{Deserialize, Serialize};
+use serde_repr::{Deserialize_repr, Serialize_repr};
+use std::cmp::Ordering;
+use std::fmt::{Display, Formatter};
+use std::{collections::HashMap, sync::Arc};
+use uuid::Uuid;
+
+use super::{
+view_version::{ViewVersion, ViewVersionRef},
+SchemaId, SchemaRef,
+};
+use crate::catalog::ViewCreation;
+use crate::error::Result;
+
+use _serde::ViewMetadataEnum;
+
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+
+/// Reference to [`ViewMetadata`].
+pub type ViewMetadataRef = Arc<ViewMetadata>;
+
+#[derive(Debug, PartialEq, Deserialize, Eq, Clone)]
+#[serde(try_from = "ViewMetadataEnum", into = "ViewMetadataEnum")]
+/// Fields for the version 1 of the view metadata.
+///
+/// We assume that this data structure is always valid, so we will panic when invalid error happens.
+/// We check the validity of this data structure when constructing.
+pub struct ViewMetadata {
+/// Integer Version for the format.
+pub(crate) format_version: ViewFormatVersion,
+/// A UUID that identifies the view, generated when the view is created.
+pub(crate) view_uuid: Uuid,
+/// The view's base location; used to create metadata file locations
+pub(crate) location: String,
+/// ID of the current version of the view (version-id)
+pub(crate) current_version_id: i64,
+/// A list of known versions of the view
+pub(crate) versions: HashMap<i64, ViewVersionRef>,
+/// A list of version log entries with the timestamp and version-id for every
+/// change to current-version-id
+pub(crate) version_log: Vec<ViewVersionLog>,
+/// A list of schemas, stored as objects with schema-id.
+pub(crate) schemas: HashMap<SchemaId, SchemaRef>,
+/// A string to string map of view properties.
+/// Properties are used for metadata such as comment and for settings that
+/// affect view maintenance. This is not intended to be used for arbitrary metadata.
+pub(crate) properties: HashMap<String, String>,
+}
+
+impl ViewMetadata {
+/// Returns format version of this metadata.
+#[inline]
+pub fn format_version(&self) -> ViewFormatVersion {
+self.format_version
+}
+
+/// Returns uuid of current view.
+#[inline]
+pub fn uuid(&self) -> Uuid {
+self.view_uuid
+}
+
+/// Returns view location.
+#[inline]
+pub fn location(&self) -> &str {
+self.location.as_str()
+}
+
+/// Returns the current version id.
+#[inline]
+pub fn current_version_id(&self) -> i64 {
+self.current_version_id
+}
+
+/// Returns all view versions.
+#[inline]
+pub fn versions(&self) -> impl Iterator<Item = &ViewVersionRef> {
+self.versions.values()
+}
+
+/// Lookup a view version by id.
+#[inline]
+pub fn version_by_id(&self, version_id: i64) -> Option<&ViewVersionRef> {
+self.versions.get(&version_id)
+}
+
+/// Returns the current view version.
+#[inline]
+pub fn current_version(&self) -> &ViewVersionRef {
+self.versions
+.get(&self.current_version_id)
+.expect("Current version id set, but not found in view versions")
+}
+
+/// Returns schemas
+#[inline]
+pub fn schemas_iter(&self) -> impl Iterator<Item = &SchemaRef> {
+self.schemas.values()
+}
+
+/// Lookup schema by id.
+#[inline]
+pub fn schema_by_id(&self, schema_id: SchemaId) -> Option<&SchemaRef> {
+self.schemas.get(&schema_id)
+}
+
+/// Get current schema
+#[inline]
+pub fn current_schema(&self) -> &SchemaRef {
+let schema_id = self.current_version().schema_id();
+self.schema_by_id(schema_id)
+.expect("Current schema id set, but not found in view metadata")
+}
+
+/// Returns properties of the view.
+#[inline]
+pub fn properties(&self) -> &HashMap<String, String> {
+&self.properties
+}
+
+/// A

Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676795561


##
crates/iceberg/src/spec/view_metadata.rs:
##
@@ -0,0 +1,682 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Defines the [view metadata](https://iceberg.apache.org/view-spec/#view-metadata).
+//! The main struct here is [ViewMetadata] which defines the data for a view.
+
+use serde::{Deserialize, Serialize};
+use serde_repr::{Deserialize_repr, Serialize_repr};
+use std::cmp::Ordering;
+use std::fmt::{Display, Formatter};
+use std::{collections::HashMap, sync::Arc};
+use uuid::Uuid;
+
+use super::{
+view_version::{ViewVersion, ViewVersionRef},
+SchemaId, SchemaRef,
+};
+use crate::catalog::ViewCreation;
+use crate::error::Result;
+
+use _serde::ViewMetadataEnum;
+
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+
+/// Reference to [`ViewMetadata`].
+pub type ViewMetadataRef = Arc<ViewMetadata>;
+
+#[derive(Debug, PartialEq, Deserialize, Eq, Clone)]
+#[serde(try_from = "ViewMetadataEnum", into = "ViewMetadataEnum")]
+/// Fields for the version 1 of the view metadata.
+///
+/// We assume that this data structure is always valid, so we will panic when invalid error happens.
+/// We check the validity of this data structure when constructing.
+pub struct ViewMetadata {
+/// Integer Version for the format.
+pub(crate) format_version: ViewFormatVersion,
+/// A UUID that identifies the view, generated when the view is created.
+pub(crate) view_uuid: Uuid,
+/// The view's base location; used to create metadata file locations
+pub(crate) location: String,
+/// ID of the current version of the view (version-id)
+pub(crate) current_version_id: i64,
+/// A list of known versions of the view
+pub(crate) versions: HashMap<i64, ViewVersionRef>,
+/// A list of version log entries with the timestamp and version-id for every
+/// change to current-version-id
+pub(crate) version_log: Vec<ViewVersionLog>,
+/// A list of schemas, stored as objects with schema-id.
+pub(crate) schemas: HashMap<SchemaId, SchemaRef>,
+/// A string to string map of view properties.
+/// Properties are used for metadata such as comment and for settings that
+/// affect view maintenance. This is not intended to be used for arbitrary metadata.
+pub(crate) properties: HashMap<String, String>,
+}
+
+impl ViewMetadata {
+/// Returns format version of this metadata.
+#[inline]
+pub fn format_version(&self) -> ViewFormatVersion {
+self.format_version
+}
+
+/// Returns uuid of current view.
+#[inline]
+pub fn uuid(&self) -> Uuid {
+self.view_uuid
+}
+
+/// Returns view location.
+#[inline]
+pub fn location(&self) -> &str {
+self.location.as_str()
+}
+
+/// Returns the current version id.
+#[inline]
+pub fn current_version_id(&self) -> i64 {
+self.current_version_id
+}
+
+/// Returns all view versions.
+#[inline]
+pub fn versions(&self) -> impl Iterator<Item = &ViewVersionRef> {
+self.versions.values()
+}
+
+/// Lookup a view version by id.
+#[inline]
+pub fn version_by_id(&self, version_id: i64) -> Option<&ViewVersionRef> {
+self.versions.get(&version_id)
+}
+
+/// Returns the current view version.
+#[inline]
+pub fn current_version(&self) -> &ViewVersionRef {
+self.versions
+.get(&self.current_version_id)
+.expect("Current version id set, but not found in view versions")
+}
+
+/// Returns schemas
+#[inline]
+pub fn schemas_iter(&self) -> impl Iterator<Item = &SchemaRef> {
+self.schemas.values()
+}
+
+/// Lookup schema by id.
+#[inline]
+pub fn schema_by_id(&self, schema_id: SchemaId) -> Option<&SchemaRef> {
+self.schemas.get(&schema_id)
+}
+
+/// Get current schema
+#[inline]
+pub fn current_schema(&self) -> &SchemaRef {
+let schema_id = self.current_version().schema_id();
+self.schema_by_id(schema_id)
+.expect("Current schema id set, but not found in view metadata")
+}
+
+/// Returns properties of the view.
+#[inline]
+pub fn properties(&self) -> &HashMap<String, String> {
+&self.properties
+}
+
+/// A

Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676794013


##
crates/iceberg/testdata/view_metadata/ViewMetadataV2Valid.json:
##
@@ -0,0 +1,58 @@
+{

Review Comment:
   Ok, thanks Eduard for the feedback.
   
   My preferred way of going forward would be:
   
   1. Merge this PR, which just contains the structs & (de)serialization (very much like `TableMetadata` in its current state)
   2. Figure out a good way for the `TableMetadataBuilder`, including partition binding. We have a first shot ready that we can create a PR for.
   3. Use the same pattern we use for the `TableMetadataBuilder` for the much lighter `ViewMetadataBuilder`, which then includes all the tests Eduard mentioned.
   
   I am aware that a lot of things are missing in the builder. It was a deliberate decision on my part to get views up to speed with tables first, and then in a second step extend both table and view features to the Java level. This way we have a manageable PR here that just takes care of (ser)de, and we can build on it in a next step.
   
   @ZENOTME , @nastra would that be OK for you?
   






Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676796496


##
crates/iceberg/src/spec/view_version.rs:
##
@@ -0,0 +1,347 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+/*!
+ * View Versions!
+*/
+use crate::error::Result;
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
+use std::sync::Arc;
+use typed_builder::TypedBuilder;
+
+use super::view_metadata::ViewVersionLog;
+use crate::catalog::NamespaceIdent;
+use crate::spec::{SchemaId, SchemaRef, ViewMetadata};
+use crate::{Error, ErrorKind};
+use _serde::ViewVersionV1;
+
+/// Reference to [`ViewVersion`].
+pub type ViewVersionRef = Arc<ViewVersion>;
+
+#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, TypedBuilder)]
+#[serde(from = "ViewVersionV1", into = "ViewVersionV1")]
+#[builder(field_defaults(setter(prefix = "with_")))]
+/// A view version represents the definition of a view at a specific point in time.
+pub struct ViewVersion {
+/// A unique long ID
+version_id: i64,
+/// ID of the schema for the view version
+schema_id: SchemaId,
+/// Timestamp when the version was created (ms from epoch)
+timestamp_ms: i64,
+/// A string to string map of summary metadata about the version
+summary: HashMap<String, String>,
+/// A list of representations for the view definition.
+representations: ViewRepresentations,
+/// Catalog name to use when a reference in the SELECT does not contain a catalog
+#[builder(default = None)]
+default_catalog: Option<String>,
+/// Namespace to use when a reference in the SELECT is a single identifier
+default_namespace: NamespaceIdent,
+}
+
+impl ViewVersion {
+/// Get the version id of this view version.
+#[inline]
+pub fn version_id(&self) -> i64 {
+self.version_id
+}
+
+/// Get the schema id of this view version.
+#[inline]
+pub fn schema_id(&self) -> SchemaId {
+self.schema_id
+}
+
+/// Get the timestamp of when the view version was created
+#[inline]
+pub fn timestamp(&self) -> MappedLocalTime<DateTime<Utc>> {
+Utc.timestamp_millis_opt(self.timestamp_ms)
+}
+
+/// Get the timestamp of when the view version was created in milliseconds since epoch
+#[inline]
+pub fn timestamp_ms(&self) -> i64 {
+self.timestamp_ms
+}
+
+/// Get summary of the view version
+#[inline]
+pub fn summary(&self) -> &HashMap<String, String> {
+&self.summary
+}
+
+/// Get this views representations
+#[inline]
+pub fn representations(&self) -> &ViewRepresentations {
+&self.representations
+}
+
+/// Get the default catalog for this view version
+#[inline]
+pub fn default_catalog(&self) -> Option<&String> {
+self.default_catalog.as_ref()
+}
+
+/// Get the default namespace to use when a reference in the SELECT is a single identifier
+#[inline]
+pub fn default_namespace(&self) -> &NamespaceIdent {
+&self.default_namespace
+}
+
+/// Get the schema of this snapshot.
+pub fn schema(&self, view_metadata: &ViewMetadata) -> Result<SchemaRef> {
+let r = view_metadata
+.schema_by_id(self.schema_id())
+.ok_or_else(|| {
+Error::new(
+ErrorKind::DataInvalid,
+format!("Schema with id {} not found", self.schema_id()),
+)
+})
+.cloned();
+r
+}
+
+pub(crate) fn log(&self) -> ViewVersionLog {
+ViewVersionLog::new(self.version_id, self.timestamp_ms)
+}
+}
+
+/// A list of view representations.
+pub type ViewRepresentations = Vec<ViewRepresentation>;

Review Comment:
   Normally not. This is looking forward to the ViewMetadataBuilder where we want to implement methods on those.
   The motivation is that multiple `ViewRepresentations` are more than just a Vec - there are many rules in constructing it (i.e. no duplicate dialects). We shouldn't just expose the Vec interface externally because then users could just add an item to the list without going through any of our validation functions (which are not there yet, will come with the V

Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676800028


##
crates/iceberg/src/spec/view_version.rs:
##
@@ -0,0 +1,347 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+/*!
+ * View Versions!
+*/
+use crate::error::Result;
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
+use std::sync::Arc;
+use typed_builder::TypedBuilder;
+
+use super::view_metadata::ViewVersionLog;
+use crate::catalog::NamespaceIdent;
+use crate::spec::{SchemaId, SchemaRef, ViewMetadata};
+use crate::{Error, ErrorKind};
+use _serde::ViewVersionV1;
+
+/// Reference to [`ViewVersion`].
+pub type ViewVersionRef = Arc<ViewVersion>;
+
+#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, TypedBuilder)]
+#[serde(from = "ViewVersionV1", into = "ViewVersionV1")]
+#[builder(field_defaults(setter(prefix = "with_")))]
+/// A view version represents the definition of a view at a specific point in time.
+pub struct ViewVersion {
+/// A unique long ID
+version_id: i64,
+/// ID of the schema for the view version
+schema_id: SchemaId,
+/// Timestamp when the version was created (ms from epoch)
+timestamp_ms: i64,
+/// A string to string map of summary metadata about the version
+summary: HashMap<String, String>,
+/// A list of representations for the view definition.
+representations: ViewRepresentations,
+/// Catalog name to use when a reference in the SELECT does not contain a catalog
+#[builder(default = None)]
+default_catalog: Option<String>,
+/// Namespace to use when a reference in the SELECT is a single identifier
+default_namespace: NamespaceIdent,
+}
+
+impl ViewVersion {
+/// Get the version id of this view version.
+#[inline]
+pub fn version_id(&self) -> i64 {
+self.version_id
+}
+
+/// Get the schema id of this view version.
+#[inline]
+pub fn schema_id(&self) -> SchemaId {
+self.schema_id
+}
+
+/// Get the timestamp of when the view version was created
+#[inline]
+pub fn timestamp(&self) -> MappedLocalTime<DateTime<Utc>> {
+Utc.timestamp_millis_opt(self.timestamp_ms)
+}
+
+/// Get the timestamp of when the view version was created in milliseconds since epoch
+#[inline]
+pub fn timestamp_ms(&self) -> i64 {
+self.timestamp_ms
+}
+
+/// Get summary of the view version
+#[inline]
+pub fn summary(&self) -> &HashMap<String, String> {
+&self.summary
+}
+
+/// Get this views representations
+#[inline]
+pub fn representations(&self) -> &ViewRepresentations {
+&self.representations
+}
+
+/// Get the default catalog for this view version
+#[inline]
+pub fn default_catalog(&self) -> Option<&String> {
+self.default_catalog.as_ref()
+}
+
+/// Get the default namespace to use when a reference in the SELECT is a single identifier
+#[inline]
+pub fn default_namespace(&self) -> &NamespaceIdent {
+&self.default_namespace
+}
+
+/// Get the schema of this snapshot.
+pub fn schema(&self, view_metadata: &ViewMetadata) -> Result<SchemaRef> {
+let r = view_metadata
+.schema_by_id(self.schema_id())
+.ok_or_else(|| {
+Error::new(
+ErrorKind::DataInvalid,
+format!("Schema with id {} not found", self.schema_id()),
+)
+})
+.cloned();
+r
+}
+
+pub(crate) fn log(&self) -> ViewVersionLog {
+ViewVersionLog::new(self.version_id, self.timestamp_ms)
+}
+}
+
+/// A list of view representations.
+pub type ViewRepresentations = Vec<ViewRepresentation>;

Review Comment:
   Fix: 
https://github.com/c-thiel/iceberg-rust/commit/2b427e511dbeb3dfa6292af6983ee8d0b8bc48ad
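
   The invariant discussed in this thread — representations are more than a plain Vec, e.g. no two entries may share a dialect — is the classic case for a newtype that validates on insertion. A rough sketch under assumed names (`SqlRepresentation`, `try_push` are illustrative, not the committed API):

```rust
/// Illustrative stand-in for the SQL representation of a view.
#[derive(Debug, Clone, PartialEq)]
pub struct SqlRepresentation {
    pub dialect: String,
    pub sql: String,
}

/// Newtype over Vec: the inner Vec is private, so every mutation goes
/// through validation instead of the raw Vec API.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ViewRepresentations(Vec<SqlRepresentation>);

impl ViewRepresentations {
    /// Add a representation, rejecting duplicate dialects.
    pub fn try_push(&mut self, repr: SqlRepresentation) -> Result<(), String> {
        if self.0.iter().any(|r| r.dialect == repr.dialect) {
            return Err(format!("duplicate dialect: {}", repr.dialect));
        }
        self.0.push(repr);
        Ok(())
    }

    /// Read-only access is still safe to expose.
    pub fn iter(&self) -> impl Iterator<Item = &SqlRepresentation> {
        self.0.iter()
    }
}
```

   Compared with `pub type ViewRepresentations = Vec<...>`, the newtype costs a little boilerplate but makes invalid states unrepresentable outside the module.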






Re: [PR] Core:support http lock-manager [iceberg]

2024-07-13 Thread via GitHub


BsoBird commented on PR #10688:
URL: https://github.com/apache/iceberg/pull/10688#issuecomment-2226845799

   @snazy 
   Hello! I'm glad you could get back to me.
   I'm just saying out loud the first things that come to mind:
   
   1. Is everything done using rest-catalog? In many cases, users have their own server rooms, they are not using cloud products, and they do not intend to expose iceberg directly to untrusted users. In a network-isolated environment, there seems to be no need to introduce a separate rest-catalog to add complexity.
   2. For more complex access environments, rest-catalog is effective. I agree.
   3. Currently, the use of lockManager is tied to the implementation of a certain catalog. Similar to aws-iceberg, we don't actually force the user to use lockManager. The user knows what he's doing when he chooses a catalog. Basically, the catalog's author has solved the concurrency problems and race conditions in the code of any catalog that uses lockManager; after all, he wouldn't need lockManager if he hadn't solved these problems.
   4. Regarding the security of lockManager: this is something that should actually be considered within some kind of extension module. For example, iceberg-aws provides some implementations of lockManager, and the security of such a lockManager must be considered there; iceberg-core shouldn't care about such issues. After all, safety standards are not uniform, and the details are varied.
   
   From what I've seen so far, the opinion of many people in the community is that the lock-manager should be extended and maintained by users if they want it, just like the aws-iceberg module. In iceberg-core, the implementation of lock-manager should only stop at InMemLockManager. But isn't that too little? Wouldn't it be nice to simply extend something like http-lock-manager for users? After all, it's small and doesn't introduce any other dependencies.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676801198


##
crates/iceberg/src/spec/view_version.rs:
##
@@ -0,0 +1,347 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+/*!
+ * View Versions!
+*/
+use crate::error::Result;
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
+use std::sync::Arc;
+use typed_builder::TypedBuilder;
+
+use super::view_metadata::ViewVersionLog;
+use crate::catalog::NamespaceIdent;
+use crate::spec::{SchemaId, SchemaRef, ViewMetadata};
+use crate::{Error, ErrorKind};
+use _serde::ViewVersionV1;
+
+/// Reference to [`ViewVersion`].
+pub type ViewVersionRef = Arc<ViewVersion>;
+
+#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, TypedBuilder)]
+#[serde(from = "ViewVersionV1", into = "ViewVersionV1")]
+#[builder(field_defaults(setter(prefix = "with_")))]
+/// A view version represents the definition of a view at a specific point in 
time.
+pub struct ViewVersion {
+/// A unique long ID
+version_id: i64,
+/// ID of the schema for the view version
+schema_id: SchemaId,
+/// Timestamp when the version was created (ms from epoch)
+timestamp_ms: i64,
+/// A string to string map of summary metadata about the version
+summary: HashMap<String, String>,
+/// A list of representations for the view definition.
+representations: ViewRepresentations,
+/// Catalog name to use when a reference in the SELECT does not contain a 
catalog
+#[builder(default = None)]
+default_catalog: Option<String>,
+/// Namespace to use when a reference in the SELECT is a single identifier
+default_namespace: NamespaceIdent,
+}
+
+impl ViewVersion {
+/// Get the version id of this view version.
+#[inline]
+pub fn version_id(&self) -> i64 {
+self.version_id
+}
+
+/// Get the schema id of this view version.
+#[inline]
+pub fn schema_id(&self) -> SchemaId {
+self.schema_id
+}
+
+/// Get the timestamp of when the view version was created
+#[inline]
+pub fn timestamp(&self) -> MappedLocalTime<DateTime<Utc>> {
+Utc.timestamp_millis_opt(self.timestamp_ms)
+}
+
+/// Get the timestamp of when the view version was created in milliseconds 
since epoch
+#[inline]
+pub fn timestamp_ms(&self) -> i64 {
+self.timestamp_ms
+}
+
+/// Get summary of the view version
+#[inline]
+pub fn summary(&self) -> &HashMap<String, String> {
+&self.summary
+}
+
+/// Get this view's representations
+#[inline]
+pub fn representations(&self) -> &ViewRepresentations {
+&self.representations
+}
+
+/// Get the default catalog for this view version
+#[inline]
+pub fn default_catalog(&self) -> Option<&String> {
+self.default_catalog.as_ref()
+}
+
+/// Get the default namespace to use when a reference in the SELECT is a 
single identifier
+#[inline]
+pub fn default_namespace(&self) -> &NamespaceIdent {
+&self.default_namespace
+}
+
+/// Get the schema of this view version.
+pub fn schema(&self, view_metadata: &ViewMetadata) -> Result<SchemaRef> {
+let r = view_metadata
+.schema_by_id(self.schema_id())
+.ok_or_else(|| {
+Error::new(
+ErrorKind::DataInvalid,
+format!("Schema with id {} not found", self.schema_id()),
+)
+})
+.cloned();
+r
+}
+
+pub(crate) fn log(&self) -> ViewVersionLog {
+ViewVersionLog::new(self.version_id, self.timestamp_ms)
+}
+}
+
+/// A list of view representations.
+pub type ViewRepresentations = Vec<ViewRepresentation>;
+
+/// A builder for [`ViewRepresentations`].
+pub struct ViewRepresentationsBuilder(ViewRepresentations);
+
+impl ViewRepresentationsBuilder {
+/// Create a new builder.
+pub fn new() -> Self {
+Self(Vec::new())
+}
+
+/// Add a representation to the list.
+///
+/// SQL representations dialects must be unique. If a representation with 
the same

Review Comment:
   No, it did not. I fixed it here:
   
https://github.com/c-thiel/iceberg-rust/commit/bf95eb7c5c458e0

Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676802744


##
crates/iceberg/src/spec/view_version.rs:
##
@@ -0,0 +1,347 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+/*!
+ * View Versions!
+*/
+use crate::error::Result;
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
+use std::sync::Arc;
+use typed_builder::TypedBuilder;
+
+use super::view_metadata::ViewVersionLog;
+use crate::catalog::NamespaceIdent;
+use crate::spec::{SchemaId, SchemaRef, ViewMetadata};
+use crate::{Error, ErrorKind};
+use _serde::ViewVersionV1;
+
+/// Reference to [`ViewVersion`].
+pub type ViewVersionRef = Arc<ViewVersion>;
+
+#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, TypedBuilder)]
+#[serde(from = "ViewVersionV1", into = "ViewVersionV1")]
+#[builder(field_defaults(setter(prefix = "with_")))]
+/// A view version represents the definition of a view at a specific point in 
time.
+pub struct ViewVersion {
+/// A unique long ID
+version_id: i64,
+/// ID of the schema for the view version
+schema_id: SchemaId,
+/// Timestamp when the version was created (ms from epoch)
+timestamp_ms: i64,
+/// A string to string map of summary metadata about the version
+summary: HashMap<String, String>,
+/// A list of representations for the view definition.
+representations: ViewRepresentations,
+/// Catalog name to use when a reference in the SELECT does not contain a 
catalog
+#[builder(default = None)]
+default_catalog: Option<String>,
+/// Namespace to use when a reference in the SELECT is a single identifier
+default_namespace: NamespaceIdent,
+}
+
+impl ViewVersion {
+/// Get the version id of this view version.
+#[inline]
+pub fn version_id(&self) -> i64 {
+self.version_id
+}
+
+/// Get the schema id of this view version.
+#[inline]
+pub fn schema_id(&self) -> SchemaId {
+self.schema_id
+}
+
+/// Get the timestamp of when the view version was created
+#[inline]
+pub fn timestamp(&self) -> MappedLocalTime<DateTime<Utc>> {
+Utc.timestamp_millis_opt(self.timestamp_ms)
+}
+
+/// Get the timestamp of when the view version was created in milliseconds 
since epoch
+#[inline]
+pub fn timestamp_ms(&self) -> i64 {
+self.timestamp_ms
+}
+
+/// Get summary of the view version
+#[inline]
+pub fn summary(&self) -> &HashMap<String, String> {
+&self.summary
+}
+
+/// Get this view's representations
+#[inline]
+pub fn representations(&self) -> &ViewRepresentations {
+&self.representations
+}
+
+/// Get the default catalog for this view version
+#[inline]
+pub fn default_catalog(&self) -> Option<&String> {
+self.default_catalog.as_ref()
+}
+
+/// Get the default namespace to use when a reference in the SELECT is a 
single identifier
+#[inline]
+pub fn default_namespace(&self) -> &NamespaceIdent {
+&self.default_namespace
+}
+
+/// Get the schema of this view version.
+pub fn schema(&self, view_metadata: &ViewMetadata) -> Result<SchemaRef> {
+let r = view_metadata
+.schema_by_id(self.schema_id())
+.ok_or_else(|| {
+Error::new(
+ErrorKind::DataInvalid,
+format!("Schema with id {} not found", self.schema_id()),
+)
+})
+.cloned();
+r
+}
+
+pub(crate) fn log(&self) -> ViewVersionLog {
+ViewVersionLog::new(self.version_id, self.timestamp_ms)
+}
+}
+
+/// A list of view representations.
+pub type ViewRepresentations = Vec<ViewRepresentation>;
+
+/// A builder for [`ViewRepresentations`].
+pub struct ViewRepresentationsBuilder(ViewRepresentations);
+
+impl ViewRepresentationsBuilder {
+/// Create a new builder.
+pub fn new() -> Self {
+Self(Vec::new())
+}
+
+/// Add a representation to the list.
+///
+/// SQL representations dialects must be unique. If a representation with 
the same
+/// dialect already exists, it will be overwritten.
+pub fn add_or_overwrite_representation(mut self, represen
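   The dialect-uniqueness semantics in the quoted doc comment ("if a 
representation with the same dialect already exists, it will be overwritten") 
can be sketched outside Rust. This is a minimal Python illustration using plain 
dicts, not the PR's actual `ViewRepresentation` type:
   
   ```python
   def add_or_overwrite_representation(representations, new_rep):
       """Append new_rep, replacing any existing entry with the same dialect."""
       for i, rep in enumerate(representations):
           if rep["dialect"] == new_rep["dialect"]:
               representations[i] = new_rep  # dialect already present: overwrite
               return representations
       representations.append(new_rep)
       return representations
   
   reps = [{"dialect": "spark", "sql": "SELECT 1"}]
   reps = add_or_overwrite_representation(reps, {"dialect": "trino", "sql": "SELECT 2"})
   reps = add_or_overwrite_representation(reps, {"dialect": "spark", "sql": "SELECT 3"})
   ```
   
   After the three calls above, the list holds two entries: the overwritten 
Spark representation and the Trino one.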

Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#issuecomment-2226849477

   @nastra I hope I addressed all your issues regarding functionality. If you 
are happy with the implementations, feel free to close our discussions :).
   @ZENOTME it would now be great to get some feedback on the Rust side of 
things. Please check my discussions with Eduard above. There are a few open 
points regarding naming and also the introduction of types. It would be great 
to get a review from you or some other Rust maintainer for this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Prevent dropping column which is referenced by active partition… [iceberg]

2024-07-13 Thread via GitHub


advancedxy commented on code in PR #10352:
URL: https://github.com/apache/iceberg/pull/10352#discussion_r1676818347


##
core/src/main/java/org/apache/iceberg/SchemaUpdate.java:
##
@@ -533,6 +537,34 @@ private static Schema applyChanges(
   }
 }
 
+Map> specToDeletes = Maps.newHashMap();

Review Comment:
   > One aspect is on ordering of these. There are two ways of going about this:
   
   I don't think the order in which these two pieces of functionality land in 
master matters that much, as long as they are both merged before the next 
release (assuming we are talking about the Iceberg 1.7 release). I imagine 
customers should use officially released versions.
   
   > Here's a trivial case: Imagine a user just creates the table and they 
don't write any data. Then they realize they want to drop a partition column, 
but the procedure will fail unexpectedly whereas before it would work.
   
   If preventing drops of active partition source fields and 
RemoveUnusedPartitionSpec both land, it is still possible to handle the 
just-created table, with some additional but necessary steps:
   1. First remove the unwanted partition field, which will create a new 
PartitionSpec.
   2. Call RemoveUnusedPartitionSpec, which should be able to remove the 
previous, wrongly created partition spec.
   3. Remove the wrongly chosen partition source field.
   
   >  First get in the RemovedUnusedPartitionSpec procedure and then prevent 
dropping if it's part of a spec. The downside of this is, it may take some more 
time in to get the whole API in? 
   
   I think we can parallelize these two PRs if others agree that's the right 
direction. BTW, I can help to work on 
https://github.com/apache/iceberg/pull/3462/files to get it merged in case 
@RussellSpitzer is busy and cannot work on it recently.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Prevent dropping column which is referenced by active partition… [iceberg]

2024-07-13 Thread via GitHub


advancedxy commented on PR #10352:
URL: https://github.com/apache/iceberg/pull/10352#issuecomment-2226883163

   > Sorry about the delay on this, got busy and forgot I had this open! I've 
seen more related issue reports to this, so I'm going to prioritize it.
   
   Well understood, and thanks for your effort on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Set boolean values in table properties to lowercase string [iceberg-python]

2024-07-13 Thread via GitHub


soumya-ghosh opened a new pull request, #924:
URL: https://github.com/apache/iceberg-python/pull/924

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Set properties boolean value to lowercase string [iceberg-python]

2024-07-13 Thread via GitHub


soumya-ghosh commented on issue #919:
URL: https://github.com/apache/iceberg-python/issues/919#issuecomment-2226888124

   @Fokko @kevinjqliu PR https://github.com/apache/iceberg-python/pull/924 is 
ready for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add Snowflake catalog [iceberg-python]

2024-07-13 Thread via GitHub


prabodh1194 commented on PR #687:
URL: https://github.com/apache/iceberg-python/pull/687#issuecomment-2226959566

   @Fokko , considering that Snowflake's REST catalog docs are still not 
released to the public, we can only wait for them. However, I believe this PR 
will continue to be relevant, considering this catalog has been GA for quite 
some time and won't go away any time soon πŸ˜„ .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core:support http lock-manager [iceberg]

2024-07-13 Thread via GitHub


BsoBird commented on code in PR #10688:
URL: https://github.com/apache/iceberg/pull/10688#discussion_r1675607925


##
core/src/main/java/org/apache/iceberg/lock/ServerSideHttpLockManager.java:
##
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.lock;
+
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.nio.charset.StandardCharsets;
+import java.util.Base64;
+import java.util.Map;
+import org.apache.http.HttpResponse;
+import org.apache.http.StatusLine;
+import org.apache.http.client.HttpClient;
+import org.apache.http.client.methods.HttpGet;
+import org.apache.http.client.protocol.RequestAcceptEncoding;
+import org.apache.http.client.protocol.ResponseContentEncoding;
+import org.apache.http.client.utils.URIBuilder;
+import org.apache.http.impl.client.DefaultHttpClient;
+import org.apache.http.params.BasicHttpParams;
+import org.apache.http.params.CoreConnectionPNames;
+import org.apache.http.params.HttpParams;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.util.LockManagers;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The user is required to provide an API interface with distributed locking 
functionality as
+ * agreed.
+ */
+public class ServerSideHttpLockManager extends LockManagers.BaseLockManager {

Review Comment:
   In this implementation, we only use a very simple GET request and do not 
consider anything related to authentication. This is because I don't see the 
need to protect this distributed locking service. After all, it only does 
locking/unlocking. I don't think it's too difficult for users to provide a 
REST API service that meets the requirements.
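   A minimal sketch of the server-side contract the quoted Java client assumes 
(`operator=lock|unlock`, `entityId`, `ownerId`, HTTP 200 on success). The 
non-200 failure codes below are assumptions, since the client only checks for 
200:
   
   ```python
   # In-memory lock table: entityId -> ownerId. A real service would need
   # persistence and fencing; this only illustrates the request semantics.
   locks = {}
   
   def handle_request(operator, entity_id, owner_id):
       """Return an HTTP-like status code for a lock/unlock request."""
       if operator == "lock":
           current = locks.get(entity_id)
           if current is None or current == owner_id:
               locks[entity_id] = owner_id  # acquire (or re-enter) the lock
               return 200
           return 409  # held by another owner (assumed failure code)
       if operator == "unlock":
           if locks.get(entity_id) == owner_id:
               del locks[entity_id]
               return 200
           return 409  # not held by this owner (assumed failure code)
       return 400  # unknown operator (assumed failure code)
   ```
   
   With this contract, a second writer's lock attempt on the same entity fails 
until the first owner unlocks.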



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core:support http lock-manager [iceberg]

2024-07-13 Thread via GitHub


BsoBird commented on code in PR #10688:
URL: https://github.com/apache/iceberg/pull/10688#discussion_r1676850504


##
core/src/main/java/org/apache/iceberg/lock/ServerSideHttpLockManager.java:
##
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.lock;
+
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.nio.charset.StandardCharsets;
+import java.util.Base64;
+import java.util.Map;
+import org.apache.http.HttpResponse;
+import org.apache.http.StatusLine;
+import org.apache.http.client.HttpClient;
+import org.apache.http.client.methods.HttpGet;
+import org.apache.http.client.protocol.RequestAcceptEncoding;
+import org.apache.http.client.protocol.ResponseContentEncoding;
+import org.apache.http.client.utils.URIBuilder;
+import org.apache.http.impl.client.DefaultHttpClient;
+import org.apache.http.params.CoreConnectionPNames;
+import org.apache.http.util.EntityUtils;
+import org.apache.iceberg.common.DynConstructors;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.util.LockManagers;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The user is required to provide an API interface with distributed locking 
functionality as
+ * agreed.
+ */
+public class ServerSideHttpLockManager extends LockManagers.BaseLockManager {
+  private static final Logger LOG = 
LoggerFactory.getLogger(ServerSideHttpLockManager.class);
+  private String httpUrl = null;
+  private static final String OPERATOR = "operator";
+  private static final String LOCK = "lock";
+  private static final String UNLOCK = "unlock";
+  private static final String ENTITY_ID = "entityId";
+  private static final String OWNER_ID = "ownerId";
+  private static final int REQUEST_SUCCESS = 200;
+  public static final String REQUEST_URL = "lock.http.conf.request.url";
+  public static final String REQUEST_AUTH = "lock.http.conf.request.auth.impl";
+  private HttpClient httpClient = null;
+  private HttpAuthentication httpAuthentication = null;

Review Comment:
   Users can add custom auth implementations when initiating tasks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Standardize AWS credential names [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on code in PR #922:
URL: https://github.com/apache/iceberg-python/pull/922#discussion_r1676852897


##
pyiceberg/io/__init__.py:
##
@@ -320,6 +331,13 @@ def _infer_file_io_from_scheme(path: str, properties: 
Properties) -> Optional[Fi
 return None
 
 
+def _get_first_property_value(properties: Properties, property_names: 
Tuple[str, ...]) -> Optional[Any]:

Review Comment:
   Nit: Duplicate function defined here and in `pyiceberg/catalog/__init__.py` 
could be consolidated into one



##
pyiceberg/catalog/__init__.py:
##
@@ -838,6 +855,18 @@ def _get_default_warehouse_location(self, database_name: 
str, table_name: str) -
 
 raise ValueError("No default path is set, please specify a location 
when creating a table")
 
+def _get_first_property_value(self, property_names: Tuple[str, ...]) -> 
Optional[Any]:
+for property_name in property_names:
+if property_value := self.properties.get(property_name):
+if property_name in DEPRECATED_PROPERTY_NAMES:
+deprecated(
+deprecated_in="0.7.0",
+removed_in="0.8.0",
+help_message=f"The property {property_name} is 
deprecated. Please use properties start with aws., glue., and dynamo. instead",

Review Comment:
   ```suggestion
   help_message=f"The property {property_name} is 
deprecated. Please use properties that start with aws., glue., and dynamo. 
instead",
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Build: don't include slf4j-api in bundled JARs [iceberg]

2024-07-13 Thread via GitHub


bryanck commented on PR #10665:
URL: https://github.com/apache/iceberg/pull/10665#issuecomment-2226992307

   Thanks for the contribution @devinrsmith !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Repair manifest action [iceberg]

2024-07-13 Thread via GitHub


szehon-ho commented on PR #10445:
URL: https://github.com/apache/iceberg/pull/10445#issuecomment-2227007559

   yea i think that makes sense, let's do it in a way that we can add more 
functionality later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Repair manifest action [iceberg]

2024-07-13 Thread via GitHub


szehon-ho commented on PR #10445:
URL: https://github.com/apache/iceberg/pull/10445#issuecomment-2227037227

   @danielcweeks @tabmatfournier while we are discussing, do you think it 
makes sense to later integrate my functionality into this Repair action? It's 
been a while, but IIRC it was about fixing the manifest metadata (i.e., file 
sizes and metrics).
   
   Fixing file sizes addresses a bug 
(https://github.com/apache/iceberg/issues/1980) that seemed important at the 
time, but is probably rare.
   
   Re-calculating metrics based on new table configs is a more common case. 
I've seen a lot of users with huge metadata that OOMs during planning, who then 
realize they need to tune the table to reduce the metrics collected, and so far 
there's no mechanism to do this.
   
   Another option for these is an additional flag on RewriteManifests, but I 
think there were some opinions in #2608 that it should be a separate action, 
since rewrite indicates rewrite-as-is.
   
   If we agree, I can add these functions in a follow-up. Maybe it makes sense 
to have a bit array of options, since Repair seems to have many options.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Standardize AWS credential names [iceberg-python]

2024-07-13 Thread via GitHub


Fokko commented on code in PR #922:
URL: https://github.com/apache/iceberg-python/pull/922#discussion_r1676882347


##
pyiceberg/io/__init__.py:
##
@@ -320,6 +331,13 @@ def _infer_file_io_from_scheme(path: str, properties: 
Properties) -> Optional[Fi
 return None
 
 
+def _get_first_property_value(properties: Properties, property_names: 
Tuple[str, ...]) -> Optional[Any]:

Review Comment:
   I find this PR very hard to read with all the constants. I would like to do 
a suggestion, how about changing the signature to:
   ```suggestion
   def _get_first_property_value(properties: Properties, *property_names: str) 
-> Optional[Any]:
   ```
   
   And avoid the additional constant with the grouped values:
   
   ```python
   _get_first_property_value(self.properties, S3_ACCESS_KEY_ID_PROPERTIES)
   ```
   
   Instead, write:
   
   ```python
   _get_first_property_value(self.properties, S3_ACCESS_KEY_ID, 
AWS_ACCESS_KEY_ID)
   ```
   
   The `S3_ACCESS_KEY_ID_PROPERTIES` seemed to be used just once. WDYT?
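   
   The suggested varargs signature is easy to sketch as runnable Python; the 
property strings in the usage lines are illustrative, standing in for the 
module's constants:
   
   ```python
   from typing import Any, Dict, Optional
   
   def _get_first_property_value(properties: Dict[str, str], *property_names: str) -> Optional[Any]:
       """Return the value of the first property name present in properties."""
       for name in property_names:
           if (value := properties.get(name)) is not None:
               return value
       return None
   
   props = {"s3.access-key-id": "abc", "client.region": "us-east-1"}
   value = _get_first_property_value(props, "s3.access-key-id", "aws.access-key-id")
   missing = _get_first_property_value(props, "s3.secret-access-key")
   ```
   
   The earlier-listed names take precedence, which preserves the "new property 
wins over deprecated alias" ordering without a grouped tuple constant.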



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Standardize AWS credential names [iceberg-python]

2024-07-13 Thread via GitHub


Fokko commented on code in PR #922:
URL: https://github.com/apache/iceberg-python/pull/922#discussion_r1676882383


##
pyiceberg/io/__init__.py:
##
@@ -46,6 +48,10 @@
 
 logger = logging.getLogger(__name__)
 
+AWS_REGION = "client.region"

Review Comment:
   Nice, that's a good find πŸ‘ 






Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676880923


##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(
+_pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
+)
 
-console = Console(record=True)
+extra_fields = task_schema.field_names - table_schema.field_names
+missing_fields = table_schema.field_names - task_schema.field_names
+fields_in_both = 
task_schema.field_names.intersection(table_schema.field_names)
+
+from rich.console import Console
+from rich.table import Table as RichTable
+
+console = Console(record=True)
+
+rich_table = RichTable(show_header=True, header_style="bold")
+rich_table.add_column("Field Name")
+rich_table.add_column("Category")
+rich_table.add_column("Table field")
+rich_table.add_column("Dataframe field")
+
+def print_nullability(required: bool) -> str:
+return "required" if required else "optional"
+
+for field_name in fields_in_both:

Review Comment:
   just want to check my understanding: this works for nested fields because 
nested fields are "flattened" by `.field_names` and then fetched by 
`.find_field`.
   
   For example: a `df` schema like
   ```
   task_schema = pa.field(
   "person",
   pa.struct([
   pa.field("name", pa.string(), nullable=True),
   ]),
   nullable=True,
   )
   ```
   
   `task_schema.field_names` will produce `{"person", "person.name"}`. 
   `task_schema.find_field("person")` and 
`task_schema.find_field("person.name")` will fetch the corresponding fields
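
   The flattening behavior described above can be illustrated without pyiceberg. 
This toy sketch uses plain nested dicts, and the `field_names`/`find_field` 
helpers here are hypothetical stand-ins for the real `Schema` methods:

   ```python
   def field_names(schema: dict, prefix: str = "") -> set:
       """Flatten a nested {name: type-or-struct} schema into dotted names."""
       names = set()
       for name, typ in schema.items():
           full = f"{prefix}{name}"
           names.add(full)
           if isinstance(typ, dict):  # struct field: recurse into its children
               names |= field_names(typ, prefix=full + ".")
       return names

   def find_field(schema: dict, dotted: str):
       """Fetch a (possibly nested) field by its dotted name."""
       node = schema
       for part in dotted.split("."):
           node = node[part]
       return node

   person = {"person": {"name": "string"}}
   assert field_names(person) == {"person", "person.name"}
   assert find_field(person, "person.name") == "string"
   ```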



##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(
+_pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
+)
 
-console = Console(record=True)
+extra_fields = task_schema.field_names - table_schema.field_names
+missing_fields = table_schema.field_names - task_schema.field_names
+fields_in_both = 
task_schema.field_names.intersection(table_schema.field_names)
+
+from rich.console import Console
+from rich.table import Table as RichTable
+
+console = Console(record=True)
+
+rich_table = RichTable(show_header=True, header_style="bold")
+rich_table.add_column("Field Name")
+rich_table.add_column("Category")
+rich_table.add_column("Table field")
+rich_table.add_column("Dataframe field")
+
+def print_nullability(required: bool) -> str:
+return "required" if required else "optional"
+
+for field_name in fields_in_both:
+lhs = table_schema.find_field(field_name)
+rhs = task_schema.find_field(field_name)
+# Check nullability
+if lhs.required != rhs.required:

Review Comment:
   to make this a bit more complicated... 
   an optional field can be written to by a required field. the schema 

Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu commented on PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#issuecomment-2227047760

   @pdpark does this solution resolve your use case? 
   
   https://github.com/apache/iceberg-python/pull/829#issuecomment-2218116050





Re: [PR] Set boolean values in table properties to lowercase string [iceberg-python]

2024-07-13 Thread via GitHub


Fokko merged PR #924:
URL: https://github.com/apache/iceberg-python/pull/924





Re: [I] Set properties boolean value to lowercase string [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu commented on issue #919:
URL: https://github.com/apache/iceberg-python/issues/919#issuecomment-2227049015

   Resolved in #924





Re: [I] Create table properties does not support boolean value [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu closed issue #895: Create table properties does not support boolean 
value
URL: https://github.com/apache/iceberg-python/issues/895





Re: [I] Set properties boolean value to lowercase string [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu closed issue #919: Set properties boolean value to lowercase string
URL: https://github.com/apache/iceberg-python/issues/919





Re: [I] Create table properties does not support boolean value [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu commented on issue #895:
URL: https://github.com/apache/iceberg-python/issues/895#issuecomment-2227049076

   Resolved in #924





[I] Calling load_table().scan().to_arrow() emits error on empty "names" field in name mapping [iceberg-python]

2024-07-13 Thread via GitHub


spock-abadai opened a new issue, #925:
URL: https://github.com/apache/iceberg-python/issues/925

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   On one of my iceberg tables, when I load a table and scan it, during the 
parsing of the name mapping in the table properties, pydantic issues the 
following `ValidationError`:
   
   ```
   def parse_mapping_from_json(mapping: str) -> NameMapping:
   >   return NameMapping.model_validate_json(mapping)
   E   pydantic_core._pydantic_core.ValidationError: 1 validation error for 
NameMapping
   E   9.names
   E Value error, At least one mapped name must be provided for the 
field [type=value_error, input_value=[], input_type=list]
   E   For further information visit 
https://errors.pydantic.dev/2.8/v/value_error
   ```
   
   This seems to be a result of the code in `table/name_mapping.py` in the 
method `check_at_least_one`, which (if I understand correctly) checks that all 
fields in the name mapping have at least one `name`. However, if I'm reading 
the [Iceberg spec](https://iceberg.apache.org/spec/#column-projection) 
correctly, it states that:
   
   
![image](https://github.com/user-attachments/assets/849de6b3-ccfd-4845-9d6a-58b81883784e)
   
   I'm not 100% sure what scenario led to this, but I can say that the name 
mapping we have indeed has a field with id 10 that has an empty list of names. 
This field existed at one point in the schema but it seems like it was removed. 
In any case, it doesn't seem like requiring that the list of names contain at 
least one value is in line with the spec (and it seems that situations where 
this isn't the case do happen). 
   
   Note that the said iceberg table was never created, written to or modified 
using `pyiceberg` (only using spark and trino). `pyiceberg` is only used to 
read. 
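
   A spec-tolerant parser would accept entries whose `names` list is empty and 
simply never match them by name. A minimal stdlib-only sketch, which is not the 
real pyiceberg `NameMapping` pydantic model:

   ```python
   import json

   def parse_mapping(mapping_json: str) -> dict:
       """Parse a name-mapping JSON list into a {name: field-id} lookup.

       Entries with an empty `names` list are kept out of the lookup rather
       than rejected, matching the column-projection section of the spec.
       """
       lookup = {}
       for entry in json.loads(mapping_json):
           for name in entry.get("names", []):
               lookup[name] = entry["field-id"]
       return lookup

   mapping = '[{"field-id": 1, "names": ["id"]}, {"field-id": 10, "names": []}]'
   lookup = parse_mapping(mapping)
   assert lookup == {"id": 1}  # field 10 is simply not name-mapped
   ```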
   
   





Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#issuecomment-2227049610

   Thank you for the review comments @kevinjqliu πŸ™ 
   
   > Porting over Fokko's comment from the other PR [#829 
(review)](https://github.com/apache/iceberg-python/pull/829#pullrequestreview-2164061387)
 Do you know if the current change supports "re-aligning" the table?
   
   I think this test from your PR covers that case where the fields are 
re-aligned according to the Iceberg Table spec: 
https://github.com/apache/iceberg-python/pull/921/files#diff-7f3dd1244d08ce27c003cd091da10aa049f7bb0c7d5397acb4ec69767036accdR982





Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676884492


##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(
+_pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
+)
 
-console = Console(record=True)
+extra_fields = task_schema.field_names - table_schema.field_names
+missing_fields = table_schema.field_names - task_schema.field_names
+fields_in_both = 
task_schema.field_names.intersection(table_schema.field_names)
+
+from rich.console import Console
+from rich.table import Table as RichTable
+
+console = Console(record=True)
+
+rich_table = RichTable(show_header=True, header_style="bold")
+rich_table.add_column("Field Name")
+rich_table.add_column("Category")
+rich_table.add_column("Table field")
+rich_table.add_column("Dataframe field")
+
+def print_nullability(required: bool) -> str:
+return "required" if required else "optional"
+
+for field_name in fields_in_both:
+lhs = table_schema.find_field(field_name)
+rhs = task_schema.find_field(field_name)
+# Check nullability
+if lhs.required != rhs.required:

Review Comment:
   Thank you for pointing this out! This might be simpler to deal with than 
type promotion, so let me give this a go
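
   The rule under discussion — a required dataframe field may be written into an 
optional table field, but an optional dataframe field may not be written into a 
required one — can be sketched as a single predicate (illustrative only, not the 
pyiceberg API):

   ```python
   def nullability_compatible(table_required: bool, df_required: bool) -> bool:
       """A required table field needs a required dataframe field;
       an optional table field accepts either nullability."""
       return df_required or not table_required

   # required table field <- optional dataframe field: incompatible
   assert nullability_compatible(table_required=True, df_required=False) is False
   # optional table field <- required dataframe field: fine
   assert nullability_compatible(table_required=False, df_required=True) is True
   ```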






Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676885325


##
pyiceberg/io/pyarrow.py:
##
@@ -1450,14 +1451,17 @@ def field_partner(self, partner_struct: 
Optional[pa.Array], field_id: int, _: st
 except ValueError:
 return None
 
-if isinstance(partner_struct, pa.StructArray):
-return partner_struct.field(name)
-elif isinstance(partner_struct, pa.Table):
-return partner_struct.column(name).combine_chunks()
-elif isinstance(partner_struct, pa.RecordBatch):
-return partner_struct.column(name)
-else:
-raise ValueError(f"Cannot find {name} in expected 
partner_struct type {type(partner_struct)}")
+try:
+if isinstance(partner_struct, pa.StructArray):
+return partner_struct.field(name)
+elif isinstance(partner_struct, pa.Table):
+return partner_struct.column(name).combine_chunks()
+elif isinstance(partner_struct, pa.RecordBatch):
+return partner_struct.column(name)
+else:
+raise ValueError(f"Cannot find {name} in expected 
partner_struct type {type(partner_struct)}")
+except KeyError:

Review Comment:
   Yes, that's right - the `ArrowProjectionVisitor` is responsible for 
detecting that the `field_partner` is `None` and then checking if the table 
field is also optional before filling it in with a null array. This change is 
necessary so that the `ArrowAccessor` doesn't throw an exception if the field 
can't be found in the arrow component, and enables `ArrowProjectionVisitor` to 
make use of a code pathway it wasn't able to make use of before:
   
   
https://github.com/apache/iceberg-python/blob/b11cdb54b1a05cce0ade34af4ce81a94c34b2650/pyiceberg/io/pyarrow.py#L1388-L1395
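
   The accessor-plus-visitor behavior described can be sketched schematically 
with plain dicts standing in for Arrow structs. The `field_partner`/`project` 
names mirror the roles of `ArrowAccessor` and `ArrowProjectionVisitor` but are 
not the real API:

   ```python
   from typing import Optional

   def field_partner(record: dict, name: str) -> Optional[list]:
       """Accessor: return the column if present, None instead of raising."""
       return record.get(name)

   def project(record: dict, schema: list, num_rows: int) -> dict:
       """Projection: backfill missing optional fields with nulls,
       fail only on missing required fields."""
       out = {}
       for name, required in schema:
           col = field_partner(record, name)
           if col is None:
               if required:
                   raise ValueError(f"Missing required field: {name}")
               col = [None] * num_rows  # null array for the optional field
           out[name] = col
       return out

   data = {"id": [1, 2]}
   projected = project(data, [("id", True), ("comment", False)], num_rows=2)
   assert projected == {"id": [1, 2], "comment": [None, None]}
   ```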






Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676885362


##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(
+_pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
+)
 
-console = Console(record=True)
+extra_fields = task_schema.field_names - table_schema.field_names
+missing_fields = table_schema.field_names - task_schema.field_names
+fields_in_both = 
task_schema.field_names.intersection(table_schema.field_names)
+
+from rich.console import Console
+from rich.table import Table as RichTable
+
+console = Console(record=True)
+
+rich_table = RichTable(show_header=True, header_style="bold")
+rich_table.add_column("Field Name")
+rich_table.add_column("Category")
+rich_table.add_column("Table field")
+rich_table.add_column("Dataframe field")
+
+def print_nullability(required: bool) -> str:
+return "required" if required else "optional"
+
+for field_name in fields_in_both:

Review Comment:
   That's consistent with my understanding






Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu commented on PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#issuecomment-2227052294

   My understanding of the "re-aligning" problem is the alignment between 
parquet and iceberg.
   
   similar to what Fokko mentioned here 
   https://github.com/apache/iceberg-python/pull/829#issuecomment-2219874876
   
   
   I think this PR currently only addresses schema compatibility and it is 
still possible to write parquet files with out-of-order schema. 
   The `test_table_write_out_of_order_schema` test shows we are allowing 
out-of-order schema on write, but we want to sort the schema order before 
writing.
   
   We can probably punt this to a subsequent PR as an optimization.





Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


Fokko commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676911616


##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(

Review Comment:
   Nit: This naming is still from when we only used it on the read path; we 
should probably make it more generic. Maybe `provided_schema` and 
`requested_schema`? Open for suggestions!



##
tests/integration/test_add_files.py:
##
@@ -501,14 +501,11 @@ def test_add_files_fails_on_schema_mismatch(spark: 
SparkSession, session_catalog
 )
 
 expected = """Mismatch in fields:
-┏┳━━┳━━┓
-┃┃ Table field  ┃ Dataframe field  ┃
-┑╇━━╇━━┩
-β”‚ βœ… β”‚ 1: foo: optional boolean β”‚ 1: foo: optional boolean β”‚
β”‚ βœ… β”‚ 2: bar: optional string  β”‚ 2: bar: optional string  β”‚
-β”‚ ❌ β”‚ 3: baz: optional int β”‚ 3: baz: optional string  β”‚
-β”‚ βœ… β”‚ 4: qux: optional dateβ”‚ 4: qux: optional dateβ”‚
-β””β”΄β”€β”€β”΄β”€β”€β”˜
+┏┳━━┳━━┳━┓

Review Comment:
   Is it just me, or is the left easier to read? πŸ˜… 



##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(

Review Comment:
   I'm debating myself if the API is the most extensible here. I think we 
should re-use `_check_schema_compatible(requested_schema: Schema, 
provided_schema: Schema)` instead of reimplementing the logic in Arrow here. 
This nicely splits `pyarrow_to_schema` and `_check_schema_compatible`.



##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2083,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(
+_pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
+)
 
-console = Console(record=True)
+extra_fields = task_schema.field_names - table_schema.field_names
+missing_fields = table_schema.field_names - task_schema.field_names

Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


Fokko commented on PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#issuecomment-2227093971

   Just to expand on top of 
https://github.com/apache/iceberg-python/pull/921#issuecomment-2227052294
   
   This is the situation where you already have two Iceberg schemas with the 
IDs assigned. I think an easier approach to solve this problem is to first 
convert the Arrow schema to an Iceberg schema (similar to the logic with the 
name-mapping), and then compare the two Iceberg schemas.
   
   The process would look like:
   
   - Convert the Arrow schema into Iceberg using name-mapping, similar to 
[here](https://github.com/apache/iceberg-python/blob/3f44dfe711e96beda6aa8622cf5b0baffa6eb0f2/pyiceberg/io/pyarrow.py#L2082-L2086)
   - Compare the two schemas using a `SchemaWithPartnerVisitor` if they are 
compatible. If not, we should generate a pretty error.
   - If they are compatible, we should be able to just push the schema through 
`to_requested_schema` to align/cast the fields.
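
   The three steps above can be sketched as a plain-Python pipeline, with dicts 
of `{name: (field_id, type)}` standing in for Arrow/Iceberg schemas. All 
function names here are illustrative, not the pyiceberg API:

   ```python
   def to_iceberg_schema(arrow_schema: dict, name_mapping: dict) -> dict:
       """Step 1: assign field ids to the Arrow schema via the name mapping."""
       return {name: (name_mapping[name], typ) for name, typ in arrow_schema.items()}

   def check_compatible(provided: dict, requested: dict) -> None:
       """Step 2: fail with a readable error on extra fields or type mismatches."""
       extra = set(provided) - set(requested)
       if extra:
           raise ValueError(f"Extra fields not in the table schema: {sorted(extra)}")
       for name, (_fid, typ) in provided.items():
           if requested[name][1] != typ:
               raise ValueError(f"Type mismatch for {name}: {typ} != {requested[name][1]}")

   def to_requested_schema(provided: dict, requested: dict) -> dict:
       """Step 3: re-align the provided columns to the requested field order."""
       return {name: provided[name] for name in requested if name in provided}

   table = {"id": (1, "long"), "name": (2, "string")}
   df = to_iceberg_schema({"name": "string", "id": "long"}, {"name": 2, "id": 1})
   check_compatible(df, table)
   aligned = to_requested_schema(df, table)
   assert list(aligned) == ["id", "name"]  # columns re-ordered to match the table
   ```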





Re: [I] Calling load_table().scan().to_arrow() emits error on empty "names" field in name mapping [iceberg-python]

2024-07-13 Thread via GitHub


Fokko commented on issue #925:
URL: https://github.com/apache/iceberg-python/issues/925#issuecomment-2227097367

   @spock-abadai Thanks for reporting this. I agree that this seems to be 
incorrect. Are you interested in providing a PR?





Re: [I] Calling load_table().scan().to_arrow() emits error on empty "names" field in name mapping [iceberg-python]

2024-07-13 Thread via GitHub


spock-abadai commented on issue #925:
URL: https://github.com/apache/iceberg-python/issues/925#issuecomment-2227109344

   > @spock-abadai Thanks for reporting this. I agree that this seems to be 
incorrect. Are you interested in providing a PR?
   
   Sure, see #927 for review. 





Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-13 Thread via GitHub


sdd commented on code in PR #373:
URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1676960798


##
crates/iceberg/src/scan.rs:
##
@@ -389,12 +333,158 @@ impl FileScanStreamContext {
 file_io,
 bound_filter,
 case_sensitive,
+sender,
+manifest_evaluator_cache: ManifestEvaluatorCache::new(),
+partition_filter_cache: PartitionFilterCache::new(),
+expression_evaluator_cache: 
Arc::new(Mutex::new(ExpressionEvaluatorCache::new())),
 })
 }
 
-/// Returns a reference to the [`BoundPredicate`] filter.
-fn bound_filter(&self) -> Option<&BoundPredicate> {
-self.bound_filter.as_ref()
+async fn run(&mut self, manifest_list: ManifestList) -> Result<()> {
+let file_io = self.file_io.clone();
+let sender = self.sender.clone();
+
+// This whole Vec-and-for-loop approach feels sub-optimally structured.
+// I've tried structuring this in multiple ways but run into
+// issues with ownership. Ideally I'd like to structure this
+// with a functional programming approach: extracting
+// sections 1, 2, and 3 out into different methods on Self,
+// and then use some iterator combinators to orchestrate it all.
+// Section 1 is pretty trivially refactorable into a static method
+// that can be used in a closure that can be used with 
Iterator::filter.
+// Similarly, section 3 seems easily factor-able into a method that can
+// be passed into Iterator::map.
+// Section 2 turns out trickier - we want to exit the entire `run` 
method early
+// if the eval fails, and filter out any manifest_files from the 
iterator / stream
+// if the eval succeeds but returns true. We bump into ownership 
issues due
+// to needing to pass mut self as the caches need to be able to mutate.
+// Section 3 runs into ownership issues when trying to refactor its 
closure to be
+// a static or non-static method.
+
+// 1
+let filtered_manifest_files = manifest_list
+.entries()
+.iter()
+.filter(Self::reject_unsupported_content_types);
+
+// 2
+let mut filtered_manifest_files2 = vec![];
+for manifest_file in filtered_manifest_files {
+if !self.apply_evaluator(manifest_file)? {
+continue;
+}
+
+filtered_manifest_files2.push(manifest_file);
+}
+
+// 3
+let filtered_manifest_files = 
filtered_manifest_files2.into_iter().map(|manifest_file| {
+Ok(ManifestFileProcessingContext {
+manifest_file,
+file_io: file_io.clone(),
+sender: sender.clone(),
+filter: &self.bound_filter,
+expression_evaluator_cache: 
self.expression_evaluator_cache.clone(),
+})
+});
+
+futures::stream::iter(filtered_manifest_files)
+.try_for_each_concurrent(

Review Comment:
   Aah. You're right. `try_for_each_concurrent` is concurrent but not parallel. 
What I have should still be an improvement on what we have already, as we'll at 
least be able to have multiple fileIO file requests in-flight at the same time, 
but this is not fully parallelised.
   
   I'll work on making this parallel as well as concurrent. Some initial 
experimentation shows that it is tricky to please the borrow checker, but I'll 
keep trying.
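   As an aside, the concurrent-but-not-parallel behaviour described above is easy to demonstrate outside Rust. A minimal Python sketch (an analogy only, not iceberg-rust code): `asyncio` keeps many awaits in flight at once, yet every task runs on the same thread, much like `try_for_each_concurrent` polling futures on a single thread:

```python
import asyncio
import threading

async def fetch(i, seen_threads):
    # Stand-in for one manifest-file FileIO request in flight.
    seen_threads.add(threading.get_ident())
    await asyncio.sleep(0.01)
    return i * 2

async def main():
    seen_threads = set()
    # Many requests in flight at once, like try_for_each_concurrent...
    results = await asyncio.gather(*(fetch(i, seen_threads) for i in range(8)))
    return results, seen_threads

results, seen_threads = asyncio.run(main())
# ...but every task ran on the same thread: concurrency without parallelism.
print(results, len(seen_threads))
```

   Making it parallel as well would mean handing work to multiple threads or tasks scheduled on a multi-threaded runtime, which is where the borrow-checker friction mentioned above comes in.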



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676966897


##
tests/integration/test_add_files.py:
##
@@ -501,14 +501,11 @@ def test_add_files_fails_on_schema_mismatch(spark: 
SparkSession, session_catalog
 )
 
 expected = """Mismatch in fields:
-┏┳━━┳━━┓
-┃┃ Table field  ┃ Dataframe field  ┃
-┑╇━━╇━━┩
-β”‚ βœ… β”‚ 1: foo: optional boolean β”‚ 1: foo: optional boolean β”‚
-| βœ… β”‚ 2: bar: optional string  β”‚ 2: bar: optional string  β”‚
-β”‚ ❌ β”‚ 3: baz: optional int β”‚ 3: baz: optional string  β”‚
-β”‚ βœ… β”‚ 4: qux: optional dateβ”‚ 4: qux: optional dateβ”‚
-β””β”΄β”€β”€β”΄β”€β”€β”˜
+┏┳━━┳━━┳━┓

Review Comment:
   I opted for this approach because I wanted to also group the extra fields in the dataframe into the table output. But if we take the approach of using the `name_mapping` to generate the Iceberg Schema with consistent IDs after first checking that there are no extra fields, I think we can go back to the old way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676967257


##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2083,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(
+_pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
+)
 
-console = Console(record=True)
+extra_fields = task_schema.field_names - table_schema.field_names
+missing_fields = table_schema.field_names - task_schema.field_names
+fields_in_both = 
task_schema.field_names.intersection(table_schema.field_names)
+
+from rich.console import Console
+from rich.table import Table as RichTable
+
+console = Console(record=True)
+
+rich_table = RichTable(show_header=True, header_style="bold")
+rich_table.add_column("Field Name")
+rich_table.add_column("Category")
+rich_table.add_column("Table field")
+rich_table.add_column("Dataframe field")
+
+def print_nullability(required: bool) -> str:
+return "required" if required else "optional"
+
+for field_name in fields_in_both:
+lhs = table_schema.find_field(field_name)
+rhs = task_schema.find_field(field_name)
+# Check nullability
+if lhs.required != rhs.required:
+rich_table.add_row(
+field_name,
+"Nullability",
+f"{print_nullability(lhs.required)} {str(lhs.field_type)}",
+f"{print_nullability(rhs.required)} {str(rhs.field_type)}",
+)
+# Check if type is consistent
+if any(
+(isinstance(lhs.field_type, container_type) and 
isinstance(rhs.field_type, container_type))
+for container_type in {StructType, MapType, ListType}

Review Comment:
   Hmmm, isn't the nullability check for parents and child nodes covered from L2111 to L2117?
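   For reference, the set arithmetic the diff above relies on can be sketched in isolation (the field names below are made up for illustration):

```python
# Hypothetical stand-ins for table_schema.field_names and task_schema.field_names.
table_fields = {"foo", "bar", "baz", "qux"}
frame_fields = {"foo", "bar", "baz", "extra"}

extra_fields = frame_fields - table_fields    # only in the dataframe
missing_fields = table_fields - frame_fields  # only in the table
fields_in_both = frame_fields & table_fields  # compared name-by-name for type/nullability

print(sorted(extra_fields), sorted(missing_fields), sorted(fields_in_both))
```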



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Allow empty `names` in mapped field of Name Mapping [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu commented on code in PR #927:
URL: https://github.com/apache/iceberg-python/pull/927#discussion_r1676974835


##
tests/table/test_name_mapping.py:
##
@@ -247,11 +247,6 @@ def test_mapping_lookup_by_name(table_name_mapping_nested: 
NameMapping) -> None:
 table_name_mapping_nested.find("boom")
 
 
-def test_invalid_mapped_field() -> None:

Review Comment:
   nit: can we add a test to check for `permits an empty list of names in the 
default name mapping`
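   A sketch of what such a test could look like, using a plain dataclass as a stand-in for pyiceberg's `MappedField` (the real model and its validators live in `pyiceberg/table/name_mapping.py`; this only illustrates the behaviour being requested):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MappedField:
    # Stand-in model: names may now be an empty list, field_id stays required.
    field_id: int
    names: List[str] = field(default_factory=list)

def test_mapped_field_with_empty_names() -> None:
    mapped = MappedField(field_id=1, names=[])
    assert mapped.names == []
    assert mapped.field_id == 1

test_mapped_field_with_empty_names()
```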



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Core: Kryo serialization error with DataFile after copying [iceberg]

2024-07-13 Thread via GitHub


fengjiajie opened a new pull request, #10695:
URL: https://github.com/apache/iceberg/pull/10695

   In my use case with Iceberg 1.3, I have a Flink `function-1` that outputs a `DataStream<DataFile>`, which is then processed by the next function. The simplified code for `function-1` is as follows:
   
   ```java
   // Inside function-1:
   
   Map<Integer, Long> columnSizes = new HashMap<>();
   columnSizes.put(1, 234L);
   DataFile dataFile = DataFiles.builder(icebergTable.spec())
   .withMetrics(new Metrics(123L, columnSizes, ...))
   ...
   .build();
   
   // Move file to new path, then rebuild DataFile
   DataFile newDataFile = DataFiles.builder(icebergTable.spec())
   .copy(dataFile)
   .withPath("file:///new_path")
   .build();
   ```
   
   If I return `dataFile`, Flink's Kryo framework can deserialize it correctly 
in the next function. However, if I return `newDataFile` (reconstructed with 
`copy`), Kryo fails with the following exception:
   
   ```
   Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by 
NoRestartBackoffTimeStrategy
at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:176)
 ...
at org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253)
... 4 more
   Caused by: com.esotericsoftware.kryo.KryoException: 
java.lang.UnsupportedOperationException
   Serialization trace:
   columnSizes (org.apache.iceberg.GenericDataFile)
at 
com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
...
   Caused by: java.lang.UnsupportedOperationException
at java.util.Collections$UnmodifiableMap.put(Collections.java:1459)
...
   ```
   
   This issue arises in Iceberg 1.15 but not in 1.13. The root cause lies in 
the `toReadableMap` method of `org.apache.iceberg.BaseFile`:
   
   ```java
   // Iceberg 1.13:
 private static <K, V> Map<K, V> toReadableMap(Map<K, V> map) {
   return map instanceof SerializableMap ? ((SerializableMap<K, V>) map).immutableMap() : map;
 }
   
   // Iceberg 1.15:
 private static <K, V> Map<K, V> toReadableMap(Map<K, V> map) {
   if (map == null) {
 return null;
   } else if (map instanceof SerializableMap) {
 return ((SerializableMap<K, V>) map).immutableMap();
   } else {
 return Collections.unmodifiableMap(map);
   }
 }
   ```
   
   In Iceberg 1.15, `toReadableMap` wraps the map with 
`Collections.unmodifiableMap`, resulting in an `UnsupportedOperationException` 
during deserialization. While using `unmodifiableMap` seems correct, the `copy` 
operation might need to reconstruct these maps as regular mutable maps to avoid 
this issue. 
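   The failure can be reproduced in miniature: Kryo's default map deserializer instantiates the target map and then calls `put` on it, which an unmodifiable view rejects. A hedged Python analogue (using `MappingProxyType` in place of `Collections.unmodifiableMap`) shows the same contract, and that copying into a plain mutable map, as suggested for `copy`, avoids it:

```python
from types import MappingProxyType

column_sizes = {1: 234}
readable = MappingProxyType(column_sizes)  # read-only view, like Collections.unmodifiableMap

try:
    readable[2] = 99  # a deserializer "putting" into the view
    mutated = True
except TypeError:     # Python's analogue of UnsupportedOperationException
    mutated = False

rebuilt = dict(readable)  # rebuild as a regular mutable map, as the fix proposes
rebuilt[2] = 99
print(mutated, rebuilt)
```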
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Build: Bump net.snowflake:snowflake-jdbc from 3.16.1 to 3.17.0 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] opened a new pull request, #10696:
URL: https://github.com/apache/iceberg/pull/10696

   Bumps 
[net.snowflake:snowflake-jdbc](https://github.com/snowflakedb/snowflake-jdbc) 
from 3.16.1 to 3.17.0.
   
   Release notes
   Sourced from https://github.com/snowflakedb/snowflake-jdbc/releases";>net.snowflake:snowflake-jdbc's
 releases.
   
   v3.17.0
   
   Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   
   
   
   Changelog
   Sourced from https://github.com/snowflakedb/snowflake-jdbc/blob/master/CHANGELOG.rst";>net.snowflake:snowflake-jdbc's
 changelog.
   
   JDBC Driver 3.17.0
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.16.1
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.16.0
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.15.1
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.15.0
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.14.5
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.14.4
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.14.3
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.14.2
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.14.1
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.14.0
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.13.33
   
   || Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.13.32
   
   
   ... (truncated)
   
   
   Commits
   
   https://github.com/snowflakedb/snowflake-jdbc/commit/356852c65c31c92f31a875b2f185095564a227b9";>356852c
 Bump version to 3.17.0 for release (https://redirect.github.com/snowflakedb/snowflake-jdbc/issues/1812";>#1812)
   https://github.com/snowflakedb/snowflake-jdbc/commit/f6548d570dfe5b9736c309c319ccb054a92e05c4";>f6548d5
 SNOW-1369651: Do not fail file pattern expanding on file not found in 
differe...
   https://github.com/snowflakedb/snowflake-jdbc/commit/5f840231e128945105d8e1c02df3b7e20947080c";>5f84023
 SNOW-1514498 - support for host when use file configuration (https://redirect.github.com/snowflakedb/snowflake-jdbc/issues/1809";>#1809)
   https://github.com/snowflakedb/snowflake-jdbc/commit/997002b86897944fbcf529c6b6a4a0a71eaeb078";>997002b
 SNOW-731500: Add JDBC Connectivity Diagnostics mode (https://redirect.github.com/snowflakedb/snowflake-jdbc/issues/1789";>#1789)
   https://github.com/snowflakedb/snowflake-jdbc/commit/8e916ae62af6fae3c6bd7fabdca609129fbbe18a";>8e916ae
 SNOW-1465374: Return consistent timestamps_ltz between JSON and ARROW result 
...
   https://github.com/snowflakedb/snowflake-jdbc/commit/3fee6f9240599049c0d17960623b79f3bd10";>3fee6f9
 SNOW-1196082: FIx inserting and reading timestamps not symetric if too much 
c...
   https://github.com/snowflakedb/snowflake-jdbc/commit/f39eff42290bfda27edb46d2cef0181f12ee52c3";>f39eff4
 SNOW-957747: Easy logging improvements (https://redirect.github.com/snowflakedb/snowflake-jdbc/issues/1730";>#1730)
   https://github.com/snowflakedb/snowflake-jdbc/commit/d4504f8b7780a2504712b21569bb29ccaf755073";>d4504f8
 SNOW-1163203: Increased Max LOB size in metadata (https://redirect.github.com/snowflakedb/snowflake-jdbc/issues/1806";>#1806)
   https://github.com/snowflakedb/snowflake-jdbc/commit/a4db3096c3282eb0c8aa7b86229d063a4

[PR] Build: Bump nessie from 0.92.0 to 0.92.1 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] opened a new pull request, #10697:
URL: https://github.com/apache/iceberg/pull/10697

   Bumps `nessie` from 0.92.0 to 0.92.1.
   Updates `org.projectnessie.nessie:nessie-client` from 0.92.0 to 0.92.1
   
   Updates `org.projectnessie.nessie:nessie-jaxrs-testextension` from 0.92.0 to 
0.92.1
   
   Updates `org.projectnessie.nessie:nessie-versioned-storage-inmemory-tests` 
from 0.92.0 to 0.92.1
   
   Updates `org.projectnessie.nessie:nessie-versioned-storage-testextension` 
from 0.92.0 to 0.92.1
   
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Build: Bump org.assertj:assertj-core from 3.26.0 to 3.26.3 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] opened a new pull request, #10698:
URL: https://github.com/apache/iceberg/pull/10698

   Bumps [org.assertj:assertj-core](https://github.com/assertj/assertj) from 
3.26.0 to 3.26.3.
   
   Release notes
   Sourced from https://github.com/assertj/assertj/releases";>org.assertj:assertj-core's 
releases.
   
   v3.26.3
   :jigsaw: Binary Compatibility
   The release is:
   
   Binary compatible with the previous minor version.
   Binary incompatible with the previous patch version.
   
   :boom: Breaking Changes
   Core
   
   Replace assertThat(Temporal) with 
assertThatTemporal(Temporal) https://redirect.github.com/assertj/assertj/issues/3519";>#3519
   
   :bug: Bug Fixes
   Core
   
   Fix Javadoc rendering on 
FactoryBasedNavigableListAssert::assertThat
   Allow ComparingNormalizedFields instances to be reused 
across different assertions https://redirect.github.com/assertj/assertj/issues/3493";>#3493
   
   :hammer: Dependency Upgrades
   Core
   
   Upgrade to Byte Buddy 1.14.18 https://redirect.github.com/assertj/assertj/issues/3531";>#3531
   Upgrade to JUnit BOM 5.10.3 https://redirect.github.com/assertj/assertj/issues/3525";>#3525
   
   Guava
   
   Upgrade to Guava 33.2.1-jre https://redirect.github.com/assertj/assertj/issues/3499";>#3499
   
   :heart: Contributors
   Thanks to all the contributors who worked on this release:
   https://github.com/genuss";>@​genuss
   
   
   
   Commits
   
   https://github.com/assertj/assertj/commit/8e97f90d62782a5fe2e49f739164361ecb54738b";>8e97f90
 [maven-release-plugin] prepare release assertj-build-3.26.3
   https://github.com/assertj/assertj/commit/d1afefca4e99fe6476e04d61d670fe24c25b7f4d";>d1afefc
 chore(deps): bump com.github.spotbugs:spotbugs-maven-plugin from 4.8.6.1 to 
4...
   https://github.com/assertj/assertj/commit/2dc2cbfa4070d2bbeb1a748c0833c277047fd93d";>2dc2cbf
 chore(deps): bump byte-buddy.version from 1.14.17 to 1.14.18 (https://redirect.github.com/assertj/assertj/issues/3531";>#3531)
   https://github.com/assertj/assertj/commit/2541d3c1722a923c9a1cf3b33255e3fc8ee07135";>2541d3c
 chore(deps-dev): bump com.fasterxml.jackson.core:jackson-databind from 
2.17.1...
   https://github.com/assertj/assertj/commit/cdb906fb0d9ee0dfddb2925f15a46a5b7b16c5da";>cdb906f
 [maven-release-plugin] prepare for next development iteration
   https://github.com/assertj/assertj/commit/c3b1f4a5eef1284e484e43374e9d5ba8ba33707d";>c3b1f4a
 [maven-release-plugin] prepare release assertj-build-3.26.2
   https://github.com/assertj/assertj/commit/d5b52abe63cca2b84b6abc38dd9b15a0e63f9b9d";>d5b52ab
 [maven-release-plugin] prepare for next development iteration
   https://github.com/assertj/assertj/commit/17ea711f63eea0c3081be2e36f1d089edaba1f07";>17ea711
 [maven-release-plugin] prepare release assertj-build-3.26.1
   https://github.com/assertj/assertj/commit/8cf054db3230522155c5b637e332e485622c8b82";>8cf054d
 chore(deps): bump org.codehaus.mojo:versions-maven-plugin from 2.16.2 to 
2.17...
   https://github.com/assertj/assertj/commit/5e708b48ef69f4d8c35ec86690ba6bc491cb579a";>5e708b4
 chore(deps-dev): bump org.apache.groovy:groovy from 4.0.21 to 4.0.22 (https://redirect.github.com/assertj/assertj/issues/3527";>#3527)
   Additional commits viewable in https://github.com/assertj/assertj/compare/assertj-build-3.26.0...assertj-build-3.26.3";>compare
 view
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.assertj:assertj-core&package-manager=gradle&previous-version=3.26.0&new-version=3.26.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)

[PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.43.0 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] opened a new pull request, #10699:
URL: https://github.com/apache/iceberg/pull/10699

   Bumps 
[com.google.cloud:libraries-bom](https://github.com/googleapis/java-cloud-bom) 
from 26.28.0 to 26.43.0.
   
   Release notes
   Sourced from https://github.com/googleapis/java-cloud-bom/releases";>com.google.cloud:libraries-bom's
 releases.
   
   v26.43.0
   GCP Libraries BOM 26.43.0
   Here are the differences from the previous version (26.42.0)
   New Addition
   
   com.google.cloud:google-cloud-gdchardwaremanagement:0.1.0
   
   The group ID of the following artifacts is 
com.google.cloud.
   Notable Changes
   google-cloud-bigquery 2.41.0 (prev: 2.40.3)
   
   
   Add columnNameCharacterMap to LoadJobConfiguration (https://redirect.github.com/googleapis/java-bigquery/issues/3356";>#3356)
 (https://github.com/googleapis/java-bigquery/commit/2f3cbe39619bcc93cb7d504417accd84b418dd41";>2f3cbe3)
   
   
   Add MetadataCacheMode to ExternalTableDefinition (https://redirect.github.com/googleapis/java-bigquery/issues/3351";>#3351)
 (https://github.com/googleapis/java-bigquery/commit/2814dc49dfdd5671257b6a9933a5dd381d889dd1";>2814dc4)
   
   
   Add clustering value to ListTables result (https://redirect.github.com/googleapis/java-bigquery/issues/3359";>#3359)
 (https://github.com/googleapis/java-bigquery/commit/5d52bc9f4ef93f84200335685901c6ac0256b769";>5d52bc9)
   
   
   google-cloud-bigtable 2.40.0 (prev: 2.39.5)
   
   
   Add String type with Utf8Raw encoding to Bigtable API (https://redirect.github.com/googleapis/java-bigtable/issues/2191";>#2191)
 (https://github.com/googleapis/java-bigtable/commit/e7f03fc7d252a7ff6c76a8e6e0a9e6ad3dcbd9d5";>e7f03fc)
   
   
   Add getServiceName() to EnhancedBigTableStubSettings (https://redirect.github.com/googleapis/java-bigtable/issues/2256";>#2256)
 (https://github.com/googleapis/java-bigtable/commit/da703db25f6702b263dbd8ded0cb0fd3422efe31";>da703db)
   
   
   Remove grpclb (https://redirect.github.com/googleapis/java-bigtable/issues/2033";>#2033)
 (https://github.com/googleapis/java-bigtable/commit/735537571a147bfdd2a986664ff7905c8f5dc3db";>7355375)
   
   
   google-cloud-firestore 3.22.0 (prev: 3.21.4)
   
   Add bulk delete api (https://redirect.github.com/googleapis/java-firestore/issues/1704";>#1704)
 (https://github.com/googleapis/java-firestore/commit/5ef625456afbee781951d2a5c6a9c2548feea92e";>5ef6254)
   
   google-cloud-logging 3.19.0 (prev: 3.18.0)
   
   logging: OpenTelemetry trace/span ID integration for 
Java logging library (https://redirect.github.com/googleapis/java-logging/issues/1596";>#1596)
 (https://github.com/googleapis/java-logging/commit/67db829621fd1c4a876d158fe1afb4927821fa54";>67db829)
   
   google-cloud-pubsub 1.131.0 (prev: 1.130.0)
   
   Add use_topic_schema for Cloud Storage Subscriptions (https://redirect.github.com/googleapis/java-pubsub/issues/2082";>#2082)
 (https://github.com/googleapis/java-pubsub/commit/11d67d44152ccca008dda071683d9932c59af41d";>11d67d4)
   
   google-cloud-spanner 6.71.0 (prev: 6.69.0)
   
   
   Add field order_by in spanner.proto (https://redirect.github.com/googleapis/java-spanner/issues/3064";>#3064)
 (https://github.com/googleapis/java-spanner/commit/52ee1967ee3a37fb0482ad8b51c6e77e28b79844";>52ee196)
   
   
   Do not end transaction span when rolling back to savepoint (https://redirect.github.com/googleapis/java-spanner/issues/3167";>#3167)
 (https://github.com/googleapis/java-spanner/commit/8ec0cf2032dece545c9e4d8a794b80d06550b710";>8ec0cf2)
   
   
   Remove unused DmlBatch span (https://redirect.github.com/googleapis/java-spanner/issues/3147";>#3147)
 (https://github.com/googleapis/java-spanner/commit/f7891c1ca42727c775cdbe91bff8d55191a3d799";>f7891c1)
   
   
   google-cloud-spanner-jdbc 2.20.1 (prev: 2.19.3)
   
   Add OpenTelemetry tracing (https://redirect.github.com/googleapis/java-spanner-jdbc/issues/1568";>#1568)
 (https://github.com/googleapis/java-spanner-jdbc/commit/1485a04272c270851468254bebffe4f7d846f17c";>1485a04)
   
   Other libraries
   
   [aiplatform] add enum value MALFORMED_FUNCTION_CALL to 
.google.cloud.aiplatform.v1beta1.content.Candidate.FinishReason 
(https://github.com/googleapis/google-cloud-java/commit/666a3ac8cd0cb45da3555a1bef8dfe3edf57d257";>666a3ac)
   [aiplatform] add MALFORMED_FUNCTION_CALL to FinishReason (https://github.com/googleapis/google-cloud-java/commit/666a3ac8cd0cb45da3555a1bef8dfe3edf57d257";>666a3ac)
   [batch] add a install_ops_agent field to InstancePolicyOrTemplate for 
Ops Agent support (https://github.com/googleapis/google-cloud-java/commit/564c3692bc8d22f4f5acdcbfb60be0582bd08d53";>564c369)
   [container] A new message HugepagesConfig is added (https://github.com/googleapis/google-cloud-java/commit/4be1db5cfca842d85e167b8ed033ebcbcbd758dd";>4be1db5)
   [container] A new method_signature parent is added to 
method ListOperations in service ClusterManager (https://github.com/googleapis/google-cloud-java/commit/4be1db5cfca842d85e167

Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.42.0 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] commented on PR #10556:
URL: https://github.com/apache/iceberg/pull/10556#issuecomment-2227195410

   Superseded by #10699.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.42.0 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] closed pull request #10556: Build: Bump 
com.google.cloud:libraries-bom from 26.28.0 to 26.42.0
URL: https://github.com/apache/iceberg/pull/10556


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Build: Bump software.amazon.awssdk:bom from 2.26.16 to 2.26.20 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] opened a new pull request, #10700:
URL: https://github.com/apache/iceberg/pull/10700

   Bumps software.amazon.awssdk:bom from 2.26.16 to 2.26.20.
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=software.amazon.awssdk:bom&package-manager=gradle&previous-version=2.26.16&new-version=2.26.20)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Allow empty `names` in mapped field of Name Mapping [iceberg-python]

2024-07-13 Thread via GitHub


spock-abadai commented on code in PR #927:
URL: https://github.com/apache/iceberg-python/pull/927#discussion_r1677034401


##
tests/table/test_name_mapping.py:
##
@@ -247,11 +247,6 @@ def test_mapping_lookup_by_name(table_name_mapping_nested: 
NameMapping) -> None:
 table_name_mapping_nested.find("boom")
 
 
-def test_invalid_mapped_field() -> None:

Review Comment:
   Done. 






Re: [PR] Allow empty `names` in mapped field of Name Mapping [iceberg-python]

2024-07-13 Thread via GitHub


spock-abadai commented on PR #927:
URL: https://github.com/apache/iceberg-python/pull/927#issuecomment-2227208143

   > Thanks for raising this PR @spock-abadai
   > 
   > I think we should update line 40 as well:
   > 
   > 
https://github.com/apache/iceberg-python/blob/3f44dfe711e96beda6aa8622cf5b0baffa6eb0f2/pyiceberg/table/name_mapping.py#L40
   
   Good catch, fixed. 





Re: [PR] Allow empty `names` in mapped field of Name Mapping [iceberg-python]

2024-07-13 Thread via GitHub


Fokko commented on PR #927:
URL: https://github.com/apache/iceberg-python/pull/927#issuecomment-2227217603

   @spock-abadai Thanks again for working on this. I think we're almost there. 
Can you run `make lint` to fix the formatting issue? Thanks!

