Re: [PR] mr:Fix issues 10639 [iceberg]

2024-07-13 Thread via GitHub


pvary commented on PR #10661:
URL: https://github.com/apache/iceberg/pull/10661#issuecomment-2226801595

   > compile jdk8, run jdk11, but we are actively working on jdk11 compile now, 
that should be in Hive-4.1.0 release
   
   When is Hive 4.1.0 planned?
   You might want to find and comment on the deprecation thread for Hive when 
it is ready:
   https://lists.apache.org/thread/bn2c480wdbzr089p88n1003zhw0nj9kv
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] mr:Fix issues 10639 [iceberg]

2024-07-13 Thread via GitHub


lurnagao-dahua commented on code in PR #10661:
URL: https://github.com/apache/iceberg/pull/10661#discussion_r1674923132


##
mr/src/test/java/org/apache/iceberg/mr/TestIcebergInputFormats.java:
##
@@ -381,6 +386,56 @@ public void testCustomCatalog() throws IOException {
 testInputFormat.create(builder.conf()).validate(expectedRecords);
   }
 
+  @TestTemplate
+  public void testWorkerPool() throws Exception {
+    Table table = helper.createUnpartitionedTable();
+    List<Record> records = helper.generateRandomRecords(1, 0L);
+    helper.appendToTable(null, records);

Review Comment:
   Thank you for your guidance. I have further optimized the unit test cases!
   Could you please review again when you have time?
   I would be very grateful.






Re: [PR] refine: move binary serialize in literal to datum [iceberg-rust]

2024-07-13 Thread via GitHub


ZENOTME commented on PR #456:
URL: https://github.com/apache/iceberg-rust/pull/456#issuecomment-2226828202

   cc @liurenjie1024 @Xuanwo 





Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676794013


##
crates/iceberg/testdata/view_metadata/ViewMetadataV2Valid.json:
##
@@ -0,0 +1,58 @@
+{

Review Comment:
   Ok, thanks Eduard for the feedback.
   
   My preferred way of going forward would be:
   
   1. Merge this PR, which just contains the structs & (de)serialization (very much like `TableMetadata` in its current state)
   2. Figure out a good way for the `TableMetadataBuilder`, including partition binding. We have a first shot ready that we can create a PR for.
   3. Use the same pattern we use for the `TableMetadataBuilder` for the much lighter `ViewMetadataBuilder`, which then includes all the tests Eduard mentioned.
   
   I am aware that a lot of things are missing in the builder. It was a deliberate decision on my part to get views up to speed with tables first, and then in a second step extend both table and view features to the Java level.
   
   @ZENOTME , @nastra would that be OK for you?
   






Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676794680


##
crates/iceberg/src/spec/view_metadata.rs:
##
@@ -0,0 +1,682 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Defines the [view metadata](https://iceberg.apache.org/view-spec/#view-metadata).
+//! The main struct here is [ViewMetadata] which defines the data for a view.
+
+use serde::{Deserialize, Serialize};
+use serde_repr::{Deserialize_repr, Serialize_repr};
+use std::cmp::Ordering;
+use std::fmt::{Display, Formatter};
+use std::{collections::HashMap, sync::Arc};
+use uuid::Uuid;
+
+use super::{
+view_version::{ViewVersion, ViewVersionRef},
+SchemaId, SchemaRef,
+};
+use crate::catalog::ViewCreation;
+use crate::error::Result;
+
+use _serde::ViewMetadataEnum;
+
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+
+/// Reference to [`ViewMetadata`].
+pub type ViewMetadataRef = Arc<ViewMetadata>;
+
+#[derive(Debug, PartialEq, Deserialize, Eq, Clone)]
+#[serde(try_from = "ViewMetadataEnum", into = "ViewMetadataEnum")]
+/// Fields for the version 1 of the view metadata.
+///
+/// We assume that this data structure is always valid, so we will panic when invalid error happens.
+/// We check the validity of this data structure when constructing.
+pub struct ViewMetadata {
+/// Integer Version for the format.
+pub(crate) format_version: ViewFormatVersion,
+/// A UUID that identifies the view, generated when the view is created.
+pub(crate) view_uuid: Uuid,
+/// The view's base location; used to create metadata file locations
+pub(crate) location: String,
+/// ID of the current version of the view (version-id)
+pub(crate) current_version_id: i64,
+/// A list of known versions of the view
+pub(crate) versions: HashMap<i64, ViewVersionRef>,
+/// A list of version log entries with the timestamp and version-id for every
+/// change to current-version-id
+pub(crate) version_log: Vec<ViewVersionLog>,

Review Comment:
   Hm, in the Iceberg spec it's called versions & version-log: https://iceberg.apache.org/view-spec/#view-metadata.
   I implemented it the same way `TableMetadata` is currently implemented: naming the struct field identically to the spec, but then creating a history accessor.
   
   TableMetadata:
   
https://github.com/c-thiel/iceberg-rust/blob/ca9de89ac9d95683c8fe9191f72ab922dc4c7672/crates/iceberg/src/spec/table_metadata.rs#L208-L211
   
   ViewMetadata:
   
https://github.com/c-thiel/iceberg-rust/blob/44630160be1bcf48249c31006b76a7150a029619/crates/iceberg/src/spec/view_metadata.rs#L151-L156
   
   As the `versions` field is not public, it actually implements almost the same interface as Java.
   
   @ZENOTME or some other Rust dev, it would be great to get some opinions on this.
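
   The pattern described above — a private field named after the spec, exposed through a Java-style accessor — can be sketched roughly as follows. This is an illustrative simplification, not the PR's code: `ViewMetadataSketch` and the field types are stand-ins.

```rust
// Sketch of the naming pattern: the private struct field keeps the spec
// name `version-log` (as `version_log`), while a `history()` accessor
// exposes it under the Java-style name. Types are simplified stand-ins.

#[derive(Debug, Clone, PartialEq)]
pub struct ViewVersionLog {
    pub version_id: i64,
    pub timestamp_ms: i64,
}

pub struct ViewMetadataSketch {
    // Field named after the spec's `version-log`; not public, so callers
    // must go through the accessor below.
    version_log: Vec<ViewVersionLog>,
}

impl ViewMetadataSketch {
    /// Java-style accessor name, backed by the spec-named private field.
    pub fn history(&self) -> &[ViewVersionLog] {
        &self.version_log
    }
}
```

   Because the field stays private, renaming the accessor later would not be a breaking change to the serialized form — the spec name only appears in (de)serialization.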






Re: [PR] Core:support http lock-manager [iceberg]

2024-07-13 Thread via GitHub


snazy commented on PR #10688:
URL: https://github.com/apache/iceberg/pull/10688#issuecomment-2226834838

   It seems there are a lot of individual points that should be discussed. @BsoBird do you mind adding a point to the "Discussions" section on the [agenda for the next community sync on July 31st](https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit)?
   
   Let me comment on some of the points. TL;DR: I think it's not an easy problem to solve.
   
   > balance this with the use of lockManager (which in itself is not a difficult task).
   
   Locking is one of the most difficult problems in distributed computing. (And actually in non-distributed computing, too.) Just a few things that come to mind (the real list is much longer):
   
   * Evil users lock all your tables (-> authorization)
   * Retry/back-off -> piling up waiting requests -> threading issues
   * Crashed clients leave stuff locked
   * "all the things" about race conditions
   
   > ... a large number of users use GlueCatalog ... want to access both iceberg and non-iceberg tables in Glue.
   
   That's IMO a fair point.
   
   > 2.You might think that users could solve this problem by implementing rest-catalog. However, many users do not have the ability to implement rest-catalog.Also, rest-catalog is still evolving, and many of the specifications have not yet been finalised.
   
   I suspect this wouldn't be an issue if Glue had a REST catalog?
   
   > 3.Regarding the fact that all clients need to be involved in the locking in order for it to work. I think this is a problem that users need to solve themselves.
   
   Similar to my statement above, users should not be forced into implementing/providing a proper distributed locking mechanism.
   
   Also, isolation levels play a role here. Some use cases are probably fine with "dirty reads", some with "read committed", and others require full "serializable" guarantees.





Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676795328


##
crates/iceberg/src/spec/view_metadata.rs:
##
@@ -0,0 +1,682 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Defines the [view metadata](https://iceberg.apache.org/view-spec/#view-metadata).
+//! The main struct here is [ViewMetadata] which defines the data for a view.
+
+use serde::{Deserialize, Serialize};
+use serde_repr::{Deserialize_repr, Serialize_repr};
+use std::cmp::Ordering;
+use std::fmt::{Display, Formatter};
+use std::{collections::HashMap, sync::Arc};
+use uuid::Uuid;
+
+use super::{
+view_version::{ViewVersion, ViewVersionRef},
+SchemaId, SchemaRef,
+};
+use crate::catalog::ViewCreation;
+use crate::error::Result;
+
+use _serde::ViewMetadataEnum;
+
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+
+/// Reference to [`ViewMetadata`].
+pub type ViewMetadataRef = Arc<ViewMetadata>;
+
+#[derive(Debug, PartialEq, Deserialize, Eq, Clone)]
+#[serde(try_from = "ViewMetadataEnum", into = "ViewMetadataEnum")]
+/// Fields for the version 1 of the view metadata.
+///
+/// We assume that this data structure is always valid, so we will panic when invalid error happens.
+/// We check the validity of this data structure when constructing.
+pub struct ViewMetadata {
+/// Integer Version for the format.
+pub(crate) format_version: ViewFormatVersion,
+/// A UUID that identifies the view, generated when the view is created.
+pub(crate) view_uuid: Uuid,
+/// The view's base location; used to create metadata file locations
+pub(crate) location: String,
+/// ID of the current version of the view (version-id)
+pub(crate) current_version_id: i64,
+/// A list of known versions of the view
+pub(crate) versions: HashMap<i64, ViewVersionRef>,
+/// A list of version log entries with the timestamp and version-id for every
+/// change to current-version-id
+pub(crate) version_log: Vec<ViewVersionLog>,
+/// A list of schemas, stored as objects with schema-id.
+pub(crate) schemas: HashMap<SchemaId, SchemaRef>,
+/// A string to string map of view properties.
+/// Properties are used for metadata such as comment and for settings that
+/// affect view maintenance. This is not intended to be used for arbitrary metadata.
+pub(crate) properties: HashMap<String, String>,
+}
+
+impl ViewMetadata {
+/// Returns format version of this metadata.
+#[inline]
+pub fn format_version(&self) -> ViewFormatVersion {
+self.format_version
+}
+
+/// Returns uuid of current view.
+#[inline]
+pub fn uuid(&self) -> Uuid {
+self.view_uuid
+}
+
+/// Returns view location.
+#[inline]
+pub fn location(&self) -> &str {
+self.location.as_str()
+}
+
+/// Returns the current version id.
+#[inline]
+pub fn current_version_id(&self) -> i64 {
+self.current_version_id
+}
+
+/// Returns all view versions.
+#[inline]
+pub fn versions(&self) -> impl Iterator<Item = &ViewVersionRef> {
+self.versions.values()
+}
+
+/// Lookup a view version by id.
+#[inline]
+pub fn version_by_id(&self, version_id: i64) -> Option<&ViewVersionRef> {
+self.versions.get(&version_id)
+}
+
+/// Returns the current view version.
+#[inline]
+pub fn current_version(&self) -> &ViewVersionRef {
+self.versions
+.get(&self.current_version_id)
+.expect("Current version id set, but not found in view versions")
+}
+
+/// Returns schemas
+#[inline]
+pub fn schemas_iter(&self) -> impl Iterator<Item = &SchemaRef> {
+self.schemas.values()
+}
+
+/// Lookup schema by id.
+#[inline]
+pub fn schema_by_id(&self, schema_id: SchemaId) -> Option<&SchemaRef> {
+self.schemas.get(&schema_id)
+}
+
+/// Get current schema
+#[inline]
+pub fn current_schema(&self) -> &SchemaRef {
+let schema_id = self.current_version().schema_id();
+self.schema_by_id(schema_id)
+.expect("Current schema id set, but not found in view metadata")
+}
+
+/// Returns properties of the view.
+#[inline]
+pub fn properties(&self) -> &HashMap<String, String> {
+&self.properties
+}
+
+/// A

Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676795561


##
crates/iceberg/src/spec/view_metadata.rs:
##
@@ -0,0 +1,682 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Defines the [view metadata](https://iceberg.apache.org/view-spec/#view-metadata).
+//! The main struct here is [ViewMetadata] which defines the data for a view.
+
+use serde::{Deserialize, Serialize};
+use serde_repr::{Deserialize_repr, Serialize_repr};
+use std::cmp::Ordering;
+use std::fmt::{Display, Formatter};
+use std::{collections::HashMap, sync::Arc};
+use uuid::Uuid;
+
+use super::{
+view_version::{ViewVersion, ViewVersionRef},
+SchemaId, SchemaRef,
+};
+use crate::catalog::ViewCreation;
+use crate::error::Result;
+
+use _serde::ViewMetadataEnum;
+
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+
+/// Reference to [`ViewMetadata`].
+pub type ViewMetadataRef = Arc<ViewMetadata>;
+
+#[derive(Debug, PartialEq, Deserialize, Eq, Clone)]
+#[serde(try_from = "ViewMetadataEnum", into = "ViewMetadataEnum")]
+/// Fields for the version 1 of the view metadata.
+///
+/// We assume that this data structure is always valid, so we will panic when invalid error happens.
+/// We check the validity of this data structure when constructing.
+pub struct ViewMetadata {
+/// Integer Version for the format.
+pub(crate) format_version: ViewFormatVersion,
+/// A UUID that identifies the view, generated when the view is created.
+pub(crate) view_uuid: Uuid,
+/// The view's base location; used to create metadata file locations
+pub(crate) location: String,
+/// ID of the current version of the view (version-id)
+pub(crate) current_version_id: i64,
+/// A list of known versions of the view
+pub(crate) versions: HashMap<i64, ViewVersionRef>,
+/// A list of version log entries with the timestamp and version-id for every
+/// change to current-version-id
+pub(crate) version_log: Vec<ViewVersionLog>,
+/// A list of schemas, stored as objects with schema-id.
+pub(crate) schemas: HashMap<SchemaId, SchemaRef>,
+/// A string to string map of view properties.
+/// Properties are used for metadata such as comment and for settings that
+/// affect view maintenance. This is not intended to be used for arbitrary metadata.
+pub(crate) properties: HashMap<String, String>,
+}
+
+impl ViewMetadata {
+/// Returns format version of this metadata.
+#[inline]
+pub fn format_version(&self) -> ViewFormatVersion {
+self.format_version
+}
+
+/// Returns uuid of current view.
+#[inline]
+pub fn uuid(&self) -> Uuid {
+self.view_uuid
+}
+
+/// Returns view location.
+#[inline]
+pub fn location(&self) -> &str {
+self.location.as_str()
+}
+
+/// Returns the current version id.
+#[inline]
+pub fn current_version_id(&self) -> i64 {
+self.current_version_id
+}
+
+/// Returns all view versions.
+#[inline]
+pub fn versions(&self) -> impl Iterator<Item = &ViewVersionRef> {
+self.versions.values()
+}
+
+/// Lookup a view version by id.
+#[inline]
+pub fn version_by_id(&self, version_id: i64) -> Option<&ViewVersionRef> {
+self.versions.get(&version_id)
+}
+
+/// Returns the current view version.
+#[inline]
+pub fn current_version(&self) -> &ViewVersionRef {
+self.versions
+.get(&self.current_version_id)
+.expect("Current version id set, but not found in view versions")
+}
+
+/// Returns schemas
+#[inline]
+pub fn schemas_iter(&self) -> impl Iterator<Item = &SchemaRef> {
+self.schemas.values()
+}
+
+/// Lookup schema by id.
+#[inline]
+pub fn schema_by_id(&self, schema_id: SchemaId) -> Option<&SchemaRef> {
+self.schemas.get(&schema_id)
+}
+
+/// Get current schema
+#[inline]
+pub fn current_schema(&self) -> &SchemaRef {
+let schema_id = self.current_version().schema_id();
+self.schema_by_id(schema_id)
+.expect("Current schema id set, but not found in view metadata")
+}
+
+/// Returns properties of the view.
+#[inline]
+pub fn properties(&self) -> &HashMap<String, String> {
+&self.properties
+}
+
+/// A

Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676794013


##
crates/iceberg/testdata/view_metadata/ViewMetadataV2Valid.json:
##
@@ -0,0 +1,58 @@
+{

Review Comment:
   Ok, thanks Eduard for the feedback.
   
   My preferred way of going forward would be:
   
   1. Merge this PR, which just contains the structs & (de)serialization (very much like `TableMetadata` in its current state)
   2. Figure out a good way for the `TableMetadataBuilder`, including partition binding. We have a first shot ready that we can create a PR for.
   3. Use the same pattern we use for the `TableMetadataBuilder` for the much lighter `ViewMetadataBuilder`, which then includes all the tests Eduard mentioned.
   
   I am aware that a lot of things are missing in the builder. It was a deliberate decision on my part to get views up to speed with tables first, and then in a second step extend both table and view features to the Java level. This way we have a manageable PR here that just takes care of (ser)de, and we can build on it in a next step.
   
   @ZENOTME , @nastra would that be OK for you?
   






Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676796496


##
crates/iceberg/src/spec/view_version.rs:
##
@@ -0,0 +1,347 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+/*!
+ * View Versions!
+*/
+use crate::error::Result;
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
+use std::sync::Arc;
+use typed_builder::TypedBuilder;
+
+use super::view_metadata::ViewVersionLog;
+use crate::catalog::NamespaceIdent;
+use crate::spec::{SchemaId, SchemaRef, ViewMetadata};
+use crate::{Error, ErrorKind};
+use _serde::ViewVersionV1;
+
+/// Reference to [`ViewVersion`].
+pub type ViewVersionRef = Arc<ViewVersion>;
+
+#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, TypedBuilder)]
+#[serde(from = "ViewVersionV1", into = "ViewVersionV1")]
+#[builder(field_defaults(setter(prefix = "with_")))]
+/// A view version represents the definition of a view at a specific point in time.
+pub struct ViewVersion {
+/// A unique long ID
+version_id: i64,
+/// ID of the schema for the view version
+schema_id: SchemaId,
+/// Timestamp when the version was created (ms from epoch)
+timestamp_ms: i64,
+/// A string to string map of summary metadata about the version
+summary: HashMap<String, String>,
+/// A list of representations for the view definition.
+representations: ViewRepresentations,
+/// Catalog name to use when a reference in the SELECT does not contain a catalog
+#[builder(default = None)]
+default_catalog: Option<String>,
+/// Namespace to use when a reference in the SELECT is a single identifier
+default_namespace: NamespaceIdent,
+}
+
+impl ViewVersion {
+/// Get the version id of this view version.
+#[inline]
+pub fn version_id(&self) -> i64 {
+self.version_id
+}
+
+/// Get the schema id of this view version.
+#[inline]
+pub fn schema_id(&self) -> SchemaId {
+self.schema_id
+}
+
+/// Get the timestamp of when the view version was created
+#[inline]
+pub fn timestamp(&self) -> MappedLocalTime<DateTime<Utc>> {
+Utc.timestamp_millis_opt(self.timestamp_ms)
+}
+
+/// Get the timestamp of when the view version was created in milliseconds since epoch
+#[inline]
+pub fn timestamp_ms(&self) -> i64 {
+self.timestamp_ms
+}
+
+/// Get summary of the view version
+#[inline]
+pub fn summary(&self) -> &HashMap<String, String> {
+&self.summary
+}
+
+/// Get this views representations
+#[inline]
+pub fn representations(&self) -> &ViewRepresentations {
+&self.representations
+}
+
+/// Get the default catalog for this view version
+#[inline]
+pub fn default_catalog(&self) -> Option<&String> {
+self.default_catalog.as_ref()
+}
+
+/// Get the default namespace to use when a reference in the SELECT is a single identifier
+#[inline]
+pub fn default_namespace(&self) -> &NamespaceIdent {
+&self.default_namespace
+}
+
+/// Get the schema of this snapshot.
+pub fn schema(&self, view_metadata: &ViewMetadata) -> Result<SchemaRef> {
+let r = view_metadata
+.schema_by_id(self.schema_id())
+.ok_or_else(|| {
+Error::new(
+ErrorKind::DataInvalid,
+format!("Schema with id {} not found", self.schema_id()),
+)
+})
+.cloned();
+r
+}
+
+pub(crate) fn log(&self) -> ViewVersionLog {
+ViewVersionLog::new(self.version_id, self.timestamp_ms)
+}
+}
+
+/// A list of view representations.
+pub type ViewRepresentations = Vec<ViewRepresentation>;

Review Comment:
   Normally not. This is looking forward to the ViewMetadataBuilder where we want to implement methods on those.
   The motivation is that multiple `ViewRepresentations` are more than just a Vec - there are many rules in constructing it (i.e. no duplicate dialects). We shouldn't just expose the Vec interface externally because then users could just add an item to the list without going through any of our validation functions (which are not there yet, will come with the V

Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676800028


##
crates/iceberg/src/spec/view_version.rs:
##
@@ -0,0 +1,347 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+/*!
+ * View Versions!
+*/
+use crate::error::Result;
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
+use std::sync::Arc;
+use typed_builder::TypedBuilder;
+
+use super::view_metadata::ViewVersionLog;
+use crate::catalog::NamespaceIdent;
+use crate::spec::{SchemaId, SchemaRef, ViewMetadata};
+use crate::{Error, ErrorKind};
+use _serde::ViewVersionV1;
+
+/// Reference to [`ViewVersion`].
+pub type ViewVersionRef = Arc<ViewVersion>;
+
+#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, TypedBuilder)]
+#[serde(from = "ViewVersionV1", into = "ViewVersionV1")]
+#[builder(field_defaults(setter(prefix = "with_")))]
+/// A view version represents the definition of a view at a specific point in time.
+pub struct ViewVersion {
+/// A unique long ID
+version_id: i64,
+/// ID of the schema for the view version
+schema_id: SchemaId,
+/// Timestamp when the version was created (ms from epoch)
+timestamp_ms: i64,
+/// A string to string map of summary metadata about the version
+summary: HashMap<String, String>,
+/// A list of representations for the view definition.
+representations: ViewRepresentations,
+/// Catalog name to use when a reference in the SELECT does not contain a catalog
+#[builder(default = None)]
+default_catalog: Option<String>,
+/// Namespace to use when a reference in the SELECT is a single identifier
+default_namespace: NamespaceIdent,
+}
+
+impl ViewVersion {
+/// Get the version id of this view version.
+#[inline]
+pub fn version_id(&self) -> i64 {
+self.version_id
+}
+
+/// Get the schema id of this view version.
+#[inline]
+pub fn schema_id(&self) -> SchemaId {
+self.schema_id
+}
+
+/// Get the timestamp of when the view version was created
+#[inline]
+pub fn timestamp(&self) -> MappedLocalTime<DateTime<Utc>> {
+Utc.timestamp_millis_opt(self.timestamp_ms)
+}
+
+/// Get the timestamp of when the view version was created in milliseconds since epoch
+#[inline]
+pub fn timestamp_ms(&self) -> i64 {
+self.timestamp_ms
+}
+
+/// Get summary of the view version
+#[inline]
+pub fn summary(&self) -> &HashMap<String, String> {
+&self.summary
+}
+
+/// Get this views representations
+#[inline]
+pub fn representations(&self) -> &ViewRepresentations {
+&self.representations
+}
+
+/// Get the default catalog for this view version
+#[inline]
+pub fn default_catalog(&self) -> Option<&String> {
+self.default_catalog.as_ref()
+}
+
+/// Get the default namespace to use when a reference in the SELECT is a single identifier
+#[inline]
+pub fn default_namespace(&self) -> &NamespaceIdent {
+&self.default_namespace
+}
+
+/// Get the schema of this snapshot.
+pub fn schema(&self, view_metadata: &ViewMetadata) -> Result<SchemaRef> {
+let r = view_metadata
+.schema_by_id(self.schema_id())
+.ok_or_else(|| {
+Error::new(
+ErrorKind::DataInvalid,
+format!("Schema with id {} not found", self.schema_id()),
+)
+})
+.cloned();
+r
+}
+
+pub(crate) fn log(&self) -> ViewVersionLog {
+ViewVersionLog::new(self.version_id, self.timestamp_ms)
+}
+}
+
+/// A list of view representations.
+pub type ViewRepresentations = Vec<ViewRepresentation>;

Review Comment:
   Fix: 
https://github.com/c-thiel/iceberg-rust/commit/2b427e511dbeb3dfa6292af6983ee8d0b8bc48ad
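
   The invariant discussed in this thread — representations are more than a plain Vec, e.g. no two entries may share a dialect — is the classic case for a newtype that validates on insertion. A rough sketch under assumed names (`SqlRepresentation`, `try_push` are illustrative, not the committed API):

```rust
/// Illustrative stand-in for the SQL representation of a view.
#[derive(Debug, Clone, PartialEq)]
pub struct SqlRepresentation {
    pub dialect: String,
    pub sql: String,
}

/// Newtype over Vec: the inner Vec is private, so every mutation goes
/// through validation instead of the raw Vec API.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ViewRepresentations(Vec<SqlRepresentation>);

impl ViewRepresentations {
    /// Add a representation, rejecting duplicate dialects.
    pub fn try_push(&mut self, repr: SqlRepresentation) -> Result<(), String> {
        if self.0.iter().any(|r| r.dialect == repr.dialect) {
            return Err(format!("duplicate dialect: {}", repr.dialect));
        }
        self.0.push(repr);
        Ok(())
    }

    /// Read-only access is still safe to expose.
    pub fn iter(&self) -> impl Iterator<Item = &SqlRepresentation> {
        self.0.iter()
    }
}
```

   Compared with `pub type ViewRepresentations = Vec<...>`, the newtype costs a little boilerplate but makes invalid states unrepresentable outside the module.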






Re: [PR] Core:support http lock-manager [iceberg]

2024-07-13 Thread via GitHub


BsoBird commented on PR #10688:
URL: https://github.com/apache/iceberg/pull/10688#issuecomment-2226845799

   @snazy 
   Hello! I'm glad you could get back to me.
   I'm just saying out loud the first things that come to mind:
   
   1. Is everything done using rest-catalog? In many cases, users have their own server rooms, they are not using cloud products, and they do not intend to expose iceberg directly to untrusted users. In a network-isolated environment, there seems to be no need to introduce a separate rest-catalog to add complexity.
   2. For more complex access environments, rest-catalog is effective. I agree.
   3. Currently, the use of lockManager is tied to the implementation of a certain catalog. Similar to aws-iceberg, we don't actually force the user to use lockManager. The user knows what he's doing when he chooses a catalog. Basically, the catalog's author has solved the concurrency problems and race conditions in the code of any catalog that uses lockManager; after all, he wouldn't need lockManager if he hadn't solved these problems.
   4. Regarding the security of lockManager: this is something that should actually be considered within some kind of extension module. For example, iceberg-aws provides some implementations of lockManager, and the security of such a lockManager must be considered there; iceberg-core shouldn't care about such issues. After all, safety standards are not uniform, and the details are varied.
   
   From what I've seen so far, the opinion of many people in the community is that the lock-manager should be extended and maintained by users if they want it, just like the aws-iceberg module. In iceberg-core, the implementation of lock-manager should only stop at InMemLockManager. But isn't that too little? Wouldn't it be nice to simply extend something like http-lock-manager for users? After all, it's small and doesn't introduce any other dependencies.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676801198


##
crates/iceberg/src/spec/view_version.rs:
##
@@ -0,0 +1,347 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+/*!
+ * View Versions!
+*/
+use crate::error::Result;
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
+use std::sync::Arc;
+use typed_builder::TypedBuilder;
+
+use super::view_metadata::ViewVersionLog;
+use crate::catalog::NamespaceIdent;
+use crate::spec::{SchemaId, SchemaRef, ViewMetadata};
+use crate::{Error, ErrorKind};
+use _serde::ViewVersionV1;
+
+/// Reference to [`ViewVersion`].
+pub type ViewVersionRef = Arc<ViewVersion>;
+
+#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, TypedBuilder)]
+#[serde(from = "ViewVersionV1", into = "ViewVersionV1")]
+#[builder(field_defaults(setter(prefix = "with_")))]
+/// A view version represents the definition of a view at a specific point in 
time.
+pub struct ViewVersion {
+/// A unique long ID
+version_id: i64,
+/// ID of the schema for the view version
+schema_id: SchemaId,
+/// Timestamp when the version was created (ms from epoch)
+timestamp_ms: i64,
+/// A string to string map of summary metadata about the version
+summary: HashMap<String, String>,
+/// A list of representations for the view definition.
+representations: ViewRepresentations,
+/// Catalog name to use when a reference in the SELECT does not contain a 
catalog
+#[builder(default = None)]
+default_catalog: Option<String>,
+/// Namespace to use when a reference in the SELECT is a single identifier
+default_namespace: NamespaceIdent,
+}
+
+impl ViewVersion {
+/// Get the version id of this view version.
+#[inline]
+pub fn version_id(&self) -> i64 {
+self.version_id
+}
+
+/// Get the schema id of this view version.
+#[inline]
+pub fn schema_id(&self) -> SchemaId {
+self.schema_id
+}
+
+/// Get the timestamp of when the view version was created
+#[inline]
+pub fn timestamp(&self) -> MappedLocalTime<DateTime<Utc>> {
+Utc.timestamp_millis_opt(self.timestamp_ms)
+}
+
+/// Get the timestamp of when the view version was created in milliseconds 
since epoch
+#[inline]
+pub fn timestamp_ms(&self) -> i64 {
+self.timestamp_ms
+}
+
+/// Get summary of the view version
+#[inline]
+pub fn summary(&self) -> &HashMap<String, String> {
+&self.summary
+}
+
+/// Get this view's representations
+#[inline]
+pub fn representations(&self) -> &ViewRepresentations {
+&self.representations
+}
+
+/// Get the default catalog for this view version
+#[inline]
+pub fn default_catalog(&self) -> Option<&String> {
+self.default_catalog.as_ref()
+}
+
+/// Get the default namespace to use when a reference in the SELECT is a 
single identifier
+#[inline]
+pub fn default_namespace(&self) -> &NamespaceIdent {
+&self.default_namespace
+}
+
+/// Get the schema of this view version.
+pub fn schema(&self, view_metadata: &ViewMetadata) -> Result<SchemaRef> {
+let r = view_metadata
+.schema_by_id(self.schema_id())
+.ok_or_else(|| {
+Error::new(
+ErrorKind::DataInvalid,
+format!("Schema with id {} not found", self.schema_id()),
+)
+})
+.cloned();
+r
+}
+
+pub(crate) fn log(&self) -> ViewVersionLog {
+ViewVersionLog::new(self.version_id, self.timestamp_ms)
+}
+}
+
+/// A list of view representations.
+pub type ViewRepresentations = Vec<ViewRepresentation>;
+
+/// A builder for [`ViewRepresentations`].
+pub struct ViewRepresentationsBuilder(ViewRepresentations);
+
+impl ViewRepresentationsBuilder {
+/// Create a new builder.
+pub fn new() -> Self {
+Self(Vec::new())
+}
+
+/// Add a representation to the list.
+///
+/// SQL representations dialects must be unique. If a representation with 
the same

Review Comment:
   No, it did not. I fixed it here:
   
https://github.com/c-thiel/iceberg-rust/commit/bf95eb7c5c458e0

Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on code in PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1676802744


##
crates/iceberg/src/spec/view_version.rs:
##
@@ -0,0 +1,347 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+/*!
+ * View Versions!
+*/
+use crate::error::Result;
+use chrono::{DateTime, MappedLocalTime, TimeZone, Utc};
+use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
+use std::sync::Arc;
+use typed_builder::TypedBuilder;
+
+use super::view_metadata::ViewVersionLog;
+use crate::catalog::NamespaceIdent;
+use crate::spec::{SchemaId, SchemaRef, ViewMetadata};
+use crate::{Error, ErrorKind};
+use _serde::ViewVersionV1;
+
+/// Reference to [`ViewVersion`].
+pub type ViewVersionRef = Arc<ViewVersion>;
+
+#[derive(Debug, PartialEq, Eq, Clone, Serialize, Deserialize, TypedBuilder)]
+#[serde(from = "ViewVersionV1", into = "ViewVersionV1")]
+#[builder(field_defaults(setter(prefix = "with_")))]
+/// A view version represents the definition of a view at a specific point in 
time.
+pub struct ViewVersion {
+/// A unique long ID
+version_id: i64,
+/// ID of the schema for the view version
+schema_id: SchemaId,
+/// Timestamp when the version was created (ms from epoch)
+timestamp_ms: i64,
+/// A string to string map of summary metadata about the version
+summary: HashMap<String, String>,
+/// A list of representations for the view definition.
+representations: ViewRepresentations,
+/// Catalog name to use when a reference in the SELECT does not contain a 
catalog
+#[builder(default = None)]
+default_catalog: Option<String>,
+/// Namespace to use when a reference in the SELECT is a single identifier
+default_namespace: NamespaceIdent,
+}
+
+impl ViewVersion {
+/// Get the version id of this view version.
+#[inline]
+pub fn version_id(&self) -> i64 {
+self.version_id
+}
+
+/// Get the schema id of this view version.
+#[inline]
+pub fn schema_id(&self) -> SchemaId {
+self.schema_id
+}
+
+/// Get the timestamp of when the view version was created
+#[inline]
+pub fn timestamp(&self) -> MappedLocalTime<DateTime<Utc>> {
+Utc.timestamp_millis_opt(self.timestamp_ms)
+}
+
+/// Get the timestamp of when the view version was created in milliseconds 
since epoch
+#[inline]
+pub fn timestamp_ms(&self) -> i64 {
+self.timestamp_ms
+}
+
+/// Get summary of the view version
+#[inline]
+pub fn summary(&self) -> &HashMap<String, String> {
+&self.summary
+}
+
+/// Get this view's representations
+#[inline]
+pub fn representations(&self) -> &ViewRepresentations {
+&self.representations
+}
+
+/// Get the default catalog for this view version
+#[inline]
+pub fn default_catalog(&self) -> Option<&String> {
+self.default_catalog.as_ref()
+}
+
+/// Get the default namespace to use when a reference in the SELECT is a 
single identifier
+#[inline]
+pub fn default_namespace(&self) -> &NamespaceIdent {
+&self.default_namespace
+}
+
+/// Get the schema of this view version.
+pub fn schema(&self, view_metadata: &ViewMetadata) -> Result<SchemaRef> {
+let r = view_metadata
+.schema_by_id(self.schema_id())
+.ok_or_else(|| {
+Error::new(
+ErrorKind::DataInvalid,
+format!("Schema with id {} not found", self.schema_id()),
+)
+})
+.cloned();
+r
+}
+
+pub(crate) fn log(&self) -> ViewVersionLog {
+ViewVersionLog::new(self.version_id, self.timestamp_ms)
+}
+}
+
+/// A list of view representations.
+pub type ViewRepresentations = Vec<ViewRepresentation>;
+
+/// A builder for [`ViewRepresentations`].
+pub struct ViewRepresentationsBuilder(ViewRepresentations);
+
+impl ViewRepresentationsBuilder {
+/// Create a new builder.
+pub fn new() -> Self {
+Self(Vec::new())
+}
+
+/// Add a representation to the list.
+///
+/// SQL representations dialects must be unique. If a representation with 
the same
+/// dialect already exists, it will be overwritten.
+pub fn add_or_overwrite_representation(mut self, represen
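   The dialect-uniqueness semantics in the quoted doc comment ("if a 
representation with the same dialect already exists, it will be overwritten") 
can be sketched outside Rust. This is a minimal Python illustration using plain 
dicts, not the PR's actual `ViewRepresentation` type:
   
   ```python
   def add_or_overwrite_representation(representations, new_rep):
       """Append new_rep, replacing any existing entry with the same dialect."""
       for i, rep in enumerate(representations):
           if rep["dialect"] == new_rep["dialect"]:
               representations[i] = new_rep  # dialect already present: overwrite
               return representations
       representations.append(new_rep)
       return representations
   
   reps = [{"dialect": "spark", "sql": "SELECT 1"}]
   reps = add_or_overwrite_representation(reps, {"dialect": "trino", "sql": "SELECT 2"})
   reps = add_or_overwrite_representation(reps, {"dialect": "spark", "sql": "SELECT 3"})
   ```
   
   After the three calls above, the list holds two entries: the overwritten 
Spark representation and the Trino one.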

Re: [PR] View Spec implementation [iceberg-rust]

2024-07-13 Thread via GitHub


c-thiel commented on PR #331:
URL: https://github.com/apache/iceberg-rust/pull/331#issuecomment-2226849477

   @nastra I hope I addressed all your issues regarding functionality. If you 
are happy with the implementations, feel free to close our discussions :).
   @ZENOTME it would now be great to get some feedback on the Rust side of 
things. Please check my discussions with Eduard above. There are a few open 
points regarding naming and also the introduction of types. It would be great 
to get a review from you or some other Rust maintainer for this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Prevent dropping column which is referenced by active partition… [iceberg]

2024-07-13 Thread via GitHub


advancedxy commented on code in PR #10352:
URL: https://github.com/apache/iceberg/pull/10352#discussion_r1676818347


##
core/src/main/java/org/apache/iceberg/SchemaUpdate.java:
##
@@ -533,6 +537,34 @@ private static Schema applyChanges(
   }
 }
 
+Map> specToDeletes = Maps.newHashMap();

Review Comment:
   > One aspect is on ordering of these. There are two ways of going about this:
   
   I don't think the order in which these two pieces of functionality land in 
master matters that much, as long as they are both merged before the next 
release (assuming we are talking about the Iceberg 1.7 release). I imagine 
customers should use officially released versions.
   
   > Here's a trivial case: Imagine a user just creates the table and they 
don't write any data. Then they realize they want to drop a partition column, 
but the procedure will fail unexpectedly whereas before it would work.
   
   If preventing drops of active partition source fields and 
RemoveUnusedPartitionSpec both land, it is still possible to handle the 
just-created table, with some additional but necessary steps:
   1. First remove the unwanted partition field, which will create a new 
PartitionSpec.
   2. Call RemoveUnusedPartitionSpec, which should be able to remove the 
previous, wrongly created partition spec.
   3. Remove the wrongly chosen partition source field.
   
   >  First get in the RemovedUnusedPartitionSpec procedure and then prevent 
dropping if it's part of a spec. The downside of this is, it may take some more 
time in to get the whole API in? 
   
   I think we can parallelize these two PRs if others agree that's the right 
direction. BTW, I can help to work on 
https://github.com/apache/iceberg/pull/3462/files to get it merged in case 
@RussellSpitzer is busy and cannot work on it recently.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core: Prevent dropping column which is referenced by active partition… [iceberg]

2024-07-13 Thread via GitHub


advancedxy commented on PR #10352:
URL: https://github.com/apache/iceberg/pull/10352#issuecomment-2226883163

   > Sorry about the delay on this, got busy and forgot I had this open! I've 
seen more related issue reports to this, so I'm going to prioritize it.
   
   Well understood, and thanks for your effort on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Set boolean values in table properties to lowercase string [iceberg-python]

2024-07-13 Thread via GitHub


soumya-ghosh opened a new pull request, #924:
URL: https://github.com/apache/iceberg-python/pull/924

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] Set properties boolean value to lowercase string [iceberg-python]

2024-07-13 Thread via GitHub


soumya-ghosh commented on issue #919:
URL: https://github.com/apache/iceberg-python/issues/919#issuecomment-2226888124

   @Fokko @kevinjqliu PR https://github.com/apache/iceberg-python/pull/924 is 
ready for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add Snowflake catalog [iceberg-python]

2024-07-13 Thread via GitHub


prabodh1194 commented on PR #687:
URL: https://github.com/apache/iceberg-python/pull/687#issuecomment-2226959566

   @Fokko , considering that Snowflake's REST catalog docs are still not 
released to the public, we can only wait for them. However, I believe this PR 
will continue to be relevant, considering this catalog has been GA for quite 
some time and won't go away any time soon πŸ˜„ .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core:support http lock-manager [iceberg]

2024-07-13 Thread via GitHub


BsoBird commented on code in PR #10688:
URL: https://github.com/apache/iceberg/pull/10688#discussion_r1675607925


##
core/src/main/java/org/apache/iceberg/lock/ServerSideHttpLockManager.java:
##
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.lock;
+
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.nio.charset.StandardCharsets;
+import java.util.Base64;
+import java.util.Map;
+import org.apache.http.HttpResponse;
+import org.apache.http.StatusLine;
+import org.apache.http.client.HttpClient;
+import org.apache.http.client.methods.HttpGet;
+import org.apache.http.client.protocol.RequestAcceptEncoding;
+import org.apache.http.client.protocol.ResponseContentEncoding;
+import org.apache.http.client.utils.URIBuilder;
+import org.apache.http.impl.client.DefaultHttpClient;
+import org.apache.http.params.BasicHttpParams;
+import org.apache.http.params.CoreConnectionPNames;
+import org.apache.http.params.HttpParams;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.util.LockManagers;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The user is required to provide an API interface with distributed locking 
functionality as
+ * agreed.
+ */
+public class ServerSideHttpLockManager extends LockManagers.BaseLockManager {

Review Comment:
   In this implementation, we only use a very simple GET request and do not 
consider anything related to authentication. This is because I don't see the 
need to protect this distributed locking service. After all, it only does 
locking/unlocking. I don't think it's too difficult for users to provide a 
REST API service that meets the requirements.
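   A minimal sketch of the server-side contract the quoted Java client assumes 
(`operator=lock|unlock`, `entityId`, `ownerId`, HTTP 200 on success). The 
non-200 failure codes below are assumptions, since the client only checks for 
200:
   
   ```python
   # In-memory lock table: entityId -> ownerId. A real service would need
   # persistence and fencing; this only illustrates the request semantics.
   locks = {}
   
   def handle_request(operator, entity_id, owner_id):
       """Return an HTTP-like status code for a lock/unlock request."""
       if operator == "lock":
           current = locks.get(entity_id)
           if current is None or current == owner_id:
               locks[entity_id] = owner_id  # acquire (or re-enter) the lock
               return 200
           return 409  # held by another owner (assumed failure code)
       if operator == "unlock":
           if locks.get(entity_id) == owner_id:
               del locks[entity_id]
               return 200
           return 409  # not held by this owner (assumed failure code)
       return 400  # unknown operator (assumed failure code)
   ```
   
   With this contract, a second writer's lock attempt on the same entity fails 
until the first owner unlocks.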



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Core:support http lock-manager [iceberg]

2024-07-13 Thread via GitHub


BsoBird commented on code in PR #10688:
URL: https://github.com/apache/iceberg/pull/10688#discussion_r1676850504


##
core/src/main/java/org/apache/iceberg/lock/ServerSideHttpLockManager.java:
##
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.lock;
+
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.nio.charset.StandardCharsets;
+import java.util.Base64;
+import java.util.Map;
+import org.apache.http.HttpResponse;
+import org.apache.http.StatusLine;
+import org.apache.http.client.HttpClient;
+import org.apache.http.client.methods.HttpGet;
+import org.apache.http.client.protocol.RequestAcceptEncoding;
+import org.apache.http.client.protocol.ResponseContentEncoding;
+import org.apache.http.client.utils.URIBuilder;
+import org.apache.http.impl.client.DefaultHttpClient;
+import org.apache.http.params.CoreConnectionPNames;
+import org.apache.http.util.EntityUtils;
+import org.apache.iceberg.common.DynConstructors;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableMap;
+import org.apache.iceberg.util.LockManagers;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * The user is required to provide an API interface with distributed locking 
functionality as
+ * agreed.
+ */
+public class ServerSideHttpLockManager extends LockManagers.BaseLockManager {
+  private static final Logger LOG = 
LoggerFactory.getLogger(ServerSideHttpLockManager.class);
+  private String httpUrl = null;
+  private static final String OPERATOR = "operator";
+  private static final String LOCK = "lock";
+  private static final String UNLOCK = "unlock";
+  private static final String ENTITY_ID = "entityId";
+  private static final String OWNER_ID = "ownerId";
+  private static final int REQUEST_SUCCESS = 200;
+  public static final String REQUEST_URL = "lock.http.conf.request.url";
+  public static final String REQUEST_AUTH = "lock.http.conf.request.auth.impl";
+  private HttpClient httpClient = null;
+  private HttpAuthentication httpAuthentication = null;

Review Comment:
   Users can add custom auth implementations when initiating tasks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Standardize AWS credential names [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on code in PR #922:
URL: https://github.com/apache/iceberg-python/pull/922#discussion_r1676852897


##
pyiceberg/io/__init__.py:
##
@@ -320,6 +331,13 @@ def _infer_file_io_from_scheme(path: str, properties: 
Properties) -> Optional[Fi
 return None
 
 
+def _get_first_property_value(properties: Properties, property_names: 
Tuple[str, ...]) -> Optional[Any]:

Review Comment:
   Nit: Duplicate function defined here and in `pyiceberg/catalog/__init__.py` 
could be consolidated into one



##
pyiceberg/catalog/__init__.py:
##
@@ -838,6 +855,18 @@ def _get_default_warehouse_location(self, database_name: 
str, table_name: str) -
 
 raise ValueError("No default path is set, please specify a location 
when creating a table")
 
+def _get_first_property_value(self, property_names: Tuple[str, ...]) -> 
Optional[Any]:
+for property_name in property_names:
+if property_value := self.properties.get(property_name):
+if property_name in DEPRECATED_PROPERTY_NAMES:
+deprecated(
+deprecated_in="0.7.0",
+removed_in="0.8.0",
+help_message=f"The property {property_name} is 
deprecated. Please use properties start with aws., glue., and dynamo. instead",

Review Comment:
   ```suggestion
   help_message=f"The property {property_name} is 
deprecated. Please use properties that start with aws., glue., and dynamo. 
instead",
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Build: don't include slf4j-api in bundled JARs [iceberg]

2024-07-13 Thread via GitHub


bryanck commented on PR #10665:
URL: https://github.com/apache/iceberg/pull/10665#issuecomment-2226992307

   Thanks for the contribution @devinrsmith !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Repair manifest action [iceberg]

2024-07-13 Thread via GitHub


szehon-ho commented on PR #10445:
URL: https://github.com/apache/iceberg/pull/10445#issuecomment-2227007559

   yea i think that makes sense, let's do it in a way that we can add more 
functionality later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Repair manifest action [iceberg]

2024-07-13 Thread via GitHub


szehon-ho commented on PR #10445:
URL: https://github.com/apache/iceberg/pull/10445#issuecomment-2227037227

   @danielcweeks @tabmatfournier while we are discussing, do you think it 
makes sense to later integrate my functionality into this Repair action? It's 
been a while, but IIRC it was about fixing the manifest metadata (i.e., file 
sizes and metrics).
   
   Fixing file sizes addresses a bug 
(https://github.com/apache/iceberg/issues/1980) that seemed important at the 
time, but is probably rare.
   
   Re-calculating metrics based on new table configs is a more common case. 
I've seen a lot of users with huge metadata that OOMs during planning, who then 
realize they need to tune the table to reduce the metrics collected, and so far 
there's no mechanism to do this.
   
   Another option for these is an additional flag on RewriteManifests, but I 
think there were some opinions in #2608 that it should be a separate action, 
since rewrite indicates rewrite-as-is.
   
   If we agree, I can add these functions in a follow-up. Maybe it makes sense 
to have a bit array of options, since Repair seems to have many options.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Standardize AWS credential names [iceberg-python]

2024-07-13 Thread via GitHub


Fokko commented on code in PR #922:
URL: https://github.com/apache/iceberg-python/pull/922#discussion_r1676882347


##
pyiceberg/io/__init__.py:
##
@@ -320,6 +331,13 @@ def _infer_file_io_from_scheme(path: str, properties: 
Properties) -> Optional[Fi
 return None
 
 
+def _get_first_property_value(properties: Properties, property_names: 
Tuple[str, ...]) -> Optional[Any]:

Review Comment:
   I find this PR very hard to read with all the constants. I would like to do 
a suggestion, how about changing the signature to:
   ```suggestion
   def _get_first_property_value(properties: Properties, *property_names: str) 
-> Optional[Any]:
   ```
   
   And avoid the additional constant with the grouped values:
   
   ```python
   _get_first_property_value(self.properties, S3_ACCESS_KEY_ID_PROPERTIES)
   ```
   
   Instead, write:
   
   ```python
   _get_first_property_value(self.properties, S3_ACCESS_KEY_ID, 
AWS_ACCESS_KEY_ID)
   ```
   
   The `S3_ACCESS_KEY_ID_PROPERTIES` seemed to be used just once. WDYT?
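   
   The suggested varargs signature is easy to sketch as runnable Python; the 
property strings in the usage lines are illustrative, standing in for the 
module's constants:
   
   ```python
   from typing import Any, Dict, Optional
   
   def _get_first_property_value(properties: Dict[str, str], *property_names: str) -> Optional[Any]:
       """Return the value of the first property name present in properties."""
       for name in property_names:
           if (value := properties.get(name)) is not None:
               return value
       return None
   
   props = {"s3.access-key-id": "abc", "client.region": "us-east-1"}
   value = _get_first_property_value(props, "s3.access-key-id", "aws.access-key-id")
   missing = _get_first_property_value(props, "s3.secret-access-key")
   ```
   
   The earlier-listed names take precedence, which preserves the "new property 
wins over deprecated alias" ordering without a grouped tuple constant.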



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Standardize AWS credential names [iceberg-python]

2024-07-13 Thread via GitHub


Fokko commented on code in PR #922:
URL: https://github.com/apache/iceberg-python/pull/922#discussion_r1676882383


##
pyiceberg/io/__init__.py:
##
@@ -46,6 +48,10 @@
 
 logger = logging.getLogger(__name__)
 
+AWS_REGION = "client.region"

Review Comment:
   Nice, that's a good find πŸ‘ 






Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676880923


##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(
+_pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
+)
 
-console = Console(record=True)
+extra_fields = task_schema.field_names - table_schema.field_names
+missing_fields = table_schema.field_names - task_schema.field_names
+fields_in_both = 
task_schema.field_names.intersection(table_schema.field_names)
+
+from rich.console import Console
+from rich.table import Table as RichTable
+
+console = Console(record=True)
+
+rich_table = RichTable(show_header=True, header_style="bold")
+rich_table.add_column("Field Name")
+rich_table.add_column("Category")
+rich_table.add_column("Table field")
+rich_table.add_column("Dataframe field")
+
+def print_nullability(required: bool) -> str:
+return "required" if required else "optional"
+
+for field_name in fields_in_both:

Review Comment:
   just want to check my understanding: this works for nested fields because 
nested fields are "flattened" by `.field_names` and then fetched by 
`.find_field`.
   
   For example: a `df` schema like
   ```
   task_schema = pa.field(
   "person",
   pa.struct([
   pa.field("name", pa.string(), nullable=True),
   ]),
   nullable=True,
   )
   ```
   
   `task_schema.field_names` will produce `{"person", "person.name"}`. 
   `task_schema.find_field("person")` and 
`task_schema.find_field("person.name")` will fetch the corresponding fields
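
   The flattening behavior described above can be illustrated without pyiceberg. 
This toy sketch uses plain nested dicts, and the `field_names`/`find_field` 
helpers here are hypothetical stand-ins for the real `Schema` methods:

   ```python
   def field_names(schema: dict, prefix: str = "") -> set:
       """Flatten a nested {name: type-or-struct} schema into dotted names."""
       names = set()
       for name, typ in schema.items():
           full = f"{prefix}{name}"
           names.add(full)
           if isinstance(typ, dict):  # struct field: recurse into its children
               names |= field_names(typ, prefix=full + ".")
       return names

   def find_field(schema: dict, dotted: str):
       """Fetch a (possibly nested) field by its dotted name."""
       node = schema
       for part in dotted.split("."):
           node = node[part]
       return node

   person = {"person": {"name": "string"}}
   assert field_names(person) == {"person", "person.name"}
   assert find_field(person, "person.name") == "string"
   ```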



##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(
+_pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
+)
 
-console = Console(record=True)
+extra_fields = task_schema.field_names - table_schema.field_names
+missing_fields = table_schema.field_names - task_schema.field_names
+fields_in_both = 
task_schema.field_names.intersection(table_schema.field_names)
+
+from rich.console import Console
+from rich.table import Table as RichTable
+
+console = Console(record=True)
+
+rich_table = RichTable(show_header=True, header_style="bold")
+rich_table.add_column("Field Name")
+rich_table.add_column("Category")
+rich_table.add_column("Table field")
+rich_table.add_column("Dataframe field")
+
+def print_nullability(required: bool) -> str:
+return "required" if required else "optional"
+
+for field_name in fields_in_both:
+lhs = table_schema.find_field(field_name)
+rhs = task_schema.find_field(field_name)
+# Check nullability
+if lhs.required != rhs.required:

Review Comment:
   to make this a bit more complicated... 
   an optional field can be written to by a required field. the schema 

Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu commented on PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#issuecomment-2227047760

   @pdpark does this solution resolve your use case? 
   
   https://github.com/apache/iceberg-python/pull/829#issuecomment-2218116050





Re: [PR] Set boolean values in table properties to lowercase string [iceberg-python]

2024-07-13 Thread via GitHub


Fokko merged PR #924:
URL: https://github.com/apache/iceberg-python/pull/924





Re: [I] Set properties boolean value to lowercase string [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu commented on issue #919:
URL: https://github.com/apache/iceberg-python/issues/919#issuecomment-2227049015

   Resolved in #924





Re: [I] Create table properties does not support boolean value [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu closed issue #895: Create table properties does not support boolean 
value
URL: https://github.com/apache/iceberg-python/issues/895





Re: [I] Set properties boolean value to lowercase string [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu closed issue #919: Set properties boolean value to lowercase string
URL: https://github.com/apache/iceberg-python/issues/919





Re: [I] Create table properties does not support boolean value [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu commented on issue #895:
URL: https://github.com/apache/iceberg-python/issues/895#issuecomment-2227049076

   Resolved in #924





[I] Calling load_table().scan().to_arrow() emits error on empty "names" field in name mapping [iceberg-python]

2024-07-13 Thread via GitHub


spock-abadai opened a new issue, #925:
URL: https://github.com/apache/iceberg-python/issues/925

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   On one of my iceberg tables, when I load a table and scan it, during the 
parsing of the name mapping in the table properties, pydantic issues the 
following `ValidationError`:
   
   ```
   def parse_mapping_from_json(mapping: str) -> NameMapping:
   >   return NameMapping.model_validate_json(mapping)
   E   pydantic_core._pydantic_core.ValidationError: 1 validation error for 
NameMapping
   E   9.names
   E Value error, At least one mapped name must be provided for the 
field [type=value_error, input_value=[], input_type=list]
   E   For further information visit 
https://errors.pydantic.dev/2.8/v/value_error
   ```
   
   This seems to be a result of the code in `table/name_mapping.py` in the 
method `check_at_least_one`, which (if I understand correctly) checks that all 
fields in the name mapping have at least one `name`. However, if I'm reading 
the [Iceberg spec](https://iceberg.apache.org/spec/#column-projection) 
correctly, it states that:
   
   
![image](https://github.com/user-attachments/assets/849de6b3-ccfd-4845-9d6a-58b81883784e)
   
   I'm not 100% sure what scenario led to this, but I can say that the name 
mapping we have indeed has a field with id 10 that has an empty list of names. 
This field existed at one point in the schema but it seems like it was removed. 
In any case, it doesn't seem like requiring that the list of names contain at 
least one value is in line with the spec (and it seems that situations where 
this isn't the case do happen). 
   
   Note that the said iceberg table was never created, written to or modified 
using `pyiceberg` (only using spark and trino). `pyiceberg` is only used to 
read. 
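
   A spec-tolerant parser would accept entries whose `names` list is empty and 
simply never match them by name. A minimal stdlib-only sketch, which is not the 
real pyiceberg `NameMapping` pydantic model:

   ```python
   import json

   def parse_mapping(mapping_json: str) -> dict:
       """Parse a name-mapping JSON list into a {name: field-id} lookup.

       Entries with an empty `names` list are kept out of the lookup rather
       than rejected, matching the column-projection section of the spec.
       """
       lookup = {}
       for entry in json.loads(mapping_json):
           for name in entry.get("names", []):
               lookup[name] = entry["field-id"]
       return lookup

   mapping = '[{"field-id": 1, "names": ["id"]}, {"field-id": 10, "names": []}]'
   lookup = parse_mapping(mapping)
   assert lookup == {"id": 1}  # field 10 is simply not name-mapped
   ```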
   
   





Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#issuecomment-2227049610

   Thank you for the review comments @kevinjqliu πŸ™ 
   
   > Porting over Fokko's comment from the other PR [#829 
(review)](https://github.com/apache/iceberg-python/pull/829#pullrequestreview-2164061387)
 Do you know if the current change supports "re-aligning" the table?
   
   I think this test from your PR covers that case where the fields are 
re-aligned according to the Iceberg Table spec: 
https://github.com/apache/iceberg-python/pull/921/files#diff-7f3dd1244d08ce27c003cd091da10aa049f7bb0c7d5397acb4ec69767036accdR982





Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676884492


##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(
+_pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
+)
 
-console = Console(record=True)
+extra_fields = task_schema.field_names - table_schema.field_names
+missing_fields = table_schema.field_names - task_schema.field_names
+fields_in_both = 
task_schema.field_names.intersection(table_schema.field_names)
+
+from rich.console import Console
+from rich.table import Table as RichTable
+
+console = Console(record=True)
+
+rich_table = RichTable(show_header=True, header_style="bold")
+rich_table.add_column("Field Name")
+rich_table.add_column("Category")
+rich_table.add_column("Table field")
+rich_table.add_column("Dataframe field")
+
+def print_nullability(required: bool) -> str:
+return "required" if required else "optional"
+
+for field_name in fields_in_both:
+lhs = table_schema.find_field(field_name)
+rhs = task_schema.find_field(field_name)
+# Check nullability
+if lhs.required != rhs.required:

Review Comment:
   Thank you for pointing this out! This might be simpler to deal with than 
type promotion, so let me give this a go
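
   The rule under discussion — a required dataframe field may be written into an 
optional table field, but an optional dataframe field may not be written into a 
required one — can be sketched as a single predicate (illustrative only, not the 
pyiceberg API):

   ```python
   def nullability_compatible(table_required: bool, df_required: bool) -> bool:
       """A required table field needs a required dataframe field;
       an optional table field accepts either nullability."""
       return df_required or not table_required

   # required table field <- optional dataframe field: incompatible
   assert nullability_compatible(table_required=True, df_required=False) is False
   # optional table field <- required dataframe field: fine
   assert nullability_compatible(table_required=False, df_required=True) is True
   ```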






Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676885325


##
pyiceberg/io/pyarrow.py:
##
@@ -1450,14 +1451,17 @@ def field_partner(self, partner_struct: 
Optional[pa.Array], field_id: int, _: st
 except ValueError:
 return None
 
-if isinstance(partner_struct, pa.StructArray):
-return partner_struct.field(name)
-elif isinstance(partner_struct, pa.Table):
-return partner_struct.column(name).combine_chunks()
-elif isinstance(partner_struct, pa.RecordBatch):
-return partner_struct.column(name)
-else:
-raise ValueError(f"Cannot find {name} in expected 
partner_struct type {type(partner_struct)}")
+try:
+if isinstance(partner_struct, pa.StructArray):
+return partner_struct.field(name)
+elif isinstance(partner_struct, pa.Table):
+return partner_struct.column(name).combine_chunks()
+elif isinstance(partner_struct, pa.RecordBatch):
+return partner_struct.column(name)
+else:
+raise ValueError(f"Cannot find {name} in expected 
partner_struct type {type(partner_struct)}")
+except KeyError:

Review Comment:
   Yes, that's right - the `ArrowProjectionVisitor` is responsible for 
detecting that the `field_partner` is `None` and then checking if the table 
field is also optional before filling it in with a null array. This change is 
necessary so that the `ArrowAccessor` doesn't throw an exception if the field 
can't be found in the arrow component, and enables `ArrowProjectionVisitor` to 
make use of a code pathway it wasn't able to make use of before:
   
   
https://github.com/apache/iceberg-python/blob/b11cdb54b1a05cce0ade34af4ce81a94c34b2650/pyiceberg/io/pyarrow.py#L1388-L1395
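
   The accessor-plus-visitor behavior described can be sketched schematically 
with plain dicts standing in for Arrow structs. The `field_partner`/`project` 
names mirror the roles of `ArrowAccessor` and `ArrowProjectionVisitor` but are 
not the real API:

   ```python
   from typing import Optional

   def field_partner(record: dict, name: str) -> Optional[list]:
       """Accessor: return the column if present, None instead of raising."""
       return record.get(name)

   def project(record: dict, schema: list, num_rows: int) -> dict:
       """Projection: backfill missing optional fields with nulls,
       fail only on missing required fields."""
       out = {}
       for name, required in schema:
           col = field_partner(record, name)
           if col is None:
               if required:
                   raise ValueError(f"Missing required field: {name}")
               col = [None] * num_rows  # null array for the optional field
           out[name] = col
       return out

   data = {"id": [1, 2]}
   projected = project(data, [("id", True), ("comment", False)], num_rows=2)
   assert projected == {"id": [1, 2], "comment": [None, None]}
   ```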






Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676885362


##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(
+_pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
+)
 
-console = Console(record=True)
+extra_fields = task_schema.field_names - table_schema.field_names
+missing_fields = table_schema.field_names - task_schema.field_names
+fields_in_both = 
task_schema.field_names.intersection(table_schema.field_names)
+
+from rich.console import Console
+from rich.table import Table as RichTable
+
+console = Console(record=True)
+
+rich_table = RichTable(show_header=True, header_style="bold")
+rich_table.add_column("Field Name")
+rich_table.add_column("Category")
+rich_table.add_column("Table field")
+rich_table.add_column("Dataframe field")
+
+def print_nullability(required: bool) -> str:
+return "required" if required else "optional"
+
+for field_name in fields_in_both:

Review Comment:
   That's consistent with my understanding






Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu commented on PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#issuecomment-2227052294

   My understanding of the "re-aligning" problem is the alignment between 
parquet and iceberg.
   
   similar to what Fokko mentioned here 
   https://github.com/apache/iceberg-python/pull/829#issuecomment-2219874876
   
   
   I think this PR currently only addresses schema compatibility and it is 
still possible to write parquet files with out-of-order schema. 
   The `test_table_write_out_of_order_schema` test shows we are allowing 
out-of-order schema on write, but we want to sort the schema order before 
writing.
   
   We can probably punt this to a subsequent PR as an optimization.





Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


Fokko commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676911616


##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(

Review Comment:
   Nit: This naming is still from when we only used it on the read path; we 
should probably make it more generic. Maybe `provided_schema` and 
`requested_schema`? Open for suggestions!



##
tests/integration/test_add_files.py:
##
@@ -501,14 +501,11 @@ def test_add_files_fails_on_schema_mismatch(spark: 
SparkSession, session_catalog
 )
 
 expected = """Mismatch in fields:
-┏┳━━┳━━┓
-┃┃ Table field  ┃ Dataframe field  ┃
-┑╇━━╇━━┩
-β”‚ βœ… β”‚ 1: foo: optional boolean β”‚ 1: foo: optional boolean β”‚
β”‚ βœ… β”‚ 2: bar: optional string  β”‚ 2: bar: optional string  β”‚
-β”‚ ❌ β”‚ 3: baz: optional int β”‚ 3: baz: optional string  β”‚
-β”‚ βœ… β”‚ 4: qux: optional dateβ”‚ 4: qux: optional dateβ”‚
-β””β”΄β”€β”€β”΄β”€β”€β”˜
+┏┳━━┳━━┳━┓

Review Comment:
   Is it just me, or is the left easier to read? πŸ˜… 



##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2082,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(

Review Comment:
   I'm debating myself if the API is the most extensible here. I think we 
should re-use `_check_schema_compatible(requested_schema: Schema, 
provided_schema: Schema)` instead of reimplementing the logic in Arrow here. 
This nicely splits `pyarrow_to_schema` and `_check_schema_compatible`.



##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2083,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(
+_pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
+)
 
-console = Console(record=True)
+extra_fields = task_schema.field_names - table_schema.field_names
+missing_fields = table_schema.field_names - task_schema.field_names

Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


Fokko commented on PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#issuecomment-2227093971

   Just to expand on top of 
https://github.com/apache/iceberg-python/pull/921#issuecomment-2227052294
   
   This is the situation where you already have two Iceberg schemas with the 
IDs assigned. I think an easier approach to solve this problem is to first 
convert the Arrow schema to an Iceberg schema (similar to the logic with the 
name-mapping), and then compare the two Iceberg schemas.
   
   The process would look like:
   
   - Convert the Arrow schema into Iceberg using name-mapping, similar to 
[here](https://github.com/apache/iceberg-python/blob/3f44dfe711e96beda6aa8622cf5b0baffa6eb0f2/pyiceberg/io/pyarrow.py#L2082-L2086)
   - Compare the two schemas using a `SchemaWithPartnerVisitor` if they are 
compatible. If not, we should generate a pretty error.
   - If they are compatible, we should be able to just push the schema through 
`to_requested_schema` to align/cast the fields.
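
   The three steps above can be sketched as a plain-Python pipeline, with dicts 
of `{name: (field_id, type)}` standing in for Arrow/Iceberg schemas. All 
function names here are illustrative, not the pyiceberg API:

   ```python
   def to_iceberg_schema(arrow_schema: dict, name_mapping: dict) -> dict:
       """Step 1: assign field ids to the Arrow schema via the name mapping."""
       return {name: (name_mapping[name], typ) for name, typ in arrow_schema.items()}

   def check_compatible(provided: dict, requested: dict) -> None:
       """Step 2: fail with a readable error on extra fields or type mismatches."""
       extra = set(provided) - set(requested)
       if extra:
           raise ValueError(f"Extra fields not in the table schema: {sorted(extra)}")
       for name, (_fid, typ) in provided.items():
           if requested[name][1] != typ:
               raise ValueError(f"Type mismatch for {name}: {typ} != {requested[name][1]}")

   def to_requested_schema(provided: dict, requested: dict) -> dict:
       """Step 3: re-align the provided columns to the requested field order."""
       return {name: provided[name] for name in requested if name in provided}

   table = {"id": (1, "long"), "name": (2, "string")}
   df = to_iceberg_schema({"name": "string", "id": "long"}, {"name": 2, "id": 1})
   check_compatible(df, table)
   aligned = to_requested_schema(df, table)
   assert list(aligned) == ["id", "name"]  # columns re-ordered to match the table
   ```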





Re: [I] Calling load_table().scan().to_arrow() emits error on empty "names" field in name mapping [iceberg-python]

2024-07-13 Thread via GitHub


Fokko commented on issue #925:
URL: https://github.com/apache/iceberg-python/issues/925#issuecomment-2227097367

   @spock-abadai Thanks for reporting this. I agree that this seems to be 
incorrect. Are you interested in providing a PR?





Re: [I] Calling load_table().scan().to_arrow() emits error on empty "names" field in name mapping [iceberg-python]

2024-07-13 Thread via GitHub


spock-abadai commented on issue #925:
URL: https://github.com/apache/iceberg-python/issues/925#issuecomment-2227109344

   > @spock-abadai Thanks for reporting this. I agree that this seems to be 
incorrect. Are you interested in providing a PR?
   
   Sure, see #927 for review. 





Re: [PR] Concurrent table scans [iceberg-rust]

2024-07-13 Thread via GitHub


sdd commented on code in PR #373:
URL: https://github.com/apache/iceberg-rust/pull/373#discussion_r1676960798


##
crates/iceberg/src/scan.rs:
##
@@ -389,12 +333,158 @@ impl FileScanStreamContext {
 file_io,
 bound_filter,
 case_sensitive,
+sender,
+manifest_evaluator_cache: ManifestEvaluatorCache::new(),
+partition_filter_cache: PartitionFilterCache::new(),
+expression_evaluator_cache: 
Arc::new(Mutex::new(ExpressionEvaluatorCache::new())),
 })
 }
 
-/// Returns a reference to the [`BoundPredicate`] filter.
-fn bound_filter(&self) -> Option<&BoundPredicate> {
-self.bound_filter.as_ref()
+async fn run(&mut self, manifest_list: ManifestList) -> Result<()> {
+let file_io = self.file_io.clone();
+let sender = self.sender.clone();
+
+// This whole Vec-and-for-loop approach feels sub-optimally structured.
+// I've tried structuring this in multiple ways but run into
+// issues with ownership. Ideally I'd like to structure this
+// with a functional programming approach: extracting
+// sections 1, 2, and 3 out into different methods on Self,
+// and then use some iterator combinators to orchestrate it all.
+// Section 1 is pretty trivially refactorable into a static method
+// that can be used in a closure that can be used with 
Iterator::filter.
+// Similarly, section 3 seems easily factor-able into a method that can
+// be passed into Iterator::map.
+// Section 2 turns out trickier - we want to exit the entire `run` 
method early
+// if the eval fails, and filter out any manifest_files from the 
iterator / stream
+// if the eval succeeds but returns true. We bump into ownership 
issues due
+// to needing to pass mut self as the caches need to be able to mutate.
+// Section 3 runs into ownership issues when trying to refactor its 
closure to be
+// a static or non-static method.
+
+// 1
+let filtered_manifest_files = manifest_list
+.entries()
+.iter()
+.filter(Self::reject_unsupported_content_types);
+
+// 2
+let mut filtered_manifest_files2 = vec![];
+for manifest_file in filtered_manifest_files {
+if !self.apply_evaluator(manifest_file)? {
+continue;
+}
+
+filtered_manifest_files2.push(manifest_file);
+}
+
+// 3
+let filtered_manifest_files = 
filtered_manifest_files2.into_iter().map(|manifest_file| {
+Ok(ManifestFileProcessingContext {
+manifest_file,
+file_io: file_io.clone(),
+sender: sender.clone(),
+filter: &self.bound_filter,
+expression_evaluator_cache: 
self.expression_evaluator_cache.clone(),
+})
+});
+
+futures::stream::iter(filtered_manifest_files)
+.try_for_each_concurrent(

Review Comment:
   Aah. You're right. `try_for_each_concurrent` is concurrent but not parallel. 
What I have should still be an improvement on what we have already, as we'll at 
least be able to have multiple fileIO file requests in-flight at the same time, 
but this is not fully parallelised.
   
   I'll work on making this parallel as well as concurrent. Some initial 
experimentation shows that it is tricky to please the borrow checker, but I'll 
keep trying.
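   As an aside, the concurrent-but-not-parallel behaviour described above is easy to demonstrate outside Rust. A minimal Python sketch (an analogy only, not iceberg-rust code): `asyncio` keeps many awaits in flight at once, yet every task runs on the same thread, much like `try_for_each_concurrent` polling futures on a single thread:

```python
import asyncio
import threading

async def fetch(i, seen_threads):
    # Stand-in for one manifest-file FileIO request in flight.
    seen_threads.add(threading.get_ident())
    await asyncio.sleep(0.01)
    return i * 2

async def main():
    seen_threads = set()
    # Many requests in flight at once, like try_for_each_concurrent...
    results = await asyncio.gather(*(fetch(i, seen_threads) for i in range(8)))
    return results, seen_threads

results, seen_threads = asyncio.run(main())
# ...but every task ran on the same thread: concurrency without parallelism.
print(results, len(seen_threads))
```

   Making it parallel as well would mean handing work to multiple threads or tasks scheduled on a multi-threaded runtime, which is where the borrow-checker friction mentioned above comes in.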



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676966897


##
tests/integration/test_add_files.py:
##
@@ -501,14 +501,11 @@ def test_add_files_fails_on_schema_mismatch(spark: 
SparkSession, session_catalog
 )
 
 expected = """Mismatch in fields:
-┏┳━━┳━━┓
-┃┃ Table field  ┃ Dataframe field  ┃
-┑╇━━╇━━┩
-β”‚ βœ… β”‚ 1: foo: optional boolean β”‚ 1: foo: optional boolean β”‚
-| βœ… β”‚ 2: bar: optional string  β”‚ 2: bar: optional string  β”‚
-β”‚ ❌ β”‚ 3: baz: optional int β”‚ 3: baz: optional string  β”‚
-β”‚ βœ… β”‚ 4: qux: optional dateβ”‚ 4: qux: optional dateβ”‚
-β””β”΄β”€β”€β”΄β”€β”€β”˜
+┏┳━━┳━━┳━┓

Review Comment:
   I opted for this approach because I wanted to also group the extra fields in the dataframe into the table output. But if we take the approach of using the `name_mapping` to generate the Iceberg Schema with consistent IDs after first checking that there are no extra fields, I think we can go back to the old way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

2024-07-13 Thread via GitHub


syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676967257


##
pyiceberg/io/pyarrow.py:
##
@@ -2079,36 +2083,63 @@ def _check_schema_compatible(table_schema: Schema, 
other_schema: pa.Schema, down
 Raises:
 ValueError: If the schemas are not compatible.
 """
-name_mapping = table_schema.name_mapping
-try:
-task_schema = pyarrow_to_schema(
-other_schema, name_mapping=name_mapping, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us
-)
-except ValueError as e:
-other_schema = _pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
-additional_names = set(other_schema.column_names) - 
set(table_schema.column_names)
-raise ValueError(
-f"PyArrow table contains more columns: {', 
'.join(sorted(additional_names))}. Update the schema first (hint, use 
union_by_name)."
-) from e
-
-if table_schema.as_struct() != task_schema.as_struct():
-from rich.console import Console
-from rich.table import Table as RichTable
+task_schema = assign_fresh_schema_ids(
+_pyarrow_to_schema_without_ids(other_schema, 
downcast_ns_timestamp_to_us=downcast_ns_timestamp_to_us)
+)
 
-console = Console(record=True)
+extra_fields = task_schema.field_names - table_schema.field_names
+missing_fields = table_schema.field_names - task_schema.field_names
+fields_in_both = 
task_schema.field_names.intersection(table_schema.field_names)
+
+from rich.console import Console
+from rich.table import Table as RichTable
+
+console = Console(record=True)
+
+rich_table = RichTable(show_header=True, header_style="bold")
+rich_table.add_column("Field Name")
+rich_table.add_column("Category")
+rich_table.add_column("Table field")
+rich_table.add_column("Dataframe field")
+
+def print_nullability(required: bool) -> str:
+return "required" if required else "optional"
+
+for field_name in fields_in_both:
+lhs = table_schema.find_field(field_name)
+rhs = task_schema.find_field(field_name)
+# Check nullability
+if lhs.required != rhs.required:
+rich_table.add_row(
+field_name,
+"Nullability",
+f"{print_nullability(lhs.required)} {str(lhs.field_type)}",
+f"{print_nullability(rhs.required)} {str(rhs.field_type)}",
+)
+# Check if type is consistent
+if any(
+(isinstance(lhs.field_type, container_type) and 
isinstance(rhs.field_type, container_type))
+for container_type in {StructType, MapType, ListType}

Review Comment:
   Hmmm, isn't the nullability check for parents and child nodes covered from L2111 to L2117?
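   For reference, the set arithmetic the diff above relies on can be sketched in isolation (the field names below are made up for illustration):

```python
# Hypothetical stand-ins for table_schema.field_names and task_schema.field_names.
table_fields = {"foo", "bar", "baz", "qux"}
frame_fields = {"foo", "bar", "baz", "extra"}

extra_fields = frame_fields - table_fields    # only in the dataframe
missing_fields = table_fields - frame_fields  # only in the table
fields_in_both = frame_fields & table_fields  # compared name-by-name for type/nullability

print(sorted(extra_fields), sorted(missing_fields), sorted(fields_in_both))
```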



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Allow empty `names` in mapped field of Name Mapping [iceberg-python]

2024-07-13 Thread via GitHub


kevinjqliu commented on code in PR #927:
URL: https://github.com/apache/iceberg-python/pull/927#discussion_r1676974835


##
tests/table/test_name_mapping.py:
##
@@ -247,11 +247,6 @@ def test_mapping_lookup_by_name(table_name_mapping_nested: 
NameMapping) -> None:
 table_name_mapping_nested.find("boom")
 
 
-def test_invalid_mapped_field() -> None:

Review Comment:
   nit: can we add a test to check for `permits an empty list of names in the 
default name mapping`
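   A sketch of what such a test could look like, using a plain dataclass as a stand-in for pyiceberg's `MappedField` (the real model and its validators live in `pyiceberg/table/name_mapping.py`; this only illustrates the behaviour being requested):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MappedField:
    # Stand-in model: names may now be an empty list, field_id stays required.
    field_id: int
    names: List[str] = field(default_factory=list)

def test_mapped_field_with_empty_names() -> None:
    mapped = MappedField(field_id=1, names=[])
    assert mapped.names == []
    assert mapped.field_id == 1

test_mapped_field_with_empty_names()
```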



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Core: Kryo serialization error with DataFile after copying [iceberg]

2024-07-13 Thread via GitHub


fengjiajie opened a new pull request, #10695:
URL: https://github.com/apache/iceberg/pull/10695

   In my use case with Iceberg 1.3, I have a Flink `function-1` that outputs a `DataStream<DataFile>`, which is then processed by the next function. The simplified code for `function-1` is as follows:
   
   ```java
   // Inside function-1:
   
   Map<Integer, Long> columnSizes = new HashMap<>();
   columnSizes.put(1, 234L);
   DataFile dataFile = DataFiles.builder(icebergTable.spec())
   .withMetrics(new Metrics(123L, columnSizes, ...))
   ...
   .build();
   
   // Move file to new path, then rebuild DataFile
   DataFile newDataFile = DataFiles.builder(icebergTable.spec())
   .copy(dataFile)
   .withPath("file:///new_path")
   .build();
   ```
   
   If I return `dataFile`, Flink's Kryo framework can deserialize it correctly 
in the next function. However, if I return `newDataFile` (reconstructed with 
`copy`), Kryo fails with the following exception:
   
   ```
   Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by 
NoRestartBackoffTimeStrategy
at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:176)
 ...
at org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253)
... 4 more
   Caused by: com.esotericsoftware.kryo.KryoException: 
java.lang.UnsupportedOperationException
   Serialization trace:
   columnSizes (org.apache.iceberg.GenericDataFile)
at 
com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
...
   Caused by: java.lang.UnsupportedOperationException
at java.util.Collections$UnmodifiableMap.put(Collections.java:1459)
...
   ```
   
   This issue arises in Iceberg 1.15 but not in 1.13. The root cause lies in 
the `toReadableMap` method of `org.apache.iceberg.BaseFile`:
   
   ```java
   // Iceberg 1.13:
 private static <K, V> Map<K, V> toReadableMap(Map<K, V> map) {
   return map instanceof SerializableMap ? ((SerializableMap<K, V>) map).immutableMap() : map;
 }
   
   // Iceberg 1.15:
 private static <K, V> Map<K, V> toReadableMap(Map<K, V> map) {
   if (map == null) {
 return null;
   } else if (map instanceof SerializableMap) {
 return ((SerializableMap<K, V>) map).immutableMap();
   } else {
 return Collections.unmodifiableMap(map);
   }
 }
   ```
   
   In Iceberg 1.15, `toReadableMap` wraps the map with 
`Collections.unmodifiableMap`, resulting in an `UnsupportedOperationException` 
during deserialization. While using `unmodifiableMap` seems correct, the `copy` 
operation might need to reconstruct these maps as regular mutable maps to avoid 
this issue. 
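   The failure can be reproduced in miniature: Kryo's default map deserializer instantiates the target map and then calls `put` on it, which an unmodifiable view rejects. A hedged Python analogue (using `MappingProxyType` in place of `Collections.unmodifiableMap`) shows the same contract, and that copying into a plain mutable map, as suggested for `copy`, avoids it:

```python
from types import MappingProxyType

column_sizes = {1: 234}
readable = MappingProxyType(column_sizes)  # read-only view, like Collections.unmodifiableMap

try:
    readable[2] = 99  # a deserializer "putting" into the view
    mutated = True
except TypeError:     # Python's analogue of UnsupportedOperationException
    mutated = False

rebuilt = dict(readable)  # rebuild as a regular mutable map, as the fix proposes
rebuilt[2] = 99
print(mutated, rebuilt)
```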
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Build: Bump net.snowflake:snowflake-jdbc from 3.16.1 to 3.17.0 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] opened a new pull request, #10696:
URL: https://github.com/apache/iceberg/pull/10696

   Bumps 
[net.snowflake:snowflake-jdbc](https://github.com/snowflakedb/snowflake-jdbc) 
from 3.16.1 to 3.17.0.
   
   Release notes
   Sourced from https://github.com/snowflakedb/snowflake-jdbc/releases";>net.snowflake:snowflake-jdbc's
 releases.
   
   v3.17.0
   
   Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   
   
   
   Changelog
   Sourced from https://github.com/snowflakedb/snowflake-jdbc/blob/master/CHANGELOG.rst";>net.snowflake:snowflake-jdbc's
 changelog.
   
   JDBC Driver 3.17.0
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.16.1
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.16.0
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.15.1
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.15.0
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.14.5
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.14.4
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.14.3
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.14.2
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.14.1
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.14.0
   
   ||Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.13.33
   
   || Please Refer to Release Notes at https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc";>https://docs.snowflake.com/en/release-notes/clients-drivers/jdbc
   
   JDBC Driver 3.13.32
   
   
   ... (truncated)
   
   
   Commits
   
   https://github.com/snowflakedb/snowflake-jdbc/commit/356852c65c31c92f31a875b2f185095564a227b9";>356852c
 Bump version to 3.17.0 for release (https://redirect.github.com/snowflakedb/snowflake-jdbc/issues/1812";>#1812)
   https://github.com/snowflakedb/snowflake-jdbc/commit/f6548d570dfe5b9736c309c319ccb054a92e05c4";>f6548d5
 SNOW-1369651: Do not fail file pattern expanding on file not found in 
differe...
   https://github.com/snowflakedb/snowflake-jdbc/commit/5f840231e128945105d8e1c02df3b7e20947080c";>5f84023
 SNOW-1514498 - support for host when use file configuration (https://redirect.github.com/snowflakedb/snowflake-jdbc/issues/1809";>#1809)
   https://github.com/snowflakedb/snowflake-jdbc/commit/997002b86897944fbcf529c6b6a4a0a71eaeb078";>997002b
 SNOW-731500: Add JDBC Connectivity Diagnostics mode (https://redirect.github.com/snowflakedb/snowflake-jdbc/issues/1789";>#1789)
   https://github.com/snowflakedb/snowflake-jdbc/commit/8e916ae62af6fae3c6bd7fabdca609129fbbe18a";>8e916ae
 SNOW-1465374: Return consistent timestamps_ltz between JSON and ARROW result 
...
   https://github.com/snowflakedb/snowflake-jdbc/commit/3fee6f9240599049c0d17960623b79f3bd10";>3fee6f9
 SNOW-1196082: FIx inserting and reading timestamps not symetric if too much 
c...
   https://github.com/snowflakedb/snowflake-jdbc/commit/f39eff42290bfda27edb46d2cef0181f12ee52c3";>f39eff4
 SNOW-957747: Easy logging improvements (https://redirect.github.com/snowflakedb/snowflake-jdbc/issues/1730";>#1730)
   https://github.com/snowflakedb/snowflake-jdbc/commit/d4504f8b7780a2504712b21569bb29ccaf755073";>d4504f8
 SNOW-1163203: Increased Max LOB size in metadata (https://redirect.github.com/snowflakedb/snowflake-jdbc/issues/1806";>#1806)
   https://github.com/snowflakedb/snowflake-jdbc/commit/a4db3096c3282eb0c8aa7b86229d063a4

[PR] Build: Bump nessie from 0.92.0 to 0.92.1 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] opened a new pull request, #10697:
URL: https://github.com/apache/iceberg/pull/10697

   Bumps `nessie` from 0.92.0 to 0.92.1.
   Updates `org.projectnessie.nessie:nessie-client` from 0.92.0 to 0.92.1
   
   Updates `org.projectnessie.nessie:nessie-jaxrs-testextension` from 0.92.0 to 
0.92.1
   
   Updates `org.projectnessie.nessie:nessie-versioned-storage-inmemory-tests` 
from 0.92.0 to 0.92.1
   
   Updates `org.projectnessie.nessie:nessie-versioned-storage-testextension` 
from 0.92.0 to 0.92.1
   
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Build: Bump org.assertj:assertj-core from 3.26.0 to 3.26.3 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] opened a new pull request, #10698:
URL: https://github.com/apache/iceberg/pull/10698

   Bumps [org.assertj:assertj-core](https://github.com/assertj/assertj) from 
3.26.0 to 3.26.3.
   
   Release notes
   Sourced from https://github.com/assertj/assertj/releases";>org.assertj:assertj-core's 
releases.
   
   v3.26.3
   :jigsaw: Binary Compatibility
   The release is:
   
   Binary compatible with the previous minor version.
   Binary incompatible with the previous patch version.
   
   :boom: Breaking Changes
   Core
   
   Replace assertThat(Temporal) with 
assertThatTemporal(Temporal) https://redirect.github.com/assertj/assertj/issues/3519";>#3519
   
   :bug: Bug Fixes
   Core
   
   Fix Javadoc rendering on 
FactoryBasedNavigableListAssert::assertThat
   Allow ComparingNormalizedFields instances to be reused 
across different assertions https://redirect.github.com/assertj/assertj/issues/3493";>#3493
   
   :hammer: Dependency Upgrades
   Core
   
   Upgrade to Byte Buddy 1.14.18 https://redirect.github.com/assertj/assertj/issues/3531";>#3531
   Upgrade to JUnit BOM 5.10.3 https://redirect.github.com/assertj/assertj/issues/3525";>#3525
   
   Guava
   
   Upgrade to Guava 33.2.1-jre https://redirect.github.com/assertj/assertj/issues/3499";>#3499
   
   :heart: Contributors
   Thanks to all the contributors who worked on this release:
   https://github.com/genuss";>@​genuss
   
   
   
   Commits
   
   https://github.com/assertj/assertj/commit/8e97f90d62782a5fe2e49f739164361ecb54738b";>8e97f90
 [maven-release-plugin] prepare release assertj-build-3.26.3
   https://github.com/assertj/assertj/commit/d1afefca4e99fe6476e04d61d670fe24c25b7f4d";>d1afefc
 chore(deps): bump com.github.spotbugs:spotbugs-maven-plugin from 4.8.6.1 to 
4...
   https://github.com/assertj/assertj/commit/2dc2cbfa4070d2bbeb1a748c0833c277047fd93d";>2dc2cbf
 chore(deps): bump byte-buddy.version from 1.14.17 to 1.14.18 (https://redirect.github.com/assertj/assertj/issues/3531";>#3531)
   https://github.com/assertj/assertj/commit/2541d3c1722a923c9a1cf3b33255e3fc8ee07135";>2541d3c
 chore(deps-dev): bump com.fasterxml.jackson.core:jackson-databind from 
2.17.1...
   https://github.com/assertj/assertj/commit/cdb906fb0d9ee0dfddb2925f15a46a5b7b16c5da";>cdb906f
 [maven-release-plugin] prepare for next development iteration
   https://github.com/assertj/assertj/commit/c3b1f4a5eef1284e484e43374e9d5ba8ba33707d";>c3b1f4a
 [maven-release-plugin] prepare release assertj-build-3.26.2
   https://github.com/assertj/assertj/commit/d5b52abe63cca2b84b6abc38dd9b15a0e63f9b9d";>d5b52ab
 [maven-release-plugin] prepare for next development iteration
   https://github.com/assertj/assertj/commit/17ea711f63eea0c3081be2e36f1d089edaba1f07";>17ea711
 [maven-release-plugin] prepare release assertj-build-3.26.1
   https://github.com/assertj/assertj/commit/8cf054db3230522155c5b637e332e485622c8b82";>8cf054d
 chore(deps): bump org.codehaus.mojo:versions-maven-plugin from 2.16.2 to 
2.17...
   https://github.com/assertj/assertj/commit/5e708b48ef69f4d8c35ec86690ba6bc491cb579a";>5e708b4
 chore(deps-dev): bump org.apache.groovy:groovy from 4.0.21 to 4.0.22 (https://redirect.github.com/assertj/assertj/issues/3527";>#3527)
   Additional commits viewable in https://github.com/assertj/assertj/compare/assertj-build-3.26.0...assertj-build-3.26.3";>compare
 view
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.assertj:assertj-core&package-manager=gradle&previous-version=3.26.0&new-version=3.26.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)

[PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.43.0 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] opened a new pull request, #10699:
URL: https://github.com/apache/iceberg/pull/10699

   Bumps 
[com.google.cloud:libraries-bom](https://github.com/googleapis/java-cloud-bom) 
from 26.28.0 to 26.43.0.
   
   Release notes
   Sourced from https://github.com/googleapis/java-cloud-bom/releases";>com.google.cloud:libraries-bom's
 releases.
   
   v26.43.0
   GCP Libraries BOM 26.43.0
   Here are the differences from the previous version (26.42.0)
   New Addition
   
   com.google.cloud:google-cloud-gdchardwaremanagement:0.1.0
   
   The group ID of the following artifacts is 
com.google.cloud.
   Notable Changes
   google-cloud-bigquery 2.41.0 (prev: 2.40.3)
   
   
   Add columnNameCharacterMap to LoadJobConfiguration (https://redirect.github.com/googleapis/java-bigquery/issues/3356";>#3356)
 (https://github.com/googleapis/java-bigquery/commit/2f3cbe39619bcc93cb7d504417accd84b418dd41";>2f3cbe3)
   
   
   Add MetadataCacheMode to ExternalTableDefinition (https://redirect.github.com/googleapis/java-bigquery/issues/3351";>#3351)
 (https://github.com/googleapis/java-bigquery/commit/2814dc49dfdd5671257b6a9933a5dd381d889dd1";>2814dc4)
   
   
   Add clustering value to ListTables result (https://redirect.github.com/googleapis/java-bigquery/issues/3359";>#3359)
 (https://github.com/googleapis/java-bigquery/commit/5d52bc9f4ef93f84200335685901c6ac0256b769";>5d52bc9)
   
   
   google-cloud-bigtable 2.40.0 (prev: 2.39.5)
   
   
   Add String type with Utf8Raw encoding to Bigtable API (https://redirect.github.com/googleapis/java-bigtable/issues/2191";>#2191)
 (https://github.com/googleapis/java-bigtable/commit/e7f03fc7d252a7ff6c76a8e6e0a9e6ad3dcbd9d5";>e7f03fc)
   
   
   Add getServiceName() to EnhancedBigTableStubSettings (https://redirect.github.com/googleapis/java-bigtable/issues/2256";>#2256)
 (https://github.com/googleapis/java-bigtable/commit/da703db25f6702b263dbd8ded0cb0fd3422efe31";>da703db)
   
   
   Remove grpclb (https://redirect.github.com/googleapis/java-bigtable/issues/2033";>#2033)
 (https://github.com/googleapis/java-bigtable/commit/735537571a147bfdd2a986664ff7905c8f5dc3db";>7355375)
   
   
   google-cloud-firestore 3.22.0 (prev: 3.21.4)
   
   Add bulk delete api (https://redirect.github.com/googleapis/java-firestore/issues/1704";>#1704)
 (https://github.com/googleapis/java-firestore/commit/5ef625456afbee781951d2a5c6a9c2548feea92e";>5ef6254)
   
   google-cloud-logging 3.19.0 (prev: 3.18.0)
   
   logging: OpenTelemetry trace/span ID integration for 
Java logging library (https://redirect.github.com/googleapis/java-logging/issues/1596";>#1596)
 (https://github.com/googleapis/java-logging/commit/67db829621fd1c4a876d158fe1afb4927821fa54";>67db829)
   
   google-cloud-pubsub 1.131.0 (prev: 1.130.0)
   
   Add use_topic_schema for Cloud Storage Subscriptions (https://redirect.github.com/googleapis/java-pubsub/issues/2082";>#2082)
 (https://github.com/googleapis/java-pubsub/commit/11d67d44152ccca008dda071683d9932c59af41d";>11d67d4)
   
   google-cloud-spanner 6.71.0 (prev: 6.69.0)
   
   
   Add field order_by in spanner.proto (https://redirect.github.com/googleapis/java-spanner/issues/3064";>#3064)
 (https://github.com/googleapis/java-spanner/commit/52ee1967ee3a37fb0482ad8b51c6e77e28b79844";>52ee196)
   
   
   Do not end transaction span when rolling back to savepoint (https://redirect.github.com/googleapis/java-spanner/issues/3167";>#3167)
 (https://github.com/googleapis/java-spanner/commit/8ec0cf2032dece545c9e4d8a794b80d06550b710";>8ec0cf2)
   
   
   Remove unused DmlBatch span (https://redirect.github.com/googleapis/java-spanner/issues/3147";>#3147)
 (https://github.com/googleapis/java-spanner/commit/f7891c1ca42727c775cdbe91bff8d55191a3d799";>f7891c1)
   
   
   google-cloud-spanner-jdbc 2.20.1 (prev: 2.19.3)
   
   Add OpenTelemetry tracing (https://redirect.github.com/googleapis/java-spanner-jdbc/issues/1568";>#1568)
 (https://github.com/googleapis/java-spanner-jdbc/commit/1485a04272c270851468254bebffe4f7d846f17c";>1485a04)
   
   Other libraries
   
   [aiplatform] add enum value MALFORMED_FUNCTION_CALL to 
.google.cloud.aiplatform.v1beta1.content.Candidate.FinishReason 
(https://github.com/googleapis/google-cloud-java/commit/666a3ac8cd0cb45da3555a1bef8dfe3edf57d257";>666a3ac)
   [aiplatform] add MALFORMED_FUNCTION_CALL to FinishReason (https://github.com/googleapis/google-cloud-java/commit/666a3ac8cd0cb45da3555a1bef8dfe3edf57d257";>666a3ac)
   [batch] add a install_ops_agent field to InstancePolicyOrTemplate for 
Ops Agent support (https://github.com/googleapis/google-cloud-java/commit/564c3692bc8d22f4f5acdcbfb60be0582bd08d53";>564c369)
   [container] A new message HugepagesConfig is added (https://github.com/googleapis/google-cloud-java/commit/4be1db5cfca842d85e167b8ed033ebcbcbd758dd";>4be1db5)
   [container] A new method_signature parent is added to 
method ListOperations in service ClusterManager (https://github.com/googleapis/google-cloud-java/commit/4be1db5cfca842d85e167

Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.42.0 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] commented on PR #10556:
URL: https://github.com/apache/iceberg/pull/10556#issuecomment-2227195410

   Superseded by #10699.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Build: Bump com.google.cloud:libraries-bom from 26.28.0 to 26.42.0 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] closed pull request #10556: Build: Bump 
com.google.cloud:libraries-bom from 26.28.0 to 26.42.0
URL: https://github.com/apache/iceberg/pull/10556


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] Build: Bump software.amazon.awssdk:bom from 2.26.16 to 2.26.20 [iceberg]

2024-07-13 Thread via GitHub


dependabot[bot] opened a new pull request, #10700:
URL: https://github.com/apache/iceberg/pull/10700

   Bumps software.amazon.awssdk:bom from 2.26.16 to 2.26.20.
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=software.amazon.awssdk:bom&package-manager=gradle&previous-version=2.26.16&new-version=2.26.20)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Allow empty `names` in mapped field of Name Mapping [iceberg-python]

2024-07-13 Thread via GitHub


spock-abadai commented on code in PR #927:
URL: https://github.com/apache/iceberg-python/pull/927#discussion_r1677034401


##
tests/table/test_name_mapping.py:
##
@@ -247,11 +247,6 @@ def test_mapping_lookup_by_name(table_name_mapping_nested: 
NameMapping) -> None:
 table_name_mapping_nested.find("boom")
 
 
-def test_invalid_mapped_field() -> None:

Review Comment:
   Done. 






Re: [PR] Allow empty `names` in mapped field of Name Mapping [iceberg-python]

2024-07-13 Thread via GitHub


spock-abadai commented on PR #927:
URL: https://github.com/apache/iceberg-python/pull/927#issuecomment-2227208143

   > Thanks for raising this PR @spock-abadai
   > 
   > I think we should update line 40 as well:
   > 
   > 
https://github.com/apache/iceberg-python/blob/3f44dfe711e96beda6aa8622cf5b0baffa6eb0f2/pyiceberg/table/name_mapping.py#L40
   
   Good catch, fixed. 





Re: [PR] Allow empty `names` in mapped field of Name Mapping [iceberg-python]

2024-07-13 Thread via GitHub


Fokko commented on PR #927:
URL: https://github.com/apache/iceberg-python/pull/927#issuecomment-2227217603

   @spock-abadai Thanks again for working on this. I think we're almost there. 
Can you run `make lint` to fix the formatting issue? Thanks!

