date:20230908

[GitHub] [iceberg-rust] JanKaul commented on issue #52: No builder for TableMetadata and no public field

2023-09-08 Thread via GitHub



JanKaul commented on issue #52:
URL: https://github.com/apache/iceberg-rust/issues/52#issuecomment-1711180269

   I would also be in favor of using the builder pattern for the pub structs.
   
   If I'm correct, all pub structs except for TableMetadata already have a 
builder. With the `derive_builder` crate it should be quite easy to implement 
the builder for TableMetadata.
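
To make the builder idea concrete, here is a minimal hand-written sketch of the kind of API a derive macro could produce. It uses only the standard library rather than `derive_builder` itself, and `TableMetadataSketch`, its two fields, and the builder methods are hypothetical names chosen for illustration, not the actual iceberg-rust types.

```rust
/// Hypothetical, simplified stand-in for a metadata struct
/// (not the real iceberg-rust `TableMetadata`).
#[derive(Debug, Clone, PartialEq)]
pub struct TableMetadataSketch {
    location: String,
    last_column_id: i32,
}

impl TableMetadataSketch {
    /// Read-only accessors replace public fields.
    pub fn location(&self) -> &str {
        &self.location
    }

    pub fn last_column_id(&self) -> i32 {
        self.last_column_id
    }
}

/// Hand-written builder; every field starts unset.
#[derive(Debug, Default)]
pub struct TableMetadataSketchBuilder {
    location: Option<String>,
    last_column_id: Option<i32>,
}

impl TableMetadataSketchBuilder {
    pub fn location(mut self, location: impl Into<String>) -> Self {
        self.location = Some(location.into());
        self
    }

    pub fn last_column_id(mut self, id: i32) -> Self {
        self.last_column_id = Some(id);
        self
    }

    /// Fails with a message naming the first missing required field.
    pub fn build(self) -> Result<TableMetadataSketch, String> {
        Ok(TableMetadataSketch {
            location: self.location.ok_or("location is required")?,
            last_column_id: self.last_column_id.ok_or("last_column_id is required")?,
        })
    }
}

fn main() {
    let metadata = TableMetadataSketchBuilder::default()
        .location("s3://bucket/table")
        .last_column_id(3)
        .build()
        .expect("all required fields are set");
    println!("{} {}", metadata.location(), metadata.last_column_id());
}
```

Keeping the fields private and exposing only accessors plus a builder lets the struct gain fields later without breaking callers, which is the main argument for the pattern in this thread.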


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [iceberg] Fokko commented on pull request #8521: Python: Non-Cython fallback Avro parser

2023-09-08 Thread via GitHub



Fokko commented on PR #8521:
URL: https://github.com/apache/iceberg/pull/8521#issuecomment-1711190176

   @rustyconover Yes I agree. It looks like it is pulling the wheel correctly 
but it is missing the `decoder_fast` module. Maybe still good to just add this 
fallback anyway.



[GitHub] [iceberg] GoGoWen opened a new issue, #8527: Why Iceberg do not support column with default value?

2023-09-08 Thread via GitHub



GoGoWen opened a new issue, #8527:
URL: https://github.com/apache/iceberg/issues/8527

   ### Query engine
   
   Why does Iceberg not support columns with a default value, like MySQL's 
"k1 INT DEFAULT '1'"?
   
   ### Question
   
   Why does Iceberg not support columns with a default value, like MySQL's 
"k1 INT DEFAULT '1'"?



[GitHub] [iceberg] Fokko commented on issue #8527: Why Iceberg do not support column with default value?

2023-09-08 Thread via GitHub



Fokko commented on issue #8527:
URL: https://github.com/apache/iceberg/issues/8527#issuecomment-1711211295

   This is actually in the works: 
https://iceberg.apache.org/spec/#default-values This will be part of Spec 
version 3 that's being finalized.



[GitHub] [iceberg] getAlexRibeiro closed issue #7537: Error reading version hint file

2023-09-08 Thread via GitHub



getAlexRibeiro closed issue #7537: Error reading version hint file
URL: https://github.com/apache/iceberg/issues/7537



[GitHub] [iceberg-rust] JanKaul opened a new pull request, #57: Metadata integration tests

2023-09-08 Thread via GitHub



JanKaul opened a new pull request, #57:
URL: https://github.com/apache/iceberg-rust/pull/57

   This PR adds integration tests for reading the table metadata from files. 
Some of the tests are designed to fail. With the current design of the 
serialization/deserialization the error doesn't specify which field is missing. 
So I couldn't do a precise check for certain tests.



[GitHub] [iceberg-rust] JanKaul commented on pull request #57: Metadata integration tests

2023-09-08 Thread via GitHub



JanKaul commented on PR #57:
URL: https://github.com/apache/iceberg-rust/pull/57#issuecomment-1711222438

   @liurenjie1024, @Xuanwo , @Fokko it would be great if you could take a look.



[GitHub] [iceberg-rust] ZENOTME commented on a diff in pull request #56: feat: support read Manifest List

2023-09-08 Thread via GitHub



ZENOTME commented on code in PR #56:
URL: https://github.com/apache/iceberg-rust/pull/56#discussion_r1319653288


##
crates/iceberg/src/spec/manifest_list.rs:
##
@@ -0,0 +1,881 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! ManifestList for Iceberg.
+
+use crate::{avro::schema_to_avro_schema, spec::Literal, Error};
+use apache_avro::{from_value, types::Value, Reader};
+use once_cell::sync::Lazy;
+use std::sync::Arc;
+
+use super::{FormatVersion, ListType, NestedField, NestedFieldRef, Schema, 
StructType};
+
+/// Snapshots are embedded in table metadata, but the list of manifests for a
+/// snapshot are stored in a separate manifest list file.
+///
+/// A new manifest list is written for each attempt to commit a snapshot
+/// because the list of manifests always changes to produce a new snapshot.
+/// When a manifest list is written, the (optimistic) sequence number of the
+/// snapshot is written for all new manifest files tracked by the list.
+///
+/// A manifest list includes summary metadata that can be used to avoid
+/// scanning all of the manifests in a snapshot when planning a table scan.
+/// This includes the number of added, existing, and deleted files, and a
+/// summary of values for each field of the partition spec used to write the
+/// manifest.
+#[derive(Debug, Clone)]
+pub struct ManifestList {
+    /// Entries in a manifest list.
+    entries: Vec<ManifestListEntry>,
+}
+
+impl ManifestList {
+    /// Parse manifest list from bytes.
+    ///
+    /// QUESTION: Will we have more than one manifest list in a single file?
+    pub fn parse_with_version(
+        bs: &[u8],
+        version: FormatVersion,
+        partition_type: &StructType,
+    ) -> Result<ManifestList, Error> {
+        match version {
+            FormatVersion::V2 => {
+                let schema = schema_to_avro_schema("manifest_list", &Self::v2_schema()).unwrap();
+                let reader = Reader::with_schema(&schema, bs)?;
+                let values = Value::Array(reader.collect::<Result<Vec<Value>, _>>()?);
+                from_value::<_serde::ManifestListV2>(&values)?.try_into(partition_type)
+            }
+            FormatVersion::V1 => {
+                let schema = schema_to_avro_schema("manifest_list", &Self::v1_schema()).unwrap();
+                let reader = Reader::with_schema(&schema, bs)?;
+                let values = Value::Array(reader.collect::<Result<Vec<Value>, _>>()?);
+                from_value::<_serde::ManifestListV1>(&values)?.try_into(partition_type)
+            }
+        }
+    }
+
+    /// Get the entries in the manifest list.
+    pub fn entries(&self) -> &[ManifestListEntry] {
+        &self.entries
+    }
+
+    const MANIFEST_PATH: Lazy<NestedFieldRef> = {
+        Lazy::new(|| {
+            Arc::new(NestedField::required(
+                500,
+                "manifest_path",
+                super::Type::Primitive(super::PrimitiveType::String),
+            ))
+        })
+    };
+    const MANIFEST_LENGTH: Lazy<NestedFieldRef> = {
+        Lazy::new(|| {
+            Arc::new(NestedField::required(
+                501,
+                "manifest_length",
+                super::Type::Primitive(super::PrimitiveType::Long),
+            ))
+        })
+    };
+    const PARTITION_SPEC_ID: Lazy<NestedFieldRef> = {
+        Lazy::new(|| {
+            Arc::new(NestedField::required(
+                502,
+                "partition_spec_id",
+                super::Type::Primitive(super::PrimitiveType::Int),
+            ))
+        })
+    };
+    const CONTENT: Lazy<NestedFieldRef> = {
+        Lazy::new(|| {
+            Arc::new(NestedField::required(
+                517,
+                "content",
+                super::Type::Primitive(super::PrimitiveType::Int),
+            ))
+        })
+    };
+    const SEQUENCE_NUMBER: Lazy<NestedFieldRef> = {
+        Lazy::new(|| {
+            Arc::new(NestedField::required(
+                515,
+                "sequence_number",
+                super::Type::Primitive(super::PrimitiveType::Long),
+            ))
+        })
+    };
+    const MIN_SEQUENCE_NUMBER: Lazy<NestedFieldRef> = {
+        Lazy::new(|| {
+            Arc::new(NestedField::required(
+                516,
+                "min_sequence_number",
+                super::Type::Primitive(super::PrimitiveType::Long),
+            ))

[GitHub] [iceberg] andreacfm opened a new pull request, #8528: Schema Merge docs

2023-09-08 Thread via GitHub



andreacfm opened a new pull request, #8528:
URL: https://github.com/apache/iceberg/pull/8528

   Documentation about schemaMerge
   
   See #8005 



[GitHub] [iceberg] zeddit commented on issue #8515: Python: Support vectorization read which improve read performance

2023-09-08 Thread via GitHub



zeddit commented on issue #8515:
URL: https://github.com/apache/iceberg/issues/8515#issuecomment-1711486546

   Many thanks for your help. 
   I have tried a PoC with `minio + hive metastore + iceberg`, and I am using 
`pyiceberg` to run some performance tests.
   I am seeing poor performance when just reading data into my Python 
environment, e.g.:
   1. Reading a table that contains only 1 row takes about 6 seconds.
   2. Reading a table of double-typed data with 100k rows and about 20 columns, 
about 200MB in size, takes about 40 seconds.
   
   I think that is a bit slow, at a bandwidth of about 5MB/s.
   I wonder whether the problem is on my side, so I would like some benchmark 
results for comparison. Many thanks.
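
The bandwidth figure follows directly from the numbers quoted for the second table. As a small worked check (the function name is a hypothetical helper, and the inputs are only the approximate figures from the comment):

```rust
/// Average throughput in MB/s for `size_mb` megabytes read in `seconds` seconds.
fn throughput_mb_per_s(size_mb: f64, seconds: f64) -> f64 {
    size_mb / seconds
}

fn main() {
    // Approximate figures from the comment: about 200 MB read in about 40 seconds.
    println!("{} MB/s", throughput_mb_per_s(200.0, 40.0));
}
```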
  



[GitHub] [iceberg-rust] liurenjie1024 commented on a diff in pull request #57: Metadata integration tests

2023-09-08 Thread via GitHub



liurenjie1024 commented on code in PR #57:
URL: https://github.com/apache/iceberg-rust/pull/57#discussion_r1319737798


##
crates/iceberg/src/spec/table_metadata.rs:
##
@@ -346,21 +349,29 @@ pub(super) mod _serde {
 } else {
 value.current_snapshot_id
 };
+let schemas = HashMap::from_iter(
+value
+.schemas
+.into_iter()
+.map(|schema| Ok((schema.schema_id, Arc::new(schema.try_into()?))))
+.collect::<Result<Vec<_>, Error>>()?,
+);
 Ok(TableMetadata {
 format_version: FormatVersion::V2,
 table_uuid: value.table_uuid,
 location: value.location,
 last_sequence_number: value.last_sequence_number,
 last_updated_ms: value.last_updated_ms,
 last_column_id: value.last_column_id,
-schemas: HashMap::from_iter(
-value
-.schemas
-.into_iter()
-.map(|schema| Ok((schema.schema_id, Arc::new(schema.try_into()?))))
-.collect::<Result<Vec<_>, Error>>()?,
-),
-current_schema_id: value.current_schema_id,
+current_schema_id: if schemas.keys().contains(&value.current_schema_id) {
+Ok(value.current_schema_id)
+} else {
+Err(self::Error::new(
+ErrorKind::DataInvalid,
+"No schema exists with the current schema id.",

Review Comment:
   ```suggestion
   format!("No schema exists with the current schema id: {}", *value.current_schema_id),
   ```
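
The check under review, together with the suggested message that names the offending id, can be sketched in isolation as follows. This is a simplified illustration under stated assumptions: `DataInvalid` and `check_current_schema_id` are hypothetical names, the schema type is left generic, and the PR itself uses the crate's own `Error` and `ErrorKind::DataInvalid`.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch (the PR uses the crate's own `Error`).
#[derive(Debug, PartialEq)]
pub struct DataInvalid(pub String);

/// Returns `current_schema_id` if a schema with that id exists, otherwise an
/// error whose message names the missing id.
pub fn check_current_schema_id<S>(
    schemas: &HashMap<i32, S>,
    current_schema_id: i32,
) -> Result<i32, DataInvalid> {
    if schemas.contains_key(&current_schema_id) {
        Ok(current_schema_id)
    } else {
        Err(DataInvalid(format!(
            "No schema exists with the current schema id: {}",
            current_schema_id
        )))
    }
}

fn main() {
    let schemas: HashMap<i32, &str> = HashMap::from([(0, "schema-0"), (1, "schema-1")]);
    println!("{:?}", check_current_schema_id(&schemas, 1));
    println!("{:?}", check_current_schema_id(&schemas, 7));
}
```

Including the id in the message makes a failing metadata file much easier to diagnose than a fixed string would.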




[GitHub] [iceberg-rust] liurenjie1024 commented on pull request #56: feat: support read Manifest List

2023-09-08 Thread via GitHub



liurenjie1024 commented on PR #56:
URL: https://github.com/apache/iceberg-rust/pull/56#issuecomment-1711520015

   > And I find some places are inconsistent with the spec.
   > 
   > https://iceberg.apache.org/spec/#manifests:~:text=504-,added_files_count,-int 
   > In practice, this field in Avro is **added_data_files_count**. The same 
applies to existing_files_count and deleted_files_count.
   > 
   > > [Optional fields, **array elements**, and map values must be wrapped in 
an Avro union with null. This is the only union type allowed in Iceberg data 
files.](https://iceberg.apache.org/spec/#avro:~:text=Optional%20fields%2C%20array%20elements%2C%20and%20map%20values%20must%20be%20wrapped%20in%20an%20Avro%20union%20with%20null.%20This%20is%20the%20only%20union%20type%20allowed%20in%20Iceberg%20data%20files.)
   > 
   > ```
   > manifest_list:
   >    partitions: `list<508: field_summary>`
   > ```
   > 
   > Actually, this field_summary field is not an optional value.
   
   How about submitting a fix to iceberg-docs?



[GitHub] [iceberg-rust] liurenjie1024 commented on a diff in pull request #56: feat: support read Manifest List

2023-09-08 Thread via GitHub



liurenjie1024 commented on code in PR #56:
URL: https://github.com/apache/iceberg-rust/pull/56#discussion_r1319745001


##
crates/iceberg/src/avro/mod.rs:
##
@@ -18,3 +18,4 @@
 //! Avro related codes.
 #[allow(dead_code)]
 mod schema;
+pub use schema::*;

Review Comment:
   ```suggestion
   pub(crate) use schema::*;
   ```
   
   Avro schema is not intended for external users.
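
As a small illustration of what the suggested visibility change does, the sketch below re-exports a private module's items with `pub(crate)`, so they are reachable anywhere inside the crate but not by external users. The module layout mirrors the diff, while `schema_name` is a hypothetical helper standing in for the real conversion functions.

```rust
mod avro {
    // Private module holding the conversion helpers.
    mod schema {
        /// Hypothetical helper standing in for the real schema conversion code.
        pub fn schema_name(name: &str) -> String {
            format!("avro::{name}")
        }
    }

    // Re-export for the rest of the crate only; external users cannot see it.
    pub(crate) use schema::*;
}

fn main() {
    // Code inside the same crate can reach the helper through the re-export.
    println!("{}", avro::schema_name("manifest_list"));
}
```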



##
crates/iceberg/src/spec/manifest_list.rs:
##
@@ -0,0 +1,881 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! ManifestList for Iceberg.
+
+use crate::{avro::schema_to_avro_schema, spec::Literal, Error};
+use apache_avro::{from_value, types::Value, Reader};
+use once_cell::sync::Lazy;
+use std::sync::Arc;
+
+use super::{FormatVersion, ListType, NestedField, NestedFieldRef, Schema, 
StructType};
+
+/// Snapshots are embedded in table metadata, but the list of manifests for a
+/// snapshot are stored in a separate manifest list file.
+///
+/// A new manifest list is written for each attempt to commit a snapshot
+/// because the list of manifests always changes to produce a new snapshot.
+/// When a manifest list is written, the (optimistic) sequence number of the
+/// snapshot is written for all new manifest files tracked by the list.
+///
+/// A manifest list includes summary metadata that can be used to avoid
+/// scanning all of the manifests in a snapshot when planning a table scan.
+/// This includes the number of added, existing, and deleted files, and a
+/// summary of values for each field of the partition spec used to write the
+/// manifest.
+#[derive(Debug, Clone)]
+pub struct ManifestList {
+    /// Entries in a manifest list.
+    entries: Vec<ManifestListEntry>,
+}
+
+impl ManifestList {
+    /// Parse manifest list from bytes.
+    ///
+    /// QUESTION: Will we have more than one manifest list in a single file?
+    pub fn parse_with_version(
+        bs: &[u8],
+        version: FormatVersion,
+        partition_type: &StructType,
+    ) -> Result<ManifestList, Error> {
+        match version {
+            FormatVersion::V2 => {
+                let schema = schema_to_avro_schema("manifest_list", &Self::v2_schema()).unwrap();
+                let reader = Reader::with_schema(&schema, bs)?;
+                let values = Value::Array(reader.collect::<Result<Vec<Value>, _>>()?);
+                from_value::<_serde::ManifestListV2>(&values)?.try_into(partition_type)
+            }
+            FormatVersion::V1 => {
+                let schema = schema_to_avro_schema("manifest_list", &Self::v1_schema()).unwrap();
+                let reader = Reader::with_schema(&schema, bs)?;
+                let values = Value::Array(reader.collect::<Result<Vec<Value>, _>>()?);
+                from_value::<_serde::ManifestListV1>(&values)?.try_into(partition_type)
+            }
+        }
+    }
+
+    /// Get the entries in the manifest list.
+    pub fn entries(&self) -> &[ManifestListEntry] {
+        &self.entries
+    }
+
+    const MANIFEST_PATH: Lazy<NestedFieldRef> = {
+        Lazy::new(|| {
+            Arc::new(NestedField::required(
+                500,
+                "manifest_path",
+                super::Type::Primitive(super::PrimitiveType::String),
+            ))
+        })
+    };
+    const MANIFEST_LENGTH: Lazy<NestedFieldRef> = {
+        Lazy::new(|| {
+            Arc::new(NestedField::required(
+                501,
+                "manifest_length",
+                super::Type::Primitive(super::PrimitiveType::Long),
+            ))
+        })
+    };
+    const PARTITION_SPEC_ID: Lazy<NestedFieldRef> = {
+        Lazy::new(|| {
+            Arc::new(NestedField::required(
+                502,
+                "partition_spec_id",
+                super::Type::Primitive(super::PrimitiveType::Int),
+            ))
+        })
+    };
+    const CONTENT: Lazy<NestedFieldRef> = {
+        Lazy::new(|| {
+            Arc::new(NestedField::required(
+                517,
+                "content",
+                super::Type::Primitive(super::PrimitiveType::Int),
+            ))
+        })
+    };
+    const SEQUENCE_NUMBER: Lazy<NestedFieldRef> = {
+        Lazy::new(|| {
+            Arc::new(NestedField::required(
+                515,
+                "sequence_number",
+                super::Type::Primitive(super::PrimitiveType::Long),
+

[GitHub] [iceberg] xuqi1633 commented on issue #3028: i can't import class which start with org.apache.iceberg.relocated

2023-09-08 Thread via GitHub



xuqi1633 commented on issue #3028:
URL: https://github.com/apache/iceberg/issues/3028#issuecomment-1711568897

   After compiling the project, a relocated guava jar file will be generated 
under the bundled-guava module
   ```
   ./gradlew clean build -x test -x javadoc -x integrationTest
   ```
   
![image](https://github.com/apache/iceberg/assets/70441327/51c2d53f-c50a-45ce-91eb-c44f9e03d2bb)
   
![image](https://github.com/apache/iceberg/assets/70441327/b75e709c-2238-4a91-b6f6-68cf28ac1fd8)
   
   If other modules also rely on the bundled-guava module, add the 
iceberg-bundled-guava-1.4.0-SNAPSHOT libraries to the module
   
![image](https://github.com/apache/iceberg/assets/70441327/54bbd4fa-5265-41e6-9a0a-15242c9225f5)
   
   after adding libraries
   
![image](https://github.com/apache/iceberg/assets/70441327/3eaa6bf6-bc94-4cd4-89ac-99fabd6e356f)
   
   @ahmedriza 



[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #7105: Spec: Add partition stats spec

2023-09-08 Thread via GitHub



ajantha-bhat commented on code in PR #7105:
URL: https://github.com/apache/iceberg/pull/7105#discussion_r1319799186


##
format/spec.md:
##
@@ -702,6 +703,41 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
 
 
+ Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). Partition statistics are informational. A 
reader can choose to
+ignore partition statistics information. Partition statistics support is not 
required to read the table correctly.
+Each table snapshot may be associated with at most one partition statistic 
file and the table can contain many partition statistics files associated with 
different table snapshots.
+A writer can optionally write the partition statistics file during each write 
operation. If the statistics file is written for the specific snapshot,
+it must be accurate and must be registered in the table metadata file to be 
considered as a valid statistics file for the reader.
+
+Partition statistics files metadata within `partition-statistics` table 
metadata field is a struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of the partition statistics file. See [Partition Statistics file](#partition-statistics-file). |
+| _required_ | _required_ | **`max-data-sequence-number`** | `long` | Maximum data sequence number of the Iceberg table's snapshot the partition statistics was computed from. |
+
+ Partition Statistics file
+
+Statistics information for every partition tuple is stored as a row in the 
**table default format**.
+These rows are sorted (in ascending manner with NULL FIRST) based on all 
partition columns from `partition` in the same order

Review Comment:
   added a detailed note.




[GitHub] [iceberg] ajantha-bhat commented on pull request #7105: Spec: Add partition stats spec

2023-09-08 Thread via GitHub



ajantha-bhat commented on PR #7105:
URL: https://github.com/apache/iceberg/pull/7105#issuecomment-1711585128

   @RussellSpitzer, @flyrain, @szehon-ho, @rdblue: I have addressed the new 
suggestions. Please approve the PR if it is ok or comment more if we need 
further changes. Thanks.



[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #7105: Spec: Add partition stats spec

2023-09-08 Thread via GitHub



ajantha-bhat commented on code in PR #7105:
URL: https://github.com/apache/iceberg/pull/7105#discussion_r1319801698


##
format/spec.md:
##
@@ -702,6 +703,41 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
 
 
+ Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). Partition statistics are informational. A 
reader can choose to

Review Comment:
   Simplified




[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #7105: Spec: Add partition stats spec

2023-09-08 Thread via GitHub



ajantha-bhat commented on code in PR #7105:
URL: https://github.com/apache/iceberg/pull/7105#discussion_r1319802967


##
format/spec.md:
##
@@ -702,6 +703,41 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
 
 
+ Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). Partition statistics are informational. A 
reader can choose to
+ignore partition statistics information. Partition statistics support is not 
required to read the table correctly.
+Each table snapshot may be associated with at most one partition statistic 
file and the table can contain many partition statistics files associated with 
different table snapshots.
+A writer can optionally write the partition statistics file during each write 
operation. If the statistics file is written for the specific snapshot,
+it must be accurate and must be registered in the table metadata file to be 
considered as a valid statistics file for the reader.
+
+Partition statistics files metadata within `partition-statistics` table 
metadata field is a struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of the partition statistics file. See [Partition Statistics file](#partition-statistics-file). |
+| _required_ | _required_ | **`max-data-sequence-number`** | `long` | Maximum data sequence number of the Iceberg table's snapshot the partition statistics was computed from. |
+
+ Partition Statistics file
+
+Statistics information for every partition tuple is stored as a row in the 
**table default format**.
+These rows are sorted (in ascending manner with NULL FIRST) based on all 
partition columns from `partition` in the same order
+to optimize filtering rows while scanning.
+Each unique partition tuple must have exactly one corresponding row, ensuring 
that statistics for all partitions are present.
+
+A partition statistics file stores statistics as a struct with the following 
fields:

Review Comment:
   ok. Changed to 
   
   `Statistics information for each unique partition tuple is stored as a row 
in the default data file format of the table (for example, Parquet or ORC).`
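
The ordering described in the quoted spec text (rows sorted ascending on all partition columns, NULL FIRST) can be sketched with plain tuples of optional values. This is only an illustration: `PartitionTuple` and `sort_partition_rows` are hypothetical names, and the two-column layout is an assumption rather than anything the spec prescribes.

```rust
/// Hypothetical partition tuple: (optional date string, optional bucket id).
type PartitionTuple = (Option<String>, Option<i32>);

/// Sorts tuples ascending by all partition columns, with nulls first.
/// `Option` orders `None` before any `Some`, so the tuple ordering already
/// gives "ascending, NULL FIRST" column by column.
fn sort_partition_rows(rows: &mut [PartitionTuple]) {
    rows.sort();
}

fn main() {
    let mut rows: Vec<PartitionTuple> = vec![
        (Some("2023-09-08".to_string()), Some(2)),
        (None, Some(1)),
        (Some("2023-09-07".to_string()), None),
        (Some("2023-09-07".to_string()), Some(5)),
    ];
    sort_partition_rows(&mut rows);
    for row in &rows {
        println!("{row:?}");
    }
}
```

With rows stored in this order, a reader filtering on a prefix of the partition columns can stop scanning once it passes the matching range.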




[GitHub] [iceberg] juanrondineau commented on issue #8333: Unable to merge CDC data into snapshot data. java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.la

2023-09-08 Thread via GitHub



juanrondineau commented on issue #8333:
URL: https://github.com/apache/iceberg/issues/8333#issuecomment-1711594714

   @chandu-1101, thanks for the welcome.
   I am sharing 2 screenshots.
   The first simulates, in a DBeaver session connected to Spark, the operations 
that dbt executes internally. In this case dbt creates a temporary view from a 
select over the table where we look for new data. Then, when it tries to merge 
the new data into the destination table, we get the cast exception.
   
![image](https://github.com/apache/iceberg/assets/40765812/b52fea49-f06b-46e1-92d2-c7b574c5b9ba)
   
   In the second screenshot we replace the create temporary view with a create 
table statement; this avoids the exception and the merge operation works fine.
   
![image](https://github.com/apache/iceberg/assets/40765812/b772cad0-1c25-457e-acb9-9f4f4cf21ba5)
   
   



[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #8528: Schema Merge docs

2023-09-08 Thread via GitHub



RussellSpitzer commented on code in PR #8528:
URL: https://github.com/apache/iceberg/pull/8528#discussion_r1319920277


##
docs/spark-writes.md:
##
@@ -313,6 +313,22 @@ data.writeTo("prod.db.table")
 .createOrReplace()
 ```
 
+### Schema Merge
+
+Iceberg support dynamic `schemaMerge` at writing time. The table must be 
configured to accept any schema.

Review Comment:
   We should probably explain what this means without reusing the same name




[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7105: Spec: Add partition stats spec

2023-09-08 Thread via GitHub



szehon-ho commented on code in PR #7105:
URL: https://github.com/apache/iceberg/pull/7105#discussion_r1319945486


##
format/spec.md:
##
@@ -702,6 +703,49 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map` | 
Additional properties associated with the statistic. Subset of Blob properties 
in the Puffin file. |
 
 
+ Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). 
+Partition statistics are not required for reading or planning and readers may 
ignore them.
+Each table snapshot may be associated with at most one partition statistic 
file.
+A writer can optionally write the partition statistics file during each write 
operation. If the statistics file is written for the specific snapshot,
+it must be registered in the table metadata file to be considered as a valid 
statistics file for the reader.
+
+Partition statistics files metadata within `partition-statistics` table 
metadata field is a struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg 
table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of 
the partition statistics file. See [Partition Statistics 
file](#partition-statistics-file). |
+| _required_ | _required_ | **`max-data-sequence-number`** | `long` | Maximum 
data sequence number of the Iceberg table's snapshot the partition statistics 
was computed from. |
+
+ Partition Statistics file
+
+Statistics information for each unique partition tuple is stored as a row in 
the default data file format of the table (for example, Parquet or ORC).
+These rows are sorted (in ascending manner with NULL FIRST) based on all 
partition columns from `partition` in the same order

Review Comment:
   Nit: can we simplify to just
   
   `These rows must be sorted (in ascending manner with NULL FIRST) by 
partition to optimize...` ?
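
   To make the proposed ordering concrete, here is a minimal sketch of an ascending, NULLS FIRST sort over partition tuples. It is only an illustration: the helper name `nulls_first_key` and the sample integer rows are assumptions, not part of the spec or of any Iceberg API.

```python
from typing import Optional, Tuple

PartitionTuple = Tuple[Optional[int], ...]


def nulls_first_key(partition: PartitionTuple):
    """Build a sort key that orders None before any real value, column by column."""
    # (False, 0) sorts before (True, value), so nulls come first in each column.
    return tuple(
        (value is not None, value if value is not None else 0)
        for value in partition
    )


# Hypothetical partition tuples with two integer partition columns.
rows = [(2, None), (None, 5), (1, 7), (None, None), (1, None)]
sorted_rows = sorted(rows, key=nulls_first_key)
print(sorted_rows)
```

   Columns are compared in the order they appear in the tuple, which mirrors sorting by all partition columns in the same order.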



##
format/spec.md:
##
@@ -702,6 +703,49 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map` | 
Additional properties associated with the statistic. Subset of Blob properties 
in the Puffin file. |
 
 
+ Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). 
+Partition statistics are not required for reading or planning and readers may 
ignore them.
+Each table snapshot may be associated with at most one partition statistic 
file.
+A writer can optionally write the partition statistics file during each write 
operation. If the statistics file is written for the specific snapshot,
+it must be registered in the table metadata file to be considered as a valid 
statistics file for the reader.
+
+Partition statistics files metadata within `partition-statistics` table 
metadata field is a struct with the following fields:

Review Comment:
   Nit: does not make too much sense, does this suffice?
   
   `Partition statistics files contain a struct 'partition-statistics' with the 
following fields`



##
format/spec.md:
##
@@ -702,6 +703,49 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map` | 
Additional properties associated with the statistic. Subset of Blob properties 
in the Puffin file. |
 
 
+ Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). 
+Partition statistics are not required for reading or planning and readers may 
ignore them.
+Each table snapshot may be associated with at most one partition statistic 
file.
+A writer can optionally write the partition statistics file during each write 
operation. If the statistics file is written for the specific snapshot,

Review Comment:
   Nit: I am not too sure these two sentences add much value; it is the case 
for any file reference in Iceberg, isn't it?
   
 
   ```A writer can optionally write the partition statistics file during each 
write operation. If the statistics file is written for the specific snapshot, 
it must be registered in the table metadata file to be considered as a valid 
statistics file for the reader.```




[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7105: Spec: Add partition stats spec

2023-09-08 Thread via GitHub



szehon-ho commented on code in PR #7105:
URL: https://github.com/apache/iceberg/pull/7105#discussion_r1319942204


##
format/spec.md:
##
@@ -702,6 +703,49 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map` | 
Additional properties associated with the statistic. Subset of Blob properties 
in the Puffin file. |
 
 
+ Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). 
+Partition statistics are not required for reading or planning and readers may 
ignore them.
+Each table snapshot may be associated with at most one partition statistic 
file.
+A writer can optionally write the partition statistics file during each write 
operation. If the statistics file is written for the specific snapshot,

Review Comment:
   Just my opinion: I am not too sure these two sentences add much value; it is 
the case for any file reference in Iceberg, isn't it?
   
 
   ```A writer can optionally write the partition statistics file during each 
write operation. If the statistics file is written for the specific snapshot, 
it must be registered in the table metadata file to be considered as a valid 
statistics file for the reader.```




[GitHub] [iceberg-docs] amogh-jahagirdar merged pull request #274: Update vendors.md

2023-09-08 Thread via GitHub



amogh-jahagirdar merged PR #274:
URL: https://github.com/apache/iceberg-docs/pull/274



[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #7105: Spec: Add partition stats spec

2023-09-08 Thread via GitHub



ajantha-bhat commented on code in PR #7105:
URL: https://github.com/apache/iceberg/pull/7105#discussion_r1320008836


##
format/spec.md:
##
@@ -702,6 +703,49 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map` | 
Additional properties associated with the statistic. Subset of Blob properties 
in the Puffin file. |
 
 
+ Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). 
+Partition statistics are not required for reading or planning and readers may 
ignore them.
+Each table snapshot may be associated with at most one partition statistic 
file.
+A writer can optionally write the partition statistics file during each write 
operation. If the statistics file is written for the specific snapshot,
+it must be registered in the table metadata file to be considered as a valid 
statistics file for the reader.
+
+Partition statistics files metadata within `partition-statistics` table 
metadata field is a struct with the following fields:

Review Comment:
   I copied this from the Puffin statistics file statements a few lines above. 
   
   Changed it to
   
   `partition-statistics` field of table metadata is an optional list of struct 
with the following fields:
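
   As a rough illustration of that wording, the entries could be modelled as below. This is a sketch only: the field names follow the draft table under review and may change before the spec is final, while the class name, helper function, and sample paths are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PartitionStatisticsFile:
    """One entry of the optional `partition-statistics` list (draft fields)."""
    snapshot_id: int
    statistics_file_path: str
    max_data_sequence_number: int

    def to_json_dict(self) -> Dict[str, object]:
        # Use the hyphenated field names from the draft table.
        return {
            "snapshot-id": self.snapshot_id,
            "statistics-file-path": self.statistics_file_path,
            "max-data-sequence-number": self.max_data_sequence_number,
        }


def find_for_snapshot(
    entries: List[PartitionStatisticsFile], snapshot_id: int
) -> Optional[PartitionStatisticsFile]:
    """Return the entry for a snapshot, enforcing at most one file per snapshot."""
    matches = [e for e in entries if e.snapshot_id == snapshot_id]
    if len(matches) > 1:
        raise ValueError(f"snapshot {snapshot_id} has more than one statistics file")
    return matches[0] if matches else None


# Hypothetical sample data; the paths are placeholders.
entries = [
    PartitionStatisticsFile(100, "s3://bucket/stats/100.parquet", 7),
    PartitionStatisticsFile(101, "s3://bucket/stats/101.parquet", 8),
]
found = find_for_snapshot(entries, 101)
missing = find_for_snapshot(entries, 999)
```

   Returning `None` for a snapshot without an entry reflects that readers may ignore partition statistics entirely.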
   



##
format/spec.md:
##
@@ -702,6 +703,49 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map` | 
Additional properties associated with the statistic. Subset of Blob properties 
in the Puffin file. |
 
 
+ Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). 
+Partition statistics are not required for reading or planning and readers may 
ignore them.
+Each table snapshot may be associated with at most one partition statistic 
file.
+A writer can optionally write the partition statistics file during each write 
operation. If the statistics file is written for the specific snapshot,

Review Comment:
   I have shortened it a bit. Even though it seems implicit, it links back to 
how the file is tracked and when it is valid. I remember getting a comment 
asking to add this statement. 




[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #7105: Spec: Add partition stats spec

2023-09-08 Thread via GitHub



ajantha-bhat commented on code in PR #7105:
URL: https://github.com/apache/iceberg/pull/7105#discussion_r1320009356


##
format/spec.md:
##
@@ -702,6 +703,49 @@ Blob metadata is a struct with the following fields:
 | _optional_ | _optional_ | **`properties`** | `map` | 
Additional properties associated with the statistic. Subset of Blob properties 
in the Puffin file. |
 
 
+ Partition statistics
+
+Partition statistics files are based on [Partition Statistics file 
spec](#partition-statistics-file). 
+Partition statistics are not required for reading or planning and readers may 
ignore them.
+Each table snapshot may be associated with at most one partition statistic 
file.
+A writer can optionally write the partition statistics file during each write 
operation. If the statistics file is written for the specific snapshot,
+it must be registered in the table metadata file to be considered as a valid 
statistics file for the reader.
+
+Partition statistics files metadata within `partition-statistics` table 
metadata field is a struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg 
table's snapshot the partition statistics file is associated with. |
+| _required_ | _required_ | **`statistics-file-path`** | `string` | Path of 
the partition statistics file. See [Partition Statistics 
file](#partition-statistics-file). |
+| _required_ | _required_ | **`max-data-sequence-number`** | `long` | Maximum 
data sequence number of the Iceberg table's snapshot the partition statistics 
was computed from. |
+
+ Partition Statistics file
+
+Statistics information for each unique partition tuple is stored as a row in 
the default data file format of the table (for example, Parquet or ORC).
+These rows are sorted (in ascending manner with NULL FIRST) based on all 
partition columns from `partition` in the same order

Review Comment:
   Done.




[GitHub] [iceberg] amogh-jahagirdar commented on pull request #8491: Python: Improved Readability and Alignment of Regex Patterns

2023-09-08 Thread via GitHub



amogh-jahagirdar commented on PR #8491:
URL: https://github.com/apache/iceberg/pull/8491#issuecomment-1711851893

   @hiteshbedre Since this is more of a cleanup, I'll merge after the checks 
pass.



[GitHub] [iceberg-go] delaneyj opened a new issue, #4: Implementations?

2023-09-08 Thread via GitHub



delaneyj opened a new issue, #4:
URL: https://github.com/apache/iceberg-go/issues/4

   ### Question
   
   Iceberg has subprojects targeting arrow/orc/parquet/etc. Are there plans to 
have adapters be part of this repo?
   
   Are there plans to have interfaces for `SchemaToDatastore`?



[GitHub] [iceberg] amogh-jahagirdar merged pull request #8491: Python: Improved Readability and Alignment of Regex Patterns

2023-09-08 Thread via GitHub



amogh-jahagirdar merged PR #8491:
URL: https://github.com/apache/iceberg/pull/8491



[GitHub] [iceberg] amogh-jahagirdar commented on pull request #8491: Python: Improved Readability and Alignment of Regex Patterns

2023-09-08 Thread via GitHub



amogh-jahagirdar commented on PR #8491:
URL: https://github.com/apache/iceberg/pull/8491#issuecomment-1711878925

   Thanks for the contribution @hiteshbedre !



[GitHub] [iceberg-docs] melvynator opened a new pull request, #275: Update vendors.md

2023-09-08 Thread via GitHub



melvynator opened a new pull request, #275:
URL: https://github.com/apache/iceberg-docs/pull/275

   Fixed a typo



[GitHub] [iceberg-go] zeroshade commented on issue #4: Implementations?

2023-09-08 Thread via GitHub



zeroshade commented on issue #4:
URL: https://github.com/apache/iceberg-go/issues/4#issuecomment-1711921361

   I plan on supporting Arrow, Parquet, Avro and ORC in this repo as much as I 
can. 
   
   That said, I'm not familiar with `SchemaToDatastore`, but I want to support 
as much as possible in this library.



[GitHub] [iceberg-go] delaneyj commented on issue #4: Implementations?

2023-09-08 Thread via GitHub



delaneyj commented on issue #4:
URL: https://github.com/apache/iceberg-go/issues/4#issuecomment-1712023219

   Oh, it's not a library; I meant including an interface to be able to plug in 
any of these options or others.



[GitHub] [iceberg] kunal-nandwana opened a new issue, #5556: Feature Request: Support mergeSchema option when using Spark MERGE INTO

2023-09-08 Thread via GitHub



kunal-nandwana opened a new issue, #5556:
URL: https://github.com/apache/iceberg/issues/5556

   ### Feature Request / Improvement
   
   Hi Team,
   I am using Iceberg in my project and found that an important feature is 
missing: "merge schema", which is readily available in Apache Hudi and Delta 
Lake. If possible, this feature should be added to Iceberg. I am 
linking my previous ticket, which explains the problem I am 
facing. Please see the ticket below for reference.
   [https://github.com/apache/iceberg/issues/5548](#5548)
   
   @rdblue any thoughts on this?
   
   
   
   ### Query engine
   
   Spark



[GitHub] [iceberg] vinitamaloo-asu commented on issue #2442: cannot insert value in hive command shell

2023-09-08 Thread via GitHub



vinitamaloo-asu commented on issue #2442:
URL: https://github.com/apache/iceberg/issues/2442#issuecomment-1712329399

   I created a new catalog "iceberg_catalog" using spark config like below:
   `.set("spark.sql.catalog.iceberg_catalog", 
"org.apache.iceberg.spark.SparkCatalog")
   .set("spark.sql.catalog.iceberg_catalog.type", "hive")`
   
   Now, to create Iceberg tables, I also initialized a Hive catalog with the 
same catalog name and properties, which is redundant.
   
   
   `val catalog = new HiveCatalog()
   catalog.setConf(conf)
   catalog.initialize(
     "iceberg_catalog",
     JavaConverters.mapAsJavaMap(Map(
       CatalogProperties.CATALOG_IMPL -> "org.apache.iceberg.hive.HiveCatalog",
       CatalogProperties.URI -> "thrift://localhost:9083",
       CatalogProperties.WAREHOUSE_LOCATION -> warehouseUri
     )))`
   
   
   Is there a way to get the previously initialized catalog with spark conf?
   



[GitHub] [iceberg] vinitamaloo-asu opened a new issue, #8529: CASCADE WITH Drop Namespace Gives exception

2023-09-08 Thread via GitHub



vinitamaloo-asu opened a new issue, #8529:
URL: https://github.com/apache/iceberg/issues/8529

   ### Apache Iceberg version
   
   1.3.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Running this command: 
   `spark.sql("DROP DATABASE IF EXISTS dbname CASCADE")`
   
   Gives the below exception:
   `org.apache.iceberg.exceptions.NamespaceNotEmptyException: Namespace dbname is not empty. One or more tables exist.
   at org.apache.iceberg.hive.HiveCatalog.dropNamespace(HiveCatalog.java:353)
   at org.apache.iceberg.spark.SparkCatalog.dropNamespace(SparkCatalog.java:447)
   at org.apache.spark.sql.execution.datasources.v2.DropNamespaceExec.run(DropNamespaceExec.scala:52)
   at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
   at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
   at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:46)`
   
   
   The expectation is that, with CASCADE specified, the command should delete all 
tables and the db itself.
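
   The difference between the observed and the expected behaviour can be sketched with a toy in-memory model. This is not Iceberg or Spark code: `ToyCatalog`, `NamespaceNotEmptyError`, and the `cascade` flag are hypothetical names used only to illustrate the semantics described above.

```python
from typing import Dict, Set


class NamespaceNotEmptyError(Exception):
    """Raised when a non-empty namespace is dropped without cascade."""


class ToyCatalog:
    """In-memory stand-in for a catalog: namespace name -> set of table names."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, Set[str]] = {}

    def create_table(self, namespace: str, table: str) -> None:
        # Creating a table implicitly creates its namespace in this toy model.
        self._namespaces.setdefault(namespace, set()).add(table)

    def namespace_exists(self, namespace: str) -> bool:
        return namespace in self._namespaces

    def drop_namespace(self, namespace: str, cascade: bool = False) -> bool:
        tables = self._namespaces.get(namespace)
        if tables is None:
            return False  # behaves like IF EXISTS: nothing to drop
        if tables and not cascade:
            # The behaviour reported above: the drop is rejected.
            raise NamespaceNotEmptyError(f"Namespace {namespace} is not empty.")
        # The expected CASCADE behaviour: tables go away with the namespace.
        del self._namespaces[namespace]
        return True


catalog = ToyCatalog()
catalog.create_table("dbname", "t1")

try:
    catalog.drop_namespace("dbname")
    rejected = False
except NamespaceNotEmptyError:
    rejected = True

dropped = catalog.drop_namespace("dbname", cascade=True)
```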



[GitHub] [iceberg] github-actions[bot] closed issue #6914: change partition led to query bug

2023-09-08 Thread via GitHub



github-actions[bot] closed issue #6914: change partition led to query bug
URL: https://github.com/apache/iceberg/issues/6914



[GitHub] [iceberg] github-actions[bot] commented on issue #6914: change partition led to query bug

2023-09-08 Thread via GitHub



github-actions[bot] commented on issue #6914:
URL: https://github.com/apache/iceberg/issues/6914#issuecomment-1712351531

   This issue has been closed because it has not received any activity in the 
last 14 days since being marked as 'stale'


