[PR] feat: glue table creation with some docs on testing [iceberg-go]

2024-02-02 Thread via GitHub
wolfeidau opened a new pull request, #59: URL: https://github.com/apache/iceberg-go/pull/59 This change adds table creation which for the most part replicates the functionality in the iceberg python library. -- This is an automated message from the Apache Git Service. To respond to the me

Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

2024-02-02 Thread via GitHub
wooyeong commented on code in PR #9455: URL: https://github.com/apache/iceberg/pull/9455#discussion_r1475705807 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -405,15 +407,18 @@ public boolean equals(Object other) { return false;

Re: [PR] Core: Support IncrementalChangelogScan with deletes. [iceberg]

2024-02-02 Thread via GitHub
manuzhang commented on PR #6182: URL: https://github.com/apache/iceberg/pull/6182#issuecomment-1923299165 @Reo-LEI and @aokolnychyi, are you still working on this?

Re: [PR] Create ExtensionTestBase for migration to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
tomtongue commented on code in PR #9613: URL: https://github.com/apache/iceberg/pull/9613#discussion_r1475725464 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java: ## @@ -85,34 +83,26 @@ public static Object[][] parameter

Re: [PR] Flink: backport #9381 to 1.17 and 1.16 for Migrate subclasses of FlinkCatalogTestBase to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
pvary merged PR #9598: URL: https://github.com/apache/iceberg/pull/9598

Re: [PR] Create ExtensionTestBase for migration to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
tomtongue commented on code in PR #9613: URL: https://github.com/apache/iceberg/pull/9613#discussion_r1475738347 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/ExtensionsTestBase.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software

Re: [PR] Spark 3.4: Bypass Spark's ViewCatalog API when replacing a view [iceberg]

2024-02-02 Thread via GitHub
nastra merged PR #9614: URL: https://github.com/apache/iceberg/pull/9614

Re: [PR] only trim slash when warehouse location is not root path [iceberg]

2024-02-02 Thread via GitHub
nastra commented on code in PR #9619: URL: https://github.com/apache/iceberg/pull/9619#discussion_r1475739131 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java: ## @@ -108,7 +108,10 @@ public void initialize(String name, Map properties) { "Cannot initi

Re: [PR] Create ExtensionTestBase for migration to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
nastra commented on code in PR #9613: URL: https://github.com/apache/iceberg/pull/9613#discussion_r1475743107 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java: ## @@ -171,19 +161,19 @@ public void addDataUnpartitionedOrc

Re: [PR] Create ExtensionTestBase for migration to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
nastra commented on code in PR #9613: URL: https://github.com/apache/iceberg/pull/9613#discussion_r1475743885 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java: ## @@ -485,14 +475,14 @@ public void addPartitionToPartitio

Re: [PR] Create ExtensionTestBase for migration to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
nastra commented on PR #9613: URL: https://github.com/apache/iceberg/pull/9613#issuecomment-1923353317 please also add the below diff to `iceberg-spark-extensions` so that JUnit5 tests are properly executed ``` diff --git a/spark/v3.5/build.gradle b/spark/v3.5/build.gradle index eee

Re: [PR] Create ExtensionTestBase for migration to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
tomtongue commented on code in PR #9613: URL: https://github.com/apache/iceberg/pull/9613#discussion_r1475750608 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java: ## @@ -485,14 +475,14 @@ public void addPartitionToParti

Re: [PR] Create ExtensionTestBase for migration to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
nastra commented on code in PR #9613: URL: https://github.com/apache/iceberg/pull/9613#discussion_r1475752328 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java: ## @@ -48,18 +49,15 @@ import org.apache.spark.sql.types.St

Re: [PR] Create ExtensionTestBase for migration to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
nastra commented on code in PR #9613: URL: https://github.com/apache/iceberg/pull/9613#discussion_r1475754953 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java: ## @@ -85,34 +83,26 @@ public static Object[][] parameters()

Re: [PR] Create ExtensionTestBase for migration to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
tomtongue commented on code in PR #9613: URL: https://github.com/apache/iceberg/pull/9613#discussion_r1475762390 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestAddFilesProcedure.java: ## @@ -48,18 +49,15 @@ import org.apache.spark.sql.types

Re: [I] It's not possible to readStream from an Iceberg table as source when its snapshots expire [iceberg]

2024-02-02 Thread via GitHub
elopezal commented on issue #9504: URL: https://github.com/apache/iceberg/issues/9504#issuecomment-1923376536 Hello @methiakshit-plutoflume, thanks for your answer. Comparing with Kafka offsets, this situation would be as if there were a problem if some Kafka events correctly pr

Re: [PR] feat: add parquet writer [iceberg-rust]

2024-02-02 Thread via GitHub
ZENOTME commented on code in PR #176: URL: https://github.com/apache/iceberg-rust/pull/176#discussion_r1475772911 ## crates/iceberg/src/io.rs: ## @@ -268,6 +268,16 @@ impl OutputFile { .await?) } +/// Delete file. +pub async fn delete(&self) -> Result

Re: [PR] feat: add parquet writer [iceberg-rust]

2024-02-02 Thread via GitHub
ZENOTME commented on code in PR #176: URL: https://github.com/apache/iceberg-rust/pull/176#discussion_r1475773886 ## crates/iceberg/src/writer/file_writer/location_generator.rs: ## @@ -0,0 +1,251 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1475818070 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -80,10 +83,11 @@ public class JdbcCatalog extends BaseMetastoreCatalog private final Function,

Re: [PR] Create ExtensionTestBase for migration to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
tomtongue commented on PR #9613: URL: https://github.com/apache/iceberg/pull/9613#issuecomment-1923454127 @nastra thanks a lot for the review. I will create a new PR to move other classes in the extension after this.

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1475818719 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -245,13 +286,18 @@ public List listTables(Namespace namespace) { row -> Jd

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1475821544 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -503,6 +550,84 @@ public boolean namespaceExists(Namespace namespace) { return JdbcUtil.name

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1475826840 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcTableOperations.java: ## @@ -182,18 +169,13 @@ private void createTable(String newMetadataLocation) throws SQLExcept

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1475829107 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcTableOperations.java: ## @@ -71,7 +68,7 @@ public void doRefresh() { Map table; try { - table = getT

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1475829647 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java: ## @@ -303,7 +330,152 @@ public static Properties filterAndRemovePrefix(Map properties, S return res

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1475830202 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcViewOperations.java: ## @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [I] Remove `unwrap()` in `ManifestListWriter.close()` [iceberg-rust]

2024-02-02 Thread via GitHub
Fokko closed issue #177: Remove `unwrap()` in `ManifestListWriter.close()` URL: https://github.com/apache/iceberg-rust/issues/177

Re: [PR] refactor: Replace unwrap [iceberg-rust]

2024-02-02 Thread via GitHub
Fokko merged PR #183: URL: https://github.com/apache/iceberg-rust/pull/183

Re: [PR] Create ExtensionTestBase for migration to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
nastra merged PR #9613: URL: https://github.com/apache/iceberg/pull/9613

Re: [I] `pyiceberg.io.pyarrow.write_file` does not take into account compression settings [iceberg-python]

2024-02-02 Thread via GitHub
Fokko commented on issue #345: URL: https://github.com/apache/iceberg-python/issues/345#issuecomment-1923541517 @jonashaag Sure thing! This is where we need to pass in the configuration: https://github.com/apache/iceberg-python/blob/a4856bc2eadf90ac85dec96d4502ca3517bb1bb5/pyiceberg/i

Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

2024-02-02 Thread via GitHub
wooyeong commented on code in PR #9455: URL: https://github.com/apache/iceberg/pull/9455#discussion_r1475886118 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -405,15 +407,18 @@ public boolean equals(Object other) { return false;

Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

2024-02-02 Thread via GitHub
nastra commented on code in PR #9455: URL: https://github.com/apache/iceberg/pull/9455#discussion_r1475897611 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -131,12 +133,12 @@ public SparkTable(Table icebergTable, boolean refreshEagerly

Re: [PR] Improve error message in case of a mismatch [iceberg-python]

2024-02-02 Thread via GitHub
Fokko commented on code in PR #352: URL: https://github.com/apache/iceberg-python/pull/352#discussion_r1475909693 ## pyiceberg/schema.py: ## @@ -221,6 +227,11 @@ def find_type(self, name_or_id: Union[str, int], case_sensitive: bool = True) -> def highest_field_id(self) ->

Re: [PR] Improve error message in case of a mismatch [iceberg-python]

2024-02-02 Thread via GitHub
Fokko commented on code in PR #352: URL: https://github.com/apache/iceberg-python/pull/352#discussion_r1475910255 ## pyiceberg/table/__init__.py: ## @@ -133,6 +132,43 @@ _JAVA_LONG_MAX = 9223372036854775807 +def _check_schema(table_schema: Schema, other_schema: "pa.Schema")

[PR] Core: Add strictness flag to prevent loss of view representation when replacing a view [iceberg]

2024-02-02 Thread via GitHub
nastra opened a new pull request, #9620: URL: https://github.com/apache/iceberg/pull/9620 (no comment)

Re: [I] Exception occurred while writing to Iceberg tables by 'INSERT OVERWRITE' [iceberg]

2024-02-02 Thread via GitHub
rokity commented on issue #5384: URL: https://github.com/apache/iceberg/issues/5384#issuecomment-1923639270 @sanromeo @MoveLiu Before creating the temp view in PySpark, we must sort the DataFrame by the partitions specified in the SQL query. ```python df = df.repartition(*prt_cols).sortWi

Re: [PR] Core: HadoopTable needs to skip file cleanup after task failure under some boundary conditions. [iceberg]

2024-02-02 Thread via GitHub
BsoBird commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1476118577 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ## @@ -289,23 +341,58 @@ Path versionHintFile() { return metadataPath(Util.VERSION_HINT

Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

2024-02-02 Thread via GitHub
wooyeong commented on code in PR #9455: URL: https://github.com/apache/iceberg/pull/9455#discussion_r1476136487 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -131,12 +133,12 @@ public SparkTable(Table icebergTable, boolean refreshEager

Re: [I] Core: complete FileScanTaskParser for other FileScanTask implementation classes (like StaticDataTask) [iceberg]

2024-02-02 Thread via GitHub
nastra commented on issue #9597: URL: https://github.com/apache/iceberg/issues/9597#issuecomment-1924010269 > Are you suggesting adding a new API FileScanTask#type()? I was suggesting an enum type at the JSON level, not at the API level (similar to how it's done for `ReportMetricsRequ
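
The idea of carrying an enum type at the JSON level, rather than on the Java API, can be sketched roughly as below. This is only an illustration in Python: the `task-type` field name, the `TaskType` enum and its values, and both helper functions are hypothetical and are not taken from the Iceberg code base or from this thread.

```python
import json
from enum import Enum
from typing import Any, Dict, Tuple


class TaskType(Enum):
    # Hypothetical discriminator values, chosen only for this sketch.
    FILE_SCAN_TASK = "file-scan-task"
    STATIC_DATA_TASK = "static-data-task"


def task_to_json(task_type: TaskType, payload: Dict[str, Any]) -> str:
    """Serialize a task payload with a type discriminator in the JSON itself."""
    node = {"task-type": task_type.value}
    node.update(payload)
    return json.dumps(node)


def task_from_json(text: str) -> Tuple[TaskType, Dict[str, Any]]:
    """Read the discriminator first, then hand back the remaining fields."""
    node = json.loads(text)
    # TaskType(...) raises ValueError for an unknown discriminator value.
    task_type = TaskType(node.pop("task-type"))
    return task_type, node
```

With this shape a parser can dispatch on the discriminator without the task interface itself exposing a type accessor.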

Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

2024-02-02 Thread via GitHub
wooyeong commented on code in PR #9455: URL: https://github.com/apache/iceberg/pull/9455#discussion_r1476145477 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -131,12 +133,12 @@ public SparkTable(Table icebergTable, boolean refreshEager

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
nastra commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1476145516 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -503,6 +550,84 @@ public boolean namespaceExists(Namespace namespace) { return JdbcUtil.namesp

Re: [PR] Partition Evolution [iceberg-python]

2024-02-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1476173833 ## pyiceberg/table/__init__.py: ## @@ -2271,3 +2331,242 @@ def commit(self) -> Snapshot: ) return snapshot + + +class UpdateSpec:

Re: [PR] Spark: Bypass Spark's ViewCatalog API when replacing a view [iceberg]

2024-02-02 Thread via GitHub
nastra commented on code in PR #9596: URL: https://github.com/apache/iceberg/pull/9596#discussion_r1476195226 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java: ## @@ -608,6 +608,53 @@ public View createView( "Creating a view is not supported

Re: [PR] Partition Evolution [iceberg-python]

2024-02-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #245: URL: https://github.com/apache/iceberg-python/pull/245#discussion_r1476195413 ## pyiceberg/table/__init__.py: ## @@ -2271,3 +2331,242 @@ def commit(self) -> Snapshot: ) return snapshot + + +class UpdateSpec:

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1476197844 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -503,6 +550,84 @@ public boolean namespaceExists(Namespace namespace) { return JdbcUtil.name

Re: [I] Core: complete FileScanTaskParser for other FileScanTask implementation classes (like StaticDataTask) [iceberg]

2024-02-02 Thread via GitHub
stevenzwu commented on issue #9597: URL: https://github.com/apache/iceberg/issues/9597#issuecomment-1924110032 @nastra `ReportMetricsRequest` has the type at API level. ``` ReportType reportType(); ```

Re: [I] Add example of using PyIceberg with minimal external dependencies [iceberg-python]

2024-02-02 Thread via GitHub
kevinjqliu commented on issue #326: URL: https://github.com/apache/iceberg-python/issues/326#issuecomment-1924154815 that's a great idea! I'm thinking of adding this in [Getting started with PyIceberg](https://github.com/apache/iceberg-python/blob/main/mkdocs/docs/index.md) WDYT?

Re: [I] Iceberg with Glue Catalog updates glue table version on every commit, but there's a maximum of 100,000 versions [iceberg]

2024-02-02 Thread via GitHub
idrissa-mgs commented on issue #5965: URL: https://github.com/apache/iceberg/issues/5965#issuecomment-1924162113 @vshel Did you eventually find a long-term solution other than asking for an increase of the AWS soft limit on table versions? Did the skipArchive flag help?

Re: [I] What is Table Identifier? [iceberg-python]

2024-02-02 Thread via GitHub
kevinjqliu commented on issue #341: URL: https://github.com/apache/iceberg-python/issues/341#issuecomment-1924165256 So if I understand correctly, TableIdentifier consists of the namespace and the table name. Namespace can be multiple parts, for example ("com"."apache"."iceberg") or "com.ap
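
The reading above (a table identifier is a multi-part namespace followed by a table name) can be sketched in plain Python. This is a minimal illustration, not PyIceberg's API: the helper names `parse_identifier`, `namespace_of`, and `table_name_of`, and the table name `my_table`, are hypothetical.

```python
from typing import Tuple

# An identifier is modelled here as a tuple of string parts.
Identifier = Tuple[str, ...]


def parse_identifier(raw: str) -> Identifier:
    """Split a dotted string such as "com.apache.iceberg.my_table" into parts."""
    parts = tuple(raw.split("."))
    if len(parts) < 2 or any(not part for part in parts):
        raise ValueError(f"expected '<namespace>.<table>', got {raw!r}")
    return parts


def namespace_of(identifier: Identifier) -> Identifier:
    """Everything except the last part is the (possibly multi-part) namespace."""
    return identifier[:-1]


def table_name_of(identifier: Identifier) -> str:
    """The last part is the table name."""
    return identifier[-1]
```

Under these assumptions, `parse_identifier("com.apache.iceberg.my_table")` yields a four-part tuple whose first three parts form the namespace.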

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476250156 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/IcebergWriter.java: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476258875 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/IcebergWriterFactory.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Spark: Fix CREATE OR REPLACE VIEW when view doesn't exist [iceberg]

2024-02-02 Thread via GitHub
rdblue merged PR #9621: URL: https://github.com/apache/iceberg/pull/9621

Re: [PR] Push down group by for partition columns [iceberg]

2024-02-02 Thread via GitHub
rdblue closed pull request #6981: Push down group by for partition columns URL: https://github.com/apache/iceberg/pull/6981

Re: [PR] Push down group by for partition columns [iceberg]

2024-02-02 Thread via GitHub
rdblue commented on PR #6981: URL: https://github.com/apache/iceberg/pull/6981#issuecomment-1924191483 @amogh-jahagirdar, can you

[PR] Push down group by for partition columns [iceberg]

2024-02-02 Thread via GitHub
huaxingao opened a new pull request, #6981: URL: https://github.com/apache/iceberg/pull/6981 Push down min/max/count with group by if group by is on partition columns For example: ``` CREATE TABLE test (id LONG, ts TIMESTAMP, data INT) USING iceberg PARTITIONED BY (id, ts); S
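
Why grouping on partition columns makes this pushdown possible can be sketched as follows: every data file belongs to exactly one partition, so per-file statistics can be combined per partition without reading any rows. The sketch below is an assumption-laden illustration, not the PR's implementation: `FileStats` and `aggregate_by_partition` are hypothetical names, and it assumes exact per-file statistics with no delete files and no null values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: partition values plus column statistics."""
    partition: Tuple   # partition values for the file, e.g. (id, ts)
    record_count: int  # number of rows in the file
    data_min: int      # lower bound of the aggregated column
    data_max: int      # upper bound of the aggregated column


def aggregate_by_partition(files: List[FileStats]) -> Dict[Tuple, Tuple[int, int, int]]:
    """Return {partition: (count, min, max)} using only file-level statistics."""
    result: Dict[Tuple, Tuple[int, int, int]] = {}
    for f in files:
        if f.partition not in result:
            result[f.partition] = (f.record_count, f.data_min, f.data_max)
        else:
            count, lo, hi = result[f.partition]
            result[f.partition] = (
                count + f.record_count,
                min(lo, f.data_min),
                max(hi, f.data_max),
            )
    return result
```

If the grouping column were not a partition column, a single file could contribute to several groups and file-level bounds would no longer be enough.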

Re: [PR] Push down group by for partition columns [iceberg]

2024-02-02 Thread via GitHub
rdblue commented on PR #6981: URL: https://github.com/apache/iceberg/pull/6981#issuecomment-1924192337 Oops, I didn't mean to close this! I want to work on getting it in next

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476264616 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [I] Add example of using PyIceberg with minimal external dependencies [iceberg-python]

2024-02-02 Thread via GitHub
bitsondatadev commented on issue #326: URL: https://github.com/apache/iceberg-python/issues/326#issuecomment-1924196245 Yeah I want to sprinkle this literally everywhere in the docs so please go for it. I think this will be my preferred way of teaching Iceberg.

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476270090 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [I] Add example of using PyIceberg with minimal external dependencies [iceberg-python]

2024-02-02 Thread via GitHub
kevinjqliu commented on issue #326: URL: https://github.com/apache/iceberg-python/issues/326#issuecomment-1924204635 ++, this lowers the barrier to entry by a lot. It's a lot of work to spin up docker/s3/minio integration and hive 😱!

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476276022 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476280534 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476285119 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476289111 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] partitioned write support [iceberg-python]

2024-02-02 Thread via GitHub
syun64 commented on code in PR #353: URL: https://github.com/apache/iceberg-python/pull/353#discussion_r1476291241 ## pyiceberg/table/__init__.py: ## @@ -2467,3 +2462,131 @@ def commit(self) -> Snapshot: ) return snapshot + + +@dataclass(frozen=True) +cla

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476307388 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476325368 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-02-02 Thread via GitHub
odysa commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1924284690 > I mean we may need an extra layer for task scheduling, so that we can be adopted to any async runtime such as tokio, async-std. Do you want users to choose their own runtime

Re: [I] Core: complete FileScanTaskParser for other FileScanTask implementation classes (like StaticDataTask) [iceberg]

2024-02-02 Thread via GitHub
nastra commented on issue #9597: URL: https://github.com/apache/iceberg/issues/9597#issuecomment-1924328488 @stevenzwu this is only because `ReportMetricsRequest` is a REST request class for a `MetricsReport`. So in the case of this issue here we'd define the enum type at the JSON level in

Re: [PR] Kafka Connect: Sink connector with data writers and converters [iceberg]

2024-02-02 Thread via GitHub
bryanck commented on code in PR #9466: URL: https://github.com/apache/iceberg/pull/9466#discussion_r1476417816 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/SchemaUtils.java: ## @@ -0,0 +1,375 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] [WIP] Migrate SparkExtensions sub-classes to JUnit5 [iceberg]

2024-02-02 Thread via GitHub
nastra commented on PR #9624: URL: https://github.com/apache/iceberg/pull/9624#issuecomment-1924340442 > Will add all sub-classes in this PR depending on the size of the diff it's also fine to split this into 2-3 PRs. You could probably start within a specific package and combine subc

Re: [PR] Spark 3.5: Support executor cache locality [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9563: URL: https://github.com/apache/iceberg/pull/9563#discussion_r1476443719 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPlanningUtil.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1475821544 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java: ## @@ -503,6 +550,84 @@ public boolean namespaceExists(Namespace namespace) { return JdbcUtil.name

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on code in PR #9487: URL: https://github.com/apache/iceberg/pull/9487#discussion_r1475826840 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcTableOperations.java: ## @@ -182,18 +169,13 @@ private void createTable(String newMetadataLocation) throws SQLExcept

Re: [I] Add View Support to Spark [iceberg]

2024-02-02 Thread via GitHub
rdblue closed issue #7938: Add View Support to Spark URL: https://github.com/apache/iceberg/issues/7938

Re: [I] Add View Support to Spark [iceberg]

2024-02-02 Thread via GitHub
rdblue commented on issue #7938: URL: https://github.com/apache/iceberg/issues/7938#issuecomment-1924377622 I think we can call this one done with all of the PRs from @nastra that we've merged lately. Thanks @jzhuge and @nastra for getting this ready!

Re: [PR] Core: Add view support for JDBC catalog [iceberg]

2024-02-02 Thread via GitHub
jbonofre commented on PR #9487: URL: https://github.com/apache/iceberg/pull/9487#issuecomment-1924405491 @nastra @danielcweeks @rdblue I updated the PR. You can already do a new round (I'm still checking a couple of things, but it is ready for review).

Re: [PR] Spark 3.4: Read deletes in parallel and cache them on executors [iceberg]

2024-02-02 Thread via GitHub
szehon-ho commented on code in PR #9603: URL: https://github.com/apache/iceberg/pull/9603#discussion_r1476484200 ## spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/TestSparkExecutorCache.java: ## @@ -0,0 +1,444 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] partitioned write support [iceberg-python]

2024-02-02 Thread via GitHub
jqin61 commented on code in PR #353: URL: https://github.com/apache/iceberg-python/pull/353#discussion_r1476487793 ## pyiceberg/table/__init__.py: ## @@ -2467,3 +2462,131 @@ def commit(self) -> Snapshot: ) return snapshot + + +@dataclass(frozen=True) +cla

Re: [PR] Flink: change defaultFlinkVersion back to 1.18 [iceberg]

2024-02-02 Thread via GitHub
pvary merged PR #9625: URL: https://github.com/apache/iceberg/pull/9625

[PR] Flink: backport #9547 to 1.17 and 1.16 for Adds the ability to read from a branch on the Flink Iceberg Source [iceberg]

2024-02-02 Thread via GitHub
rodmeneses opened a new pull request, #9627: URL: https://github.com/apache/iceberg/pull/9627 1.17 came out clean. 1.16 came out clean after adding a couple of extra lines to `TestStreamScanSql`.

Re: [PR] Spark 3.4, 3.5: Use ProcedureInput for RewriteDataFiles [iceberg]

2024-02-02 Thread via GitHub
szehon-ho merged PR #8583: URL: https://github.com/apache/iceberg/pull/8583

Re: [I] Switch to using ProcedureInput for rewriteDataFiles [iceberg]

2024-02-02 Thread via GitHub
szehon-ho closed issue #8582: Switch to using ProcedureInput for rewriteDataFiles URL: https://github.com/apache/iceberg/issues/8582

Re: [PR] Spark 3.4, 3.5: Use ProcedureInput for RewriteDataFiles [iceberg]

2024-02-02 Thread via GitHub
szehon-ho commented on PR #8583: URL: https://github.com/apache/iceberg/pull/8583#issuecomment-1924504708 Merged, thanks @dramaticlly, and thanks all for additional reviews.

Re: [PR] Spark 3.4: Read deletes in parallel and cache them on executors [iceberg]

2024-02-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #9603: URL: https://github.com/apache/iceberg/pull/9603#discussion_r1476567943 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java: ## @@ -279,5 +284,29 @@ protected void markRowDeleted(InternalRow row) {

Re: [I] add type: Timestamp with nanosecond units [iceberg]

2024-02-02 Thread via GitHub
jacobmarble commented on issue #8657: URL: https://github.com/apache/iceberg/issues/8657#issuecomment-1924559864 Maintainers: please also assign @epgif

Re: [I] Add runtime module to enable concurrent load of manifest files. [iceberg-rust]

2024-02-02 Thread via GitHub
odysa commented on issue #124: URL: https://github.com/apache/iceberg-rust/issues/124#issuecomment-1924672646 > Do you want users to choose their own runtime like [sqlx](https://github.com/launchbadge/sqlx/tree/main?rgh-link-date=2024-02-02T17%3A02%3A32Z#install)? They are building an abstr

[PR] Add Daft examples and code into PyIceberg docs and Table [iceberg-python]

2024-02-02 Thread via GitHub
jaychia opened a new pull request, #355: URL: https://github.com/apache/iceberg-python/pull/355 1. Adds a new optional installation arg `daft`, so that `pip install pyiceberg[daft]` will pull Daft in as a dependency 2. Adds a new `Table.to_daft()` method to convert a table into a Daft da

Re: [PR] Spark 3.5: Support executor cache locality [iceberg]

2024-02-02 Thread via GitHub
aokolnychyi commented on code in PR #9563: URL: https://github.com/apache/iceberg/pull/9563#discussion_r1476795233 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPlanningUtil.java: ## @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (AS

[PR] Remove publish directive from .asf.yaml [iceberg-docs]

2024-02-02 Thread via GitHub
bitsondatadev opened a new pull request, #309: URL: https://github.com/apache/iceberg-docs/pull/309 (no comment)

Re: [PR] Add REST spec for data access mechanisms [iceberg]

2024-02-02 Thread via GitHub
rdblue commented on code in PR #9628: URL: https://github.com/apache/iceberg/pull/9628#discussion_r1476850002 ## open-api/rest-catalog-open-api.yaml: ## @@ -1453,6 +1456,23 @@ components: type: string example: "sales" +data-access: + name: X-Iceberg-Da

Re: [PR] Add REST spec for data access mechanisms [iceberg]

2024-02-02 Thread via GitHub
rdblue commented on code in PR #9628: URL: https://github.com/apache/iceberg/pull/9628#discussion_r1476851450 ## open-api/rest-catalog-open-api.yaml: ## @@ -1453,6 +1456,23 @@ components: type: string example: "sales" +data-access: + name: X-Iceberg-Da

Re: [PR] Remove publish directive from .asf.yaml [iceberg-docs]

2024-02-02 Thread via GitHub
danielcweeks merged PR #309: URL: https://github.com/apache/iceberg-docs/pull/309

Re: [PR] Remove nightly and add .asf.yaml [iceberg]

2024-02-02 Thread via GitHub
danielcweeks merged PR #9622: URL: https://github.com/apache/iceberg/pull/9622

Re: [PR] Add REST spec for data access mechanisms [iceberg]

2024-02-02 Thread via GitHub
danielcweeks commented on code in PR #9628: URL: https://github.com/apache/iceberg/pull/9628#discussion_r1476857183 ## open-api/rest-catalog-open-api.yaml: ## @@ -1453,6 +1456,23 @@ components: type: string example: "sales" +data-access: + name: X-Iceb

Re: [PR] Add REST spec for data access mechanisms [iceberg]

2024-02-02 Thread via GitHub
danielcweeks commented on code in PR #9628: URL: https://github.com/apache/iceberg/pull/9628#discussion_r1476859199 ## open-api/rest-catalog-open-api.yaml: ## @@ -1453,6 +1456,23 @@ components: type: string example: "sales" +data-access: + name: X-Iceb
