[GitHub] [iceberg] nastra commented on a diff in pull request #6274: Core|ORC|Spark: Remove deprecated functionality

2022-12-01 Thread GitBox
nastra commented on code in PR #6274: URL: https://github.com/apache/iceberg/pull/6274#discussion_r1036804341 ## data/src/test/java/org/apache/iceberg/data/orc/TestOrcRowIterator.java: ## @@ -74,7 +75,7 @@ public void writeFile() throws IOException { .schema(DATA_SC

[GitHub] [iceberg] nastra commented on a diff in pull request #6274: Core|ORC|Spark: Remove deprecated functionality

2022-12-01 Thread GitBox
nastra commented on code in PR #6274: URL: https://github.com/apache/iceberg/pull/6274#discussion_r1036806161 ## spark/v3.1/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteMergeInto.scala: ## @@ -228,13 +225,6 @@ case class RewriteMergeInto(spark:

[GitHub] [iceberg] nastra commented on a diff in pull request #6274: Core|ORC|Spark: Remove deprecated functionality

2022-12-01 Thread GitBox
nastra commented on code in PR #6274: URL: https://github.com/apache/iceberg/pull/6274#discussion_r1036809785 ## core/src/main/java/org/apache/iceberg/deletes/Deletes.java: ## @@ -83,21 +83,6 @@ public static CloseableIterable markDeleted( }); } - /** - * Retur

[GitHub] [iceberg] nastra commented on a diff in pull request #6274: Core|ORC|Spark: Remove deprecated functionality

2022-12-01 Thread GitBox
nastra commented on code in PR #6274: URL: https://github.com/apache/iceberg/pull/6274#discussion_r1036810161 ## core/src/main/java/org/apache/iceberg/TableProperties.java: ## @@ -342,19 +335,6 @@ private TableProperties() {} public static final String MERGE_MODE = "write.mer

[GitHub] [iceberg] nastra commented on a diff in pull request #6274: Core|ORC|Spark: Remove deprecated functionality

2022-12-01 Thread GitBox
nastra commented on code in PR #6274: URL: https://github.com/apache/iceberg/pull/6274#discussion_r1036815990 ## core/src/main/java/org/apache/iceberg/LocationProviders.java: ## @@ -84,10 +84,7 @@ static class DefaultLocationProvider implements LocationProvider { private s

[GitHub] [iceberg] nastra commented on a diff in pull request #6274: Core|ORC|Spark: Remove deprecated functionality

2022-12-01 Thread GitBox
nastra commented on code in PR #6274: URL: https://github.com/apache/iceberg/pull/6274#discussion_r1036815756 ## core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java: ## @@ -758,20 +758,6 @@ protected Map summary() { return summaryBuilder.build(); } - /*

[GitHub] [iceberg] nastra commented on a diff in pull request #6274: Core|ORC|Spark: Remove deprecated functionality

2022-12-01 Thread GitBox
nastra commented on code in PR #6274: URL: https://github.com/apache/iceberg/pull/6274#discussion_r1036817726 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReader.java: ## @@ -144,16 +143,16 @@ public void testInt96TimestampProducedBySparkIsRea

[GitHub] [iceberg] lichaohao opened a new issue, #6330: iceberg : format-version=2 , when the job is running (insert and update), can not execute rewrite small data file ?

2022-12-01 Thread GitBox
lichaohao opened a new issue, #6330: URL: https://github.com/apache/iceberg/issues/6330 ### Query engine iceberg:1.0.0 spark:3.2.0 flink:1.13.2 catalog:hive-catalog ### Question iceberg:1.0.0 spark:3.2.0 flink:1.13.2 catalog:hive-catalog s

[GitHub] [iceberg] SHuixo commented on issue #6330: iceberg : format-version=2 , when the job is running (insert and update), can not execute rewrite small data file ?

2022-12-01 Thread GitBox
SHuixo commented on issue #6330: URL: https://github.com/apache/iceberg/issues/6330#issuecomment-1333486031 This situation is the same as I have encountered #6104, and in the current version, there is no effective way to support compressing historical data files containing delete operations

[GitHub] [iceberg] ConeyLiu opened a new pull request, #6331: Port #4627 to Spark 2.4/3.1/3.2 and Flink 1.14/1.15

2022-12-01 Thread GitBox
ConeyLiu opened a new pull request, #6331: URL: https://github.com/apache/iceberg/pull/6331 This PR just ported #4627 to Spark 2.4/3.1/3.2 and Flink 1.14/1.15. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

[GitHub] [iceberg] chenjunjiedada commented on pull request #6331: Port #4627 to Spark 2.4/3.1/3.2 and Flink 1.14/1.15

2022-12-01 Thread GitBox
chenjunjiedada commented on PR #6331: URL: https://github.com/apache/iceberg/pull/6331#issuecomment-1333509100 What about 1.16?

[GitHub] [iceberg] ajantha-bhat commented on pull request #6331: Port #4627 to Spark 2.4/3.1/3.2 and Flink 1.14/1.15

2022-12-01 Thread GitBox
ajantha-bhat commented on PR #6331: URL: https://github.com/apache/iceberg/pull/6331#issuecomment-1333580498 > What about 1.16? I think flink-1.16 and spark-3.3 are handled in the original PR itself. https://github.com/apache/iceberg/pull/4627

[GitHub] [iceberg] ajantha-bhat commented on pull request #6331: Port #4627 to Spark 2.4/3.1/3.2 and Flink 1.14/1.15

2022-12-01 Thread GitBox
ajantha-bhat commented on PR #6331: URL: https://github.com/apache/iceberg/pull/6331#issuecomment-1333582482 nit: we could at least split it into one spark and one flink PR.

[GitHub] [iceberg] nastra commented on issue #6318: executor logs ton of `INFO CodecPool: Got brand-new decompressor [.zstd]`

2022-12-01 Thread GitBox
nastra commented on issue #6318: URL: https://github.com/apache/iceberg/issues/6318#issuecomment-1333600955 I've looked at the code and this happened even before #5681. The decompressors are actually being re-used, but not across different Parquet Files. So this means in your case you have
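
As a rough illustration of the reuse behavior described in this comment, the sketch below models a reader whose decompressor cache lives only as long as one Parquet file. It is a simplified model, not Parquet's or Hadoop's actual code: `FileScopedCodecCache` and `read_files` are hypothetical names, and each file is reduced to a list of page codecs. Under that assumption, the number of "brand-new decompressor" events grows with the number of files read rather than the number of pages.

```python
from typing import Dict, List


class FileScopedCodecCache:
    """Hypothetical cache that reuses decompressors only within a single file."""

    def __init__(self) -> None:
        self._decompressors: Dict[str, object] = {}
        self.created = 0  # number of "brand-new decompressor" events

    def get(self, codec: str) -> object:
        # Reuse the decompressor if this file already created one for the codec.
        if codec not in self._decompressors:
            self._decompressors[codec] = object()  # stand-in for a real decompressor
            self.created += 1
        return self._decompressors[codec]


def read_files(files: List[List[str]]) -> int:
    """Return the total number of decompressors created across all files.

    Each file is modeled as a list of page codecs. A fresh cache is used per
    file, so nothing is shared between files.
    """
    total = 0
    for pages in files:
        cache = FileScopedCodecCache()
        for codec in pages:
            cache.get(codec)
        total += cache.created
    return total
```

In this model, three small files with five zstd pages each create three decompressors, while one file with fifteen pages creates only one, which is why fewer, larger files produce fewer of these log lines.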

[GitHub] [iceberg] camper42 closed issue #6318: executor logs ton of `INFO CodecPool: Got brand-new decompressor [.zstd]`

2022-12-01 Thread GitBox
camper42 closed issue #6318: executor logs ton of `INFO CodecPool: Got brand-new decompressor [.zstd]` URL: https://github.com/apache/iceberg/issues/6318

[GitHub] [iceberg] camper42 commented on issue #6318: executor logs ton of `INFO CodecPool: Got brand-new decompressor [.zstd]`

2022-12-01 Thread GitBox
camper42 commented on issue #6318: URL: https://github.com/apache/iceberg/issues/6318#issuecomment-1333612142 thx, so for my scenario, I may need to compact more promptly and/or set the log level of the `CodecPool` to `WARN`

[GitHub] [iceberg] ConeyLiu commented on pull request #6331: Port #4627 to Spark 2.4/3.1/3.2 and Flink 1.14/1.15

2022-12-01 Thread GitBox
ConeyLiu commented on PR #6331: URL: https://github.com/apache/iceberg/pull/6331#issuecomment-1333635843 Thanks @chenjunjiedada @ajantha-bhat for the review. > What about 1.16? It is already done in #4627. > nit: we could at least split it into one spark and one flink PR or o

[GitHub] [iceberg] chenjunjiedada commented on pull request #6331: Port #4627 to Spark 2.4/3.1/3.2 and Flink 1.14/1.15

2022-12-01 Thread GitBox
chenjunjiedada commented on PR #6331: URL: https://github.com/apache/iceberg/pull/6331#issuecomment-1333640055 Well, I thought flink 1.16 was supported recently. Never thought the change was already in it.

[GitHub] [iceberg] loleek opened a new issue, #6332: Create tables error when using JDBC catalog and mysql backend

2022-12-01 Thread GitBox
loleek opened a new issue, #6332: URL: https://github.com/apache/iceberg/issues/6332 ### Apache Iceberg version 1.1.0 (latest release) ### Query engine _No response_ ### Please describe the bug 🐞 When I use jdbc catalog with mysql backend in flink test case,

[GitHub] [iceberg] ConeyLiu opened a new pull request, #6333: Port #4627 to Flink 1.14/1.15

2022-12-01 Thread GitBox
ConeyLiu opened a new pull request, #6333: URL: https://github.com/apache/iceberg/pull/6333 This PR just ported #4627 to Flink 1.14/1.15.

[GitHub] [iceberg] nastra commented on issue #6332: Create tables error when using JDBC catalog and mysql backend

2022-12-01 Thread GitBox
nastra commented on issue #6332: URL: https://github.com/apache/iceberg/issues/6332#issuecomment-1333663036 The `max key length limit=767` is actually a limit that is imposed by the underlying database being used (MySQL in this case). A valid approach would be to increase that limit on MySQ
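
To see why MySQL rejects these tables, it helps to do the byte arithmetic for an index key. The sketch below assumes the `utf8mb4` character set (up to 4 bytes per character) and counts each `VARCHAR(n)` key column as `n * 4` bytes in the worst case; the limit to compare against is whichever one the server reports (767 or 3072 bytes in the errors quoted in this thread, depending on MySQL version and InnoDB row format). `index_key_bytes` and `fits` are hypothetical helpers, and the column lengths in the tests are illustrative, not the exact `JdbcCatalog` DDL.

```python
from typing import Sequence

# Assumption: utf8mb4, where a character may take up to 4 bytes.
UTF8MB4_BYTES_PER_CHAR = 4


def index_key_bytes(varchar_lengths: Sequence[int],
                    bytes_per_char: int = UTF8MB4_BYTES_PER_CHAR) -> int:
    """Worst-case size in bytes of an index over the given VARCHAR columns."""
    return sum(length * bytes_per_char for length in varchar_lengths)


def fits(varchar_lengths: Sequence[int], limit_bytes: int) -> bool:
    """True if the worst-case key size stays within the reported limit."""
    return index_key_bytes(varchar_lengths) <= limit_bytes
```

For example, a single `VARCHAR(255)` column already needs 1020 bytes, which exceeds a 767-byte limit, while three `VARCHAR(255)` columns need 3060 bytes and just fit under 3072. A hypothetical key that includes a 5500-character column is far over either limit, which is why lowering column lengths helps.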

[GitHub] [iceberg] nastra closed issue #6273: java.sql.SQLException: Access denied for user 'hive'@'hadoopSlave0' (using password: YES)

2022-12-01 Thread GitBox
nastra closed issue #6273: java.sql.SQLException: Access denied for user 'hive'@'hadoopSlave0' (using password: YES) URL: https://github.com/apache/iceberg/issues/6273

[GitHub] [iceberg] nastra commented on issue #6273: java.sql.SQLException: Access denied for user 'hive'@'hadoopSlave0' (using password: YES)

2022-12-01 Thread GitBox
nastra commented on issue #6273: URL: https://github.com/apache/iceberg/issues/6273#issuecomment-1333664922 I'll close this for now. Feel free to re-open if necessary.

[GitHub] [iceberg] nastra commented on issue #4092: Not an iceberg table Error use hive catalog-type

2022-12-01 Thread GitBox
nastra commented on issue #4092: URL: https://github.com/apache/iceberg/issues/4092#issuecomment-1333668098 Closing this for now. Feel free to re-open if necessary.

[GitHub] [iceberg] nastra closed issue #4092: Not an iceberg table Error use hive catalog-type

2022-12-01 Thread GitBox
nastra closed issue #4092: Not an iceberg table Error use hive catalog-type URL: https://github.com/apache/iceberg/issues/4092

[GitHub] [iceberg] nastra commented on issue #6236: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 3072 bytes

2022-12-01 Thread GitBox
nastra commented on issue #6236: URL: https://github.com/apache/iceberg/issues/6236#issuecomment-1333670878 @yuangjiang you can probably take a look at https://dev.mysql.com/doc/refman/8.0/en/innodb-limits.html

[GitHub] [iceberg] nastra commented on issue #5027: JDBC Catalog create table DDL is not compatible for MYSQL 8.0.29

2022-12-01 Thread GitBox
nastra commented on issue #5027: URL: https://github.com/apache/iceberg/issues/5027#issuecomment-1333676057 @noneback are you able to increase the limit on MySql? Usually different databases impose different limits, but I'll see if we can lower namespace properties from 5500.

[GitHub] [iceberg] Fokko opened a new pull request, #6334: Python: Add warning on projection by name

2022-12-01 Thread GitBox
Fokko opened a new pull request, #6334: URL: https://github.com/apache/iceberg/pull/6334 ```python ➜ python git:(master) ✗ python3 Python 3.10.8 (main, Oct 13 2022, 09:48:40) [Clang 14.0.0 (clang-1400.0.29.102)] on darwin Type "help", "copyright", "credits" or "license" for more in

[GitHub] [iceberg] hililiwei commented on a diff in pull request #5967: Flink: Support read options in flink source

2022-12-01 Thread GitBox
hililiwei commented on code in PR #5967: URL: https://github.com/apache/iceberg/pull/5967#discussion_r1037105073 ## docs/flink-getting-started.md: ## @@ -683,7 +683,47 @@ env.execute("Test Iceberg DataStream"); OVERWRITE and UPSERT can't be set together. In UPSERT mode, if the

[GitHub] [iceberg] hililiwei commented on a diff in pull request #5967: Flink: Support read options in flink source

2022-12-01 Thread GitBox
hililiwei commented on code in PR #5967: URL: https://github.com/apache/iceberg/pull/5967#discussion_r1037109651 ## docs/flink-getting-started.md: ## @@ -683,7 +683,47 @@ env.execute("Test Iceberg DataStream"); OVERWRITE and UPSERT can't be set together. In UPSERT mode, if the

[GitHub] [iceberg] hililiwei commented on a diff in pull request #5967: Flink: Support read options in flink source

2022-12-01 Thread GitBox
hililiwei commented on code in PR #5967: URL: https://github.com/apache/iceberg/pull/5967#discussion_r1037113844 ## docs/flink-getting-started.md: ## @@ -683,7 +683,47 @@ env.execute("Test Iceberg DataStream"); OVERWRITE and UPSERT can't be set together. In UPSERT mode, if the

[GitHub] [iceberg] hililiwei commented on pull request #5967: Flink: Support read options in flink source

2022-12-01 Thread GitBox
hililiwei commented on PR #5967: URL: https://github.com/apache/iceberg/pull/5967#issuecomment-1333783891 > > Hmm, I understand your concern now. The job-level configuration should have a connector prefix that makes sense to me, shall we consider using the same prefix? > > Yeah. I wa

[GitHub] [iceberg] robinsinghstudios commented on issue #5977: How to write to a bucket-partitioned table using PySpark?

2022-12-01 Thread GitBox
robinsinghstudios commented on issue #5977: URL: https://github.com/apache/iceberg/issues/5977#issuecomment-1333788692 Hi, My data is being appended to the partitioned table without registering any UDFs but the data seems to be written as one row per file which is creating a huge per

[GitHub] [iceberg] ConeyLiu opened a new pull request, #6335: Core: Avoid generating a large ManifestFile when committing

2022-12-01 Thread GitBox
ConeyLiu opened a new pull request, #6335: URL: https://github.com/apache/iceberg/pull/6335 In our production env, we noticed the manifest files have a large random size, ranging from several KB to larger than 100 MB. It seems the `MANIFEST_TARGET_SIZE_BYTES` has not worked during the comm
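
The idea of keeping manifests near a target size can be illustrated with a simple rolling writer: start a new manifest whenever adding the next entry would push the current one past the target. This is a hypothetical sketch (`split_by_target_size` is not an Iceberg API, and entry sizes are given up front rather than estimated while writing); it only shows the grouping logic that a size target such as `MANIFEST_TARGET_SIZE_BYTES` implies, not the change proposed in this PR.

```python
from typing import List


def split_by_target_size(entry_sizes: List[int], target_bytes: int) -> List[List[int]]:
    """Group entry sizes into manifests whose totals stay at or below target_bytes.

    An entry that is larger than the target on its own still gets its own manifest.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    manifests: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in entry_sizes:
        # Roll to a new manifest if this entry would exceed the target.
        if current and current_size + size > target_bytes:
            manifests.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        manifests.append(current)
    return manifests
```

With a target of 100, four entries of 40 bytes end up in two manifests of 80 bytes each instead of a single 160-byte manifest.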

[GitHub] [iceberg] hililiwei commented on pull request #5967: Flink: Support read options in flink source

2022-12-01 Thread GitBox
hililiwei commented on PR #5967: URL: https://github.com/apache/iceberg/pull/5967#issuecomment-1333793436 > 2. some of the read configs don't make sense to set in table environment configs, as they are tied to a specific source/table. How should handle this situation? > > ```

[GitHub] [iceberg] ConeyLiu commented on pull request #6335: Core: Avoid generating a large ManifestFile when committing

2022-12-01 Thread GitBox
ConeyLiu commented on PR #6335: URL: https://github.com/apache/iceberg/pull/6335#issuecomment-1333794619 Hi @szehon-ho @rdblue @pvary @stevenzwu @chenjunjiedada pls help to review this when you are free. Thanks a lot.

[GitHub] [iceberg] ConeyLiu commented on pull request #6331: Port #4627 to Spark 2.4/3.1/3.2

2022-12-01 Thread GitBox
ConeyLiu commented on PR #6331: URL: https://github.com/apache/iceberg/pull/6331#issuecomment-1333796211 cc @szehon-ho @pvary @kbendick who reviewed the original pr.

[GitHub] [iceberg] ConeyLiu commented on pull request #6333: Port #4627 to Flink 1.14/1.15

2022-12-01 Thread GitBox
ConeyLiu commented on PR #6333: URL: https://github.com/apache/iceberg/pull/6333#issuecomment-1333796294 cc @szehon-ho @pvary @kbendick who reviewed the original pr.

[GitHub] [iceberg] hililiwei opened a new pull request, #6336: Doc: Replace build with append in the Flink sink doc

2022-12-01 Thread GitBox
hililiwei opened a new pull request, #6336: URL: https://github.com/apache/iceberg/pull/6336 `build()` has been removed.

[GitHub] [iceberg] hililiwei commented on a diff in pull request #6222: Flink: Support inspecting table

2022-12-01 Thread GitBox
hililiwei commented on code in PR #6222: URL: https://github.com/apache/iceberg/pull/6222#discussion_r1037191643 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/data/StructRowData.java: ## @@ -0,0 +1,340 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [iceberg] InvisibleProgrammer opened a new pull request, #6337: Update Iceberg Hive documentation

2022-12-01 Thread GitBox
InvisibleProgrammer opened a new pull request, #6337: URL: https://github.com/apache/iceberg/pull/6337 Issue: https://github.com/apache/iceberg/issues/6249 This documentation contains the list of new features introduced in Hive 4.0.0-alpha-2. The only exception is https://issue

[GitHub] [iceberg] hililiwei commented on a diff in pull request #6222: Flink: Support inspecting table

2022-12-01 Thread GitBox
hililiwei commented on code in PR #6222: URL: https://github.com/apache/iceberg/pull/6222#discussion_r1037252208 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/data/StructRowData.java: ## @@ -0,0 +1,340 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [iceberg] hililiwei commented on a diff in pull request #6222: Flink: Support inspecting table

2022-12-01 Thread GitBox
hililiwei commented on code in PR #6222: URL: https://github.com/apache/iceberg/pull/6222#discussion_r1037253800 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/FlinkCatalog.java: ## @@ -148,6 +150,17 @@ private Namespace toNamespace(String database) { } Tabl

[GitHub] [iceberg] RussellSpitzer commented on issue #6326: estimateStatistics cost mush time to compute stats

2022-12-01 Thread GitBox
RussellSpitzer commented on issue #6326: URL: https://github.com/apache/iceberg/issues/6326#issuecomment-1333946715 While I don't have a problem with disabling statistics reporting, I am pretty dubious this takes that long. What I believe you are actually seeing is the task list being creat

[GitHub] [iceberg] nastra opened a new pull request, #6338: Core: Use lower lengths for iceberg_namespace_properties / iceberg_tables in JdbcCatalog

2022-12-01 Thread GitBox
nastra opened a new pull request, #6338: URL: https://github.com/apache/iceberg/pull/6338 Users are running into issues when hooking up the `JdbcCatalog` with `MySql` or other Databases, which actually impose lower limits than [sqlite](https://www.sqlite.org/limits.html) (which we use for t

[GitHub] [iceberg] nastra commented on a diff in pull request #6338: Core: Use lower lengths for iceberg_namespace_properties / iceberg_tables in JdbcCatalog

2022-12-01 Thread GitBox
nastra commented on code in PR #6338: URL: https://github.com/apache/iceberg/pull/6338#discussion_r1037273853 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java: ## @@ -185,9 +185,9 @@ final class JdbcUtil { + NAMESPACE_NAME + " VARCHAR(255) NOT NU

[GitHub] [iceberg] nastra commented on a diff in pull request #6338: Core: Use lower lengths for iceberg_namespace_properties / iceberg_tables in JdbcCatalog

2022-12-01 Thread GitBox
nastra commented on code in PR #6338: URL: https://github.com/apache/iceberg/pull/6338#discussion_r1037276526 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java: ## @@ -72,9 +72,9 @@ final class JdbcUtil { + TABLE_NAME + " VARCHAR(255) NOT NULL,"

[GitHub] [iceberg] nastra commented on pull request #6338: Core: Use lower lengths for iceberg_namespace_properties / iceberg_tables in JdbcCatalog

2022-12-01 Thread GitBox
nastra commented on PR #6338: URL: https://github.com/apache/iceberg/pull/6338#issuecomment-1333965778 @openinx given that you reviewed https://github.com/apache/iceberg/pull/2778 back then, could you review this one as well please?

[GitHub] [iceberg] nastra commented on issue #6332: Create tables error when using JDBC catalog and mysql backend

2022-12-01 Thread GitBox
nastra commented on issue #6332: URL: https://github.com/apache/iceberg/issues/6332#issuecomment-1333969634 I have created https://github.com/apache/iceberg/pull/6338 to make this less of an issue with MySql

[GitHub] [iceberg] nastra commented on issue #5027: JDBC Catalog create table DDL is not compatible for MYSQL 8.0.29

2022-12-01 Thread GitBox
nastra commented on issue #5027: URL: https://github.com/apache/iceberg/issues/5027#issuecomment-1333969840 I have created https://github.com/apache/iceberg/pull/6338 to make this less of an issue with MySql

[GitHub] [iceberg] nastra commented on issue #6236: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 3072 bytes

2022-12-01 Thread GitBox
nastra commented on issue #6236: URL: https://github.com/apache/iceberg/issues/6236#issuecomment-1333970109 I have created https://github.com/apache/iceberg/pull/6338 to make this less of an issue with MySql

[GitHub] [iceberg] RussellSpitzer commented on issue #5977: How to write to a bucket-partitioned table using PySpark?

2022-12-01 Thread GitBox
RussellSpitzer commented on issue #5977: URL: https://github.com/apache/iceberg/issues/5977#issuecomment-1333976156 @robinsinghstudios data needs to be globally sorted on the bucketing function, at least that is my guess with those symptoms. Luckily this should all be less of an issue once
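
A simplified model of the symptom: if a writer keeps only one file open and starts a new file every time the bucket value changes, unsorted input produces roughly one file per row, while input sorted by bucket produces one file per bucket. The sketch assumes that writer behavior and uses `value % num_buckets` as a stand-in for Iceberg's bucket transform (the real transform hashes the value first); `bucket`, `files_written`, and `sorted_by_bucket` are hypothetical helpers, not Iceberg's writer code.

```python
from itertools import groupby
from typing import Iterable, List


def bucket(value: int, num_buckets: int) -> int:
    """Stand-in bucket function; Iceberg's real transform hashes the value first."""
    return value % num_buckets


def files_written(values: Iterable[int], num_buckets: int) -> int:
    """Count files for a writer that opens a new file whenever the bucket changes."""
    buckets: List[int] = [bucket(v, num_buckets) for v in values]
    # Each maximal run of identical consecutive buckets becomes one file.
    return sum(1 for _ in groupby(buckets))


def sorted_by_bucket(values: Iterable[int], num_buckets: int) -> List[int]:
    """Sort rows by their bucket so that each bucket forms one contiguous run."""
    return sorted(values, key=lambda v: bucket(v, num_buckets))
```

With eight rows `0..7` and two buckets, the unsorted input alternates buckets and yields eight files, whereas sorting by bucket first yields two.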

[GitHub] [iceberg] aokolnychyi opened a new pull request, #6339: Spark 3.3: Add hours transform function

2022-12-01 Thread GitBox
aokolnychyi opened a new pull request, #6339: URL: https://github.com/apache/iceberg/pull/6339 This PR adds `hours` transform function in Spark 3.3.
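
For reference, Iceberg's `hour` partition transform maps a timestamp to the number of whole hours since 1970-01-01 00:00:00 UTC. The sketch below computes that value in Python for timezone-aware timestamps. `hours_from_epoch` is a hypothetical helper that illustrates the transform's semantics, not the Spark function added in this PR; it uses integer floor division, so in this sketch pre-epoch timestamps round toward negative infinity.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MICROS_PER_HOUR = 3_600 * 1_000_000


def hours_from_epoch(ts: datetime) -> int:
    """Whole hours between the Unix epoch and ts (ts must be timezone-aware)."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = ts - EPOCH
    # Work in integer microseconds to avoid floating-point rounding.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros // MICROS_PER_HOUR
```

For example, `2022-12-01` at any minute past midnight and `2022-12-01 00:59` land in the same hourly value, which is what makes the transform useful for hourly partitioning.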

[GitHub] [iceberg] aokolnychyi commented on pull request #6339: Spark 3.3: Add hours transform function

2022-12-01 Thread GitBox
aokolnychyi commented on PR #6339: URL: https://github.com/apache/iceberg/pull/6339#issuecomment-1334029975 cc @kbendick @rdblue @RussellSpitzer @szehon-ho @flyrain @gaborkaszab @nastra

[GitHub] [iceberg] danielcweeks merged pull request #6338: Core: Use lower lengths for iceberg_namespace_properties / iceberg_tables in JdbcCatalog

2022-12-01 Thread GitBox
danielcweeks merged PR #6338: URL: https://github.com/apache/iceberg/pull/6338

[GitHub] [iceberg] danielcweeks closed issue #5027: JDBC Catalog create table DDL is not compatible for MYSQL 8.0.29

2022-12-01 Thread GitBox
danielcweeks closed issue #5027: JDBC Catalog create table DDL is not compatible for MYSQL 8.0.29 URL: https://github.com/apache/iceberg/issues/5027

[GitHub] [iceberg] danielcweeks closed issue #6236: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 3072 bytes

2022-12-01 Thread GitBox
danielcweeks closed issue #6236: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 3072 bytes URL: https://github.com/apache/iceberg/issues/6236

[GitHub] [iceberg] danielcweeks closed issue #6332: Create tables error when using JDBC catalog and mysql backend

2022-12-01 Thread GitBox
danielcweeks closed issue #6332: Create tables error when using JDBC catalog and mysql backend URL: https://github.com/apache/iceberg/issues/6332

[GitHub] [iceberg] danielcweeks merged pull request #6322: Core: Fix NPE in CloseableIterable.close()

2022-12-01 Thread GitBox
danielcweeks merged PR #6322: URL: https://github.com/apache/iceberg/pull/6322

[GitHub] [iceberg] danielcweeks merged pull request #6317: Core: MetadataUpdateParser should write updates/removals fields rather than updated/removed

2022-12-01 Thread GitBox
danielcweeks merged PR #6317: URL: https://github.com/apache/iceberg/pull/6317

[GitHub] [iceberg] alesk opened a new issue, #6340: Deleting column from an iceberg table breaks schema in AWS Glue catalog

2022-12-01 Thread GitBox
alesk opened a new issue, #6340: URL: https://github.com/apache/iceberg/issues/6340 ### Apache Iceberg version 1.1.0 (latest release) ### Query engine Spark ### Please describe the bug 🐞 When deleting a column with Spark SQL using `ALTER TABLE table_name DRO

[GitHub] [iceberg] nastra commented on issue #6340: Deleting a column from an iceberg table breaks schema in AWS Glue catalog

2022-12-01 Thread GitBox
nastra commented on issue #6340: URL: https://github.com/apache/iceberg/issues/6340#issuecomment-1334099047 @amogh-jahagirdar could you please take a look at this one?

[GitHub] [iceberg] rdblue commented on a diff in pull request #6338: Core: Use lower lengths for iceberg_namespace_properties / iceberg_tables in JdbcCatalog

2022-12-01 Thread GitBox
rdblue commented on code in PR #6338: URL: https://github.com/apache/iceberg/pull/6338#discussion_r1037388592 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcUtil.java: ## @@ -72,9 +72,9 @@ final class JdbcUtil { + TABLE_NAME + " VARCHAR(255) NOT NULL,"

[GitHub] [iceberg] aokolnychyi merged pull request #6289: Spark 2.4: Preserve file seq numbers while rewriting manifests

2022-12-01 Thread GitBox
aokolnychyi merged PR #6289: URL: https://github.com/apache/iceberg/pull/6289

[GitHub] [iceberg] aokolnychyi commented on pull request #6289: Spark 2.4: Preserve file seq numbers while rewriting manifests

2022-12-01 Thread GitBox
aokolnychyi commented on PR #6289: URL: https://github.com/apache/iceberg/pull/6289#issuecomment-1334119345 Thanks, @nastra! Sorry for the delay.

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6274: Core|ORC|Spark: Remove deprecated functionality

2022-12-01 Thread GitBox
aokolnychyi commented on code in PR #6274: URL: https://github.com/apache/iceberg/pull/6274#discussion_r1037396694 ## spark/v2.4/spark/src/main/java/org/apache/iceberg/spark/actions/BaseRewriteManifestsSparkAction.java: ## @@ -356,7 +356,7 @@ private static ManifestFile writeMan

[GitHub] [iceberg] aokolnychyi commented on pull request #6274: Core|ORC|Spark: Remove deprecated functionality

2022-12-01 Thread GitBox
aokolnychyi commented on PR #6274: URL: https://github.com/apache/iceberg/pull/6274#issuecomment-1334121747 I'll take a look at seq number related changes today.

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6309: Spark 3.3: Consume arbitrary scans in SparkBatchQueryScan

2022-12-01 Thread GitBox
aokolnychyi commented on code in PR #6309: URL: https://github.com/apache/iceberg/pull/6309#discussion_r1037409951 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java: ## @@ -63,7 +64,7 @@ class SparkBatchQueryScan extends SparkScan impleme

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6309: Spark 3.3: Consume arbitrary scans in SparkBatchQueryScan

2022-12-01 Thread GitBox
aokolnychyi commented on code in PR #6309: URL: https://github.com/apache/iceberg/pull/6309#discussion_r1037410284 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -213,40 +215,53 @@ public Scan build() { Schema expectedSchema

[GitHub] [iceberg] aokolnychyi commented on pull request #6309: Spark 3.3: Consume arbitrary scans in SparkBatchQueryScan

2022-12-01 Thread GitBox
aokolnychyi commented on PR #6309: URL: https://github.com/apache/iceberg/pull/6309#issuecomment-1334137970 I tried adding boundaries but it was mostly useless as we need to support arbitrary types. The current approach seems most reasonable at the moment.

[GitHub] [iceberg] rdblue merged pull request #6328: Python: Set version to 0.2.0

2022-12-01 Thread GitBox
rdblue merged PR #6328: URL: https://github.com/apache/iceberg/pull/6328

[GitHub] [iceberg] rdblue merged pull request #6334: Python: Add warning on projection by name

2022-12-01 Thread GitBox
rdblue merged PR #6334: URL: https://github.com/apache/iceberg/pull/6334

[GitHub] [iceberg] nastra commented on a diff in pull request #6274: Core|ORC|Spark: Remove deprecated functionality

2022-12-01 Thread GitBox
nastra commented on code in PR #6274: URL: https://github.com/apache/iceberg/pull/6274#discussion_r1037420331 ## spark/v2.4/spark/src/main/java/org/apache/iceberg/spark/actions/BaseRewriteManifestsSparkAction.java: ## @@ -356,7 +356,7 @@ private static ManifestFile writeManifest

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6309: Spark 3.3: Consume arbitrary scans in SparkBatchQueryScan

2022-12-01 Thread GitBox
RussellSpitzer commented on code in PR #6309: URL: https://github.com/apache/iceberg/pull/6309#discussion_r1037424678 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -211,11 +213,19 @@ public Scan build() { SparkReadOptions

[GitHub] [iceberg] szehon-ho commented on pull request #6314: Core: Re-add and deprecate HMS_TABLE_OWNER to TableProperties

2022-12-01 Thread GitBox
szehon-ho commented on PR #6314: URL: https://github.com/apache/iceberg/pull/6314#issuecomment-1334176715 Thanks a lot for the fix

[GitHub] [iceberg] aokolnychyi merged pull request #6339: Spark 3.3: Add hours transform function

2022-12-01 Thread GitBox
aokolnychyi merged PR #6339: URL: https://github.com/apache/iceberg/pull/6339

[GitHub] [iceberg] aokolnychyi commented on pull request #6339: Spark 3.3: Add hours transform function

2022-12-01 Thread GitBox
aokolnychyi commented on PR #6339: URL: https://github.com/apache/iceberg/pull/6339#issuecomment-1334220935 Thanks, @RussellSpitzer @nastra!

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-12-01 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1037486854 ## core/src/main/java/org/apache/iceberg/MetricsUtil.java: ## @@ -56,4 +72,270 @@ public static MetricsModes.MetricsMode metricsMode( String columnName = inputSc

[GitHub] [iceberg] flyrain commented on a diff in pull request #6012: Spark 3.3: Add a procedure to generate table changes

2022-12-01 Thread GitBox
flyrain commented on code in PR #6012: URL: https://github.com/apache/iceberg/pull/6012#discussion_r1037488519 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/GenerateChangesProcedure.java: ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [iceberg] flyrain commented on a diff in pull request #6012: Spark 3.3: Add a procedure to generate table changes

2022-12-01 Thread GitBox
flyrain commented on code in PR #6012: URL: https://github.com/apache/iceberg/pull/6012#discussion_r1037489027 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/GenerateChangesProcedure.java: ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [iceberg] flyrain commented on a diff in pull request #6012: Spark 3.3: Add a procedure to generate table changes

2022-12-01 Thread GitBox
flyrain commented on code in PR #6012: URL: https://github.com/apache/iceberg/pull/6012#discussion_r1037489658 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/GenerateChangesProcedure.java: ## @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [iceberg] aokolnychyi merged pull request #6309: Spark 3.3: Consume arbitrary scans in SparkBatchQueryScan

2022-12-01 Thread GitBox
aokolnychyi merged PR #6309: URL: https://github.com/apache/iceberg/pull/6309

[GitHub] [iceberg] aokolnychyi commented on pull request #6309: Spark 3.3: Consume arbitrary scans in SparkBatchQueryScan

2022-12-01 Thread GitBox
aokolnychyi commented on PR #6309: URL: https://github.com/apache/iceberg/pull/6309#issuecomment-1334290414 Thanks, @RussellSpitzer!

[GitHub] [iceberg] Fokko opened a new pull request, #6342: Python: Introduce SchemaVisitorPerPrimitiveType

2022-12-01 Thread GitBox
Fokko opened a new pull request, #6342: URL: https://github.com/apache/iceberg/pull/6342 Instead of having another visitor go over the primitives, I think it is nicer to have an extended schema visitor that also goes over the primitive types

[GitHub] [iceberg] aokolnychyi opened a new pull request, #6343: Spark 3.3: Remove unused RowDataRewriter

2022-12-01 Thread GitBox
aokolnychyi opened a new pull request, #6343: URL: https://github.com/apache/iceberg/pull/6343 This PR removes the no-longer-used `RowDataRewriter`. This class was needed for initial data compaction.

[GitHub] [iceberg] flyrain commented on a diff in pull request #6012: Spark 3.3: Add a procedure to generate table changes

2022-12-01 Thread GitBox
flyrain commented on code in PR #6012: URL: https://github.com/apache/iceberg/pull/6012#discussion_r1037552480 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/SparkProcedures.java: ## @@ -53,6 +53,7 @@ private static Map> initProcedureBuilders() { map

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-12-01 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1037554727 ## api/src/main/java/org/apache/iceberg/DataFile.java: ## @@ -102,7 +102,8 @@ public interface DataFile extends ContentFile { int PARTITION_ID = 102; String PAR

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-12-01 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1037554935 ## spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/data/TestHelpers.java: ## @@ -817,4 +824,93 @@ public static Set reachableManifestPaths(Table table) {

[GitHub] [iceberg] aokolnychyi merged pull request #6343: Spark 3.3: Remove unused RowDataRewriter

2022-12-01 Thread GitBox
aokolnychyi merged PR #6343: URL: https://github.com/apache/iceberg/pull/6343

[GitHub] [iceberg] abmo-x closed pull request #6301: Update Schema - should check if field is optional/required

2022-12-01 Thread GitBox
abmo-x closed pull request #6301: Update Schema - should check if field is optional/required URL: https://github.com/apache/iceberg/pull/6301

[GitHub] [iceberg] rbalamohan commented on issue #6326: estimateStatistics cost mush time to compute stats

2022-12-01 Thread GitBox
rbalamohan commented on issue #6326: URL: https://github.com/apache/iceberg/issues/6326#issuecomment-1334526348 Check if increasing "iceberg.worker.num-threads" helps in this case. Default should be the number of processors available in the system. This can be increased by setting it as sys
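
As a rough illustration of the suggestion above, here is a minimal Java sketch of how a pool size could be taken from the `iceberg.worker.num-threads` property with a fallback to the processor count. It assumes, as the truncated comment appears to suggest, that the value is supplied as a JVM system property; `WorkerPoolSizing` and `poolSize()` are hypothetical names and are not Iceberg's actual implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WorkerPoolSizing {
  // Property name quoted in the comment above.
  static final String NUM_THREADS_PROP = "iceberg.worker.num-threads";

  // Hypothetical helper (not Iceberg's code): read the pool size from a JVM
  // system property, falling back to the number of available processors.
  static int poolSize() {
    int fallback = Runtime.getRuntime().availableProcessors();
    String value = System.getProperty(NUM_THREADS_PROP);
    if (value == null) {
      return fallback;
    }
    try {
      int parsed = Integer.parseInt(value.trim());
      return parsed > 0 ? parsed : fallback;
    } catch (NumberFormatException e) {
      return fallback; // this sketch ignores malformed values
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Same effect as launching the JVM with -Diceberg.worker.num-threads=16.
    System.setProperty(NUM_THREADS_PROP, "16");
    int size = poolSize();
    ExecutorService pool = Executors.newFixedThreadPool(size);
    System.out.println("worker threads: " + size);
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.SECONDS);
  }
}
```

A shared worker pool like this is likely sized once when it is first created, so the property is safest set at JVM launch. In a Spark job that can be done through `spark.driver.extraJavaOptions` / `spark.executor.extraJavaOptions`, e.g. `-Diceberg.worker.num-threads=16`.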

[GitHub] [iceberg] github-actions[bot] commented on issue #3705: Explore spark struct streaming write iceberg and synchronize to hive Metastore

2022-12-01 Thread GitBox
github-actions[bot] commented on issue #3705: URL: https://github.com/apache/iceberg/issues/3705#issuecomment-1334604152 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

[GitHub] [iceberg] github-actions[bot] closed issue #3705: Explore spark struct streaming write iceberg and synchronize to hive Metastore

2022-12-01 Thread GitBox
github-actions[bot] closed issue #3705: Explore spark struct streaming write iceberg and synchronize to hive Metastore URL: https://github.com/apache/iceberg/issues/3705

[GitHub] [iceberg] stevenzwu commented on pull request #5967: Flink: Support read options in flink source

2022-12-01 Thread GitBox
stevenzwu commented on PR #5967: URL: https://github.com/apache/iceberg/pull/5967#issuecomment-1334609528 @hililiwei it seems that the only remaining task is to add a prefix like `connector.iceberg.` to the configs coming from the Flink job configuration. Everything else is clarified and aligned.
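
The prefix idea above can be sketched as a simple filter over the job-level configuration: keep only keys that start with the prefix and strip it, so that they line up with the source's own option names. This is a hypothetical illustration, not the code in PR #5967. `PrefixedOptions`, `extract()` and the example key `connector.iceberg.split-size` are assumptions made for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixedOptions {
  // Prefix suggested in the review comment; not a confirmed Iceberg constant.
  static final String PREFIX = "connector.iceberg.";

  // Hypothetical helper: keep only prefixed keys and strip the prefix,
  // e.g. "connector.iceberg.split-size" -> "split-size".
  static Map<String, String> extract(Map<String, String> jobConfig) {
    Map<String, String> options = new HashMap<>();
    for (Map.Entry<String, String> entry : jobConfig.entrySet()) {
      String key = entry.getKey();
      // Skip keys that do not carry the prefix, or that are only the prefix.
      if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
        options.put(key.substring(PREFIX.length()), entry.getValue());
      }
    }
    return options;
  }

  public static void main(String[] args) {
    Map<String, String> jobConfig = new HashMap<>();
    jobConfig.put("connector.iceberg.split-size", "134217728");
    jobConfig.put("parallelism.default", "4"); // unrelated Flink setting, ignored
    System.out.println(extract(jobConfig));
  }
}
```

Namespacing keeps unrelated job settings from being read as Iceberg read options by accident.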

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-12-01 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1037681151 ## core/src/main/java/org/apache/iceberg/MetricsUtil.java: ## @@ -56,4 +72,270 @@ public static MetricsModes.MetricsMode metricsMode( String columnName = inputSc

[GitHub] [iceberg] aokolnychyi opened a new pull request, #6345: Spark 3.3: Choose readers based on task types

2022-12-01 Thread GitBox
aokolnychyi opened a new pull request, #6345: URL: https://github.com/apache/iceberg/pull/6345 This PR adds `SparkPartitionReaderFactory` that creates readers based on tasks in input partitions.
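
The idea of choosing a reader from the tasks in an input partition can be sketched as a type-based dispatch. This is a simplified, self-contained illustration that does not use Spark's or Iceberg's real interfaces. `Task`, `DataFileTask`, `ChangelogTask`, `Reader` and `createReader()` are all hypothetical stand-ins, and rejecting mixed task types is a choice made for this sketch only.

```java
import java.util.List;

public class ReaderDispatch {
  // Hypothetical task types standing in for the kinds of tasks a partition may carry.
  interface Task {}

  static final class DataFileTask implements Task {
    final String path;
    DataFileTask(String path) { this.path = path; }
  }

  static final class ChangelogTask implements Task {
    final String path;
    final long snapshotId;
    ChangelogTask(String path, long snapshotId) { this.path = path; this.snapshotId = snapshotId; }
  }

  interface Reader {
    String describe();
  }

  // Hypothetical factory: inspect every task in the partition and pick a reader
  // that matches their common type.
  static Reader createReader(List<? extends Task> tasks) {
    if (tasks.isEmpty()) {
      throw new IllegalArgumentException("partition has no tasks");
    }
    boolean allData = tasks.stream().allMatch(t -> t instanceof DataFileTask);
    boolean allChangelog = tasks.stream().allMatch(t -> t instanceof ChangelogTask);
    if (allData) {
      return () -> "data reader for " + tasks.size() + " task(s)";
    }
    if (allChangelog) {
      return () -> "changelog reader for " + tasks.size() + " task(s)";
    }
    // This sketch refuses partitions that mix task types.
    throw new UnsupportedOperationException("mixed task types in one partition");
  }
}
```

Keeping the decision in one factory means each reader only has to handle a single task type.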

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6345: Spark 3.3: Choose readers based on task types

2022-12-01 Thread GitBox
aokolnychyi commented on code in PR #6345: URL: https://github.com/apache/iceberg/pull/6345#discussion_r1037687769 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/BatchDataReader.java: ## @@ -28,21 +28,48 @@ import org.apache.iceberg.io.CloseableIterator; imp

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6345: Spark 3.3: Choose readers based on task types

2022-12-01 Thread GitBox
aokolnychyi commented on code in PR #6345: URL: https://github.com/apache/iceberg/pull/6345#discussion_r1037688715 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/BatchDataReader.java: ## @@ -28,21 +28,48 @@ import org.apache.iceberg.io.CloseableIterator; imp
