[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6746: AWS: Load HttpClientBuilder dynamically to avoid runtime deps of both urlconnection and apache client

2023-02-08 Thread via GitHub
jackye1995 commented on code in PR #6746: URL: https://github.com/apache/iceberg/pull/6746#discussion_r1100376899 ## aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java: ## @@ -1314,55 +1270,18 @@ private void configureEndpoint(T builder, String en } } - @Vi

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6746: AWS: Load HttpClientBuilder dynamically to avoid runtime deps of both urlconnection and apache client

2023-02-08 Thread via GitHub
jackye1995 commented on code in PR #6746: URL: https://github.com/apache/iceberg/pull/6746#discussion_r1100379158 ## aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java: ## @@ -1314,55 +1270,18 @@ private void configureEndpoint(T builder, String en } } - @Vi

[GitHub] [iceberg] JonasJ-ap commented on pull request #6642: WIP: Support Snapshot Copy-On-Write Hudi Table to Iceberg Table

2023-02-08 Thread via GitHub
JonasJ-ap commented on PR #6642: URL: https://github.com/apache/iceberg/pull/6642#issuecomment-1422908119 [Curiosity] Which one is preferred for hudi-related variable names: `hudi` or `hoodiexxx` -- This is an automated message from the Apache Git Service. To respond to the message, p

[GitHub] [iceberg] jackye1995 commented on pull request #6642: WIP: Support Snapshot Copy-On-Write Hudi Table to Iceberg Table

2023-02-08 Thread via GitHub
jackye1995 commented on PR #6642: URL: https://github.com/apache/iceberg/pull/6642#issuecomment-142298 > [Curiosity] Which one is preferred for hudi-related variable names: hudi or hoodiexxx Hudi should be, Hoodie was the name used before the official project name -- This i

[GitHub] [iceberg] munendrasn commented on issue #6763: ACL when using DynamoDb based Catalog

2023-02-08 Thread via GitHub
munendrasn commented on issue #6763: URL: https://github.com/apache/iceberg/issues/6763#issuecomment-1422946779 With the above plan, schema would remain same for the table, while writing Namespace entries we would swap values (identifier, and namespace columns) As for backward compati

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6651: Spark 3.3 write to branch snapshot

2023-02-08 Thread via GitHub
amogh-jahagirdar commented on code in PR #6651: URL: https://github.com/apache/iceberg/pull/6651#discussion_r1100447577 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java: ## @@ -247,9 +247,6 @@ public ScanBuilder newScanBuilder(CaseInsensitiveStri

[GitHub] [iceberg] aokolnychyi commented on issue #6679: Change Default Write Distribution Mode

2023-02-08 Thread via GitHub
aokolnychyi commented on issue #6679: URL: https://github.com/apache/iceberg/issues/6679#issuecomment-1423000390 I will submit a PR to change the default distribution modes for insert and merge. I'll be also happy to review a PR for #6741. -- This is an automated message from the Apache G

[GitHub] [iceberg] szehon-ho commented on issue #6257: Partitions metadata table shows old partitions

2023-02-08 Thread via GitHub
szehon-ho commented on issue #6257: URL: https://github.com/apache/iceberg/issues/6257#issuecomment-1423004210 Hi @gaborkaszab, sorry, I was just re-reading this issue and had a question on the use case: do you know why it doesn't use a metadata delete to remove the partition without delete-f

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6771: Docs: Document that partitions metadata table might show 'old' partitions

2023-02-08 Thread via GitHub
szehon-ho commented on code in PR #6771: URL: https://github.com/apache/iceberg/pull/6771#discussion_r1100488895 ## docs/spark-queries.md: ## @@ -346,6 +346,9 @@ SELECT * FROM prod.db.table.partitions; Note: For unpartitioned tables, the partitions table will contain only the

[GitHub] [iceberg] RussellSpitzer commented on issue #6758: S3FileIO Can Create Non-Posix Paths

2023-02-08 Thread via GitHub
RussellSpitzer commented on issue #6758: URL: https://github.com/apache/iceberg/issues/6758#issuecomment-1423020339 Consensus at Community Sync was that for now we will just add a strip trailing slash to https://github.com/apache/iceberg/blob/6697129a314d98cb793601495f7ebb2ae000b40a/

[GitHub] [iceberg] jackye1995 commented on pull request #6598: Core: View representation core implementation

2023-02-08 Thread via GitHub
jackye1995 commented on PR #6598: URL: https://github.com/apache/iceberg/pull/6598#issuecomment-1423068671 @rdblue @danielcweeks the PR looks good to be merged, do you have any additional comment? -- This is an automated message from the Apache Git Service. To respond to the message, plea

[GitHub] [iceberg] jackye1995 opened a new issue, #6774: Support SQLConf for branch write similar to WAP

2023-02-08 Thread via GitHub
jackye1995 opened a new issue, #6774: URL: https://github.com/apache/iceberg/issues/6774 ### Feature Request / Improvement As discussed in the community sync, support setting a SQLConf to write to a specific branch instead of the table's main branch. Related to https://github.c

[GitHub] [iceberg] jackye1995 commented on issue #6774: Support SQLConf for branch write similar to WAP

2023-02-08 Thread via GitHub
jackye1995 commented on issue #6774: URL: https://github.com/apache/iceberg/issues/6774#issuecomment-1423071189 @rdblue I remember you said you would like to take this effort in the meeting? -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [iceberg] electrum commented on a diff in pull request #6772: Core: enforce writing POSIX compatible paths

2023-02-08 Thread via GitHub
electrum commented on code in PR #6772: URL: https://github.com/apache/iceberg/pull/6772#discussion_r1100535487 ## core/src/main/java/org/apache/iceberg/util/LocationUtil.java: ## @@ -33,4 +34,8 @@ public static String stripTrailingSlash(String path) { } return result;

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6660: Flink: Support writes to branches in FlinkSink

2023-02-08 Thread via GitHub
amogh-jahagirdar commented on code in PR #6660: URL: https://github.com/apache/iceberg/pull/6660#discussion_r1100543216 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/sink/TestFlinkIcebergSinkBranch.java: ## @@ -0,0 +1,398 @@ +/* + * Licensed to the Apache Software

[GitHub] [iceberg] jackye1995 commented on issue #6763: ACL when using DynamoDb based Catalog

2023-02-08 Thread via GitHub
jackye1995 commented on issue #6763: URL: https://github.com/apache/iceberg/issues/6763#issuecomment-1423100498 > schema would remain same for the DynamoDb table Cool then we are good here Using a config options sounds like a good idea, but I think that only handles the write s

[GitHub] [iceberg] Fokko commented on a diff in pull request #6745: Python: Use Version Ranges for Various Dependencies

2023-02-08 Thread via GitHub
Fokko commented on code in PR #6745: URL: https://github.com/apache/iceberg/pull/6745#discussion_r1100602879 ## python/pyproject.toml: ## @@ -50,33 +50,33 @@ include = [ [tool.poetry.dependencies] python = "^3.8" mmhash3 = "3.0.1" -requests = "2.28.2" +requests = ">=2.28.1,<=

[GitHub] [iceberg] Fokko opened a new pull request, #6775: Python: Add positional deletes

2023-02-08 Thread via GitHub
Fokko opened a new pull request, #6775: URL: https://github.com/apache/iceberg/pull/6775 Closes #6568 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail

[GitHub] [iceberg] jackye1995 commented on pull request #6642: WIP: Support Snapshot Copy-On-Write Hudi Table to Iceberg Table

2023-02-08 Thread via GitHub
jackye1995 commented on PR #6642: URL: https://github.com/apache/iceberg/pull/6642#issuecomment-1423178313 Took a brief look, overall I agree with what the community discussion led to, replaying the timeline is cool but Hudi concurrent transaction has awkward behavior and we cannot guarante

[GitHub] [iceberg] RussellSpitzer merged pull request #6554: Parquet: Improve Test Coverage of RowGroupFilter Code with Nans #6518

2023-02-08 Thread via GitHub
RussellSpitzer merged PR #6554: URL: https://github.com/apache/iceberg/pull/6554 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceb

[GitHub] [iceberg] RussellSpitzer commented on pull request #6554: Parquet: Improve Test Coverage of RowGroupFilter Code with Nans #6518

2023-02-08 Thread via GitHub
RussellSpitzer commented on PR #6554: URL: https://github.com/apache/iceberg/pull/6554#issuecomment-1423240542 Thanks for the PR @youngxinler ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [iceberg] aokolnychyi opened a new pull request, #6776: Spark 3.3: Improve log messages in scans

2023-02-08 Thread via GitHub
aokolnychyi opened a new pull request, #6776: URL: https://github.com/apache/iceberg/pull/6776 While debugging storage-partitioned joins, I realized our log messages in scans don't include table names. That makes it hard to match log messages with tables if queries touch multiple tables. This PR

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6765: Doc: update Flink doc for sink metrics

2023-02-08 Thread via GitHub
stevenzwu commented on code in PR #6765: URL: https://github.com/apache/iceberg/pull/6765#discussion_r1100711928 ## docs/flink-getting-started.md: ## @@ -747,6 +747,44 @@ FlinkSink.builderFor( .append(); ``` +### monitoring metrics + +The following Flink metrics are provid

[GitHub] [iceberg] aokolnychyi commented on pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-08 Thread via GitHub
aokolnychyi commented on PR #6776: URL: https://github.com/apache/iceberg/pull/6776#issuecomment-1423279323 cc @RussellSpitzer @rdblue @szehon-ho @flyrain @karuppayya @amogh-jahagirdar @singhpk234 -- This is an automated message from the Apache Git Service. To respond to the message, plea

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-08 Thread via GitHub
aokolnychyi commented on code in PR #6776: URL: https://github.com/apache/iceberg/pull/6776#discussion_r1100712967 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java: ## @@ -132,11 +132,6 @@ public void filter(Filter[] filters) {

[GitHub] [iceberg] RussellSpitzer opened a new pull request, #6777: Core: TableMetadata Always Strips Trailing Slash From Location

2023-02-08 Thread via GitHub
RussellSpitzer opened a new pull request, #6777: URL: https://github.com/apache/iceberg/pull/6777 Previously we could end up in situations where we would unintentionally create // in file paths, which is an issue with non-POSIX FileIOs. This is a partial fix for #6758 -- This is
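The double-slash problem described in #6777 can be illustrated with a small sketch. This is not Iceberg's Java code: the helper names `strip_trailing_slash` and `data_file_path`, the bucket path, and the rule of leaving a bare `://` root untouched are all assumptions made for illustration.

```python
def strip_trailing_slash(path: str) -> str:
    """Drop trailing slashes, leaving a bare scheme root such as 's3://' intact.

    Hypothetical helper; the real fix lives in Iceberg's Java code base.
    """
    result = path
    while result.endswith("/") and not result.endswith("://"):
        result = result[:-1]
    return result


def data_file_path(table_location: str, file_name: str) -> str:
    """Join a table location and a file name the naive way, with a fixed separator."""
    return f"{table_location}/data/{file_name}"


# Hypothetical table location that ends with a slash.
raw_location = "s3://bucket/warehouse/db/table/"

# Without normalization the joined path contains "//", which an object
# store treats as a distinct key rather than collapsing it like POSIX.
naive = data_file_path(raw_location, "00000-0.parquet")

# Normalizing the location once, up front, avoids the empty path segment.
fixed = data_file_path(strip_trailing_slash(raw_location), "00000-0.parquet")
```

Normalizing the location at the point where it is stored, rather than at every join, keeps all later path construction consistent; that is the design direction the PR title suggests, though the details here are only a sketch.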

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6765: Doc: update Flink doc for sink metrics

2023-02-08 Thread via GitHub
stevenzwu commented on code in PR #6765: URL: https://github.com/apache/iceberg/pull/6765#discussion_r1100716901 ## docs/flink-getting-started.md: ## @@ -747,6 +747,44 @@ FlinkSink.builderFor( .append(); ``` +### monitoring metrics + +The following Flink metrics are provid

[GitHub] [iceberg] RussellSpitzer commented on pull request #6777: Core: TableMetadata Always Strips Trailing Slash From Location

2023-02-08 Thread via GitHub
RussellSpitzer commented on PR #6777: URL: https://github.com/apache/iceberg/pull/6777#issuecomment-1423293372 Cleaning up tests now, we have many tests that are using trailing slashes in table location -- This is an automated message from the Apache Git Service. To respond to the message

[GitHub] [iceberg] agnes-xinyi-lu commented on issue #6710: REST-Catalog: missing conflict-checks for `dropTable` and `updateTable`

2023-02-08 Thread via GitHub
agnes-xinyi-lu commented on issue #6710: URL: https://github.com/apache/iceberg/issues/6710#issuecomment-1423337682 for updateTable, there is UUID check for every commit, that should guarantee the uniqueness of the metadata. Isn't metadata location decided on the server side for each new co

[GitHub] [iceberg] agnes-xinyi-lu opened a new issue, #6778: Rest Catalog UpdateTableRequest IOException handling could cause data discrepancy in case of response getting lost

2023-02-08 Thread via GitHub
agnes-xinyi-lu opened a new issue, #6778: URL: https://github.com/apache/iceberg/issues/6778 ### Apache Iceberg version 1.0.0 ### Query engine None ### Please describe the bug 🐞 Current [HttpClient](https://github.com/apache/iceberg/blob/master/core/src/mai

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-08 Thread via GitHub
szehon-ho commented on code in PR #6776: URL: https://github.com/apache/iceberg/pull/6776#discussion_r1100776061 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java: ## @@ -140,7 +140,7 @@ protected Statistics estimateStatistics(Snapshot snapshot) {

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-08 Thread via GitHub
amogh-jahagirdar commented on code in PR #6776: URL: https://github.com/apache/iceberg/pull/6776#discussion_r1100787732 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java: ## @@ -140,7 +140,7 @@ protected Statistics estimateStatistics(Snapshot snaps

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-08 Thread via GitHub
amogh-jahagirdar commented on code in PR #6776: URL: https://github.com/apache/iceberg/pull/6776#discussion_r1100788235 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkCopyOnWriteScan.java: ## @@ -123,8 +127,17 @@ public void filter(Filter[] filters) {

[GitHub] [iceberg] abmo-x opened a new pull request, #6779: use table partition schema in add_files for getPartitions to avoid data corruption

2023-02-08 Thread via GitHub
abmo-x opened a new pull request, #6779: URL: https://github.com/apache/iceberg/pull/6779 Issue: a string-typed partition whose value looks like an integer with a leading zero, such as "01", gets stored incorrectly without the zero as "1", resulting in partition and column value getting stored and returned inco
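The leading-zero problem reported in #6779 can be sketched as follows. This is an illustration only, not the Spark or Iceberg implementation: `infer_partition_value` and `typed_partition_value` are hypothetical helpers, and the type names `"string"` and `"int"` are assumptions.

```python
def infer_partition_value(raw: str):
    """Guess the type from the text alone: anything that parses as an integer becomes an int."""
    try:
        return int(raw)
    except ValueError:
        return raw


def typed_partition_value(raw: str, column_type: str):
    """Convert using the table's declared partition column type instead of guessing."""
    if column_type == "string":
        return raw
    if column_type == "int":
        return int(raw)
    raise ValueError(f"unsupported partition column type: {column_type}")


# Inference parses "01" as the integer 1, so the leading zero is lost
# when the value is written back out as text.
inferred = str(infer_partition_value("01"))

# Consulting the declared type keeps the original string value.
declared = typed_partition_value("01", "string")
```

Here `inferred` no longer matches the value in the directory name, while `declared` does, which is why using the table's partition schema avoids the mismatch.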

[GitHub] [iceberg] dramaticlly commented on a diff in pull request #6779: use table partition schema in add_files for getPartitions to avoid data corruption

2023-02-08 Thread via GitHub
dramaticlly commented on code in PR #6779: URL: https://github.com/apache/iceberg/pull/6779#discussion_r1100813099 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkSchemaUtil.java: ## @@ -392,4 +392,14 @@ public static Map indexQuotedNameById(Schema schema) {

[GitHub] [iceberg] github-actions[bot] commented on issue #5355: Bump Flink to 1.15.1

2023-02-08 Thread via GitHub
github-actions[bot] commented on issue #5355: URL: https://github.com/apache/iceberg/issues/5355#issuecomment-1423405231 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale' -- This is an automated message from the Apache Gi

[GitHub] [iceberg] github-actions[bot] closed issue #5355: Bump Flink to 1.15.1

2023-02-08 Thread via GitHub
github-actions[bot] closed issue #5355: Bump Flink to 1.15.1 URL: https://github.com/apache/iceberg/issues/5355 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

[GitHub] [iceberg] bluzy commented on issue #6750: Failed to get table info from metastore using impersonation

2023-02-08 Thread via GitHub
bluzy commented on issue #6750: URL: https://github.com/apache/iceberg/issues/6750#issuecomment-1423406077 @lirui-apache Hello, I have a problem providing a multi-tenant Hive service, so I am waiting for https://github.com/apache/iceberg/pull/6175 to be released. Please, could you p

[GitHub] [iceberg] dramaticlly opened a new issue, #6780: Spark AddFiles infer incorrect partition type when reading parquet files

2023-02-08 Thread via GitHub
dramaticlly opened a new issue, #6780: URL: https://github.com/apache/iceberg/issues/6780 ### Apache Iceberg version 0.14.0 ### Query engine Spark ### Please describe the bug 🐞 Parquet File Layout ``` s3a://bucket/warehouse/foo.db/bar/data/ .

[GitHub] [iceberg] jackye1995 opened a new issue, #6781: Fix migration of Delta table that has performed VACUUM

2023-02-08 Thread via GitHub
jackye1995 opened a new issue, #6781: URL: https://github.com/apache/iceberg/issues/6781 ### Apache Iceberg version None ### Query engine None ### Please describe the bug 🐞 Based on an offline discussion with @JonasJ-ap, VACUUM will mess up the Delta log and

[GitHub] [iceberg] dramaticlly commented on issue #6780: Spark AddFiles infer incorrect partition type when reading parquet files

2023-02-08 Thread via GitHub
dramaticlly commented on issue #6780: URL: https://github.com/apache/iceberg/issues/6780#issuecomment-1423414393 I believe @abmo-x is helping fix it in #6779 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[GitHub] [iceberg] bryanck opened a new pull request, #6782: Update httpclient5 for thread interrupt fix

2023-02-08 Thread via GitHub
bryanck opened a new pull request, #6782: URL: https://github.com/apache/iceberg/pull/6782 This PR updates the Apache HTTP client from version 5.1 to 5.2.1. The motivation for this is to [include a fix](https://github.com/apache/httpcomponents-client/pull/394) for a potential hang when a t

[GitHub] [iceberg] abmo-x commented on a diff in pull request #6779: use table partition schema in add_files for getPartitions to avoid data corruption

2023-02-08 Thread via GitHub
abmo-x commented on code in PR #6779: URL: https://github.com/apache/iceberg/pull/6779#discussion_r1100862211 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java: ## @@ -836,9 +836,29 @@ public static String quotedFullIdentifier(String catalogName, Identi

[GitHub] [iceberg] dmgcodevil commented on issue #104: ManifestReader is not properly closed in BaseTableScan

2023-02-08 Thread via GitHub
dmgcodevil commented on issue #104: URL: https://github.com/apache/iceberg/issues/104#issuecomment-1423472906 `0.11.0` 23/02/09 01:05:06 WARN HadoopStreams: Unclosed output stream created by: org.apache.iceberg.hadoop.HadoopStreams$HadoopPositionOutputStream.<init>(HadoopStreams.jav

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6777: Core: TableMetadata Always Strips Trailing Slash From Location

2023-02-08 Thread via GitHub
amogh-jahagirdar commented on code in PR #6777: URL: https://github.com/apache/iceberg/pull/6777#discussion_r1100883382 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -290,7 +291,7 @@ public String toString() { this.metadataFileLocation = metadataFileLoc

[GitHub] [iceberg] jackye1995 commented on a diff in pull request #6777: Core: TableMetadata Always Strips Trailing Slash From Location

2023-02-08 Thread via GitHub
jackye1995 commented on code in PR #6777: URL: https://github.com/apache/iceberg/pull/6777#discussion_r1100891080 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -290,7 +291,7 @@ public String toString() { this.metadataFileLocation = metadataFileLocation;

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6777: Core: TableMetadata Always Strips Trailing Slash From Location

2023-02-08 Thread via GitHub
amogh-jahagirdar commented on code in PR #6777: URL: https://github.com/apache/iceberg/pull/6777#discussion_r1100906013 ## core/src/main/java/org/apache/iceberg/TableMetadata.java: ## @@ -290,7 +291,7 @@ public String toString() { this.metadataFileLocation = metadataFileLoc

[GitHub] [iceberg] thomasaNvidia opened a new issue, #6783: inputFormat, outputFormat, and serialization.lib not being set with AWS Glue 4.0 and Iceberg while Create table

2023-02-08 Thread via GitHub
thomasaNvidia opened a new issue, #6783: URL: https://github.com/apache/iceberg/issues/6783 ### Apache Iceberg version 1.0.0 ### Query engine Spark ### Please describe the bug 🐞 While using AWS Glue 4.0 and Iceberg 1.0.0 and leveraging the AWS Glue Data Cat

[GitHub] [iceberg] thomasaNvidia commented on issue #6783: inputFormat, outputFormat, and serialization.lib not being set with AWS Glue 4.0 and Iceberg while Create table

2023-02-08 Thread via GitHub
thomasaNvidia commented on issue #6783: URL: https://github.com/apache/iceberg/issues/6783#issuecomment-1423569969 Here is the error message if that helps. ``` pyspark.sql.utils.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table iceberg_gros

[GitHub] [iceberg] thomasaNvidia commented on issue #6783: inputFormat, outputFormat, and serialization.lib not being set with AWS Glue 4.0 and Iceberg while Create table

2023-02-08 Thread via GitHub
thomasaNvidia commented on issue #6783: URL: https://github.com/apache/iceberg/issues/6783#issuecomment-1423603879 Another note, when running the `CREATE TABLE` command and then manually changing the `inputFormat`, `outputFormat`, and `serialization.lib`. I run the insert command one time a

[GitHub] [iceberg] thomasaNvidia commented on issue #6783: inputFormat, outputFormat, and serialization.lib not being set with AWS Glue 4.0 and Iceberg while Create table

2023-02-08 Thread via GitHub
thomasaNvidia commented on issue #6783: URL: https://github.com/apache/iceberg/issues/6783#issuecomment-1423625120 Another note for context as I must be using this incorrectly somehow. When I go into AWS Glue Data Catalog and `Edit` the table and hit save with a new version without changing

[GitHub] [iceberg] singhpk234 commented on a diff in pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-08 Thread via GitHub
singhpk234 commented on code in PR #6776: URL: https://github.com/apache/iceberg/pull/6776#discussion_r1100972549 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkCopyOnWriteScan.java: ## @@ -123,8 +127,17 @@ public void filter(Filter[] filters) {

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-08 Thread via GitHub
aokolnychyi commented on code in PR #6776: URL: https://github.com/apache/iceberg/pull/6776#discussion_r1100981238 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkCopyOnWriteScan.java: ## @@ -123,8 +127,17 @@ public void filter(Filter[] filters) {

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-08 Thread via GitHub
aokolnychyi commented on code in PR #6776: URL: https://github.com/apache/iceberg/pull/6776#discussion_r1100982048 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkCopyOnWriteScan.java: ## @@ -123,8 +127,17 @@ public void filter(Filter[] filters) {

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-08 Thread via GitHub
aokolnychyi commented on code in PR #6776: URL: https://github.com/apache/iceberg/pull/6776#discussion_r1100985599 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkCopyOnWriteScan.java: ## @@ -123,8 +127,17 @@ public void filter(Filter[] filters) {

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-08 Thread via GitHub
aokolnychyi commented on code in PR #6776: URL: https://github.com/apache/iceberg/pull/6776#discussion_r1100985955 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java: ## @@ -140,7 +140,7 @@ protected Statistics estimateStatistics(Snapshot snapshot)

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6776: Spark 3.3: Improve log messages in scans

2023-02-08 Thread via GitHub
amogh-jahagirdar commented on code in PR #6776: URL: https://github.com/apache/iceberg/pull/6776#discussion_r1100993287 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkCopyOnWriteScan.java: ## @@ -123,8 +127,17 @@ public void filter(Filter[] filters) {

[GitHub] [iceberg] pvary commented on a diff in pull request #6765: Doc: update Flink doc for sink metrics

2023-02-08 Thread via GitHub
pvary commented on code in PR #6765: URL: https://github.com/apache/iceberg/pull/6765#discussion_r1101003182 ## docs/flink-getting-started.md: ## @@ -747,6 +747,44 @@ FlinkSink.builderFor( .append(); ``` +### monitoring metrics + +The following Flink metrics are provided b

[GitHub] [iceberg] pvary commented on a diff in pull request #6765: Doc: update Flink doc for sink metrics

2023-02-08 Thread via GitHub
pvary commented on code in PR #6765: URL: https://github.com/apache/iceberg/pull/6765#discussion_r1101004892 ## docs/flink-getting-started.md: ## @@ -747,6 +747,44 @@ FlinkSink.builderFor( .append(); ``` +### monitoring metrics + +The following Flink metrics are provided b

[GitHub] [iceberg] pvary commented on a diff in pull request #6765: Doc: update Flink doc for sink metrics

2023-02-08 Thread via GitHub
pvary commented on code in PR #6765: URL: https://github.com/apache/iceberg/pull/6765#discussion_r1101005546 ## docs/flink-getting-started.md: ## @@ -747,6 +747,50 @@ FlinkSink.builderFor( .append(); ``` +### monitoring metrics + +The following Flink metrics are provided b

[GitHub] [iceberg] pvary commented on a diff in pull request #6764: Flink: improve metrics (elapsedSecondsSinceLastSuccessfulCommit) and …

2023-02-08 Thread via GitHub
pvary commented on code in PR #6764: URL: https://github.com/apache/iceberg/pull/6764#discussion_r1101006954 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitterMetrics.java: ## @@ -54,12 +60,38 @@ void commitDuration(long commitDurationMs) {

[GitHub] [iceberg] lirui-apache commented on issue #6750: Failed to get table info from metastore using impersonation

2023-02-08 Thread via GitHub
lirui-apache commented on issue #6750: URL: https://github.com/apache/iceberg/issues/6750#issuecomment-1423670948 Hi @bluzy , we're using #6175 in our internal code and it solves the problem we faced. However, according to the discussions in that PR, we'll implement pluggable client pool an

[GitHub] [iceberg] pvary commented on a diff in pull request #6765: Doc: update Flink doc for sink metrics

2023-02-08 Thread via GitHub
pvary commented on code in PR #6765: URL: https://github.com/apache/iceberg/pull/6765#discussion_r1101007377 ## docs/flink-getting-started.md: ## @@ -747,6 +747,44 @@ FlinkSink.builderFor( .append(); ``` +### monitoring metrics + +The following Flink metrics are provided b

[GitHub] [iceberg] 0xffmeta opened a new issue, #6784: Hive memory issue with reading iceberg v2 from hive

2023-02-08 Thread via GitHub
0xffmeta opened a new issue, #6784: URL: https://github.com/apache/iceberg/issues/6784 ### Apache Iceberg version 0.13.1 ### Query engine Hive ### Please describe the bug 🐞 After we upgraded from the Iceberg v1 format to v2 with a Flink upsert job, we constantly

[GitHub] [iceberg] nastra commented on a diff in pull request #6777: Core: TableMetadata Always Strips Trailing Slash From Location

2023-02-08 Thread via GitHub
nastra commented on code in PR #6777: URL: https://github.com/apache/iceberg/pull/6777#discussion_r1101061561 ## core/src/test/java/org/apache/iceberg/TestTableMetadata.java: ## @@ -1562,6 +1563,19 @@ public void testNoReservedPropertyForTableMetadataCreation() {

[GitHub] [iceberg] findepi commented on pull request #6777: Core: TableMetadata Always Strips Trailing Slash From Location

2023-02-08 Thread via GitHub
findepi commented on PR #6777: URL: https://github.com/apache/iceberg/pull/6777#issuecomment-1423751198 cc @ebyhr @electrum @alexjo2144 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [iceberg] ajantha-bhat opened a new issue, #6785: Support Unregister table CALL procedure

2023-02-09 Thread via GitHub
ajantha-bhat opened a new issue, #6785: URL: https://github.com/apache/iceberg/issues/6785 ### Feature Request / Improvement Multiple times users have run into a scenario where they had accidentally deleted the table files from storage but they cannot drop the table from catalog.

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #6786: Spark-3.3: Support unregister table procedure

2023-02-09 Thread via GitHub
ajantha-bhat opened a new pull request, #6786: URL: https://github.com/apache/iceberg/pull/6786 Fixes #6785 Note: Not supporting this feature for hadoop catalog as there is no concept of just removing the table entry from catalog in hadoop as hadoop catalog will always clean the

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6786: Spark-3.3: Support unregister table procedure

2023-02-09 Thread via GitHub
ajantha-bhat commented on code in PR #6786: URL: https://github.com/apache/iceberg/pull/6786#discussion_r1101101294 ## core/src/main/java/org/apache/iceberg/CachingCatalog.java: ## @@ -125,6 +125,10 @@ private TableIdentifier canonicalizeIdentifier(TableIdentifier tableIdentifi

[GitHub] [iceberg] ajantha-bhat closed pull request #4675: Docs: Document register table support of HiveCatalog

2023-02-09 Thread via GitHub
ajantha-bhat closed pull request #4675: Docs: Document register table support of HiveCatalog URL: https://github.com/apache/iceberg/pull/4675 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #6787: Spark: Update register table procedure test case for all the catalogs

2023-02-09 Thread via GitHub
ajantha-bhat opened a new pull request, #6787: URL: https://github.com/apache/iceberg/pull/6787 PR #[5037](https://github.com/apache/iceberg/pull/5037) supports register tables for all (most of) the catalogs. There is no restriction in the register table procedure about catalog type. Bu

[GitHub] [iceberg] ajantha-bhat commented on pull request #6787: Spark: Update register table procedure test case for all the catalogs

2023-02-09 Thread via GitHub
ajantha-bhat commented on PR #6787: URL: https://github.com/apache/iceberg/pull/6787#issuecomment-1423827362 @RussellSpitzer, @aokolnychyi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [iceberg] ajantha-bhat commented on pull request #6786: Spark-3.3: Support unregister table procedure

2023-02-09 Thread via GitHub
ajantha-bhat commented on PR #6786: URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1423828785 cc: @RussellSpitzer, @flyrain, @aokolnychyi

[GitHub] [iceberg] slfan1989 commented on pull request #6735: Parquet: deprecate Decimal Metadata usage in favor of DecimalLogicalTypeAnnotation

2023-02-09 Thread via GitHub
slfan1989 commented on PR #6735: URL: https://github.com/apache/iceberg/pull/6735#issuecomment-1423841619 > Looks good to me! @jackye1995 Thank you very much for your help in reviewing the code!

[GitHub] [iceberg] JonasJ-ap commented on pull request #6746: AWS: Load HttpClientBuilder dynamically to avoid runtime deps of both urlconnection and apache client

2023-02-09 Thread via GitHub
JonasJ-ap commented on PR #6746: URL: https://github.com/apache/iceberg/pull/6746#issuecomment-1423889742 Sorry for the late update. It took me some time to construct the EKS environment properly. ## Test Environment AWS EKS: 1.24, Spark 3.1.2 ## Test spark job / k8s job config: `

[GitHub] [iceberg] nastra commented on issue #6778: Rest Catalog UpdateTableRequest IOException handling could cause data discrepancy in case of response getting lost

2023-02-09 Thread via GitHub
nastra commented on issue #6778: URL: https://github.com/apache/iceberg/issues/6778#issuecomment-1424157372 I believe this was fixed by https://github.com/apache/iceberg/pull/5694. I don't think we can just convert all `IOExceptions` to `CommitStateUnknownException` in the `HttpClient`.

[GitHub] [iceberg] RussellSpitzer commented on pull request #6786: Spark-3.3: Support unregister table procedure

2023-02-09 Thread via GitHub
RussellSpitzer commented on PR #6786: URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1424188600 Seems very specific. I think we talked about this before as just a part of the "drop table" command. If we are doing this method I feel like there needs to be a catalog speci

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6787: Spark: Update register table procedure test case for all the catalogs

2023-02-09 Thread via GitHub
RussellSpitzer commented on code in PR #6787: URL: https://github.com/apache/iceberg/pull/6787#discussion_r1101460917 ## spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRegisterTableProcedure.java: ## @@ -55,10 +53,6 @@ public void dropTables()

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6787: Spark: Update register table procedure test case for all the catalogs

2023-02-09 Thread via GitHub
ajantha-bhat commented on code in PR #6787: URL: https://github.com/apache/iceberg/pull/6787#discussion_r1101483409 ## spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRegisterTableProcedure.java: ## @@ -55,10 +53,6 @@ public void dropTables() {

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6787: Spark: Update register table procedure test case for all the catalogs

2023-02-09 Thread via GitHub
RussellSpitzer commented on code in PR #6787: URL: https://github.com/apache/iceberg/pull/6787#discussion_r1101484896 ## spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRegisterTableProcedure.java: ## @@ -55,10 +53,6 @@ public void dropTables()

[GitHub] [iceberg] ajantha-bhat commented on pull request #6786: Spark-3.3: Support unregister table procedure

2023-02-09 Thread via GitHub
ajantha-bhat commented on PR #6786: URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1424220728 > So I think either this is a Catalog API or we just have this as part of the "drop table" behavior. We already have a catalog-specific API. Which is `catalog.dropTable(identi

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6787: Spark: Update register table procedure test case for all the catalogs

2023-02-09 Thread via GitHub
ajantha-bhat commented on code in PR #6787: URL: https://github.com/apache/iceberg/pull/6787#discussion_r1101494404 ## spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRegisterTableProcedure.java: ## @@ -55,10 +53,6 @@ public void dropTables() {

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6787: Spark: Update register table procedure test case for all the catalogs

2023-02-09 Thread via GitHub
ajantha-bhat commented on code in PR #6787: URL: https://github.com/apache/iceberg/pull/6787#discussion_r1101521454 ## spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRegisterTableProcedure.java: ## @@ -55,10 +53,6 @@ public void dropTables() {

[GitHub] [iceberg] ludlows opened a new pull request, #6788: Build: add version.txt for build.gradle

2023-02-09 Thread via GitHub
ludlows opened a new pull request, #6788: URL: https://github.com/apache/iceberg/pull/6788 make the gradle build process run without exceptions in absence of git. [throw new Exception("Neither version.txt nor git version exists")](https://github.com/ludlows/iceberg/blob/505368ad3f

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6787: Spark: Update register table procedure test case for all the catalogs

2023-02-09 Thread via GitHub
RussellSpitzer commented on code in PR #6787: URL: https://github.com/apache/iceberg/pull/6787#discussion_r1101536372 ## spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRegisterTableProcedure.java: ## @@ -55,10 +53,6 @@ public void dropTables()

[GitHub] [iceberg] RussellSpitzer merged pull request #6787: Spark: Update register table procedure test case for all the catalogs

2023-02-09 Thread via GitHub
RussellSpitzer merged PR #6787: URL: https://github.com/apache/iceberg/pull/6787

[GitHub] [iceberg] RussellSpitzer commented on pull request #6786: Spark-3.3: Support unregister table procedure

2023-02-09 Thread via GitHub
RussellSpitzer commented on PR #6786: URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1424278602 > > So I think either this is a Catalog API or we just have this as part of the "drop table" behavior. > > We already have a catalog-specific API. Which is `catalog.dropTabl

[GitHub] [iceberg] Fokko commented on pull request #6788: Build: add version.txt for build.gradle

2023-02-09 Thread via GitHub
Fokko commented on PR #6788: URL: https://github.com/apache/iceberg/pull/6788#issuecomment-1424283805 @ludlows thanks for opening this PR. The `version.txt` is created as part of the release process: https://iceberg.apache.org/how-to-release/#creating-a-release-candidate and having it in t

[GitHub] [iceberg] Fokko merged pull request #6782: Core: Update httpclient5 for thread interrupt fix

2023-02-09 Thread via GitHub
Fokko merged PR #6782: URL: https://github.com/apache/iceberg/pull/6782

[GitHub] [iceberg] ludlows commented on pull request #6788: Build: add version.txt for build.gradle

2023-02-09 Thread via GitHub
ludlows commented on PR #6788: URL: https://github.com/apache/iceberg/pull/6788#issuecomment-1424308948 @Fokko thanks for your comments. Let me close this PR.

[GitHub] [iceberg] ludlows closed pull request #6788: Build: add version.txt for build.gradle

2023-02-09 Thread via GitHub
ludlows closed pull request #6788: Build: add version.txt for build.gradle URL: https://github.com/apache/iceberg/pull/6788

[GitHub] [iceberg] Fokko merged pull request #6745: Python: Use Version Ranges for Various Dependencies

2023-02-09 Thread via GitHub
Fokko merged PR #6745: URL: https://github.com/apache/iceberg/pull/6745

[GitHub] [iceberg] stevenzwu merged pull request #6764: Flink: improve metrics (elapsedSecondsSinceLastSuccessfulCommit) and …

2023-02-09 Thread via GitHub
stevenzwu merged PR #6764: URL: https://github.com/apache/iceberg/pull/6764

[GitHub] [iceberg] ajantha-bhat commented on pull request #6786: Spark-3.3: Support unregister table procedure

2023-02-09 Thread via GitHub
ajantha-bhat commented on PR #6786: URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1424471540 > This is what I'm saying, if we want an API which does not treat it as an Iceberg table first I think we need that as a Catalog API. If Drop Table needs the table to be an iceberg t

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #6789: Nessie: Handle refresh for catalog APIs that doesn't use table operations

2023-02-09 Thread via GitHub
ajantha-bhat opened a new pull request, #6789: URL: https://github.com/apache/iceberg/pull/6789 TODO: Need to add testcase for each API with multiple clients. Consider the scenario, a) client1 - java Nessie catalog client creates a table1 b) client2 - spark + iceberg creates a ta

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6789: Nessie: Handle refresh for catalog APIs that doesn't use table operations

2023-02-09 Thread via GitHub
ajantha-bhat commented on code in PR #6789: URL: https://github.com/apache/iceberg/pull/6789#discussion_r1101750610 ## nessie/src/test/java/org/apache/iceberg/nessie/TestMultipleClients.java: ## @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [iceberg] amogh-jahagirdar commented on issue #6783: inputFormat, outputFormat, and serialization.lib not being set with AWS Glue 4.0 and Iceberg while Create table

2023-02-09 Thread via GitHub
amogh-jahagirdar commented on issue #6783: URL: https://github.com/apache/iceberg/issues/6783#issuecomment-1424555146 This is expected, since these are properties that are not relevant for Iceberg tables. In other words, in the `CREATE TABLE` you should not have to specify `inputFormat`, `o

[GitHub] [iceberg] jackye1995 commented on pull request #6746: AWS: Load HttpClientBuilder dynamically to avoid runtime deps of both urlconnection and apache client

2023-02-09 Thread via GitHub
jackye1995 commented on PR #6746: URL: https://github.com/apache/iceberg/pull/6746#issuecomment-1424575793 I think all the comments are addressed and we have enough votes, so I will go ahead and merge this. Thanks for fixing this with such detailed verification! And thanks for the review @steve
