[GitHub] [iceberg] yegangy0718 commented on a diff in pull request #6382: Implement ShuffleOperator to collect data statistics

2022-12-12 Thread GitBox
yegangy0718 commented on code in PR #6382: URL: https://github.com/apache/iceberg/pull/6382#discussion_r1045449754 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/ShuffleOperator.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #6408: Spark: Cleanup commented out code in SparkValueReaders

2022-12-12 Thread GitBox
amogh-jahagirdar opened a new pull request, #6408: URL: https://github.com/apache/iceberg/pull/6408 Was going through SparkValueReaders and saw some commented out code which can be removed. -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [iceberg] Fokko merged pull request #6342: Python: Introduce SchemaVisitorPerPrimitiveType

2022-12-12 Thread GitBox
Fokko merged PR #6342: URL: https://github.com/apache/iceberg/pull/6342

[GitHub] [iceberg] Fokko commented on a diff in pull request #6342: Python: Introduce SchemaVisitorPerPrimitiveType

2022-12-12 Thread GitBox
Fokko commented on code in PR #6342: URL: https://github.com/apache/iceberg/pull/6342#discussion_r1045589430 ## python/pyiceberg/schema.py: ## @@ -317,6 +331,97 @@ def primitive(self, primitive: PrimitiveType) -> T: """Visit a PrimitiveType""" +class SchemaVisitorPe

[GitHub] [iceberg] Fokko merged pull request #6409: Python: Add missing types

2022-12-12 Thread GitBox
Fokko merged PR #6409: URL: https://github.com/apache/iceberg/pull/6409

[GitHub] [iceberg] Fokko commented on pull request #6409: Python: Add missing types

2022-12-12 Thread GitBox
Fokko commented on PR #6409: URL: https://github.com/apache/iceberg/pull/6409#issuecomment-1346223849 Merging this to avoid others having a red CI because of the missing types

[GitHub] [iceberg-docs] gaborkaszab commented on a diff in pull request #187: Update the how-to-release page with findings after being a release manager

2022-12-12 Thread GitBox
gaborkaszab commented on code in PR #187: URL: https://github.com/apache/iceberg-docs/pull/187#discussion_r1045663147 ## landing-page/content/common/how-to-release.md: ## @@ -222,6 +246,12 @@ Therefore, the release candidate is passed/rejected. After the release vote has pass

[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-12 Thread GitBox
gaborkaszab commented on code in PR #6404: URL: https://github.com/apache/iceberg/pull/6404#discussion_r1045678055 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -304,7 +306,9 @@ public Table loadTable(SessionContext context, TableIdentifier identi

[GitHub] [iceberg] zinking commented on a diff in pull request #6371: Spark 3.3: Support storage-partitioned joins

2022-12-12 Thread GitBox
zinking commented on code in PR #6371: URL: https://github.com/apache/iceberg/pull/6371#discussion_r1045716950 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkPartitioningAwareScan.java: ## @@ -0,0 +1,244 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [iceberg] nastra opened a new pull request, #6411: Core: Don't produce partition summaries on unpartitioned table

2022-12-12 Thread GitBox
nastra opened a new pull request, #6411: URL: https://github.com/apache/iceberg/pull/6411 Previously, having an unpartitioned table would produce a `"partitions."` entry in the snapshot summary when the partition summary limit was configured

[GitHub] [iceberg] InvisibleProgrammer commented on issue #6370: What is the purpose of Hive Lock ?

2022-12-12 Thread GitBox
InvisibleProgrammer commented on issue #6370: URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1346382736 @pvary I think it is not a simple yes or no question. We need some time to better understand the topic and the consequences of the change.

[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

2022-12-12 Thread GitBox
nastra commented on code in PR #6411: URL: https://github.com/apache/iceberg/pull/6411#discussion_r1045769408 ## core/src/test/java/org/apache/iceberg/TestRowDelta.java: ## @@ -896,17 +896,14 @@ public void testAddDeleteFilesMultipleSpecs() { Map summary = snapshot.summary(

[GitHub] [iceberg] gaborkaszab commented on pull request #5837: API,Core: Introduce metrics for data files by file format

2022-12-12 Thread GitBox
gaborkaszab commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1346411031 Thanks again for the review, @nastra! I believe I have addressed all of your previous comments. Another round of review would be appreciated :)

[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

2022-12-12 Thread GitBox
nastra commented on code in PR #6411: URL: https://github.com/apache/iceberg/pull/6411#discussion_r1045770751 ## core/src/test/java/org/apache/iceberg/TestRowDelta.java: ## @@ -896,17 +896,14 @@ public void testAddDeleteFilesMultipleSpecs() { Map summary = snapshot.summary(

[GitHub] [iceberg] nastra commented on pull request #5837: API,Core: Introduce metrics for data files by file format

2022-12-12 Thread GitBox
nastra commented on PR #5837: URL: https://github.com/apache/iceberg/pull/5837#issuecomment-1346417834 thanks @gaborkaszab, will try and review it this week. You might also want to include @rdblue or @danielcweeks for a review.

[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #6313: Flink: use correct metric config for position deletes

2022-12-12 Thread GitBox
chenjunjiedada commented on code in PR #6313: URL: https://github.com/apache/iceberg/pull/6313#discussion_r1045778493 ## core/src/main/java/org/apache/iceberg/MetricsConfig.java: ## @@ -107,6 +107,30 @@ public static MetricsConfig forPositionDelete(Table table) { return ne

[GitHub] [iceberg] RussellSpitzer commented on issue #6406: Overlapping data in data files even after sorting

2022-12-12 Thread GitBox
RussellSpitzer commented on issue #6406: URL: https://github.com/apache/iceberg/issues/6406#issuecomment-1346431638 Are you sure you are only checking the live manifest files for the table? How do the metrics compare with those in the metadata view of the table?

[GitHub] [iceberg] Fokko commented on issue #6397: Python Instructions currently do not work for testing

2022-12-12 Thread GitBox
Fokko commented on issue #6397: URL: https://github.com/apache/iceberg/issues/6397#issuecomment-1346447504 @rubenvdg `poetry shell` is not required, since the [Makefile](https://github.com/apache/iceberg/blob/master/python/Makefile#L28-L32) already runs the tests with `poetry run`. I'm still

[GitHub] [iceberg] rubenvdg commented on issue #6397: Python Instructions currently do not work for testing

2022-12-12 Thread GitBox
rubenvdg commented on issue #6397: URL: https://github.com/apache/iceberg/issues/6397#issuecomment-1346538268 @Fokko It seems that it only works on Python >= 3.9.8. There's some stuff on Protocols in the [release notes](https://docs.python.org/release/3.9.8/whatsnew/changelog.html) that mi

[GitHub] [iceberg] rubenvdg commented on issue #6397: Python Instructions currently do not work for testing

2022-12-12 Thread GitBox
rubenvdg commented on issue #6397: URL: https://github.com/apache/iceberg/issues/6397#issuecomment-1346543174 Both Docker containers fail btw with: ``` #11 21.80 FAILED tests/io/test_pyarrow.py::test_raise_on_opening_a_local_file_no_permission #11 21.80 FAILED tests/io/test_pya

[GitHub] [iceberg] Fokko commented on issue #6397: Python Instructions currently do not work for testing

2022-12-12 Thread GitBox
Fokko commented on issue #6397: URL: https://github.com/apache/iceberg/issues/6397#issuecomment-1346548075 > It seems that it only works on Python >= 3.9.8. There's some bugfix on Protocols in the [release notes](https://docs.python.org/release/3.9.8/whatsnew/changelog.html) that might rin

[GitHub] [iceberg] cccs-eric commented on a diff in pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-12 Thread GitBox
cccs-eric commented on code in PR #6392: URL: https://github.com/apache/iceberg/pull/6392#discussion_r1045861171 ## python/tests/io/test_fsspec.py: ## @@ -204,6 +204,191 @@ def test_writing_avro_file(generated_manifest_entry_file: Generator[str, None, N b2 = in_f.r

[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-12 Thread GitBox
gaborkaszab commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1045784906 ## api/src/main/java/org/apache/iceberg/metrics/MetricsReporter.java: ## @@ -18,10 +18,16 @@ */ package org.apache.iceberg.metrics; +import java.util.Map; + /

[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

2022-12-12 Thread GitBox
nastra commented on code in PR #6411: URL: https://github.com/apache/iceberg/pull/6411#discussion_r1045878419 ## spark/v2.4/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java: ## @@ -898,7 +898,7 @@ public void testSnapshotsTable() {

[GitHub] [iceberg] nastra commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-12 Thread GitBox
nastra commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1045745386 ## api/src/main/java/org/apache/iceberg/metrics/MetricsReporter.java: ## @@ -18,10 +18,16 @@ */ package org.apache.iceberg.metrics; +import java.util.Map; + /** Th

[GitHub] [iceberg] hililiwei commented on a diff in pull request #6377: Flink: add util class to generate test data with extensive coverage d…

2022-12-12 Thread GitBox
hililiwei commented on code in PR #6377: URL: https://github.com/apache/iceberg/pull/6377#discussion_r1045850345 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/DataGenerators.java: ## @@ -0,0 +1,746 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

2022-12-12 Thread GitBox
pvary commented on issue #6370: URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1346635105 @InvisibleProgrammer: Fair enough. If it would have been a simple question, I might not have asked 😄

[GitHub] [iceberg] nastra commented on a diff in pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-12 Thread GitBox
nastra commented on code in PR #6404: URL: https://github.com/apache/iceberg/pull/6404#discussion_r1045939277 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -177,7 +177,9 @@ public void initialize(String name, Map unresolved) { this.io =

[GitHub] [iceberg] nastra commented on a diff in pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-12 Thread GitBox
nastra commented on code in PR #6404: URL: https://github.com/apache/iceberg/pull/6404#discussion_r1045949398 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -304,7 +306,9 @@ public Table loadTable(SessionContext context, TableIdentifier identifier)

[GitHub] [iceberg] rubenvdg opened a new pull request, #6413: Python: Remove outdated docs + some suggestions for textual improvements

2022-12-12 Thread GitBox
rubenvdg opened a new pull request, #6413: URL: https://github.com/apache/iceberg/pull/6413 c.f. https://github.com/apache/iceberg/issues/6397

[GitHub] [iceberg] nastra commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

2022-12-12 Thread GitBox
nastra commented on code in PR #6411: URL: https://github.com/apache/iceberg/pull/6411#discussion_r1045978833 ## spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java: ## @@ -953,9 +953,9 @@ public void testDeleteWithMultipleSpecs() {

[GitHub] [iceberg] nastra closed pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-12 Thread GitBox
nastra closed pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property URL: https://github.com/apache/iceberg/pull/6404

[GitHub] [iceberg] nastra opened a new pull request, #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-12 Thread GitBox
nastra opened a new pull request, #6404: URL: https://github.com/apache/iceberg/pull/6404 In certain cases it makes sense to make the metrics reporter configurable, so that users have more control over it and how it reports.

[GitHub] [iceberg] kmozaid commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-12 Thread GitBox
kmozaid commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1046006735 ## api/src/main/java/org/apache/iceberg/metrics/MetricsReporter.java: ## @@ -18,10 +18,16 @@ */ package org.apache.iceberg.metrics; +import java.util.Map; + /** T

[GitHub] [iceberg] kmozaid commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-12 Thread GitBox
kmozaid commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1046008836 ## core/src/main/java/org/apache/iceberg/BaseTable.java: ## @@ -63,6 +63,10 @@ public String name() { return name; } + public MetricsReporter reporter() { R

[GitHub] [iceberg] kmozaid commented on a diff in pull request #6410: Configurable metrics reporter by catalog properties

2022-12-12 Thread GitBox
kmozaid commented on code in PR #6410: URL: https://github.com/apache/iceberg/pull/6410#discussion_r1046012714 ## hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCatalog.java: ## @@ -198,6 +201,28 @@ public void testCreateTableTxnBuilder() throws Exception { }

[GitHub] [iceberg] rdblue commented on a diff in pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-12 Thread GitBox
rdblue commented on code in PR #6404: URL: https://github.com/apache/iceberg/pull/6404#discussion_r1046056595 ## api/src/main/java/org/apache/iceberg/metrics/LoggingMetricsReporter.java: ## @@ -28,6 +28,11 @@ */ public class LoggingMetricsReporter implements MetricsReporter {

[GitHub] [iceberg] rdblue commented on a diff in pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-12 Thread GitBox
rdblue commented on code in PR #6404: URL: https://github.com/apache/iceberg/pull/6404#discussion_r1046057652 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -106,7 +106,7 @@ public class RESTSessionCatalog extends BaseSessionCatalog private Resou

[GitHub] [iceberg] rdblue commented on a diff in pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-12 Thread GitBox
rdblue commented on code in PR #6404: URL: https://github.com/apache/iceberg/pull/6404#discussion_r1046062940 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -316,7 +320,7 @@ private void reportMetrics( TableIdentifier tableIdentifier,

[GitHub] [iceberg] rdblue commented on a diff in pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-12 Thread GitBox
rdblue commented on code in PR #6404: URL: https://github.com/apache/iceberg/pull/6404#discussion_r1046068902 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -304,7 +306,9 @@ public Table loadTable(SessionContext context, TableIdentifier identifier)

[GitHub] [iceberg] rdblue commented on a diff in pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-12 Thread GitBox
rdblue commented on code in PR #6404: URL: https://github.com/apache/iceberg/pull/6404#discussion_r1046069256 ## api/src/main/java/org/apache/iceberg/metrics/LoggingMetricsReporter.java: ## @@ -28,6 +28,11 @@ */ public class LoggingMetricsReporter implements MetricsReporter {

[GitHub] [iceberg] rdblue commented on a diff in pull request #6411: Core: Don't produce partition summaries on unpartitioned table

2022-12-12 Thread GitBox
rdblue commented on code in PR #6411: URL: https://github.com/apache/iceberg/pull/6411#discussion_r1046073562 ## core/src/main/java/org/apache/iceberg/SnapshotSummary.java: ## @@ -148,7 +148,7 @@ public void set(String property, String value) { } private void updateP

[GitHub] [iceberg] danielcweeks commented on a diff in pull request #6352: AWS: Fix inconsistent behavior of naming S3 location between read and write operations by allowing only s3 bucket name

2022-12-12 Thread GitBox
danielcweeks commented on code in PR #6352: URL: https://github.com/apache/iceberg/pull/6352#discussion_r1046096643 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3URI.java: ## @@ -74,17 +74,14 @@ class S3URI { this.scheme = schemeSplit[0]; String[] authoritySplit =

[GitHub] [iceberg] ajantha-bhat opened a new issue, #6414: pyiceberg: Support Nessie catalog

2022-12-12 Thread GitBox
ajantha-bhat opened a new issue, #6414: URL: https://github.com/apache/iceberg/issues/6414 ### Feature Request / Improvement Very recently, pyiceberg added support for the Glue catalog (0.2.0). We need support for the Nessie catalog too, just like the Hive, Glue, and REST catalogs.

[GitHub] [iceberg] ajantha-bhat commented on issue #6414: pyiceberg: Support Nessie catalog

2022-12-12 Thread GitBox
ajantha-bhat commented on issue #6414: URL: https://github.com/apache/iceberg/issues/6414#issuecomment-1346881787 I will wait a week or two to see if anyone wants to pick it up. If not, I will try to work on it.

[GitHub] [iceberg] nastra commented on a diff in pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-12 Thread GitBox
nastra commented on code in PR #6404: URL: https://github.com/apache/iceberg/pull/6404#discussion_r1046130246 ## api/src/main/java/org/apache/iceberg/metrics/LoggingMetricsReporter.java: ## @@ -28,6 +28,11 @@ */ public class LoggingMetricsReporter implements MetricsReporter {

[GitHub] [iceberg] nastra commented on a diff in pull request #6404: Core: Allow configuring metrics reporter impl via Catalog property

2022-12-12 Thread GitBox
nastra commented on code in PR #6404: URL: https://github.com/apache/iceberg/pull/6404#discussion_r1046134055 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -106,7 +106,7 @@ public class RESTSessionCatalog extends BaseSessionCatalog private Resou

[GitHub] [iceberg] nastra commented on issue #6393: Couldn't initialize a SAX driver to create an XMLReader

2022-12-12 Thread GitBox
nastra commented on issue #6393: URL: https://github.com/apache/iceberg/issues/6393#issuecomment-1346932126 @amogh-jahagirdar could you help out here please?

[GitHub] [iceberg] stevenzwu merged pull request #6394: Flink: Port Support read options in flink source to 1.14 & 1.16

2022-12-12 Thread GitBox
stevenzwu merged PR #6394: URL: https://github.com/apache/iceberg/pull/6394

[GitHub] [iceberg] stevenzwu commented on pull request #6394: Flink: Port Support read options in flink source to 1.14 & 1.16

2022-12-12 Thread GitBox
stevenzwu commented on PR #6394: URL: https://github.com/apache/iceberg/pull/6394#issuecomment-1346977801 thanks @hililiwei

[GitHub] [iceberg] ajantha-bhat commented on pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

2022-12-12 Thread GitBox
ajantha-bhat commented on PR #6296: URL: https://github.com/apache/iceberg/pull/6296#issuecomment-1346988722 ping @RussellSpitzer

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6407: Flink: use SerializableTable for source

2022-12-12 Thread GitBox
stevenzwu commented on code in PR #6407: URL: https://github.com/apache/iceberg/pull/6407#discussion_r1046225363 ## core/src/main/java/org/apache/iceberg/SerializableTable.java: ## @@ -357,6 +357,12 @@ public Transaction newTransaction() { throw new UnsupportedOperationExce

[GitHub] [iceberg] asheeshgarg opened a new issue, #6415: Vectorized Read Issue

2022-12-12 Thread GitBox
asheeshgarg opened a new issue, #6415: URL: https://github.com/apache/iceberg/issues/6415 ### Apache Iceberg version 0.14.0 ### Query engine None ### Please describe the bug 🐞 @nastra I am using Vectorized read from Java API and able to load the data

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6407: Flink: use SerializableTable for source

2022-12-12 Thread GitBox
stevenzwu commented on code in PR #6407: URL: https://github.com/apache/iceberg/pull/6407#discussion_r1046275112 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/source/DataIterator.java: ## @@ -68,6 +69,23 @@ public DataIterator( this.recordOffset = 0L; } +

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6407: Flink: use SerializableTable for source

2022-12-12 Thread GitBox
stevenzwu commented on code in PR #6407: URL: https://github.com/apache/iceberg/pull/6407#discussion_r1046286159 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/source/FlinkSource.java: ## @@ -220,7 +212,7 @@ public FlinkInputFormat buildFormat() { readab

[GitHub] [iceberg] huaxingao commented on pull request #6405: API: Add Aggregate expression evaluation

2022-12-12 Thread GitBox
huaxingao commented on PR #6405: URL: https://github.com/apache/iceberg/pull/6405#issuecomment-1347229885 @rdblue Thank you very much for the PR! I will pull your code locally and work on integrating my changes into yours.

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-12 Thread GitBox
szehon-ho commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1046346355 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-12 Thread GitBox
szehon-ho commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1046354676 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-12 Thread GitBox
szehon-ho commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1046360920 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-12 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1046364398 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6402: Flink: Add UT for NaN

2022-12-12 Thread GitBox
stevenzwu commented on code in PR #6402: URL: https://github.com/apache/iceberg/pull/6402#discussion_r1045171171 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/source/TestFlinkTableSource.java: ## @@ -96,6 +97,7 @@ public void before() { @After public void cle

[GitHub] [iceberg] rdblue commented on pull request #6267: Core: Update StatisticsFile interface in TableMetadata spec

2022-12-12 Thread GitBox
rdblue commented on PR #6267: URL: https://github.com/apache/iceberg/pull/6267#issuecomment-1347376734 > I think we still need an interface to get the current stats file for snapshot id currentStatisticsFiles() No, I don't think this is valuable. This depends too much on some definiti

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6377: Flink: add util class to generate test data with extensive coverage d…

2022-12-12 Thread GitBox
stevenzwu commented on code in PR #6377: URL: https://github.com/apache/iceberg/pull/6377#discussion_r1046417674 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/DataGenerators.java: ## @@ -0,0 +1,746 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] ajantha-bhat commented on pull request #6267: Docs: Update spec about statistics file snapshot id

2022-12-12 Thread GitBox
ajantha-bhat commented on PR #6267: URL: https://github.com/apache/iceberg/pull/6267#issuecomment-1347529570 @rdblue: Thanks for the review and suggestions. I have kept it as just the document (spec) update now.

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-12 Thread GitBox
szehon-ho commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1046518537 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -149,8 +149,7 @@ public static Iterable ancestorsOf(long snapshotId, Function

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-12 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1046534226 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -149,8 +149,7 @@ public static Iterable ancestorsOf(long snapshotId, Function

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-12 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1046536345 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -213,13 +214,26 @@ public Scan build() { SparkReadOptions.END_SN

[GitHub] [iceberg] github-actions[bot] commented on issue #5043: Flink import debezium cdc record(delete type) to iceberg(0.13.2+) got IndexOutOfBoundsException

2022-12-12 Thread GitBox
github-actions[bot] commented on issue #5043: URL: https://github.com/apache/iceberg/issues/5043#issuecomment-1347558186 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs.

[GitHub] [iceberg] github-actions[bot] commented on issue #4900: spark action expireSnapshots and removeOrphanFiles block in spark local mode

2022-12-12 Thread GitBox
github-actions[bot] commented on issue #4900: URL: https://github.com/apache/iceberg/issues/4900#issuecomment-1347558303 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs.

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-12 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1046541174 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -213,13 +214,26 @@ public Scan build() { SparkReadOptions.END_SN

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-12 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1046541628 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -149,8 +149,7 @@ public static Iterable ancestorsOf(long snapshotId, Function

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-12 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1046549002 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-12 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1046553676 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-12 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1046555442 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-12 Thread GitBox
szehon-ho commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1046556419 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-12 Thread GitBox
szehon-ho commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1046557836 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-12 Thread GitBox
szehon-ho commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1046559017 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-12 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1046560941 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-12 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1046562966 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] srilman commented on issue #3220: [Python] support iceberg hadoop catalog in python library

2022-12-12 Thread GitBox
srilman commented on issue #3220: URL: https://github.com/apache/iceberg/issues/3220#issuecomment-1347623323 FYI, I did start working on an initial implementation but ran into issues with trying to use just FileIO to perform special operations needed for Hadoop tables (I believe hadoop FS l

[GitHub] [iceberg] tomtongue commented on a diff in pull request #6352: AWS: Fix inconsistent behavior of naming S3 location between read and write operations by allowing only s3 bucket name

2022-12-12 Thread GitBox
tomtongue commented on code in PR #6352: URL: https://github.com/apache/iceberg/pull/6352#discussion_r1046599507 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3URI.java: ## @@ -74,17 +74,14 @@ class S3URI { this.scheme = schemeSplit[0]; String[] authoritySplit = sc

[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #6407: Flink: use SerializableTable for source

2022-12-12 Thread GitBox
chenjunjiedada commented on code in PR #6407: URL: https://github.com/apache/iceberg/pull/6407#discussion_r1046600657 ## core/src/main/java/org/apache/iceberg/SerializableTable.java: ## @@ -357,6 +357,12 @@ public Transaction newTransaction() { throw new UnsupportedOperatio

[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #6407: Flink: use SerializableTable for source

2022-12-12 Thread GitBox
chenjunjiedada commented on code in PR #6407: URL: https://github.com/apache/iceberg/pull/6407#discussion_r1046600948 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/source/reader/RowDataReaderFunction.java: ## @@ -18,53 +18,45 @@ */ package org.apache.iceberg.fli

[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #6407: Flink: use SerializableTable for source

2022-12-12 Thread GitBox
chenjunjiedada commented on code in PR #6407: URL: https://github.com/apache/iceberg/pull/6407#discussion_r1046601114 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/source/reader/ReaderFunctionTestBase.java: ## @@ -51,16 +55,28 @@ public static Object[][] parameters

[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #6407: Flink: use SerializableTable for source

2022-12-12 Thread GitBox
chenjunjiedada commented on code in PR #6407: URL: https://github.com/apache/iceberg/pull/6407#discussion_r1046602827 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/source/FlinkSource.java: ## @@ -220,7 +212,7 @@ public FlinkInputFormat buildFormat() { r

[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #6407: Flink: use SerializableTable for source

2022-12-12 Thread GitBox
chenjunjiedada commented on code in PR #6407: URL: https://github.com/apache/iceberg/pull/6407#discussion_r1046603540 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/source/RowDataRewriter.java: ## @@ -77,20 +65,12 @@ public RowDataRewriter( RowType flinkSchema

[GitHub] [iceberg] hililiwei commented on pull request #6394: Flink: Port Support read options in flink source to 1.14 & 1.16

2022-12-12 Thread GitBox
hililiwei commented on PR #6394: URL: https://github.com/apache/iceberg/pull/6394#issuecomment-1347665691 thanks @stevenzwu for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6407: Flink: use SerializableTable for source

2022-12-12 Thread GitBox
stevenzwu commented on code in PR #6407: URL: https://github.com/apache/iceberg/pull/6407#discussion_r1046606070 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java: ## @@ -357,13 +358,10 @@ public IcebergSource build() { if (readerFuncti

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6313: Flink: use correct metric config for position deletes

2022-12-12 Thread GitBox
stevenzwu commented on code in PR #6313: URL: https://github.com/apache/iceberg/pull/6313#discussion_r1046608479 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkAppenderFactory.java: ## @@ -216,7 +241,8 @@ public EqualityDeleteWriter newEqDeleteWriter( @

[GitHub] [iceberg] stevenzwu commented on a diff in pull request #6313: Flink: use correct metric config for position deletes

2022-12-12 Thread GitBox
stevenzwu commented on code in PR #6313: URL: https://github.com/apache/iceberg/pull/6313#discussion_r1046609770 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkAppenderFactory.java: ## @@ -99,7 +122,8 @@ private RowType lazyPosDeleteFlinkSchema() { @O

[GitHub] [iceberg] hililiwei commented on a diff in pull request #6402: Flink: Add UT for NaN

2022-12-12 Thread GitBox
hililiwei commented on code in PR #6402: URL: https://github.com/apache/iceberg/pull/6402#discussion_r1046618093 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/source/TestFlinkTableSource.java: ## @@ -603,7 +605,103 @@ public void testFilterPushDown2Literal() { }

[GitHub] [iceberg] kamaljit-1991 commented on issue #6122: IcebergGenerics.read(table) doesn't work as expected

2022-12-12 Thread GitBox
kamaljit-1991 commented on issue #6122: URL: https://github.com/apache/iceberg/issues/6122#issuecomment-1347696537 Hey @RussellSpitzer checking again. We are creating the table like this : ``` catalog.createTable( TableIdentifier.of(database, table),

[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #6405: API: Add Aggregate expression evaluation

2022-12-12 Thread GitBox
amogh-jahagirdar commented on code in PR #6405: URL: https://github.com/apache/iceberg/pull/6405#discussion_r1046618283 ## api/src/main/java/org/apache/iceberg/expressions/CountStar.java: ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *
