[GitHub] [iceberg] tomtongue commented on a diff in pull request #6352: AWS: Fix inconsistent behavior of naming S3 location between read and write operations by allowing only s3 bucket name

2022-12-09 Thread GitBox
tomtongue commented on code in PR #6352: URL: https://github.com/apache/iceberg/pull/6352#discussion_r1045018717 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3URI.java: ## @@ -74,17 +74,14 @@ class S3URI { this.scheme = schemeSplit[0]; String[] authoritySplit = sc

[GitHub] [iceberg] tomtongue commented on a diff in pull request #6352: AWS: Fix inconsistent behavior of naming S3 location between read and write operations by allowing only s3 bucket name

2022-12-09 Thread GitBox
tomtongue commented on code in PR #6352: URL: https://github.com/apache/iceberg/pull/6352#discussion_r1045017558 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3URI.java: ## @@ -74,17 +74,14 @@ class S3URI { this.scheme = schemeSplit[0]; String[] authoritySplit = sc

[GitHub] [iceberg] hililiwei commented on pull request #6394: Flink: Port Support read options in flink source to 1.14 & 1.16

2022-12-09 Thread GitBox
hililiwei commented on PR #6394: URL: https://github.com/apache/iceberg/pull/6394#issuecomment-1345175587 ``` $ git diff --no-index flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/ flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/ diff --git a/flink/v1.15/flink/src/

[GitHub] [iceberg] nastra commented on a diff in pull request #6074: API,Core: SnapshotManager to be created through Transaction

2022-12-09 Thread GitBox
nastra commented on code in PR #6074: URL: https://github.com/apache/iceberg/pull/6074#discussion_r1044998509 ## .palantir/revapi.yml: ## @@ -43,9 +49,6 @@ acceptedBreaks: - code: "java.method.removed" old: "method org.apache.iceberg.RowDelta org.apache.iceberg.RowD

[GitHub] [iceberg] hililiwei commented on a diff in pull request #6394: Flink: Port Support read options in flink source to 1.14 & 1.16

2022-12-09 Thread GitBox
hililiwei commented on code in PR #6394: URL: https://github.com/apache/iceberg/pull/6394#discussion_r1044997749 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/source/IcebergTableSource.java: ## @@ -84,7 +84,7 @@ public IcebergTableSource( TableSchema schema,

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044996396 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] flyrain commented on pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
flyrain commented on PR #6344: URL: https://github.com/apache/iceberg/pull/6344#issuecomment-1345156075 Thanks @szehon-ho @RussellSpitzer for the review. Resolved all comments. Particularly, I have removed the flag `markUpdated`. I tried to use the flag to cover more use cases like followin

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044996306 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044996145 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] nastra commented on pull request #6246: Core: Create and report metrics about Snapshots

2022-12-09 Thread GitBox
nastra commented on PR #6246: URL: https://github.com/apache/iceberg/pull/6246#issuecomment-1345154619 Thanks for the review @rdblue. I've addressed the comments and this should be good to go -- This is an automated message from the Apache Git Service. To respond to the message, please lo

[GitHub] [iceberg] nastra commented on a diff in pull request #6246: Core: Create and report metrics about Snapshots

2022-12-09 Thread GitBox
nastra commented on code in PR #6246: URL: https://github.com/apache/iceberg/pull/6246#discussion_r1044995887 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -112,6 +122,19 @@ public ThisT scanManifestsWith(ExecutorService executorService) { return se

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044995857 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044995643 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/TestChangelogIterator.java: ## @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [iceberg] nastra commented on a diff in pull request #6246: Core: Create and report metrics about Snapshots

2022-12-09 Thread GitBox
nastra commented on code in PR #6246: URL: https://github.com/apache/iceberg/pull/6246#discussion_r1044994327 ## core/src/test/java/org/apache/iceberg/TestScanPlanningAndReporting.java: ## @@ -262,5 +263,12 @@ public ScanReport lastReport() { } return (ScanReport)

[GitHub] [iceberg] nastra commented on a diff in pull request #6246: Core: Create and report metrics about Snapshots

2022-12-09 Thread GitBox
nastra commented on code in PR #6246: URL: https://github.com/apache/iceberg/pull/6246#discussion_r1044994207 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -436,6 +463,21 @@ private void notifyListeners() { Object event = updateEvent(); if (

[GitHub] [iceberg] nastra commented on a diff in pull request #6246: Core: Create and report metrics about Snapshots

2022-12-09 Thread GitBox
nastra commented on code in PR #6246: URL: https://github.com/apache/iceberg/pull/6246#discussion_r1044992928 ## core/src/main/java/org/apache/iceberg/BaseTransaction.java: ## @@ -71,9 +73,19 @@ enum TransactionType { private TableMetadata base; private TableMetadata curre

[GitHub] [iceberg] hililiwei commented on a diff in pull request #6394: Flink: Port Support read options in flink source to 1.14 & 1.16

2022-12-09 Thread GitBox
hililiwei commented on code in PR #6394: URL: https://github.com/apache/iceberg/pull/6394#discussion_r1044983293 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/source/FlinkSource.java: ## @@ -112,8 +111,10 @@ public Builder project(TableSchema schema) { retur

[GitHub] [iceberg] ajantha-bhat commented on issue #3220: [Python] support iceberg hadoop catalog in python library

2022-12-09 Thread GitBox
ajantha-bhat commented on issue #3220: URL: https://github.com/apache/iceberg/issues/3220#issuecomment-1345084144 +1, I strongly agree with Ryan. I do believe Hadoop catalog has greatly helped in Iceberg adoption (to try it out quickly). But we don't highlight its problem strongl

[GitHub] [iceberg] xwmr-max closed pull request #6400: Flink: Support Look-up function

2022-12-09 Thread GitBox
xwmr-max closed pull request #6400: Flink: Support Look-up function URL: https://github.com/apache/iceberg/pull/6400 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscr

[GitHub] [iceberg] hililiwei commented on a diff in pull request #6394: Flink: Port Support read options in flink source to 1.14 & 1.16

2022-12-09 Thread GitBox
hililiwei commented on code in PR #6394: URL: https://github.com/apache/iceberg/pull/6394#discussion_r1044956464 ## flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/source/FlinkSource.java: ## @@ -185,7 +186,7 @@ public Builder exposeLocality(boolean newExposeLocality) {

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044936041 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] rdblue opened a new pull request, #6399: API: Add strict metadata cleanup to SnapshotProducer

2022-12-09 Thread GitBox
rdblue opened a new pull request, #6399: URL: https://github.com/apache/iceberg/pull/6399 In most catalogs, the `CommitStateUnknownException` is used to signal to `SnapshotProducer` that it is not safe to clean up metadata files because they may have been committed. This introduces another

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044929640 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044928986 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] rdblue commented on pull request #6254: Python: implement `to_pandas`

2022-12-09 Thread GitBox
rdblue commented on PR #6254: URL: https://github.com/apache/iceberg/pull/6254#issuecomment-1344935248 Sounds good to me. Merge away. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [iceberg] rdblue commented on a diff in pull request #6342: Python: Introduce SchemaVisitorPerPrimitiveType

2022-12-09 Thread GitBox
rdblue commented on code in PR #6342: URL: https://github.com/apache/iceberg/pull/6342#discussion_r1044924166 ## python/pyiceberg/schema.py: ## @@ -317,6 +331,97 @@ def primitive(self, primitive: PrimitiveType) -> T: """Visit a PrimitiveType""" +class SchemaVisitorP

[GitHub] [iceberg] rdblue commented on a diff in pull request #6342: Python: Introduce SchemaVisitorPerPrimitiveType

2022-12-09 Thread GitBox
rdblue commented on code in PR #6342: URL: https://github.com/apache/iceberg/pull/6342#discussion_r1044924033 ## python/pyiceberg/schema.py: ## @@ -317,6 +331,97 @@ def primitive(self, primitive: PrimitiveType) -> T: """Visit a PrimitiveType""" +class SchemaVisitorP

[GitHub] [iceberg] rdblue commented on a diff in pull request #6348: Python: Update license-checker

2022-12-09 Thread GitBox
rdblue commented on code in PR #6348: URL: https://github.com/apache/iceberg/pull/6348#discussion_r1044923603 ## python/dev/.rat-excludes: ## @@ -0,0 +1,2 @@ +.rat-excludes Review Comment: Why do we need to copy it? We could just commit a copy of it in place of the current

[GitHub] [iceberg] github-actions[bot] commented on issue #3639: Regenerate the metadata of an Iceberg table

2022-12-09 Thread GitBox
github-actions[bot] commented on issue #3639: URL: https://github.com/apache/iceberg/issues/3639#issuecomment-1344924921 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

[GitHub] [iceberg] rdblue commented on a diff in pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-09 Thread GitBox
rdblue commented on code in PR #6392: URL: https://github.com/apache/iceberg/pull/6392#discussion_r1044923034 ## python/tests/io/test_fsspec.py: ## @@ -204,6 +204,191 @@ def test_writing_avro_file(generated_manifest_entry_file: Generator[str, None, N b2 = in_f.read

[GitHub] [iceberg] rdblue commented on a diff in pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-09 Thread GitBox
rdblue commented on code in PR #6392: URL: https://github.com/apache/iceberg/pull/6392#discussion_r1044922073 ## python/dev/docker-compose-azurite.yml: ## @@ -0,0 +1,26 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. S

[GitHub] [iceberg] rdblue commented on a diff in pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-09 Thread GitBox
rdblue commented on code in PR #6392: URL: https://github.com/apache/iceberg/pull/6392#discussion_r1044921819 ## python/tests/conftest.py: ## @@ -1259,3 +1276,22 @@ def fixture_glue(_aws_credentials: None) -> Generator[boto3.client, None, None]: """Mocked glue client"""

[GitHub] [iceberg] rdblue commented on a diff in pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-09 Thread GitBox
rdblue commented on code in PR #6392: URL: https://github.com/apache/iceberg/pull/6392#discussion_r1044921558 ## python/pyiceberg/io/__init__.py: ## @@ -257,6 +257,8 @@ def delete(self, location: Union[str, InputFile, OutputFile]) -> None: "gcs": [ARROW_FILE_IO], "fil

[GitHub] [iceberg] rdblue commented on a diff in pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-09 Thread GitBox
rdblue commented on code in PR #6392: URL: https://github.com/apache/iceberg/pull/6392#discussion_r1044921399 ## python/Makefile: ## @@ -26,14 +26,21 @@ lint: poetry run pre-commit run --all-files test: - poetry run coverage run --source=pyiceberg/ -m pytest tes

[GitHub] [iceberg] rdblue commented on pull request #6246: Core: Create and report metrics about Snapshots

2022-12-09 Thread GitBox
rdblue commented on PR #6246: URL: https://github.com/apache/iceberg/pull/6246#issuecomment-1344899050 I had some minor comments, but overall this looks great. I think we should decide whether to introduce `SnapshotMetrics` and `SnapshotReport` or `CommitMetrics` and `CommitReport`. I tend

[GitHub] [iceberg] rdblue commented on a diff in pull request #6246: Core: Create and report metrics about Snapshots

2022-12-09 Thread GitBox
rdblue commented on code in PR #6246: URL: https://github.com/apache/iceberg/pull/6246#discussion_r1044918769 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -112,6 +122,19 @@ public ThisT scanManifestsWith(ExecutorService executorService) { return se

[GitHub] [iceberg] rdblue commented on a diff in pull request #6246: Core: Create and report metrics about Snapshots

2022-12-09 Thread GitBox
rdblue commented on code in PR #6246: URL: https://github.com/apache/iceberg/pull/6246#discussion_r1044916453 ## core/src/test/java/org/apache/iceberg/TestScanPlanningAndReporting.java: ## @@ -262,5 +263,12 @@ public ScanReport lastReport() { } return (ScanReport)

[GitHub] [iceberg] rdblue commented on a diff in pull request #6246: Core: Create and report metrics about Snapshots

2022-12-09 Thread GitBox
rdblue commented on code in PR #6246: URL: https://github.com/apache/iceberg/pull/6246#discussion_r1044915674 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -436,6 +463,21 @@ private void notifyListeners() { Object event = updateEvent(); if (

[GitHub] [iceberg] rdblue commented on a diff in pull request #6246: Core: Create and report metrics about Snapshots

2022-12-09 Thread GitBox
rdblue commented on code in PR #6246: URL: https://github.com/apache/iceberg/pull/6246#discussion_r1044915081 ## core/src/main/java/org/apache/iceberg/BaseTransaction.java: ## @@ -71,9 +73,19 @@ enum TransactionType { private TableMetadata base; private TableMetadata curre

[GitHub] [iceberg] rdblue commented on a diff in pull request #6072: Core: Add scan report for incremental Table scans

2022-12-09 Thread GitBox
rdblue commented on code in PR #6072: URL: https://github.com/apache/iceberg/pull/6072#discussion_r1044913375 ## core/src/test/java/org/apache/iceberg/TestScanPlanningAndReporting.java: ## @@ -239,12 +255,75 @@ public void scanningWithEqualityAndPositionalDeleteFiles() throws I

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044909787 ## api/src/main/java/org/apache/iceberg/ChangelogOperation.java: ## @@ -21,5 +21,7 @@ /** An enum representing possible operations in a changelog. */ public enum Chan

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044906141 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -308,6 +339,17 @@ public Scan buildChangelogScan() { return new Spar

[GitHub] [iceberg] flyrain commented on pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
flyrain commented on PR #6350: URL: https://github.com/apache/iceberg/pull/6350#issuecomment-1344870088 Hi @szehon-ho, @RussellSpitzer, @hililiwei thanks for the review! Resolved your comments. Take another look? -- This is an automated message from the Apache Git Service. To respond to t

[GitHub] [iceberg] szehon-ho commented on pull request #6354: Spark: Check fileIO instead of reading location when determining locality enabled

2022-12-09 Thread GitBox
szehon-ho commented on PR #6354: URL: https://github.com/apache/iceberg/pull/6354#issuecomment-1344869084 Merged, thanks @amogh-jahagirdar for the change -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [iceberg] szehon-ho merged pull request #6354: Spark: Check fileIO instead of reading location when determining locality enabled

2022-12-09 Thread GitBox
szehon-ho merged PR #6354: URL: https://github.com/apache/iceberg/pull/6354 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.a

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044901322 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -308,6 +339,17 @@ public Scan buildChangelogScan() { return new Spar

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
RussellSpitzer commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044900565 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
RussellSpitzer commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r104490 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044899026 ## spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestChangelogTable.java: ## @@ -137,6 +138,64 @@ public void testOverwrites() {

[GitHub] [iceberg] Fokko commented on issue #5901: pip install pyiceberg on windows require C++ to be installed

2022-12-09 Thread GitBox
Fokko commented on issue #5901: URL: https://github.com/apache/iceberg/issues/5901#issuecomment-1344862830 Awesome, thanks for letting us know! @djouallah -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [iceberg] RussellSpitzer commented on pull request #3059: Returns isUnpartitioned=true for VoidTransform on all fields

2022-12-09 Thread GitBox
RussellSpitzer commented on PR #3059: URL: https://github.com/apache/iceberg/pull/3059#issuecomment-1344862752 My commit title was inverted, mea culpa. For anyone looking this up in the future I meant that "all void transforms should be false" -- This is an automated message from the Apac

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044897546 ## spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestChangelogTable.java: ## @@ -137,6 +138,64 @@ public void testOverwrites() {

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044897410 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -285,6 +286,36 @@ public Scan buildChangelogScan() { Long startSna

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044897305 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -285,6 +286,36 @@ public Scan buildChangelogScan() { Long startSna

[GitHub] [iceberg] rdblue commented on a diff in pull request #6072: Core: Add scan report for incremental Table scans

2022-12-09 Thread GitBox
rdblue commented on code in PR #6072: URL: https://github.com/apache/iceberg/pull/6072#discussion_r1044897257 ## core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java: ## @@ -165,6 +170,13 @@ public CloseableIterable apply( context.residual

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044897001 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkReadOptions.java: ## @@ -32,6 +32,12 @@ private SparkReadOptions() {} // End snapshot ID used in incr

[GitHub] [iceberg] RussellSpitzer closed issue #3014: PartitionSpec isUnpartitioned returns true for tables which previously had Partitions but no longer do

2022-12-09 Thread GitBox
RussellSpitzer closed issue #3014: PartitionSpec isUnpartitioned returns true for tables which previously had Partitions but no longer do URL: https://github.com/apache/iceberg/issues/3014 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [iceberg] RussellSpitzer merged pull request #3059: Returns isUnpartitioned=true for VoidTransform on all fields

2022-12-09 Thread GitBox
RussellSpitzer merged PR #3059: URL: https://github.com/apache/iceberg/pull/3059 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceb

[GitHub] [iceberg] rdblue commented on a diff in pull request #6072: Core: Add scan report for incremental Table scans

2022-12-09 Thread GitBox
rdblue commented on code in PR #6072: URL: https://github.com/apache/iceberg/pull/6072#discussion_r1044896209 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -118,6 +119,10 @@ public static Set getProjectedIds(Schema schema) { return ImmutableSet.copyOf(g

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044896124 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -150,7 +150,7 @@ public static Iterable ancestorsOf(long snapshotId, Function

[GitHub] [iceberg-docs] Fokko commented on a diff in pull request #185: First version of the changelog

2022-12-09 Thread GitBox
Fokko commented on code in PR #185: URL: https://github.com/apache/iceberg-docs/pull/185#discussion_r1044893793 ## landing-page/content/common/release-notes.md: ## @@ -70,8 +70,25 @@ To add a dependency on Iceberg in Maven, add the following to your `pom.xml`: ## 1.1.0 relea

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
RussellSpitzer commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044893660 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [iceberg-docs] Fokko commented on a diff in pull request #185: First version of the changelog

2022-12-09 Thread GitBox
Fokko commented on code in PR #185: URL: https://github.com/apache/iceberg-docs/pull/185#discussion_r1044893253 ## landing-page/content/common/release-notes.md: ## @@ -70,8 +70,25 @@ To add a dependency on Iceberg in Maven, add the following to your `pom.xml`: ## 1.1.0 relea

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
RussellSpitzer commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044893113 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
szehon-ho commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044889725 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
szehon-ho commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044848715 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
szehon-ho commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044855421 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-09 Thread GitBox
szehon-ho commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1044799711 ## api/src/main/java/org/apache/iceberg/ChangelogOperation.java: ## @@ -21,5 +21,7 @@ /** An enum representing possible operations in a changelog. */ public enum Ch

[GitHub] [iceberg] Fokko opened a new pull request, #6398: Python: Integration tests

2022-12-09 Thread GitBox
Fokko opened a new pull request, #6398: URL: https://github.com/apache/iceberg/pull/6398 This is the first version of a framework to read Iceberg tables, produced by Spark, using PyIceberg. This makes it easier to run end-to-end tests and also validate the behavior of PyArrow and DuckDB.

[GitHub] [iceberg] flyrain commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
flyrain commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044874538 ## spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestChangelogTable.java: ## @@ -137,6 +138,64 @@ public void testOverwrites() {

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
RussellSpitzer commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044857145 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -308,6 +339,17 @@ public Scan buildChangelogScan() { return n

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
RussellSpitzer commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044841503 ## spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestChangelogTable.java: ## @@ -137,6 +138,64 @@ public void testOverwrites() {

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
RussellSpitzer commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044836296 ## spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestChangelogTable.java: ## @@ -137,6 +138,64 @@ public void testOverwrites() {

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
RussellSpitzer commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044826932 ## spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestChangelogTable.java: ## @@ -137,6 +138,64 @@ public void testOverwrites() {

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6350: Query changelog table with a timestamp range

2022-12-09 Thread GitBox
RussellSpitzer commented on code in PR #6350: URL: https://github.com/apache/iceberg/pull/6350#discussion_r1044825312 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -150,7 +150,7 @@ public static Iterable ancestorsOf(long snapshotId, Function

[GitHub] [iceberg] RussellSpitzer commented on pull request #3059: Returns isUnpartitioned=true for VoidTransform on all fields

2022-12-09 Thread GitBox
RussellSpitzer commented on PR #3059: URL: https://github.com/apache/iceberg/pull/3059#issuecomment-1344753963 @xinbinhuang Looks good to me, once tests pass I think we are good to go. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #3059: Returns isUnpartitioned=true for VoidTransform on all fields

2022-12-09 Thread GitBox
RussellSpitzer commented on code in PR #3059: URL: https://github.com/apache/iceberg/pull/3059#discussion_r1044822325 ## core/src/test/java/org/apache/iceberg/TestPartitionSpecInfo.java: ## @@ -65,11 +65,22 @@ public void cleanupTables() { TestTables.clearTables(); } +

[GitHub] [iceberg] xinbinhuang commented on pull request #3059: Returns isUnpartitioned=true for VoidTransform on all fields

2022-12-09 Thread GitBox
xinbinhuang commented on PR #3059: URL: https://github.com/apache/iceberg/pull/3059#issuecomment-1344748524 (@RussellSpitzer sorry didn't see the last message from you) @RussellSpitzer @rdblue Just rebased. PTAL -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [iceberg] rdblue commented on a diff in pull request #3231: GCM encryption stream

2022-12-09 Thread GitBox
rdblue commented on code in PR #3231: URL: https://github.com/apache/iceberg/pull/3231#discussion_r1044792391 ## core/src/main/java/org/apache/iceberg/encryption/Ciphers.java: ## @@ -96,33 +110,45 @@ public AesGcmDecryptor(byte[] keyBytes) { } public byte[] decrypt(b

[GitHub] [iceberg] Fokko commented on a diff in pull request #6348: Python: Update license-checker

2022-12-09 Thread GitBox
Fokko commented on code in PR #6348: URL: https://github.com/apache/iceberg/pull/6348#discussion_r1044764109 ## python/dev/.rat-excludes: ## @@ -0,0 +1,2 @@ +.rat-excludes Review Comment: This will make the building more complicated. Before doing a `poetry build`, we need t

[GitHub] [iceberg] rdblue commented on a diff in pull request #3231: GCM encryption stream

2022-12-09 Thread GitBox
rdblue commented on code in PR #3231: URL: https://github.com/apache/iceberg/pull/3231#discussion_r1044743460 ## core/src/main/java/org/apache/iceberg/encryption/AesGcmInputStream.java: ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] rdblue commented on a diff in pull request #3231: GCM encryption stream

2022-12-09 Thread GitBox
rdblue commented on code in PR #3231: URL: https://github.com/apache/iceberg/pull/3231#discussion_r1044728554 ## core/src/main/java/org/apache/iceberg/encryption/AesGcmInputStream.java: ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] rdblue commented on a diff in pull request #3231: GCM encryption stream

2022-12-09 Thread GitBox
rdblue commented on code in PR #3231: URL: https://github.com/apache/iceberg/pull/3231#discussion_r1044726532 ## core/src/main/java/org/apache/iceberg/encryption/AesGcmInputStream.java: ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] rdblue commented on a diff in pull request #3231: GCM encryption stream

2022-12-09 Thread GitBox
rdblue commented on code in PR #3231: URL: https://github.com/apache/iceberg/pull/3231#discussion_r1044725178 ## core/src/main/java/org/apache/iceberg/encryption/AesGcmInputStream.java: ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] rdblue commented on a diff in pull request #3231: GCM encryption stream

2022-12-09 Thread GitBox
rdblue commented on code in PR #3231: URL: https://github.com/apache/iceberg/pull/3231#discussion_r1044724923 ## core/src/main/java/org/apache/iceberg/encryption/AesGcmInputStream.java: ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] rdblue commented on a diff in pull request #3231: GCM encryption stream

2022-12-09 Thread GitBox
rdblue commented on code in PR #3231: URL: https://github.com/apache/iceberg/pull/3231#discussion_r1044723489 ## core/src/main/java/org/apache/iceberg/encryption/AesGcmInputStream.java: ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] rdblue commented on a diff in pull request #3231: GCM encryption stream

2022-12-09 Thread GitBox
rdblue commented on code in PR #3231: URL: https://github.com/apache/iceberg/pull/3231#discussion_r1044720784 ## core/src/main/java/org/apache/iceberg/encryption/AesGcmInputStream.java: ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] rdblue commented on a diff in pull request #3231: GCM encryption stream

2022-12-09 Thread GitBox
rdblue commented on code in PR #3231: URL: https://github.com/apache/iceberg/pull/3231#discussion_r1044708947 ## core/src/main/java/org/apache/iceberg/encryption/AesGcmInputStream.java: ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] rdblue commented on a diff in pull request #3231: GCM encryption stream

2022-12-09 Thread GitBox
rdblue commented on code in PR #3231: URL: https://github.com/apache/iceberg/pull/3231#discussion_r1044706354 ## core/src/main/java/org/apache/iceberg/encryption/AesGcmInputStream.java: ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

[GitHub] [iceberg] rubenvdg closed issue #6383: Docs: Improve "Getting started" by mentioning required pip version

2022-12-09 Thread GitBox
rubenvdg closed issue #6383: Docs: Improve "Getting started" by mentioning required pip version URL: https://github.com/apache/iceberg/issues/6383 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [iceberg] rdblue commented on issue #3220: [Python] support iceberg hadoop catalog in python library

2022-12-09 Thread GitBox
rdblue commented on issue #3220: URL: https://github.com/apache/iceberg/issues/3220#issuecomment-1344611016 For what it's worth, I think that **the biggest mistake I made with Iceberg was introducing the "Hadoop" tables** that rely on atomic rename. These tables have a lot of problems

[GitHub] [iceberg] RussellSpitzer opened a new issue, #6397: Python Instructions currently do not work for testing

2022-12-09 Thread GitBox
RussellSpitzer opened a new issue, #6397: URL: https://github.com/apache/iceberg/issues/6397 ### Apache Iceberg version main (development) ### Query engine Other ### Please describe the bug 🐞 The instructions listed in the README.md under testing ```b
