[GitHub] [iceberg] deadwind4 opened a new issue, #6429: [Feature Proposal] Log Store in Iceberg

2022-12-15 Thread GitBox
deadwind4 opened a new issue, #6429: URL: https://github.com/apache/iceberg/issues/6429 ### Feature Request / Improvement This proposal aims to improve Iceberg's real-time capabilities by introducing a log store system. Streaming reads consume data that is cached in a log store (Kafka).

[GitHub] [iceberg] Fokko opened a new issue, #6430: Python: Support for static table

2022-12-15 Thread GitBox
Fokko opened a new issue, #6430: URL: https://github.com/apache/iceberg/issues/6430 ### Feature Request / Improvement In Java we have the StaticTableOperations: https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/StaticTableOperations.java That allows

[GitHub] [iceberg] Fokko commented on issue #3220: [Python] support iceberg hadoop catalog in python library

2022-12-15 Thread GitBox
Fokko commented on issue #3220: URL: https://github.com/apache/iceberg/issues/3220#issuecomment-1352731893 Hey everyone, I think we should split out the idea of implementing the full hadoop catalog, and just being able to read a table from a metadata URL. For the latter, I've created a new

[GitHub] [iceberg] Fokko commented on issue #6397: Python Instructions currently do not work for testing

2022-12-15 Thread GitBox
Fokko commented on issue #6397: URL: https://github.com/apache/iceberg/issues/6397#issuecomment-1352734967 @rubenvdg What do you think of removing the dataclass from the `AvroStruct`. We should be able to create a Struct without including it in the PyIceberg class hierarchy. The `AvroStruct

[GitHub] [iceberg] zhongyujiang opened a new pull request, #6431: Parquet: Fix ParquetDictionaryRowGroupFilter evaluating NaN.

2022-12-15 Thread GitBox
zhongyujiang opened a new pull request, #6431: URL: https://github.com/apache/iceberg/pull/6431 This PR fixes ParquetDictionaryRowGroupFilter evaluating `notNaN`. Because Parquet dictionaries cannot contain null values, ParquetDictionaryRowGroupFilter should check if there are null values

[GitHub] [iceberg] zhongyujiang commented on pull request #6431: Parquet: Fix ParquetDictionaryRowGroupFilter evaluating NaN.

2022-12-15 Thread GitBox
zhongyujiang commented on PR #6431: URL: https://github.com/apache/iceberg/pull/6431#issuecomment-1352852748 @yyanyy @rdblue Could you take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [iceberg] rbalamohan opened a new pull request, #6432: Consider moving to ParallelIterable in Deletes::toPositionIndex

2022-12-15 Thread GitBox
rbalamohan opened a new pull request, #6432: URL: https://github.com/apache/iceberg/pull/6432 Issue: https://github.com/apache/iceberg/issues/6387 When tables are updated in "merge-on-read" mode, it creates positional delete files. Performance of reads degrades quite a bit, even with

[GitHub] [iceberg] jaceklaskowski opened a new pull request, #6433: Docs: README

2022-12-15 Thread GitBox
jaceklaskowski opened a new pull request, #6433: URL: https://github.com/apache/iceberg/pull/6433 Found some very minor "issues" while reading README.md and couldn't resist fixing them all up.

[GitHub] [iceberg] cccs-eric commented on a diff in pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-15 Thread GitBox
cccs-eric commented on code in PR #6392: URL: https://github.com/apache/iceberg/pull/6392#discussion_r1049565481 ## python/Makefile: ## @@ -26,14 +26,21 @@ lint: poetry run pre-commit run --all-files test: - poetry run coverage run --source=pyiceberg/ -m pytest

[GitHub] [iceberg] grbinho commented on pull request #6223: AWS: Use provided glue catalog id in defaultWarehouseLocation

2022-12-15 Thread GitBox
grbinho commented on PR #6223: URL: https://github.com/apache/iceberg/pull/6223#issuecomment-1352975547 @JonasJ-ap @ajantha-bhat Hi guys, can you advise how we can move this MR forward?

[GitHub] [iceberg] djouallah commented on issue #3220: [Python] support iceberg hadoop catalog in python library

2022-12-15 Thread GitBox
djouallah commented on issue #3220: URL: https://github.com/apache/iceberg/issues/3220#issuecomment-1353002821 For people who use only Python (no Spark, no Glue, and none of the big engine stuff), is there any simple-to-use catalog? I read about this REST thing, but it seems it is only a sp

[GitHub] [iceberg] djouallah commented on issue #6430: Python: Support for static table

2022-12-15 Thread GitBox
djouallah commented on issue #6430: URL: https://github.com/apache/iceberg/issues/6430#issuecomment-1353005103 FWIW, BigQuery has a *read only* implementation that uses the metadata location.

[GitHub] [iceberg] Fokko commented on pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-15 Thread GitBox
Fokko commented on PR #6392: URL: https://github.com/apache/iceberg/pull/6392#issuecomment-1353167480 @cccs-eric it looks like we have to add `pyparsing` to the `pyproject.toml`. I don't know why it was working before, but it should have been added in https://github.com/apache/iceberg/pull/

[GitHub] [iceberg] joshuarobinson opened a new issue, #6434: PyIceberg support for UUID types

2022-12-15 Thread GitBox
joshuarobinson opened a new issue, #6434: URL: https://github.com/apache/iceberg/issues/6434 ### Feature Request / Improvement Currently, pyiceberg 0.2.0 fails on creating a table scan for any table (that I have at least) with UUID columns. The root problem seems to be (thanks

[GitHub] [iceberg] Fokko commented on pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-15 Thread GitBox
Fokko commented on PR #6392: URL: https://github.com/apache/iceberg/pull/6392#issuecomment-1353168942 @cccs-eric I forgot one thing, could you also add the `adlfs` option to the docs in `python/mkdocs/`?

[GitHub] [iceberg] joshuarobinson opened a new issue, #6435: PyIceberg: Avro decode EOF error

2022-12-15 Thread GitBox
joshuarobinson opened a new issue, #6435: URL: https://github.com/apache/iceberg/issues/6435 ### Feature Request / Improvement In reading manifests for a table for a table scan in PyIceberg 0.2.0, I get an EOFError. Table was originally written in June 2022 with the most recent

[GitHub] [iceberg] joshuarobinson commented on issue #6435: PyIceberg: Avro decode EOF error

2022-12-15 Thread GitBox
joshuarobinson commented on issue #6435: URL: https://github.com/apache/iceberg/issues/6435#issuecomment-1353182160 The table in question has only one snapshot. I'm attaching the json and avro metadata files for this table. [iceberg_6435_metadata.zip](https://github.com/apache/iceberg/fi

[GitHub] [iceberg] Fokko commented on pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-15 Thread GitBox
Fokko commented on PR #6392: URL: https://github.com/apache/iceberg/pull/6392#issuecomment-1353199075 @cccs-eric Yes, please do!

[GitHub] [iceberg] nastra closed pull request #6436: Core: Add flag to control sending metric reports via REST

2022-12-15 Thread GitBox
nastra closed pull request #6436: Core: Add flag to control sending metric reports via REST URL: https://github.com/apache/iceberg/pull/6436

[GitHub] [iceberg] stevenzwu merged pull request #6412: Doc: Modify some options refer to Read-options in flink streaming rea…

2022-12-15 Thread GitBox
stevenzwu merged PR #6412: URL: https://github.com/apache/iceberg/pull/6412

[GitHub] [iceberg] stevenzwu commented on pull request #6412: Doc: Modify some options refer to Read-options in flink streaming rea…

2022-12-15 Thread GitBox
stevenzwu commented on PR #6412: URL: https://github.com/apache/iceberg/pull/6412#issuecomment-1353410391 thx @xwmr-max for the update

[GitHub] [iceberg] nastra commented on a diff in pull request #6399: API: Add strict metadata cleanup to SnapshotProducer

2022-12-15 Thread GitBox
nastra commented on code in PR #6399: URL: https://github.com/apache/iceberg/pull/6399#discussion_r1049921925 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -396,7 +399,9 @@ public void commit() { } catch (CommitStateUnknownException commitStateUnknow

[GitHub] [iceberg] Fokko opened a new pull request, #6437: Python: Projection by Field ID

2022-12-15 Thread GitBox
Fokko opened a new pull request, #6437: URL: https://github.com/apache/iceberg/pull/6437 instead of name

[GitHub] [iceberg] szehon-ho commented on pull request #6419: Doc:Example of correcting the document add/drop partition truncate

2022-12-15 Thread GitBox
szehon-ho commented on PR #6419: URL: https://github.com/apache/iceberg/pull/6419#issuecomment-1353527618 I may be missing something, but don't both truncate(data, 4) and truncate(4,data) do the same thing?

[GitHub] [iceberg] flyrain commented on pull request #6427: Spark 3.2: Time range query of changelog tables

2022-12-15 Thread GitBox
flyrain commented on PR #6427: URL: https://github.com/apache/iceberg/pull/6427#issuecomment-1353556854 Thanks @szehon-ho.

[GitHub] [iceberg] flyrain merged pull request #6427: Spark 3.2: Time range query of changelog tables

2022-12-15 Thread GitBox
flyrain merged PR #6427: URL: https://github.com/apache/iceberg/pull/6427

[GitHub] [iceberg] e-gat commented on issue #6421: Running rewriteDataFiles on multiple executors in Spark

2022-12-15 Thread GitBox
e-gat commented on issue #6421: URL: https://github.com/apache/iceberg/issues/6421#issuecomment-1353571205 After investigation, we found that the latest Iceberg versions support running rewriteDataFiles across multiple executors in Spark.

[GitHub] [iceberg] e-gat closed issue #6421: Running rewriteDataFiles on multiple executors in Spark

2022-12-15 Thread GitBox
e-gat closed issue #6421: Running rewriteDataFiles on multiple executors in Spark URL: https://github.com/apache/iceberg/issues/6421

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-15 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1050040299 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-15 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1050041610 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] bondarenko commented on issue #6257: Partitions metadata table shows old partitions

2022-12-15 Thread GitBox
bondarenko commented on issue #6257: URL: https://github.com/apache/iceberg/issues/6257#issuecomment-1353617197 It looks like without `USING iceberg` you don't create an Iceberg table, so it doesn't even have to support updates, let alone the partitions table.

[GitHub] [iceberg] Fokko opened a new pull request, #6438: Python: Reduce the use of mock objects

2022-12-15 Thread GitBox
Fokko opened a new pull request, #6438: URL: https://github.com/apache/iceberg/pull/6438 We use mocks extensively in our Python code; this dates from before we had certain functionality available, such as a working FileIO. Instead of using the mocks, we can also use the actual code.

[GitHub] [iceberg] Fokko opened a new pull request, #6439: Python: Add pyparsing

2022-12-15 Thread GitBox
Fokko opened a new pull request, #6439: URL: https://github.com/apache/iceberg/pull/6439 This one was missing and was being pulled in transitively, I presume.

[GitHub] [iceberg] islamismailov commented on pull request #6353: Make sure S3 stream opened by ReadConf ctor is closed

2022-12-15 Thread GitBox
islamismailov commented on PR #6353: URL: https://github.com/apache/iceberg/pull/6353#issuecomment-1353743389 I will try to update this PR with the feedback provided.

[GitHub] [iceberg] cccs-eric commented on pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-15 Thread GitBox
cccs-eric commented on PR #6392: URL: https://github.com/apache/iceberg/pull/6392#issuecomment-1353755124 > @cccs-eric it looks like we have to add `pyparsing` to the `pyproject.toml`. I don't know why it was working before, but it should have been added in #6259 @Fokko I don't under

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-15 Thread GitBox
RussellSpitzer commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1050207289 ## core/src/main/java/org/apache/iceberg/AbstractTableScan.java: ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-15 Thread GitBox
RussellSpitzer commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1050213598 ## core/src/main/java/org/apache/iceberg/AbstractTableScan.java: ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

[GitHub] [iceberg] rdblue commented on pull request #6425: Core: Unify fromJson(String) parsing

2022-12-15 Thread GitBox
rdblue commented on PR #6425: URL: https://github.com/apache/iceberg/pull/6425#issuecomment-1353818545 Thanks, @nastra! Good to have these cleaned up.

[GitHub] [iceberg] rdblue merged pull request #6425: Core: Unify fromJson(String) parsing

2022-12-15 Thread GitBox
rdblue merged PR #6425: URL: https://github.com/apache/iceberg/pull/6425

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-15 Thread GitBox
szehon-ho commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1050217764 ## core/src/main/java/org/apache/iceberg/AbstractTableScan.java: ## @@ -0,0 +1,173 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more c

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6365: Core: Add position deletes metadata table

2022-12-15 Thread GitBox
RussellSpitzer commented on code in PR #6365: URL: https://github.com/apache/iceberg/pull/6365#discussion_r1050219983 ## core/src/main/java/org/apache/iceberg/BaseMetadataTable.java: ## @@ -64,9 +64,12 @@ protected BaseMetadataTable(TableOperations ops, Table table, String name

[GitHub] [iceberg] rdblue commented on issue #6415: Vectorized Read Issue

2022-12-15 Thread GitBox
rdblue commented on issue #6415: URL: https://github.com/apache/iceberg/issues/6415#issuecomment-1353829220 I agree with the analysis that the problem is that we are returning dictionary-encoded Arrow vectors. Maybe we're not doing that the right way. I'll take a look at #3024.

[GitHub] [iceberg] sfc-gh-mparmar commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-15 Thread GitBox
sfc-gh-mparmar commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1050236173 ## snowflake/src/test/java/org/apache/iceberg/snowflake/SnowflakeCatalogTest.java: ## @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [iceberg] github-actions[bot] closed issue #4931: Rewrite Data Files with Manual Sort Order should also use Table Partitioning in Sort Order

2022-12-15 Thread GitBox
github-actions[bot] closed issue #4931: Rewrite Data Files with Manual Sort Order should also use Table Partitioning in Sort Order URL: https://github.com/apache/iceberg/issues/4931

[GitHub] [iceberg] github-actions[bot] commented on issue #4931: Rewrite Data Files with Manual Sort Order should also use Table Partitioning in Sort Order

2022-12-15 Thread GitBox
github-actions[bot] commented on issue #4931: URL: https://github.com/apache/iceberg/issues/4931#issuecomment-1353917180 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

[GitHub] [iceberg] dennishuo commented on a diff in pull request #6428: Add new SnowflakeCatalog implementation to enable directly using Snowflake-managed Iceberg tables

2022-12-15 Thread GitBox
dennishuo commented on code in PR #6428: URL: https://github.com/apache/iceberg/pull/6428#discussion_r1050257006 ## snowflake/src/test/java/org/apache/iceberg/snowflake/SnowflakeCatalogTest.java: ## @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [iceberg] jackye1995 merged pull request #6152: Docs: Update table snapshot retention property descriptions

2022-12-15 Thread GitBox
jackye1995 merged PR #6152: URL: https://github.com/apache/iceberg/pull/6152

[GitHub] [iceberg] jackye1995 commented on pull request #6152: Docs: Update table snapshot retention property descriptions

2022-12-15 Thread GitBox
jackye1995 commented on PR #6152: URL: https://github.com/apache/iceberg/pull/6152#issuecomment-1353975897 Thanks for the fix, @amogh-jahagirdar, and thanks for the review, @singhpk234.

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-15 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1050273317 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/TestChangelogIterator.java: ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-15 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1050273608 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] flyrain commented on a diff in pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-15 Thread GitBox
flyrain commented on code in PR #6344: URL: https://github.com/apache/iceberg/pull/6344#discussion_r1050273739 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/ChangelogIterator.java: ## @@ -0,0 +1,162 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [iceberg] flyrain commented on pull request #6344: Spark 3.3: Introduce the changelog iterator

2022-12-15 Thread GitBox
flyrain commented on PR #6344: URL: https://github.com/apache/iceberg/pull/6344#issuecomment-1354023746 Thanks @szehon-ho and @RussellSpitzer for the review. Resolved all comments. Ready for another look.

[GitHub] [iceberg] rdblue commented on a diff in pull request #6402: Flink: Add UT for NaN

2022-12-15 Thread GitBox
rdblue commented on code in PR #6402: URL: https://github.com/apache/iceberg/pull/6402#discussion_r1050275269 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/FlinkFilters.java: ## @@ -246,18 +248,70 @@ private static Optional convertFieldAndLiteral( org.apache.

[GitHub] [iceberg] rdblue commented on a diff in pull request #6402: Flink: Add UT for NaN

2022-12-15 Thread GitBox
rdblue commented on code in PR #6402: URL: https://github.com/apache/iceberg/pull/6402#discussion_r1050277049 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/FlinkFilters.java: ## @@ -246,18 +248,70 @@ private static Optional convertFieldAndLiteral( org.apache.

[GitHub] [iceberg] rdblue commented on a diff in pull request #6402: Flink: Add UT for NaN

2022-12-15 Thread GitBox
rdblue commented on code in PR #6402: URL: https://github.com/apache/iceberg/pull/6402#discussion_r1050277901 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/source/TestFlinkTableSource.java: ## @@ -603,7 +603,108 @@ public void testFilterPushDown2Literal() { }

[GitHub] [iceberg] rdblue commented on a diff in pull request #6402: Flink: Add UT for NaN

2022-12-15 Thread GitBox
rdblue commented on code in PR #6402: URL: https://github.com/apache/iceberg/pull/6402#discussion_r1050278409 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/source/TestFlinkTableSource.java: ## @@ -603,7 +603,108 @@ public void testFilterPushDown2Literal() { }

[GitHub] [iceberg] rdblue commented on a diff in pull request #6402: Flink: Add UT for NaN

2022-12-15 Thread GitBox
rdblue commented on code in PR #6402: URL: https://github.com/apache/iceberg/pull/6402#discussion_r1050278958 ## flink/v1.16/flink/src/test/java/org/apache/iceberg/flink/source/TestFlinkTableSource.java: ## @@ -603,7 +605,103 @@ public void testFilterPushDown2Literal() { }

[GitHub] [iceberg] rdblue commented on pull request #6402: Flink: Add UT for NaN

2022-12-15 Thread GitBox
rdblue commented on PR #6402: URL: https://github.com/apache/iceberg/pull/6402#issuecomment-1354038557 @hililiwei, I flagged the test cases in my review, but I now see that @stevenzwu did as well. The problem is that NaN comparison should always result in `false`. That's why Iceberg
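
A small illustration of the NaN point in the comment above; it is not taken from the PR and does not use any Iceberg or Flink API. It only shows the underlying IEEE 754 behaviour, which Python floats follow: ordered and equality comparisons against NaN evaluate to false, so detecting NaN needs a dedicated check. The helper name `compare_all` is hypothetical.

```python
import math

nan = float("nan")


def compare_all(x: float, y: float) -> dict:
    """Evaluate the ordered and equality comparisons between x and y."""
    return {
        "lt": x < y,
        "le": x <= y,
        "gt": x > y,
        "ge": x >= y,
        "eq": x == y,
    }


# Every ordered/equality comparison involving NaN is False,
# including comparing NaN with itself.
against_number = compare_all(nan, 1.0)
against_itself = compare_all(nan, nan)

# A dedicated predicate is the reliable way to detect NaN.
detected = math.isnan(nan)

print(against_number)
print(against_itself)
print(detected)
```

Note that `!=` is the one operator that returns true for NaN operands, which is one reason a filter cannot simply be negated when NaN values may be present.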

[GitHub] [iceberg] rdblue merged pull request #6439: Python: Add pyparsing

2022-12-15 Thread GitBox
rdblue merged PR #6439: URL: https://github.com/apache/iceberg/pull/6439

[GitHub] [iceberg] rdblue commented on pull request #6439: Python: Add pyparsing

2022-12-15 Thread GitBox
rdblue commented on PR #6439: URL: https://github.com/apache/iceberg/pull/6439#issuecomment-1354039711 I think this was my mistake. I thought it was part of the standard library since I didn't need to install it.

[GitHub] [iceberg] rdblue commented on pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-15 Thread GitBox
rdblue commented on PR #6392: URL: https://github.com/apache/iceberg/pull/6392#issuecomment-1354040055 @cccs-eric, I just merged #6439 that fixes the pyparsing problem (my fault) so you should be able to rebase and get tests working. Sorry about that!

[GitHub] [iceberg] rdblue commented on a diff in pull request #6074: API,Core: SnapshotManager to be created through Transaction

2022-12-15 Thread GitBox
rdblue commented on code in PR #6074: URL: https://github.com/apache/iceberg/pull/6074#discussion_r1050282810 ## core/src/main/java/org/apache/iceberg/SnapshotManager.java: ## @@ -30,6 +31,17 @@ public class SnapshotManager implements ManageSnapshots { ops.current() !=

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-15 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1050284139 ## python/pyiceberg/expressions/visitors.py: ## @@ -753,3 +757,68 @@ def inclusive_projection( schema: Schema, spec: PartitionSpec, case_sensitive: bool = True ) -

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-15 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1050284725 ## python/pyiceberg/expressions/visitors.py: ## @@ -753,3 +757,68 @@ def inclusive_projection( schema: Schema, spec: PartitionSpec, case_sensitive: bool = True ) -

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-15 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1050285350 ## python/pyiceberg/expressions/visitors.py: ## @@ -753,3 +757,68 @@ def inclusive_projection( schema: Schema, spec: PartitionSpec, case_sensitive: bool = True ) -

[GitHub] [iceberg] yegangy0718 commented on a diff in pull request #6382: Implement ShuffleOperator to collect data statistics

2022-12-15 Thread GitBox
yegangy0718 commented on code in PR #6382: URL: https://github.com/apache/iceberg/pull/6382#discussion_r1050285783 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/ShuffleOperator.java: ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-15 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1050286173 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +457,103 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-15 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1050287321 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +457,103 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-15 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1050287724 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +457,103 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] rdblue commented on a diff in pull request #6437: Python: Projection by Field ID

2022-12-15 Thread GitBox
rdblue commented on code in PR #6437: URL: https://github.com/apache/iceberg/pull/6437#discussion_r1050288238 ## python/pyiceberg/io/pyarrow.py: ## @@ -437,3 +457,103 @@ def visit_or(self, left_result: pc.Expression, right_result: pc.Expression) -> p def expression_to_pyarro

[GitHub] [iceberg] cccs-eric commented on pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-15 Thread GitBox
cccs-eric commented on PR #6392: URL: https://github.com/apache/iceberg/pull/6392#issuecomment-1354058385 > @cccs-eric, I just merged #6439 that fixes the pyparsing problem (my fault) so you should be able to rebase and get tests working. Sorry about that! Thanks @rdblue, build is no

[GitHub] [iceberg] cccs-eric commented on pull request #6392: Python: Add adlfs support (Azure DataLake FileSystem)

2022-12-15 Thread GitBox
cccs-eric commented on PR #6392: URL: https://github.com/apache/iceberg/pull/6392#issuecomment-1354060350 @Fokko One last thing that I haven't done is modifying verify-release.md for the integration tests. Should I add `test-adlfs` in there? https://github.com/apache/iceberg/blob/master

[GitHub] [iceberg] xwmr-max opened a new pull request, #6440: Flink: Support Look-up Function

2022-12-15 Thread GitBox
xwmr-max opened a new pull request, #6440: URL: https://github.com/apache/iceberg/pull/6440 Currently, Iceberg does not support look-up joins. This PR provides a look-up join function to meet the requirements of basic join scenarios.