[GitHub] [iceberg] wanlce commented on issue #8510: Iceberg does not trigger actually rewrite_data_files in certain situations

2023-09-07 Thread via GitHub


wanlce commented on issue #8510:
URL: https://github.com/apache/iceberg/issues/8510#issuecomment-1709634179

   Sincerely hope to receive answers from the teachers  @nastra @wypoon
   Thanks in advance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] Fokko opened a new issue, #8518: Python: Write Metadata json

2023-09-07 Thread via GitHub


Fokko opened a new issue, #8518:
URL: https://github.com/apache/iceberg/issues/8518

   ### Feature Request / Improvement
   
   With the REST catalog setting properties, and updating the schema is 
working. This is easy since the REST catalog is in charge of updating the 
metadata, and writing a new version of it. For the other catalogs, we need to 
write the metadata using PyIceberg and update the table property, pointing to 
the latest version of the metadata.
   
   ### Query engine
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] Fokko commented on issue #8514: Python: implement GlueCatalog._commit_table

2023-09-07 Thread via GitHub


Fokko commented on issue #8514:
URL: https://github.com/apache/iceberg/issues/8514#issuecomment-1709791580

   @antonysouthworth-halter This is not implemented today but is high on the 
list. Also, other users are looking for snapshot expiration, which is a very 
similar operation. I've created 
[this](https://github.com/apache/iceberg/issues/8518) ticket which has to 
happen first, then updating the `SortOrder` should be straightforward. I'll 
keep this ticket open for visibility. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] szehon-ho commented on pull request #7105: Spec: Add partition stats spec

2023-09-07 Thread via GitHub


szehon-ho commented on PR #7105:
URL: https://github.com/apache/iceberg/pull/7105#issuecomment-1709898544

   @ajantha-bhat i think that makes sense to me, discussion is here for 
reference:  https://github.com/apache/iceberg/pull/7105#discussion_r1318005645  
Yea let's see if there is consensus.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] szehon-ho commented on pull request #8299: Core: use ZSTD compression parquet by default for new tables

2023-09-07 Thread via GitHub


szehon-ho commented on PR #8299:
URL: https://github.com/apache/iceberg/pull/8299#issuecomment-1709907691

   Rebased on #3438 , TestCompression tests now pass, fyi @aokolnychyi when you 
are back


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] szehon-ho commented on a diff in pull request #8517: To fix the issue https://github.com/apache/iceberg/issues/8516

2023-09-07 Thread via GitHub


szehon-ho commented on code in PR #8517:
URL: https://github.com/apache/iceberg/pull/8517#discussion_r1318413830


##
docs/configuration.md:
##
@@ -74,7 +74,7 @@ Iceberg tables support table properties to configure table 
behavior, like the de
 | write.metadata.metrics.column.col1   | (not set) 
  | Metrics mode for column 'col1' to allow per-column tuning; none, 
counts, truncate(length), or full   
 |
 | write.target-file-size-bytes | 536870912 (512 MB)
  | Controls the size of files generated to target about this many bytes

  |
 | write.delete.target-file-size-bytes  | 67108864 (64 MB)  
  | Controls the size of delete files generated to target about this many 
bytes   
|
-| write.distribution-mode  | none  
  | Defines distribution of write data: __none__: don't shuffle rows; 
__hash__: hash distribute by partition key ; __range__: range distribute by 
partition key or sort key if table has an SortOrder |
+| write.distribution-mode  | hash  
  | Defines distribution of write data: __none__: don't shuffle rows; 
__hash__: hash distribute by partition key ; __range__: range distribute by 
partition key or sort key if table has an SortOrder |

Review Comment:
   Looking at code, it looks a bit more nuanced: none for unpartitioned table, 
hash for partitioned table, range for sorted table.
   
   Ref: 
https://github.com/apache/iceberg/blob/master/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java#L263



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] szehon-ho commented on pull request #8517: To fix the issue https://github.com/apache/iceberg/issues/8516

2023-09-07 Thread via GitHub


szehon-ho commented on PR #8517:
URL: https://github.com/apache/iceberg/pull/8517#issuecomment-1709914728

   Thanks for pr, also please make the PR description clear what it does (we 
can put the link to the issue in the PR description if you want)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] Fokko commented on issue #8499: Python: Add support for Python 3.12

2023-09-07 Thread via GitHub


Fokko commented on issue #8499:
URL: https://github.com/apache/iceberg/issues/8499#issuecomment-1709922231

   Thanks @steinsgateted that would be awesome


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] frankliee commented on issue #8487: Python: Cannot resolve avro file

2023-09-07 Thread via GitHub


frankliee commented on issue #8487:
URL: https://github.com/apache/iceberg/issues/8487#issuecomment-1709973457

@Fokko Thanks for your reminder, I have found the reason is that the wrong 
iceberg version is used.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] frankliee commented on issue #8487: Python: Cannot resolve avro file

2023-09-07 Thread via GitHub


frankliee commented on issue #8487:
URL: https://github.com/apache/iceberg/issues/8487#issuecomment-1709976000

   I could query the iceberg table by the latest version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] frankliee commented on issue #8482: Python: Cannot install from source code.

2023-09-07 Thread via GitHub


frankliee commented on issue #8482:
URL: https://github.com/apache/iceberg/issues/8482#issuecomment-1709979652

   @Fokko Thanks for you reminder, the root cause is pip version, and pip 23.0 
could compile pyiceberg correctly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] atifiu commented on a diff in pull request #8517: To fix the issue https://github.com/apache/iceberg/issues/8516

2023-09-07 Thread via GitHub


atifiu commented on code in PR #8517:
URL: https://github.com/apache/iceberg/pull/8517#discussion_r1318465020


##
docs/configuration.md:
##
@@ -74,7 +74,7 @@ Iceberg tables support table properties to configure table 
behavior, like the de
 | write.metadata.metrics.column.col1   | (not set) 
  | Metrics mode for column 'col1' to allow per-column tuning; none, 
counts, truncate(length), or full   
 |
 | write.target-file-size-bytes | 536870912 (512 MB)
  | Controls the size of files generated to target about this many bytes

  |
 | write.delete.target-file-size-bytes  | 67108864 (64 MB)  
  | Controls the size of delete files generated to target about this many 
bytes   
|
-| write.distribution-mode  | none  
  | Defines distribution of write data: __none__: don't shuffle rows; 
__hash__: hash distribute by partition key ; __range__: range distribute by 
partition key or sort key if table has an SortOrder |
+| write.distribution-mode  | hash  
  | Defines distribution of write data: __none__: don't shuffle rows; 
__hash__: hash distribute by partition key ; __range__: range distribute by 
partition key or sort key if table has an SortOrder |

Review Comment:
   Shall we document this. 
   
   > Defines distribution of write data: none: don’t shuffle rows; hash: hash 
distribute by partition key ; range: range distribute by partition key or sort 
key if table has an SortOrder. Default value **none** for unpartitioned table, 
**hash** for partitioned table and **range** for sorted table.



##
docs/configuration.md:
##
@@ -74,7 +74,7 @@ Iceberg tables support table properties to configure table 
behavior, like the de
 | write.metadata.metrics.column.col1   | (not set) 
  | Metrics mode for column 'col1' to allow per-column tuning; none, 
counts, truncate(length), or full   
 |
 | write.target-file-size-bytes | 536870912 (512 MB)
  | Controls the size of files generated to target about this many bytes

  |
 | write.delete.target-file-size-bytes  | 67108864 (64 MB)  
  | Controls the size of delete files generated to target about this many 
bytes   
|
-| write.distribution-mode  | none  
  | Defines distribution of write data: __none__: don't shuffle rows; 
__hash__: hash distribute by partition key ; __range__: range distribute by 
partition key or sort key if table has an SortOrder |
+| write.distribution-mode  | hash  
  | Defines distribution of write data: __none__: don't shuffle rows; 
__hash__: hash distribute by partition key ; __range__: range distribute by 
partition key or sort key if table has an SortOrder |

Review Comment:
   Shall we document this. 
   
   > Defines distribution of write data: none: don’t shuffle rows; hash: hash 
distribute by partition key ; range: range distribute by partition key or sort 
key if table has an SortOrder. Default value **none** for unpartitioned table, 
**hash** for partitioned table and **range** for sorted table.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] atifiu commented on a diff in pull request #8517: To fix the table configuration documentation for write.distrubition-mode

2023-09-07 Thread via GitHub


atifiu commented on code in PR #8517:
URL: https://github.com/apache/iceberg/pull/8517#discussion_r1318465020


##
docs/configuration.md:
##
@@ -74,7 +74,7 @@ Iceberg tables support table properties to configure table 
behavior, like the de
 | write.metadata.metrics.column.col1   | (not set) 
  | Metrics mode for column 'col1' to allow per-column tuning; none, 
counts, truncate(length), or full   
 |
 | write.target-file-size-bytes | 536870912 (512 MB)
  | Controls the size of files generated to target about this many bytes

  |
 | write.delete.target-file-size-bytes  | 67108864 (64 MB)  
  | Controls the size of delete files generated to target about this many 
bytes   
|
-| write.distribution-mode  | none  
  | Defines distribution of write data: __none__: don't shuffle rows; 
__hash__: hash distribute by partition key ; __range__: range distribute by 
partition key or sort key if table has an SortOrder |
+| write.distribution-mode  | hash  
  | Defines distribution of write data: __none__: don't shuffle rows; 
__hash__: hash distribute by partition key ; __range__: range distribute by 
partition key or sort key if table has an SortOrder |

Review Comment:
   Shall we document this in the description and leave the default value as 
none.
   
   > Defines distribution of write data: none: don’t shuffle rows; hash: hash 
distribute by partition key ; range: range distribute by partition key or sort 
key if table has an SortOrder. Default value **none** for unpartitioned table, 
**hash** for partitioned table and **range** for sorted table.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] Fokko commented on issue #8487: Python: Cannot resolve avro file

2023-09-07 Thread via GitHub


Fokko commented on issue #8487:
URL: https://github.com/apache/iceberg/issues/8487#issuecomment-1710007874

   Thank you for letting me know. Also pyiceberg==0.5.0rc1 is available on PyPi 
for testing :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] Fokko closed issue #8487: Python: Cannot resolve avro file

2023-09-07 Thread via GitHub


Fokko closed issue #8487: Python: Cannot resolve avro file
URL: https://github.com/apache/iceberg/issues/8487


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] Fokko commented on issue #8482: Python: Cannot install from source code.

2023-09-07 Thread via GitHub


Fokko commented on issue #8482:
URL: https://github.com/apache/iceberg/issues/8482#issuecomment-1710014853

   @frankliee Thanks for letting me know. I've added a note on the website to 
update `pip`: https://github.com/apache/iceberg/pull/8484 Let's close this one 
as well, thanks for reporting this, since it will improve PyIceberg and the 
documentation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] Fokko closed issue #8482: Python: Cannot install from source code.

2023-09-07 Thread via GitHub


Fokko closed issue #8482: Python: Cannot install from source code.
URL: https://github.com/apache/iceberg/issues/8482


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] sandflee opened a new pull request, #8519: add the process of snapshots queryParam for RESTCatalogAdapter

2023-09-07 Thread via GitHub


sandflee opened a new pull request, #8519:
URL: https://github.com/apache/iceberg/pull/8519

   #6850 add lazy snapshot loading for RESTCatalog, but RESTCatalogAdapter 
doesn't process the 'snapshots' queryParam.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] juanrondineau commented on issue #8333: Unable to merge CDC data into snapshot data. java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.la

2023-09-07 Thread via GitHub


juanrondineau commented on issue #8333:
URL: https://github.com/apache/iceberg/issues/8333#issuecomment-1710250429

   Hi, we are running spark jobs from DBT fwk, and we have the same issue when 
we run a model, with merge strategie, that has a table with struct data_type 
columns in its schema as source. Internally, for models with merge strategie, 
dbt creates a temporary view with the data that has to merge in the destination 
table. We recently discover that if insteed of a temprary view, we materialize 
the new data as a table and the we run the merge we save the CastException. 
That seems to be a workaround and not the final solution to this issue. We are 
actually running spark 3.3.2, iceberg 1.3.0 and hive 2.3.9 as metastore.
   Hope this help to solve the case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #8517: To fix the table configuration documentation for write.distrubition-mode

2023-09-07 Thread via GitHub


RussellSpitzer commented on code in PR #8517:
URL: https://github.com/apache/iceberg/pull/8517#discussion_r1318702274


##
docs/configuration.md:
##
@@ -74,7 +74,7 @@ Iceberg tables support table properties to configure table 
behavior, like the de
 | write.metadata.metrics.column.col1   | (not set) 
  | Metrics mode for column 'col1' to allow per-column tuning; none, 
counts, truncate(length), or full   
 |
 | write.target-file-size-bytes | 536870912 (512 MB)
  | Controls the size of files generated to target about this many bytes

  |
 | write.delete.target-file-size-bytes  | 67108864 (64 MB)  
  | Controls the size of delete files generated to target about this many 
bytes   
|
-| write.distribution-mode  | none  
  | Defines distribution of write data: __none__: don't shuffle rows; 
__hash__: hash distribute by partition key ; __range__: range distribute by 
partition key or sort key if table has an SortOrder |
+| write.distribution-mode  | hash  
  | Defines distribution of write data: __none__: don't shuffle rows; 
__hash__: hash distribute by partition key ; __range__: range distribute by 
partition key or sort key if table has an SortOrder |

Review Comment:
   Let's say the default is hash ... or "depends"?  I still think none is wrong 
since almost no one is using unpartitioned tables



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] atifiu commented on a diff in pull request #8517: To fix the table configuration documentation for write.distrubition-mode

2023-09-07 Thread via GitHub


atifiu commented on code in PR #8517:
URL: https://github.com/apache/iceberg/pull/8517#discussion_r1318706970


##
docs/configuration.md:
##
@@ -74,7 +74,7 @@ Iceberg tables support table properties to configure table 
behavior, like the de
 | write.metadata.metrics.column.col1   | (not set) 
  | Metrics mode for column 'col1' to allow per-column tuning; none, 
counts, truncate(length), or full   
 |
 | write.target-file-size-bytes | 536870912 (512 MB)
  | Controls the size of files generated to target about this many bytes

  |
 | write.delete.target-file-size-bytes  | 67108864 (64 MB)  
  | Controls the size of delete files generated to target about this many 
bytes   
|
-| write.distribution-mode  | none  
  | Defines distribution of write data: __none__: don't shuffle rows; 
__hash__: hash distribute by partition key ; __range__: range distribute by 
partition key or sort key if table has an SortOrder |
+| write.distribution-mode  | hash  
  | Defines distribution of write data: __none__: don't shuffle rows; 
__hash__: hash distribute by partition key ; __range__: range distribute by 
partition key or sort key if table has an SortOrder |

Review Comment:
   This is not a normal case because default itself is depended on other 
factors. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] ext-diego-duenas commented on issue #8333: Unable to merge CDC data into snapshot data. java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java

2023-09-07 Thread via GitHub


ext-diego-duenas commented on issue #8333:
URL: https://github.com/apache/iceberg/issues/8333#issuecomment-1710293072

   Just to add to the Thread, we made a test changing all data types to String 
in the table with structs data type, and the error persist, so we don't think 
the error is related with the schema.  Is applying a merge into in an iceberg 
table,  with a source table with struct data types,  regardless the underlying 
schema. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #8520: Core: Clear deleted files set between commit attempts for simple transactions

2023-09-07 Thread via GitHub


amogh-jahagirdar opened a new pull request, #8520:
URL: https://github.com/apache/iceberg/pull/8520

   For transactions, a `deletedFiles` set is maintained which keeps track of 
the files to delete as part of the most recent commit. For simple transactions, 
this set is not being cleared between attempts to commit the transaction. It 
could be possible that on the first transaction attempt, an operation marks 
certain files for deletion. When the transaction fails and is retried those 
files may actually be used, and when the retry succeeds those files are cleaned 
up when they should not be. 
   
   The set should preserve the invariant of representing the set of files to 
clear after a transaction is committed, thus this PR clears the set between 
transaction commit attempts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar commented on pull request #8520: Core: Clear deleted files set between commit attempts for simple transactions

2023-09-07 Thread via GitHub


amogh-jahagirdar commented on PR #8520:
URL: https://github.com/apache/iceberg/pull/8520#issuecomment-1710405905

   Still need to add test cases


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg-go] rdblue commented on pull request #3: feat(manifests): Adding implementation of manifest files

2023-09-07 Thread via GitHub


rdblue commented on PR #3:
URL: https://github.com/apache/iceberg-go/pull/3#issuecomment-1710410048

   @zeroshade, sorry for the delay! I'll find some time for reviews over here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg-rust] Xuanwo commented on issue #55: Iceberg View Support

2023-09-07 Thread via GitHub


Xuanwo commented on issue #55:
URL: https://github.com/apache/iceberg-rust/issues/55#issuecomment-1710446025

   I have a new idea for naming: How about using `Tabular`?
   
   So we will have:
   
   - `load_tabular()` / `load_table()` / `load_view()`
   - `Tabular::Table` / `Tabular::View`
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #8397: API, Core: Add strict metadata cleanup to TableOperations

2023-09-07 Thread via GitHub


rdblue commented on code in PR #8397:
URL: https://github.com/apache/iceberg/pull/8397#discussion_r1318868914


##
core/src/main/java/org/apache/iceberg/BaseTransaction.java:
##
@@ -380,14 +375,9 @@ private void commitReplaceTransaction(boolean orCreate) {
 
 } catch (RuntimeException e) {
   // the commit failed and no files were committed. clean up each update.
-  Tasks.foreach(updates)
-  .suppressFailureWhenFinished()
-  .run(
-  update -> {
-if (update instanceof SnapshotProducer) {
-  ((SnapshotProducer) update).cleanAll();
-}
-  });
+  if (!ops.requireStrictCleanup() || e instanceof CleanableFailure) {
+cleanUpOnCommitFailure();

Review Comment:
   Shouldn't this be `cleanAllUpdates`? Both `cleanUpOnCommitFailure` and the 
`finally` block are going to run the delete function on every file.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on a diff in pull request #8397: API, Core: Add strict metadata cleanup to TableOperations

2023-09-07 Thread via GitHub


rdblue commented on code in PR #8397:
URL: https://github.com/apache/iceberg/pull/8397#discussion_r1318870175


##
core/src/main/java/org/apache/iceberg/BaseTransaction.java:
##
@@ -380,14 +375,9 @@ private void commitReplaceTransaction(boolean orCreate) {
 
 } catch (RuntimeException e) {
   // the commit failed and no files were committed. clean up each update.
-  Tasks.foreach(updates)
-  .suppressFailureWhenFinished()
-  .run(
-  update -> {
-if (update instanceof SnapshotProducer) {
-  ((SnapshotProducer) update).cleanAll();
-}
-  });
+  if (!ops.requireStrictCleanup() || e instanceof CleanableFailure) {
+cleanUpOnCommitFailure();

Review Comment:
   Otherwise, I think this looks good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg-rust] Samrose-Ahmed commented on issue #55: Iceberg View Support

2023-09-07 Thread via GitHub


Samrose-Ahmed commented on issue #55:
URL: https://github.com/apache/iceberg-rust/issues/55#issuecomment-1710482809

   > I have a new idea for naming: How about using `Tabular`?
   > 
   > So we will have:
   > 
   > - `load_tabular()` / `load_table()` / `load_view()`
   > - `Tabular::Table(table)` / `Tabular::View(view)`
   > 
   > What do you think?
   
   I would avoid the confusion with the vendor.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rdblue commented on pull request #8397: API, Core: Add strict metadata cleanup to TableOperations

2023-09-07 Thread via GitHub


rdblue commented on PR #8397:
URL: https://github.com/apache/iceberg/pull/8397#issuecomment-1710483728

   Looks good to me. Let's merge when tests are passing. Thanks, 
@amogh-jahagirdar!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #8397: API, Core: Add strict metadata cleanup to TableOperations

2023-09-07 Thread via GitHub


amogh-jahagirdar commented on code in PR #8397:
URL: https://github.com/apache/iceberg/pull/8397#discussion_r1318884428


##
core/src/main/java/org/apache/iceberg/BaseTransaction.java:
##
@@ -380,14 +375,9 @@ private void commitReplaceTransaction(boolean orCreate) {
 
 } catch (RuntimeException e) {
   // the commit failed and no files were committed. clean up each update.
-  Tasks.foreach(updates)
-  .suppressFailureWhenFinished()
-  .run(
-  update -> {
-if (update instanceof SnapshotProducer) {
-  ((SnapshotProducer) update).cleanAll();
-}
-  });
+  if (!ops.requireStrictCleanup() || e instanceof CleanableFailure) {
+cleanUpOnCommitFailure();

Review Comment:
   Oh yes good catch, we're duplicating the call to delete the files. Although 
I wonder how the create/replace tests for this case were passing. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #8397: API, Core: Add strict metadata cleanup to TableOperations

2023-09-07 Thread via GitHub


amogh-jahagirdar commented on code in PR #8397:
URL: https://github.com/apache/iceberg/pull/8397#discussion_r1318885267


##
core/src/main/java/org/apache/iceberg/BaseTransaction.java:
##
@@ -380,14 +375,9 @@ private void commitReplaceTransaction(boolean orCreate) {
 
 } catch (RuntimeException e) {
   // the commit failed and no files were committed. clean up each update.
-  Tasks.foreach(updates)
-  .suppressFailureWhenFinished()
-  .run(
-  update -> {
-if (update instanceof SnapshotProducer) {
-  ((SnapshotProducer) update).cleanAll();
-}
-  });
+  if (!ops.requireStrictCleanup() || e instanceof CleanableFailure) {
+cleanUpOnCommitFailure();

Review Comment:
   Ah the delete is best effort so the second time when the file no longer 
exists the delete would fail but we catch it and emit a warn log.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg-rust] Xuanwo commented on issue #55: Iceberg View Support

2023-09-07 Thread via GitHub


Xuanwo commented on issue #55:
URL: https://github.com/apache/iceberg-rust/issues/55#issuecomment-171048

   > I would avoid the confusion with the vendor.
   
   Oh, I didn't take the vendor into consideration. I think tabular is a good 
alternative to table like and widely used in pandas ecosystem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg-go] JustinTimperio commented on a diff in pull request #2: feat(partitioning): Add partition specs/fields and basic transform interface

2023-09-07 Thread via GitHub


JustinTimperio commented on code in PR #2:
URL: https://github.com/apache/iceberg-go/pull/2#discussion_r1318954109


##
partitions.go:
##
@@ -0,0 +1,218 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package iceberg
+
+import (
+   "encoding/json"
+   "fmt"
+   "strings"
+
+   "golang.org/x/exp/slices"

Review Comment:
   Generally I am really against using the experimental packages for anything 
that will be used likely in production. There recently has been a lot of people 
ignoring this and creating havoc in the release cycles of Go as seen in the 
following issues:
   
   https://github.com/golang/go/issues/51317
   https://github.com/golang/go/issues/61374
   
   I think it would be better at this time to avoid using anything in the exp 
branch. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg-go] JustinTimperio commented on a diff in pull request #2: feat(partitioning): Add partition specs/fields and basic transform interface

2023-09-07 Thread via GitHub


JustinTimperio commented on code in PR #2:
URL: https://github.com/apache/iceberg-go/pull/2#discussion_r1318955697


##
partitions.go:
##
@@ -0,0 +1,218 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package iceberg
+
+import (
+   "encoding/json"
+   "fmt"
+   "strings"
+
+   "golang.org/x/exp/slices"
+)
+
+const (
+   partitionDataIDStart   = 1000
+   InitialPartitionSpecID = 0
+)
+
+// UnpartitionedPartitionSpec is the default unpartitioned spec which can
+// be used for comparisons or to just provide a convenience for referencing
+// the same unpartitioned spec object.
+var UnpartitionedPartitionSpec = &PartitionSpec{id: 0}
+
+// PartitionField represents how one partition value is derived from the
+// source column by transformation.
+type PartitionField struct {
+   // SourceID is the source column id of the table's schema
+   SourceID int `json:"source-id"`
+   // FieldID is the partition field id across all the table partition 
specs
+   FieldID int `json:"field-id"`
+   // Name is the name of the partition field itself
+   Name string `json:"name"`
+   // Transform is the transform used to produce the partition value
+   Transform Transform `json:"transform"`
+}
+
+func (p *PartitionField) String() string {
+   return fmt.Sprintf("%d: %s: %s(%d)", p.FieldID, p.Name, p.Transform, 
p.SourceID)
+}
+
+func (p *PartitionField) UnmarshalJSON(b []byte) error {
+   type Alias PartitionField
+   aux := struct {
+   TransformString string `json:"transform"`
+   *Alias
+   }{
+   Alias: (*Alias)(p),
+   }
+
+   err := json.Unmarshal(b, &aux)
+   if err != nil {
+   return err
+   }
+
+   if p.Transform, err = ParseTransform(aux.TransformString); err != nil {
+   return err
+   }
+
+   return nil
+}
+
+// PartitionSpec captures the transformation from table data to partition 
values
+type PartitionSpec struct {
+   // any change to a PartitionSpec will produce a new spec id
+   id int
+   fields []PartitionField
+
+   // this is populated by initialize after creation
+   sourceIdToFields map[int][]PartitionField
+}
+
+func NewPartitionSpec(fields ...PartitionField) PartitionSpec {
+   return NewPartitionSpecID(InitialPartitionSpecID, fields...)
+}
+
+func NewPartitionSpecID(id int, fields ...PartitionField) PartitionSpec {
+   ret := PartitionSpec{id: id, fields: fields}
+   ret.initialize()
+   return ret
+}
+
+// CompatibleWith returns true if this partition spec is considered
+// compatible with the passed in partition spec. This means that the two
+// specs have equivalent field lists regardless of the spec id.
+func (ps *PartitionSpec) CompatibleWith(other *PartitionSpec) bool {
+   if ps == other {
+   return true
+   }
+
+   if len(ps.fields) != len(other.fields) {
+   return false
+   }
+
+   return slices.EqualFunc(ps.fields, other.fields, func(left, right 
PartitionField) bool {

Review Comment:
   Happy to write a replacement function for this to avoid the exp package



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar commented on pull request #8397: API, Core: Add strict metadata cleanup to TableOperations

2023-09-07 Thread via GitHub


amogh-jahagirdar commented on PR #8397:
URL: https://github.com/apache/iceberg/pull/8397#issuecomment-1710574693

   Thanks for review @rdblue ! merging


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar merged pull request #8397: API, Core: Add strict metadata cleanup to TableOperations

2023-09-07 Thread via GitHub


amogh-jahagirdar merged PR #8397:
URL: https://github.com/apache/iceberg/pull/8397


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] Fokko opened a new pull request, #8521: Python: Non-Cython fallback Avro parser

2023-09-07 Thread via GitHub


Fokko opened a new pull request, #8521:
URL: https://github.com/apache/iceberg/pull/8521

   I think it would be good to include a pure Python fallback.
   
   @rustyconover I'm seeing some weird errors here: 
https://github.com/pola-rs/polars/actions/runs/6109489610/job/16580630135?pr=10375


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] hiteshbedre commented on pull request #8491: Python: Improved Readability and Alignment of Regex Patterns

2023-09-07 Thread via GitHub


hiteshbedre commented on PR #8491:
URL: https://github.com/apache/iceberg/pull/8491#issuecomment-1710683580

   @amogh-jahagirdar who can help me with the next approval?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rafoid opened a new issue, #8522: ALTER TABLE ... DROP COLUMN allows dropping the last column of a table

2023-09-07 Thread via GitHub


rafoid opened a new issue, #8522:
URL: https://github.com/apache/iceberg/issues/8522

   ### Apache Iceberg version
   
   1.3.1 (latest release)
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   When performing ALTER TABLE ... DROP COLUMN, it drops columns as expected. 
However, the API also allows dropping the last column of a table, rendering the 
table useless. There are no more operations allowed on the table unless a new 
column is added. 
   
   I think we need to check whether the final schema has any columns or not 
before committing the changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rafoid commented on issue #8522: ALTER TABLE ... DROP COLUMN allows dropping the last column of a table

2023-09-07 Thread via GitHub


rafoid commented on issue #8522:
URL: https://github.com/apache/iceberg/issues/8522#issuecomment-1710706057

   I think we need to modify the `commit()` method as following:
   ```
 @Override
 public void commit() {
   TableMetadata update = applyChangesToMetadata(base.updateSchema(apply(), 
lastColumnId));
   if (update.schema().columns().size() == 0)
 throw new UnsupportedOperationException("Unable to commit table 
without any columns.");
   ops.commit(base, update);
 }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rafoid opened a new pull request, #8523: Prevent dropping last column.

2023-09-07 Thread via GitHub


rafoid opened a new pull request, #8523:
URL: https://github.com/apache/iceberg/pull/8523

   This code intends to address the problem of allowing API dropping the last 
column of a table mentioned here: https://github.com/apache/iceberg/issues/8522


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] github-actions[bot] commented on issue #7071: Dynamic Split generation based on table size

2023-09-07 Thread via GitHub


github-actions[bot] commented on issue #7071:
URL: https://github.com/apache/iceberg/issues/7071#issuecomment-1710909025

   This issue has been automatically marked as stale because it has been open 
for 180 days with no activity. It will be closed in next 14 days if no further 
activity occurs. To permanently prevent this issue from being considered stale, 
add the label 'not-stale', but commenting on the issue is preferred when 
possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg-docs] melvynator opened a new pull request, #274: Update vendors.md

2023-09-07 Thread via GitHub


melvynator opened a new pull request, #274:
URL: https://github.com/apache/iceberg-docs/pull/274

   Added clickhouse in the list of vendors


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg-rust] liurenjie1024 commented on issue #55: Iceberg View Support

2023-09-07 Thread via GitHub


liurenjie1024 commented on issue #55:
URL: https://github.com/apache/iceberg-rust/issues/55#issuecomment-1710958778

   > I have a new idea for naming: How about using `Tabular`?
   > 
   > So we will have:
   > 
   > * `load_tabular()` / `load_table()` / `load_view()`
   > * `Tabular::Table(table)` / `Tabular::View(view)`
   > 
   > What do you think?
   
   I asked chatgpt, and he also thinks it's good idea. BTW, should we raise 
discussion in iceberg community to add an api to catalog interface?
   
   Both java and python all have catalog interface:
   
   
https://github.com/apache/iceberg/blob/f0149a5860efe3f78d7cb07b81b4e1910b438020/api/src/main/java/org/apache/iceberg/catalog/Catalog.java#L33
   
   
https://github.com/apache/iceberg/blob/master/python/pyiceberg/catalog/__init__.py
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] ConeyLiu commented on pull request #8523: Prevent dropping last column.

2023-09-07 Thread via GitHub


ConeyLiu commented on PR #8523:
URL: https://github.com/apache/iceberg/pull/8523#issuecomment-1710964389

   What kind of scenario needs to drop all columns?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] frankliee opened a new pull request, #8524: Python: Fix pyarrow hdfs support

2023-09-07 Thread via GitHub


frankliee opened a new pull request, #8524:
URL: https://github.com/apache/iceberg/pull/8524

   I have tested pyiceberg on our hdfs cluster, and this pr fixed some bugs.
   
   Changed. 
   
   1. Using optional parameters to create hdfs client, similar to  gcs's style.
   2. Fix `PyArrowFileIO.parse_location` for hdfs style, which cannot be 
resolved by HadoopFileSystem.
   
   Previous behavior:  parse_location("hdfs://127.0.0.1:9000/root/abc") ->  
"hdfs", "127.0.0.1:9000/root/abc"
   New behavior:   parse_location("hdfs://127.0.0.1:9000/root/abc") ->  
"hdfs", "hdfs://127.0.0.1:9000/root/abc"
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rafoid commented on pull request #8523: Prevent dropping last column.

2023-09-07 Thread via GitHub


rafoid commented on PR #8523:
URL: https://github.com/apache/iceberg/pull/8523#issuecomment-1711009177

   @ConeyLiu That's exactly my point - under no circumstances a user should be 
allowed to drop all columns. At this point the API is permissive enough for 
such a scenario to occur.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] rustyconover commented on pull request #8521: Python: Non-Cython fallback Avro parser

2023-09-07 Thread via GitHub


rustyconover commented on PR #8521:
URL: https://github.com/apache/iceberg/pull/8521#issuecomment-1711014712

   Hi @Fokko,
   
   Reading the logs it looks like it's a packaging problem for Windows.
   
   I'm pretty sure Cython works on Windows, but the wheel may not have the 
necessary module packaged.
   
   Linux packages worked well (no PyIceberg errors), but the Windows build 
failed.  I don't have access to Windows so I can't really test it for you, but 
you may want to revisit including the shared library that is built by Cython.  
Maybe we need to include *.dll rather than *.so or some other extension on 
Windows?
   
   Rusty


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #8525: Core: Add a validation API to DeleteFiles which validates files exist prior to attempting to deletion.

2023-09-07 Thread via GitHub


amogh-jahagirdar opened a new pull request, #8525:
URL: https://github.com/apache/iceberg/pull/8525

   Currently there is no validation on delete files API for verifying that the 
file to be deleted actually exists prior to the commit. This can cause 
unexpected behavior, for example:
   
   1.) Rewrite Data files compacts FILE_A + some delete files
   2.) Concurrently a DeleteFiles call was done for FILE_A.
   3.) Deletion gets retried after 1 completes. At the point it retries, FILE_A 
no longer exists due to the compaction so the deletion is a no-op.
   
   From a user's perspective when they go and query the table after 3, they'll 
still see the data in FILE_A, even though the wouldn't expect it since they 
recieved a successful delete of FILE_A.
   
   This change currently adds a new validation API as opposed to changing the 
behavior to always do strict validation for purpose of backwards compatibility. 
I wanted to open this up and discuss with the community to see about if we want 
the new API or if we just want to change this behavior since it's mainly just 
used by maintenance procedures.
   
   Also I'd need to compare what serializable isolation vs snapshot isolation 
should guarantee in this concurrent delete scenario.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg-rust] JanKaul commented on issue #55: Iceberg View Support

2023-09-07 Thread via GitHub


JanKaul commented on issue #55:
URL: https://github.com/apache/iceberg-rust/issues/55#issuecomment-1711086189

   > I have a new idea for naming: How about using `Tabular`?
   > 
   > So we will have:
   > 
   > - `load_tabular()` / `load_table()` / `load_view()`
   > - `Tabular::Table(table)` / `Tabular::View(view)`
   > 
   > What do you think?
   
   Nice one, I think it's a fitting name.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg-rust] JanKaul commented on issue #55: Iceberg View Support

2023-09-07 Thread via GitHub


JanKaul commented on issue #55:
URL: https://github.com/apache/iceberg-rust/issues/55#issuecomment-1711087506

   > > I have a new idea for naming: How about using `Tabular`?
   > > 
   > > So we will have:
   > > 
   > > * `load_tabular()` / `load_table()` / `load_view()`
   > > * `Tabular::Table(table)` / `Tabular::View(view)`
   > > 
   > > What do you think?
   > 
   > I asked chatgpt, and he also thinks it's good idea. BTW, should we raise 
discussion in iceberg community to add an api to catalog interface?
   > 
   > Both java and python all have catalog interface:
   > 
   > 
https://github.com/apache/iceberg/blob/f0149a5860efe3f78d7cb07b81b4e1910b438020/api/src/main/java/org/apache/iceberg/catalog/Catalog.java#L33
   > 
   > 
https://github.com/apache/iceberg/blob/master/python/pyiceberg/catalog/__init__.py
   > 
   > 
   
   Good point. I think this should be really valuable for java and python. I 
don't see a way for them to easily get around this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] liurenjie1024 opened a new issue, #8526: Rest catalog server doesn't return table configuration as expected.

2023-09-07 Thread via GitHub


liurenjie1024 opened a new issue, #8526:
URL: https://github.com/apache/iceberg/issues/8526

   ### Apache Iceberg version
   
   1.3.1 (latest release)
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   According to rest [open api's 
spec](https://github.com/apache/iceberg/blob/621d301b17417af4b71a0e8c93db9db6f8266839/open-api/rest-catalog-open-api.yaml#L1899),
 the `loadTable` api should return configuration of table in `config` property. 
 This is not implemented in [rest server 
handler](https://github.com/apache/iceberg/blob/f3826bd9bd08262109df741ca8bf5aa0b697/core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java#L256)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] liurenjie1024 commented on issue #8526: Rest catalog server doesn't return table configuration as expected.

2023-09-07 Thread via GitHub


liurenjie1024 commented on issue #8526:
URL: https://github.com/apache/iceberg/issues/8526#issuecomment-1711123130

   I'm happy to take this if confirmed that it's a bug.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg-rust] JanKaul commented on a diff in pull request #42: feat: support transform function

2023-09-07 Thread via GitHub


JanKaul commented on code in PR #42:
URL: https://github.com/apache/iceberg-rust/pull/42#discussion_r1319404484


##
crates/iceberg/src/transform/truncate.rs:
##
@@ -0,0 +1,177 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::sync::Arc;
+
+use arrow_array::ArrayRef;
+use arrow_schema::DataType;
+
+use crate::Error;
+
+use super::TransformFunction;
+
+pub struct Truncate {
+width: u32,
+}
+
+impl Truncate {
+pub fn new(width: u32) -> Self {
+Self { width }
+}
+}
+
+impl TransformFunction for Truncate {
+fn transform(&self, input: ArrayRef) -> crate::Result {
+match input.data_type() {
+DataType::Int32 => {
+let width: i32 = self.width.try_into().map_err(|_| {
+Error::new(
+crate::ErrorKind::DataInvalid,
+"width is failed to convert to i32 when truncate 
Int32Array",
+)
+})?;
+let res: arrow_array::Int32Array = input
+.as_any()
+.downcast_ref::()
+.unwrap()
+.unary(|v| v - (((v % width) + width) % width));

Review Comment:
   We could use `v - v.rem_euclid(width)` instead of `v - (((v % width) + 
width) % width)`. But it's also fine as it is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg] chandu-1101 commented on issue #8333: Unable to merge CDC data into snapshot data. java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang

2023-09-07 Thread via GitHub


chandu-1101 commented on issue #8333:
URL: https://github.com/apache/iceberg/issues/8333#issuecomment-1711162146

   @juanrondineau , @ext-diego-duenas  Glad to see updates on this thread. 
   
   @nastra even I don't think this is a schema issue! 
   
   @juanrondineau Can you explain further on below
   > We materialize the new data as a table and then we run the merge
   
   In my case, I had a new iceberg table with the existing data from the 
snapshot added to it. meaning, the iceberg table has a single version --more 
like a materialized view (my guess of the materialized view is (a.)Format is 
iceberg (b.)Only 1 version of data is present). Please correct me if I am 
wrong. 
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [iceberg-rust] JanKaul commented on a diff in pull request #56: feat: support read Manifest List

2023-09-07 Thread via GitHub


JanKaul commented on code in PR #56:
URL: https://github.com/apache/iceberg-rust/pull/56#discussion_r1319419895


##
crates/iceberg/src/spec/manifest_list.rs:
##
@@ -0,0 +1,881 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! ManifestList for Iceberg.
+
+use crate::{avro::schema_to_avro_schema, spec::Literal, Error};
+use apache_avro::{from_value, types::Value, Reader};
+use once_cell::sync::Lazy;
+use std::sync::Arc;
+
+use super::{FormatVersion, ListType, NestedField, NestedFieldRef, Schema, 
StructType};
+
+/// Snapshots are embedded in table metadata, but the list of manifests for a
+/// snapshot are stored in a separate manifest list file.
+///
+/// A new manifest list is written for each attempt to commit a snapshot
+/// because the list of manifests always changes to produce a new snapshot.
+/// When a manifest list is written, the (optimistic) sequence number of the
+/// snapshot is written for all new manifest files tracked by the list.
+///
+/// A manifest list includes summary metadata that can be used to avoid
+/// scanning all of the manifests in a snapshot when planning a table scan.
+/// This includes the number of added, existing, and deleted files, and a
+/// summary of values for each field of the partition spec used to write the
+/// manifest.
+#[derive(Debug, Clone)]
+pub struct ManifestList {
+/// Entries in a manifest list.
+entries: Vec,
+}
+
+impl ManifestList {
+/// Parse manifest list from bytes.
+///
+/// QUESTION: Will we have more than one manifest list in a single file?
+pub fn parse_with_version(
+bs: &[u8],
+version: FormatVersion,
+partition_type: &StructType,
+) -> Result {
+match version {
+FormatVersion::V2 => {
+let schema = schema_to_avro_schema("manifest_list", 
&Self::v2_schema()).unwrap();
+let reader = Reader::with_schema(&schema, bs)?;
+let values = Value::Array(reader.collect::, 
_>>()?);
+
from_value::<_serde::ManifestListV2>(&values)?.try_into(partition_type)
+}
+FormatVersion::V1 => {
+let schema = schema_to_avro_schema("manifest_list", 
&Self::v1_schema()).unwrap();
+let reader = Reader::with_schema(&schema, bs)?;
+let values = Value::Array(reader.collect::, 
_>>()?);
+
from_value::<_serde::ManifestListV1>(&values)?.try_into(partition_type)
+}
+}
+}
+
+/// Get the entries in the manifest list.
+pub fn entries(&self) -> &[ManifestListEntry] {
+&self.entries
+}
+
+const MANIFEST_PATH: Lazy = {
+Lazy::new(|| {
+Arc::new(NestedField::required(
+500,
+"manifest_path",
+super::Type::Primitive(super::PrimitiveType::String),
+))
+})
+};
+const MANIFEST_LENGTH: Lazy = {
+Lazy::new(|| {
+Arc::new(NestedField::required(
+501,
+"manifest_length",
+super::Type::Primitive(super::PrimitiveType::Long),
+))
+})
+};
+const PARTITION_SPEC_ID: Lazy = {
+Lazy::new(|| {
+Arc::new(NestedField::required(
+502,
+"partition_spec_id",
+super::Type::Primitive(super::PrimitiveType::Int),
+))
+})
+};
+const CONTENT: Lazy = {
+Lazy::new(|| {
+Arc::new(NestedField::required(
+517,
+"content",
+super::Type::Primitive(super::PrimitiveType::Int),
+))
+})
+};
+const SEQUENCE_NUMBER: Lazy = {
+Lazy::new(|| {
+Arc::new(NestedField::required(
+515,
+"sequence_number",
+super::Type::Primitive(super::PrimitiveType::Long),
+))
+})
+};
+const MIN_SEQUENCE_NUMBER: Lazy = {
+Lazy::new(|| {
+Arc::new(NestedField::required(
+516,
+"min_sequence_number",
+super::Type::Primitive(super::PrimitiveType::Long),
+))

[GitHub] [iceberg-rust] JanKaul commented on a diff in pull request #56: feat: support read Manifest List

2023-09-07 Thread via GitHub


JanKaul commented on code in PR #56:
URL: https://github.com/apache/iceberg-rust/pull/56#discussion_r1319441548


##
crates/iceberg/src/spec/manifest_list.rs:
##
@@ -0,0 +1,581 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! ManifestList for Iceberg.
+
+use apache_avro::{from_value, Reader};
+
+use crate::{spec::Literal, Error};
+
+use super::{FormatVersion, StructType};
+
+/// Snapshots are embedded in table metadata, but the list of manifests for a
+/// snapshot are stored in a separate manifest list file.
+///
+/// A new manifest list is written for each attempt to commit a snapshot
+/// because the list of manifests always changes to produce a new snapshot.
+/// When a manifest list is written, the (optimistic) sequence number of the
+/// snapshot is written for all new manifest files tracked by the list.
+///
+/// A manifest list includes summary metadata that can be used to avoid
+/// scanning all of the manifests in a snapshot when planning a table scan.
+/// This includes the number of added, existing, and deleted files, and a
+/// summary of values for each field of the partition spec used to write the
+/// manifest.
+#[derive(Debug, Clone)]
+pub struct ManifestList {
+version: FormatVersion,
+/// Entries in a manifest list.
+entries: Vec,
+}
+
+impl ManifestList {
+/// Parse manifest list from bytes.
+///
+/// QUESTION: Will we have more than one manifest list in a single file?

Review Comment:
   I think there should only be one manifest list with many manifest list 
entries.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org