wypoon opened a new pull request, #15765: URL: https://github.com/apache/iceberg/pull/15765
`TestRewriteDataFilesAction` has helper `createTable` and `createTablePartitioned` methods that create a `Table` instance, write to the table, and return the `Table` instance. However, as the write is done using the `DataFrame` API and not through the `Table`, the `Table` instance that is returned is stale. This is bad design and a trap for unwary new users of the methods who want to add tests. The trap is made worse by the helper methods `shouldHaveFiles` and `shouldHaveSnapshots` (which take the `Table` as an argument) doing a `Table::refresh` before performing its work, so if one looks at some existing test method that calls `createTable` followed by `shouldHaveFiles` as a model, one does not see the hidden call to `Table::refresh`. The `Table` should be refreshed before it is returned, so it has all the changes. Once this is done, some new `Table::refresh` calls need to be added. However, I found many unnecessary `Table::refresh` calls (probably added defensively due to being once bitten, twice shy) and have removed them. I also found and removed unnecessary `Table::refresh` calls in `TestRewritePositionDeleteFilesAction`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
