[PR] build(deps): bump github.com/aws/aws-sdk-go-v2 from 1.21.2 to 1.22.1 [iceberg-go]

2023-11-05 Thread via GitHub


dependabot[bot] opened a new pull request, #27:
URL: https://github.com/apache/iceberg-go/pull/27

   Bumps [github.com/aws/aws-sdk-go-v2](https://github.com/aws/aws-sdk-go-v2) 
from 1.21.2 to 1.22.1.
   
   Commits
   
   - [`ee5e3f0`](https://github.com/aws/aws-sdk-go-v2/commit/ee5e3f05637540596cc7aab1359742000a8d533a) Release 2023-11-01
   - [`b65c226`](https://github.com/aws/aws-sdk-go-v2/commit/b65c226f47aa1f837699664bdc65c3c3e3611765) Regenerated Clients
   - [`7a194b9`](https://github.com/aws/aws-sdk-go-v2/commit/7a194b9b0344774a5af100d11ea2066c5b0cf234) Update API model
   - [`0cb924a`](https://github.com/aws/aws-sdk-go-v2/commit/0cb924a0007bc681d12f382a604368e0660827ee) Add support for configured endpoints. ([#2328](https://redirect.github.com/aws/aws-sdk-go-v2/issues/2328))
   - [`61039fe`](https://github.com/aws/aws-sdk-go-v2/commit/61039fea9cc9e080c53382850c87685b5406fd68) Release 2023-10-31
   - [`797e056`](https://github.com/aws/aws-sdk-go-v2/commit/797e0560769725635218fc30a2554c1bbaccc01b) Regenerated Clients
   - [`822585d`](https://github.com/aws/aws-sdk-go-v2/commit/822585d3f621a7c5844584d8e471c32f852702aa) Update SDK's smithy-go dependency to v1.16.0
   - [`abf753d`](https://github.com/aws/aws-sdk-go-v2/commit/abf753db747dd256f3ee69712a19d1d3dc681f23) Update API model
   - [`99861c0`](https://github.com/aws/aws-sdk-go-v2/commit/99861c071109ce5ee4f1cb3b72ead2062b3bd86c) lang: bump minimum go version to 1.19 ([#2338](https://redirect.github.com/aws/aws-sdk-go-v2/issues/2338))
   - [`2ac0a53`](https://github.com/aws/aws-sdk-go-v2/commit/2ac0a53ac45acaadc4526fd25b643dc46032b02a) Release 2023-10-30
   - Additional commits viewable in [compare view](https://github.com/aws/aws-sdk-go-v2/compare/v1.21.2...v1.22.1)
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/aws/aws-sdk-go-v2&package-manager=go_modules&previous-version=1.21.2&new-version=1.22.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[PR] build(deps): bump github.com/hamba/avro/v2 from 2.16.0 to 2.17.1 [iceberg-go]

2023-11-05 Thread via GitHub


dependabot[bot] opened a new pull request, #28:
URL: https://github.com/apache/iceberg-go/pull/28

   Bumps [github.com/hamba/avro/v2](https://github.com/hamba/avro) from 2.16.0 
to 2.17.1.
   
   Release notes
   
   Sourced from [github.com/hamba/avro/v2's releases](https://github.com/hamba/avro/releases).
   
   v2.17.1
   
   What's Changed
   
   - fix: issue with dereferencing schemas by [@nrwiersma](https://github.com/nrwiersma) in [hamba/avro#319](https://redirect.github.com/hamba/avro/pull/319)
   
   Full Changelog: https://github.com/hamba/avro/compare/v2.17.0...v2.17.1
   
   v2.17.0
   
   What's Changed
   
   - Allow tag style "original" for additional tags by [@founderio](https://github.com/founderio) in [hamba/avro#313](https://redirect.github.com/hamba/avro/pull/313)
   - Added Types methods to Protocol by [@EliaBracciSumo](https://github.com/EliaBracciSumo) in [hamba/avro#315](https://redirect.github.com/hamba/avro/pull/315)
   
   New Contributors
   
   - [@founderio](https://github.com/founderio) made their first contribution in [hamba/avro#313](https://redirect.github.com/hamba/avro/pull/313)
   - [@EliaBracciSumo](https://github.com/EliaBracciSumo) made their first contribution in [hamba/avro#315](https://redirect.github.com/hamba/avro/pull/315)
   
   Full Changelog: https://github.com/hamba/avro/compare/v2.16.0...v2.17.0
   
   Commits
   
   - [`0429db3`](https://github.com/hamba/avro/commit/0429db3bae0390938223d14e8b36737b5fb3ef3c) fix: issue with dereferencing schemas ([#319](https://redirect.github.com/hamba/avro/issues/319))
   - [`50a7897`](https://github.com/hamba/avro/commit/50a7897f6ce66c9f9907128355c618468578bd2b) feat: added Types methods to Protocol ([#315](https://redirect.github.com/hamba/avro/issues/315))
   - [`3ac44d5`](https://github.com/hamba/avro/commit/3ac44d5d4fbed8a582e47f6ba91a65bbc23fe5bd) feat: allow tag style "original" for additional tags ([#313](https://redirect.github.com/hamba/avro/issues/313))
   - [`00fb9ac`](https://github.com/hamba/avro/commit/00fb9ace37cb66d12f28cc09fe4f58089574) chore: add dependency groups ([#312](https://redirect.github.com/hamba/avro/issues/312))
   - See full diff in [compare view](https://github.com/hamba/avro/compare/v2.16.0...v2.17.1)
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/hamba/avro/v2&package-manager=go_modules&previous-version=2.16.0&new-version=2.17.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   
   





[PR] build(deps): bump github.com/google/uuid from 1.3.1 to 1.4.0 [iceberg-go]

2023-11-05 Thread via GitHub


dependabot[bot] opened a new pull request, #29:
URL: https://github.com/apache/iceberg-go/pull/29

   Bumps [github.com/google/uuid](https://github.com/google/uuid) from 1.3.1 to 
1.4.0.
   
   Release notes
   
   Sourced from [github.com/google/uuid's releases](https://github.com/google/uuid/releases).
   
   v1.4.0
   
   [1.4.0](https://github.com/google/uuid/compare/v1.3.1...v1.4.0) (2023-10-26)
   
   Features
   
   - UUIDs slice type with Strings() convenience method ([#133](https://redirect.github.com/google/uuid/issues/133)) ([cd5fbbd](https://github.com/google/uuid/commit/cd5fbbdd02f3e3467ac18940e07e062be1f864b4))
   
   Fixes
   
   - Clarify that Parse's job is to parse but not necessarily validate strings. (Documents current behavior)
   
   Changelog
   
   Sourced from [github.com/google/uuid's changelog](https://github.com/google/uuid/blob/master/CHANGELOG.md).
   
   [1.4.0](https://github.com/google/uuid/compare/v1.3.1...v1.4.0) (2023-10-26)
   
   Features
   
   - UUIDs slice type with Strings() convenience method ([#133](https://redirect.github.com/google/uuid/issues/133)) ([cd5fbbd](https://github.com/google/uuid/commit/cd5fbbdd02f3e3467ac18940e07e062be1f864b4))
   
   Fixes
   
   - Clarify that Parse's job is to parse but not necessarily validate strings. (Documents current behavior)
   
   Commits
   
   - [`8de8764`](https://github.com/google/uuid/commit/8de8764e294f072b7a2f1a209e88fdcdb1ebc875) chore(master): release 1.4.0 ([#134](https://redirect.github.com/google/uuid/issues/134))
   - [`7c22e97`](https://github.com/google/uuid/commit/7c22e97ff7647f3b21c3e0870ab335c3889de467) Clarify the documentation of Parse to state its job is to parse, not validate...
   - [`cd5fbbd`](https://github.com/google/uuid/commit/cd5fbbdd02f3e3467ac18940e07e062be1f864b4) feat: UUIDs slice type with Strings() convenience method ([#133](https://redirect.github.com/google/uuid/issues/133))
   - [`47f5b39`](https://github.com/google/uuid/commit/47f5b3936c94efb365bdfc62716912ed9e66326f) docs: fix a typo in CONTRIBUTING.md ([#130](https://redirect.github.com/google/uuid/issues/130))
   - [`542ddab`](https://github.com/google/uuid/commit/542ddabd47d7bfa79359b7b4e2af7f975354e35f) chore(tests): add Fuzz tests ([#128](https://redirect.github.com/google/uuid/issues/128))
   - [`06716f6`](https://github.com/google/uuid/commit/06716f6a60da5ba158f1d53a8236a534968ff76e) chore(tests): Add json.Unmarshal test with empty value cases ([#116](https://redirect.github.com/google/uuid/issues/116))
   - See full diff in [compare view](https://github.com/google/uuid/compare/v1.3.1...v1.4.0)
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/google/uuid&package-manager=go_modules&previous-version=1.3.1&new-version=1.4.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   
   



[PR] build(deps): bump github.com/wolfeidau/s3iofs from 1.3.0 to 1.3.1 [iceberg-go]

2023-11-05 Thread via GitHub


dependabot[bot] opened a new pull request, #31:
URL: https://github.com/apache/iceberg-go/pull/31

   Bumps [github.com/wolfeidau/s3iofs](https://github.com/wolfeidau/s3iofs) 
from 1.3.0 to 1.3.1.
   
   Release notes
   
   Sourced from [github.com/wolfeidau/s3iofs's releases](https://github.com/wolfeidau/s3iofs/releases).
   
   v1.3.1
   
   What's Changed
   
   - docs(README): add some badges with a godoc link by [@wolfeidau](https://github.com/wolfeidau) in [wolfeidau/s3iofs#14](https://redirect.github.com/wolfeidau/s3iofs/pull/14)
   - feat(testing): increase integration test coverage :rocket: by [@wolfeidau](https://github.com/wolfeidau) in [wolfeidau/s3iofs#15](https://redirect.github.com/wolfeidau/s3iofs/pull/15)
   - feat(tests): added flags for vscode to ensure integration test coverage works by [@wolfeidau](https://github.com/wolfeidau) in [wolfeidau/s3iofs#16](https://redirect.github.com/wolfeidau/s3iofs/pull/16)
   - chore(deps): upgrade go deps by [@wolfeidau](https://github.com/wolfeidau) in [wolfeidau/s3iofs#19](https://redirect.github.com/wolfeidau/s3iofs/pull/19)
   - chore(deps): upgrade go deps for integration tests by [@wolfeidau](https://github.com/wolfeidau) in [wolfeidau/s3iofs#20](https://redirect.github.com/wolfeidau/s3iofs/pull/20)
   
   Full Changelog: https://github.com/wolfeidau/s3iofs/compare/v1.3.0...v1.3.1
   
   Commits
   
   - [`7107882`](https://github.com/wolfeidau/s3iofs/commit/710788272cd775c490622c9fd2d56a25ea138929) Merge pull request [#20](https://redirect.github.com/wolfeidau/s3iofs/issues/20) from wolfeidau/chore_upgrade_integration_deps
   - [`8e14816`](https://github.com/wolfeidau/s3iofs/commit/8e14816297b4761912d1f65d7b25ffa5145d1a41) chore(deps): upgrade go deps for integration tests
   - [`8737876`](https://github.com/wolfeidau/s3iofs/commit/87378762a59e2b2ec85822e8e217a4322771db39) Merge pull request [#19](https://redirect.github.com/wolfeidau/s3iofs/issues/19) from wolfeidau/chore_oct_dep_upgrades
   - [`5bcee15`](https://github.com/wolfeidau/s3iofs/commit/5bcee15b28710992fea999ddacd931e206eccef2) chore(deps): upgrade go deps
   - [`ba8909f`](https://github.com/wolfeidau/s3iofs/commit/ba8909f07876d88ae05ae3cec4756736bf185371) Merge pull request [#16](https://redirect.github.com/wolfeidau/s3iofs/issues/16) from wolfeidau/feat_vscode_test_coverage
   - [`385abc4`](https://github.com/wolfeidau/s3iofs/commit/385abc4f78bff56a39ff673baaf8f306b19cfd40) feat(tests): added flags for vscode to ensure integration test coverage works
   - [`405c842`](https://github.com/wolfeidau/s3iofs/commit/405c8424b0cc3a8e5b65a870f9d5092188920b44) Merge pull request [#15](https://redirect.github.com/wolfeidau/s3iofs/issues/15) from wolfeidau/feat_testing
   - [`144df58`](https://github.com/wolfeidau/s3iofs/commit/144df5813d4d373436dcced549f49dd82f4afffe) feat(testing): increase integration test coverage :rocket:
   - [`2281ace`](https://github.com/wolfeidau/s3iofs/commit/2281acecd4ee81ba62fed349555642fb4399e0e1) Merge pull request [#14](https://redirect.github.com/wolfeidau/s3iofs/issues/14) from wolfeidau/docs_readme
   - [`b9f30f1`](https://github.com/wolfeidau/s3iofs/commit/b9f30f1374dd7e1ba2f940e7058887f5c99b9874) docs(README): add some badges with a godoc link
   - See full diff in [compare view](https://github.com/wolfeidau/s3iofs/compare/v1.3.0...v1.3.1)
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/wolfeidau/s3iofs&package-manager=go_modules&previous-version=1.3.0&new-version=1.3.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   

[PR] build(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.19.1 to 1.22.0 [iceberg-go]

2023-11-05 Thread via GitHub


dependabot[bot] opened a new pull request, #30:
URL: https://github.com/apache/iceberg-go/pull/30

   Bumps 
[github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) 
from 1.19.1 to 1.22.0.
   
   Commits
   
   - [`61039fe`](https://github.com/aws/aws-sdk-go-v2/commit/61039fea9cc9e080c53382850c87685b5406fd68) Release 2023-10-31
   - [`797e056`](https://github.com/aws/aws-sdk-go-v2/commit/797e0560769725635218fc30a2554c1bbaccc01b) Regenerated Clients
   - [`822585d`](https://github.com/aws/aws-sdk-go-v2/commit/822585d3f621a7c5844584d8e471c32f852702aa) Update SDK's smithy-go dependency to v1.16.0
   - [`abf753d`](https://github.com/aws/aws-sdk-go-v2/commit/abf753db747dd256f3ee69712a19d1d3dc681f23) Update API model
   - [`99861c0`](https://github.com/aws/aws-sdk-go-v2/commit/99861c071109ce5ee4f1cb3b72ead2062b3bd86c) lang: bump minimum go version to 1.19 ([#2338](https://redirect.github.com/aws/aws-sdk-go-v2/issues/2338))
   - [`2ac0a53`](https://github.com/aws/aws-sdk-go-v2/commit/2ac0a53ac45acaadc4526fd25b643dc46032b02a) Release 2023-10-30
   - [`c10aa0a`](https://github.com/aws/aws-sdk-go-v2/commit/c10aa0ad45a155d7a6a9968894aed0d8e1cb4e81) Regenerated Clients
   - [`9c456c1`](https://github.com/aws/aws-sdk-go-v2/commit/9c456c10923952d6bd1d7d59ded3d70588e1ff36) Update API model
   - [`3cb5dc1`](https://github.com/aws/aws-sdk-go-v2/commit/3cb5dc1d777c4e28cd360728c45e8b5aa2a7b2b0) Release 2023-10-27
   - [`9b3ad7b`](https://github.com/aws/aws-sdk-go-v2/commit/9b3ad7b1e6ce72730896fe7c9d165543ff158ed3) Regenerated Clients
   - Additional commits viewable in [compare view](https://github.com/aws/aws-sdk-go-v2/compare/v1.19.1...v1.22.0)
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=github.com/aws/aws-sdk-go-v2/config&package-manager=go_modules&previous-version=1.19.1&new-version=1.22.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   [//]: # (dependabot-automerge-start)
   [//]: # (dependabot-automerge-end)
   
   ---
   
   
   
   





Re: [I] Substitute the in-memory data struct's timestamp type with DateTime rather than i64 to simplify usage. [iceberg-rust]

2023-11-05 Thread via GitHub


my-vegetable-has-exploded commented on issue #90:
URL: https://github.com/apache/iceberg-rust/issues/90#issuecomment-1793775352

   I'd like to have a try.





Re: [I] Substitute the in-memory data struct's timestamp type with DateTime rather than i64 to simplify usage. [iceberg-rust]

2023-11-05 Thread via GitHub


liurenjie1024 commented on issue #90:
URL: https://github.com/apache/iceberg-rust/issues/90#issuecomment-1793783924

   > I'd like to have a try.
   
   Sure, welcome to contribute!





Re: [PR] Replace black by Ruff Formatter [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on PR #127:
URL: https://github.com/apache/iceberg-python/pull/127#issuecomment-1793797091

   Looks fine overall, but it seems like too many changes with string 
normalization. Why force string normalization? That's going to cause a ton of 
pull requests to fail formatting validation.
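   For context, Black exposes a switch to leave existing quotes alone, and a Black-to-Ruff migration that wants to avoid mass quote rewrites needs an equivalent. A hypothetical `pyproject.toml` sketch (the Black key is its documented flag; the Ruff `quote-style = "preserve"` value is an assumption about later Ruff releases and may not exist in the version under review):

```toml
# Black: do not normalize string quotes.
[tool.black]
skip-string-normalization = true

# Ruff formatter: quote-style controls quote normalization; "preserve"
# (added in later Ruff releases, if available) keeps quotes as written.
[tool.ruff.format]
quote-style = "preserve"
```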





Re: [PR] Support of before and after actions in preorderschema traversal [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on PR #42:
URL: https://github.com/apache/iceberg-python/pull/42#issuecomment-1793797992

   Maybe it's me, but I don't understand the value of adding before and after 
callbacks to this visitor. A node's children are traversed when the future is 
called and that allows you to do whatever you want before and after further 
schema traversal. I think it makes more sense to consolidate the logic in the 
usual methods rather than use callbacks in this case.
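   To make that concrete, here is a minimal self-contained sketch (hypothetical names, not pyiceberg's actual visitor API) of a pre-order traversal where the visitor receives a callable that descends into the children, so any "before" and "after" work can simply bracket that call inside the usual method:

```python
from typing import Callable, List


class Node:
    def __init__(self, name: str, children: List["Node"] = ()) -> None:
        self.name = name
        self.children = list(children)


def visit(node: Node, visitor: "Visitor") -> None:
    # The visitor receives a zero-argument callable (a "future") that
    # performs the traversal of the children when invoked.
    visitor.field(node, lambda: [visit(c, visitor) for c in node.children])


class Visitor:
    def __init__(self) -> None:
        self.events: List[str] = []

    def field(self, node: Node, descend: Callable[[], None]) -> None:
        self.events.append(f"before {node.name}")  # work before children
        descend()                                  # traverse children
        self.events.append(f"after {node.name}")   # work after children


tree = Node("root", [Node("a"), Node("b")])
v = Visitor()
visit(tree, v)
print(v.events)
# -> ['before root', 'before a', 'after a', 'before b', 'after b', 'after root']
```

   Because `descend()` is called from inside `field`, no extra callbacks are needed to run logic around the child traversal.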





Re: [PR] Support of before and after actions in preorderschema traversal [iceberg-python]

2023-11-05 Thread via GitHub


MehulBatra commented on PR #42:
URL: https://github.com/apache/iceberg-python/pull/42#issuecomment-1793802279

   > Maybe it's me, but I don't understand the value of adding before and after 
callbacks to this visitor. A node's children are traversed when the future is 
called and that allows you to do whatever you want before and after further 
schema traversal. I think it makes more sense to consolidate the logic in the 
usual methods rather than use callbacks in this case.
   
   It's still half-baked and I'm working on it, but thanks for the feedback; 
I'll take that into consideration.





Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382618622


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +144,202 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
     snapshot_id: int = Field(alias="snapshot-id")
     timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+    added_size: int
+    removed_size: int
+    added_files: int
+    removed_files: int
+    added_eq_delete_files: int
+    removed_eq_delete_files: int
+    added_pos_delete_files: int
+    removed_pos_delete_files: int
+    added_delete_files: int
+    removed_delete_files: int
+    added_records: int
+    deleted_records: int
+    added_pos_deletes: int
+    removed_pos_deletes: int
+    added_eq_deletes: int
+    removed_eq_deletes: int
+
+    def __init__(self) -> None:
+        self.added_size = 0
+        self.removed_size = 0
+        self.added_files = 0
+        self.removed_files = 0
+        self.added_eq_delete_files = 0
+        self.removed_eq_delete_files = 0
+        self.added_pos_delete_files = 0
+        self.removed_pos_delete_files = 0
+        self.added_delete_files = 0
+        self.removed_delete_files = 0
+        self.added_records = 0
+        self.deleted_records = 0
+        self.added_pos_deletes = 0
+        self.removed_pos_deletes = 0
+        self.added_eq_deletes = 0
+        self.removed_eq_deletes = 0
+
+    def add_file(self, data_file: DataFile) -> None:
+        self.added_size += data_file.file_size_in_bytes
+
+        if data_file.content == DataFileContent.DATA:
+            self.added_files += 1
+            self.added_records += data_file.record_count
+        elif data_file.content == DataFileContent.POSITION_DELETES:
+            self.added_delete_files += 1
+            self.added_pos_delete_files += 1
+            self.added_pos_deletes += data_file.record_count
+        elif data_file.content == DataFileContent.EQUALITY_DELETES:
+            self.added_delete_files += 1
+            self.added_eq_delete_files += 1
+            self.added_eq_deletes += data_file.record_count
+        else:
+            raise ValueError(f"Unknown data file content: {data_file.content}")
+
+    def removed_file(self, data_file: DataFile) -> None:
+        self.removed_size += data_file.file_size_in_bytes
+
+        if data_file.content == DataFileContent.DATA:
+            self.removed_files += 1
+            self.deleted_records += data_file.record_count
+        elif data_file.content == DataFileContent.POSITION_DELETES:
+            self.removed_delete_files += 1
+            self.removed_pos_delete_files += 1
+            self.removed_pos_deletes += data_file.record_count
+        elif data_file.content == DataFileContent.EQUALITY_DELETES:
+            self.removed_delete_files += 1
+            self.removed_eq_delete_files += 1
+            self.removed_eq_deletes += data_file.record_count
+        else:
+            raise ValueError(f"Unknown data file content: {data_file.content}")
+
+    def added_manifest(self, manifest: ManifestFile) -> None:
+        if manifest.content == ManifestContent.DATA:
+            self.added_files += manifest.added_files_count or 0
+            self.added_records += manifest.added_rows_count or 0
+            self.removed_files += manifest.deleted_files_count or 0
+            self.deleted_records += manifest.deleted_rows_count or 0
+        elif manifest.content == ManifestContent.DELETES:
+            self.added_delete_files += manifest.added_files_count or 0
+            self.removed_delete_files += manifest.deleted_files_count or 0
+        else:
+            raise ValueError(f"Unknown manifest file content: {manifest.content}")
+
+    def build(self) -> Dict[str, str]:
+        def set_non_zero(properties: Dict[str, str], num: int, property_name: str) -> None:
+            if num > 0:
+                properties[property_name] = str(num)
+
+        properties: Dict[str, str] = {}
+        set_non_zero(properties, self.added_size, 'added-files-size')
+        set_non_zero(properties, self.removed_size, 'removed-files-size')
+        set_non_zero(properties, self.added_files, 'added-data-files')
+        set_non_zero(properties, self.removed_files, 'removed-data-files')
Review Comment:
   In Java, this is [`deleted-data-files`](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L31) because the property was created before we had delete files (so delete and remove were the same thing).
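If matching the Java summary keys ever matters for this port, the rename is mechanical. A minimal sketch, assuming a hypothetical `with_java_keys` helper (not part of pyiceberg or this PR):

```python
from typing import Dict

# Java's SnapshotSummary predates delete files, so its key for removed data
# files is spelled "deleted-data-files" rather than "removed-data-files".
JAVA_KEY_OVERRIDES: Dict[str, str] = {
    "removed-data-files": "deleted-data-files",
}


def with_java_keys(summary: Dict[str, str]) -> Dict[str, str]:
    """Rename summary keys to the Java SnapshotSummary spelling (hypothetical helper)."""
    return {JAVA_KEY_OVERRIDES.get(key, key): value for key, value in summary.items()}


renamed = with_java_keys({"removed-data-files": "2", "added-data-files": "5"})
```

Only keys with a Java-era spelling need an override; everything else passes through unchanged.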



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382619167


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +144,202 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
 snapshot_id: int = Field(alias="snapshot-id")
 timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+    added_size: int
+    removed_size: int
+    added_files: int
+    removed_files: int
+    added_eq_delete_files: int
+    removed_eq_delete_files: int
+    added_pos_delete_files: int
+    removed_pos_delete_files: int
+    added_delete_files: int
+    removed_delete_files: int
+    added_records: int
+    deleted_records: int
+    added_pos_deletes: int
+    removed_pos_deletes: int
+    added_eq_deletes: int
+    removed_eq_deletes: int
+
+    def __init__(self) -> None:
+        self.added_size = 0
+        self.removed_size = 0
+        self.added_files = 0
+        self.removed_files = 0
+        self.added_eq_delete_files = 0
+        self.removed_eq_delete_files = 0
+        self.added_pos_delete_files = 0
+        self.removed_pos_delete_files = 0
+        self.added_delete_files = 0
+        self.removed_delete_files = 0
+        self.added_records = 0
+        self.deleted_records = 0
+        self.added_pos_deletes = 0
+        self.removed_pos_deletes = 0
+        self.added_eq_deletes = 0
+        self.removed_eq_deletes = 0
+
+    def add_file(self, data_file: DataFile) -> None:
+        self.added_size += data_file.file_size_in_bytes
+
+        if data_file.content == DataFileContent.DATA:
+            self.added_files += 1
+            self.added_records += data_file.record_count
+        elif data_file.content == DataFileContent.POSITION_DELETES:
+            self.added_delete_files += 1
+            self.added_pos_delete_files += 1
+            self.added_pos_deletes += data_file.record_count
+        elif data_file.content == DataFileContent.EQUALITY_DELETES:
+            self.added_delete_files += 1
+            self.added_eq_delete_files += 1
+            self.added_eq_deletes += data_file.record_count
+        else:
+            raise ValueError(f"Unknown data file content: {data_file.content}")
+
+    def removed_file(self, data_file: DataFile) -> None:
+        self.removed_size += data_file.file_size_in_bytes
+
+        if data_file.content == DataFileContent.DATA:
+            self.removed_files += 1
+            self.deleted_records += data_file.record_count
+        elif data_file.content == DataFileContent.POSITION_DELETES:
+            self.removed_delete_files += 1
+            self.removed_pos_delete_files += 1
+            self.removed_pos_deletes += data_file.record_count
+        elif data_file.content == DataFileContent.EQUALITY_DELETES:
+            self.removed_delete_files += 1
+            self.removed_eq_delete_files += 1
+            self.removed_eq_deletes += data_file.record_count
+        else:
+            raise ValueError(f"Unknown data file content: {data_file.content}")
+
+    def added_manifest(self, manifest: ManifestFile) -> None:
+        if manifest.content == ManifestContent.DATA:
+            self.added_files += manifest.added_files_count or 0
+            self.added_records += manifest.added_rows_count or 0
+            self.removed_files += manifest.deleted_files_count or 0
+            self.deleted_records += manifest.deleted_rows_count or 0
+        elif manifest.content == ManifestContent.DELETES:
+            self.added_delete_files += manifest.added_files_count or 0
+            self.removed_delete_files += manifest.deleted_files_count or 0
+        else:
+            raise ValueError(f"Unknown manifest file content: {manifest.content}")
+
+    def build(self) -> Dict[str, str]:
+        def set_non_zero(properties: Dict[str, str], num: int, property_name: str) -> None:
+            if num > 0:
+                properties[property_name] = str(num)
+
+        properties: Dict[str, str] = {}
+        set_non_zero(properties, self.added_size, 'added-files-size')
+        set_non_zero(properties, self.removed_size, 'removed-files-size')
+        set_non_zero(properties, self.added_files, 'added-data-files')
+        set_non_zero(properties, self.removed_files, 'removed-data-files')

Review Comment:
   Looks like this is correct in the `_update_totals` call.
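The counting logic in the quoted `add_file`/`build` pair can be exercised standalone. The sketch below trims it to a few metrics and uses a hypothetical `FakeDataFile` stand-in rather than pyiceberg's real `DataFile`:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class DataFileContent(Enum):
    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2


@dataclass
class FakeDataFile:
    """Stand-in for pyiceberg's DataFile; only the fields the collector reads."""
    content: DataFileContent
    record_count: int
    file_size_in_bytes: int


class MiniCollector:
    """Mirrors the counting logic from the diff above, trimmed to a few metrics."""

    def __init__(self) -> None:
        self.added_size = 0
        self.added_files = 0
        self.added_records = 0
        self.added_delete_files = 0
        self.added_pos_deletes = 0

    def add_file(self, data_file: FakeDataFile) -> None:
        # As in the quoted diff: file size is counted for every content type.
        self.added_size += data_file.file_size_in_bytes
        if data_file.content == DataFileContent.DATA:
            self.added_files += 1
            self.added_records += data_file.record_count
        elif data_file.content == DataFileContent.POSITION_DELETES:
            self.added_delete_files += 1
            self.added_pos_deletes += data_file.record_count

    def build(self) -> Dict[str, str]:
        # Zero-valued metrics are omitted, matching set_non_zero in the diff.
        out: Dict[str, str] = {}
        for name, value in [
            ("added-files-size", self.added_size),
            ("added-data-files", self.added_files),
            ("added-records", self.added_records),
            ("added-delete-files", self.added_delete_files),
        ]:
            if value > 0:
                out[name] = str(value)
        return out


collector = MiniCollector()
collector.add_file(FakeDataFile(DataFileContent.DATA, record_count=100, file_size_in_bytes=1024))
collector.add_file(FakeDataFile(DataFileContent.POSITION_DELETES, record_count=5, file_size_in_bytes=128))
summary = collector.build()
```

Data files tally both record count and byte size, while position deletes only bump the delete-file counters; metrics left at zero never appear in the built summary.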





-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382619372



Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382619528



Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382619653



Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382620295


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +144,202 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
 snapshot_id: int = Field(alias="snapshot-id")
 timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+added_size: int
+removed_size: int
+added_files: int
+removed_files: int
+added_eq_delete_files: int
+removed_eq_delete_files: int
+added_pos_delete_files: int
+removed_pos_delete_files: int
+added_delete_files: int
+removed_delete_files: int
+added_records: int
+deleted_records: int
+added_pos_deletes: int
+removed_pos_deletes: int
+added_eq_deletes: int
+removed_eq_deletes: int
+
+def __init__(self) -> None:
+self.added_size = 0
+self.removed_size = 0
+self.added_files = 0
+self.removed_files = 0
+self.added_eq_delete_files = 0
+self.removed_eq_delete_files = 0
+self.added_pos_delete_files = 0
+self.removed_pos_delete_files = 0
+self.added_delete_files = 0
+self.removed_delete_files = 0
+self.added_records = 0
+self.deleted_records = 0
+self.added_pos_deletes = 0
+self.removed_pos_deletes = 0
+self.added_eq_deletes = 0
+self.removed_eq_deletes = 0
+
+def add_file(self, data_file: DataFile) -> None:
+self.added_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.added_files += 1
+self.added_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.added_delete_files += 1
+self.added_pos_delete_files += 1
+self.added_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.added_delete_files += 1
+self.added_eq_delete_files += 1
+self.added_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def removed_file(self, data_file: DataFile) -> None:
+self.removed_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.removed_files += 1
+self.deleted_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.removed_delete_files += 1
+self.removed_pos_delete_files += 1
+self.removed_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.removed_delete_files += 1
+self.removed_eq_delete_files += 1
+self.removed_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def added_manifest(self, manifest: ManifestFile) -> None:
+if manifest.content == ManifestContent.DATA:
+self.added_files += manifest.added_files_count or 0
+self.added_records += manifest.added_rows_count or 0
+self.removed_files += manifest.deleted_files_count or 0
+self.deleted_records += manifest.deleted_rows_count or 0
+elif manifest.content == ManifestContent.DELETES:
+self.added_delete_files += manifest.added_files_count or 0
+self.removed_delete_files += manifest.deleted_files_count or 0
+else:
+raise ValueError(f"Unknown manifest file content: 
{manifest.content}")
+
+def build(self) -> Dict[str, str]:
+def set_non_zero(properties: Dict[str, str], num: int, property_name: 
str) -> None:
+if num > 0:
+properties[property_name] = str(num)
+
+properties: Dict[str, str] = {}
+set_non_zero(properties, self.added_size, 'added-files-size')
+set_non_zero(properties, self.removed_size, 'removed-files-size')
+set_non_zero(properties, self.added_files, 'added-data-files')
+set_non_zero(properties, self.removed_files, 'removed-data-files')
+set_non_zero(properties, self.added_eq_delete_files, 
'added-equality-delete-files')
+set_non_zero(properties, self.removed_eq_delete_files, 
'removed-equality-delete-files')
+set_non_zero(properties, self.added_pos_delete_files, 
'added-position-delete-files')
+set_non_zero(properties, self.removed_pos_delete_files, 
'removed-position-delete-files')
+set_non_zero(properties, self.added_delete_files, 'added-delete-files')
+set_non_zero(properties, self.removed_delete_files, 
'removed-delete-files')
+set_non_zero(properties, self.added_records, 'added-records')
+set_non_zero(properties, self.

Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382620579


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +144,199 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
 snapshot_id: int = Field(alias="snapshot-id")
 timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+    added_size: int
+    removed_size: int
+    added_files: int
+    removed_files: int
+    added_eq_delete_files: int
+    removed_eq_delete_files: int
+    added_pos_delete_files: int
+    removed_pos_delete_files: int
+    added_delete_files: int
+    removed_delete_files: int
+    added_records: int
+    deleted_records: int
+    added_pos_deletes: int
+    removed_pos_deletes: int
+    added_eq_deletes: int
+    removed_eq_deletes: int
+
+    def __init__(self) -> None:
+        self.added_size = 0
+        self.removed_size = 0
+        self.added_files = 0
+        self.removed_files = 0
+        self.added_eq_delete_files = 0
+        self.removed_eq_delete_files = 0
+        self.added_pos_delete_files = 0
+        self.removed_pos_delete_files = 0
+        self.added_delete_files = 0
+        self.removed_delete_files = 0
+        self.added_records = 0
+        self.deleted_records = 0
+        self.added_pos_deletes = 0
+        self.removed_pos_deletes = 0
+        self.added_eq_deletes = 0
+        self.removed_eq_deletes = 0
+
+    def add_file(self, data_file: DataFile) -> None:
+        if data_file.content == DataFileContent.DATA:
+            self.added_files += 1
+            self.added_records += data_file.record_count
+            self.added_size += data_file.file_size_in_bytes
+        elif data_file.content == DataFileContent.POSITION_DELETES:
+            self.added_delete_files += 1
+            self.added_pos_delete_files += 1
+            self.added_pos_deletes += data_file.record_count
+        elif data_file.content == DataFileContent.EQUALITY_DELETES:
+            self.added_delete_files += 1
+            self.added_eq_delete_files += 1
+            self.added_eq_deletes += data_file.record_count
+        else:
+            raise ValueError(f"Unknown data file content: {data_file.content}")
+
+    def removed_file(self, data_file: DataFile) -> None:
+        if data_file.content == DataFileContent.DATA:
+            self.removed_files += 1
+            self.deleted_records += data_file.record_count
+        elif data_file.content == DataFileContent.POSITION_DELETES:
+            self.removed_delete_files += 1
+            self.removed_pos_delete_files += 1
+            self.removed_pos_deletes += data_file.record_count
+        elif data_file.content == DataFileContent.EQUALITY_DELETES:
+            self.removed_delete_files += 1
+            self.removed_eq_delete_files += 1
+            self.removed_eq_deletes += data_file.record_count
+        else:
+            raise ValueError(f"Unknown data file content: {data_file.content}")
+
+    def added_manifest(self, manifest: ManifestFile) -> None:
+        if manifest.content == ManifestContent.DATA:
+            self.added_files += manifest.added_files_count or 0
+            self.added_records += manifest.added_rows_count or 0
+            self.removed_files += manifest.deleted_files_count or 0
+            self.deleted_records += manifest.deleted_rows_count or 0
+        elif manifest.content == ManifestContent.DELETES:
+            self.added_delete_files += manifest.added_files_count or 0
+            self.removed_delete_files += manifest.deleted_files_count or 0
+        else:
+            raise ValueError(f"Unknown manifest file content: {manifest.content}")
+
+    def build(self) -> Dict[str, str]:
+        def set_non_zero(properties: Dict[str, str], num: int, property_name: str) -> None:
+            if num > 0:
+                properties[property_name] = str(num)
+
+        properties: Dict[str, str] = {}
+        set_non_zero(properties, self.added_size, 'added-files-size')
+        set_non_zero(properties, self.removed_size, 'removed-files-size')
+        set_non_zero(properties, self.added_files, 'added-data-files')
+        set_non_zero(properties, self.removed_files, 'removed-data-files')
+        set_non_zero(properties, self.added_eq_delete_files, 'added-equality-delete-files')
+        set_non_zero(properties, self.removed_eq_delete_files, 'removed-equality-delete-files')
+        set_non_zero(properties, self.added_pos_delete_files, 'added-position-delete-files')
+        set_non_zero(properties, self.removed_pos_delete_files, 'removed-position-delete-files')
+        set_non_zero(properties, self.added_delete_files, 'added-delete-files')
+        set_non_zero(properties, self.removed_delete_files, 'removed-delete-files')
+        set_non_zero(properties, self.added_records, 'added-records')
+        set_non_zero(properties, self.deleted_records, 'deleted-records')
+        set_non_zero(p
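Note that this revision of `add_file` adds `file_size_in_bytes` only inside the `DATA` branch, whereas the revision quoted earlier in the thread added it for every content type. A small sketch of the two behaviors, using hypothetical stand-ins rather than pyiceberg classes:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DataFileContent(Enum):
    DATA = 0
    POSITION_DELETES = 1
    EQUALITY_DELETES = 2


@dataclass
class FakeDataFile:
    """Hypothetical stand-in for pyiceberg's DataFile."""
    content: DataFileContent
    file_size_in_bytes: int


def added_size_all(files: List[FakeDataFile]) -> int:
    # Earlier revision: every added file's size is counted.
    return sum(f.file_size_in_bytes for f in files)


def added_size_data_only(files: List[FakeDataFile]) -> int:
    # This revision: only data files contribute to added-files-size.
    return sum(f.file_size_in_bytes for f in files if f.content == DataFileContent.DATA)


files = [
    FakeDataFile(DataFileContent.DATA, 1024),
    FakeDataFile(DataFileContent.POSITION_DELETES, 128),
]
```

With one 1024-byte data file and one 128-byte position-delete file, the two revisions report different `added-files-size` totals.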


Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382620743


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +144,202 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
 snapshot_id: int = Field(alias="snapshot-id")
 timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+added_size: int
+removed_size: int
+added_files: int
+removed_files: int
+added_eq_delete_files: int
+removed_eq_delete_files: int
+added_pos_delete_files: int
+removed_pos_delete_files: int
+added_delete_files: int
+removed_delete_files: int
+added_records: int
+deleted_records: int
+added_pos_deletes: int
+removed_pos_deletes: int
+added_eq_deletes: int
+removed_eq_deletes: int
+
+def __init__(self) -> None:
+self.added_size = 0
+self.removed_size = 0
+self.added_files = 0
+self.removed_files = 0
+self.added_eq_delete_files = 0
+self.removed_eq_delete_files = 0
+self.added_pos_delete_files = 0
+self.removed_pos_delete_files = 0
+self.added_delete_files = 0
+self.removed_delete_files = 0
+self.added_records = 0
+self.deleted_records = 0
+self.added_pos_deletes = 0
+self.removed_pos_deletes = 0
+self.added_eq_deletes = 0
+self.removed_eq_deletes = 0
+
+def add_file(self, data_file: DataFile) -> None:
+self.added_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.added_files += 1
+self.added_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.added_delete_files += 1
+self.added_pos_delete_files += 1
+self.added_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.added_delete_files += 1
+self.added_eq_delete_files += 1
+self.added_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def removed_file(self, data_file: DataFile) -> None:
+self.removed_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.removed_files += 1
+self.deleted_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.removed_delete_files += 1
+self.removed_pos_delete_files += 1
+self.removed_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.removed_delete_files += 1
+self.removed_eq_delete_files += 1
+self.removed_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def added_manifest(self, manifest: ManifestFile) -> None:
+if manifest.content == ManifestContent.DATA:
+self.added_files += manifest.added_files_count or 0
+self.added_records += manifest.added_rows_count or 0
+self.removed_files += manifest.deleted_files_count or 0
+self.deleted_records += manifest.deleted_rows_count or 0
+elif manifest.content == ManifestContent.DELETES:
+self.added_delete_files += manifest.added_files_count or 0
+self.removed_delete_files += manifest.deleted_files_count or 0
+else:
+raise ValueError(f"Unknown manifest file content: {manifest.content}")
+
+def build(self) -> Dict[str, str]:
+def set_non_zero(properties: Dict[str, str], num: int, property_name: str) -> None:
+if num > 0:
+properties[property_name] = str(num)
+
+properties: Dict[str, str] = {}
+set_non_zero(properties, self.added_size, 'added-files-size')
+set_non_zero(properties, self.removed_size, 'removed-files-size')
+set_non_zero(properties, self.added_files, 'added-data-files')
+set_non_zero(properties, self.removed_files, 'removed-data-files')
+set_non_zero(properties, self.added_eq_delete_files, 'added-equality-delete-files')
+set_non_zero(properties, self.removed_eq_delete_files, 'removed-equality-delete-files')
+set_non_zero(properties, self.added_pos_delete_files, 'added-position-delete-files')
+set_non_zero(properties, self.removed_pos_delete_files, 'removed-position-delete-files')
+set_non_zero(properties, self.added_delete_files, 'added-delete-files')
+set_non_zero(properties, self.removed_delete_files, 'removed-delete-files')
+set_non_zero(properties, self.added_records, 'added-records')
+set_non_zero(properties, self.

Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382620705


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +144,202 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
 snapshot_id: int = Field(alias="snapshot-id")
 timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+added_size: int
+removed_size: int
+added_files: int
+removed_files: int
+added_eq_delete_files: int
+removed_eq_delete_files: int
+added_pos_delete_files: int
+removed_pos_delete_files: int
+added_delete_files: int
+removed_delete_files: int
+added_records: int
+deleted_records: int
+added_pos_deletes: int
+removed_pos_deletes: int
+added_eq_deletes: int
+removed_eq_deletes: int
+
+def __init__(self) -> None:
+self.added_size = 0
+self.removed_size = 0
+self.added_files = 0
+self.removed_files = 0
+self.added_eq_delete_files = 0
+self.removed_eq_delete_files = 0
+self.added_pos_delete_files = 0
+self.removed_pos_delete_files = 0
+self.added_delete_files = 0
+self.removed_delete_files = 0
+self.added_records = 0
+self.deleted_records = 0
+self.added_pos_deletes = 0
+self.removed_pos_deletes = 0
+self.added_eq_deletes = 0
+self.removed_eq_deletes = 0
+
+def add_file(self, data_file: DataFile) -> None:
+self.added_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.added_files += 1
+self.added_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.added_delete_files += 1
+self.added_pos_delete_files += 1
+self.added_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.added_delete_files += 1
+self.added_eq_delete_files += 1
+self.added_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def removed_file(self, data_file: DataFile) -> None:
+self.removed_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.removed_files += 1
+self.deleted_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.removed_delete_files += 1
+self.removed_pos_delete_files += 1
+self.removed_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.removed_delete_files += 1
+self.removed_eq_delete_files += 1
+self.removed_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def added_manifest(self, manifest: ManifestFile) -> None:
+if manifest.content == ManifestContent.DATA:
+self.added_files += manifest.added_files_count or 0
+self.added_records += manifest.added_rows_count or 0
+self.removed_files += manifest.deleted_files_count or 0
+self.deleted_records += manifest.deleted_rows_count or 0
+elif manifest.content == ManifestContent.DELETES:
+self.added_delete_files += manifest.added_files_count or 0
+self.removed_delete_files += manifest.deleted_files_count or 0
+else:
+raise ValueError(f"Unknown manifest file content: {manifest.content}")
+
+def build(self) -> Dict[str, str]:
+def set_non_zero(properties: Dict[str, str], num: int, property_name: str) -> None:
+if num > 0:
+properties[property_name] = str(num)
+
+properties: Dict[str, str] = {}
+set_non_zero(properties, self.added_size, 'added-files-size')
+set_non_zero(properties, self.removed_size, 'removed-files-size')
+set_non_zero(properties, self.added_files, 'added-data-files')
+set_non_zero(properties, self.removed_files, 'removed-data-files')
+set_non_zero(properties, self.added_eq_delete_files, 'added-equality-delete-files')
+set_non_zero(properties, self.removed_eq_delete_files, 'removed-equality-delete-files')
+set_non_zero(properties, self.added_pos_delete_files, 'added-position-delete-files')
+set_non_zero(properties, self.removed_pos_delete_files, 'removed-position-delete-files')
+set_non_zero(properties, self.added_delete_files, 'added-delete-files')
+set_non_zero(properties, self.removed_delete_files, 'removed-delete-files')
+set_non_zero(properties, self.added_records, 'added-records')
+set_non_zero(properties, self.

Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382620882


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +144,202 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
 snapshot_id: int = Field(alias="snapshot-id")
 timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+added_size: int
+removed_size: int
+added_files: int
+removed_files: int
+added_eq_delete_files: int
+removed_eq_delete_files: int
+added_pos_delete_files: int
+removed_pos_delete_files: int
+added_delete_files: int
+removed_delete_files: int
+added_records: int
+deleted_records: int
+added_pos_deletes: int
+removed_pos_deletes: int
+added_eq_deletes: int
+removed_eq_deletes: int
+
+def __init__(self) -> None:
+self.added_size = 0
+self.removed_size = 0
+self.added_files = 0
+self.removed_files = 0
+self.added_eq_delete_files = 0
+self.removed_eq_delete_files = 0
+self.added_pos_delete_files = 0
+self.removed_pos_delete_files = 0
+self.added_delete_files = 0
+self.removed_delete_files = 0
+self.added_records = 0
+self.deleted_records = 0
+self.added_pos_deletes = 0
+self.removed_pos_deletes = 0
+self.added_eq_deletes = 0
+self.removed_eq_deletes = 0
+
+def add_file(self, data_file: DataFile) -> None:

Review Comment:
   `DataFile` is used for both data and delete files?
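   The answer, as the diff above shows, is yes: a single `DataFile` class covers both data and delete files, with a `content` field driving the dispatch. A minimal, self-contained sketch of that dispatch (the `DataFile` and `DataFileContent` stand-ins below are simplified; the real PyIceberg classes carry many more fields):

   ```python
   from dataclasses import dataclass
   from enum import Enum


   class DataFileContent(Enum):
       # Mirrors the three content types referenced in the diff above.
       DATA = 0
       POSITION_DELETES = 1
       EQUALITY_DELETES = 2


   @dataclass
   class DataFile:
       # Simplified stand-in for the real class.
       content: DataFileContent
       record_count: int
       file_size_in_bytes: int


   def classify(data_file: DataFile) -> str:
       """Dispatch on `content`, as add_file/removed_file do in the diff."""
       if data_file.content == DataFileContent.DATA:
           return "data"
       elif data_file.content in (DataFileContent.POSITION_DELETES, DataFileContent.EQUALITY_DELETES):
           return "delete"
       raise ValueError(f"Unknown data file content: {data_file.content}")


   print(classify(DataFile(DataFileContent.POSITION_DELETES, 10, 1024)))  # delete
   ```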



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382621051


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +144,202 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
 snapshot_id: int = Field(alias="snapshot-id")
 timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+added_size: int
+removed_size: int
+added_files: int
+removed_files: int
+added_eq_delete_files: int
+removed_eq_delete_files: int
+added_pos_delete_files: int
+removed_pos_delete_files: int
+added_delete_files: int
+removed_delete_files: int
+added_records: int
+deleted_records: int
+added_pos_deletes: int
+removed_pos_deletes: int
+added_eq_deletes: int
+removed_eq_deletes: int
+
+def __init__(self) -> None:
+self.added_size = 0
+self.removed_size = 0
+self.added_files = 0
+self.removed_files = 0
+self.added_eq_delete_files = 0
+self.removed_eq_delete_files = 0
+self.added_pos_delete_files = 0
+self.removed_pos_delete_files = 0
+self.added_delete_files = 0
+self.removed_delete_files = 0
+self.added_records = 0
+self.deleted_records = 0
+self.added_pos_deletes = 0
+self.removed_pos_deletes = 0
+self.added_eq_deletes = 0
+self.removed_eq_deletes = 0
+
+def add_file(self, data_file: DataFile) -> None:
+self.added_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.added_files += 1
+self.added_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.added_delete_files += 1
+self.added_pos_delete_files += 1
+self.added_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.added_delete_files += 1
+self.added_eq_delete_files += 1
+self.added_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def removed_file(self, data_file: DataFile) -> None:

Review Comment:
   Nit: `add_file` and `removed_file` use different tenses. It would be better to use `added_file` and `removed_file`, or `add_file` and `remove_file`.






Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382621288


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +144,202 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
 snapshot_id: int = Field(alias="snapshot-id")
 timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+added_size: int
+removed_size: int
+added_files: int
+removed_files: int
+added_eq_delete_files: int
+removed_eq_delete_files: int
+added_pos_delete_files: int
+removed_pos_delete_files: int
+added_delete_files: int
+removed_delete_files: int
+added_records: int
+deleted_records: int
+added_pos_deletes: int
+removed_pos_deletes: int
+added_eq_deletes: int
+removed_eq_deletes: int
+
+def __init__(self) -> None:
+self.added_size = 0
+self.removed_size = 0
+self.added_files = 0
+self.removed_files = 0
+self.added_eq_delete_files = 0
+self.removed_eq_delete_files = 0
+self.added_pos_delete_files = 0
+self.removed_pos_delete_files = 0
+self.added_delete_files = 0
+self.removed_delete_files = 0
+self.added_records = 0
+self.deleted_records = 0
+self.added_pos_deletes = 0
+self.removed_pos_deletes = 0
+self.added_eq_deletes = 0
+self.removed_eq_deletes = 0
+
+def add_file(self, data_file: DataFile) -> None:
+self.added_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.added_files += 1
+self.added_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.added_delete_files += 1
+self.added_pos_delete_files += 1
+self.added_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.added_delete_files += 1
+self.added_eq_delete_files += 1
+self.added_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def removed_file(self, data_file: DataFile) -> None:
+self.removed_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.removed_files += 1
+self.deleted_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.removed_delete_files += 1
+self.removed_pos_delete_files += 1
+self.removed_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.removed_delete_files += 1
+self.removed_eq_delete_files += 1
+self.removed_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def added_manifest(self, manifest: ManifestFile) -> None:

Review Comment:
   Since there is no way to add manifests right now, should we just remove this?






Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on code in PR #61:
URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1382621381


##
pyiceberg/table/snapshots.py:
##
@@ -116,3 +144,202 @@ class MetadataLogEntry(IcebergBaseModel):
 class SnapshotLogEntry(IcebergBaseModel):
 snapshot_id: int = Field(alias="snapshot-id")
 timestamp_ms: int = Field(alias="timestamp-ms")
+
+
+class SnapshotSummaryCollector:
+added_size: int
+removed_size: int
+added_files: int
+removed_files: int
+added_eq_delete_files: int
+removed_eq_delete_files: int
+added_pos_delete_files: int
+removed_pos_delete_files: int
+added_delete_files: int
+removed_delete_files: int
+added_records: int
+deleted_records: int
+added_pos_deletes: int
+removed_pos_deletes: int
+added_eq_deletes: int
+removed_eq_deletes: int
+
+def __init__(self) -> None:
+self.added_size = 0
+self.removed_size = 0
+self.added_files = 0
+self.removed_files = 0
+self.added_eq_delete_files = 0
+self.removed_eq_delete_files = 0
+self.added_pos_delete_files = 0
+self.removed_pos_delete_files = 0
+self.added_delete_files = 0
+self.removed_delete_files = 0
+self.added_records = 0
+self.deleted_records = 0
+self.added_pos_deletes = 0
+self.removed_pos_deletes = 0
+self.added_eq_deletes = 0
+self.removed_eq_deletes = 0
+
+def add_file(self, data_file: DataFile) -> None:
+self.added_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.added_files += 1
+self.added_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.added_delete_files += 1
+self.added_pos_delete_files += 1
+self.added_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.added_delete_files += 1
+self.added_eq_delete_files += 1
+self.added_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def removed_file(self, data_file: DataFile) -> None:
+self.removed_size += data_file.file_size_in_bytes
+
+if data_file.content == DataFileContent.DATA:
+self.removed_files += 1
+self.deleted_records += data_file.record_count
+elif data_file.content == DataFileContent.POSITION_DELETES:
+self.removed_delete_files += 1
+self.removed_pos_delete_files += 1
+self.removed_pos_deletes += data_file.record_count
+elif data_file.content == DataFileContent.EQUALITY_DELETES:
+self.removed_delete_files += 1
+self.removed_eq_delete_files += 1
+self.removed_eq_deletes += data_file.record_count
+else:
+raise ValueError(f"Unknown data file content: {data_file.content}")
+
+def added_manifest(self, manifest: ManifestFile) -> None:
+if manifest.content == ManifestContent.DATA:
+self.added_files += manifest.added_files_count or 0
+self.added_records += manifest.added_rows_count or 0
+self.removed_files += manifest.deleted_files_count or 0
+self.deleted_records += manifest.deleted_rows_count or 0
+elif manifest.content == ManifestContent.DELETES:
+self.added_delete_files += manifest.added_files_count or 0
+self.removed_delete_files += manifest.deleted_files_count or 0
+else:
+raise ValueError(f"Unknown manifest file content: {manifest.content}")
+
+def build(self) -> Dict[str, str]:
+def set_non_zero(properties: Dict[str, str], num: int, property_name: str) -> None:

Review Comment:
   Nit: this checks for positive, not just non-zero.
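   The nit can be seen by exercising the helper: `num > 0` skips negative values as well as zero, so a name like `set_when_positive` (a hypothetical rename, not from the PR) would describe it more precisely. A small sketch:

   ```python
   from typing import Dict


   def set_non_zero(properties: Dict[str, str], num: int, property_name: str) -> None:
       # As written in the diff: only strictly positive values are recorded.
       if num > 0:
           properties[property_name] = str(num)


   props: Dict[str, str] = {}
   set_non_zero(props, 5, 'added-records')    # recorded
   set_non_zero(props, 0, 'deleted-records')  # skipped: zero
   set_non_zero(props, -3, 'removed-files')   # also skipped, despite being non-zero
   print(props)  # {'added-records': '5'}
   ```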






Re: [PR] Add flake8-pie to ruff [iceberg-python]

2023-11-05 Thread via GitHub


rdblue merged PR #86:
URL: https://github.com/apache/iceberg-python/pull/86





Re: [PR] Update pre-commit [iceberg-python]

2023-11-05 Thread via GitHub


rdblue merged PR #85:
URL: https://github.com/apache/iceberg-python/pull/85





Re: [PR] Update pre-commit [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on PR #85:
URL: https://github.com/apache/iceberg-python/pull/85#issuecomment-1793808567

   Thanks, @Fokko!





Re: [PR] Replace black by Ruff Formatter [iceberg-python]

2023-11-05 Thread via GitHub


Fokko commented on code in PR #127:
URL: https://github.com/apache/iceberg-python/pull/127#discussion_r1382622265


##
.pre-commit-config.yaml:
##
@@ -29,15 +29,11 @@ repos:
   - id: check-ast
   - repo: https://github.com/astral-sh/ruff-pre-commit
 # Ruff version (Used for linting)
-rev: v0.0.291

Review Comment:
   Does it come with the new version? I don't see any related config (for 
example, line length)






Re: [PR] Bump version to 0.6.0 [iceberg-python]

2023-11-05 Thread via GitHub


rdblue commented on PR #72:
URL: https://github.com/apache/iceberg-python/pull/72#issuecomment-1793809215

   Looks good to me. Merge when you're ready.





Re: [PR] Support of before and after actions in preorderschema traversal [iceberg-python]

2023-11-05 Thread via GitHub


Fokko commented on PR #42:
URL: https://github.com/apache/iceberg-python/pull/42#issuecomment-1793809220

   This was suggested here: https://github.com/apache/iceberg/pull/7831/files#r1285259053. I'll leave it up to @rdblue to decide whether he thinks this is valuable.





Re: [PR] Consider moving to ParallelIterable in Deletes::toPositionIndex [iceberg]

2023-11-05 Thread via GitHub


rdblue commented on PR #6432:
URL: https://github.com/apache/iceberg/pull/6432#issuecomment-1793809960

   #8805 was merged so I'll close this. I should also note that @aokolnychyi 
raised some concerns about this approach instead of a more comprehensive fix. 
This is probably a good start if we don't add further caching.





Re: [PR] Consider moving to ParallelIterable in Deletes::toPositionIndex [iceberg]

2023-11-05 Thread via GitHub


rdblue closed pull request #6432: Consider moving to ParallelIterable in 
Deletes::toPositionIndex 
URL: https://github.com/apache/iceberg/pull/6432





Re: [PR] added contributing.md file [iceberg-python]

2023-11-05 Thread via GitHub


Fokko commented on PR #102:
URL: https://github.com/apache/iceberg-python/pull/102#issuecomment-1793815821

   But what are your thoughts on linking from the `CONTRIBUTING.md` to the website? Otherwise, it is bound to get out of sync.





Re: [PR] Bump version to 0.6.0 [iceberg-python]

2023-11-05 Thread via GitHub


Fokko merged PR #72:
URL: https://github.com/apache/iceberg-python/pull/72





Re: [PR] Bump version to 0.6.0 [iceberg-python]

2023-11-05 Thread via GitHub


Fokko commented on PR #72:
URL: https://github.com/apache/iceberg-python/pull/72#issuecomment-1793816199

   👍 Thanks for the review @rdblue 





Re: [PR] added contributing.md file [iceberg-python]

2023-11-05 Thread via GitHub


onemriganka commented on PR #102:
URL: https://github.com/apache/iceberg-python/pull/102#issuecomment-1793819378

   OK sir, if you think the website is more helpful then ok... Thanks 





Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-05 Thread via GitHub


stevenzwu commented on code in PR #8803:
URL: https://github.com/apache/iceberg/pull/8803#discussion_r1382659616


##
api/src/main/java/org/apache/iceberg/Scan.java:
##
@@ -77,6 +78,21 @@ public interface Scan<ThisT, T extends ScanTask, G extends ScanTaskGroup<T>> {
*/
   ThisT includeColumnStats();
 
+  /**
+   * Create a new scan from this that loads the column stats for the specific 
columns with each data
+   * file. If the columns set is empty or null then all column 
stats will be kept, if
+   * {@link #includeColumnStats()} is set.
+   *
+   * Column stats include: value count, null value count, lower bounds, and 
upper bounds.
+   *
+   * @param columnsToKeepStats column ids from the table's schema

Review Comment:
   +1 on using string






Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-05 Thread via GitHub


stevenzwu commented on code in PR #8803:
URL: https://github.com/apache/iceberg/pull/8803#discussion_r1382661567


##
core/src/main/java/org/apache/iceberg/util/ContentFileUtil.java:
##
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.util;
+
+import java.util.Set;
+import org.apache.iceberg.ContentFile;
+
+public class ContentFileUtil {
+  private ContentFileUtil() {}
+
+  /**
+   * Copies the {@link ContentFile} with the specific stat settings.
+   *
+   * @param file a generic data file to copy.
+   * @param withStats whether to keep any stats
+   * @param columnsToKeepStats a set of column ids to keep stats. If empty or 
null then
+   * every column stat is kept.

Review Comment:
   @aokolnychyi  the proposed interfaces for `Scan` and `ContentFile` make 
sense to me. 
   
   > A collection with a single element * means we will call copy() on files.
   > Null or empty collection means we will call copyWithoutStats() on files.
   
   Regarding the collection of a single `*` element, it feels a little tacky to me. The util method/class takes the two configs from the `TableScanContext` and implements the copy logic. 
   
   > All of the logic above can be encapsulated in a single copy method in our base scan.
   
   This `ContentFileUtil` was intended for that purpose of encapsulating all of that logic. Note that the `ManifestGroup` class also leverages this util method.
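   The decision table quoted above can be sketched as follows. This is an illustrative stand-in for the discussion, not the actual Iceberg Java API; the `copyWithStats` name for the partial-stats case is hypothetical:

   ```python
   from typing import Optional, Set


   def copy_call(columns_to_keep_stats: Optional[Set[str]]) -> str:
       """Which ContentFile copy to invoke, per the quoted proposal (a sketch)."""
       if not columns_to_keep_stats:
           # Null or empty collection: drop all column stats.
           return "copyWithoutStats()"
       if columns_to_keep_stats == {"*"}:
           # A collection with the single element '*': keep everything.
           return "copy()"
       # Otherwise: keep stats only for the listed columns (hypothetical name).
       return "copyWithStats(columns_to_keep_stats)"


   print(copy_call({"*"}))  # copy()
   ```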






Re: [PR] Spark 3.5: Don't throw exception when decoding dictionary of type INT96 [iceberg]

2023-11-05 Thread via GitHub


manuzhang commented on PR #8988:
URL: https://github.com/apache/iceberg/pull/8988#issuecomment-1793942486

   @yabola @nastra PTAL, thanks!





Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-11-05 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1382704220


##
core/src/test/java/org/apache/iceberg/view/ViewCatalogTests.java:
##
@@ -400,8 +400,15 @@ public void 
replaceTableViaTransactionThatAlreadyExistsAsView() {
 .buildTable(viewIdentifier, SCHEMA)
 .replaceTransaction()
 .commitTransaction())
-.isInstanceOf(NoSuchTableException.class)
-.hasMessageStartingWith("Table does not exist: ns.view");
+.satisfiesAnyOf(
+throwable ->
+assertThat(throwable)
+.isInstanceOf(NoSuchTableException.class)
+.hasMessageStartingWith("Table does not exist: ns.view"),
+throwable ->
+assertThat(throwable)

Review Comment:
   So far, the REST and in-memory catalogs follow one pattern (NoSuchTableException) and all other catalogs follow another (AlreadyExistsException).
   
   I tried making Nessie follow the REST catalog's pattern, but it breaks other test cases. I will check more.






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-11-05 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1382710543


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##
@@ -0,0 +1,351 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.apache.iceberg.types.Types.NestedField.optional;
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Comparator;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.view.SQLViewRepresentation;
+import org.apache.iceberg.view.View;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.model.Branch;
+import org.projectnessie.model.CommitMeta;
+import org.projectnessie.model.ContentKey;
+import org.projectnessie.model.IcebergView;
+import org.projectnessie.model.ImmutableTableReference;
+import org.projectnessie.model.LogResponse.LogEntry;
+
+public class TestNessieView extends BaseTestIceberg {
+
+  private static final String BRANCH = "iceberg-view-test";
+
+  private static final String DB_NAME = "db";
+  private static final String VIEW_NAME = "view";
+  private static final TableIdentifier VIEW_IDENTIFIER = 
TableIdentifier.of(DB_NAME, VIEW_NAME);
+  private static final ContentKey KEY = ContentKey.of(DB_NAME, VIEW_NAME);
+  private static final Schema schema =
+  new Schema(Types.StructType.of(required(1, "id", 
Types.LongType.get())).fields());
+  private static final Schema altered =
+  new Schema(
+  Types.StructType.of(
+  required(1, "id", Types.LongType.get()),
+  optional(2, "data", Types.LongType.get()))
+  .fields());
+
+  private String viewLocation;
+
+  public TestNessieView() {
+super(BRANCH);
+  }
+
+  @Override
+  @BeforeEach
+  public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri 
URI nessieUri)
+  throws IOException {
+super.beforeEach(clientFactory, nessieUri);
+this.viewLocation =
+createView(catalog, VIEW_IDENTIFIER, 
schema).location().replaceFirst("file:", "");
+  }
+
+  @Override
+  @AfterEach
+  public void afterEach() throws Exception {
+// drop the view data
+if (viewLocation != null) {
+  try (Stream walk = Files.walk(Paths.get(viewLocation))) {
+
walk.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
+  }
+  catalog.dropView(VIEW_IDENTIFIER);
+}
+
+super.afterEach();
+  }
+
+  private IcebergView getView(ContentKey key) throws NessieNotFoundException {
+return getView(BRANCH, key);
+  }
+
+  private IcebergView getView(String ref, ContentKey key) throws 
NessieNotFoundException {
+return 
api.getContent().key(key).refName(ref).get().get(key).unwrap(IcebergView.class).get();
+  }
+
+  /** Verify that Nessie always returns the globally-current global-content w/ 
only DMLs. */
+  @Test
+  public void verifyStateMovesForDML() throws Exception {
+//  1. initialize view
+View icebergView = catalog.loadView(VIEW_IDENTIFIER);
+icebergView
+.replaceVersion()
+.withQuery("spark", "some query")
+.withSchema(schema)
+.withDefaultNamespace(VIEW_IDENTIFIER.namespace())
+.commit();
+
+//  2. create 2nd branch
+String testCaseBranch = "verify-global-moving";
+api.createReference()
+.sourceRefName(BRANCH)
+.reference(Branch.of(testCaseBranch, catalog.currentHash()))
+.create();
+try (NessieCatalog ignore = initCatalog(testCaseBranch)) {
+
+  IcebergView conte

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-11-05 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1382711672


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##
@@ -0,0 +1,351 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.apache.iceberg.types.Types.NestedField.optional;
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Comparator;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.view.SQLViewRepresentation;
+import org.apache.iceberg.view.View;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.model.Branch;
+import org.projectnessie.model.CommitMeta;
+import org.projectnessie.model.ContentKey;
+import org.projectnessie.model.IcebergView;
+import org.projectnessie.model.ImmutableTableReference;
+import org.projectnessie.model.LogResponse.LogEntry;
+
+public class TestNessieView extends BaseTestIceberg {
+
+  private static final String BRANCH = "iceberg-view-test";
+
+  private static final String DB_NAME = "db";
+  private static final String VIEW_NAME = "view";
+  private static final TableIdentifier VIEW_IDENTIFIER = 
TableIdentifier.of(DB_NAME, VIEW_NAME);
+  private static final ContentKey KEY = ContentKey.of(DB_NAME, VIEW_NAME);
+  private static final Schema schema =
+  new Schema(Types.StructType.of(required(1, "id", 
Types.LongType.get())).fields());
+  private static final Schema altered =
+  new Schema(
+  Types.StructType.of(
+  required(1, "id", Types.LongType.get()),
+  optional(2, "data", Types.LongType.get()))
+  .fields());
+
+  private String viewLocation;
+
+  public TestNessieView() {
+super(BRANCH);
+  }
+
+  @Override
+  @BeforeEach
+  public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri 
URI nessieUri)
+  throws IOException {
+super.beforeEach(clientFactory, nessieUri);
+this.viewLocation =
+createView(catalog, VIEW_IDENTIFIER, 
schema).location().replaceFirst("file:", "");
+  }
+
+  @Override
+  @AfterEach
+  public void afterEach() throws Exception {
+// drop the view data
+if (viewLocation != null) {
+  try (Stream walk = Files.walk(Paths.get(viewLocation))) {
+
walk.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
+  }
+  catalog.dropView(VIEW_IDENTIFIER);
+}
+
+super.afterEach();
+  }
+
+  private IcebergView getView(ContentKey key) throws NessieNotFoundException {
+return getView(BRANCH, key);
+  }
+
+  private IcebergView getView(String ref, ContentKey key) throws 
NessieNotFoundException {
+return 
api.getContent().key(key).refName(ref).get().get(key).unwrap(IcebergView.class).get();
+  }
+
+  /** Verify that Nessie always returns the globally-current global-content w/ 
only DMLs. */
+  @Test
+  public void verifyStateMovesForDML() throws Exception {
+//  1. initialize view
+View icebergView = catalog.loadView(VIEW_IDENTIFIER);
+icebergView
+.replaceVersion()
+.withQuery("spark", "some query")
+.withSchema(schema)
+.withDefaultNamespace(VIEW_IDENTIFIER.namespace())
+.commit();
+
+//  2. create 2nd branch
+String testCaseBranch = "verify-global-moving";
+api.createReference()
+.sourceRefName(BRANCH)
+.reference(Branch.of(testCaseBranch, catalog.currentHash()))
+.create();
+try (NessieCatalog ignore = initCatalog(testCaseBranch)) {
+
+  IcebergView conte

Re: [I] Flink write iceberg bug(org.apache.iceberg.exceptions.NotFoundException) [iceberg]

2023-11-05 Thread via GitHub


pvary commented on issue #5846:
URL: https://github.com/apache/iceberg/issues/5846#issuecomment-1794170117

   @lirui-apache: For the record:
   To restore the state of the Flink job, you need the previous snapshot (to identify the last committed snapshot) and the new data files and temporary manifest files (if the failure happened between `snapshotState` and `notifyCheckpointComplete`).
   
   So snapshot expiration and removing orphan files could also corrupt the state of the Flink job.





Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-11-05 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1382844294


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieViewOperations.java:
##
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.iceberg.exceptions.AlreadyExistsException;
+import org.apache.iceberg.exceptions.NoSuchViewException;
+import org.apache.iceberg.io.FileIO;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.view.BaseViewOperations;
+import org.apache.iceberg.view.ViewMetadata;
+import org.apache.iceberg.view.ViewMetadataParser;
+import org.projectnessie.client.http.HttpClientException;
+import org.projectnessie.error.NessieBadRequestException;
+import org.projectnessie.error.NessieConflictException;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.model.Content;
+import org.projectnessie.model.ContentKey;
+import org.projectnessie.model.IcebergTable;
+import org.projectnessie.model.IcebergView;
+import org.projectnessie.model.Reference;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class NessieViewOperations extends BaseViewOperations {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(NessieViewOperations.class);
+
+  private final NessieIcebergClient client;
+
+  private final ContentKey key;
+  private final FileIO fileIO;
+  private final Map catalogOptions;
+  private IcebergView icebergView;
+
+  NessieViewOperations(
+  ContentKey key,
+  NessieIcebergClient client,
+  FileIO fileIO,
+  Map catalogOptions) {
+this.key = key;
+this.client = client;
+this.fileIO = fileIO;
+this.catalogOptions = catalogOptions;
+  }
+
+  @Override
+  public void doRefresh() {
+try {
+  client.refresh();
+} catch (NessieNotFoundException e) {
+  throw new RuntimeException(
+  String.format(
+  "Failed to refresh as ref '%s' is no longer valid.", 
client.getRef().getName()),
+  e);
+}
+String metadataLocation = null;
+Reference reference = client.getRef().getReference();
+try {
+  Content content = 
client.getApi().getContent().key(key).reference(reference).get().get(key);
+  LOG.debug("Content '{}' at '{}': {}", key, reference, content);
+  if (content == null) {
+if (currentMetadataLocation() != null) {
+  throw new NoSuchViewException("View does not exist: %s in %s", key, 
reference);
+}
+  } else {
+this.icebergView =
+content
+.unwrap(IcebergView.class)
+.orElseThrow(
+() -> {
+  if (content instanceof IcebergTable) {
+return new AlreadyExistsException(
+"Table with same name already exists: %s in %s", 
key, reference);
+  } else {
+return new IllegalStateException(
+String.format(
+"Cannot refresh Iceberg view: Nessie points to 
a non-Iceberg object for path: %s.",
+key));
+  }
+});
+metadataLocation = icebergView.getMetadataLocation();
+  }
+} catch (NessieNotFoundException ex) {
+  if (currentMetadataLocation() != null) {
+throw new NoSuchViewException("View does not exist: %s in %s", key, 
reference);
+  }
+}
+refreshFromMetadataLocation(metadataLocation, null, 2, l -> 
loadViewMetadata(l, reference));
+  }
+
+  private ViewMetadata loadViewMetadata(String metadataLocation, Reference 
reference) {
+ViewMetadata metadata = 
ViewMetadataParser.read(io().newInputFile(metadataLocation));
+Map newProperties = Maps.newHashMap(metadata.properties());
+newProperties.put(NessieTableOperations.NESSIE_COMMIT_ID_PROPERTY, 
reference.getHash());
+
+return ViewMetadata.buildFrom(
+
ViewMetadata.buildFrom(metadata).setProperties(newProperties).build()

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-11-05 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r138284


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##
@@ -135,71 +135,26 @@ protected void doCommit(TableMetadata base, TableMetadata 
metadata) {
 boolean newTable = base == null;
 String newMetadataLocation = writeNewMetadataIfRequired(newTable, 
metadata);
 
-String refName = client.refName();
-boolean failure = false;
+AtomicBoolean failure = new AtomicBoolean(false);

Review Comment:
   Removed `AtomicBoolean`, since the logic can be simplified without it.






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-11-05 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1382844582


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieUtil.java:
##
@@ -165,4 +180,77 @@ public static TableMetadata 
updateTableMetadataWithNessieSpecificProperties(
 
 return builder.discardChanges().build();
   }
+
+  static void handleExceptionsForCommits(
+  Exception exception, String refName, AtomicBoolean failure, Content.Type 
type) {
+if (exception instanceof NessieConflictException) {
+  failure.set(true);

Review Comment:
   Removed `AtomicBoolean`, since the logic can be simplified without it.






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-11-05 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1382844788


##
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##
@@ -132,74 +131,36 @@ protected void doRefresh() {
 
   @Override
   protected void doCommit(TableMetadata base, TableMetadata metadata) {
+Content content = null;
+try {
+  content =
+  
client.getApi().getContent().key(key).reference(client.getReference()).get().get(key);
+} catch (NessieNotFoundException e) {

Review Comment:
   updated.



##
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##
@@ -132,74 +131,36 @@ protected void doRefresh() {
 
   @Override
   protected void doCommit(TableMetadata base, TableMetadata metadata) {
+Content content = null;
+try {
+  content =
+  
client.getApi().getContent().key(key).reference(client.getReference()).get().get(key);
+} catch (NessieNotFoundException e) {
+  // Ignore the exception as the first commit may not have the content 
present for the key.
+}
+
+if (content != null && content.getType() == Content.Type.ICEBERG_VIEW) {

Review Comment:
   updated.






Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-11-05 Thread via GitHub


ajantha-bhat commented on code in PR #8909:
URL: https://github.com/apache/iceberg/pull/8909#discussion_r1382845319


##
nessie/src/test/java/org/apache/iceberg/nessie/TestNessieView.java:
##
@@ -0,0 +1,351 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.nessie;
+
+import static org.apache.iceberg.types.Types.NestedField.optional;
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Comparator;
+import java.util.List;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.types.Types;
+import org.apache.iceberg.view.SQLViewRepresentation;
+import org.apache.iceberg.view.View;
+import org.assertj.core.api.Assertions;
+import org.junit.jupiter.api.AfterEach;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.projectnessie.client.ext.NessieClientFactory;
+import org.projectnessie.client.ext.NessieClientUri;
+import org.projectnessie.error.NessieNotFoundException;
+import org.projectnessie.model.Branch;
+import org.projectnessie.model.CommitMeta;
+import org.projectnessie.model.ContentKey;
+import org.projectnessie.model.IcebergView;
+import org.projectnessie.model.ImmutableTableReference;
+import org.projectnessie.model.LogResponse.LogEntry;
+
+public class TestNessieView extends BaseTestIceberg {
+
+  private static final String BRANCH = "iceberg-view-test";
+
+  private static final String DB_NAME = "db";
+  private static final String VIEW_NAME = "view";
+  private static final TableIdentifier VIEW_IDENTIFIER = 
TableIdentifier.of(DB_NAME, VIEW_NAME);
+  private static final ContentKey KEY = ContentKey.of(DB_NAME, VIEW_NAME);
+  private static final Schema schema =
+  new Schema(Types.StructType.of(required(1, "id", 
Types.LongType.get())).fields());
+  private static final Schema altered =
+  new Schema(
+  Types.StructType.of(
+  required(1, "id", Types.LongType.get()),
+  optional(2, "data", Types.LongType.get()))
+  .fields());
+
+  private String viewLocation;
+
+  public TestNessieView() {
+super(BRANCH);
+  }
+
+  @Override
+  @BeforeEach
+  public void beforeEach(NessieClientFactory clientFactory, @NessieClientUri 
URI nessieUri)
+  throws IOException {
+super.beforeEach(clientFactory, nessieUri);
+this.viewLocation =
+createView(catalog, VIEW_IDENTIFIER, 
schema).location().replaceFirst("file:", "");
+  }
+
+  @Override
+  @AfterEach
+  public void afterEach() throws Exception {
+// drop the view data
+if (viewLocation != null) {
+  try (Stream walk = Files.walk(Paths.get(viewLocation))) {
+
walk.sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
+  }
+  catalog.dropView(VIEW_IDENTIFIER);
+}
+
+super.afterEach();
+  }
+
+  private IcebergView getView(ContentKey key) throws NessieNotFoundException {
+return getView(BRANCH, key);
+  }
+
+  private IcebergView getView(String ref, ContentKey key) throws 
NessieNotFoundException {
+return 
api.getContent().key(key).refName(ref).get().get(key).unwrap(IcebergView.class).get();
+  }
+
+  /** Verify that Nessie always returns the globally-current global-content w/ 
only DMLs. */
+  @Test
+  public void verifyStateMovesForDML() throws Exception {
+//  1. initialize view
+View icebergView = catalog.loadView(VIEW_IDENTIFIER);
+icebergView
+.replaceVersion()
+.withQuery("spark", "some query")
+.withSchema(schema)
+.withDefaultNamespace(VIEW_IDENTIFIER.namespace())
+.commit();
+
+//  2. create 2nd branch
+String testCaseBranch = "verify-global-moving";
+api.createReference()
+.sourceRefName(BRANCH)
+.reference(Branch.of(testCaseBranch, catalog.currentHash()))
+.create();
+try (NessieCatalog ignore = initCatalog(testCaseBranch)) {
+
+  IcebergView conte

Re: [I] Ability to the write Metadata JSON [iceberg-python]

2023-11-05 Thread via GitHub


HonahX commented on issue #22:
URL: https://github.com/apache/iceberg-python/issues/22#issuecomment-1794191459

   Hi @Fokko. Is there an update on this issue? I am interested in taking this if it's still open.
   
   In terms of implementation, I was thinking of something like this:
   ```python
   def update_table_metadata(base_metadata: TableMetadata, updates: Tuple[TableUpdate, ...]) -> TableMetadata:
       builder = TableMetadataUpdateBuilder(base_metadata)
       for update in updates:
           builder.update_table_metadata(update)
       return builder.build()
   ```
   Does this approach align with your expectations?
   Thanks!
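A runnable sketch of that builder shape, using toy stand-ins for pyiceberg's `TableMetadata` and `TableUpdate` classes — the dataclasses and `SetPropertiesUpdate` below are assumptions for illustration only, not the actual pyiceberg API:

```python
from dataclasses import dataclass, replace
from typing import Tuple

# Toy stand-ins; the real pyiceberg classes are much richer.
@dataclass(frozen=True)
class TableMetadata:
    location: str
    properties: dict

@dataclass(frozen=True)
class SetPropertiesUpdate:
    updates: dict

class TableMetadataUpdateBuilder:
    """Applies a sequence of updates against an immutable base metadata."""

    def __init__(self, base: TableMetadata) -> None:
        self._metadata = base

    def update_table_metadata(self, update) -> "TableMetadataUpdateBuilder":
        if isinstance(update, SetPropertiesUpdate):
            merged = {**self._metadata.properties, **update.updates}
            self._metadata = replace(self._metadata, properties=merged)
        else:
            raise ValueError(f"unsupported update: {update!r}")
        return self

    def build(self) -> TableMetadata:
        return self._metadata

def update_table_metadata(base: TableMetadata, updates: Tuple) -> TableMetadata:
    builder = TableMetadataUpdateBuilder(base)
    for update in updates:
        builder.update_table_metadata(update)
    return builder.build()

base = TableMetadata("s3://bucket/tbl", {"owner": "a"})
new = update_table_metadata(base, (SetPropertiesUpdate({"owner": "b"}),))
print(new.properties["owner"])  # -> b; base is left unchanged
```

The frozen dataclass plus `dataclasses.replace` keeps the base metadata immutable, which matches how table metadata is treated as a value object.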
   





Re: [PR] Clarify which columns can be used for equality delete files. [iceberg]

2023-11-05 Thread via GitHub


liurenjie1024 commented on code in PR #8981:
URL: https://github.com/apache/iceberg/pull/8981#discussion_r1382867987


##
format/spec.md:
##
@@ -842,7 +842,8 @@ The rows in the delete file must be sorted by `file_path` 
then `pos` to optimize
 
 Equality delete files identify deleted rows in a collection of data files by 
one or more column values, and may optionally contain additional columns of the 
deleted row.
 
-Equality delete files store any subset of a table's columns and use the 
table's field ids. The _delete columns_ are the columns of the delete file used 
to match data rows. Delete columns are identified by id in the delete file 
[metadata column `equality_ids`](#manifests). Float and double columns cannot 
be used as delete columns in equality delete files.
+Equality delete files store any subset of a table's columns and use the 
table's field ids. The _delete columns_ are the columns of the delete file used 
to match data rows. Delete columns are identified by id in the delete file 
[metadata column `equality_ids`](#manifests). The column restrictions for 
columns used in equality delete files are the same as those for [identifier 
fields](#identifier-field-ids) with the exception that optional columns and 
columns nested under optional structs are allowed (if a 
+parent struct column is null it implies the leaf column is null).

Review Comment:
   I think there is one missing part: how are null values treated in equality ids? Are they treated as equal or unequal? Identifier fields don't allow null values.


