cccs-jc commented on PR #8980:
URL: https://github.com/apache/iceberg/pull/8980#issuecomment-1853964957
I did more digging. On our production tables I searched for all manifests
that have both `existing_data_files_count > 0` and `added_data_files_count > 0`,
and found none. This leads me to believe that a commit will be either an
append, with `added_data_files_count > 0`, **or** a rewrite, with
`existing_data_files_count > 0`.
This query returns no results:
```sql
select
  distinct added_snapshot_id
from
  catalog1.schema1.table1.manifests
where
  existing_data_files_count > 0
  and added_data_files_count > 0
```
I can then search for manifests that have `existing_data_files_count > 0` and
join those results to the snapshots table:
```sql
select
  *
from
  catalog1.schema1.table1.snapshots
where
  snapshot_id in (
    select
      distinct added_snapshot_id
    from
      catalog1.schema1.table1.manifests
    where
      existing_data_files_count > 0
  )
```
This returns the snapshots that the manifests with existing data files belong to.

Their corresponding snapshots are all rewrite snapshots.
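
For anyone who wants to repeat this check on their own tables, here is a rough sketch that summarizes the same join by snapshot operation. It assumes the `operation` column of the `snapshots` metadata table and reuses the placeholder table name from the queries above; the aliases `m` and `s` and the output column name are only illustrative. If the observation holds, only rewrite-type operations should show up, and no `append` row.

```sql
-- Sketch: per snapshot operation, count the manifests that carry existing data files.
-- Assumes snapshots.operation is available; m, s and the output column name are illustrative.
select
  s.operation,
  count(*) as manifests_with_existing_files
from
  catalog1.schema1.table1.manifests m
  join catalog1.schema1.table1.snapshots s
    on m.added_snapshot_id = s.snapshot_id
where
  m.existing_data_files_count > 0
group by
  s.operation
```

Because this is an inner join, manifests whose adding snapshot has already been expired will not appear in the result.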

When streaming, we skip over rewrite snapshots, so we will never encounter
a manifest with `existing_data_files_count > 0`.
So calling this in the code does nothing: `+ existingFilesCount();`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]