gaborkaszab commented on PR #16207:
URL: https://github.com/apache/iceberg/pull/16207#issuecomment-4389536373

   @grantatspothero 
   I agree with 1). In fact I've done some research and Polaris for instance 
sends a reduced snapshotLog not just reduced list of snapshots in REFS mode. 
This ATM is a broken contract between Polaris and Iceberg IMO. You're right, if 
we don't include snapshotLog into the lazy loading mechanism then the internal 
state of TableMetadata will be broken after we lazily load the snapshots.
   The reason why we haven't caught this is that in the reference IRC we always 
populated the full snapshotLog even in REFS mode (see 
`suppressHistoricalSnapshots`), so no tests actually covered a reduced 
snapshotLog.
   cc @amogh-jahagirdar to double-check but I think we have an issue here 
between Polaris and Iceberg that this PR can fix.
   
   I disagree with 2) though, or I misunderstand the motivation :) Let me 
explain:
   That change of `suppressHistoricalSnapshots ` on the REST server side is for 
testing, to simulate the behavior of a potential REST server implementation 
(unless you use the reference IRC for your production REST catalog, that I 
wouldn't advise). So what you did with that change is you simulated a REST 
catalog server that reduces the snapshotLog not just the list of snapshots.
   This doesn't mean that all the REST server implementations will work like 
this once the PR is merged. There could be ones that still send a full 
snapshotLog. That's why I said that for REST servers that reduced the 
snapshotLog, your memory concerns are already gone (we'd still need the lazy 
loading, though), but for REST servers that send the full snapshotLog we still 
load the full log on the client side. This is what I tried to show with my 
commit, that if a server still sends the full list, then this PR doesn't solve 
the caching issue.
   
   I might overthink this. In case you have full control on the REST catalog 
server implementation (i.e. you have your own proprietary implementation) then 
you're fine here. In case you don't and you connect to any REST catalogs out 
there, then you have no control on whether they send filtered or full 
snapshotLog in REFS mode.
   
   Sorry if I got this too long or complicated, hope it makes sense!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to