jeesou commented on PR #11040: URL: https://github.com/apache/iceberg/pull/11040#issuecomment-2477244116
Hi @karuppayya, @amogh-jahagirdar,

Following our discussion about introducing a config that lets users decide whether they are fine with a best-effort search, I was thinking of adding a threshold that the user can set, based on the amount of data change. I have written some example code; the diff can be seen here: https://github.com/karuppayya/iceberg/compare/fix_snapshot...jeesou:fix_snapshot_modifications?expand=1

I created the config as OLD_STATISTICS_USAGE_THRESHOLD_PERCENTAGE. Basically, it finds the last snapshot for which statistics are present and the amount of data changed between that snapshot and the current one. Currently, if any deletion has happened, I do not use the old statistics, because deletions can be unpredictable; this part needs fine-tuning. For other operations, I record the amount of data change and check whether it is within the specified threshold. The default value is 100, which means that by default the old existing stats are never used.

Kindly check once, and please do suggest improvements.
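To make the idea easier to discuss, here is a rough, language-neutral sketch of the check written in Python. It is not the code from the linked diff. The `SnapshotInfo` class, its fields, and the `find_reusable_statistics` function are hypothetical names made up for illustration. The comparison is also an assumption: the comment does not spell it out, so the sketch treats the allowed change as `100 - threshold` percent with a strict comparison, which is one reading that matches the stated default of 100 never reusing old stats.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default taken from the comment: 100 means old statistics are never reused.
OLD_STATISTICS_USAGE_THRESHOLD_PERCENTAGE_DEFAULT = 100.0


@dataclass
class SnapshotInfo:
    """Hypothetical, simplified view of a snapshot (not the Iceberg API)."""
    snapshot_id: int
    added_records: int
    deleted_records: int
    total_records: int
    has_statistics: bool


def find_reusable_statistics(
    history: List[SnapshotInfo],
    threshold_percentage: float = OLD_STATISTICS_USAGE_THRESHOLD_PERCENTAGE_DEFAULT,
) -> Optional[int]:
    """Return the id of the snapshot whose statistics may be used, or None.

    `history` is ordered oldest to newest; the last entry is the current snapshot.
    """
    if not history:
        return None
    current = history[-1]
    if current.has_statistics:
        # Exact statistics exist for the current snapshot; no fallback needed.
        return current.snapshot_id

    changed_records = 0
    for snapshot in reversed(history):
        if snapshot.has_statistics:
            if snapshot.total_records == 0:
                return None
            changed_percent = 100.0 * changed_records / snapshot.total_records
            # Assumed semantics: allowed change is (100 - threshold) percent.
            allowed_percent = 100.0 - threshold_percentage
            if changed_percent < allowed_percent:
                return snapshot.snapshot_id
            return None
        if snapshot.deleted_records > 0:
            # Deletions are treated as unpredictable: never reuse old stats.
            return None
        changed_records += snapshot.added_records
    return None
```

With a statistics-bearing snapshot of 1000 records followed by two appends of 50 records each, the accumulated change is 10 percent, so under these assumptions a threshold of 80 would reuse the old statistics while the default of 100 would not.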