jeesou commented on PR #11040:
URL: https://github.com/apache/iceberg/pull/11040#issuecomment-2477244116

   Hi @karuppayya, @amogh-jahagirdar, following our discussion about introducing a 
config that lets users decide whether they are fine with a best-effort search, I am 
thinking of adding a threshold that the user can set, based on the 
amount of data changed.
   
   I have written some example code; the diff can be seen here: 
https://github.com/karuppayya/iceberg/compare/fix_snapshot...jeesou:fix_snapshot_modifications?expand=1
   
   I named the config OLD_STATISTICS_USAGE_THRESHOLD_PERCENTAGE.
   
   Basically, it finds the last snapshot for which statistics are 
present, along with the amount of data changed since that snapshot.
   
   Currently, if any deletion has happened in between, I do not use the old 
statistics, because deletions can be unpredictable; this part needs fine-tuning. 
For other operations, I record the amount of data changed and check whether the 
change is within the specified threshold. The default value is 100, which means 
that by default the old existing stats are never used.
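   
   A rough sketch of the idea described above, in simplified Python rather than 
the Java in the diff. The `SnapshotInfo` fields and the `find_reusable_stats` 
helper are hypothetical names for illustration, not Iceberg APIs. I am also 
assuming here that the threshold is the maximum allowed percentage of changed 
records relative to the records covered by the old stats; how the default of 100 
maps onto "never use old stats" is not modeled, so the threshold is an explicit 
parameter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SnapshotInfo:
    """Hypothetical, simplified stand-in for a table snapshot."""
    snapshot_id: int
    operation: str        # e.g. "append", "overwrite", "delete"
    added_records: int
    deleted_records: int
    total_records: int    # records in the table as of this snapshot
    has_statistics: bool


def find_reusable_stats(history: List[SnapshotInfo],
                        threshold_pct: float) -> Optional[int]:
    """Return the id of the newest snapshot whose stats may be reused, else None.

    `history` is ordered oldest to newest; the last entry is the current snapshot.
    Assumption: threshold_pct is the maximum allowed change, in percent.
    """
    changed = 0
    # Walk backwards from the current snapshot towards older ones.
    for snap in reversed(history):
        if snap.has_statistics:
            if snap.total_records == 0:
                # Avoid dividing by zero: reuse only if nothing changed since.
                return snap.snapshot_id if changed == 0 else None
            change_pct = 100.0 * changed / snap.total_records
            return snap.snapshot_id if change_pct <= threshold_pct else None
        # No stats on this snapshot, so its changes lie between the stats and now.
        if snap.operation == "delete" or snap.deleted_records > 0:
            return None  # any deletion in between: do not use the old stats
        changed += snap.added_records
    return None  # no snapshot with statistics was found
```

   For example, with stats on a 1000-record snapshot followed by two appends of 
50 and 30 records, the accumulated change is 8%, so the old stats would be 
reused with a threshold of 10 but not with a threshold of 5.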
   
   Kindly take a look and please suggest improvements.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

