dchristle commented on issue #3703:
URL: https://github.com/apache/iceberg/issues/3703#issuecomment-1402362942

   I'm following up to say I got `deleteOrphanFiles` to complete successfully.
After increasing the driver memory, I was confused that the only log output I
saw was an occasional `RetryHttpInitializer: Encountered status code 503 when
sending DELETE request to URL` error. I let it run for more than 24 hours; the
driver seemed to be hung rather than deleting any orphan files.
   
   Other GitHub issues about deleting orphan files mention increasing the number
of delete threads, so I modified my Spark job to supply a thread pool with
`.executeDeleteWith`:
   
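   ```scala
   import java.util.concurrent.Executors
   import org.apache.iceberg.spark.actions.SparkActions

   // Run up to 30 orphan-file deletes concurrently
   val executorService = Executors.newFixedThreadPool(30)

   try {
     SparkActions
       .get()
       .deleteOrphanFiles(icebergTable) // icebergTable: org.apache.iceberg.Table
       .executeDeleteWith(executorService)
       .execute()
   } finally {
     executorService.shutdown() // release the pool's threads when done
   }
   ```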
   
   The 503 retry errors then appeared more frequently. My interpretation is that
each Google Cloud Storage delete request has some small, roughly fixed
probability of hitting a 503. With 30 concurrent delete operations there are
more requests per unit of time, so the log message shows up more often.
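   A quick way to see why, under hypothetical assumptions: suppose each delete request independently returns a 503 with probability $p$, each of $N$ threads issues about $r$ delete requests per second, and $M$ orphan files are deleted in total. Then, ignoring the extra requests caused by retries:

$$
\text{503s per second} \approx N \cdot r \cdot p, \qquad \text{expected 503s over the whole job} \approx M \cdot p
$$

Going from 1 thread to 30 multiplies the rate of 503 log lines by roughly 30, while the expected total for the job stays about the same. Under this model, more frequent messages do not mean that individual deletes became less reliable.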
   
   I let this new job run for about 36 hours, and it finished deleting orphan
files successfully. I wonder if there's some way to emit periodic log messages
indicating the number of files that have been deleted, perhaps every 5 minutes.
Once my driver had sufficient memory, the deletes were likely happening
correctly, but as a user, I was confused by the lack of log output. The delete
orphan files operation also differs from other maintenance operations: its
deletes don't show up in the Spark UI as a job or stage.
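   As a user-side stopgap, one could wrap the per-file delete function and log a running count on a timer. The sketch below is a hypothetical helper (`DeleteProgressReporter` is not part of Iceberg). It uses only JDK concurrency classes: an `AtomicLong` counter that is safe to bump from many delete threads, and a `ScheduledExecutorService` that logs the total every `intervalSeconds`.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper, not part of Iceberg: wraps a per-file delete function,
// counts deletes that complete without throwing, and logs the running total
// at a fixed interval.
final class DeleteProgressReporter(
    deleteFn: String => Unit,
    intervalSeconds: Long,
    log: String => Unit = (msg: String) => println(msg)
) extends AutoCloseable {
  private val deleted = new AtomicLong(0L)
  private val startNanos = System.nanoTime()

  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "orphan-delete-progress")
        t.setDaemon(true) // don't keep the JVM alive if close() is never called
        t
      }
    })

  // First report fires after one interval, then once per interval
  scheduler.scheduleAtFixedRate(
    new Runnable { def run(): Unit = report() },
    intervalSeconds, intervalSeconds, TimeUnit.SECONDS)

  /** Thread-safe: may be called concurrently from many delete threads. */
  def delete(path: String): Unit = {
    deleteFn(path)            // if this throws, the file is not counted
    deleted.incrementAndGet()
  }

  def deletedCount: Long = deleted.get()

  private def report(): Unit = {
    val elapsedSec = TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - startNanos)
    log(s"Deleted ${deleted.get()} orphan files so far (${elapsedSec}s elapsed)")
  }

  // Stop the periodic logger and emit one final total
  override def close(): Unit = {
    scheduler.shutdownNow()
    report()
  }
}
```

To wire it in, assuming the Iceberg version in use exposes `DeleteOrphanFiles.deleteWith`, one could build `new DeleteProgressReporter(path => icebergTable.io().deleteFile(path), 300)`, pass `path => reporter.delete(path)` to `.deleteWith(...)` alongside `.executeDeleteWith(executorService)`, and call `reporter.close()` in a `finally` block. One trade-off: supplying a custom delete function may bypass any bulk-delete path the action would otherwise use.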
   
   Any thoughts on adding some periodic log outputs? @RussellSpitzer 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

