danielhumanmod opened a new pull request, #2003:
URL: https://github.com/apache/polaris/pull/2003

   Fixes #774
   # Context
   
   Polaris uses async tasks to perform operations such as table and manifest 
file cleanup. These tasks are executed asynchronously in a separate thread 
within the same JVM, and retries are handled inline within the task execution. 
However, this mechanism does not guarantee eventual execution in the following 
cases:
   - The task fails repeatedly and hits the maximum retry limit.
   - The service crashes or shuts down before retrying.
   
   # Implementation
   Persist failed tasks and introduce a retry mechanism that is triggered at 
Polaris startup and by periodic background checks. Changes include:
   1. **Metastore Layer:** 
      - Exposes a new API `getMetaStoreManagerMap`
      - Ensures `LAST_ATTEMPT_START_TIME` is set whenever a task entity is 
created. `loadTasks()` uses this value for time-out filtering when reading 
from the metastore, which prevents multiple executors from picking up the 
same task.
   2. **`TaskRecoveryManager`:** New class responsible for task recovery logic, 
including:
      - Constructing the execution `PolarisCallContext`
      - Loading tasks from metastore
      - Triggering task execution
   3. **`QuarkusTaskExecutorImpl`:** Hooks into the application lifecycle to 
initiate task recovery.
   4. **Task Retry Strategy:** Failed tasks remain persisted in the metastore 
and are retried by the recovery manager.
   5. **Tests:** Adjusted existing tests and added new coverage for recovery 
behavior.
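
As a rough illustration of the time-out filtering and recovery pass described above, here is a minimal sketch. It is not the code in this PR: `TaskRecoverySketch`, `Task`, `selectRecoverable`, `claim`, and `recoverOnce` are hypothetical names, the in-memory list stands in for the metastore, and the time-out value is chosen by the caller. It assumes every task carries a last-attempt start time, as the Metastore Layer change ensures.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative only: hypothetical names, not the Polaris API. */
final class TaskRecoverySketch {

  /** Hypothetical stand-in for a persisted task entity. */
  record Task(String id, Instant lastAttemptStartTime) {}

  /**
   * Returns the tasks whose last attempt started more than {@code timeout} before {@code now}.
   * Tasks with a more recent start time are assumed to be owned by another executor and are
   * skipped.
   */
  static List<Task> selectRecoverable(List<Task> persisted, Instant now, Duration timeout) {
    List<Task> recoverable = new ArrayList<>();
    for (Task task : persisted) {
      if (task.lastAttemptStartTime().plus(timeout).isBefore(now)) {
        recoverable.add(task);
      }
    }
    return recoverable;
  }

  /**
   * Marks a task as claimed by stamping a new attempt start time. In a real metastore this
   * update would have to be atomic; here it only returns an updated copy.
   */
  static Task claim(Task task, Instant now) {
    return new Task(task.id(), now);
  }

  /**
   * One recovery pass: select timed-out tasks, claim each one, and hand it to the handler.
   * Returns the claimed copies so the caller can persist them.
   */
  static List<Task> recoverOnce(
      List<Task> persisted, Instant now, Duration timeout, Consumer<Task> handler) {
    List<Task> claimed = new ArrayList<>();
    for (Task task : selectRecoverable(persisted, now, timeout)) {
      Task claimedTask = claim(task, now);
      claimed.add(claimedTask);
      handler.accept(claimedTask);
    }
    return claimed;
  }
}
```

A caller could run `recoverOnce` once at startup and then again on a fixed schedule. Because a claimed task gets a fresh start time, a second pass within the time-out window would not select it again.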
   
   # Recommended Review Order
   1. Metastore layer changes
   2. `TaskRecoveryManager`
   3. `QuarkusTaskExecutorImpl` and `TaskExecutorImpl`
   4. Task cleanup handlers
   5. Tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
