Ryan19929 opened a new pull request, #61710:
URL: https://github.com/apache/doris/pull/61710

   ### What problem does this PR solve?
   
   Issue Number: close #xxx
   
   Related PR: #xxx
   
   Problem Summary:
   Doris currently only allows one backup/restore job per database at a time, 
which becomes a bottleneck in CCR scenarios where dozens of tables need 
concurrent synchronization.
   
   This PR implements table-level backup/restore concurrency control (gated by 
`enable_table_level_backup_concurrency`, default `false`):
   
   1. **Dual-queue scheduling** — Splits the single job queue into Running 
queue (active jobs) + History queue (finished jobs). New jobs enter PENDING 
state; the scheduler promotes PENDING jobs to `allowedJobIds` for execution 
according to the scheduling rules below. Jobs that stay PENDING longer than 
`backup_pending_job_timeout_ms` are cancelled automatically.
   
   2. **OOM protection** — Extends `max_backup_tablets_per_job` to RestoreJob; 
adds `max_concurrent_snapshot_tasks_total` as a global cap on snapshot tasks 
across all concurrent jobs.
   
   3. **CANCEL label filter** — Supports `CANCEL BACKUP/RESTORE WHERE LABEL = 
'xxx'` and `WHERE LABEL LIKE 'xxx%'` to cancel specific jobs. Without a WHERE 
clause, it cancels all unfinished jobs of that type in the database (a behavior 
change in concurrency mode only; legacy mode is unchanged).
   
   4. **Observability** — Adds `QueuePos` and `BlockReason` columns to `SHOW 
BACKUP/RESTORE`.
   
   5. **CCR compatibility** — Derives `tableRefs` from `BackupJobInfo` when RPC 
requests omit `table_refs`, ensuring correct table-level concurrency control.
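
The dual-queue flow from item 1 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class and method names (`DualQueueSketch`, `submit`, `activate`, `finish`, `expirePending`) are hypothetical, and only the PENDING state, the running/history split, and the pending timeout are taken from the description above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a running queue for unfinished jobs and a history queue for finished ones.
class DualQueueSketch {
    enum State { PENDING, RUNNING, FINISHED, CANCELLED }

    static final class Job {
        final long id;
        final long submitMs;
        State state = State.PENDING; // new jobs always start as PENDING

        Job(long id, long submitMs) {
            this.id = id;
            this.submitMs = submitMs;
        }
    }

    // Unfinished jobs (PENDING or RUNNING), kept in submission order.
    private final Map<Long, Job> running = new LinkedHashMap<>();
    // Finished or cancelled jobs.
    private final Deque<Job> history = new ArrayDeque<>();
    // Stands in for backup_pending_job_timeout_ms.
    private final long pendingTimeoutMs;

    DualQueueSketch(long pendingTimeoutMs) {
        this.pendingTimeoutMs = pendingTimeoutMs;
    }

    void submit(long id, long nowMs) {
        running.put(id, new Job(id, nowMs));
    }

    // Promotes a PENDING job; the real scheduler would first check its rules.
    void activate(long id) {
        Job job = running.get(id);
        if (job != null && job.state == State.PENDING) {
            job.state = State.RUNNING;
        }
    }

    // Moves a job from the running queue to the history queue.
    void finish(long id) {
        Job job = running.remove(id);
        if (job != null) {
            job.state = State.FINISHED;
            history.addLast(job);
        }
    }

    // Cancels PENDING jobs that waited longer than the timeout; returns their ids.
    List<Long> expirePending(long nowMs) {
        List<Long> cancelled = new ArrayList<>();
        Iterator<Job> it = running.values().iterator();
        while (it.hasNext()) {
            Job job = it.next();
            if (job.state == State.PENDING && nowMs - job.submitMs > pendingTimeoutMs) {
                job.state = State.CANCELLED;
                it.remove();
                history.addLast(job);
                cancelled.add(job.id);
            }
        }
        return cancelled;
    }

    int runningSize() { return running.size(); }

    int historySize() { return history.size(); }
}
```

In this sketch a job that is already RUNNING is never expired; the timeout applies only to time spent waiting in the queue.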
   
   **Scheduling Rules:**
   
   | Rule | Condition | Behavior |
   |------|-----------|----------|
   | Concurrency limit | `runningBackups + runningRestores >= max_backup_restore_concurrent_num_per_db` | Block (PENDING) |
   | Backup/Restore mutual exclusion | Backup cannot activate while any restore is running, and vice versa | Block (PENDING) |
   | Full-db backup exclusivity | Full-db backup already pending/running → new backup submitted | **Reject** |
   | Full-db backup waiting | Full-db backup submitted while table-level backups are running | Block (PENDING) |
   | Full-db restore exclusivity | Full-db restore running → other restores submitted | Block (PENDING) |
   | Restore table conflict | Two restores targeting the same table | Block (PENDING) |
   
   **Design Note:** Full-database backup submissions are hard-rejected when one 
already exists, because the backup snapshot is reusable — any subsequent backup 
would be redundant. Full-database restore submissions are queued (PENDING) 
instead, because each restore may target a different snapshot and has 
independent business value — they just cannot run concurrently.
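
As a rough illustration of how the scheduling rules and the reject-versus-queue distinction fit together, here is a minimal Java sketch. It is not the code in this PR. All names (`SchedulingRulesSketch`, `Job`, `decide`) are hypothetical, an empty table set is assumed to mean a full-database job, the table-conflict check is assumed to compare only against running restores, and the order in which the rules are evaluated is also an assumption.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the rule table; only the rules themselves come from the PR description.
class SchedulingRulesSketch {
    enum Kind { BACKUP, RESTORE }

    enum Decision { ALLOW, BLOCK, REJECT }

    static final class Job {
        final Kind kind;
        final Set<String> tables; // assumption: empty set means a full-database job
        final boolean running;    // false means the job is still PENDING

        Job(Kind kind, Set<String> tables, boolean running) {
            this.kind = kind;
            this.tables = tables;
            this.running = running;
        }

        boolean isFullDb() {
            return tables.isEmpty();
        }
    }

    static Decision decide(Job req, List<Job> existing, int maxConcurrent) {
        int runningTotal = 0;
        boolean backupRunning = false;
        boolean restoreRunning = false;
        boolean fullDbRestoreRunning = false;
        boolean restoreTableConflict = false;

        for (Job j : existing) {
            // Full-db backup exclusivity: a pending or running full-db backup rejects new backups.
            if (req.kind == Kind.BACKUP && j.kind == Kind.BACKUP && j.isFullDb()) {
                return Decision.REJECT;
            }
            if (!j.running) {
                continue; // the remaining rules only look at running jobs
            }
            runningTotal++;
            if (j.kind == Kind.BACKUP) {
                backupRunning = true;
            } else {
                restoreRunning = true;
                if (j.isFullDb()) {
                    fullDbRestoreRunning = true;
                }
                if (req.kind == Kind.RESTORE && !Collections.disjoint(j.tables, req.tables)) {
                    restoreTableConflict = true;
                }
            }
        }

        // Concurrency limit across backups and restores in the database.
        if (runningTotal >= maxConcurrent) {
            return Decision.BLOCK;
        }
        if (req.kind == Kind.BACKUP) {
            // Mutual exclusion, then full-db backup waiting for table-level backups.
            if (restoreRunning || (req.isFullDb() && backupRunning)) {
                return Decision.BLOCK;
            }
        } else {
            // Mutual exclusion, full-db restore exclusivity, restore table conflict.
            if (backupRunning || fullDbRestoreRunning || restoreTableConflict) {
                return Decision.BLOCK;
            }
        }
        return Decision.ALLOW;
    }
}
```

`REJECT` is returned only for the full-db backup case; every other rule yields `BLOCK`, which corresponds to the job staying PENDING until a later scheduling pass.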
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [X] Regression test
       - [X] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [X] Yes. <!-- Explain the behavior change -->
   Concurrency mode only: `CANCEL BACKUP/RESTORE` without a WHERE clause cancels 
all unfinished jobs of that type in the database, instead of only the single 
running job. This applies only when `enable_table_level_backup_concurrency = 
true`; legacy mode behavior is unchanged.
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

