GitHub user jiaqizho edited a discussion: [Proposal] ORCA: remove the CJobQueue

### Proposers

jiaqizho

### Proposal Status

Under Discussion

### Abstract

About CJobQueue
----
**As I understand it, `CJobQueue` exists to prevent the same `CJob` from being executed more than once.**

The description in `CJob.h` (line 51):
```
//              Job Queue:
//              Each job maintains a job queue CJob::m_pjq of other identical jobs that
//              are created while a given job is executing. For example, when exploring
//              a group, a group exploration job J1 would be executing. Concurrently,
//              another group exploration job J2 (for the same group) may be triggered
//              by another worker. The job J2 would be added in a pending state to the
//              job queue of J1. When J1 terminates, all jobs in its queue are notified
//              to pick up J1 results.
```

Let me also explain the logic of `CJobQueue` in detail.

Before we start, let me introduce the two kinds of `CJob` in a `CJobQueue`:
- `Queued CJob`: its position in the `CJobQueue` is >= 1
- `Main CJob`: its position in the `CJobQueue` is 0

If the `CJob` being executed is a `Queued CJob`:
- Only a `CJob` of the same type (Exploration/Implementation) in the same `CGroup` is added to the `CJobQueue`.
- Whatever the result of `CJob->FExecute()` is, `CScheduler::FExecute` always returns false for a `Queued CJob`.
- After `CScheduler::FExecute`, the `Queued CJob` moves to the `EjrSuspended` state until the `Main CJob` releases it.
- Once the `Main CJob` calls `NotifyCompleted`, the `Queued CJob` is popped and resumes its parent. The parent can still advance its state machine, because the `Main CJob` has already done the same work in the `CGroup`.

If the `CJob` being executed is the `Main CJob`, it runs until its state machine finishes and then calls the `NotifyCompleted` logic.
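
The main/queued behaviour described above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyJob`, `ToyState`, and `ToyJobQueue` are hypothetical names, not ORCA classes, and the real `CJobQueue` also deals with parent jobs and scheduler state that are left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified job states (not ORCA's real enum).
enum class ToyState { Running, Suspended, Completed };

// Hypothetical stand-in for a CJob.
struct ToyJob {
    explicit ToyJob(std::string n) : name(std::move(n)) {}
    std::string name;
    ToyState state = ToyState::Running;
};

// Hypothetical stand-in for a CJobQueue holding identical jobs.
class ToyJobQueue {
public:
    // Returns true if `job` became the main job (position 0).
    // Returns false if it was queued behind the main job and suspended.
    bool Add(ToyJob* job) {
        jobs_.push_back(job);
        if (jobs_.size() == 1) {
            return true;
        }
        job->state = ToyState::Suspended;
        return false;
    }

    // Called when the main job finishes: completes the main job, then
    // pops every queued job and hands it back so its parent can resume.
    std::vector<ToyJob*> NotifyCompleted() {
        std::vector<ToyJob*> released;
        if (jobs_.empty()) {
            return released;
        }
        jobs_.front()->state = ToyState::Completed;
        jobs_.pop_front();
        while (!jobs_.empty()) {
            ToyJob* queued = jobs_.front();
            jobs_.pop_front();
            // The main job already did the work, so the queued job is done too.
            queued->state = ToyState::Completed;
            released.push_back(queued);
        }
        return released;
    }

    std::size_t Size() const { return jobs_.size(); }

private:
    std::deque<ToyJob*> jobs_;
};
```

With two jobs J1 and J2 added in that order, `Add` returns true for J1 and false for J2, and J2 stays suspended until `NotifyCompleted` hands it back and leaves the queue empty.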

Git history
----
I looked through the git history of `CJobQueue`. It was added in the first version of ORCA (gpdb commit 76feb99efdc92bf2a779d58c656d8894908223ab). Early versions of ORCA supported concurrent execution, until the commit "Remove multi-threading code (#510)" (61c7405ac737ce74804d57d8cd6c930219e8b124). Before that PR, `CScheduler` could encounter a `Queued CJob`, because it used multiple threads to process the waiting list, so several jobs could be inserted into the same `CJobQueue`.

Proposal 
----

In the current version of CBDB, **ORCA runs in single-threaded mode**. So the size of each `CJobQueue` is always 1, and `CScheduler` never gets a `Queued CJob`. I have verified this point in a PR (https://github.com/apache/cloudberry/pull/742). Although the ORCA ICW tests are not enabled in GitHub CI, I have verified on my own machine that the ICW-orca test does not trigger these `assert(false)` checks.
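
One way to picture the check is a queue that records the largest length it ever reached. The sketch below is hypothetical and is not the code in the linked PR: `TrackedQueue` and `RunSequentially` are invented names, and the driver assumes each job finishes before an identical one is created, which is a simplification of how a single-threaded scheduler behaves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical instrumentation wrapper (not ORCA code): remembers the
// largest number of jobs that were ever in the queue at the same time.
class TrackedQueue {
public:
    void Push(int job_id) {
        jobs_.push_back(job_id);
        max_len_ = std::max(max_len_, jobs_.size());
    }

    void PopFront() {
        if (!jobs_.empty()) {
            jobs_.pop_front();
        }
    }

    std::size_t MaxLen() const { return max_len_; }

private:
    std::deque<int> jobs_;
    std::size_t max_len_ = 0;
};

// Assumed run-to-completion driver: every job becomes the main job
// (position 0) and is popped before the next identical job is created.
std::size_t RunSequentially(int job_count) {
    TrackedQueue q;
    for (int id = 0; id < job_count; ++id) {
        q.Push(id);
        q.PopFront();
    }
    return q.MaxLen();
}
```

Under that assumption the recorded maximum never exceeds 1, whereas two overlapping `Push` calls, as a second worker thread could produce, raise it to 2.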

There are two reasons to remove `CJobQueue`:
- `CJobQueue` is dead logic in the current version of ORCA.
- GP has shown that parallelization inside `CScheduler` itself is not profitable. Maybe we should look at other ways to speed up ORCA.

Impact on cherry-pick GP
----

This code has not been changed in subsequent GP versions. In the screenshot below, the left side is CBDB (46 search results) and the right side is GPDB (46 search results).

<img width="698" alt="image" src="https://github.com/user-attachments/assets/ba75b6ee-9dfc-4d62-bf43-3639084465c1">




### Motivation

pass

### Implementation

pass

### Rollout/Adoption Plan

_No response_

### Are you willing to submit a PR?

- [X] Yes I am willing to submit a PR!

GitHub link: https://github.com/apache/cloudberry/discussions/743

----
This is an automatically sent email for dev@cloudberry.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@cloudberry.apache.org


