richardstartin opened a new pull request, #8603:
URL: https://github.com/apache/pinot/pull/8603

   Helix executes the following logic every time a `RuntimeJobDag` is created:
   
   ```java
     public void generateJobList() {
       resetJobListAndDependencyMaps();
       computeIndependentNodes();
       _readyJobList.addAll(_independentNodes);
       if (_isJobQueue && _readyJobList.size() > 0) {
         // For job queue, only get number of parallel jobs to run in the ready list.
         for (int i = 1; i < _numParallelJobs; i++) {
           if (_parentsToChildren.containsKey(_readyJobList.peekLast())) {
             _readyJobList.offer(_parentsToChildren.get(_readyJobList.peekLast()).iterator().next());
           }
         }
       }
       _hasDagChanged = false;
     }
   ```
   
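   When `_numParallelJobs` is `Integer.MAX_VALUE`, the loop above has no early exit, so for a job queue with at least one ready job it runs close to `Integer.MAX_VALUE` iterations no matter how few jobs exist, and the method takes a very long time to execute. Most of our integration tests spend their time in this method because, when no `WorkflowConfig` is configured, we default to `Integer.MAX_VALUE`.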
   <img width="746" alt="Screenshot 2022-04-27 at 11 56 41" src="https://user-images.githubusercontent.com/16439049/165503460-518899fc-7b0b-48de-903e-203e1af9459d.png">
   Instrumenting Helix to time `RuntimeJobDag` construction shows it can take up to 20s, even with very small numbers of jobs to execute!
   <img width="1141" alt="Screenshot 2022-04-27 at 12 00 01" src="https://user-images.githubusercontent.com/16439049/165504005-7189a737-f4d2-4aeb-9144-6054cb0eebe3.png">
   
   Reverting to the default parallelism halves the time taken to execute the integration test and removes `RuntimeJobDag` construction from the profile.
   <img width="819" alt="Screenshot 2022-04-27 at 12 01 52" src="https://user-images.githubusercontent.com/16439049/165504307-95afdd9f-0ef5-468f-89de-600174f7dd01.png">
   Traced construction times are reduced to negligible timespans.
   <img width="1008" alt="Screenshot 2022-04-27 at 12 02 49" src="https://user-images.githubusercontent.com/16439049/165504490-dc1e849f-c3d2-4e1b-8ade-f9e5ddb44ebb.png">
   
   However, the default prevents any parallelism, so the underlying problem needs to be fixed in Helix.
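
   For illustration only, here is a minimal sketch of what a Helix-side fix might look like. It is not Helix code: the class `ReadyJobListSketch`, the method `readyJobs`, and the use of `String` job names are all hypothetical stand-ins for the fields in the snippet above. The only idea it shows is an early exit: once the last job in the ready list has no children, further iterations cannot change the list, so the loop can stop instead of counting up to `numParallelJobs`.

   ```java
   import java.util.ArrayDeque;
   import java.util.Deque;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;

   // Hypothetical stand-in for the job-queue branch of generateJobList(); not Helix code.
   public class ReadyJobListSketch {

     // Builds the ready list for a job queue, taking at most numParallelJobs jobs.
     static Deque<String> readyJobs(Map<String, Set<String>> parentsToChildren,
                                    List<String> independentNodes,
                                    int numParallelJobs) {
       Deque<String> ready = new ArrayDeque<>(independentNodes);
       if (ready.isEmpty()) {
         return ready;
       }
       for (int i = 1; i < numParallelJobs; i++) {
         Set<String> children = parentsToChildren.get(ready.peekLast());
         if (children == null || children.isEmpty()) {
           // End of the chain: no later iteration can add a job, so stop here
           // rather than spinning through the remaining iterations.
           break;
         }
         ready.offer(children.iterator().next());
       }
       return ready;
     }

     public static void main(String[] args) {
       // A three-job chain: job1 -> job2 -> job3.
       Map<String, Set<String>> chain = new HashMap<>();
       chain.put("job1", Set.of("job2"));
       chain.put("job2", Set.of("job3"));

       // Returns promptly even with Integer.MAX_VALUE parallelism.
       System.out.println(readyJobs(chain, List.of("job1"), Integer.MAX_VALUE));
       // A finite limit still caps the ready list.
       System.out.println(readyJobs(chain, List.of("job1"), 2));
     }
   }
   ```

   With this shape, the cost of building the ready list is bounded by the length of the job chain rather than by the configured parallelism, while a finite `numParallelJobs` still limits the list as before.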


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

