siddharthteotia commented on a change in pull request #6886: URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r627771754
########## File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java ########## @@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() { DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable(); mainThreadTimer.stop(); - long totalThreadCpuTimeNs = - intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs(); + long endWallClockTimeNs = System.nanoTime(); + + long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs(); + long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs(); + long totalThreadCpuTimeNs = singleThreadCpuTimeNs + multipleThreadCpuTimeNs; + long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs; + /* + * It's possible that totalThreadCpuTimeNs < totalWallClockTimeNs even if server launch multiple threads to process + * a query, this is because system activities time such as OS paging, GC, context switching are not captured by + * totalThreadCpuTimeNs. For example, let's divide query processing into 4 phases. + * - phase 1: single thread preparing. Time used: T1 + * - phase 2: N threads processing segments in parallel, each thread use time T2 + * - phase 3: GC/OS paging. Time used: T3 + * - phase 4: single thread merging intermediate results blocks. Time used: T4 + * + * Then we have following equations: + * - singleThreadCpuTimeNs = T1 + T4 + * - multipleThreadCpuTimeNs = T2 * N + * - totalWallClockTimeNs = T1 + T2 + T3 + T4 = singleThreadCpuTimeNs + T2 + T3 + * - totalThreadCpuTimeNsWithoutSystemActivities = T1 + T2 * N + T4 = singleThreadCpuTimeNs + T2 * N + * - systemActivitiesTimeNs = T3 = (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1) + * + * Thus: + * totalThreadCpuTimeNsWithSystemActivities = totalThreadCpuTimeNsWithoutSystemActivities + systemActivitiesTimeNs + * = totalThreadCpuTimeNsWithoutSystemActivities + T3 + * = totalThreadCpuTimeNsWithoutSystemActivities + (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1) + * = totalWallClockTimeNs + T2 * (N - 1) + * = totalWallClockTimeNs + (multipleThreadCpuTimeNs / N) * (N - 1) + */ + if (totalThreadCpuTimeNs < totalWallClockTimeNs) { Review comment: System activity (GC, OS paging , lock wait etc) that consumes CPU is always there. Based on the verification that we did in production, for the case when totalThreadCpuTime << wallClock, there is more of it. However, even if totalThreadCpuTime > wallClock which we saw in many cases, there is still some cpu activity unaccounted for and we should try to include it. So, let's do this adjustment in both the cases. Let's not have this if check -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org