[PR] Query thread context and cid [pinot]

via GitHub Tue, 04 Mar 2025 02:30:37 -0800


gortiz opened a new pull request, #15180:
URL: https://github.com/apache/pinot/pull/15180


   # Thread context
   
   This PR adds two new classes: `QueryThreadContext` and 
`MseWorkerThreadContext`. As their name implies, these classes are designed to 
be context attached to the current thread. These classes include some 
information I consider useful, but we can discuss them if needed. 
   
   This opens the ability to change the programming model. We could remove some 
parameters or attributes from contextual objects. For example, the request id 
is already present in several contextual objects we usually send to other 
methods. With this new introduction, these contextual objects could be removed 
or simplified as long as we assume a thread will only be executing a single 
query, which seems a natural assumption. In this PR, I tried not to apply that 
change. Contextual objects are not modified. I did that mainly because I want 
to keep the code clean but also because it is difficult to be 100% sure that 
the thread local object is initialized, especially on tests.
   
   As explained in the Javadoc of both `QueryThreadContext` and 
`MseWorkerThreadContext`, these classes are a bit more complex than expected 
internally to create a nice and especially safe API that simplifies their use. 
These classes need to be open when the request is received, and then they may 
be copied into other threads as needed (for example, when the combine operator 
is parallelized for each segment). The cleanest way to do this is to decorate 
an executor to copy the thread local context from the parent. That is done in 
`QueryThreadContext.contextAwareExecutorService` (and the same method in 
`MseWorkerThreadContext`). Given that we do not always control the thread 
context being used, sometimes we need to explicitly add try-with-resources. 
This is the case in some of the _framework hooks_, for example, when Netty or 
GPRC calls our code.
   
   # CID
   
   Another change in the PR is to define a new _cid_, which could be understood 
as _client id_ or _correlation id_. The difference between this id and the 
request id is explained in javadoc, but the TL;DR: is:
   1. Cid can be optionally set by clients. If they don't, the broker assigned 
request id is used instead.
   2. Cid doesn't have to be unique. Clients can decide to use the same cid for 
different queries.
   3. Cid can be any string, while request ids are longs.
   4. Cid is constant for all query execution, while request id is usually set 
in the broker but changes during the query lifecycle (ie hybrid tables or MSE 
and leaves).
   
   This cid is sent from brokers to servers in the netty SSE protocol and 
worker MSE (and GRPC-based) protocol. We still don't have support for setting 
cid in all the different protocols being used.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[PR] Query thread context and cid [pinot]

Reply via email to