richardstartin edited a comment on pull request #7820: URL: https://github.com/apache/pinot/pull/7820#issuecomment-982140750
> Conceptually, we are moving from a Column to Column + path but modelling the path as an evaluator. But the query layer is deciding the evaluator instead of the storage layer.
>
> The evaluator seems to be set at the JVM level but in practice, it's possible for each table to have a different evaluator and in fact, it's possible that each segment can have a different evaluator.

The query layer does not decide; the query layer just _produces_ the evaluator from the query. It does not know which evaluator implementation it is producing, just whatever has been registered, and it won't push the evaluator down unless the identifier is a column identifier rather than an expression.

The evaluator, having captured information from the query, is pushed down to the storage layer. The storage layer then presents storage layer capabilities (i.e. dictionary + forward index) to the evaluator, which can interrogate them and choose what to do with them.

Think of the evaluator as a _mediator_ between the query and storage layers, for which any implementation can be registered via SPI. It encapsulates the logic of the function, and needs to know all relevant storage capabilities present in the deployment. The right evaluator to register for a given Pinot _deployment_ is the one that matches all storage capabilities in that deployment.

The evaluator must decide how to evaluate its function based on what storage capabilities it is presented with in evaluateBlock. If evaluateBlock is presented with, say, a BytesDictionary, it must be able to handle extracting JSON encoded as bytes from the dictionary and evaluating a JSON path against it, just as it must be able to handle reading JSON from a raw forward index, or from whatever ForwardIndexReader of whatever storage plugin it is presented with.

To illustrate the evaluator's responsibility to understand the storage, the DefaultJsonPathEvaluator in this PR knows how to work with all storage capabilities for storing JSON currently present in Pinot (dictionarised or raw, string or bytes, with or without a JSON column type), whereas adding a storage plugin requires creating a new evaluator which understands the custom storage, delegates to a default evaluator for everything else, and is registered via the SPI.

This has the ergonomic benefit of being able to implement storage plugins which deviate from the ForwardIndexReader interface. Completely new access patterns and the necessary interfaces can be implemented without changing the ForwardIndexReader interface:

1. Implement a new ForwardIndexReader with a previously inconceivable API.
2. Implement an evaluator which uses the new API for instances of that ForwardIndexReader, and otherwise delegates to the default implementation (see the sketch after this list).
3. Register the evaluator.
4. (Optional) Later enhance the ForwardIndexReader interface with methods supporting a tried and proven access pattern, and explain to everybody why it's a good idea, with data to refer to.

This circumvents the need to form design consensus before having a proven storage implementation, and should reduce the changes to the ForwardIndexReader interface to those which have already been **proven to make sense**, while never touching the query layer once the relevant TransformFunction has been modified to push an evaluator down.
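To make steps 1–3 concrete, here is a minimal sketch of the delegation pattern. The interfaces below are simplified stand-ins, not the actual signatures from this PR: `MyCustomForwardIndexReader`, `extractJson`, and the `evaluateBlock` parameters are illustrative assumptions.

```java
/** Simplified stand-in for the evaluator SPI the query layer pushes down (not the real API). */
interface JsonPathEvaluator {
  void evaluateBlock(Object forwardIndexReader, int[] docIds, int numDocs, String[] valuesOut);
}

/** Step 1: a hypothetical custom reader exposing a previously inconceivable access pattern. */
interface MyCustomForwardIndexReader {
  String extractJson(int docId, String jsonPath);
}

/** Step 2: an evaluator that understands the custom storage and delegates everything else. */
final class MyCustomJsonPathEvaluator implements JsonPathEvaluator {
  private final String _jsonPath;            // captured from the query
  private final JsonPathEvaluator _delegate; // e.g. the default evaluator

  MyCustomJsonPathEvaluator(String jsonPath, JsonPathEvaluator delegate) {
    _jsonPath = jsonPath;
    _delegate = delegate;
  }

  @Override
  public void evaluateBlock(Object reader, int[] docIds, int numDocs, String[] valuesOut) {
    if (reader instanceof MyCustomForwardIndexReader) {
      // Storage capability this evaluator knows about: use the new API directly.
      MyCustomForwardIndexReader custom = (MyCustomForwardIndexReader) reader;
      for (int i = 0; i < numDocs; i++) {
        valuesOut[i] = custom.extractJson(docIds[i], _jsonPath);
      }
    } else {
      // Anything else (dictionarised or raw, string or bytes, other plugins): delegate.
      _delegate.evaluateBlock(reader, docIds, numDocs, valuesOut);
    }
  }
}
```

Step 3 would then register `MyCustomJsonPathEvaluator` via the SPI, so the query layer keeps producing "whatever has been registered" without ever knowing the concrete type.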