richardstartin edited a comment on pull request #7820: URL: https://github.com/apache/pinot/pull/7820#issuecomment-982140750
> Conceptually, we are moving from a Column to Column + path but modelling the path as an evaluator. But the query layer is deciding the evaluator instead of the storage layer.
>
> The evaluator seems to be set at the JVM level but in practice, it's possible for each table to have a different evaluator and in fact, it's possible that each segment can have a different evaluator.

The query layer does not decide; the query layer just _produces_ the evaluator from the query. It does not know which evaluator implementation it is producing, just whatever has been registered, and it won't push the evaluator down unless the identifier is a column identifier rather than an expression.

The evaluator, having captured information from the query, is pushed down to the storage layer. The storage layer then presents storage layer capabilities (i.e. dictionary + forward index) to the evaluator, which can interrogate them and choose what to do with them.

Think of the evaluator as a _mediator_ between the query and storage layers, for which any implementation can be registered via SPI. It encapsulates the logic of the function, and needs to know all relevant storage capabilities present in the deployment. The right evaluator to register for a given Pinot _deployment_ is the one that matches all storage capabilities in that deployment.

The evaluator must decide how to evaluate its function based on what storage capabilities it is presented with in evaluateBlock. If evaluateBlock is presented with, say, a BytesDictionary, it must be able to handle extracting JSON encoded as bytes from the dictionary and evaluating a JSON path against it, just as it must be able to handle reading JSON from a raw forward index, or from whatever ForwardIndexReader of whatever storage plugin it is presented with.

To illustrate the evaluator's responsibility to understand the storage, the DefaultJsonPathEvaluator in this PR knows how to work with all storage capabilities for storing JSON currently present in Pinot (dictionarised or raw, string or bytes, with or without a JSON column type), whereas adding a storage plugin requires creating a new evaluator which understands the custom storage, delegates to a default evaluator for everything else, and is registered via the SPI.

This has the ergonomic benefit of being able to implement storage plugins which deviate from the ForwardIndexReader interface. Completely new access patterns and the necessary interfaces can be implemented without changing the ForwardIndexReader interface:

1. Implement a new ForwardIndexReader with a previously inconceivable API.
2. Implement an evaluator which uses the new API for instances of that ForwardIndexReader, and otherwise delegates to the default implementation (see the sketch after this list).
3. Register the evaluator.
4. (Optional) Later enhance the ForwardIndexReader interface with methods supporting a tried and proven access pattern, and explain to everybody why it's a good idea, with data to refer to.

This circumvents the need to form design consensus before having a proven storage implementation, and should reduce the changes to the ForwardIndexReader interface to those which have already been **proven to make sense**, while never touching the query layer once the relevant TransformFunction has been modified to push an evaluator down.
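To make steps 1–3 concrete, here is a minimal sketch of the delegation pattern. The interfaces below are simplified stand-ins, not the actual signatures from this PR: `MyCustomForwardIndexReader`, `extractJson`, and the `evaluateBlock` parameters are illustrative assumptions.

```java
/** Simplified stand-in for the evaluator SPI the query layer pushes down (not the real API). */
interface JsonPathEvaluator {
  void evaluateBlock(Object forwardIndexReader, int[] docIds, int numDocs, String[] valuesOut);
}

/** Step 1: a hypothetical custom reader exposing a previously inconceivable access pattern. */
interface MyCustomForwardIndexReader {
  String extractJson(int docId, String jsonPath);
}

/** Step 2: an evaluator that understands the custom storage and delegates everything else. */
final class MyCustomJsonPathEvaluator implements JsonPathEvaluator {
  private final String _jsonPath;            // captured from the query
  private final JsonPathEvaluator _delegate; // e.g. the default evaluator

  MyCustomJsonPathEvaluator(String jsonPath, JsonPathEvaluator delegate) {
    _jsonPath = jsonPath;
    _delegate = delegate;
  }

  @Override
  public void evaluateBlock(Object reader, int[] docIds, int numDocs, String[] valuesOut) {
    if (reader instanceof MyCustomForwardIndexReader) {
      // Storage capability this evaluator knows about: use the new API directly.
      MyCustomForwardIndexReader custom = (MyCustomForwardIndexReader) reader;
      for (int i = 0; i < numDocs; i++) {
        valuesOut[i] = custom.extractJson(docIds[i], _jsonPath);
      }
    } else {
      // Anything else (dictionarised or raw, string or bytes, other plugins): delegate.
      _delegate.evaluateBlock(reader, docIds, numDocs, valuesOut);
    }
  }
}
```

Step 3 would then register `MyCustomJsonPathEvaluator` via the SPI, so the query layer keeps producing "whatever has been registered" without ever knowing the concrete type.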