cbalci opened a new pull request #6346:
URL: https://github.com/apache/incubator-pinot/pull/6346


   ## Description
   Adds `DimensionTableDataManager` to manage data access for 'Dimension 
Tables'. It will be used by the upcoming `LookupTransformUDF`, as outlined in 
the [Lookup UDF Join In 
Pinot](https://docs.google.com/document/d/1InWmxbRqwcqIakzvoEWHLxtX4XR9H5L01256EbAUHV8/edit) 
document.
   
   This is a follow-up to the PR [Adding offline dimension table creation and 
segment assignment](https://github.com/apache/incubator-pinot/pull/6286/files) 
and will be followed soon by a third PR that adds the 
`LookupTransformFunction`. For a full end-to-end proof-of-concept 
implementation, please take a look 
[here](https://github.com/cbalci/incubator-pinot/pull/1).
   
   `DimensionTableDataManager` is implemented as an extension of 
`OfflineTableDataManager`, since Dimension tables are modeled as Offline tables 
with a couple of additional features. `DimensionTableDataManager` has a private 
constructor, and its instances are per-table singletons that are created and 
accessed via the static methods `createInstanceByTableName` and 
`getInstanceByTableName`. This lets UDFs access Dimension tables without 
changing the `TransformFunction` interface, which would have been very 
intrusive.
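   
   The per-table singleton pattern described above can be sketched roughly as 
follows. This is a minimal illustration, not the code in this PR: the class 
name `DimensionTableRegistrySketch` is hypothetical, it does not extend 
`OfflineTableDataManager`, and backing the registry with a 
`ConcurrentHashMap` is an assumption made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for DimensionTableDataManager; the real class
// extends OfflineTableDataManager, which is omitted here.
final class DimensionTableRegistrySketch {
    // One instance per table name (assumption: a concurrent map is used
    // so that concurrent callers see a single instance per table).
    private static final Map<String, DimensionTableRegistrySketch> INSTANCES =
        new ConcurrentHashMap<>();

    private final String tableName;

    // Private constructor: instances are only reachable via the static methods.
    private DimensionTableRegistrySketch(String tableName) {
        this.tableName = tableName;
    }

    // Creates the instance for a table, or returns the existing one.
    static DimensionTableRegistrySketch createInstanceByTableName(String tableName) {
        return INSTANCES.computeIfAbsent(tableName, DimensionTableRegistrySketch::new);
    }

    // Returns the instance for a table, or null if none was created.
    static DimensionTableRegistrySketch getInstanceByTableName(String tableName) {
        return INSTANCES.get(tableName);
    }

    String getTableName() {
        return tableName;
    }
}
```

   With this shape, a UDF only needs the table name to reach the data, for 
example `DimensionTableRegistrySketch.getInstanceByTableName(name)`, so nothing 
has to be threaded through the `TransformFunction` interface.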
   
   In this implementation, `DimensionTableDataManager` simply loads the 
contents of a Dimension table into a HashMap in the `addSegment` hook. Entries 
are keyed by the table's PrimaryKey and are available for querying via the 
`lookupRowByPrimaryKey` method.
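   
   As a rough sketch of the load-and-lookup flow above: the example below 
models rows as plain `Map<String, Object>` values and keys as lists of 
primary-key column values. Both are assumptions for illustration, as are the 
class name `DimensionLookupSketch` and the method signatures; the actual 
implementation works with Pinot's own segment, row and key types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the in-memory lookup table.
final class DimensionLookupSketch {
    private final List<String> primaryKeyColumns;
    // Key: primary-key column values in order; value: the full row.
    private final Map<List<Object>, Map<String, Object>> lookupTable = new HashMap<>();

    DimensionLookupSketch(List<String> primaryKeyColumns) {
        this.primaryKeyColumns = new ArrayList<>(primaryKeyColumns);
    }

    // Stand-in for the addSegment hook: indexes every row of a segment.
    // A later row with the same key overwrites the earlier one.
    void addSegment(List<Map<String, Object>> segmentRows) {
        for (Map<String, Object> row : segmentRows) {
            List<Object> key = new ArrayList<>();
            for (String column : primaryKeyColumns) {
                key.add(row.get(column));
            }
            lookupTable.put(key, row);
        }
    }

    // Returns the row for the given key values, or null if there is none.
    Map<String, Object> lookupRowByPrimaryKey(Object... keyValues) {
        return lookupTable.get(Arrays.asList(keyValues));
    }
}
```

   Because every row is held in memory, the footprint grows with the table, 
which is why the unbounded table size is listed as a known shortcoming.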
   
   A couple of known shortcomings:
   - The size of the table is not bounded: this will be addressed in separate 
work by enforcing a quota config in the table creation flow.
   - Schema changes require a server restart: this can be addressed in the next 
iteration of the feature, once the basic feature is finalized.
   
   ## Testing
   
   - Unit tests are added.
   - A `JoinQuickStart` is added for local manual testing.
   - A manual end-to-end test was done in the [POC 
implementation](https://github.com/cbalci/incubator-pinot/pull/1) by loading 
more than 1MM items into a trivial dim table.
   
   Please take a look
   
   ## Documentation
   * Will be added with the next PR, 'LookupTransformUDF'.
   




