mcvsubbu commented on issue #10712:
URL: https://github.com/apache/pinot/issues/10712#issuecomment-1732365406

   Can you fill me in on what "to solve the hybrid table" means?
   
   My suggestion would be to NOT change the hybrid table definition. Instead, 
keep it the same. The logical table binding should happen _before_ we branch 
between realtime/offline. 
   
   So, the query comes into the broker, we look up whether a mapping to physical 
table(s) is defined, substitute the physical table name(s), and then do further 
query processing. Each of the underlying physical tables could be hybrid, 
realtime-only, or offline-only.
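
   To make the ordering concrete, here is a rough sketch of the substitution 
step. This is not existing Pinot code: `LogicalTableResolver` and `resolve` are 
hypothetical names, and the in-memory map stands in for wherever the mapping 
ends up being stored. The point is only that resolution happens first, and the 
realtime/offline branching then runs unchanged on each returned name.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pinot: resolves a queried table name to
// physical table names BEFORE any realtime/offline (hybrid) handling.
public class LogicalTableResolver {
    // logical table name -> physical table names (raw names, no type suffix)
    private final Map<String, List<String>> logicalToPhysical;

    public LogicalTableResolver(Map<String, List<String>> logicalToPhysical) {
        this.logicalToPhysical = logicalToPhysical;
    }

    // Returns the mapped physical tables, or the name itself when no mapping
    // is defined, so ordinary tables keep working as they do today.
    public List<String> resolve(String tableName) {
        List<String> physical = logicalToPhysical.get(tableName);
        if (physical == null || physical.isEmpty()) {
            return List.of(tableName);
        }
        return physical;
    }
}
```

   Each name returned by `resolve` would then go through the existing 
hybrid/realtime-only/offline-only handling.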
   
   In terms of allowed mapping, we should have something that enables mapping 
one logical table to one or more physical tables. If multiple physical tables 
are configured, then another config could say whether the code should pick 
_any_ or _all_ (@egalpin's requirement). Not sure if there is a need for 
specific additional (configurable) logic depending on which table is picked, 
but we can let that ride for now.
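
   A minimal sketch of what the _any_/_all_ switch could look like, assuming a 
hypothetical `Mode` enum and a random pick for _any_ (the actual selection 
policy is open, and none of these names exist in Pinot):

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch, not part of Pinot: chooses which physical tables a
// query should hit when a logical table maps to several of them.
public class PhysicalTableSelector {
    public enum Mode { ANY, ALL }

    public static List<String> select(List<String> physicalTables, Mode mode) {
        if (physicalTables.isEmpty()) {
            throw new IllegalArgumentException("logical table has no physical tables");
        }
        if (mode == Mode.ALL) {
            // Query every physical table behind the logical table.
            return List.copyOf(physicalTables);
        }
        // ANY: pick one table. Random choice is an assumption made here;
        // a real policy might prefer a specific table instead.
        int index = ThreadLocalRandom.current().nextInt(physicalTables.size());
        return List.of(physicalTables.get(index));
    }
}
```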
   
   +1 on brokers recognizing immediately when the mapping is changed. To 
that effect, maybe the mapping should be stored in zookeeper, away from 
TableConfig. Maybe it can be under PROPERTYSTORE/CONFIGS/CLUSTER? It is OK 
if the brokers do not set a watch (perhaps preferred that way). The mapping can 
be updated via a controller API, and the brokers informed by the controller.
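
   On the broker side, the push model could be as simple as swapping an 
immutable snapshot whenever the controller says the mapping changed. The sketch 
below is an illustration only: `BrokerMappingCache` and `onMappingChanged` are 
made-up names, and how the controller actually notifies brokers is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not part of Pinot: the broker keeps an immutable
// snapshot of the mapping and replaces it when informed by the controller,
// instead of setting its own ZooKeeper watch.
public class BrokerMappingCache {
    private final AtomicReference<Map<String, List<String>>> snapshot =
        new AtomicReference<>(Map.of());

    // Invoked when the controller informs the broker of a mapping update.
    public void onMappingChanged(Map<String, List<String>> latest) {
        // Copy so later changes to the caller's map cannot leak into queries.
        snapshot.set(Map.copyOf(latest));
    }

    // Query path: reads the current snapshot without locking.
    public List<String> physicalTablesFor(String logicalTable) {
        return snapshot.get().getOrDefault(logicalTable, List.of());
    }
}
```

   Queries that are already running keep the snapshot they started with; new 
queries see the updated mapping as soon as the swap happens.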
   
   Some other random thoughts:
   - The logical table should not be a Helix resource (intuitively). Let me 
know if there is a problem with this, and we can discuss further.
   - As a consequence, the logical table cannot have a 
`logicalTableName_REALTIME` physical table, ever.
   - How will table metrics be emitted? Ideally, all table-level metrics 
should be emitted under the logical table name. Code may become a bit messy in 
places (emit physical table, logical table, and global metrics).
   - Operational tools need to be examined: If a logical table maps to a 
different physical table, then some of the table APIs should be modified to 
reflect that there is a different physical table. Not sure how this will work 
if there is more than one physical table.
   - At least for a start, let us assume that all physical tables have the same 
schema. This can throw a wrench into having multiple copies of the same schema, 
since we insist now that schema name is the same as table name. Either the 
restriction should be relaxed, or some way provided so that schema changes are 
updated for all physical tables at the same time (e.g. a schema change is 
allowed only on the logical table).
   
   Instead of thinking this through piecemeal, I strongly suggest we start 
writing a design doc, with at least the requirements part clearly identified. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

