Jackie-Jiang commented on issue #10712:
URL: https://github.com/apache/pinot/issues/10712#issuecomment-1732447841

   > My suggestion would be to NOT change the hybrid table definition. Instead, 
keep it the same. The logical table binding should happen _before_ we branch 
between realtime/offline.
   
   This is basically the idea. Currently a hybrid table is an implicit logical 
table that always consists of two physical tables: one offline and one 
real-time. We can keep it implicit and connect them by the raw table name. I'm 
also thinking we may introduce an explicit hybrid logical table concept in 
addition to the implicit one, where we allow connecting multiple offline tables 
with multiple real-time tables, but we need to design a way to represent the 
time boundary.
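
   To make this concrete, here is a rough sketch of what an explicit hybrid 
logical table definition could look like. All class and field names are 
hypothetical (nothing like this exists in Pinot today), and the time-boundary 
representation shown is just one option to be settled in the design:

```java
import java.util.List;

// Hypothetical sketch only: an explicit hybrid logical table definition.
// None of these class or field names exist in Pinot; the real shape would be
// decided in the design doc.
public class HybridLogicalTableConfig {
  private final String _logicalTableName;
  private final List<String> _offlineTableNames;   // e.g. ["orders2022_OFFLINE", "orders2023_OFFLINE"]
  private final List<String> _realtimeTableNames;  // e.g. ["orders_REALTIME"]
  // One possible way to represent the time boundary: a time column plus an
  // explicit cutoff; data before the cutoff is served from offline tables,
  // data after it from real-time tables.
  private final String _timeBoundaryColumn;
  private final long _timeBoundaryMs;

  public HybridLogicalTableConfig(String logicalTableName, List<String> offlineTableNames,
      List<String> realtimeTableNames, String timeBoundaryColumn, long timeBoundaryMs) {
    _logicalTableName = logicalTableName;
    _offlineTableNames = offlineTableNames;
    _realtimeTableNames = realtimeTableNames;
    _timeBoundaryColumn = timeBoundaryColumn;
    _timeBoundaryMs = timeBoundaryMs;
  }

  public String getLogicalTableName() { return _logicalTableName; }
  public List<String> getOfflineTableNames() { return _offlineTableNames; }
  public List<String> getRealtimeTableNames() { return _realtimeTableNames; }
  public String getTimeBoundaryColumn() { return _timeBoundaryColumn; }
  public long getTimeBoundaryMs() { return _timeBoundaryMs; }
}
```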
   
   > So, the query comes into the broker, we look up if there is a physical 
table defined, substitute the physical table name(s), and then do further query 
processing. Each of the underlying physical tables could be hybrid, or 
realtime-only or offline-only.
   
   A hybrid table itself is already not a physical representation, so I'd 
prefer modeling it as a logical table, even if it stays implicit. The 
abstraction would then be: a logical table can point to multiple tables, either 
physical or logical. This way, we can share the same abstraction for hybrid 
table management (currently it is hard-coded into two parts, one for real-time 
and one for offline).
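
   A minimal sketch of that abstraction, with made-up type names: the implicit 
hybrid table would simply be a logical node with one offline child and one 
real-time child, so the current two-branch handling folds into generic 
resolution:

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the shared abstraction: a logical table points to
// child tables that are either physical or themselves logical.
interface TableNode {
  String getName();

  // Resolve down to the physical table names (with type suffix) to route to.
  List<String> resolvePhysicalTables();
}

class PhysicalTableNode implements TableNode {
  private final String _tableNameWithType; // e.g. "myTable_OFFLINE" or "myTable_REALTIME"

  PhysicalTableNode(String tableNameWithType) {
    _tableNameWithType = tableNameWithType;
  }

  @Override
  public String getName() {
    return _tableNameWithType;
  }

  @Override
  public List<String> resolvePhysicalTables() {
    return Collections.singletonList(_tableNameWithType);
  }
}

class LogicalTableNode implements TableNode {
  private final String _logicalTableName;
  private final List<TableNode> _children; // physical tables or nested logical tables

  LogicalTableNode(String logicalTableName, List<TableNode> children) {
    _logicalTableName = logicalTableName;
    _children = children;
  }

  @Override
  public String getName() {
    return _logicalTableName;
  }

  @Override
  public List<String> resolvePhysicalTables() {
    return _children.stream()
        .flatMap(child -> child.resolvePhysicalTables().stream())
        .collect(Collectors.toList());
  }
}
```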
   
   > In terms of allowed mapping, we should have something that enables mapping 
one logical table to one or more physical tables. If multiple physical tables 
are configured, then another config could say whether the code should pick 
_any_ or _all_ (@egalpin's requirement). Not sure if there is a need for 
specific additional (configurable) logic depending on which table is picked, 
but we can let that ride for now.
   > 
   > +1 on the brokers recognizing immediately when the mapping is changed. To 
that effect, maybe the mapping should be stored in zookeeper, away from 
TableConfig. Maybe it can be under the PROPERTYSTORE/CONFIGS/CLUSTER? It is OK 
if the brokers do not set a watch (perhaps preferred that way). The mapping can 
be updated via a controller API, and the brokers informed by the controller.
   
   Yes, the mapping needs to be stored in ZK under the property store. We can 
discuss the path in the detailed design. The broker doesn't need a watch; when 
the mapping is updated, the controller can send a message to the broker to 
refresh the routing (same as the current table config update).
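
   A minimal sketch of that flow, assuming a property-store path that has not 
been decided yet:

```java
// Hypothetical sketch: where the logical table mapping might live in ZK and how
// a broker refresh could be triggered. The path below is an assumption (modeled
// after the existing /CONFIGS/TABLE/<tableName> convention), not a decision.
public class LogicalTableMappingPaths {
  // Relative to the cluster property store root, e.g.
  // <cluster>/PROPERTYSTORE/CONFIGS/LOGICAL_TABLE/myLogicalTable
  private static final String LOGICAL_TABLE_CONFIG_PATH_PREFIX = "/CONFIGS/LOGICAL_TABLE/";

  public static String getLogicalTableConfigPath(String logicalTableName) {
    return LOGICAL_TABLE_CONFIG_PATH_PREFIX + logicalTableName;
  }

  // Intended flow (mirroring today's table config update):
  // 1. A controller API writes the updated mapping to the property store path above.
  // 2. The controller sends a refresh message to the brokers serving the logical
  //    table, using the same mechanism as the existing table config refresh.
  // 3. On receiving the message, each broker re-reads the mapping and rebuilds its
  //    routing; no ZK watch is required on the broker side.
}
```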
   
   > * The logical table should not be a Helix resource (intuitively). Let me 
know if there is a problem with this, and we can discuss further.
   
   The logical table should be a partition under `BrokerResource`, so that the 
broker can load the mapping.
   
   > * As a consequence, the logical table cannot have a 
`logicalTableName_REALTIME` physical table, ever.
   
   I don't follow this. A logical table is just a mapping from a logical table 
name to n table resources, and it can contain any table type.
   
   > * How will table metrics be emitted? Ideally, all table level metrics 
should be emitted under the logical table name. Code may become a bit messy in 
places (emit physical table, logical table, and global metrics)
   
   Good point. We need to design this properly. Physical computation metrics 
can be associated with the physical table, while query stats can be associated 
with the logical table.
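
   A toy sketch of that split; the metric names and the emitter interface are 
made up purely for illustration:

```java
// Hypothetical sketch of the two-level metric split discussed above.
public class LogicalTableMetricsSketch {

  interface MetricsEmitter {
    void emit(String tableName, String metricName, long value);
  }

  // Per-server computation cost stays tagged with the physical table that did the work.
  static void recordServerWork(MetricsEmitter emitter, String physicalTableName, long docsScanned) {
    emitter.emit(physicalTableName, "documentsScanned", docsScanned);
  }

  // Broker-side query stats are tagged with the logical table the user actually queried.
  static void recordQueryStats(MetricsEmitter emitter, String logicalTableName, long latencyMs) {
    emitter.emit(logicalTableName, "queryLatencyMs", latencyMs);
  }
}
```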
   
   > * Operational tools need to be examined: If a logical table maps to a 
different physical table, then some of the table APIs should be modified to 
reflect that there is a different physical table. Not sure how this will work 
if there is more than one physical table.
   
   I'd imagine adding a new set of APIs for logical tables, where we only allow 
modifying the mapping. Physical table management might remain the same. For 
example, we don't really have any API associated with the hybrid table.
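
   A rough sketch of what such a controller endpoint set could look like; the 
resource path, method set, and payload format are all assumptions to be settled 
in the design, and only the mapping is managed here (physical table APIs stay 
untouched):

```java
import javax.ws.rs.DELETE;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.PUT;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Hypothetical sketch of a controller endpoint set for logical tables.
@Path("/logicalTables")
public class LogicalTableResource {

  @GET
  @Path("/{logicalTableName}")
  @Produces(MediaType.APPLICATION_JSON)
  public String getLogicalTable(@PathParam("logicalTableName") String logicalTableName) {
    // Would read the mapping from the property store.
    throw new UnsupportedOperationException("sketch only");
  }

  @POST
  @Produces(MediaType.APPLICATION_JSON)
  public String createLogicalTable(String logicalTableConfigJson) {
    // Would validate that the referenced physical tables exist, then persist the mapping.
    throw new UnsupportedOperationException("sketch only");
  }

  @PUT
  @Path("/{logicalTableName}")
  @Produces(MediaType.APPLICATION_JSON)
  public String updateLogicalTable(@PathParam("logicalTableName") String logicalTableName,
      String logicalTableConfigJson) {
    // Would update the mapping and trigger a broker routing refresh.
    throw new UnsupportedOperationException("sketch only");
  }

  @DELETE
  @Path("/{logicalTableName}")
  public void deleteLogicalTable(@PathParam("logicalTableName") String logicalTableName) {
    // Would remove the mapping; the underlying physical tables are not touched.
    throw new UnsupportedOperationException("sketch only");
  }
}
```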
   
   > * At least for a start, let us assume that all physical tables have the 
same schema. This can throw a wrench into having multiple copies of the same 
schema, since we insist now that schema name is the same as table name. Either 
the restriction should be relaxed, or some way provided so that schema changes 
are updated for all physical tables at the same time (e.g. a schema change is 
allowed only on the logical table).
   
   This actually depends on the design. If the logical table is simply a 
mapping, then there is no schema associated with it; the schema is only 
associated with the physical tables.
   
   > Instead of thinking this through piecemeal, I strongly suggest we start 
writing a design doc, with at least the requirements part clearly identified.
   
   Yes. We are just brainstorming and putting down some rough thoughts here to 
be covered in the design. This should be carefully designed.

