Jackie-Jiang commented on issue #10712: URL: https://github.com/apache/pinot/issues/10712#issuecomment-1732447841
> My suggestion would be to NOT change the hybrid table definition. Instead, keep it the same. The logical table binding should happen _before_ we branch between realtime/offline.

This is basically the idea. Currently a hybrid table is an implicit logical table that always consists of 2 physical tables, one offline and one real-time. We can keep it implicit and connect them by the raw table name. I'm thinking we may also introduce an explicit hybrid logical table concept in addition to the implicit one, where we allow connecting multiple offline tables with multiple real-time tables, but we need to design a way to represent the time boundary.

> So, the query comes into the broker, we look up if there is a physical table defined, substitute the physical table name(s) and then do further query processing. Each of the underlying physical tables could be hybrid, or realtime-only or offline-only.

A hybrid table itself is already not a physical representation, so I'd prefer modeling it as a logical table, even though it is implicit. Then the abstraction would be that a logical table can point to multiple tables, either physical or logical. This way, we can share the same abstraction for hybrid table management (currently it is hard-coded into 2 parts, one for real-time and one for offline).

> In terms of allowed mapping, we should have something that enables mapping one logical table to one or more physical tables. If multiple physical tables are configured, then another config could say whether the code should pick _any_ or _all_ (@egalpin's requirement). Not sure if there is need for specific additional (configurable) logic depending on which table is picked, but we can let that ride for now.
>
> +1 on the brokers should recognize immediately when mapping is changed. To that effect, maybe the mapping should be stored in zookeeper, away from TableConfig. Maybe it can be under the PROPERTYSTORE/CONFIGS/CLUSTER ?
> It is OK if the brokers do not set a watch (perhaps preferred that way). The mapping can be updated via a controller API, and the brokers informed by the controller.

Yes, the mapping needs to be stored in ZK under the property store. We can discuss the path in the detailed design. The broker doesn't need a watch: when the mapping is updated, the controller can send a message to the broker to refresh the routing (same as the current table config update).

> * The logical table should not be a Helix resource (intuitively). Let me know if there is a problem with this, and we can discuss further.

The logical table should be a partition under `BrokerResource`, so that the broker can load the mapping.

> * As a consequence, the logical table cannot have a `logicalTableName_REALTIME` physical table, ever.

I don't follow this. A logical table is just a mapping from a logical table name to n table resources, and it can contain any table type.

> * How will table metrics be emitted? Ideally, all table level metrics should be emitted under the logical table name. Code may become a bit messy in places (emit physical table, logical table, and global metrics)

Good point. We need to design this properly. Physical computation metrics can be associated with the physical table, and query stats can be associated with the logical table.

> * Operational tools need to be examined: If a logical table maps to a different physical table, then some of the table APIs should be modified to reflect that there is a different physical table. Not sure how this will work if there is more than one physical table.

I'd imagine adding a new set of APIs for logical tables, where we only allow modifying the mapping. Physical table management might remain the same. E.g. we don't really have any API associated with the hybrid table.

> * At least for a start, let us assume that all physical tables have the same schema.
> This can throw a wrench into having multiple copies of the same schema, since we insist now that schema name is the same as table name. Either the restriction should be relaxed, or some way provided so that schema changes are updated for all physical tables at the same time (e.g. a schema change is allowed only on the logical table).

This actually depends on the design. If the logical table is simply a mapping, then there is no schema associated with it. A schema is associated only with a physical table.

> Instead of thinking this through piecemeal, I strongly suggest we start writing a design doc, with at least the requirements part clearly identified.

Yes. We are just brainstorming and putting some random thoughts here to be covered in the design. This should be carefully designed.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
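
For reference, the mapping idea discussed in this comment could be sketched roughly as follows. This is only an illustration and not Pinot code: `LogicalTableResolver`, `PickMode`, `putMapping` and `addPhysicalTable` are hypothetical names, and taking the first target for _any_ is an arbitrary assumption. The sketch shows a logical name expanding to one or more tables (physical or logical), with a raw table name falling back to the implicit hybrid pair `rawName_OFFLINE` / `rawName_REALTIME`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical policy for a logical table that points to several tables. */
enum PickMode { ANY, ALL }

/** Hypothetical resolver used only for illustration; not part of Pinot. */
final class LogicalTableResolver {
  // Explicit logical tables: logical name -> target names (physical or logical).
  private final Map<String, List<String>> targets = new HashMap<>();
  private final Map<String, PickMode> modes = new HashMap<>();
  // Known physical tables, named with their type suffix, e.g. "events_OFFLINE".
  private final Set<String> physicalTables = new HashSet<>();

  void addPhysicalTable(String tableNameWithType) {
    physicalTables.add(tableNameWithType);
  }

  void putMapping(String logicalName, PickMode mode, List<String> targetNames) {
    if (targetNames.isEmpty()) {
      throw new IllegalArgumentException("A logical table needs at least one target");
    }
    targets.put(logicalName, List.copyOf(targetNames));
    modes.put(logicalName, mode);
  }

  /** Expands the table name used in a query into physical table names with type. */
  List<String> resolve(String name) {
    Set<String> out = new LinkedHashSet<>();
    resolve(name, new HashSet<>(), out);
    return new ArrayList<>(out);
  }

  private void resolve(String name, Set<String> visiting, Set<String> out) {
    if (physicalTables.contains(name)) {
      out.add(name);
      return;
    }
    List<String> explicit = targets.get(name);
    if (explicit != null) {
      if (!visiting.add(name)) {
        throw new IllegalStateException("Cyclic logical table mapping: " + name);
      }
      // ANY: a single target is enough (this sketch takes the first one); ALL: fan out.
      List<String> chosen = modes.get(name) == PickMode.ANY ? explicit.subList(0, 1) : explicit;
      for (String target : chosen) {
        resolve(target, visiting, out);
      }
      visiting.remove(name);
      return;
    }
    // Implicit hybrid table: the two physical tables are linked by the raw table name.
    for (String suffix : new String[] {"_OFFLINE", "_REALTIME"}) {
      if (physicalTables.contains(name + suffix)) {
        out.add(name + suffix);
      }
    }
  }
}
```

In this sketch the binding happens before any realtime/offline branching: the caller gets back only physical table names with type and can continue with the existing per-table routing. Where the mapping is stored and how the time boundary is represented are left to the design doc.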