egalpin commented on issue #10712:
URL: https://github.com/apache/pinot/issues/10712#issuecomment-1693798924

   > What is the use case that drives mapping multiple physical tables to the 
same logical table? Can you elaborate a bit? 
   
   Here are some example use cases that piqued my interest:
   
   1. I have a use case where the data resembles user sessions, where a 
session can be either active or closed.  I would like to have _3_ physical 
tables which together represent the total data:  an upsert-enabled realtime 
table for active sessions, plus a hybrid table (a realtime and an offline 
table) to account for realtime ingestion of newly closed sessions as well as 
historical closed sessions.  It isn't currently possible to query all of these 
tables at once, but it would be very nice to do so.
   2. "Whale" or VIP tables, also "priority queue".  Sometimes a certain 
customer or set of customers represents an outsized portion of the data, which 
Pinot's existing partitioning might not handle well.  It would help to be able 
to isolate that customer data in a separate table that is still queryable via 
a single table name, so that those issuing queries do not need awareness of DB 
organization details to conditionally target the correct table.
   3. User-managed time partitioning.  Imagine a time series dataset.  Being 
able to have a collection of tables, each holding a given time period of data, 
would be helpful operationally.
   
   > Do the physical tables have the same schema? 
   
   Yes, I would guess so (as with a hybrid table today). Or at the very 
least, mutually shared columns would have the same types.  It might be ideal 
to support tables having a subset/superset of columns, but that's not a "must" 
feature for a v1 IMO.
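   
   To make the "mutually shared columns have the same types" rule concrete, 
here is a rough sketch of such a check. The function name, the plain-dict 
schema representation, and the table and column names are all made up for 
illustration; this is not Pinot's schema API.

```python
def find_type_conflicts(schemas):
    """Return columns whose declared type differs between tables.

    `schemas` maps a (hypothetical) physical table name to a dict of
    column name -> type name. Columns present in only one table are
    allowed, which models the subset/superset case.
    """
    types_by_column = {}
    for table, columns in schemas.items():
        for column, type_name in columns.items():
            types_by_column.setdefault(column, {})[table] = type_name

    # A column conflicts only if two tables declare it with different types.
    return {
        column: by_table
        for column, by_table in types_by_column.items()
        if len(set(by_table.values())) > 1
    }


# Illustrative schemas: `closed_at` exists in one table only (fine),
# while `duration_ms` is declared with two different types (conflict).
schemas = {
    "sessions_active": {"session_id": "STRING", "duration_ms": "LONG"},
    "sessions_closed": {
        "session_id": "STRING",
        "duration_ms": "INT",
        "closed_at": "TIMESTAMP",
    },
}
conflicts = find_type_conflicts(schemas)
```

   An empty result would mean the tables could sit behind one logical name 
under this rule.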
   
   > How does a given query (that may only have the logical table name) choose 
between the physical tables to run the query in?
   
   I believe that, at least initially, the query would simply target all 
physical tables with the same logical name. There might be ways to optimize 
that in the future, e.g. in the VIP tables example above, where we might be 
able to select only 1 out of many physical tables based on some fact we know 
about the table architecture and query inputs. But I don't think that would be 
a requirement of an initial version.
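   
   As a rough sketch of that routing idea (not how Pinot's broker works; 
`PhysicalTable`, `resolve`, and the `customer_ids` field are hypothetical 
names), the initial behaviour fans out to every physical table behind the 
logical name, and the possible later optimization prunes to a single table 
when the query is known to target one customer. The pruning branch assumes a 
VIP customer's data lives only in its dedicated table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalTable:
    name: str
    logical_name: str
    # Hypothetical metadata: customers isolated in this table.
    # An empty set marks the catch-all table for everyone else.
    customer_ids: frozenset = frozenset()


def resolve(logical_name, tables, customer_id=None):
    """Pick the physical tables a query on `logical_name` should hit."""
    candidates = [t for t in tables if t.logical_name == logical_name]
    if customer_id is None:
        # Initial version: query all physical tables with this logical name.
        return candidates
    # Possible optimization: use what we know about the layout to prune.
    dedicated = [t for t in candidates if customer_id in t.customer_ids]
    if dedicated:
        return dedicated
    return [t for t in candidates if not t.customer_ids]


tables = [
    PhysicalTable("sessions_vip", "sessions", frozenset({"acme"})),
    PhysicalTable("sessions_default", "sessions"),
    PhysicalTable("orders_default", "orders"),
]
all_tables = [t.name for t in resolve("sessions", tables)]
vip_only = [t.name for t in resolve("sessions", tables, customer_id="acme")]
```

   Here `all_tables` covers both `sessions_*` tables, while `vip_only` is 
narrowed to the dedicated table.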
   
   My main priority would be the use case of being able to replace a table 
easily and seamlessly.  That wouldn't require the ability to support multiple 
physical tables with the same logical name.  That said, I can foresee making 
use of the ability to have multiple physical tables with the same logical name, 
so it would be nice to do it all in one go if feasible. 



