XN137 commented on PR #2262:
URL: https://github.com/apache/polaris/pull/2262#issuecomment-3175447981

   @dennishuo thanks a lot of the detailed reply!
   
   `BasePersistence.listEntities` has 3 variants:
   
   ```
   Page<EntityNameLookupRecord> listEntities(..., PageToken);
   
   Page<EntityNameLookupRecord> listEntities(..., Predicate<PolarisBaseEntity>, 
PageToken)
   
   <T> Page<T> listEntities(..., Predicate<PolarisBaseEntity>, 
Function<PolarisBaseEntity, T>, PageToken);
   ```
   
   I understand your point that method 1 was supposed to be used by 
"name-listing" polaris code to only retrieve the entity properties required for 
building a `EntityNameLookupRecord` (i.e. most prominently `listTables` or 
`listNamespaces`).
   while method 3 needs to load the full entity properties to support the 
predicate and transformer parameters.
   
   so method 1 can be faster and cheaper based on the persistence 
implementation.
   
   as your comments also pointed out method 2 is weird as it supports a 
`Predicate<PolarisBaseEntity>` which requires loading the full entity, but then 
throws most of the loaded entity properties away to only return a 
`EntityNameLookupRecord`.
   so any performance benefit of the latter are lost due to the former.
   
   due to the above fact, in our current codebase method 2 is frequently 
implemented as forwarding to method 3.
   
   also on current `main` method 1 has no callers at all, as `listTables` for 
example needs to pass in a `PolarisEntitySubType subType` filter and is thus 
using method 2:
   
https://github.com/apache/polaris/blob/2f985ab93ccd65b0c98fdd8902e217792c622a68/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java#L2574-L2591
   I am guessing this changed with the introduction of GenericTables and/or the 
rework of pagination.
   
   this also means that even if a private `PolarisMetaStoreManager` impl is 
using method 1, its a bit of a private implementation detail as no general 
polaris code is calling method 1 directly.
   
   the above is an explanation as to how I arrived at this PR suggesting the 
removal of method 1 (which is unused) and method 2 (which is forwarding to 
method 3).
   
   As for a proper way forward that keeps the performance benefit in mind I 
have put up 2 PRs:
   
   https://github.com/apache/polaris/pull/2317
   allows the removal of method 2 by letting the other methods support a 
`PolarisEntitySubType subType` parameter that can even be pushed down into the 
queries.
   
   https://github.com/apache/polaris/pull/2290
   removes a common anti-pattern in the codebase where callers do `listEntites 
- stream - loadEntity` which also defeats any performance optimizations that 
`listEntites` might have.
   
   I would appreciate it, if you took the time to look at these PRs as they 
should address a lot of the shortcomings identified as part of this discussion.
   After receiving your feedback we can just close this PR (as it's mostly 
replaced by the first PR mentioned above).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to