Manis99803 opened a new issue #7843:
URL: https://github.com/apache/pinot/issues/7843


   Should a dimension table honor replication at all when its storage quota is 
checked? It does not appear to apply it.
   
   For example, suppose I have a Pinot cluster with 4 servers and create a 
dimension table with its storage quota configured as 200 MB. Since this is a 
dimension table, it is replicated on all the servers.
   
   Now suppose we populate this dimension table with segments of 100k records 
each, where the uncompressed size of each segment is ~23 MB (232 bytes per 
record).
   By that calculation, I expected the table to hold ~900k records. However, 
that is not the case. The table does not accept more than 2 segments: when I 
try to push a 3rd segment, Pinot throws an error. It looks like the following 
calculation is happening:
   
   ```
   Size already stored:
     23 MB (size of each segment) * 2 (segments already present) * 4 (nodes, since the dimension table is copied to every node)
     = 23 * 2 * 4 = 184 MB
   Size of the new incoming uncompressed segment = 23 MB
   Newly estimated size = 184 + 23 = 207 MB
   207 MB > 200 MB, so Pinot does not accept any more segments of 100k records.
   ```
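
To make the arithmetic easier to check, here is a minimal Java sketch of the same calculation. It is not the actual `StorageQuotaChecker` code: the class and method names are hypothetical, and it assumes the allowed size is the configured quota multiplied by the configured replication (taken as 1 here).

```java
public class QuotaArithmeticSketch {
    // Hypothetical helper: size already counted for the table across all servers,
    // plus the incoming segment (counted once, as in the calculation above).
    static long estimatedSizeMb(long segmentSizeMb, int existingSegments, int serverCount) {
        long existingSizeMb = segmentSizeMb * existingSegments * serverCount;
        return existingSizeMb + segmentSizeMb;
    }

    // Assumption: allowed size = configured quota * configured replication.
    static long allowedSizeMb(long quotaMb, int configuredReplication) {
        return quotaMb * configuredReplication;
    }

    public static void main(String[] args) {
        long estimated = estimatedSizeMb(23, 2, 4); // 23 * 2 * 4 + 23
        long allowed = allowedSizeMb(200, 1);       // 200 * 1
        System.out.println("estimated = " + estimated + " MB, allowed = " + allowed + " MB");
        System.out.println("rejected = " + (estimated > allowed));
    }
}
```

With the numbers from this issue, the estimate is 207 MB against an allowed size of 200 MB, so the third segment is rejected.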
   
   Looking at the code ([Code Link](https://github.com/apache/pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/validation/StorageQuotaChecker.java#L83)), 
numReplicas is fetched from the table configuration and used to calculate the 
allowedSize. 
   If a dimension table is replicated on 4 servers, then the replication factor 
used to compute the allowedSize should be 4 rather than the value from the 
table configuration, because the user might specify a replication factor of 1 
despite having more than one server in the setup.
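
The suggested behavior could look roughly like the sketch below. It is only an illustration under stated assumptions, not a patch: `effectiveReplication` and `maxSegments` are hypothetical helpers, the allowed size is assumed to be quota times replication, and the admission check follows the same arithmetic as the calculation earlier in this issue.

```java
public class DimTableQuotaSketch {
    // Hypothetical: for a dimension table, use the number of servers as the
    // replication factor instead of the value from the table configuration.
    static int effectiveReplication(boolean isDimTable, int configuredReplication, int serverCount) {
        return isDimTable ? serverCount : configuredReplication;
    }

    // Counts how many equally sized segments are accepted before the
    // estimated size (existing copies on all servers + incoming segment)
    // exceeds the allowed size.
    static int maxSegments(long segmentSizeMb, int serverCount, long allowedSizeMb) {
        int segments = 0;
        while (segmentSizeMb * segments * serverCount + segmentSizeMb <= allowedSizeMb) {
            segments++;
        }
        return segments;
    }

    public static void main(String[] args) {
        long quotaMb = 200;
        int servers = 4;
        int configured = 1; // assumed replication in the table configuration

        long allowedToday = quotaMb * configured;
        long allowedProposed = quotaMb * effectiveReplication(true, configured, servers);

        System.out.println("segments accepted today: " + maxSegments(23, servers, allowedToday));
        System.out.println("segments accepted with proposal: " + maxSegments(23, servers, allowedProposed));
    }
}
```

Under these assumptions the sketch accepts 2 segments with a replication of 1 and 9 segments (about 900k records) with a replication of 4, which matches both the observed behavior and the original expectation.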
