sajjad-moradi opened a new pull request #6546: URL: https://github.com/apache/incubator-pinot/pull/6546
## Description

Currently, the Real Time Provisioning Helper tool takes a completed segment as input. With the changes in this PR, a user can provide data characteristics instead of an actual segment. With this option, the tool runs a preprocessing step that generates a segment from the provided characteristics. Once the segment is generated, the tool uses it to report the memory footprint as usual.

The main changes in the code:
- refactored a few existing `Generator`s in the `data/generator` package and added a couple of new ones
- added `SegmentGenerator` to `MemoryEstimator`
- modified `RealtimeProvisioningHelperCommand`

## Testing Done

- Unit tests
- Ran `pinot-admin RealtimeProvisioningHelper` locally with the same files provided in the unit tests (1M rows):

```bash
2021/02/04 15:13:39.243 INFO [RealtimeProvisioningHelperCommand] [main] Executing command: RealtimeProvisioningHelper -tableConfigFile table-config.json -numPartitions 10 -pushFrequency null -numHosts 2,4,6,8,10,12,14,16 -numHours 2,3,4,5,6,7,8,9,10,11,12 -schemaFile schema.json -dataCharacteristicsFile data-characteristics.json -ingestionRate 150 -maxUsableHostMemory 48G -retentionHours 48
2021/02/04 15:13:41.549 INFO [MemoryEstimator$SegmentGenerator] [main] Successfully generated data file: /var/folders/sd/fgc60hhj2994pk9vm1xw235h000xqy/T/2021-02-04_15:13:39-csv/output_0.csv
2021/02/04 15:13:41.549 INFO [MemoryEstimator$SegmentGenerator] [main] Started creating segment from file: /var/folders/sd/fgc60hhj2994pk9vm1xw235h000xqy/T/2021-02-04_15:13:39-csv/output_0.csv
2021/02/04 15:13:49.084 INFO [MemoryEstimator$SegmentGenerator] [main] Successfully created segment: testTable_18667_18766_0 at directory: /var/folders/sd/fgc60hhj2994pk9vm1xw235h000xqy/T/2021-02-04_15:13:39-segment/testTable_18667_18766_0
2021/02/04 15:13:49.085 INFO [MemoryEstimator$SegmentGenerator] [main] Verifying the segment by loading it
2021/02/04 15:13:49.161 INFO [MemoryEstimator$SegmentGenerator] [main] Successfully loaded segment: testTable_18667_18766_0 of size: 18766286 bytes

============================================================
RealtimeProvisioningHelper -tableConfigFile table-config.json -numPartitions 10 -pushFrequency null -numHosts 2,4,6,8,10,12,14,16 -numHours 2,3,4,5,6,7,8,9,10,11,12 -schemaFile schema.json -dataCharacteristicsFile data-characteristics.json -ingestionRate 150 -maxUsableHostMemory 48G -retentionHours 48

Note:

* Table retention and push frequency ignored for determining retentionHours since it is specified in command
* See https://docs.pinot.apache.org/operators/operating-pinot/tuning/realtime

2021/02/04 15:13:53.141 INFO [RealtimeProvisioningHelperCommand] [main] Memory used per host (Active/Mapped)

numHosts --> 2             |4             |6             |8             |10            |12            |14            |16            |
numHours
 2 --------> 7.67G/17.86G  |4.09G/9.53G   |2.56G/5.95G   |2.05G/4.76G   |1.53G/3.57G   |1.53G/3.57G   |1.53G/3.57G   |1.02G/2.38G   |
 3 --------> 8.03G/18.22G  |4.28G/9.72G   |2.68G/6.07G   |2.14G/4.86G   |1.61G/3.64G   |1.61G/3.64G   |1.61G/3.64G   |1.07G/2.43G   |
 4 --------> 8.38G/18.58G  |4.47G/9.91G   |2.79G/6.19G   |2.24G/4.95G   |1.68G/3.72G   |1.68G/3.72G   |1.68G/3.72G   |1.12G/2.48G   |
 5 --------> 9.02G/18.93G  |4.81G/10.1G   |3.01G/6.31G   |2.41G/5.05G   |1.8G/3.79G    |1.8G/3.79G    |1.8G/3.79G    |1.2G/2.52G    |
 6 --------> 9.1G/19.29G   |4.85G/10.29G  |3.03G/6.43G   |2.43G/5.14G   |1.82G/3.86G   |1.82G/3.86G   |1.82G/3.86G   |1.21G/2.57G   |
 7 --------> 9.59G/20.5G   |5.12G/10.93G  |3.2G/6.83G    |2.56G/5.47G   |1.92G/4.1G    |1.92G/4.1G    |1.92G/4.1G    |1.28G/2.73G   |
 8 --------> 9.81G/20G     |5.23G/10.67G  |3.27G/6.67G   |2.62G/5.33G   |1.96G/4G      |1.96G/4G      |1.96G/4G      |1.31G/2.67G   |
 9 --------> 11.01G/21.21G |5.87G/11.31G  |3.67G/7.07G   |2.94G/5.66G   |2.2G/4.24G    |2.2G/4.24G    |2.2G/4.24G    |1.47G/2.83G   |
10 --------> 10.8G/20.71G  |5.76G/11.05G  |3.6G/6.9G     |2.88G/5.52G   |2.16G/4.14G   |2.16G/4.14G   |2.16G/4.14G   |1.44G/2.76G   |
11 --------> 11.87G/21.21G |6.33G/11.31G  |3.96G/7.07G   |3.16G/5.66G   |2.37G/4.24G   |2.37G/4.24G   |2.37G/4.24G   |1.58G/2.83G   |
12 --------> 11.23G/21.43G |5.99G/11.43G  |3.74G/7.14G   |3G/5.71G      |2.25G/4.29G   |2.25G/4.29G   |2.25G/4.29G   |1.5G/2.86G    |

2021/02/04 15:13:53.142 INFO [RealtimeProvisioningHelperCommand] [main] Optimal segment size

numHosts --> 2       |4       |6       |8       |10      |12      |14      |16      |
numHours
 2 --------> 19.33M  |19.33M  |19.33M  |19.33M  |19.33M  |19.33M  |19.33M  |19.33M  |
 3 --------> 29M     |29M     |29M     |29M     |29M     |29M     |29M     |29M     |
 4 --------> 38.66M  |38.66M  |38.66M  |38.66M  |38.66M  |38.66M  |38.66M  |38.66M  |
 5 --------> 48.33M  |48.33M  |48.33M  |48.33M  |48.33M  |48.33M  |48.33M  |48.33M  |
 6 --------> 57.99M  |57.99M  |57.99M  |57.99M  |57.99M  |57.99M  |57.99M  |57.99M  |
 7 --------> 67.66M  |67.66M  |67.66M  |67.66M  |67.66M  |67.66M  |67.66M  |67.66M  |
 8 --------> 77.32M  |77.32M  |77.32M  |77.32M  |77.32M  |77.32M  |77.32M  |77.32M  |
 9 --------> 86.99M  |86.99M  |86.99M  |86.99M  |86.99M  |86.99M  |86.99M  |86.99M  |
10 --------> 96.65M  |96.65M  |96.65M  |96.65M  |96.65M  |96.65M  |96.65M  |96.65M  |
11 --------> 106.32M |106.32M |106.32M |106.32M |106.32M |106.32M |106.32M |106.32M |
12 --------> 115.98M |115.98M |115.98M |115.98M |115.98M |115.98M |115.98M |115.98M |

2021/02/04 15:13:53.144 INFO [RealtimeProvisioningHelperCommand] [main] Consuming memory

numHosts --> 2       |4       |6       |8       |10      |12      |14      |16      |
numHours
 2 --------> 1.16G   |632.16M |395.1M  |316.08M |237.06M |237.06M |237.06M |158.04M |
 3 --------> 1.66G   |904.1M  |565.06M |452.05M |339.04M |339.04M |339.04M |226.02M |
 4 --------> 2.15G   |1.15G   |735.02M |588.02M |441.01M |441.01M |441.01M |294.01M |
 5 --------> 2.65G   |1.41G   |904.98M |723.99M |542.99M |542.99M |542.99M |361.99M |
 6 --------> 3.15G   |1.68G   |1.05G   |859.96M |644.97M |644.97M |644.97M |429.98M |
 7 --------> 3.65G   |1.95G   |1.22G   |995.93M |746.94M |746.94M |746.94M |497.96M |
 8 --------> 4.15G   |2.21G   |1.38G   |1.11G   |848.92M |848.92M |848.92M |565.95M |
 9 --------> 4.64G   |2.48G   |1.55G   |1.24G   |950.9M  |950.9M  |950.9M  |633.93M |
10 --------> 5.14G   |2.74G   |1.71G   |1.37G   |1.03G   |1.03G   |1.03G   |701.92M |
11 --------> 5.64G   |3.01G   |1.88G   |1.5G    |1.13G   |1.13G   |1.13G   |769.9M  |
12 --------> 6.14G   |3.27G   |2.05G   |1.64G   |1.23G   |1.23G   |1.23G   |837.89M |

2021/02/04 15:13:53.145 INFO [RealtimeProvisioningHelperCommand] [main] Total number of segments queried per host (for all partitions)

numHosts --> 2   |4   |6   |8   |10  |12  |14  |16  |
numHours
 2 --------> 360 |192 |120 |96  |72  |72  |72  |48  |
 3 --------> 240 |128 |80  |64  |48  |48  |48  |32  |
 4 --------> 180 |96  |60  |48  |36  |36  |36  |24  |
 5 --------> 150 |80  |50  |40  |30  |30  |30  |20  |
 6 --------> 120 |64  |40  |32  |24  |24  |24  |16  |
 7 --------> 105 |56  |35  |28  |21  |21  |21  |14  |
 8 --------> 90  |48  |30  |24  |18  |18  |18  |12  |
 9 --------> 90  |48  |30  |24  |18  |18  |18  |12  |
10 --------> 75  |40  |25  |20  |15  |15  |15  |10  |
11 --------> 75  |40  |25  |20  |15  |15  |15  |10  |
12 --------> 60  |32  |20  |16  |12  |12  |12  |8   |
```
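
As a sanity check on the last table, the "Total number of segments queried per host" values can be reproduced with simple arithmetic. The sketch below is not the tool's code. It assumes a replication factor of 3, which is inferred from the numbers because `table-config.json` is not shown here, and the function name `segments_per_host` is hypothetical. Under that assumption, each cell equals the partition replicas assigned to a host (rounded up) times the segments retained per partition (rounded up).

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def segments_per_host(num_partitions: int, replication: int, num_hosts: int,
                      retention_hours: int, num_hours: int) -> int:
    """Hypothetical helper: segments a single host holds for the whole table.

    replication is an assumption (3 matches the logged table); it is not
    printed anywhere in the command output.
    """
    # Partition replicas are spread over hosts; the busiest host gets the ceiling.
    partitions_on_host = ceil_div(num_partitions * replication, num_hosts)
    # One segment is completed every num_hours and kept for retention_hours.
    segments_per_partition = ceil_div(retention_hours, num_hours)
    return partitions_on_host * segments_per_partition


# Values taken from the command line in the log: 10 partitions, 48h retention.
hosts = [2, 4, 6, 8, 10, 12, 14, 16]
for hours in (2, 7, 12):
    row = [segments_per_host(10, 3, h, 48, hours) for h in hosts]
    print(hours, row)
```

With these inputs the sketch matches the logged rows for 2, 7 and 12 hours, which also explains why the columns for 10, 12 and 14 hosts are identical: with 30 partition replicas, each of those host counts leaves at most 3 replicas on a host.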