[PR] Implement benchmark scenario `WeightedWorkloadOnTreeDataset` [polaris-tools]

via GitHub Wed, 30 Apr 2025 01:29:00 -0700


eric-maynard opened a new pull request, #21:
URL: https://github.com/apache/polaris-tools/pull/21


   This implements a new scenario, `WeightedWorkloadOnTreeDataset`, that 
supports the configuration of multiple **distributions** over which to weight 
reads & writes against the catalog. 
   
   Compared with `ReadUpdateTreeDataset`, this allows us to understand how 
performance changes when reads or writes frequently hit the same tables.
   
   ### Sampling
   
   The distributions are defined in the config file like so:
   ```
       # Distributions for readers
       # Each distribution will have `count` threads assigned to it
       # mean / variance describe the properties of the normal distribution
       # Readers will read a random table in the table space based on sampling
       # Default: [{ count = 8, mean = 0.3, variance = 0.0278 }]
       readers = [
         { count = 8, mean = 0.3, variance = 0.0278 }
       ]
   ```
   
   `count` is simply the number of threads which will sample from the 
distribution, while `mean` and `variance` describe the Gaussian distribution to 
sample from. Samples are generally expected to fall between 0 and 1.0; when a 
sample falls outside that range, the distribution is **resampled** until it 
yields a value in range.
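
The resampling described above can be sketched as simple rejection sampling. This is a minimal Python illustration, not the code in this PR: the function name `sample_in_unit_interval`, the `max_attempts` guard, and the choice to treat both 0.0 and 1.0 as in range are assumptions made for the example.

```python
import math
import random


def sample_in_unit_interval(mean, variance, rng, max_attempts=100_000):
    """Draw from N(mean, variance), resampling until the value lies in [0, 1].

    Hypothetical helper for illustration; the retry cap is an assumption so
    that a badly chosen mean/variance fails loudly instead of looping forever.
    """
    # random.gauss expects a standard deviation, while the config gives a variance.
    std_dev = math.sqrt(variance)
    for _ in range(max_attempts):
        value = rng.gauss(mean, std_dev)
        if 0.0 <= value <= 1.0:
            return value
    raise ValueError("no sample landed in [0, 1]; check mean and variance")


# Usage with the default reader distribution from the config above.
rng = random.Random(42)
samples = [sample_in_unit_interval(0.3, 0.0278, rng) for _ in range(1000)]
```

Note that a distribution centered far outside the 0 to 1.0 range would reject almost every draw, so the cost of sampling grows as the in-range probability shrinks.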
   
   For an extreme example, refer to the following:
   <img width="400" alt="Screenshot 2025-04-30 at 1 27 43 AM" src="https://github.com/user-attachments/assets/d77e98f1-7a94-463d-be82-0c47bbda92a1" />
   
   In this case, about 50% of samples should fall below 0.0 and therefore be 
resampled.
   
   Once a value between 0 and 1 is obtained, it is mapped to a table: 0.0 
corresponds to T_0 and 1.0 to the highest table in the tree dataset (e.g. 
T_2048).
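
One way to picture the mapping is a linear scale from the sampled value to a table index. The sketch below is an assumption-laden illustration in Python: the function name `table_for_sample` is hypothetical, and truncating toward zero is only one possible rounding rule, since the description fixes just the two endpoints.

```python
def table_for_sample(value, max_table_index):
    """Map a sample in [0, 1] to a table name such as "T_0" or "T_2048".

    Hypothetical helper: linear scaling with truncation is assumed here;
    the scenario may round differently between the endpoints.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    index = int(value * max_table_index)
    return f"T_{index}"


# Endpoints match the description: 0.0 is T_0, 1.0 is the highest table.
lowest = table_for_sample(0.0, 2048)
highest = table_for_sample(1.0, 2048)
```

With this scaling, a reader distribution with a small variance concentrates its reads on a narrow band of neighboring table indices, which is what produces the repeated hits on the same tables.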


