Hello,

Suppose I have the dataset shown below. The number of observations is too
large to produce a readable geom_point plot with a smoother on top, so I would
like to bin the data first. The data is indexed by Time (minutes 1 to 120,
i.e. two hours of system benchmarking).

Option 1) Group the data by Time (minute 1, minute 2, etc.) and, within each
group, create bins of N consecutive observations and average each bin into a
single observation. These bins then become the new data points for the
geom_point plot. How can I do this? With a shingle? If so, how?
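A minimal base-R sketch of Option 1, assuming the rows within each minute are
already in recording order (so "consecutive" means adjacent rows). The toy
data frame `dfs_toy` and the function `bin_consecutive()` are hypothetical;
the column names Time and Runtime follow str(dfs). It uses ave() to build a
per-minute bin index and aggregate() to average each bin, so no lattice
shingle is involved.

```r
# Hypothetical toy data standing in for dfs: 3 minutes, 10 runs per minute.
set.seed(1)
dfs_toy <- data.frame(
  Time    = rep(1:3, each = 10),
  Runtime = round(rnorm(30, mean = 3000, sd = 200))
)

# Average every N consecutive observations within each minute (Option 1).
bin_consecutive <- function(d, N) {
  # Per-minute bin index: rows 1..N of a minute get bin 1, rows N+1..2N
  # get bin 2, and so on, so a bin never spans two minutes. The last bin
  # of a minute may hold fewer than N rows.
  d$Bin <- ave(d$Time, d$Time,
               FUN = function(t) (seq_along(t) - 1) %/% N + 1)
  # One row per (Time, Bin) holding the mean Runtime of that bin.
  out <- aggregate(Runtime ~ Time + Bin, data = d, FUN = mean)
  out <- out[order(out$Time, out$Bin), ]
  rownames(out) <- NULL
  out
}

binned <- bin_consecutive(dfs_toy, N = 3)
head(binned)
```

On the real data this would be something like bin_consecutive(dfs, N = 10),
and the result can be plotted with
ggplot(binned, aes(Time, Runtime)) + geom_point() + geom_smooth().
If Query and Refresh rows are interleaved within a minute, you may want to
bin each Workload separately.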

Option 2) Again group by Time (minute 1, minute 2, etc.), and within each
group draw one random observation to serve as the representative for the
corresponding bin. I could not see clearly how to use Random for this.
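A minimal base-R sketch of Option 2, under the same assumption that rows
within a minute are in recording order. `dfs_toy` and `sample_one_per_bin()`
are hypothetical names. It reuses the per-minute bin index idea and keeps one
randomly chosen row per (Time, bin) using sample.int() from base R.

```r
# Hypothetical toy data standing in for dfs: 3 minutes, 10 runs per minute.
set.seed(1)
dfs_toy <- data.frame(
  Time    = rep(1:3, each = 10),
  Runtime = round(rnorm(30, mean = 3000, sd = 200))
)

# Keep one randomly chosen row per bin of N consecutive rows within each
# minute (Option 2).
sample_one_per_bin <- function(d, N) {
  d$Bin <- ave(d$Time, d$Time,
               FUN = function(t) (seq_along(t) - 1) %/% N + 1)
  # Row indices of d, split by (Time, Bin); empty combinations are dropped.
  groups <- split(seq_len(nrow(d)), list(d$Time, d$Bin), drop = TRUE)
  # i[sample.int(length(i), 1)] rather than sample(i, 1): when a bin holds a
  # single row index k, sample(k, 1) would draw from 1:k instead.
  picked <- vapply(groups, function(i) i[sample.int(length(i), 1)],
                   integer(1))
  out <- d[sort(picked), ]
  rownames(out) <- NULL
  out
}

set.seed(42)
reps <- sample_one_per_bin(dfs_toy, N = 3)
# One random row per minute: make N at least as large as the biggest minute.
per_minute <- sample_one_per_bin(dfs_toy, N = max(table(dfs_toy$Time)))
```

Calling set.seed() first makes the random selection reproducible.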

> dfs <- subset(df, Partitioning == "Sharding")
> head(dfs)
  Time Partitioning Workload Runtime
1    1     Sharding    Query    3301
2    1     Sharding    Query    3268
3    1     Sharding    Query    2878
4    1     Sharding    Query    2819
5    1     Sharding    Query    3310
6    1     Sharding    Query    3428
> str(dfs)
'data.frame':   102384 obs. of  4 variables:
 $ Time        : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Partitioning: Factor w/ 2 levels "Replication",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Workload    : Factor w/ 2 levels "Query","Refresh": 1 1 1 1 1 1 1 1 1 1 ...
 $ Runtime     : int  3301 3268 2878 2819 3310 3428 2837 2954 2902 2936 ...
> 

Many thanks in advance,
Best regards,
Giovanni
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.