Hello,

Suppose I have the dataset shown below. There are too many observations to get a readable geom_point plot with a smoother on top, so I would like to bin the data first. The data is indexed by Time (minutes from 1 to 120, i.e. two hours of system benchmarking).

Option 1) Group the data by Time (minute 1, minute 2, etc.) and, within each group, create bins of N consecutive observations and average each bin into one observation. The bins become the new data points for the geom_point plot. How can I do this? With a shingle? If so, how?

Option 2) Again group by Time (minute 1, minute 2, etc.) and, within each group, draw a random observation to be the representative for the corresponding bin. I could not work out how to do the random draw.

> dfs <- subset(df, Partitioning == "Sharding")
> head(dfs)
  Time Partitioning Workload Runtime
1    1     Sharding    Query    3301
2    1     Sharding    Query    3268
3    1     Sharding    Query    2878
4    1     Sharding    Query    2819
5    1     Sharding    Query    3310
6    1     Sharding    Query    3428
> str(dfs)
'data.frame':   102384 obs. of  4 variables:
 $ Time        : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Partitioning: Factor w/ 2 levels "Replication",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Workload    : Factor w/ 2 levels "Query","Refresh": 1 1 1 1 1 1 1 1 1 1 ...
 $ Runtime     : int  3301 3268 2878 2819 3310 3428 2837 2954 2902 2936 ...

Many thanks in advance,
Best regards,
Giovanni
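
For reference, both options above might be sketched in base R roughly as follows. This is only a minimal sketch under stated assumptions, not a definitive answer: the toy data frame stands in for the real dfs, the bin size N = 4 is arbitrary, the column name Bin is made up for illustration, and Option 2 is read as "one random observation per bin of N within each minute".

```r
# Toy stand-in for dfs (hypothetical values; the real dfs has 102384 rows).
set.seed(1)
dfs <- data.frame(
  Time    = rep(1:3, each = 10),
  Runtime = sample(2800:3500, 30, replace = TRUE)
)

N <- 4  # assumed bin size: number of consecutive observations per bin

# Position of each row within its minute (1, 2, 3, ... per Time value).
pos <- ave(seq_along(dfs$Time), dfs$Time, FUN = seq_along)

# Rows 1..N of a minute go to bin 1, rows N+1..2N to bin 2, and so on.
# The last bin of a minute may hold fewer than N rows.
dfs$Bin <- (pos - 1) %/% N + 1

# Option 1: average Runtime within each (Time, Bin) pair.
binned <- aggregate(Runtime ~ Time + Bin, data = dfs, FUN = mean)
binned <- binned[order(binned$Time, binned$Bin), ]

# Option 2: draw one random row from each (Time, Bin) pair.
# sample(i, 1) with a single number i would sample from 1:i, hence the guard.
pick   <- function(i) if (length(i) == 1) i else sample(i, 1)
groups <- split(seq_len(nrow(dfs)), list(dfs$Time, dfs$Bin), drop = TRUE)
idx    <- unname(sapply(groups, pick))
sampled <- dfs[idx, c("Time", "Bin", "Runtime")]
```

Either result could then be passed to ggplot2 in place of the full data, for example ggplot(binned, aes(Time, Runtime)) + geom_point() + geom_smooth(). Dropping Bin from the split in Option 2 would give one random observation per minute instead of one per bin.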