Hi:

On Mon, Oct 24, 2011 at 2:01 AM, Giovanni Azua <brave...@gmail.com> wrote:
> Hello,
>
> Suppose I have the dataset shown below. The amount of observations is too
> massive to get a nice geom_point and smoother on top. What I would like to do
> is to bin the data first. The data is indexed by Time (minutes from 1 to 120
> i.e. two hours of System benchmarking).
>
> Option 1) group the data by Time i.e. minute 1, minute 2, etc and within each
> group create bins of N consecutive observations and average them into one
> observation, the bins become the new data points to use for the geom_point
> plot. How can I do this? Shingle? how to do that?
If necessary, create a variable for minute; if Time already represents
minutes, you shouldn't need to do anything. To average Runtime by one
or more factors, there are many ways to do it: aggregate() in base R,
ddply() in plyr, summaryBy() in the doBy package, or data.table. For
example, with aggregate() [R-2.11.0 or later], you could do (assuming
Time is in minutes; otherwise substitute the minute variable instead)

    aggregate(Runtime ~ Time + Partitioning, data = dfs, FUN = mean)

>
> Option 2) Another option is to again group by Time i.e. minute 1, minute 2,
> etc and within each group draw a random observation to be the representative
> for the corresponding bin. I could not clearly see how to use Random.

    # Example:
    # sampfun() samples one row of a data frame at random
    sampfun <- function(d) d[sample(seq_len(nrow(d)), 1), ]

    library('plyr')
    ddply(dfs, .(Time, Partitioning), sampfun)

HTH,
Dennis

>
>> dfs <- subset(df, Partitioning == "Sharding")
>> head(dfs)
>   Time Partitioning Workload Runtime
> 1    1     Sharding    Query    3301
> 2    1     Sharding    Query    3268
> 3    1     Sharding    Query    2878
> 4    1     Sharding    Query    2819
> 5    1     Sharding    Query    3310
> 6    1     Sharding    Query    3428
>> str(dfs)
> 'data.frame': 102384 obs. of 4 variables:
>  $ Time        : int 1 1 1 1 1 1 1 1 1 1 ...
>  $ Partitioning: Factor w/ 2 levels "Replication",..: 2 2 2 2 2 2 2 2 2 2 ...
>  $ Workload    : Factor w/ 2 levels "Query","Refresh": 1 1 1 1 1 1 1 1 1 1 ...
>  $ Runtime     : int 3301 3268 2878 2819 3310 3428 2837 2954 2902 2936 ...
>>
>
> Many thanks in advance,
> Best regards,
> Giovanni
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
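
A note on Option 1: the aggregate() call in the reply averages over a whole minute, whereas the question asks for bins of N consecutive observations within each minute. One way this might be done in base R is sketched below. The toy data frame, the bin size N = 4, and the column name Bin are all assumptions made for illustration; they are not part of the original data or thread, and the sketch assumes the rows are already in the order they were recorded.

```r
# Hypothetical stand-in for dfs: 3 minutes, 10 observations per minute
set.seed(1)
dfs <- data.frame(
  Time         = rep(1:3, each = 10),
  Partitioning = factor("Sharding"),
  Runtime      = sample(2800:3500, 30, replace = TRUE)
)

N <- 4  # assumed bin size; the last bin in a minute may hold fewer than N rows

# Within each (Time, Partitioning) group, number the rows 1, 2, ... and
# map every N consecutive rows to the same bin index (1, 1, 1, 1, 2, ...)
dfs$Bin <- ave(seq_len(nrow(dfs)), dfs$Time, dfs$Partitioning,
               FUN = function(i) (seq_along(i) - 1) %/% N + 1)

# Average Runtime within each bin; each row of 'binned' is one new data point
binned <- aggregate(Runtime ~ Time + Partitioning + Bin, data = dfs, FUN = mean)
binned <- binned[order(binned$Time, binned$Bin), ]
```

With the real dfs, only the last three statements would be needed, and binned could then be passed to the plotting code in place of the raw data.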