Dear All,

This is my first post, so apologies for any breaches of etiquette.

I have a CSV file containing the results of a series of experiments, which record the time taken for various sizes of iteration.

"run_id","size","time"
1,100,1.00
2,200,2.100
3,100,1.100
4,200,2.100
5,200,1.900
6,300,4.00
7,200,2.5
...

I read the data set, extract the results for each "size" and return various statistics. The only problem is that I would like to iterate over the distinct sizes to do a t.test.

My code has a section commented "#manual t.test", which works, but I have had no luck with the attempt labelled "#attempt to automate t.test". I'm assuming the problem is in how I pass the data as an argument to t.test().

Any pointers gratefully accepted, but as I'm a learner, hints rather than a solution are preferred.

Cheers
Paul

getwd()
setwd("c:/work/R/experiment1")

# read raw experimental data from results file
data <- read.csv("data1.csv", header = TRUE)
data

# create a new dataframe which has space for a record for each unique size of experiment
# this is to collect collated statistics for each experiment
var_list <- c("num_obs", "size_run", "sample_mean", "sample_var", "std_dev", "se")
var_list_length <- length(var_list)
num_experiments <- length(unique(data$size))

# create the dataframe
df <- data.frame(matrix(vector(), num_experiments, var_list_length,
                        dimnames = list(c(), var_list)),
                 stringsAsFactors = FALSE)
# it should be empty
df

# insert the experiment size
df$size_run <- unique(data$size)
# now it should have a single column filled
# using df$size_run
df

# loop over the experiment sizes
# (df$size_run is written out in full rather than relying on partial matching of df$size)
for (i in df$size_run) {
  # calculate the sample variance of observations on a particular size
  df$sample_var[df$size_run == i] <- var(subset(data$time, data$size == i))
  # calculate the mean of the returned values for all experiments of the same size
  df$sample_mean[df$size_run == i] <- mean(subset(data$time, data$size == i))
  # calculate the number of observations on a particular size
  df$num_obs[df$size_run == i] <- length(subset(data$time, data$size == i))
  # calculate the sd of the data
  df$std_dev[df$size_run == i] <- sd(subset(data$time, data$size == i))
  # calculate the standard error
  df$se[df$size_run == i] <- sd(subset(data$time, data$size == i)) /
    sqrt(length(subset(data$time, data$size == i)))
}
df

#manual t.test
print("t.test for size = 100")
t.test(subset(data$time, data$size == 100))
print("t.test for size = 200")
t.test(subset(data$time, data$size == 200))
print("t.test for size = 300")
t.test(subset(data$time, data$size == 300))

#attempt to automate t.test
for (i in df$size_run) {
  print(i)
  a <- subset(data$time, data$size == i)
  print(a)
  t.test(a)
}

______________________________________________
R-help@r-project.org mailing list
To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
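
A hint on the "#attempt to automate t.test" loop in the post: at the top level R auto-prints the value of an expression, but inside a for loop (or a function) it does not, so the result of t.test(a) is computed and then silently discarded. The data is probably being passed correctly. Below is a minimal sketch of one way to see the results, assuming only the seven rows shown in the post (the real file has more, elided as "..."). The name `results` and the skip for sizes with fewer than two observations are my own additions, not part of the original code; t.test() stops with an error when given a single observation, which is the case for size 300 in these seven rows.

```r
# The seven rows shown in the post; the real data1.csv has more rows.
data <- data.frame(
  run_id = 1:7,
  size   = c(100, 200, 100, 200, 200, 300, 200),
  time   = c(1.00, 2.100, 1.100, 2.100, 1.900, 4.00, 2.5)
)

# Hypothetical container: one t.test result per size, keyed by the size.
results <- list()

for (i in unique(data$size)) {
  a <- data$time[data$size == i]

  # Assumption: skip sizes with a single observation, since t.test()
  # needs at least two values to estimate a variance.
  if (length(a) < 2) next

  tt <- t.test(a)
  print(tt)                         # explicit print: loops do not auto-print
  results[[as.character(i)]] <- tt  # keep the object for later inspection
}
```

Keeping the returned objects in a list means individual pieces, such as results[["200"]]$conf.int or results[["200"]]$p.value, can be pulled out afterwards instead of being read off the printed output.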