If you're interested in comparing empirical to simulation distributions, I see two alternatives to your density() approach (which will be sensitive to your choice of bandwidth). Both of the following have been used in my field to look at the fit of empirical response time data to models of human information processing:
(1) estimate some number of quantiles for each set of 1000 replicates, average the quantile points, then compare the results to a similar set of quantiles generated from your simulation data. (2) fit an a priori distribution to each set of 1000 replicates, within each distribution parameter, average across sets, then compare to parameters obtained from fitting the simulation data. #generate some fake simulation data sim.x = rnorm(1e5,100,20) #make up some fake empirical data a=data.frame( set = rep(1:8,each=1e3) ,x=rnorm(8*1e3,100,20)+rexp(8*1e3,50) ) #define a function to compute the geometric mean ##(fails with negative numbers!) geometric.mean = function(x) exp(mean(log(x))) ######## # Quantile approach ######## #set up a data frame to collect empirical quantile data emp.q=expand.grid( set=unique(a$set) ,q=seq(.1,.9,.1) ,x=NA ) #estimate empirical quantiles for(set in unique(a$set)){ emp.q$x[emp.q$set==set] = quantile(a$x[a$set==set],probs=unique(emp.q$q)) } #compute the geomentric mean for each empirical quantile emp.q.means = with( emp.q ,aggregate( x ,list( q=q ) ,geometric.mean ) ) #estimate the quantiles from the simulation data sim.q = quantile(sim.x,unique(emp.q$q)) #assess the fit using sum squared scaled error quantile.sum.sq.sc.err = sum(((emp.q.means$x-sim.q)/sim.q)^2) print(quantile.sum.sq.sc.err) ######## # Fitting approach using the Generalized Lambda distribution ## GLD chosen because it's flexible and easily fit using the ### gld package ######## #install the gld package if necessary if(!('gld' %in% installed.packages())){ install.packages('gld') } #load the gld package library(gld) #set up a data frame to collect empirical GLD parameters emp.gld.par=expand.grid( set=unique(a$set) ,lambda=1:4 ,x=NA ) #estimate empirical GLD parameters ## print-out keeps an eye on convergence (hopefully 0) ## takes a while to complete for(set in unique(a$set)){ fit = starship(a$x[a$set==set]) cat('Set:',set,', convergence:',fit$optim.results$convergence,'\n') emp.gld.par$x[emp.gld.par$set==set] = fit$lambda } #compute the geomentric mean for each empirical GLD parameter emp.gld.par.means = with( emp.gld.par ,aggregate( x ,list( lambda=lambda ) ,geometric.mean ) ) #estimate the GLD parameters from the simulation data sim.fit = starship(sim.x) cat('Sim convergence:',sim.fit$optim.results$convergence,'\n') sim.gld.par = sim.fit$lambda #assess the fit using sum squared scaled error gld.par.sum.sq.sc.err = sum(((emp.gld.par.means$x-sim.gld.par)/sim.gld.par)^2) print(gld.par.sum.sq.sc.err) -- Mike Lawrence Graduate Student Department of Psychology Dalhousie University Looking to arrange a meeting? Check my public calendar: http://tinyurl.com/mikes-public-calendar ~ Certainty is folly... I think. ~ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.