Dear Joseph,

If you do not need to make any inferences, that is, you just want it to look pretty, then drawing a curve by hand is as good a solution as any. Plus, there is no reason for expert testimony to say that the curve does not mean anything.

Sincerely,
KeithC.

-----Original Message-----
From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo....@gmail.com]
Sent: Tuesday, April 27, 2010 2:33 PM
To: Gabor Grothendieck
Cc: r-help@r-project.org
Subject: Re: [R] Curve Fitting/Regression with Multiple Observations

Frankly speaking, I am not looking for such a framework.

The system I'm studying is a communication network (like an M/M/1 queue, but far too complicated to analyze mathematically using classical queueing theory), and the conclusion I want to draw is qualitative rather than quantitative -- a high-level comparative study of various network architectures based on the "equivalence principle" (a concept specific to networking, not in the general sense).

What I want in this regard is a smooth, non-decreasing (hence one-to-one) function built out of simulation data, because later in my processing I need an inverse function of the said curve to find an x value given the y value. That was, in fact, the reason I used the exponential (i.e., non-decreasing function) curve fitting.

Even though I don't need a statistical inference framework for my work, I want to make sure that my use of regression/curve fitting techniques with my simulation data (as a tool for getting the mentioned curve) is proper and a usual practice among experts like you.

To get an answer to my question, I dug a lot through the Internet but have found no clear explanation so far. Your suggestions and examples (always!) are much appreciated, but I am still not sure whether using those regression procedures with the kind of data I described is the right thing to do.

Again, many thanks for your prompt and kind answers,
Joseph

On Tue, Apr 27, 2010 at 8:46 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
> If you are looking for a framework for statistical inference you could
> look at additive models as in the mgcv package which has a book
> associated with it if you need more info. e.g.
>
> library(mgcv)
> fm <- gam(dist ~ s(speed), data = cars)
> summary(fm)
> plot(dist ~ speed, cars, pch = 20)
> fm.ci <- with(predict(fm, se = TRUE), cbind(0, -2*se.fit, 2*se.fit) + c(fit))
> matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))
>
> On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim
> <kyeongsoo....@gmail.com> wrote:
>> Hello Gabor,
>>
>> Many thanks for providing actual examples for the problem!
>>
>> In fact I know how to apply and generate plots using various R
>> functions including the loess, lowess, and smooth.spline procedures.
>>
>> My question, however, is whether applying those procedures directly
>> to data with multiple observations/duplicate points(?) is on a
>> sound basis or not.
>>
>> Before asking my question on the list, I checked the smooth.spline
>> manual page and found a mention of the "cv" option related to
>> duplicate points, but I'm not sure "duplicate points" in the manual
>> has the same meaning as "multiple observations" in my case. To me,
>> the manual seems a bit unclear in this regard.
>>
>> Looking at the "cars" data, I found it has multiple points with the
>> same "speed" but different "dist", which is exactly what I mean by
>> multiple observations, but am still not sure.
>>
>> Regards,
>> Joseph
>>
>> On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck
>> <ggrothendi...@gmail.com> wrote:
>>> This will compute a loess curve and plot it:
>>>
>>> example(loess)
>>> plot(dist ~ speed, cars, pch = 20)
>>> lines(cars$speed, fitted(cars.lo))
>>>
>>> Also this directly plots it but does not give you the values of the
>>> curve separately:
>>>
>>> library(lattice)
>>> xyplot(dist ~ speed, cars, type = c("p", "smooth"))
>>>
>>> On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim
>>> <kyeongsoo....@gmail.com> wrote:
>>>> I recently came to realize the true power of R for statistical
>>>> analysis -- mainly for post-processing of data from large-scale
>>>> simulations -- and have been converting many of my existing
>>>> Python (SciPy) scripts to ones based on R and/or Perl.
>>>>
>>>> In the middle of this conversion, I revisited the problem of curve
>>>> fitting for simulation data with multiple observations resulting
>>>> from repetitions.
>>>>
>>>> In the past, I first processed the simulation data (i.e., multiple
>>>> y's from repetitions) to get a mean with a confidence interval for
>>>> a given value of x (independent variable) and then applied a spline
>>>> procedure to those mean values only (i.e., unique pairs of (x_i,
>>>> y_i) for i=1, 2, ...) to get a smoothed curve. Because of rather
>>>> large confidence intervals, however, the resulting curves were
>>>> hardly smooth enough for my purpose, so I had to fix the function
>>>> to an exponential and use least-squares methods to fit its
>>>> parameters to the data.
>>>>
>>>> From a plot with confidence intervals, it's rather easy for one to
>>>> visually and manually(?) figure out a smoothed curve for it.
>>>> So I'm thinking right now of directly applying a spline (or whatever
>>>> regression procedure suits this purpose) to the simulation data
>>>> with repetitions rather than to the means.
>>>> The simulation data in this case looks like this (assuming three
>>>> repetitions):
>>>>
>>>> # x y
>>>> 1 1.2
>>>> 1 0.9
>>>> 1 1.3
>>>> 2 2.2
>>>> 2 1.7
>>>> 2 2.0
>>>> ... ....
>>>>
>>>> So my idea is to let the spline procedure handle the fluctuations
>>>> in the data (i.e., in repetitions) by itself.
>>>> But I wonder whether this direct application of spline procedures
>>>> to data with multiple observations makes sense from the
>>>> statistical analysis (i.e., theoretical) point of view.
>>>>
>>>> It may be a stupid question and quite obvious to many, but
>>>> personally I don't know where to start.
>>>> It would be greatly appreciated if anyone could shed some light on
>>>> this.
>>>>
>>>> Many thanks in advance,
>>>> Joseph
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
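
A hedged illustration of the question raised in this thread, not taken from any of the posts: R's smooth.spline() pools observations that share the same x value (their y values are averaged and their weights summed), so with the smoothing level held fixed (here via df), fitting the raw repetitions should give the same curve as fitting the per-x means weighted by the number of repetitions. The data below are synthetic, and the names fit_raw, fit_mean, inverse_fit and the choice df = 5 are assumptions made only for this sketch. A smoothing spline is not guaranteed to be monotone, and an inverse exists only where the curve is strictly increasing, so the sketch checks this on a grid before inverting numerically with uniroot().

```r
# Synthetic data for illustration only (not from the thread):
# 10 distinct x values, 3 repetitions each, roughly exponential trend.
set.seed(1)
x <- rep(1:10, each = 3)
y <- exp(0.2 * x) + rnorm(length(x), sd = 0.05)

# (a) Fit directly to the raw data with repeated x values.
#     smooth.spline() pools tied x values internally.
fit_raw <- smooth.spline(x, y, df = 5)

# (b) Fit to the per-x means, weighted by the number of repetitions.
ux  <- sort(unique(x))
yb  <- as.numeric(tapply(y, x, mean))   # mean of y at each distinct x
cnt <- as.numeric(table(x))             # number of repetitions at each x
fit_mean <- smooth.spline(ux, yb, w = cnt, df = 5)

# Compare the two fitted curves on a fine grid.
grid   <- seq(min(x), max(x), length.out = 200)
p_raw  <- predict(fit_raw, grid)$y
p_mean <- predict(fit_mean, grid)$y
same_curve <- isTRUE(all.equal(p_raw, p_mean, tolerance = 1e-6))

# Monotonicity is NOT guaranteed by a smoothing spline; check before inverting.
is_increasing <- all(diff(p_raw) > 0)

# Numerical inverse: find the x at which the fitted curve equals y0.
# Assumes y0 lies between the fitted values at the ends of the x range.
inverse_fit <- function(y0) {
  uniroot(function(x0) predict(fit_raw, x0)$y - y0,
          interval = range(x), tol = 1e-8)$root
}
x_at_3 <- inverse_fit(3)
```

If the monotonicity check fails for a given data set, one option (again only a suggestion) is a monotone interpolant such as splinefun(..., method = "hyman") applied to monotone per-x means; the inversion step with uniroot() stays the same.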