... Realizing, of course, that after such data dredging, any subsequent inference is highly biased.
Cheers,
Bert

On Tuesday, April 28, 2015, Jim Lemon <drjimle...@gmail.com> wrote:

> Hi Lalitha,
> If you want to find a reasonable model distribution for your data, try
> plotting the histogram of the variable you want to predict and compare
> this to the density curves of the distributions that you think will
> fit. So for example:
>
> # plot a histogram of a uniform distribution
> hist(seq(1,10,length.out=100))
> # overlay a normal density function with the same mean
> lines(seq(1,10,length.out=91),dnorm(seq(1,10,by=0.1),mean=5.5)*30)
>
> Not a very good fit, but:
>
> hist(rnorm(100,5.5))
> lines(seq(1,10,length.out=91),dnorm(seq(1,10,by=0.1),mean=5.5)*90)
>
> Much better. You can then perform a "goodness of fit" test if you need
> it to justify your choice of distribution. In most cases, you will
> have to find a "family" (link function) to use in a generalized linear
> modeling (glm) test.
>
> Another approach is to use a non-parametric test if one gives an
> appropriate answer to your question.
>
> Jim
>
>
> On Tue, Apr 28, 2015 at 5:07 AM, David Winsemius <dwinsem...@comcast.net> wrote:
> >
> > On Apr 27, 2015, at 10:50 AM, Lalitha Viswanathan wrote:
> >
> >> Hi
> >> I have a dataset as below
> >> Price Country Reliability Mileage Type Weight Disp. HP
> >>
> >> 8895 USA 4 33 Small 2560 97 113
> >> (Hundreds of rows)
> >>
> >> I am trying to find the best possible distribution to use, to find p-values
> >> and compute which factors most influence efficiency.
> >
> > "Finding p-values" is a task that requires research questions. You
> > obviously have some sort of meaning attached to the word "efficiency" but
> > have not stated what it is. This appears to be a request for a statistical
> > tutorial on a topic that has not been described. (And if this is course
> > homework, then it is off-topic for r-help.)
> >
> >> Any starting points for the functions I could use, or similar examples I
> >> could follow, would be a start.
> >> I am a relative novice at R having used it many years ago and am now
> >> getting back to it.
> >> So looking for pointers
> >>
> >> Thanks
> >>
> >> [[alternative HTML version deleted]]
> >
> > The Posting Guide suggests that you create a small example in R code and
> > describe your question more clearly (if it's not homework.)
> >
> >> ______________________________________________
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius
> > Alameda, CA, USA

--
Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom."
Clifford Stoll