This would be better posted to a statistics list such as stats.stackexchange.com, as it is largely about statistical methodology, not R code. Once you have determined which kinds of methods you want, you might then post back here -- or, better yet, just search! -- for packages that implement those methods in R.
Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is
certainly not wisdom."
   -- Clifford Stoll

On Mon, May 4, 2015 at 1:40 AM, Lalitha Viswanathan
<lalitha.viswanatha...@gmail.com> wrote:
> Hi
> I used the MASS library -- library(MASS) -- after reading the examples at
> http://www.statmethods.net/stats/regression.html :
>
> fit <- lm(Mileage ~ Disp + HP + Weight + Reliability, data = newx)
> step <- stepAIC(fit, direction = "both")
> step$anova   # display results
>
> It showed the most relevant variables affecting Mileage.
> While that is a start, I am looking for a model that fits the entire data
> (including Mileage), not just the factors that influence Mileage.
> Multi-model inference / selection.
>
> I was reading about glmulti.
> Are there any other packages I could look at for inferring models that
> best fit the data?
>
> To use nlm / nls, I need a formula as one of the parameters, and I am
> looking for functions that will help infer that formula from the data.
>
> Thanks
> lalitha
>
> On Sun, May 3, 2015 at 11:33 PM, Prashant Sethi
> <theseth.prash...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm not an expert in data analysis (a beginner still learning the tricks
>> of the trade), but since you are trying to determine the correlation of a
>> dependent variable with a number of predictor variables, I believe you
>> should try a regression analysis of your model. The function you'll use
>> for that is lm(). You can use forward selection or backward elimination
>> to build a model that includes the most important factors.
>>
>> Maybe you can give it a try.
>>
>> Thanks and regards,
>> Prashant Sethi
>>
>> On 3 May 2015 23:18, "Lalitha Viswanathan"
>> <lalitha.viswanatha...@gmail.com> wrote:
>>
>>> Hi
>>> I am sorry -- I saved the file after removing the dot after Disp (I was
>>> going wrong on a read.delim call that threw an error about the header,
>>> etc.). The dot was not the culprit, but I continued to leave it out.
>>> Let me paste the full code here.
>>>
>>> x <- read.table("/Users/Documents/StatsTest/fuelEfficiency.txt",
>>>                 header = TRUE, sep = "\t")
>>> x <- data.frame(x)
>>> for (i in unique(x$Country)) {
>>>   print(i)
>>>   y <- subset(x, x$Country == i)
>>>   print(y)
>>> }
>>> newx <- subset(x, select = c(Price, Reliability, Mileage, Weight, Disp, HP))
>>> cor(newx, method = "pearson")
>>> my.cor <- cor.test(newx$Weight, newx$Price, method = "spearman")
>>> my.cor <- cor.test(newx$Weight, newx$HP, method = "spearman")
>>> my.cor <- cor.test(newx$Disp, newx$HP, method = "spearman")
>>>
>>> Putting exact=NULL still doesn't remove the warning:
>>>
>>> my.cor <- cor.test(newx$Disp, newx$HP, method = "kendall", exact = NULL)
>>>
>>> I tried to find the correlation coefficient for various combinations of
>>> variables, but am unable to interpret the results. (Results pasted below
>>> in an earlier post.)
>>>
>>> I followed it up with a normality test:
>>>
>>> shapiro.test(newx$Disp)
>>> shapiro.test(newx$HP)
>>>
>>> Then I decided to do a kruskal.test(newx), with the result
>>>
>>> Kruskal-Wallis chi-squared = 328.94, df = 5, p-value < 2.2e-16
>>>
>>> The question is: I am trying to find the factors influencing efficiency
>>> (in this case, mileage).
>>>
>>> What range of functions / examples should I be looking at to find a
>>> factor or combination of factors influencing efficiency?
>>>
>>> Any pointers will be helpful.
>>>
>>> Thanks
>>> Lalitha
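A rough sketch of the kind of subset search being asked about here: the code below is not from the thread; it assumes the newx data frame built above and uses the leaps package as a stand-in (glmulti, mentioned above but not shown, searches candidate models in a similar spirit and ranks them by an information criterion).

    ## Sketch only: exhaustive (all-subsets) search over the candidate
    ## predictors.  Reliability has missing values in this data set, so
    ## complete cases are taken first.
    library(leaps)                      # install.packages("leaps") if needed
    cc   <- newx[complete.cases(newx), ]
    fits <- regsubsets(Mileage ~ Price + Reliability + Weight + Disp + HP,
                       data = cc, nvmax = 5)
    summ <- summary(fits)
    summ$which                          # predictors included at each model size
    summ$adjr2                          # adjusted R^2 for each model size
    summ$bic                            # BIC for each model size (smaller is better)
    summ$which[which.min(summ$bic), ]   # best subset by BIC

MASS::stepAIC(), as used above, walks through models one add/drop at a time; an exhaustive search like this (or the glmulti and MuMIn::dredge() approaches) scores every candidate subset instead, which is closer to the multi-model selection being asked for.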
>>> On Sun, May 3, 2015 at 2:49 PM, Lalitha Viswanathan
>>> <lalitha.viswanatha...@gmail.com> wrote:
>>>
>>> > Hi
>>> > I have a dataset of the type attached.
>>> > Here's my code thus far.
>>> >
>>> > dataset <- data.frame(read.delim("data", sep = "\t", header = TRUE))
>>> > newData <- subset(dataset, select = c(Price, Reliability, Mileage,
>>> >                                       Weight, Disp, HP))
>>> > cor(newData, method = "pearson")
>>> >
>>> > Results are
>>> >
>>> >                  Price Reliability    Mileage     Weight       Disp         HP
>>> > Price        1.0000000          NA -0.6537541  0.7017999  0.4856769  0.6536433
>>> > Reliability         NA           1         NA         NA         NA         NA
>>> > Mileage     -0.6537541          NA  1.0000000 -0.8478541 -0.6931928 -0.6667146
>>> > Weight       0.7017999          NA -0.8478541  1.0000000  0.8032804  0.7629322
>>> > Disp         0.4856769          NA -0.6931928  0.8032804  1.0000000  0.8181881
>>> > HP           0.6536433          NA -0.6667146  0.7629322  0.8181881  1.0000000
>>> >
>>> > It appears that Weight and Price, Weight and Disp, Weight and HP, Disp
>>> > and HP, and HP and Price are strongly correlated.
>>> > To find the statistical significance, I am trying
>>> >
>>> > sample.correln <- cor.test(newData$Disp, newData$HP, method = "kendall",
>>> >                            exact = NULL)
>>> >
>>> >         Kendall's rank correlation tau
>>> >
>>> > data:  newx$Disp and newx$HP
>>> > z = 7.2192, p-value = 5.229e-13
>>> > alternative hypothesis: true tau is not equal to 0
>>> > sample estimates:
>>> >       tau
>>> > 0.6563871
>>> >
>>> > If I try the same with
>>> >
>>> > sample.correln <- cor.test(newData$Disp, newData$HP, method = "spearman",
>>> >                            exact = NULL)
>>> >
>>> > I get
>>> >
>>> > Warning message:
>>> > In cor.test.default(newx$Disp, newx$HP, method = "spearman", exact = NULL) :
>>> >   Cannot compute exact p-value with ties
>>> > > sample.correln
>>> >
>>> >         Spearman's rank correlation rho
>>> >
>>> > data:  newx$Disp and newx$HP
>>> > S = 5716.8, p-value < 2.2e-16
>>> > alternative hypothesis: true rho is not equal to 0
>>> > sample estimates:
>>> >       rho
>>> > 0.8411566
>>> >
>>> > I am not sure how to interpret these values.
>>> > Basically, I am trying to figure out which combination of factors
>>> > influences efficiency.
>>> >
>>> > Thanks
>>> > Lalitha
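On interpreting those values: tau and rho both measure the strength and direction of a monotonic association (here, Disp and HP move together strongly), and the very small p-values say that a zero association is not plausible for these data. The "ties" warning comes from requesting an exact p-value. A small sketch, not from the thread and assuming the newData frame built above, that asks for the normal approximation instead:

    ## Sketch only: with tied values an exact p-value cannot be computed,
    ## which is what triggers the warning; exact = FALSE switches to the
    ## asymptotic (normal) approximation and avoids it.
    res <- cor.test(newData$Disp, newData$HP, method = "spearman", exact = FALSE)
    res$estimate   # rho: strength and direction of the monotonic association
    res$p.value    # well below 0.05 here, so rho is significantly non-zero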