Hi
I used the MASS library, following the examples at
http://www.statmethods.net/stats/regression.html:

library(MASS)
fit  <- lm(Mileage ~ Disp + HP + Weight + Reliability, data = newx)
step <- stepAIC(fit, direction = "both")
step$anova   # display results
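
A minimal sketch of how the selected model can then be inspected, assuming the fit, step and newx objects from the snippet above (stepAIC() returns an ordinary fitted lm object, so the usual accessors apply):

library(MASS)

fit  <- lm(Mileage ~ Disp + HP + Weight + Reliability, data = newx)
step <- stepAIC(fit, direction = "both", trace = FALSE)  # trace = FALSE suppresses the step-by-step printout

formula(step)   # the formula the AIC search settled on
summary(step)   # coefficients and fit statistics for that final model
step$anova      # the sequence of terms added/dropped, as displayed above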
The stepAIC output showed the most relevant variables affecting Mileage. While that is a
start, I am looking for a model that fits the entire data (including Mileage), not just the
factors that influence Mileage: multi-model inference / selection. I have been reading about
glmulti. Are there any other packages I could look at for inferring models that best fit the
data? To use nlm / nls I need a formula as one of the parameters, and I am looking for
functions that will help infer that formula from the data.

Thanks
Lalitha

On Sun, May 3, 2015 at 11:33 PM, Prashant Sethi <theseth.prash...@gmail.com> wrote:

> Hi,
>
> I'm not an expert in data analysis (a beginner still learning the tricks of the trade),
> but I believe that in your case, since you are trying to determine the correlation of a
> dependent variable with a number of factor variables, you should try a regression
> analysis of your model. The function you will use for that is lm(). You can use forward
> building or backward elimination to build your model with the most critical factors
> included.
>
> Maybe you can give it a try.
>
> Thanks and regards,
> Prashant Sethi
>
> On 3 May 2015 23:18, "Lalitha Viswanathan" <lalitha.viswanatha...@gmail.com> wrote:
>
>> Hi
>> I am sorry, I saved the file after removing the dot after Disp (I had been going wrong
>> on a read.delim that threw an error about !header, etc.; the dot was not the culprit,
>> but I continued to leave it out).
>> Let me paste the full code here.
>>
>> x <- read.table("/Users/Documents/StatsTest/fuelEfficiency.txt", header = TRUE, sep = "\t")
>> x <- data.frame(x)
>> for (i in unique(x$Country)) { print(i); y <- subset(x, x$Country == i); print(y) }
>> newx <- subset(x, select = c(Price, Reliability, Mileage, Weight, Disp, HP))
>> cor(newx, method = "pearson")
>> my.cor <- cor.test(newx$Weight, newx$Price, method = "spearman")
>> my.cor <- cor.test(newx$Weight, newx$HP, method = "spearman")
>> my.cor <- cor.test(newx$Disp, newx$HP, method = "spearman")
>>
>> Setting exact = NULL still doesn't remove the warning:
>> my.cor <- cor.test(newx$Disp, newx$HP, method = "kendall", exact = NULL)
>>
>> I tried to find the correlation coefficient for various combinations of variables, but
>> I am unable to interpret the results. (Results pasted below in an earlier post.)
>>
>> I followed it up with a normality test:
>> shapiro.test(newx$Disp)
>> shapiro.test(newx$HP)
>>
>> Then I decided to do kruskal.test(newx), with the result
>> Kruskal-Wallis chi-squared = 328.94, df = 5, p-value < 2.2e-16
>>
>> The question is: I am trying to find the factors influencing efficiency (in this case,
>> Mileage).
>>
>> What range of functions / examples should I be looking at to find a factor or
>> combination of factors influencing efficiency?
>>
>> Any pointers will be helpful.
>>
>> Thanks
>> Lalitha
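
A note on the "Cannot compute exact p-value with ties" warning above: exact = NULL is already the default in cor.test(), so passing it explicitly changes nothing; with tied values cor.test() cannot compute an exact p-value for the rank-based methods, and asking for the asymptotic approximation with exact = FALSE avoids the warning. A minimal sketch, assuming the newx data frame from the quoted code:

# exact = NULL (the default) tries an exact p-value and warns when ties make that impossible
cor.test(newx$Disp, newx$HP, method = "spearman", exact = NULL)

# exact = FALSE requests the normal-approximation p-value directly, so no warning is issued
cor.test(newx$Disp, newx$HP, method = "spearman", exact = FALSE)
cor.test(newx$Disp, newx$HP, method = "kendall",  exact = FALSE)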
>> On Sun, May 3, 2015 at 2:49 PM, Lalitha Viswanathan <lalitha.viswanatha...@gmail.com> wrote:
>>
>> > Hi
>> > I have a dataset of the type attached.
>> > Here's my code thus far.
>> >
>> > dataset <- data.frame(read.delim("data", sep = "\t", header = TRUE))
>> > newData <- subset(dataset, select = c(Price, Reliability, Mileage, Weight, Disp, HP))
>> > cor(newData, method = "pearson")
>> >
>> > The results are:
>> >
>> >                   Price Reliability     Mileage      Weight        Disp          HP
>> > Price         1.0000000          NA  -0.6537541   0.7017999   0.4856769   0.6536433
>> > Reliability          NA           1          NA          NA          NA          NA
>> > Mileage      -0.6537541          NA   1.0000000  -0.8478541  -0.6931928  -0.6667146
>> > Weight        0.7017999          NA  -0.8478541   1.0000000   0.8032804   0.7629322
>> > Disp          0.4856769          NA  -0.6931928   0.8032804   1.0000000   0.8181881
>> > HP            0.6536433          NA  -0.6667146   0.7629322   0.8181881   1.0000000
>> >
>> > It appears that Weight and Price, Weight and Disp, Weight and HP, Disp and HP, and
>> > HP and Price are strongly correlated.
>> > To find the statistical significance, I am trying
>> >
>> > sample.correln <- cor.test(newData$Disp, newData$HP, method = "kendall", exact = NULL)
>> >
>> >         Kendall's rank correlation tau
>> >
>> > data:  newx$Disp and newx$HP
>> > z = 7.2192, p-value = 5.229e-13
>> > alternative hypothesis: true tau is not equal to 0
>> > sample estimates:
>> >       tau
>> > 0.6563871
>> >
>> > If I try the same with
>> >
>> > sample.correln <- cor.test(newData$Disp, newData$HP, method = "spearman", exact = NULL)
>> >
>> > I get
>> >
>> > Warning message:
>> > In cor.test.default(newx$Disp, newx$HP, method = "spearman", exact = NULL) :
>> >   Cannot compute exact p-value with ties
>> >
>> > > sample.correln
>> >
>> >         Spearman's rank correlation rho
>> >
>> > data:  newx$Disp and newx$HP
>> > S = 5716.8, p-value < 2.2e-16
>> > alternative hypothesis: true rho is not equal to 0
>> > sample estimates:
>> >       rho
>> > 0.8411566
>> >
>> > I am not sure how to interpret these values.
>> > Basically, I am trying to figure out which combination of factors influences efficiency.
>> >
>> > Thanks
>> > Lalitha
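
One note on the correlation matrix above: the NA row and column for Reliability come from missing values in that variable rather than from the call itself; cor() can be told how to handle them via its use argument. A minimal sketch, assuming the newData frame from the quoted code:

# compute each pairwise correlation from the rows where both variables are non-missing,
# so Reliability no longer comes out as NA
cor(newData, method = "pearson", use = "pairwise.complete.obs")

# the rank-based (Spearman) version of the same matrix, less sensitive to outliers and
# to nonlinear but monotone relationships
cor(newData, method = "spearman", use = "pairwise.complete.obs")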
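
On the original question of comparing whole candidate models rather than single predictors: packages such as glmulti (mentioned above) or MuMIn automate fitting and ranking many candidate formulas, but the basic idea can be illustrated with base R alone. A minimal sketch, assuming the newx data frame from earlier in the thread; the candidate formulas are only examples, not a recommendation:

# fit all candidates on one common complete-case subset so their AICs are comparable
dat <- na.omit(newx)

candidates <- list(
  m1 = lm(Mileage ~ Weight,                           data = dat),
  m2 = lm(Mileage ~ Weight + Disp,                    data = dat),
  m3 = lm(Mileage ~ Weight + Disp + HP,               data = dat),
  m4 = lm(Mileage ~ Weight + Disp + HP + Reliability, data = dat)
)

sapply(candidates, AIC)      # lower AIC = better trade-off between fit and complexity
best <- candidates[[which.min(sapply(candidates, AIC))]]
summary(best)                # coefficients of the best-ranked candidate

Tools like glmulti are designed to generate and rank such candidate sets automatically, rather than requiring each formula to be written out by hand.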