Hi

I am sorry: I saved the file after removing the dot after "Disp" (I was going wrong on a read.delim call that threw an error about !header, etc.). The dot was not the culprit, but I have continued to leave it out. Let me paste the full code here:

x <- read.table("/Users/Documents/StatsTest/fuelEfficiency.txt", header=TRUE, sep="\t")
x <- data.frame(x)

for (i in unique(x$Country)) {
  print(i)
  y <- subset(x, x$Country == i)
  print(y)
}

newx <- subset(x, select = c(Price, Reliability, Mileage, Weight, Disp, HP))
cor(newx, method="pearson")

my.cor <- cor.test(newx$Weight, newx$Price, method="spearman")
my.cor <- cor.test(newx$Weight, newx$HP, method="spearman")
my.cor <- cor.test(newx$Disp, newx$HP, method="spearman")

Setting exact=NULL still does not remove the warning:

my.cor <- cor.test(newx$Disp, newx$HP, method="kendall", exact=NULL)

I tried to find the correlation coefficient for various combinations of variables, but I am unable to interpret the results. (The results are pasted below, in an earlier post.)
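As a side note on that warning, here is a minimal sketch of what I believe should silence it (my assumption, not verified against the full data set): exact=FALSE asks cor.test() for the asymptotic p-value instead of the exact one, and use="pairwise.complete.obs" keeps the NAs in Reliability from blanking out whole rows and columns of the correlation matrix.

# Sketch: handle the NAs in Reliability pairwise rather than letting them
# propagate through the whole correlation matrix.
cor(newx, method = "pearson", use = "pairwise.complete.obs")

# Sketch: exact = FALSE requests the asymptotic p-value, which avoids the
# "Cannot compute exact p-value with ties" warning (with exact = NULL, the
# default, R still attempts the exact computation for small samples and
# warns when ties make that impossible).
my.cor <- cor.test(newx$Disp, newx$HP, method = "kendall", exact = FALSE)
my.cor <- cor.test(newx$Disp, newx$HP, method = "spearman", exact = FALSE)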
I followed it up with normality tests:

shapiro.test(newx$Disp)
shapiro.test(newx$HP)

Then I decided to run kruskal.test(newx), with the result:

Kruskal-Wallis chi-squared = 328.94, df = 5, p-value < 2.2e-16

My question: I am trying to find the factors influencing efficiency (in this case, Mileage). What range of functions / examples should I be looking at to find a factor, or combination of factors, that influences efficiency? (A rough sketch of the kind of model I have in mind follows after the quoted message below.)

Any pointers will be helpful.

Thanks
Lalitha

On Sun, May 3, 2015 at 2:49 PM, Lalitha Viswanathan <lalitha.viswanatha...@gmail.com> wrote:

> Hi
> I have a dataset of the type attached. Here's my code thus far:
>
> dataset <- data.frame(read.delim("data", sep="\t", header=TRUE))
> newData <- subset(dataset, select = c(Price, Reliability, Mileage, Weight, Disp, HP))
> cor(newData, method="pearson")
>
> The results are:
>
>                  Price Reliability    Mileage     Weight       Disp         HP
> Price        1.0000000          NA -0.6537541  0.7017999  0.4856769  0.6536433
> Reliability         NA           1         NA         NA         NA         NA
> Mileage     -0.6537541          NA  1.0000000 -0.8478541 -0.6931928 -0.6667146
> Weight       0.7017999          NA -0.8478541  1.0000000  0.8032804  0.7629322
> Disp         0.4856769          NA -0.6931928  0.8032804  1.0000000  0.8181881
> HP           0.6536433          NA -0.6667146  0.7629322  0.8181881  1.0000000
>
> It appears that Weight and Price, Weight and Disp, Weight and HP, Disp and HP,
> and HP and Price are strongly correlated.
>
> To find the statistical significance, I am trying
>
> sample.correln <- cor.test(newData$Disp, newData$HP, method="kendall", exact=NULL)
>
>         Kendall's rank correlation tau
>
> data:  newx$Disp and newx$HP
> z = 7.2192, p-value = 5.229e-13
> alternative hypothesis: true tau is not equal to 0
> sample estimates:
>       tau
> 0.6563871
>
> If I try the same with
>
> sample.correln <- cor.test(newData$Disp, newData$HP, method="spearman", exact=NULL)
>
> I get
>
> Warning message:
> In cor.test.default(newx$Disp, newx$HP, method = "spearman", exact = NULL) :
>   Cannot compute exact p-value with ties
>
> sample.correln
>
>         Spearman's rank correlation rho
>
> data:  newx$Disp and newx$HP
> S = 5716.8, p-value < 2.2e-16
> alternative hypothesis: true rho is not equal to 0
> sample estimates:
>       rho
> 0.8411566
>
> I am not sure how to interpret these values. Basically, I am trying to figure
> out which combination of factors influences efficiency.
>
> Thanks
> Lalitha

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
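To make the question above concrete, here is the rough sketch I had in mind. It is only an assumption on my part that a multiple linear regression is an appropriate way to look for factors influencing Mileage; the predictor list is simply the numeric columns of the data set, and lm(), na.omit(), step() and summary() are standard R functions.

# Sketch only, assuming a multiple linear regression of Mileage on the other
# numeric variables is a reasonable starting point.
complete <- na.omit(x)   # Reliability has NAs; step() needs every candidate
                         # model fitted to the same rows
fit <- lm(Mileage ~ Price + Reliability + Weight + Disp + HP, data = complete)
summary(fit)             # coefficients, t-tests and R-squared for the full model

fit.step <- step(fit)    # AIC-based selection (effectively backward here,
                         # since no wider scope is given)
summary(fit.step)

Is something along these lines, or a different family of functions altogether, what I should be reading up on?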