Hi everybody, I'm learning R and statistics and wanted to use a real-world scenario for my study. However, a colleague is convinced that my use of the linear regression function is incorrect.
Could you please help clarify whether what I'm doing is wrong, and in particular why it is wrong?

I was thinking of using a simple linear regression as a basic predictive model. I have a number of stores in a region; for each store I know the Tot Revenue in a given year and the Tot Active Customers for that year. I want to use Tot Active Customers as my independent variable x and Tot Revenue as my dependent variable y. For example:

StoreName | Tot Revenue | Tot Active Customers
Store A   | 200,000     | 120
Store B   | 230,000     | 129
Store C   | 220,000     | 119

The sample data has about 65 stores in total. I don't know the average transaction value or whether a customer has transacted more than once.

> LineBestFit = lm(TotRevenues ~ TotActiveCustomers)
> plot(TotRevenues ~ TotActiveCustomers)
> abline(LineBestFit)

I've plotted the data and the line, and I get a strong positive linear pattern with a couple of outliers. Nonetheless, the plot shows that the more active customers a store has, the more revenue it makes (which is expected).

Now my objective is to calculate the slope b (the steepness of the line) so that I can say that for x additional active customers I get a y increase in revenue, and consequently attempt to predict targets for new stores. Is this right?

Any help would be appreciated,
L

--
View this message in context: http://r.789695.n4.nabble.com/Can-I-use-a-simple-linear-regression-in-this-situation-tp4553384p4553384.html
Sent from the R help mailing list archive at Nabble.com.
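
P.S. In case it helps, here is a minimal, self-contained sketch of what I mean by reading off the slope and predicting for a new store. It uses only the three example rows from the table above, not my full sample of about 65 stores, so the numbers it produces are purely illustrative. The data frame name `stores` and the new store with 125 active customers are assumptions I made up for this example.

```r
# Illustrative data only: the three example stores from the table above,
# not the full sample of about 65 stores.
stores <- data.frame(
  StoreName          = c("Store A", "Store B", "Store C"),
  TotRevenues        = c(200000, 230000, 220000),
  TotActiveCustomers = c(120, 129, 119)
)

# Fit revenue as a linear function of active customers.
LineBestFit <- lm(TotRevenues ~ TotActiveCustomers, data = stores)

# Intercept and slope b of the fitted line.
print(coef(LineBestFit))
slope <- coef(LineBestFit)[["TotActiveCustomers"]]

# Predicted revenue for a hypothetical new store with 125 active customers,
# together with a 95% prediction interval.
new_store <- data.frame(TotActiveCustomers = 125)
predicted <- predict(LineBestFit, newdata = new_store, interval = "prediction")
print(predicted)
```

With so few points the prediction interval is very wide. On the real data I would also look at summary(LineBestFit) and the residuals before trusting the slope, especially given the outliers.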