Hi,

This is probably because factor levels are arbitrary labels unless the factor is ordinal, and even then the quantitative relationship between levels is unclear. The model estimates a separate coefficient for each level it has seen, so it has no way to predict the response for unseen factor levels.
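A minimal sketch of the difference, using made-up data rather than the poster's measurements. The data frame `toy`, the column `No_databases_f`, and the simulated coefficients are all assumptions for illustration; the training values 1 and 4 were chosen only because the quoted summary shows a coefficient named `No_databases4`.

```r
# Hypothetical toy data: NOT the original 'throughput' data set.
set.seed(1)
toy <- data.frame(No_databases = rep(c(1, 4), each = 20))
toy$Throughput <- exp(5 + 0.1 * toy$No_databases + rnorm(40, sd = 0.1))
toy$No_databases_f <- factor(toy$No_databases)

# New data containing values that were never observed during fitting.
new_data <- data.frame(No_databases = c(200, 400))
new_data$No_databases_f <- factor(new_data$No_databases)

# Factor predictor: one coefficient per observed level, so predict()
# stops with "factor No_databases_f has new levels 200, 400".
fit_factor <- glm(log(Throughput) ~ No_databases_f, data = toy)
res <- try(predict(fit_factor, new_data), silent = TRUE)
factor_failed <- inherits(res, "try-error")

# Numeric predictor: a single slope, so any numeric value can be plugged in.
fit_numeric <- glm(log(Throughput) ~ No_databases, data = toy)
pred_log <- predict(fit_numeric, new_data)  # predictions of log(Throughput)
pred_throughput <- exp(pred_log)            # back-transform to original scale
```

Because the response in the formula is log(Throughput), the predictions come back on the log scale and need exp() to get throughput. The numeric fit also assumes the effect is linear in the number of databases, so predictions far outside the observed range should be treated with caution.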
Does it make sense to treat 'No_databases' as numeric instead of a factor variable?

Weidong

On Mon, Dec 26, 2011 at 6:29 AM, Giovanni Azua <brave...@gmail.com> wrote:
> Hello,
>
> I have tried reading the documentation and googling for the answer but
> reviewing the online matches I end up more confused than before.
>
> My problem is apparently simple. I fit a glm model (2^k experiment), and then
> I would like to predict the response variable (Throughput) for unseen factor
> levels.
>
> When I try to predict I get the following error:
>> throughput.pred <- predict(throughput.fit,experiments,type="response")
> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
>   factor 'No_databases' has new level(s) 200, 400, 600, 800, 1000
>
> Of course these are new factor levels, it is exactly what I am trying to
> achieve i.e. extrapolate the values of Throughput.
>
> Can anyone please advice? Below I include all details.
>
> Thanks in advance,
> Best regards,
> Giovanni
>
>> # define the extreme (factors and levels)
>> experiments <- expand.grid(No_databases = seq(1000,100,by=-200),
> +                            Partitioning = c("sharding", "replication"),
> +                            No_middlewares = seq(500,100,by=-100),
> +                            Queue_size = c(100))
>> experiments$No_databases <- as.factor(experiments$No_databases)
>> experiments$Partitioning <- as.factor(experiments$Partitioning)
>> experiments$No_middlewares <- as.factor(experiments$No_middlewares)
>> experiments$Queue_size <- as.factor(experiments$Queue_size)
>> str(experiments)
> 'data.frame': 50 obs. of 4 variables:
>  $ No_databases  : Factor w/ 5 levels "200","400","600",..: 5 4 3 2 1 5 4 3 2 1 ...
>  $ Partitioning  : Factor w/ 2 levels "sharding","replication": 1 1 1 1 1 2 2 2 2 2 ...
>  $ No_middlewares: Factor w/ 5 levels "100","200","300",..: 5 5 5 5 5 5 5 5 5 5 ...
>  $ Queue_size    : Factor w/ 1 level "100": 1 1 1 1 1 1 1 1 1 1 ...
>  - attr(*, "out.attrs")=List of 2
>   ..$ dim     : Named int 5 2 5 1
>   .. ..- attr(*, "names")= chr "No_databases" "Partitioning" "No_middlewares" "Queue_size"
>   ..$ dimnames:List of 4
>   .. ..$ No_databases  : chr "No_databases=1000" "No_databases= 800" "No_databases= 600" "No_databases= 400" ...
>   .. ..$ Partitioning  : chr "Partitioning=sharding" "Partitioning=replication"
>   .. ..$ No_middlewares: chr "No_middlewares=500" "No_middlewares=400" "No_middlewares=300" "No_middlewares=200" ...
>   .. ..$ Queue_size    : chr "Queue_size=100"
>> head(experiments)
>   No_databases Partitioning No_middlewares Queue_size
> 1         1000     sharding            500        100
> 2          800     sharding            500        100
> 3          600     sharding            500        100
> 4          400     sharding            500        100
> 5          200     sharding            500        100
> 6         1000  replication            500        100
>> # or
>> throughput.fit <- glm(log(Throughput)~(No_databases*No_middlewares)+Partitioning+Queue_size,
> +                       data=throughput)
>> summary(throughput.fit)
>
> Call:
> glm(formula = log(Throughput) ~ (No_databases * No_middlewares) +
>     Partitioning + Queue_size, data = throughput)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -2.5966  -0.6612  -0.1944   0.5548   3.2136
>
> Coefficients:
>                               Estimate Std. Error t value Pr(>|t|)
> (Intercept)                    5.74701    0.09127  62.970  < 2e-16 ***
> No_databases4                  0.43309    0.10985   3.943 8.66e-05 ***
> No_middlewares2               -1.99374    0.11035 -18.067  < 2e-16 ***
> No_middlewares4               -1.23004    0.10969 -11.214  < 2e-16 ***
> Partitioningreplication        0.33291    0.06181   5.386 9.15e-08 ***
> Queue_size100                  0.15850    0.06181   2.564   0.0105 *
> No_databases4:No_middlewares2  2.71525    0.15262  17.791  < 2e-16 ***
> No_databases4:No_middlewares4  1.94191    0.15226  12.754  < 2e-16 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for gaussian family taken to be 0.8921778)
>
>     Null deviance: 2175.58 on 936 degrees of freedom
> Residual deviance:  828.83 on 929 degrees of freedom
> AIC: 2562.2
>
> Number of Fisher Scoring iterations: 2
>
>> throughput.pred <- predict(throughput.fit,experiments,type="response")
> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
>   factor 'No_databases' has new level(s) 200, 400, 600, 800, 1000
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.