Good day,

I'm using a VAR model to forecast sales with some extra variables (google
trends data). I have divided my dataset into a trainingset (weekly sales +
vars in 2006 and 2007) and a holdout set (2008).
It is unclear to me how I should predict the out-of-sample data, because
using the predict() function in the vars package seems to estimate my
google trends vars as well. However, I want to forecast the sales figures,
with knowledge of the actual google trends data.

My questions:
1. How should I do this? I currently extract the linear model generated by
the VAR(3) function to predict the holdout set, but that seems
inappropriate?
2. In case that I am doing it right, how is it possible that a
automatically fitted model with more variables actually performs less good
(in terms of MAPE)? Shouldn't it at least predict just as well as the
simple AR(3) by finding that the extra variables have no added value?

My code:

        ts_Y <- ts(log_residuals[1:104]); # detrended sales data
        ts_XGG <- ts(salesmodeldata$gtrends_global[1:104]);
        ts_XGL <- ts(salesmodeldata$gtrends_local[1:104]);
        training_matrix <- data.frame(ts_Y, ts_XGG, ts_XGL);

        ### Try VAR(3)
                var_model <- VAR (y=training_matrix, p=3, type="both", 
season=NULL,
exogen=NULL,  lag.max=NULL);

        ## Out of sample forecasting
                var.lm = lm(var_model$varresult$ts_Y); # the generated LM

                ts_Y <- ts(log_residuals[105:155]);
                ts_XGG <- ts(salesmodeldata$gtrends_global[105:155]);
                ts_XGL <- ts(salesmodeldata$gtrends_local[105:155]);

                # Notice how I manually create the lagged values to be used in 
the
Linear Model
                holdout_matrix <- na.omit(data.frame(ts.union(ts_Y, ts_XGG, 
ts_XGL,
ts_Y.l1 = lag(ts_Y,-1), ts_Y.l2 = lag(ts_Y,-2), ts_Y.l3 = lag(ts_Y,-3),
ts_XGG.l1 = lag(ts_XGG,-1), ts_XGG.l2 = lag(ts_XGG,-2), ts_XGG.l3 =
lag(ts_XGG,-3), ts_XGL.l1 = lag(ts_XGL,-1), ts_XGL.l2 = lag(ts_XGL,-2),
ts_XGL.l3 = lag(ts_XGL,-3), const=1, trend=0.0001514194  )));

                var.predict = predict(object=var_model, n.ahead=52, 
dumvar=holdout_matrix);

        ## Assess accuracy
                calc_mape (holdout_matrix$ts_Y, var.predict, islog=T, print=T)

Some context:
For my Master's thesis I'm using R to test the predictive power of web
metrics (such as google trends data & pageviews) in sales forecasting. To
properly assess this, I employ a simple AR model (for time series without
the extra variables) and a VAR model for the predictions with the extra
variables. I also develop a random forest with, and without the buzz
variables and see if MAPE improves.

Many thanks in advance!

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to