Ruben,
Thank you for the advice. I'll do what I can with it.
Ruben Wrote:
If I understand your problem correctly, you find that the magnitude of
the deviations from the mean/median/mode in the volume of your requests
for background checks in month m predicts a multivariate response that
represents the macroeconomic situation in month m+1.
First, regarding your original question, a statistician judging your
product would like to see a measure of predictive success. If you have a
model to relate your predictor (the deviation in volume of requests)
with your response (several variables representing the macroeconomic
status) then you could run the model for many months (say from Jan 2000
to Sep 2008), predict the macroeconomic status with the model, and
compare it with the actual macroeconomic status observed. The comparison
can then be summarized as a measure of predictive success, such as the
predictive mean squared error.
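As a rough illustration of that evaluation, here is a minimal sketch of a rolling one-step-ahead check in R. Everything in it is an assumption made for the example: the data are simulated, the column names (x, y_next) are hypothetical, a plain lm() stands in for whatever model is eventually chosen, and a single response is used to keep it short.

```r
# Simulated monthly data, Jan 2000 to Sep 2008 (105 months).
# All numbers are invented for illustration; this is not real data.
set.seed(1)
n <- 105
dev_volume <- rnorm(n)                               # predictor in month m
macro_next <- 0.8 * dev_volume + rnorm(n, sd = 0.5)  # response in month m+1 (assumed relation)
dat <- data.frame(x = dev_volume, y_next = macro_next)

# Rolling-origin evaluation: to predict row t, fit only on rows 1..(t-1),
# whose responses were already observed by the time row t's predictor is known.
start <- 37                                          # first 36 months are used for fitting only
pred <- rep(NA_real_, n)
for (t in start:n) {
  fit <- lm(y_next ~ x, data = dat[1:(t - 1), ])
  pred[t] <- predict(fit, newdata = dat[t, , drop = FALSE])
}

# Predictive mean squared error of the model
err <- dat$y_next[start:n] - pred[start:n]
pmse <- mean(err^2)

# Naive baseline: predict with the mean of the responses seen so far
base <- sapply(start:n, function(t) mean(dat$y_next[1:(t - 1)]))
pmse_base <- mean((dat$y_next[start:n] - base)^2)

c(model = pmse, baseline = pmse_base)
```

Reporting the model's error next to a naive baseline is one way to show a buyer that the predictor adds something beyond "next month looks like the average so far".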
Second, regarding what method to use to fit the relation between your
predictor and the multivariate response, you have a number of options.
One simple alternative that would reduce your problem to a simple
univariate modeling problem would be to search the economic literature
for an index of macroeconomic status that would reduce your
multivariate response to a univariate response. Alternatively, if the
variables in the multivariate response are strongly correlated, you can
define your own index by using principal component analysis on the
multivariate response, and later use the first principal component as a
univariate response. After that, many options are again available, such
as forecasting methods or regular time series analysis. A more complex
but probably more precise approach would be to model the multivariate
response as such. This depends on the nature of the variables in the
multivariate response. If they can be considered as multinomial counts
then you have a very good solution using multinomial logistic regression
with the function multinom in the package nnet.
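The principal-component route could be sketched roughly as follows. This is only an illustration on simulated data: the response names (unemployment, claims, vacancies) are hypothetical stand-ins, and the simple lm() at the end is just one of the "many options" mentioned above.

```r
# Simulated example: three strongly correlated macro variables in month m+1
# driven by a common factor that depends on the predictor in month m.
# Variable names and coefficients are invented for illustration.
set.seed(2)
n <- 105
x <- rnorm(n)                                  # predictor (deviation in request volume)
common <- 0.7 * x + rnorm(n, sd = 0.6)         # shared factor behind the responses
Y <- cbind(unemployment = common + rnorm(n, sd = 0.3),
           claims       = 0.9 * common + rnorm(n, sd = 0.3),
           vacancies    = -0.8 * common + rnorm(n, sd = 0.3))

# PCA on the standardized responses
pca <- prcomp(Y, center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)  # share of variance per component

# First principal component used as a univariate index.
# Its sign is arbitrary, so check the loadings before interpreting it.
index <- pca$x[, 1]
pca$rotation[, 1]

# Univariate model of the index on the predictor
fit <- lm(index ~ x)
summary(fit)$r.squared
```

If the first component explains only a modest share of the variance, a single index loses a lot of information and modelling the responses jointly becomes more attractive.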
Maybe this can get you started.
Regards
Rubén
Gad Abraham explained:
Max wrote:
Hi everyone,
This is not so much an R question as a statistics question. I currently
work for the largest pre-employment screening company in Canada. Upper
management has noticed that, usually a month or so before any kind of
big economic shock happens, our incoming files (requests for
a background check) jump up or down.
As the company statistician, I've been asked to see if the relationship
is strong enough to put together a product that can be sold to any kind of
firm or organization (brokerages or any kind of investing firm, the federal
ministry of finance, Statistics Canada (like the bureau of stats in the
USA), universities, etc.).
In Canada, on the 10th of every month, Statistics Canada releases labour
statistics for the previous month. The way our CFO sees it, *ideally*
sometime between the 1st and the 10th of every month, the firm I work for
could be releasing data for the rest of the month.
What I'm trying to figure out is this: if you were in the position of
evaluating the final product for purchase, what kind of information would
make the product credible/viable? Summary statistics? Variance-covariance
matrices? Graphs of the data? Cross-correlation matrices for time series
analysis?
It's frustrating because I can see a noticeable relationship between our
file volume and the unemployment rate (in particular), but I'm not sure
how to frame it appropriately so that another statistician/modeler
would want the data.
Why not start with some simple plots of the relationships between your
variables? Once you have a feel for the problem, you can look into
modelling it more formally using a suitable regression model.
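For a first look of that kind, something along these lines might do. It is a sketch on simulated series (the names volume and unemp are hypothetical), using the base R function ccf() to check at which lag the two series are most strongly related.

```r
# Simulated monthly series: unemp follows volume with a one-month delay.
# Both series are invented for illustration.
set.seed(3)
n <- 105
volume <- rnorm(n)
unemp <- c(rnorm(1), 0.8 * volume[-n]) + rnorm(n, sd = 0.4)

# Simple exploratory plots (only drawn in an interactive session)
if (interactive()) {
  plot(volume, type = "l", main = "File volume (simulated)")
  plot(volume[-n], unemp[-1],
       xlab = "volume in month m", ylab = "unemp in month m+1")
}

# Cross-correlation: lag k of ccf(x, y) estimates cor(x[t + k], y[t]),
# so a peak at a negative lag means x (volume) leads y (unemp).
cc <- ccf(volume, unemp, lag.max = 6, plot = FALSE)
lags <- drop(cc$lag)
acfs <- drop(cc$acf)
lead_lag <- lags[which.max(abs(acfs))]
lead_lag
```

Calling plot(cc) draws the cross-correlogram itself, which is probably the single most persuasive picture for a lead-lag claim like this one.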
Gad, the issue I have is that I technically have one predictor for multiple
responses. The data is not very clean for simple univariate models.
Unfortunately, my knowledge of multivariate response models is poor, and how
to set up the problem in R as a multivariate regression is a total mystery to
me. (Multivariate was the one course that I wasn't able to take in my
undergrad math/stats degree.)
The other issue is that if I view the problem as a time series problem, it's
multiple time series analysis, which I don't have any books on.
The more I look at the data and the problem the more I feel like I'm in way
over my head.
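On the narrower question of how to set up one predictor with several responses in R, a minimal sketch is given here. It assumes simulated data and hypothetical response names (unemp, cpi); the predictor from month m is aligned with the responses from month m+1 before fitting. In base R, lm() accepts a matrix on the left-hand side of the formula.

```r
# Simulated data: one predictor, two responses observed one month later.
# Names and coefficients are invented for illustration.
set.seed(4)
n <- 105
volume <- rnorm(n)

# Align month m predictor with month m+1 responses
x_lag <- volume[-n]                               # months 1..(n-1)
unemp <-  0.8 * x_lag + rnorm(n - 1, sd = 0.5)    # months 2..n
cpi   <- -0.3 * x_lag + rnorm(n - 1, sd = 0.5)    # months 2..n
dat <- data.frame(x_lag, unemp, cpi)

# Multivariate linear model: cbind() of the responses on the left-hand side
fit <- lm(cbind(unemp, cpi) ~ x_lag, data = dat)

coef(fit)    # one column of coefficients per response
anova(fit)   # joint multivariate test of the predictor (Pillai's trace by default)
```

The coefficient estimates are the same as those from fitting each response separately by least squares; what the joint fit adds is the multivariate test and the residual covariance between responses.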
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.