Ted,
Thanks for the reply.
For the example, I'm not looking to predict "THE winner", but to find
the best probabilities of winning.
It would seem that the process of iterating through possible
coefficients would be the same as for a standard GLM, but the "evaluation"
step as you work through them would have to be adjusted to look "per group".
I would call this something like "grouped maximum likelihood" if I got
to make up the name.
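One way to write down such a "per group" evaluation, offered only as a sketch: assume each horse i in race r gets a linear score from its covariates x_{ri}, and that the win probabilities within a race are the normalized exponentials of those scores. The symbols below (x_{ri}, \beta, w_r for the index of the winner of race r) are illustrative notation, not something from the thread. This form is usually called a conditional (or McFadden) logit, and in R the clogit() function in the survival package fits models of this kind, with strata() marking the groups.

```latex
P(\text{horse } i \text{ wins race } r)
  = \frac{\exp(x_{ri}^{\top}\beta)}{\sum_{j \in r} \exp(x_{rj}^{\top}\beta)},
\qquad
\log L(\beta) = \sum_{r} \log P(\text{horse } w_r \text{ wins race } r)
```

Under this assumption there is a single coefficient vector \beta shared by all races, and the probabilities within each race sum to 1 by construction.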
-N
On 9/17/09 11:06 AM, (Ted Harding) wrote:
On 17-Sep-09 17:28:16, Noah Silverman wrote:
Hi,
I'm not sure of the correct nomenclature or function for what
I'm trying to do.
I'm interested in calculating a logistic regression on a binary
dependent variable (True, False).
There are a few ways to easily do this in R. Both SVM and GLM
work easily.
The part that I want to add is "group-wise" awareness, so that
the algorithm computes the coefficients to maximize the likelihood
of a "True" label per group.
A toy example is probably best. I've been looking at horse
racing models as a fun field to learn about statistics and R.
So, for this example, let's assume the following:
100 horses in our stable
10 horses per race
75 races this season (some horses race more than once.)
The independent variables are things about a horse (average speed,
number of past wins, etc.)
The dependent variable is (Win, Lose) represented by (1,0)
As mentioned above, an SVM or GLM will quickly work to estimate
coefficients and the probability of a Win. I'd like to take it further
and estimate the probability of a win, but evaluated per race.
I'm NOT interested in the group label as a final part of the model.
I don't want a separate set of coefficients for each group. I just
want the iterative algorithm to work toward maximizing the likelihood
PER GROUP as an average.
I looked extensively through rseek.org for things like "grouped
logistic" and "nested logistic". I couldn't seem to find anything
that does this. I'm probably naming it wrong.
I assume that a MANUAL iteration concept would be to:
1) Pick a coefficient
2) Calculate the resulting probability for each horse.
3) Measure the strength of the result for each race (sum them
together or average them?)
4) Adjust coefficient and repeat
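The four steps above can be sketched as plain gradient ascent. This is a minimal illustration in Python (standard library only), not a standard library routine. It assumes that the "strength of the result" for a race is the log of the probability given to the actual winner, with probabilities normalized within each race. The data, the function names (race_probs, log_likelihood, fit), the learning rate, and the "true" coefficients are all made up for the example.

```python
import math
import random

def race_probs(beta, race):
    """Win probabilities for the horses in one race (they sum to 1)."""
    scores = [sum(b * x for b, x in zip(beta, horse)) for horse in race]
    top = max(scores)                       # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(beta, races, winners):
    """Sum over races of log P(the actual winner wins)."""
    return sum(math.log(race_probs(beta, race)[w])
               for race, w in zip(races, winners))

def fit(races, winners, n_features, rate=0.1, steps=500):
    beta = [0.0] * n_features               # 1) pick starting coefficients
    for _ in range(steps):
        grad = [0.0] * n_features
        for race, w in zip(races, winners):
            probs = race_probs(beta, race)  # 2) probability for each horse
            for k in range(n_features):     # 3) per-race contribution to the gradient
                expected = sum(p * horse[k] for p, horse in zip(probs, race))
                grad[k] += race[w][k] - expected
        # 4) adjust the coefficients (averaging over races) and repeat
        beta = [b + rate * g / len(races) for b, g in zip(beta, grad)]
    return beta

# Made-up data: 75 races, 10 horses each, 2 covariates per horse.
random.seed(1)
true_beta = [1.5, -0.5]                     # hypothetical, used only to simulate winners
races, winners = [], []
for _ in range(75):
    race = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(10)]
    probs = race_probs(true_beta, race)
    winners.append(random.choices(range(10), weights=probs)[0])
    races.append(race)

beta_hat = fit(races, winners, n_features=2)
print("estimated coefficients:", beta_hat)
print("log-likelihood:", log_likelihood(beta_hat, races, winners))
```

The result is one set of coefficients shared by every race. The race label is used only to decide which horses are normalized against each other.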
Surely there must be some standard function in a library that will
do this.
Can any of the stat gurus here offer some suggestions?
Thanks!
--
Noah
In the context of your "fun example", you have a fundamental problem
in that (if I've understood your statement of it correctly) you will
have more than one of your horses in the same race (apparently 10).
Therefore, one of them winning excludes any of the others winning in
that same race, so their results are not independent of each
other.
Also, at least in real life, the probability that a given horse will
win in a particular race depends not only on the covariates "per horse"
(such as your average speed, number of past wins, etc.), and indeed
on the condition of the race-course at the time, but also (and usually
strongly) on the characteristics of the other horses in the same race.
So a simple logistic model of the kind you seem to be proposing would
certainly not be realistic!
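A small numerical illustration of both objections, as a sketch in Python with made-up scores for three horses in one race. The function names and numbers are hypothetical. Independent logistic probabilities need not sum to 1 across a race. Normalizing within the race (one possible remedy, not necessarily what either poster had in mind) forces them to sum to 1 and makes each horse's probability depend on its rivals.

```python
import math

def logistic(score):
    """Per-horse probability from an ordinary logistic model."""
    return 1.0 / (1.0 + math.exp(-score))

def within_race(scores):
    """Probabilities normalized across the horses in one race."""
    weights = [math.exp(s) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

scores = [0.4, -0.2, 1.1]                  # hypothetical linear predictors

independent = [logistic(s) for s in scores]
grouped = within_race(scores)
sum_independent = sum(independent)         # exceeds 1: impossible for one winner
sum_grouped = sum(grouped)                 # equals 1 up to rounding

# Make the third horse stronger; the first horse's own score is unchanged.
stronger_rival = within_race([0.4, -0.2, 2.5])

print("independent sum:", sum_independent)
print("grouped sum:", sum_grouped)
print("horse 1 before/after rival improves:", grouped[0], stronger_rival[0])
```

In the normalized version the first horse's win probability drops when a rival improves, even though none of its own covariates changed.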
I would be happier thinking about your problem in the context of a
different kind of example ...
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding)<ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Sep-09 Time: 19:06:27
------------------------------ XFMail ------------------------------
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.