Thanks Ted and Professor Ripley for the very helpful answers! Now I know what the problem is in my case.
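For the one-covariate case in Ted's reply quoted below, here is a small illustration of why no finite estimate exists. It is a sketch in Python rather than R and is not glm() itself. It assumes the same logistic form and the toy data from the reply, with the "..." tail dropped. `log_sigmoid` and `log_likelihood` are hypothetical helpers written only for this sketch. The code fixes a = 1.25 and evaluates the log-likelihood for growing b:

```python
import math

# Toy data from Ted's reply (the "..." tail is omitted here).
XS = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
Y_SEPARATED = [0, 0, 0, 1, 1, 1, 1]  # all 0's left of 1.25, all 1's right of it
Y_OVERLAP = [0, 0, 1, 0, 1, 1, 1]    # largest X with Y=0 exceeds smallest X with Y=1


def log_sigmoid(z):
    """Numerically stable log(exp(z) / (1 + exp(z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(xs, ys, a, b):
    """Log-likelihood of P(Y=1 | X) = exp((X-a)*b) / (1 + exp((X-a)*b))."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = (x - a) * b
        # log P(Y=1) = log_sigmoid(z); log P(Y=0) = log_sigmoid(-z)
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total


A = 1.25  # a value that separates the 0's from the 1's in Y_SEPARATED
for b in [1, 10, 100]:
    print(f"b={b:>3}  separated: {log_likelihood(XS, Y_SEPARATED, A, b):.3e}"
          f"  overlapping: {log_likelihood(XS, Y_OVERLAP, A, b):.3e}")
```

For the separated data, the log-likelihood keeps climbing towards 0 as b grows, so a maximiser such as glm()'s fitting routine would be expected to keep increasing the slope. For the overlapping data, the misclassified points near 1.25 drag the log-likelihood down as b grows. The warning is presumably glm.fit's "fitted probabilities numerically 0 or 1 occurred", although the thread does not name it. This sketch only evaluates the likelihood and does not reproduce R's fitting algorithm.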
All the best,
Werner

--- [EMAIL PROTECTED] wrote:

> On 11-Mar-08 08:58:55, Werner Wernersen wrote:
> > Hi,
> >
> > could anyone explain to me what exactly this warning
> > message means and what the consequences are? Is it
> > due to the fact that very extreme observations /
> > outliers are included, or what is the reason for it?
> >
> > Thanks so much,
> > Werner
>
> What it means is exactly what it says. How it arises will
> probably be some variant of the following kind of data
> (I'm guessing that your application of glm() was to data
> with 0/1 responses, as in a logistic regression):
>
>   X = 0.0 0.5 1.0 1.5 2.0 2.5 3.0 ...
>   Y =   0   0   0   1   1   1   1 ...
>
> i.e. all the 0's occur on one side of a value (say 1.25)
> of X, and all the 1's occur on the other side.
>
> If you take a model (e.g. logistic):
>
>   P(Y=1 | X) = exp((X-a)*b)/(1 + exp((X-a)*b))
>
> then, for any finite values of a and b, the formula will
> give a value > 0 for P(Y=1 | X) where X < 1.25 (i.e. where
> Y=0), so P(Y=0 | X) < 1; and a value < 1 for P(Y=1 | X)
> where X > 1.25 (i.e. where Y=1).
>
> However, if you take say a = 1.25 (a value which separates
> the 0's from the 1's), and then let b -> infinity, you
> will find that
>
>   P(Y=0 | X) -> 1, P(Y=1 | X) -> 0, for X < 1.25
>   P(Y=0 | X) -> 0, P(Y=1 | X) -> 1, for X > 1.25
>
> so the limit as b -> infinity perfectly predicts the
> observed outcome.
>
> However, the value of a is indeterminate so long as it
> lies between the largest X for the Y=0 observations and
> the smallest X for the Y=1 observations.
>
> This situation cannot arise with data where the largest X
> for which Y=0 is greater than the smallest X for which
> Y=1, e.g.
>
>   X = 0.0 0.5 1.0 1.5 2.0 2.5 3.0 ...
>   Y =   0   0   1   0   1   1   1 ...
>
> The above is a very simple example of what is called
> "linear separation". It arises more generally when there
> are several covariates X1, X2, ..., Xk and there is a
> linear function
>
>   L = a1*X1 + a2*X2 + ... + ak*Xk
>
> for which (with the data as observed) there is a value L0
> such that
>
>   Y = 0 for all the data such that L < L0
>   Y = 1 for all the data such that L > L0
>
> In particular, if ever the number of covariates (k) is
> greater than (n-2), where n is the number of cases in your
> data, then you have (k+1) or fewer points in k dimensions.
> Provided the points are in general position (affinely
> independent), there will then be a (k-1)-dimensional
> hyperplane L = L0 (with L as given above) which separates
> the (X1,...,Xk)-points where Y=0 from the (X1,...,Xk)-points
> where Y=1. Regardless of how you assign the labels "Y=0"
> and "Y=1" to (k+1) or fewer such points, they will be
> linearly separable.
>
> Even if k < n-1, so that they are not *in general*
> linearly separated, there is still a positive probability
> that you can get data for which they are linearly
> separated; and then the same situation arises. This
> probability increases as the number of covariates (k)
> increases.
>
> What the warning message is telling you is that a perfect
> fit is possible within the parametrisation of the model: a
> probability P(Y=1)=0 is fitted to cases where the observed
> Y = 0, and a probability P(Y=1)=1 is fitted to cases where
> the observed Y = 1.
>
> Best wishes,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
> Fax-to-email: +44 (0)870 094 0861
> Date: 11-Mar-08                                       Time: 10:08:04

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
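Ted's point above can be checked directly for k = 2: with (k+1) or fewer cases in k dimensions, any 0/1 labelling is linearly separable. The sketch below is Python rather than R and assumes the points are in general position (affinely independent). `solve` and `separating_plane` are hypothetical helpers, not functions from any R package. For each labelling, the code solves for a1..ak and L0 so that L - L0 equals +1 at every Y=1 point and -1 at every Y=0 point. It then checks all 2^3 labellings of three points:

```python
from fractions import Fraction
from itertools import product


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augmented matrix [A | rhs] with exact rational entries.
    m = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a usable pivot; none exists if the points are not affinely independent.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("points are not in general position")
        m[col], m[pivot] = m[pivot], m[col]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def separating_plane(points, labels):
    """Return (L0, [a1, ..., ak]) with L > L0 exactly at the Y=1 points."""
    # Row i is (1, X1, ..., Xk); ask for c0 + L = +1 where Y=1 and -1 where Y=0.
    matrix = [[1, *p] for p in points]
    rhs = [1 if y == 1 else -1 for y in labels]
    c0, *coefs = solve(matrix, rhs)
    return -c0, coefs  # L - L0 = L + c0, so its sign matches the label


points = [(0, 0), (1, 0), (0, 1)]  # k = 2 covariates, n = k + 1 = 3 cases
for labels in product([0, 1], repeat=len(points)):
    L0, coefs = separating_plane(points, labels)
    for p, y in zip(points, labels):
        L = sum(a * x for a, x in zip(coefs, p))
        assert (L > L0) == (y == 1)
print(f"all {2 ** len(points)} labellings of {len(points)} points are separable")
```

Exact rational arithmetic (`Fraction`) avoids any doubt about the sign of L - L0 at the data points. Collinear points such as (0,0), (1,1), (2,2) are not in general position, so the system has no unique solution and `solve` raises an error. This illustrates why the general-position condition matters.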