[Rd] Quadratic Programming
Hi all! I have a little question concerning quadprog. To make it simple I'll start by stating the problem: I want to minimize

    h(d, delta) = 0.5 d^T B d + nabla(f(x))^T d + rho * delta^2

with respect to d in R^n and delta in R. I obviously have constraints (depending on both d and delta). solve.QP does give me a good result for d, but I cannot obtain anything for delta. Since dim(Dmat) = n and sol <- rep(0, n), that isn't particularly surprising.

Setting up a block-diagonal matrix

    Amat = ( B    0  )
           ( 0   rho )

is a crappy idea. Does anyone have an idea?

Yours,
Serge

"Beatus qui prodest quibus potest" - (Lycklig är den som hjälper andra)

Serge de Gosson de Varennes
Försäkringskassan, Swedish Social Insurance Agency
+46-76 11 40 799
[EMAIL PROTECTED]
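For reference, a minimal sketch of what the block-diagonal augmentation amounts to when z = c(d, delta) is handed to solve.QP as a single unknown of length n+1. The names B, grad_f (standing in for nabla f(x)), rho, A_d, A_delta and b0 are assumed placeholders, not objects from this message; solve.QP() minimises 0.5 z'Dz - dvec'z subject to t(Amat) %*% z >= bvec, hence the sign flip on the linear term.

library(quadprog)

n     <- nrow(B)
Dmat  <- rbind(cbind(B, rep(0, n)),       # quadratic term for d
               c(rep(0, n), 2 * rho))     # 0.5 * (2*rho) * delta^2 = rho * delta^2
dvec  <- c(-grad_f, 0)                    # minus sign because solve.QP subtracts dvec'z
Amat  <- rbind(A_d, A_delta)              # A_d: n x m constraint coefficients for d,
                                          # A_delta: 1 x m row of coefficients for delta
sol   <- solve.QP(Dmat, dvec, Amat, bvec = b0)
d     <- sol$solution[1:n]
delta <- sol$solution[n + 1]

This only works while the augmented Dmat stays positive definite, i.e. B positive definite and rho > 0.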
[Rd] Minor documentation bug in R-exts (PR#10515)
This is a minor documentation bug. In the document: Writing R Extensions (R-exts) section 1.6.5 (Summary -- converting an existing package) the 3rd bullet is missing the end of the sentence.

Thanks,

--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status =
 major = 2
 minor = 6.1
 year = 2007
 month = 11
 day = 26
 svn rev = 43537
 language = R
 version.string = R version 2.6.1 (2007-11-26)

Windows 2000 (build 2195) Service Pack 4.0

Locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
Re: [Rd] Rsquared bug lm() (PR#10516)
On 12/14/2007 8:10 AM, [EMAIL PROTECTED] wrote:
> Full_Name: lieven clement
> Version: R version 2.4.0 Patched (2006-11-25 r39997)
> OS: i486-pc-linux-gnu
> Submission from: (NULL) (157.193.193.180)
>
> summary.lm() does not calculate R² accurately for models without intercepts
> if one of the predictor variables is a factor.
> In order to avoid one of the factor levels being treated as a reference
> class you can use the -1 option in a formula. When you use this, R² is not
> correctly calculated.

This is not a bug. A model without an intercept should be using y=0 as a reference.

Duncan Murdoch

>
>> x1 <- rnorm(100)
>> x2 <- c(rep(0,25), rep(10,25), rep(20,25), rep(30,25))
>> y <- 10*x1 + x2 + rnorm(100, 0, 4)
>> x2 <- as.factor(x2)
>> lmtest <- lm(y ~ -1 + x1 + x2)
>> summary(lmtest)$r.sq
> [1] 0.9650201
>> 1 - sum(lmtest$res^2)/sum((y - mean(y))^2)
> [1] 0.9342672
>
> The R squared by summary is calculated as
>> 1 - sum(lmtest$res^2)/sum((y)^2)
> [1] 0.9650201
> apparently because summary.lm assumes the mean of y to be zero.
>
> In case of an intercept model everything seems ok
>> lmtest <- lm(y ~ x1 + x2)
>> summary(lmtest)$r.sq
> [1] 0.9342672
>> 1 - sum(lmtest$res^2)/sum((y - mean(y))^2)
> [1] 0.9342672
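A short sketch (not from the thread) spelling out the two denominators, using lmtest and y from the quoted report. Per ?summary.lm the reference value is mean(y) when there is an intercept and 0 otherwise, so for a no-intercept fit:

rss <- sum(residuals(lmtest)^2)
1 - rss / sum(y^2)               # what summary.lm() reports for the no-intercept fit
1 - rss / sum((y - mean(y))^2)   # the centred version, computed by hand if that is what you want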
Re: [Rd] End of whiskers of boxplots are repeated on PDF device (PR#10499)
> "MT" == Michael Toews <[EMAIL PROTECTED]> > on Thu, 13 Dec 2007 09:00:37 -0800 writes: MT> I've identified the problem for this issue, which is MT> simple to fix. Please see and apply the attached MT> patch. Thanks. +mt Excellent, Michael! This will be fixed in R-patched and R-devel from tomorrow. Thank you, and regards! Martin __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Minor documentation bug in R-exts (PR#10515)
On 12/13/2007 1:15 PM, [EMAIL PROTECTED] wrote:
> This is a minor documentation bug. In the document: Writing R
> Extensions (R-exts) section 1.6.5 (Summary -- converting an existing
> package) the 3rd bullet is missing the end of the sentence.

Thanks, I'll fix it.

Duncan Murdoch
[Rd] Rsquared bug lm() (PR#10516)
Full_Name: lieven clement
Version: R version 2.4.0 Patched (2006-11-25 r39997)
OS: i486-pc-linux-gnu
Submission from: (NULL) (157.193.193.180)

summary.lm() does not calculate R² accurately for models without intercepts if one of the predictor variables is a factor. To avoid one of the factor levels being treated as a reference class you can use the -1 option in a formula. When you use this, R² is not correctly calculated.

> x1 <- rnorm(100)
> x2 <- c(rep(0,25), rep(10,25), rep(20,25), rep(30,25))
> y <- 10*x1 + x2 + rnorm(100, 0, 4)
> x2 <- as.factor(x2)
> lmtest <- lm(y ~ -1 + x1 + x2)
> summary(lmtest)$r.sq
[1] 0.9650201
> 1 - sum(lmtest$res^2)/sum((y - mean(y))^2)
[1] 0.9342672

The R squared by summary is calculated as
> 1 - sum(lmtest$res^2)/sum((y)^2)
[1] 0.9650201
apparently because summary.lm assumes the mean of y to be zero.

In case of an intercept model everything seems ok
> lmtest <- lm(y ~ x1 + x2)
> summary(lmtest)$r.sq
[1] 0.9342672
> 1 - sum(lmtest$res^2)/sum((y - mean(y))^2)
[1] 0.9342672
[Rd] windows rtools missing gfortran.exe?
Hi,

I replaced my Rtools today as posted at
http://www.murdoch-sutherland.com/Rtools/Rtools.exe

Trying to build R-devel_2007-12-13.tar.gz without modifying MkRules gives the gfortran command not found error below. I am wondering if gfortran.exe is missing from (recent?) Rtools.exe or I am doing something wrong.

Thanks to hints at Duncan's site, I worked around the error by adding "GCC4_SUFF=-sjlj" in MkRules (below the "== end of user-customizable parts ==" line).

gcc -std=gnu99 -I../include -DHAVE_CONFIG_H -DR_DLL_BUILD -O3 -Wall -pedantic -c zeroin.c -o zeroin.o
gfortran -O3 -c ch2inv.f -o ch2inv.o
make[4]: gfortran: Command not found
make[4]: *** [ch2inv.o] Error 127
make[3]: *** [rlibs] Error 2
make[2]: *** [../../bin/R.dll] Error 2
make[1]: *** [rbuild] Error 2
make: *** [all] Error 2

--
Hiroyuki Kawakatsu
Business School
Dublin City University
Dublin 9, Ireland
Tel +353 (0)1 700 7496
Re: [Rd] windows rtools missing gfortran.exe?
On 12/14/2007 10:23 AM, Hiroyuki Kawakatsu wrote:
> Hi,
>
> I replaced my Rtools today as posted at
> http://www.murdoch-sutherland.com/Rtools/Rtools.exe
>
> Trying to build R-devel_2007-12-13.tar.gz without modifying MkRules
> gives the gfortran command not found error below. I am wondering if
> gfortran.exe is missing from (recent?) Rtools.exe or I am doing
> something wrong.
>
> Thanks to hints at Duncan's site, I worked around the error by adding
> "GCC4_SUFF=-sjlj" in MkRules (below the == end of user-customizable
> parts == line).
>
> gcc -std=gnu99 -I../include -DHAVE_CONFIG_H -DR_DLL_BUILD -O3 -Wall -pedantic -c zeroin.c -o zeroin.o
> gfortran -O3 -c ch2inv.f -o ch2inv.o
> make[4]: gfortran: Command not found
> make[4]: *** [ch2inv.o] Error 127
> make[3]: *** [rlibs] Error 2
> make[2]: *** [../../bin/R.dll] Error 2
> make[1]: *** [rbuild] Error 2
> make: *** [all] Error 2

It's supposed to be in there; I'm just downloading a copy myself to verify...

You're right! Somehow Rtools.exe got linked to Rtools26.exe, instead of the new one. Now fixed.

Duncan Murdoch
[Rd] Do not misuse R-devel {was "Quadratic Programming"}
Please use the R-help mailing list for such *Questions*. R-devel has a very different purpose. Probably you should also read the posting guide,
--> http://www.r-project.org/posting-guide.html

Regards,
Martin Maechler, ETH Zurich

> "dGdVS" == de Gosson de Varennes Serge (4100) <[EMAIL PROTECTED]>
>     on Fri, 14 Dec 2007 09:15:26 +0100 writes:

    dGdVS> Hi all! I have a little question concerning
    dGdVS> quadprog. To make it simple I'll start by stating the
    dGdVS> problem:

    dGdVS> I want to minimize
    dGdVS> h(d, delta) = 0.5 d^T B d + nabla(f(x))^T d + rho * delta^2

    dGdVS> with respect to d in R^n and delta in R. I obviously
    dGdVS> have constraints (depending on both d and delta).
    dGdVS> solve.QP does give me a good result for d but I
    dGdVS> cannot obtain anything for delta. Since dim(Dmat) = n
    dGdVS> and sol <- rep(0, n) it isn't particularly surprising.

    dGdVS> Setting up a block-diagonal matrix
    dGdVS>     Amat = ( B    0  )
    dGdVS>            ( 0   rho )
    dGdVS> is a crappy idea. Does anyone have an idea?

    dGdVS> Yours,
    dGdVS> Serge

    dGdVS> "Beatus qui prodest quibus potest" - (Lycklig är den som hjälper andra)

    dGdVS> Serge de Gosson de Varennes
    dGdVS> Försäkringskassan, Swedish Social Insurance Agency
    dGdVS> +46-76 11 40 799
    dGdVS> [EMAIL PROTECTED]
[Rd] segfault isoreg with NAs
Dear list,

As can be seen below, adding a NA to the y values in a call to isoreg results in a segfault.

ir4 <- isoreg(1:10, y4 <- c(5, 9, 1:2, 5:8, NA, 8))

Adding missing values to the x values, on the contrary, gives an error, but maybe the error message could be tailored to this particular situation.

y <- c(5, 9, 1:2, 5:8, 3, 8)
x <- c(1:9, NA)
isoreg(x, y)
## error message:
Error in if (!isOrd) { : missing value where TRUE/FALSE needed

Please find below a (temporary) patch (against Revision 43692) for both the R source and the help file.

Kind regards,
Tobias

### patch isoreg.R ###
--- isoreg.R    2007-12-14 19:07:47.0 +0100
+++ isoreg2.R   2007-12-14 19:11:20.0 +0100
@@ -18,6 +18,9 @@
 ##
 isoreg <- function(x, y=NULL)
 {
+    if (any(is.na(x))) stop("x may not contain NA values")
+    if (any(is.na(y))) stop("y may not contain NA values")
+
     xy <- xy.coords(x,y)
     x <- xy$x
     isOrd <- (!is.null(xy$xlab) && xy$xlab == "Index") || !is.unsorted(x)

### patch isoreg.Rd ###
--- isoreg.Rd   2007-12-14 19:08:12.0 +0100
+++ isoreg2.Rd  2007-12-14 19:15:00.0 +0100
@@ -20,6 +20,7 @@
   \item{x, y}{%in \code{isoreg},
     coordinate vectors of the regression points. Alternatively a single
     plotting structure can be specified: see \code{\link{xy.coords}}.
+    The coordinate vectors may not contain missing values.
   }
 }
 \details{

### sessionInfo() information, segfault ###
> sessionInfo()
R version 2.6.0 (2007-10-03)
i486-pc-linux-gnu

locale:
LC_CTYPE=en_US.ISO-8859-15;LC_NUMERIC=C;LC_TIME=en_US.ISO-8859-15;LC_COLLATE=en_US.ISO-8859-15;LC_MONETARY=en_US.ISO-8859-15;LC_MESSAGES=en_US.ISO-8859-15;LC_PAPER=en_US.ISO-8859-15;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.ISO-8859-15;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

> ir4 <- isoreg(1:10, y4 <- c(5, 9, 1:2, 5:8, NA, 8))

*** caught segfault ***
address 0x24, cause 'memory not mapped'

Process R segmentation fault (core dumped) at Fri Dec 14 17:48:22 2007
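As a stop-gap on the caller's side (not part of the patch above), the NA rows can be dropped before calling isoreg(); x and y here are the vectors from the second example:

ok <- complete.cases(x, y)     # keep only pairs with no missing values
ir <- isoreg(x[ok], y[ok])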
[Rd] Rapid Random Access
I have some code that can potentially produce a huge number of large-ish R data frames, each of a different number of rows. All the data frames together will be way too big to keep in R's memory, but we'll assume a single one is manageable. It's just when there's a million of them that the machine might start to burn up.

However I might, for example, want to compute some averages over the elements in the data frames. Or I might want to sample ten of them at random and do some plots. What I need is rapid random access to data stored in external files.

Here's some ideas I've had:

 * Store all the data in an HDF-5 file - problem here is that the current HDF package for R reads the whole file in at once.

 * Store the data in some other custom binary format with an index for rapid access to the N-th elements. Problems: feels like reinventing HDF, cross-platform issues, etc.

 * Store the data in a number of .RData files in a directory. Hence to get the N-th element just attach(paste("foo/A-",n,'.RData')) give or take a parameter or two.

 * Use a database. Seems a bit heavyweight, but maybe using RSQLite could work in order to keep it local.

What I'm currently doing is keeping it OO enough that I can in theory implement all of the above. At the moment I have an implementation that does keep them all in R's memory as a list of data frames, which is fine for small test cases but things are going to get big shortly. Any other ideas or hints are welcome.

thanks

Barry
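A minimal sketch of the third idea (one .RData file per data frame), using load() into a scratch environment rather than attach(). The directory "foo", the object name d inside each file, the frame count N and the column name value are assumptions for illustration only:

put_df <- function(d, n, dir = "foo") {
    save(d, file = file.path(dir, sprintf("A-%06d.RData", n)))
}
get_df <- function(n, dir = "foo") {
    e <- new.env()
    load(file.path(dir, sprintf("A-%06d.RData", n)), envir = e)
    e$d    # the stored data frame, without touching the search path
}
## e.g. mean of one column over ten frames sampled at random:
## mean(sapply(sample(N, 10), function(i) mean(get_df(i)$value)))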
Re: [Rd] Rapid Random Access
On Dec 14, 2007 1:01 PM, Barry Rowlingson <[EMAIL PROTECTED]> wrote:
> I have some code that can potentially produce a huge number of
> large-ish R data frames, each of a different number of rows. All the
> data frames together will be way too big to keep in R's memory, but
> we'll assume a single one is manageable. It's just when there's a
> million of them that the machine might start to burn up.
>
> However I might, for example, want to compute some averages over the
> elements in the data frames. Or I might want to sample ten of them at
> random and do some plots. What I need is rapid random access to data
> stored in external files.
>
> Here's some ideas I've had:
>
> * Store all the data in an HDF-5 file - problem here is that the
>   current HDF package for R reads the whole file in at once.
>
> * Store the data in some other custom binary format with an index for
>   rapid access to the N-th elements. Problems: feels like reinventing HDF,
>   cross-platform issues, etc.
>
> * Store the data in a number of .RData files in a directory. Hence to
>   get the N-th element just attach(paste("foo/A-",n,'.RData')) give or
>   take a parameter or two.
>
> * Use a database. Seems a bit heavyweight, but maybe using RSQLite
>   could work in order to keep it local.

Unless you really need this to be a general solution, I would suggest using a database. And if you use one that allows you to create functions within it, you can even keep some of the calculations on the server side (which may be a performance advantage). If you are doing a lot of this, you might consider Postgres and pl/R, which embeds R in the database.

Sean
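A rough sketch of the local-database route with RSQLite; the table name "frames", the id scheme and the value column are made-up examples, and df1/df2 stand for two of the generated data frames:

library(RSQLite)
con <- dbConnect(dbDriver("SQLite"), dbname = "frames.db")
dbWriteTable(con, "frames", cbind(id = 1, df1))                 # first frame, tagged with its id
dbWriteTable(con, "frames", cbind(id = 2, df2), append = TRUE)  # and so on for the rest
one  <- dbGetQuery(con, "SELECT * FROM frames WHERE id = 2")    # rapid random access by id
avgs <- dbGetQuery(con, "SELECT id, AVG(value) FROM frames GROUP BY id")  # averages done in the database
dbDisconnect(con)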
Re: [Rd] Rapid Random Access
Tony Plate wrote:
> This is exactly the type of situation that the trackObjs package is
> designed for.

Ooh, I'm having deja-vu - yes I think I saw this package a while back and wondered what magic it did. I shall go play with it later.

Thanks

Barry
Re: [Rd] Rapid Random Access
Barry Rowlingson wrote:
> I have some code that can potentially produce a huge number of
> large-ish R data frames, each of a different number of rows. All the
> data frames together will be way too big to keep in R's memory, but
> we'll assume a single one is manageable. It's just when there's a
> million of them that the machine might start to burn up.

This is exactly the type of situation that the trackObjs package is designed for. It will automatically (and invisibly) store each object in its own .RData file so that objects can be accessed as ordinary R objects, but are not kept in memory (actually, there are options to control whether or not objects are cached in memory). It also caches some characteristics of objects so that a brief summary of objects can be provided without having to read each object.

The g.data package and the filehash package also do similar things wrt providing automatic access to objects in .RData files (and were part of the inspiration for the trackObjs package).

-- Tony Plate

> However I might, for example, want to compute some averages over the
> elements in the data frames. Or I might want to sample ten of them at
> random and do some plots. What I need is rapid random access to data
> stored in external files.
>
> Here's some ideas I've had:
>
> * Store all the data in an HDF-5 file - problem here is that the
>   current HDF package for R reads the whole file in at once.
>
> * Store the data in some other custom binary format with an index for
>   rapid access to the N-th elements. Problems: feels like reinventing HDF,
>   cross-platform issues, etc.
>
> * Store the data in a number of .RData files in a directory. Hence to
>   get the N-th element just attach(paste("foo/A-",n,'.RData')) give or
>   take a parameter or two.
>
> * Use a database. Seems a bit heavyweight, but maybe using RSQLite
>   could work in order to keep it local.
>
> What I'm currently doing is keeping it OO enough that I can in theory
> implement all of the above. At the moment I have an implementation that
> does keep them all in R's memory as a list of data frames, which is fine
> for small test cases but things are going to get big shortly. Any other
> ideas or hints are welcome.
>
> thanks
>
> Barry
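For comparison, the filehash idiom looks roughly like this (dbCreate/dbInit/dbInsert/dbFetch are filehash functions; the database name, the key scheme and df1 are made up for the sketch):

library(filehash)
dbCreate("framesDB")                # one-off: create the on-disk database
db <- dbInit("framesDB")
dbInsert(db, "A-000001", df1)       # one key per data frame, written to disk
tenth <- dbFetch(db, "A-000010")    # read a single frame back on demand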
Re: [Rd] Wrong length of POSIXt vectors (PR#10507)
Duncan Murdoch wrote:
> On 12/13/2007 1:59 PM, Tony Plate wrote:
>> Duncan Murdoch wrote:
>>> On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
>>>> Full_Name: Petr Simecek
>>>> Version: 2.5.1, 2.6.1
>>>> OS: Windows XP
>>>> Submission from: (NULL) (195.113.231.2)
>>>>
>>>> Several times I have experienced that a length of a POSIXt vector has
>>>> not been computed right.
>>>>
>>>> Example:
>>>>
>>>> tv <- structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31),
>>>>     min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L),
>>>>     hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L),
>>>>     mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L),
>>>>     mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L),
>>>>     year = c(105L, 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L),
>>>>     wday = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L),
>>>>     yday = c(163L, 163L, 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L),
>>>>     isdst = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)),
>>>>     .Names = c("sec", "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"),
>>>>     class = c("POSIXt", "POSIXlt"))
>>>>
>>>> print(tv)   # print 11 time points (right)
>>>> length(tv)  # returns 9 (wrong)
>>>
>>> tv is a list of length 9. The answer is right, your expectation is wrong.
>>>
>>>> I have tried that on several computers with/without switching to English
>>>> locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched the help
>>>> pages but I cannot imagine how that could be OK.
>>>
>>> See this in ?POSIXt:
>>>
>>>     Class '"POSIXlt"' is a named list of vectors...
>>>
>>> You could define your own length measurement as
>>>
>>> length.POSIXlt <- function(x) length(x$sec)
>>>
>>> and you'll get the answer you expect, but be aware that length.XXX
>>> methods are quite rare, and you may surprise some of your users.
>>
>> On the other hand, isn't the fact that length() currently always
>> returns 9 for POSIXlt objects likely to be a surprise to many users
>> of POSIXlt?
>>
>> The back of "The New S Language" says "Easy-to-use facilities allow
>> you to organize, store and retrieve all sorts of data. ... S
>> functions and data organization make applications easy to write."
>>
>> Now, POSIXlt has methods for c() and vector subsetting "[" (and many
>> other vector-manipulation methods - see methods(class="POSIXlt")).
>> Hence, from the point of view of intending to supply "easy-to-use
>> facilities ... [for] all sorts of data", isn't it a little
>> incongruous that length() is not also provided -- as 3 functions (any
>> others?) comprise a core set of vector-manipulation functions?
>>
>> Would it make sense to have an informal prescription (e.g., in
>> R-exts) that a class that implements a vector-like object and
>> provides at least one of the functions 'c', '[' and 'length' should
>> provide all three? It would also be easy to describe a test-suite
>> that should be included in the 'tests' directory of a package
>> implementing such a class, that had some tests of the basic
>> vector-manipulation functionality, such as:
>>
>>  > # at this point, x0, x1, x3, & x10 should exist, as vectors of the
>>  > # class being tested, of length 0, 1, 3, and 10, and they should
>>  > # contain no duplicate elements
>>  > length(x0)
>>  [1] 1
>>  > length(c(x0, x1))
>>  [1] 2
>>  > length(c(x1,x10))
>>  [1] 11
>>  > all(x3 == x3[seq(len=length(x3))])
>>  [1] TRUE
>>  > all(x3 == c(x3[1], x3[2], x3[3]))
>>  [1] TRUE
>>  > length(c(x3[2], x10[5:7]))
>>  [1] 4
>>
>> It would also be possible to describe a larger set of vector
>> manipulation functions that should be implemented together, including
>> e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na',
>> head, tail ... (many of which are provided for POSIXlt).
>>
>> Or is there some good reason that length() cannot be provided (while
>> 'c' and '[' can) for some vector-like classes such as "POSIXlt"?
>
> What you say sounds good in general, but the devil is in the details.
> Changing the meaning of length(x) for some objects has fairly
> widespread effects. Are they all positive? I don't know.
>
> Adding a prescription like the one you suggest would be good if it's
> easy to implement, but bad if it's already widely violated. How many
> base or CRAN or Bioconductor packages violate it currently? Do the
> ones that provide all 3 methods do so in a consistent way, i.e. does
> "length(x)" mean the same thing in all of them?

I'm not sure doing something like this would be so bad even if it is already widely violated. R has evolved significantly over time, and many rough edges have been cleaned up, sometimes in ways that were not backward compatible. This is a great thing & my thanks go to the people working on R.

If some base or CRAN or Bioconductor packages currently don't implement vector operations consistently, wouldn't it be good to know that? Wouldn't it be us
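A hedged sketch of the kind of test helper being proposed; the function name and the use of stopifnot() are illustrative, not an agreed interface. Here x1, x3 and x10 are assumed to be non-duplicated vectors of the class under test, of lengths 1, 3 and 10:

check_vector_class <- function(x1, x3, x10) {
    stopifnot(length(x1) == 1,
              length(c(x1, x10)) == 11,               # c() and length() agree
              all(x3 == x3[seq_along(x3)]),           # "[" round-trips
              all(x3 == c(x3[1], x3[2], x3[3])),      # element-wise reassembly
              length(c(x3[2], x10[5:7])) == 4)
    invisible(TRUE)
}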
[Rd] Improvement of SignRank functions
I took some time and liberty and tried to improve the existing implementation of the SignRank functions in R (dsignrank, ...). As I have seen, they've been based on csignrank, so I modified csignrank and, I believe, improved calculation time and memory efficiency.

The idea is basically the same. I use the same recursion as the original author used, with one slight modification: I generate the Wilcoxon SignRank density from the beginning (for n = 2, 3, ...) with the help of the recursion formula.

What is changed? There is no need for SIGNRANK_MAX in src/nmath/nmath.h anymore. Only the functions

    static void w_free()
    void signrank_free()
    static void w_init_maybe(int n)
    static double csignrank(int k, int n)

in src/nmath/signrank.c have been altered. There was no change to dsignrank, psignrank, ... I've tried to make as few changes as possible. So, to compile with the new functions, only src/nmath/signrank.c needs to be replaced with the one given in the attachment.

I hope it really is an improvement.

With respect,
--
Ivo Ugrina
ICQ: 47508335 | www.iugrina.com
baza matematickih pojmova (database of mathematical terms): http://baza.iugrina.com
anime, manga, Japan fanzine: http://yoshi.iugrina.com

/*
 *  Mathlib : A C Library of Special Functions
 *  Copyright (C) 1999-2007 The R Development Core Team
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, a copy is available at
 *  http://www.r-project.org/Licenses/
 *
 *  SYNOPSIS
 *
 *    #include <Rmath.h>
 *    double dsignrank(double x, double n, int give_log)
 *    double psignrank(double x, double n, int lower_tail, int log_p)
 *    double qsignrank(double x, double n, int lower_tail, int log_p)
 *    double rsignrank(double n)
 *
 *  DESCRIPTION
 *
 *    dsignrank    The density of the Wilcoxon Signed Rank distribution.
 *    psignrank    The distribution function of the Wilcoxon Signed Rank
 *                 distribution.
 *    qsignrank    The quantile function of the Wilcoxon Signed Rank
 *                 distribution.
 *    rsignrank    Random variates from the Wilcoxon Signed Rank
 *                 distribution.
 */

#include "nmath.h"
#include "dpq.h"

static double *w;
static int allocated_n;

static void
w_free()
{
    if (!w) return;

    free((void *) w);
    w = 0;
    allocated_n = 0;
}

void signrank_free()
{
    w_free();
}

static void
w_init_maybe(int n)
{
    int u, c;

    u = n * (n + 1) / 2;
    c = (int) (u / 2);

    if (w) {
        if (n != allocated_n) {
            w_free();
        } else return;
    }

    if (!w) {
        w = (double *) calloc(c + 1, sizeof(double));
#ifdef MATHLIB_STANDALONE
        if (!w) MATHLIB_ERROR("%s", _("signrank allocation error"));
#endif
        allocated_n = n;
    }
}

static double
csignrank(int k, int n)
{
    int c, u, i, j, end;

#ifndef MATHLIB_STANDALONE
    R_CheckUserInterrupt();
#endif

    u = n * (n + 1) / 2;
    c = (int) (u / 2);

    if ((k < 0) || (k > u))
        return 0;
    if (k > c)
        k = u - k;

    if (n == 1)
        return 1.0;
    if (w[0] == 1.0)
        return w[k];

    w[0] = w[1] = 1.0;
    for (j = 2; j < n + 1; ++j) {
        end = imin2(j * (j + 1) / 2, c);
        for (i = end; i >= j; --i) {
            w[i] += w[i - j];
        }
    }
    return w[k];
}

double dsignrank(double x, double n, int give_log)
{
    double d;

#ifdef IEEE_754
    /* NaNs propagated correctly */
    if (ISNAN(x) || ISNAN(n)) return(x + n);
#endif
    n = floor(n + 0.5);
    if (n <= 0) ML_ERR_return_NAN;

    if (fabs(x - floor(x + 0.5)) > 1e-7) return(R_D__0);
    x = floor(x + 0.5);
    if ((x < 0) || (x > (n * (n + 1) / 2))) return(R_D__0);

    w_init_maybe(n);
    d = R_D_exp(log(csignrank(x, n)) - n * M_LN2);

    return(d);
}

double psignrank(double x, double n, int lower_tail, int log_p)
{
    int i;
    double f, p;

#ifdef IEEE_754
    if (ISNAN(x) || ISNAN(n)) return(x + n);
#endif
    if (!R_FINITE(n)) ML_ERR_return_NAN;
    n = floor(n + 0.5);
    if (n <= 0) ML_ERR_return_NAN;

    x = floor(x + 1e-7);
    if (x < 0.0) return(R_DT_0);
    if (x >= n * (n + 1) / 2) return(R_DT_1);

    w_init_maybe(n);
    f = exp(- n * M_LN2);
    p = 0;
    if (x <= (n * (n + 1) / 4)) {
        for (i = 0; i <= x; i++) p += csignrank(i, n) * f;
    } else {
        x = n * (n + 1) / 2 - x;
        for (i = 0; i < x; i++) p += csignrank(i, n) * f;
        lower_tail = !lower_tail; /* p = 1 - p; */
    }
    return(R_DT_val(p));
}

/* psig
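A couple of sanity checks one might run from R against a build with the modified csignrank; these are standard identities for the signed-rank distribution, not part of the patch, and n = 25 is an arbitrary test size:

n <- 25
k <- 0:(n * (n + 1) / 2)
stopifnot(isTRUE(all.equal(sum(dsignrank(k, n)), 1)),                    # density sums to 1
          isTRUE(all.equal(psignrank(k, n), cumsum(dsignrank(k, n)))))   # CDF matches cumulated density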
Re: [Rd] Rsquared bug lm() (PR#10516)
This is deliberate and as documented in ?summary.lm. It is not a bug.

     -thomas

On Fri, 14 Dec 2007 [EMAIL PROTECTED] wrote:

> Full_Name: lieven clement
> Version: R version 2.4.0 Patched (2006-11-25 r39997)
> OS: i486-pc-linux-gnu
> Submission from: (NULL) (157.193.193.180)
>
> summary.lm() does not calculate R² accurately for models without intercepts
> if one of the predictor variables is a factor.
> In order to avoid one of the factor levels being treated as a reference
> class you can use the -1 option in a formula. When you use this, R² is not
> correctly calculated.
>
>> x1 <- rnorm(100)
>> x2 <- c(rep(0,25), rep(10,25), rep(20,25), rep(30,25))
>> y <- 10*x1 + x2 + rnorm(100, 0, 4)
>> x2 <- as.factor(x2)
>> lmtest <- lm(y ~ -1 + x1 + x2)
>> summary(lmtest)$r.sq
> [1] 0.9650201
>> 1 - sum(lmtest$res^2)/sum((y - mean(y))^2)
> [1] 0.9342672
>
> The R squared by summary is calculated as
>> 1 - sum(lmtest$res^2)/sum((y)^2)
> [1] 0.9650201
> apparently because summary.lm assumes the mean of y to be zero.
>
> In case of an intercept model everything seems ok
>> lmtest <- lm(y ~ x1 + x2)
>> summary(lmtest)$r.sq
> [1] 0.9342672
>> 1 - sum(lmtest$res^2)/sum((y - mean(y))^2)
> [1] 0.9342672

Thomas Lumley            Assoc. Professor, Biostatistics
[EMAIL PROTECTED]        University of Washington, Seattle