[Rd] Quadratic Programming

2007-12-14 Thread de Gosson de Varennes Serge (4100)
Hi all!

I have a little question concerning quadprog. To make it simple I'll start by 
stating the problem:

I want to minimize 

h(d,delta)=0.5d^T B d +nabla(f(x))^T d +rho*delta^2

With respect to d\in R^n and delta \in R. I obviously have constraints 
(depending on both d and delta). 

Solve.QP does give me a good result for d but I cannot obtain anything for 
delta. Simce dim(Dmat)=n and sol<-rep(0,n) it isn't particularly surprising.
To set a diagonal matrix

(B  0   )
Amat=   (0  rho )

Is a crapy idea. Does anyone have an idea? 

Yours,

Serge

"Beatus qui prodest quibus potest"  -   (Lycklig är den som hjälper andra) 

Serge de Gosson de Varennes
Försäkringskassan 
Swedish Social Insurance Agency
+46-76 11 40 799
[EMAIL PROTECTED]





[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Minor documentation bug in R-exts (PR#10515)

2007-12-14 Thread Greg . Snow
This is a minor documentation bug.  In the document:  Writing R
Extensions (R-exts)  section 1.6.5 (Summary -- converting an existing
package) the 3rd bullet is missing the end of the sentence.

Thanks,

--please do not edit the information below--

Version:
 platform =3D i386-pc-mingw32
 arch =3D i386
 os =3D mingw32
 system =3D i386, mingw32
 status =3D=20
 major =3D 2
 minor =3D 6.1
 year =3D 2007
 month =3D 11
 day =3D 26
 svn rev =3D 43537
 language =3D R
 version.string =3D R version 2.6.1 (2007-11-26)

Windows 2000 (build 2195) Service Pack 4.0

Locale:
LC_COLLATE=3DEnglish_United States.1252;LC_CTYPE=3DEnglish_United
States.1252;LC_MONETARY=3DEnglish_United
States.1252;LC_NUMERIC=3DC;LC_TIME=3DEnglish_United States.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices,
package:utils, package:datasets, package:methods, Autoloads,
package:base

--=20
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
=20

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rsquared bug lm() (PR#10516)

2007-12-14 Thread Duncan Murdoch
On 12/14/2007 8:10 AM, [EMAIL PROTECTED] wrote:
> Full_Name: lieven clement
> Version:  R version 2.4.0 Patched (2006-11-25 r39997)
> OS: i486-pc-linux-gnu
> Submission from: (NULL) (157.193.193.180)
> 
> 
> summary.lm() does not calculate R² accurately for models without intercepts 
> if
> one of the predictor variables is a factor.
> In order to avoid one of the factor levels to be considered as a reference 
> class
> you can use the -1 option in a formula. When you use this, R² is not 
> correctly
> calculated.

This is not a bug.  A model without an intercept should be using y=0 as 
a reference.

Duncan Murdoch

> 
>>  x1<-rnorm(100)
>> x2<-c(rep(0,25),rep(10,25),rep(20,25),rep(30,25))
>> y<-10*x1+x2+rnorm(100,0,4)
>> x2<-as.factor(x2)
>> lmtest<-lm(y~-1+x1+x2)
>> summary(lmtest)$r.sq
> [1] 0.9650201
>> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
> [1] 0.9342672
> 
> The R squared by summary is calculated as
>> 1-sum(lmtest$res^2)/sum((y)^2)
> [1] 0.9650201
> apparently because lm.summary assumes the mean of y to be zero.
> 
> In case of an intercept model everything seems ok
>> lmtest<-lm(y~x1+x2)
>> summary(lmtest)$r.sq
> [1] 0.9342672
>> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
> [1] 0.9342672
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] End of whiskers of boxplots are repeated on PDF device (PR#10499)

2007-12-14 Thread Martin Maechler
> "MT" == Michael Toews <[EMAIL PROTECTED]>
> on Thu, 13 Dec 2007 09:00:37 -0800 writes:

MT> I've identified the problem for this issue, which is
MT> simple to fix.  Please see and apply the attached
MT> patch. Thanks.  +mt

Excellent, Michael!

This will be fixed in R-patched and R-devel from tomorrow.
Thank you, and regards!

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Minor documentation bug in R-exts (PR#10515)

2007-12-14 Thread Duncan Murdoch
On 12/13/2007 1:15 PM, [EMAIL PROTECTED] wrote:
> This is a minor documentation bug.  In the document:  Writing R
> Extensions (R-exts)  section 1.6.5 (Summary -- converting an existing
> package) the 3rd bullet is missing the end of the sentence.

Thanks, I'll fix it.

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Rsquared bug lm() (PR#10516)

2007-12-14 Thread lieven . clement
Full_Name: lieven clement
Version:  R version 2.4.0 Patched (2006-11-25 r39997)
OS: i486-pc-linux-gnu
Submission from: (NULL) (157.193.193.180)


summary.lm() does not calculate R² accurately for models without intercepts if
one of the predictor variables is a factor.
In order to avoid one of the factor levels to be considered as a reference class
you can use the -1 option in a formula. When you use this, R² is not correctly
calculated.

>  x1<-rnorm(100)
> x2<-c(rep(0,25),rep(10,25),rep(20,25),rep(30,25))
> y<-10*x1+x2+rnorm(100,0,4)
> x2<-as.factor(x2)
> lmtest<-lm(y~-1+x1+x2)
> summary(lmtest)$r.sq
[1] 0.9650201
> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
[1] 0.9342672

The R squared by summary is calculated as
> 1-sum(lmtest$res^2)/sum((y)^2)
[1] 0.9650201
apparently because lm.summary assumes the mean of y to be zero.

In case of an intercept model everything seems ok
> lmtest<-lm(y~x1+x2)
> summary(lmtest)$r.sq
[1] 0.9342672
> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
[1] 0.9342672

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] windows rtools missing gfortran.exe?

2007-12-14 Thread Hiroyuki Kawakatsu
Hi,

I replaced my Rtools today as posted at
http://www.murdoch-sutherland.com/Rtools/Rtools.exe

Trying to build R-devel_2007-12-13.tar.gz without modifying MkRules
gives the gfortran command not found error below. I am wondering if
gfortran.exe is missing from (recent?) Rtools.exe or I am doing
something wrong.

Thanks to hints at Duncan's site, I worked around the error by adding
"GCC4_SUFF=-sjlj" in MkRules (below the == end of user-customizable
parts  == line).

gcc  -std=gnu99 -I../include -DHAVE_CONFIG_H -DR_DLL_BUILD  -O3 -Wall -pedantic
 -c zeroin.c -o zeroin.o
gfortran -O3  -c ch2inv.f -o ch2inv.o
make[4]: gfortran: Command not found
make[4]: *** [ch2inv.o] Error 127
make[3]: *** [rlibs] Error 2
make[2]: *** [../../bin/R.dll] Error 2
make[1]: *** [rbuild] Error 2
make: *** [all] Error 2

-- 
--
Hiroyuki Kawakatsu
Business School
Dublin City University
Dublin 9, Ireland
Tel +353 (0)1 700 7496

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] windows rtools missing gfortran.exe?

2007-12-14 Thread Duncan Murdoch
On 12/14/2007 10:23 AM, Hiroyuki Kawakatsu wrote:
> Hi,
> 
> I replaced my Rtools today as posted at
> http://www.murdoch-sutherland.com/Rtools/Rtools.exe
> 
> Trying to build R-devel_2007-12-13.tar.gz without modifying MkRules
> gives the gfortran command not found error below. I am wondering if
> gfortran.exe is missing from (recent?) Rtools.exe or I am doing
> something wrong.
> 
> Thanks to hints at Duncan's site, I worked around the error by adding
> "GCC4_SUFF=-sjlj" in MkRules (below the == end of user-customizable
> parts  == line).
> 
> gcc  -std=gnu99 -I../include -DHAVE_CONFIG_H -DR_DLL_BUILD  -O3 -Wall 
> -pedantic
>  -c zeroin.c -o zeroin.o
> gfortran -O3  -c ch2inv.f -o ch2inv.o
> make[4]: gfortran: Command not found
> make[4]: *** [ch2inv.o] Error 127
> make[3]: *** [rlibs] Error 2
> make[2]: *** [../../bin/R.dll] Error 2
> make[1]: *** [rbuild] Error 2
> make: *** [all] Error 2
> 

It's supposed to be in there; I'm just downloading a copy myself to 
verify...  You're right!  Somehow Rtools.exe got linked to Rtools26.exe, 
instead of the new one.  Now fixed.

Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Do not misuse R-devel {was "Quadratic Programming"}

2007-12-14 Thread Martin Maechler
Please use the R-help mailing for such *Questions*.

R-devel has a very different purpose.
Probably you should also read the posting guide,
--> http://www.r-project.org/posting-guide.html

Regards,
Martin Maechler, ETH Zurich

> "dGdVS" == de Gosson de Varennes Serge (4100) <[EMAIL PROTECTED]>
> on Fri, 14 Dec 2007 09:15:26 +0100 writes:

dGdVS> Hi all!  I have a little question concerning
dGdVS> quadprog. To make it simple I'll start by stating the
dGdVS> problem:

dGdVS> I want to minimize

dGdVS>  h(d,delta)=0.5d^T B d +nabla(f(x))^T d
dGdVS> +rho*delta^2

dGdVS> With respect to d\in R^n and delta \in R. I obviously
dGdVS> have constraints (depending on both d and delta).

dGdVS> Solve.QP does give me a good result for d but I
dGdVS> cannot obtain anything for delta. Simce dim(Dmat)=n
dGdVS> and sol<-rep(0,n) it isn't particularly surprising.
dGdVS> To set a diagonal matrix

dGdVS>  (B 0 ) Amat= (0 rho )

dGdVS> Is a crapy idea. Does anyone have an idea?

dGdVS> Yours,

dGdVS> Serge

dGdVS> "Beatus qui prodest quibus potest" - (Lycklig �r
dGdVS> den som hj�lper andra)

dGdVS> Serge de Gosson de Varennes F�rs�kringskassan
dGdVS> Swedish Social Insurance Agency +46-76 11 40 799
dGdVS> [EMAIL PROTECTED]





dGdVS>  [[alternative HTML version deleted]]

dGdVS> __
dGdVS> R-devel@r-project.org mailing list
dGdVS> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] segfault isoreg with NAs

2007-12-14 Thread Tobias Verbeke
Dear list,

As can be seen below, adding a NA to the y values
in a call to isoreg results in a segfault.

ir4 <- isoreg(1:10, y4 <- c(5, 9, 1:2, 5:8, NA, 8))

Adding missing values to the x values, on the contrary,
gives an error, but maybe the error message could be
tailored to this particular situation.

y <- c(5, 9, 1:2, 5:8, 3, 8)
x <- c(1:9, NA)
isoreg(x, y)
## error message: Error in if (!isOrd) { : missing value where 
TRUE/FALSE needed

Please find below a (temporary) patch (against Revision 43692)
for both the R source and the help file.

Kind regards,
Tobias

### patch isoreg.R ###

--- isoreg.R2007-12-14 19:07:47.0 +0100
+++ isoreg2.R   2007-12-14 19:11:20.0 +0100
@@ -18,6 +18,9 @@
  ##
  isoreg <- function(x, y=NULL)
  {
+if (any(is.na(x))) stop("x may not contain NA values")
+if (any(is.na(y))) stop("y may not contain NA values")
+
  xy <- xy.coords(x,y)
  x <- xy$x
  isOrd <- (!is.null(xy$xlab) && xy$xlab == "Index") || !is.unsorted(x)

### patch isoreg.Rd ###

--- isoreg.Rd   2007-12-14 19:08:12.0 +0100
+++ isoreg2.Rd  2007-12-14 19:15:00.0 +0100
@@ -20,6 +20,7 @@
\item{x, y}{%in \code{isoreg},
  coordinate vectors of the regression points.  Alternatively a single
  plotting structure can be specified: see \code{\link{xy.coords}}.
+The coordinate vectors may not contain missing values.
}
  }
  \details{


### sessionInfo() information segfault ###

> sessionInfo()
R version 2.6.0 (2007-10-03)
i486-pc-linux-gnu

locale:
LC_CTYPE=en_US.ISO-8859-15;LC_NUMERIC=C;LC_TIME=en_US.ISO-8859-15;LC_COLLATE=en_US.ISO-8859-15;LC_MONETARY=en_US.ISO-8859-15\
;LC_MESSAGES=en_US.ISO-8859-15;LC_PAPER=en_US.ISO-8859-15;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.ISO-885\
9-15;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
> ir4 <- isoreg(1:10, y4 <- c(5, 9, 1:2, 5:8, NA, 8))

  *** caught segfault ***
address 0x24, cause 'memory not mapped'

Process R segmentation fault (core dumped) at Fri Dec 14 17:48:22 2007

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Rapid Random Access

2007-12-14 Thread Barry Rowlingson
  I have some code that can potentially produce a huge number of 
large-ish R data frames, each of a different number of rows. All the 
data frames together will be way too big to keep in R's memory, but 
we'll assume a single one is manageable. It's just when there's a 
million of them that the machine might start to burn up.

  However I might, for example, want to compute some averages over the 
elements in the data frames. Or I might want to sample ten of them at 
random and do some plots. What I need is rapid random access to data 
stored in external files.

  Here's some ideas I've had:

  * Store all the data in an HDF-5 file - problem here is that the 
current HDF package for R reads the whole file in at once.

  * Store the data in some other custom binary format with an index for 
rapid access to the N-th elements. Problems: feels like reinventing HDF, 
cross-platform issues, etc.

  * Store the data in a number of .RData files in a directory. Hence to 
get the N-th element just attach(paste("foo/A-",n,'.RData')) give or 
take a parameter or two.

  * Use a database. Seems a bit heavyweight, but maybe using RSQLite 
could work in order to keep it local.

  What I'm currently doing is keeping it OO enough that I can in theory 
implement all of the above. At the moment I have an implementation that 
does keep them all in R's memory as a list of data frames, which is fine 
for small test cases but things are going to get big shortly. Any other 
ideas or hints are welcome.

thanks

Barry

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rapid Random Access

2007-12-14 Thread Sean Davis
On Dec 14, 2007 1:01 PM, Barry Rowlingson <[EMAIL PROTECTED]>
wrote:

>  I have some code that can potentially produce a huge number of
> large-ish R data frames, each of a different number of rows. All the
> data frames together will be way too big to keep in R's memory, but
> we'll assume a single one is manageable. It's just when there's a
> million of them that the machine might start to burn up.
>
>  However I might, for example, want to compute some averages over the
> elements in the data frames. Or I might want to sample ten of them at
> random and do some plots. What I need is rapid random access to data
> stored in external files.
>
>  Here's some ideas I've had:
>
>  * Store all the data in an HDF-5 file - problem here is that the
> current HDF package for R reads the whole file in at once.
>
>  * Store the data in some other custom binary format with an index for
> rapid access to the N-th elements. Problems: feels like reinventing HDF,
> cross-platform issues, etc.
>
>  * Store the data in a number of .RData files in a directory. Hence to
> get the N-th element just attach(paste("foo/A-",n,'.RData')) give or
> take a parameter or two.
>
>  * Use a database. Seems a bit heavyweight, but maybe using RSQLite
> could work in order to keep it local.
>

Unless you really need this to be a general solution, I would suggest using
a database.  And if you use one that allows you to create functions within
it, you can even keep some of the calculations on the server side (which may
be a performance advantage).  If you are doing a lot of this, you might
consider Postgres and pl/R, which embeds R in the database.

Sean

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rapid Random Access

2007-12-14 Thread Barry Rowlingson
Tony Plate wrote:

> This is exactly the type of situation that the trackObjs package is 
> designed for. 

  Ooh, I'm having deja-vu - yes I think I saw this package a while back 
and wondered what magic it did. I shall go play with it later.

Thanks

Barry

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Rapid Random Access

2007-12-14 Thread Tony Plate
Barry Rowlingson wrote:
>   I have some code that can potentially produce a huge number of 
> large-ish R data frames, each of a different number of rows. All the 
> data frames together will be way too big to keep in R's memory, but 
> we'll assume a single one is manageable. It's just when there's a 
> million of them that the machine might start to burn up.
>   
This is exactly the type of situation that the trackObjs package is 
designed for.  It will automatically (and invisibly) store each object 
in its own .RData file so that objects can be accessed as ordinary R 
objects, but are not kept in memory (actually, there are options to 
control whether or not objects are cached in memory). It also caches 
some characteristics of objects so that a brief summary of objects can 
be provided without having to read each object.  The g.data package and 
the filehash package also do similar things wrt to providing automatic 
access to objects in .RData files (and were part of the inspiration for 
the trackObjs package.)

-- Tony Plate
>   However I might, for example, want to compute some averages over the 
> elements in the data frames. Or I might want to sample ten of them at 
> random and do some plots. What I need is rapid random access to data 
> stored in external files.
>
>   Here's some ideas I've had:
>
>   * Store all the data in an HDF-5 file - problem here is that the 
> current HDF package for R reads the whole file in at once.
>
>   * Store the data in some other custom binary format with an index for 
> rapid access to the N-th elements. Problems: feels like reinventing HDF, 
> cross-platform issues, etc.
>
>   * Store the data in a number of .RData files in a directory. Hence to 
> get the N-th element just attach(paste("foo/A-",n,'.RData')) give or 
> take a parameter or two.
>
>   * Use a database. Seems a bit heavyweight, but maybe using RSQLite 
> could work in order to keep it local.
>
>   What I'm currently doing is keeping it OO enough that I can in theory 
> implement all of the above. At the moment I have an implementation that 
> does keep them all in R's memory as a list of data frames, which is fine 
> for small test cases but things are going to get big shortly. Any other 
> ideas or hints are welcome.
>
> thanks
>
> Barry
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wrong length of POSIXt vectors (PR#10507)

2007-12-14 Thread Tony Plate
Duncan Murdoch wrote:
> On 12/13/2007 1:59 PM, Tony Plate wrote:
>> Duncan Murdoch wrote:
>>> On 12/11/2007 6:20 AM, [EMAIL PROTECTED] wrote:
 Full_Name: Petr Simecek
 Version: 2.5.1, 2.6.1
 OS: Windows XP
 Submission from: (NULL) (195.113.231.2)


 Several times I have experienced that a length of a POSIXt vector 
 has not been
 computed right.

 Example:

 tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
 ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
 ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L), 
 mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon 
 = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L, 
 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = 
 c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L, 
 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L, 
 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min", 
 "hour", "mday", "mon", "year", "wday", "yday", "isdst"
 ), class = c("POSIXt", "POSIXlt"))

 print(tv)
 # print 11 time points (right)

 length(tv)
 # returns 9 (wrong)
>>>
>>> tv is a list of length 9.  The answer is right, your expectation is 
>>> wrong.
 I have tried that on several computers with/without switching to 
 English
 locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched a 
 help pages but I
 cannot imagine how that could be OK.
>>>
>>> See this in ?POSIXt:
>>>
>>> Class '"POSIXlt"' is a named list of vectors...
>>>
>>> You could define your own length measurement as
>>>
>>> length.POSIXlt <- function(x) length(x$sec)
>>>
>>> and you'll get the answer you expect, but be aware that length.XXX 
>>> methods are quite rare, and you may surprise some of your users.
>>>
>>
>> On the other hand, isn't the fact that length() currently always 
>> returns 9 for POSIXlt objects likely to be a surprise to many users 
>> of POSIXlt?
>>
>> The back of "The New S Language" says "Easy-to-use facilities allow 
>> you to organize, store and retrieve all sorts of data. ... S 
>> functions and data organization make applications easy to write."
>>
>> Now, POSIXlt has methods for c() and vector subsetting "[" (and many 
>> other vector-manipulation methods - see methods(class="POSIXlt")).  
>> Hence, from the point of view of intending to supply "easy-to-use 
>> facilities ... [for] all sorts of data", isn't it a little 
>> incongruous that length() is not also provided -- as 3 functions (any 
>> others?) comprise a core set of vector-manipulation functions?
>>
>> Would it make sense to have an informal prescription (e.g., in 
>> R-exts) that a class that implements a vector-like object and 
>> provides at least of one of functions 'c', '[' and 'length' should 
>> provide all three?  It would also be easy to describe a test-suite 
>> that should be included in the 'test' directory of a package 
>> implementing such a class, that had some tests of the basic 
>> vector-manipulation functionality, such as:
>>
>>  > # at this point, x0, x1, x3, & x10 should exist, as vectors of the
>>  > # class being tested, of length 0, 1, 3, and 10, and they should
>>  > # contain no duplicate elements
>>  > length(x0)
>> [1] 1
>>  > length(c(x0, x1))
>> [1] 2
>>  > length(c(x1,x10))
>> [1] 11
>>  > all(x3 == x3[seq(len=length(x3))])
>> [1] TRUE
>>  > all(x3 == c(x3[1], x3[2], x3[3]))
>> [1] TRUE
>>  > length(c(x3[2], x10[5:7]))
>> [1] 4
>>  >
>>
>> It would also be possible to describe a larger set of vector 
>> manipulation functions that should be implemented together, including 
>> e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', 
>> head, tail ... (many of which are provided for POSIXlt).
>>
>> Or is there some good reason that length() cannot be provided (while 
>> 'c' and '[' can) for some vector-like classes such as "POSIXlt"?
>
> What you say sounds good in general, but the devil is in the details. 
> Changing the meaning of length(x) for some objects has fairly 
> widespread effects.  Are they all positive?  I don't know.
>
> Adding a prescription like the one you suggest would be good if it's 
> easy to implement, but bad if it's already widely violated.  How many 
> base or CRAN or Bioconductor packages violate it currently?   Do the 
> ones that provide all 3 methods do so in a consistent way, i.e. does 
> "length(x)" mean the same thing in all of them?
I'm not sure doing something like this would be so bad even if it is 
already widely violated.  R has evolved significantly over time, and 
many rough edges have been cleaned up, sometimes in ways that were not 
backward compatible.  This is a great thing & my thanks go to the people 
working on R.

If some base or CRAN or Bioconductor packages currently don't implement 
vector operations consistently, wouldn't it be good to know that?  
Wouldn't it be us

[Rd] Improvement of SignRank functions

2007-12-14 Thread Ivo Ugrina

I took some time and liberty and tried to improve
existing implementation of SignRank functions
in R. (dsignrank, ...)

As I have seen they've been based on csignrank.
So I modified csignrank and, I believe,
improved calculation time and memory efficiency.

The idea is basically the same. I use the same recursion
as original author used with one slight modification.
I am generating Wilcoxon SignRank density from the
beginning (for n=2,3,...) with the help of recursion formula.

What is changed?
There is no need for SINGRANK_MAX in src/nmath/nmath.h anymore.
Only functions:
static void w_free()
void signrank_free()
static void w_init_maybe(int n)
static double csignrank(int k, int n)
in src/nmath/signrank.c have been altered.
There was no change to dsignrank, psignrank, ...

I've tried to make as little changes as possible.
So, to compile with new functions only src/nmath/signrank.c
needs to be changed with the one given in attachment.

I hope it really is an improvement.

With respect,
--
Ivo Ugrina
ICQ: 47508335 | www.iugrina.com
---
baza matematickih pojmova
http://baza.iugrina.com
---
anime, manga, Japan fanzin
http://yoshi.iugrina.com
/*
 *  Mathlib : A C Library of Special Functions
 *  Copyright (C) 1999-2007  The R Development Core Team
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, a copy is available at
 *  http://www.r-project.org/Licenses/
 *
 *  SYNOPSIS
 *
 *#include 
 *double dsignrank(double x, double n, int give_log)
 *double psignrank(double x, double n, int lower_tail, int log_p)
 *double qsignrank(double x, double n, int lower_tail, int log_p)
 *double rsignrank(double n)
 *
 *  DESCRIPTION
 *
 *dsignrankThe density of the Wilcoxon Signed Rank distribution.
 *psignrankThe distribution function of the Wilcoxon Signed Rank
 * distribution.
 *qsignrankThe quantile function of the Wilcoxon Signed Rank
 * distribution.
 *rsignrankRandom variates from the Wilcoxon Signed Rank
 * distribution.
 */

#include "nmath.h"
#include "dpq.h"

static double *w;
static int allocated_n;

static void
w_free()
{
if( !w) return;

free( (void *) w);
w = 0;
allocated_n = 0;
}

void signrank_free()
{
w_free();
}

static void
w_init_maybe(int n)
{
int u, c;

u = n * (n + 1) / 2;
c = (int) (u / 2);

if( w){
if( n != allocated_n){
w_free();
}
else return;
}

if( !w){
w = (double *) calloc(c + 1, sizeof(double));
#ifdef MATHLIB_STANDALONE
if (!w) MATHLIB_ERROR("%s", _("signrank allocation error"));
#endif
allocated_n = n;
}
}

static double
csignrank(int k, int n)
{
int c, u, i, j, end;

#ifndef MATHLIB_STANDALONE
R_CheckUserInterrupt();
#endif

u = n * (n + 1) / 2;
c = (int) (u / 2);

if ((k < 0) || (k > u))
return 0;
if (k > c)
k = u - k;

if ( n == 1)
return 1.0;
if ( w[0]==1.0 )
return w[k];

w[0]=w[1]=1.0;
for( j=2; j=j; --i){
w[i] += w[i-j];
}
}

return w[k];
}

double dsignrank(double x, double n, int give_log)
{
double d;

#ifdef IEEE_754
/* NaNs propagated correctly */
if (ISNAN(x) || ISNAN(n)) return(x + n);
#endif
n = floor(n + 0.5);
if (n <= 0)
ML_ERR_return_NAN;

if (fabs(x - floor(x + 0.5)) > 1e-7)
return(R_D__0);
x = floor(x + 0.5);
if ((x < 0) || (x > (n * (n + 1) / 2)))
return(R_D__0);

w_init_maybe(n);
d = R_D_exp(log(csignrank(x, n)) - n * M_LN2);

return(d);
}

double psignrank(double x, double n, int lower_tail, int log_p)
{
int i;
double f, p;

#ifdef IEEE_754
if (ISNAN(x) || ISNAN(n))
return(x + n);
#endif
if (!R_FINITE(n)) ML_ERR_return_NAN;
n = floor(n + 0.5);
if (n <= 0) ML_ERR_return_NAN;

x = floor(x + 1e-7);
if (x < 0.0)
return(R_DT_0);
if (x >= n * (n + 1) / 2)
return(R_DT_1);

w_init_maybe(n);
f = exp(- n * M_LN2);
p = 0;
if (x <= (n * (n + 1) / 4)) {
for (i = 0; i <= x; i++)
p += csignrank(i, n) * f;
}
else {
x = n * (n + 1) / 2 - x;
for (i = 0; i < x; i++)
p += csignrank(i, n) * f;
lower_tail = !lower_tail; /* p = 1 - p; */
}

return(R_DT_val(p));
} /* psig

Re: [Rd] Rsquared bug lm() (PR#10516)

2007-12-14 Thread Thomas Lumley

This is deliberate and as documented in ?summary.lm. It is not a bug.

 -thomas

On Fri, 14 Dec 2007 [EMAIL PROTECTED] wrote:

> Full_Name: lieven clement
> Version:  R version 2.4.0 Patched (2006-11-25 r39997)
> OS: i486-pc-linux-gnu
> Submission from: (NULL) (157.193.193.180)
>
>
> summary.lm() does not calculate R² accurately for models without intercepts 
> if
> one of the predictor variables is a factor.
> In order to avoid one of the factor levels to be considered as a reference 
> class
> you can use the -1 option in a formula. When you use this, R² is not 
> correctly
> calculated.
>
>>  x1<-rnorm(100)
>> x2<-c(rep(0,25),rep(10,25),rep(20,25),rep(30,25))
>> y<-10*x1+x2+rnorm(100,0,4)
>> x2<-as.factor(x2)
>> lmtest<-lm(y~-1+x1+x2)
>> summary(lmtest)$r.sq
> [1] 0.9650201
>> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
> [1] 0.9342672
>
> The R squared by summary is calculated as
>> 1-sum(lmtest$res^2)/sum((y)^2)
> [1] 0.9650201
> apparently because lm.summary assumes the mean of y to be zero.
>
> In case of an intercept model everything seems ok
>> lmtest<-lm(y~x1+x2)
>> summary(lmtest)$r.sq
> [1] 0.9342672
>> 1-sum(lmtest$res^2)/sum((y-mean(y))^2)
> [1] 0.9342672
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel