[Rd] relist.Rd patch

2008-08-16 Thread Dan Davison
There are a few typos in the documentation for relist(). I've also
made a few other changes to the file which I believe are
improvements. I've attached a patch against the version under the
'trunk' branch on the svn server checked out today. It was produced by

diff -u /usr/local/src/R/R-svn-trunk/src/library/utils/man/relist.Rd ~/relist-new.Rd

I'd also suggest identical() rather than "==" in the equalities at the
bottom of the documentation, but that may be overly pedantic.
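
To illustrate the identical() suggestion, here is a minimal sketch built on the parameter list used in the help page. It assumes a current R installation in which the utils package provides as.relistable() and relist(); the names ul, roundtrip, y and back are illustrative only and are not part of the patch.

```r
# Round trip through unlist()/relist(), checked with identical().
ipar <- list(mean = c(0, 1), vcov = cbind(c(1, 1), c(1, 0)))
initial.param <- as.relistable(ipar)

# unlist() on a relistable object stores the skeleton as an attribute,
# so relist() needs no explicit skeleton argument here.
ul <- unlist(initial.param)
roundtrip <- relist(ul)

# identical() gives a single TRUE/FALSE for the whole structure.
print(identical(roundtrip, initial.param))

# The other direction: flesh -> structure -> flesh.
# unlist() adds names, so they are dropped before comparing.
y <- c(2, 3, 1, 0.5, 0.5, 1)
back <- unlist(relist(y, ipar))
print(identical(unname(back), y))
```

The second check shows one reason a strict comparison needs care: unlist() attaches names such as mean1 and vcov1, so the flesh comes back with attributes the original plain vector did not have.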

Dan

-- 
www.stats.ox.ac.uk/~davison
--- /usr/local/src/R/R-svn-trunk/src/library/utils/man/relist.Rd 2008-08-16 13:41:50.0 +0100
+++ /home/dan/relist-new.Rd 2008-08-16 17:15:54.0 +0100
@@ -13,7 +13,7 @@
 \alias{is.relistable}
 \alias{unlist.relistable}
 %
-\title{Allow Re-Listing an unlisted() Object}
+\title{Allow Re-Listing an unlisted Object}
 \description{
   \code{relist()} is an S3 generic function with a few methods in order
   to allow easy inversion of \code{\link{unlist}(obj)} when that is used
@@ -33,8 +33,9 @@
 }
 
 \arguments{
-  \item{flesh}{ .}
-  \item{skeleton}{ .}
+  \item{flesh}{a vector to be relisted}
+  \item{skeleton}{a list, the structure of which determines the structure
+  of the result}
   \item{x}{an \R object, typically a list (or vector).}
   \item{recursive}{logical.  Should unlisting be applied to list
 components of \code{x}?}
@@ -42,13 +43,13 @@
 }
 \details{
   Some functions need many parameters, which are most easily represented in
-  complex structures.  Unfortunately, many mathematical functions in \R,
+  nested list structures.  Unfortunately, many mathematical functions in \R,
   including \code{\link{optim}} and \code{\link{nlm}} can only operate on
   functions whose domain is
-  a vector.  \R has \code{\link{unlist}()} to convert complex objects into a
-  vector representation.  \code{relist()}, it's methods and the
+  a vector.  \R has \code{\link{unlist}()} to convert nested list objects into a
+  vector representation.  \code{relist()}, its methods, and the
   functionality mentioned here provide the inverse operation to convert
-  vectors back to the convenient structural representation.
+  vectors back to the convenient structured representation.
   This allows structured functions (such as \code{optim()}) to have simple
   mathematical interfaces.
 
@@ -60,7 +61,9 @@
   list(mean=c(0, 1), vcov=cbind(c(1, 1), c(1, 0))).
   }
  However, \code{\link{optim}} cannot operate on functions that take lists as input; it
-  only likes numeric vectors.  The solution is conversion:
+  only likes numeric vectors.  The solution is conversion. Given a
+  function mvdnorm(x, mean, vcov, log=FALSE) which computes the required
+  probability density, then
   \preformatted{
ipar <- list(mean=c(0, 1), vcov=cbind(c(1, 1), c(1, 0)))
initial.param <- as.relistable(ipar)
@@ -68,9 +71,8 @@
ll <- function(param.vector)
{
   param <- relist(param.vector)
-  -sum(dnorm(x, mean = param$mean, vcov = param$vcov,
+  -sum(mvdnorm(x, mean = param$mean, vcov = param$vcov,
  log = TRUE))
-  ## NB: dnorm() has no vcov... you should get the point
}
 
optim(unlist(initial.param), ll)
@@ -83,14 +85,14 @@
   }
   will put the content of flesh on the skeleton.  You don't need to specify
   skeleton explicitly if the skeleton is stored as an attribute inside flesh.
-  In particular, flesh was created from some object obj with
+  In particular, if flesh was created from some object obj with
   \code{unlist(as.relistable(obj))}
   then the skeleton attribute is automatically set.
 
  As long as \code{skeleton} has the right shape, it should be a precise inverse
   of \code{\link{unlist}}.  These equalities hold:
   \preformatted{
-   relist(unlist(x), skeleton) == x
+   relist(unlist(x), x) == x
unlist(relist(y, skeleton)) == y
 
x <- as.relistable(x)
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] unique.default problem (PR#12551)

2008-08-16 Thread prokaj
Full_Name: Vilmos Prokaj
Version: R 2.7.1
OS: windows
Submission from: (NULL) (213.181.195.84)


Dear developers,

The following line of code (produced by a mistake) caused an infinite loop

unique("a",c("a","b"))

or also

unique(1,1:2)

I investigated a little, and it seems that the following function
from unique.c loops forever:


static int isDuplicated(SEXP x, int indx, HashData *d)
{
    int i, *h;

    h = INTEGER(d->HashTable);
    i = d->hash(x, indx, d);
    while (h[i] != NIL) {
        if (d->equal(x, h[i], x, indx))
            return h[i] >= 0 ? 1 : 0;
        i = (i + 1) % d->M;
    }
    h[i] = indx;
    return 0;
}
In this case h contains only one negative value, which causes d->equal
(here requal) to return 0.

static int requal(SEXP x, int i, SEXP y, int j)
{
    if (i < 0 || j < 0) return 0;
    if (!ISNAN(REAL(x)[i]) && !ISNAN(REAL(y)[j]))
        return (REAL(x)[i] == REAL(y)[j]);
    else if (R_IsNA(REAL(x)[i]) && R_IsNA(REAL(y)[j])) return 1;
    else if (R_IsNaN(REAL(x)[i]) && R_IsNaN(REAL(y)[j])) return 1;
    else return 0;
}

I do not claim that the situation above is frequent or even meaningful, however
it should not cause a crash of R.
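
The loop in isDuplicated() exits only when it reaches an empty slot or when the comparison succeeds. The toy model below, written in R rather than C, shows what follows from that: if no slot is empty and the comparison always returns 0, the probe never stops. The names EMPTY, probe and requal_like are hypothetical, the iteration cap exists only so the sketch terminates, and whether the table really is full in the reported case is an assumption, not something the report states.

```r
# Toy model of the probe loop: open addressing with linear probing.
EMPTY <- NA_integer_   # stands in for NIL in unique.c

probe <- function(table, start, equal, max_steps = 100L) {
  M <- length(table)
  i <- start %% M                      # 0-based slot index, as in the C code
  for (step in seq_len(max_steps)) {
    slot <- table[i + 1L]
    if (is.na(slot)) return("empty slot reached")
    if (equal(slot)) return("match found")
    i <- (i + 1L) %% M                 # wrap around, like (i + 1) % d->M
  }
  # The C loop has no such cap and would simply keep probing.
  "gave up after max_steps"
}

# Simplified stand-in for requal(): a negative stored index never matches.
requal_like <- function(slot) slot >= 0L

print(probe(c(-2L, EMPTY), 0L, requal_like))  # one free slot: terminates
print(probe(c(-2L, -3L), 0L, requal_like))    # no free slot, no match
```

The second call is the interesting one: with every slot occupied by a negative index, neither exit condition can ever be met.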

Sincerely yours
Vilmos Prokaj



[Rd] dendrapply.Rd patch

2008-08-16 Thread Dan Davison
One typo, and added rapply() to 'See also' list.
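
For readers unfamiliar with the difference the new cross-reference points at, here is a small sketch contrasting lapply() and rapply() on a nested list. The list nested and the scaling function are made up for illustration and do not come from the help page.

```r
nested <- list(a = 1, b = list(c = 2, d = 3))

# lapply() visits only the top-level components, so 'b' is seen as one list.
top <- lapply(nested, length)

# rapply() recurses and applies the function to each non-list component.
scaled <- rapply(nested, function(v) v * 10, how = "list")

str(top)
str(scaled)
```

dendrapply() plays a similar role for dendrograms, where the function is applied at every node instead of at list components.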

Dan

-- 
www.stats.ox.ac.uk/~davison
--- /usr/local/src/R/R-svn-trunk/src/library/stats/man/dendrapply.Rd 2008-08-16 13:41:48.0 +0100
+++ /home/dan/dendrapply-new.Rd 2008-08-16 19:10:51.0 +0100
@@ -9,7 +9,7 @@
 \description{
   Apply function \code{FUN} to each node of a \code{\link{dendrogram}}
   recursively.  When  \code{y <- dendrapply(x, fn)}, then \code{y} is a
-  dendrogram of the same graph structure as \code{x} and each for each node,
+  dendrogram of the same graph structure as \code{x} and for each node,
   \code{y.node[j] <- FUN( x.node[j], ...)} (where \code{y.node[j]} is an
   (invalid!) notation for the j-th node of y.
 }
@@ -33,7 +33,8 @@
 \note{this is still somewhat experimental, and suggestions for
   enhancements (or nice examples of usage) are very welcome.}
 \seealso{\code{\link{as.dendrogram}}, \code{\link{lapply}} for applying
-  a function to each component of a  \code{list}.
+  a function to each component of a  \code{list}, \code{\link{rapply}}
+  for doing so to each non-list component of a nested list.
 }
 \examples{
 require(graphics)


Re: [Rd] [R] RNG Cycle and Duplication (PR#12540)

2008-08-16 Thread Duncan Murdoch

On 16/08/2008 6:09 PM, Shengqiao Li wrote:
Knuth's double version RNG, rng-double.c, does a great job. No ties were 
observed for 10M numbers (2^52 possible distinct values in total?). 
In rng-double, the double-precision modular operation mod_sum replaces the 
integer mod_diff used in rng.c, the integer version adopted by R.


The integer version uses modulus 2^30. Therefore there are only 2^30 
distinct numbers, which is confirmed by my previous test in R.


It would be great if Knuth's double version were someday included in R.


I don't see what the problem is -- why not just do it?  That's what 
"user-supplied" is for.


Duncan Murdoch




Shengqiao Li


On Fri, 15 Aug 2008, Duncan Murdoch wrote:


On 15/08/2008 10:28 AM, Shengqiao Li wrote:
Thank you for your reply and for your suggestion. The note in the man page 
could be more accurate, though: for an end user the man page should be the 
more helpful source, since the source code is mainly for developers.


I was also advised to use Knuth's double version RANARRAY from
http://www-cs-faculty.stanford.edu/~knuth/programs.html instead of the 
integer versions in R. I'm an R user. So why not also include the double 
version in the R implementation?
You can try it using kind="user-supplied" if you like, but I suspect it's the 
same as "Knuth-TAOCP-2002".


Duncan Murdoch


Thanks again,


Shengqiao Li

Research Associate
The Department of Statistics
PO Box 6330
West Virginia University
Morgantown, WV 26506-6330



On Fri, 15 Aug 2008, Duncan Murdoch wrote:


[EMAIL PROTECTED] wrote:

I didn't describe the problem clearly. It's about the number of distinct
values. So just ignore the cycle issue.

My tests were:

RNGkind(kind="Knuth-TAOCP");
sum(duplicated(runif(1e7))); # returns 46552

RNGkind(kind="Knuth-TAOCP-2002");
sum(duplicated(runif(1e7))); # returns 46415

# These collision frequencies suggest there are 2^30 distinct values, by the
# birthday problem.

The birthday problem distribution applies to independent draws, but they 
are only pseudo-independent.  I think the only ways to know for sure if 
there are 2^30 values are to look at the code, or run through a complete 
cycle.  And you need to determine the cycle by looking at .Random.seed, 
not at the returned value.

RNGkind(kind="Marsaglia-Multicarry");
sum(duplicated(runif(1e7))); # returns 11682

RNGkind(kind="Super-Duper");
sum(duplicated(runif(1e7))); # returns 11542

RNGkind(kind="Mersenne-Twister");
sum(duplicated(runif(1e7))); # returns 11656

# These indicate there are 2^32 distinct values, which agrees with the
# help info.


If there are 2^30 distinct values for the two generators above, that also 
agrees with the documentation.
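
As a rough cross-check of the counts reported above, the birthday-problem expectation can be computed directly. This is only a sketch under the idealized assumption of independent draws, uniform over N equally likely values (which, as noted earlier in the thread, a pseudo-random generator satisfies only approximately); the function name expected_duplicates is made up for illustration.

```r
# Expected number of duplicated values among n independent uniform draws
# from N equally likely values:  n - N * (1 - (1 - 1/N)^n).
expected_duplicates <- function(n, N) {
  # -expm1(n * log1p(-1/N)) evaluates 1 - (1 - 1/N)^n without cancellation.
  n - N * (-expm1(n * log1p(-1 / N)))
}

n <- 1e7
dup30 <- expected_duplicates(n, 2^30)
dup32 <- expected_duplicates(n, 2^32)
print(round(c("2^30" = dup30, "2^32" = dup32)))
```

Under these assumptions the expectation is about 46,400 duplicates for 2^30 values and about 11,600 for 2^32, which is in line with the observed counts for the Knuth generators and for the other three generators respectively. Agreement of this kind is suggestive only; it does not prove how many distinct values a generator can return.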



RNGkind(kind="Wichmann-Hill");
sum(duplicated(runif(1e7))); # returns 0

# So for this method, there should be more than 2^32 distinct values.

You may not get the exact numbers, but they should be close. So how can
the above problem be explained?

You haven't demonstrated what you claim, but if you look at the source, 
you'll see that in fact the man page is wrong:  Wichmann-Hill is based on 
3 integer values, which each take on approximately 15 bits of different 
values. So Wichmann-Hill could take nearly 2^45 different values (actually 
30269*30307*30323).
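
The arithmetic behind "nearly 2^45" is easy to check. The snippet below only multiplies the three moduli quoted above and compares the product with powers of two; the variable names are illustrative.

```r
# Number of combinations of the three Wichmann-Hill integer components.
moduli <- c(30269, 30307, 30323)
states <- prod(moduli)

print(format(states, big.mark = ","))
print(log2(states))    # lies between 44 and 45
print(states / 2^32)   # how many times larger than 2^32
```

The product is below 2^53, so it is represented exactly as a double and no precision is lost in the comparison.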


The source is in https://svn.r-project.org/R/trunk/src/main/RNG.c if you 
want to check the others.

I need to generate a large sample without any ties; it seems to me that
"Wichmann-Hill" is the only choice right now.

An alternative would be to construct a new value from two (or more) 
runif() values, but be careful that you don't mess up the distribution 
when you do that.
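
One possible way to do this is sketched below; it is not a method proposed in the thread. It takes the leading 21 bits from one draw and fills in the lower bits from a second draw, which keeps the result uniform provided the two draws behave as independent uniforms on a grid whose size is a multiple of 2^21. The function name runif_fine and the choice of 21 bits are assumptions made for illustration, and floating-point rounding at the extreme upper end of the range is ignored.

```r
# Combine two uniform draws into one value with finer resolution (sketch).
runif_fine <- function(n) {
  hi <- floor(runif(n) * 2^21)   # integer in 0 .. 2^21 - 1: the leading bits
  lo <- runif(n)                 # second draw fills in the remaining bits
  (hi + lo) / 2^21
}

set.seed(1)
u <- runif_fine(1e5)
print(range(u))
print(anyDuplicated(u))          # 0 means no ties were found
```

Each interval [k / 2^21, (k + 1) / 2^21) is hit with equal probability by hi, and lo spreads the value uniformly inside it, so the combined draw stays uniform on (0, 1) under the stated assumptions. The cost is two calls to the underlying generator per value.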


Duncan Murdoch

========================================
Shengqiao Li

The Department of Statistics
PO Box 6330
West Virginia University
Morgantown, WV 26506-6330
========================================

On Thu, 14 Aug 2008, Peter Dalgaard wrote:



Shengqiao Li wrote:


Hello all,

I am generating large samples of random numbers. The RNG help page says:
"All the supplied uniform generators return 32-bit integer values that are
converted to doubles, so they take at most 2^32 distinct values and long
runs will return duplicated values." But I find that the cycles are not the
same as the 32-bit integer.

My test indicated that the cycles for Knuth's methods were 2^30 while
Wichmann-Hill's cycle was larger than 2^32! No numbers were duplicated in
10M numbers generated by runif using Wichma

Re: [Rd] [R] RNG Cycle and Duplication (PR#12540)

2008-08-16 Thread Shengqiao Li


Knuth's double version RNG, rng-double.c, does a great job. No ties were 
observed for 10M numbers (2^52 possible distinct values in total?). 
In rng-double, the double-precision modular operation mod_sum replaces the 
integer mod_diff used in rng.c, the integer version adopted by R.


The integer version uses modulus 2^30. Therefore there are only 2^30 
distinct numbers, which is confirmed by my previous test in R.


It would be great if Knuth's double version were someday included in R.


Shengqiao Li


On Fri, 15 Aug 2008, Duncan Murdoch wrote:


On 15/08/2008 10:28 AM, Shengqiao Li wrote:
Thank you for your reply and for your suggestion. The note in the man page 
could be more accurate, though: for an end user the man page should be the 
more helpful source, since the source code is mainly for developers.


I was also advised to use Knuth's double version RANARRAY from
http://www-cs-faculty.stanford.edu/~knuth/programs.html instead of the 
integer versions in R. I'm an R user. So why not also include the double 
version in the R implementation?


You can try it using kind="user-supplied" if you like, but I suspect it's the 
same as "Knuth-TAOCP-2002".


Duncan Murdoch



Thanks again,


Shengqiao Li

Research Associate
The Department of Statistics
PO Box 6330
West Virginia University
Morgantown, WV 26506-6330



On Fri, 15 Aug 2008, Duncan Murdoch wrote:


[EMAIL PROTECTED] wrote:

I didn't describe the problem clearly. It's about the number of distinct
values. So just ignore the cycle issue.

My tests were:

RNGkind(kind="Knuth-TAOCP");
sum(duplicated(runif(1e7))); # returns 46552

RNGkind(kind="Knuth-TAOCP-2002");
sum(duplicated(runif(1e7))); # returns 46415

# These collision frequencies suggest there are 2^30 distinct values, by the
# birthday problem.

The birthday problem distribution applies to independent draws, but they 
are only pseudo-independent.  I think the only ways to know for sure if 
there are 2^30 values are to look at the code, or run through a complete 
cycle.  And you need to determine the cycle by looking at .Random.seed, 
not at the returned value.

RNGkind(kind="Marsaglia-Multicarry");
sum(duplicated(runif(1e7))); # returns 11682

RNGkind(kind="Super-Duper");
sum(duplicated(runif(1e7))); # returns 11542

RNGkind(kind="Mersenne-Twister");
sum(duplicated(runif(1e7))); # returns 11656

# These indicate there are 2^32 distinct values, which agrees with the
# help info.


If there are 2^30 distinct values for the two generators above, that also 
agrees with the documentation.



RNGkind(kind="Wichmann-Hill");
sum(duplicated(runif(1e7))); # returns 0

# So for this method, there should be more than 2^32 distinct values.

You may not get the exact numbers, but they should be close. So how can
the above problem be explained?

You haven't demonstrated what you claim, but if you look at the source, 
you'll see that in fact the man page is wrong:  Wichmann-Hill is based on 
3 integer values, which each take on approximately 15 bits of different 
values. So Wichmann-Hill could take nearly 2^45 different values (actually 
30269*30307*30323).


The source is in https://svn.r-project.org/R/trunk/src/main/RNG.c if you 
want to check the others.

I need to generate a large sample without any ties; it seems to me that
"Wichmann-Hill" is the only choice right now.

An alternative would be to construct a new value from two (or more) 
runif() values, but be careful that you don't mess up the distribution 
when you do that.


Duncan Murdoch

========================================
Shengqiao Li

The Department of Statistics
PO Box 6330
West Virginia University
Morgantown, WV 26506-6330
========================================

On Thu, 14 Aug 2008, Peter Dalgaard wrote:



Shengqiao Li wrote:


Hello all,

I am generating large samples of random numbers. The RNG help page says:
"All the supplied uniform generators return 32-bit integer values that are
converted to doubles, so they take at most 2^32 distinct values and long
runs will return duplicated values." But I find that the cycles are not the
same as the 32-bit integer.

My test indicated that the cycles for Knuth's methods were 2^30 while
Wichmann-Hill's cycle was larger than 2^32! No numbers were duplicated in
10M numbers generated by runif using Wichmann-Hill. The other three methods
had cycle length of 2^32.

So, can anybody explain this? And any improvement to the implementation c