Re: [Rd] choose(n,k) when n is almost integer

2010-02-03 Thread Petr Savicky
On Tue, Feb 02, 2010 at 01:37:46PM +0100, Petr Savicky wrote:
> I would like to add some more information concerning the patch C
> to the function choose() proposed in the email
>   https://stat.ethz.ch/pipermail/r-devel/2009-December/056177.html
> 
> The patch uses transformations of choose(n, k), which are described in
>   http://www.cs.cas.cz/~savicky/R-devel/formulas_choose.pdf
> 
> The accuracy of the modified function choose(n, k) may be verified
> on randomly generated examples using Rmpfr package
>   http://cran.at.r-project.org/web/packages/Rmpfr/index.html
> and the script
>   http://www.cs.cas.cz/~savicky/R-devel/test_choose_2.R

Let me add an explanation of the script test_choose_2.R.

The script generates a vector of random real numbers n, drawn from a
continuous distribution that concentrates near integer values. The
original implementation of choose(m, k) treats m as an integer if
abs(m - round(m)) <= 1e-7. The vector n is generated so that the
probability of abs(n[i] - round(n[i])) <= 1e-7 is approximately 0.1.
The distribution of n[i] - round(n[i]) is symmetric around 0, so we
get values n[i] close to an integer both from below and from above.
On the other hand, the probability of abs(n[i] - round(n[i])) >= 0.3 is
approximately 0.1083404, so there are also numbers n[i] that are not
close to any integer.

The script calculates choose(n, k) for k in 0:209 (an ad hoc upper bound)
1. using the modified choose(n, k) from patch C,
2. using the expression n(n-1)...(n-k+1)/(1*2*...*k) with Rmpfr numbers
   of precision 100 bits.
The relative difference of these two results is computed and its maximum
over all n[i] and all k from 0 up to a given bound is reported. The bounds
on k are chosen to be the numbers whose last digit is 9, since the algorithm
in choose(n, k) differs between k <= 29 and k >= 30.
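
For illustration, such a check could look roughly like this (a minimal
sketch assuming the Rmpfr package; the actual test_choose_2.R linked above
may be organized differently):

library(Rmpfr)

rel_err_choose <- function(n, k, precBits = 100) {
  # reference value n*(n-1)*...*(n-k+1) / (1*2*...*k) in 100-bit arithmetic
  if (k == 0) return(abs(choose(n, 0) - 1))
  num <- prod(mpfr(n, precBits) - (0:(k - 1)))
  den <- prod(mpfr(1:k, precBits))
  ref <- num / den
  got <- choose(n, k)                 # double-precision result under test
  asNumeric(abs((got - ref) / ref))   # relative difference
}

# e.g. a value of n just below an integer:
rel_err_choose(10 - 1e-8, 5)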

The relative rounding error of a single operation on Rmpfr numbers with
100 bits of precision is at most 2^-100, i.e. each operation multiplies
the result by a factor of at most (1 + 2^-100). Hence, an upper bound on
the total relative error of n(n-1)...(n-k+1)/(1*2*...*k) is
(1 + 2^-100)^(2*209) \approx 1 + 2*209*2^-100 \approx 1 + 3.297e-28.
This is negligible compared to the errors reported by test_choose_2.R,
so the reported errors are the errors of the patched choose(n, k).

The errors reported by test_choose_2.R with patched choose(n,k) are in
a previous email.

Running test_choose_2.R with unpatched R version 2.11.0 (2010-02-01 r51089)
produces larger errors.

  > source("test_choose_2.R")
  k <= 9  max rel err = Inf 
  k <= 19  max rel err = Inf 

  > source("test_choose_2.R")
  k <= 9  max rel err = 0.111 
  k <= 19  max rel err = Inf 

  > source("test_choose_2.R")
  k <= 9  max rel err = Inf 
  k <= 19  max rel err = Inf 

  > source("test_choose_2.R")
  k <= 9  max rel err = 8.383718e-08 
  k <= 19  max rel err = 1.226306e-07 
  k <= 29  max rel err = 1.469050e-07 
  Error: segfault from C stack overflow

The Inf relative errors occur in cases where choose(n, k) computes 0,
although the correct result is nonzero.

The stack overflow error is sometimes generated by an infinite sequence
of transformations
  choose(n, k) -> choose(n, n-k) -> choose(n, round(n-k))
which occurs, for example, if k = 30 and n = 60 - eps for a small eps > 0.
The reason for the transformation
  choose(n, k) -> choose(n, n-k)
is that
  k >= k_small_max = 30
  n is treated as an integer by R_IS_INT(n)
  n-k < k_small_max
So, choose(n, n-k) is called. There, we determine that n-k is almost an
integer and, since n-k is the second argument of choose(n, k), it is
explicitly rounded to an integer. Since n = 2*k - eps, we have
round(n-k) = round(k - eps) = k. The result is that we again call
choose(n, k) and this repeats until the stack overflows.

For example
  > choose(60 - 1e-9, 30)
  Error: segfault from C stack overflow
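
A rough R transcription of the branch logic described above (an
illustrative sketch, not the actual C source; k_small_max and the 1e-7
tolerance are taken from the text):

choose_sketch <- function(n, k) {
  k_small_max <- 30
  if (abs(k - round(k)) <= 1e-7) k <- round(k)   # the 2nd argument is rounded
  if (k < 0) return(0)
  if (k < k_small_max)                           # direct product for small k
    return(if (k == 0) 1 else prod(n - 0:(k - 1)) / factorial(k))
  if (abs(n - round(n)) <= 1e-7 && n - k < k_small_max)
    return(choose_sketch(n, n - k))   # with n = 2*k - eps, n - k rounds back to k
  exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1))   # simplified fallback
}

# choose_sketch(60 - 1e-9, 30) recurses indefinitely, mirroring the
# behaviour of unpatched choose() shown above.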

Besides patch C, which corrects this stack overflow, also the previous
patches A, B from 
  https://stat.ethz.ch/pipermail/r-devel/2009-December/056154.html
correct this, but have lower accuracy.

Petr Savicky.



[Rd] Package Directory Hierarchy: Recursive inclusion of *.R possible?

2010-02-03 Thread Johannes Graumann
Hello,

I would like to organize the "R" directory in my home-grown package into 
sub-directories, but "R CMD build" doesn't seem to find *.R files below 
the actual source directory. Is there any way around that?

Thanks, Joh



[Rd] Package options

2010-02-03 Thread Chris Brien
Dear all,

I am developing a package foo that has a namespace. I would like to be able to 
provide an option, say opt, that 

1) is set to a default value when the package is loaded
2) can be set to a different value by the package user
3) is used by functions within the package

How can I achieve this?

Cheers,
 
Chris Brien
Adjunct Senior Lecturer in Statistics
-
School of Mathematics & Statistics - Mawson Lakes
Phenomics and Bioinformatics Research Centre
University of South Australia
GPO Box 2471
ADELAIDE  5001  South Australia
Phone:  +61 8 8302 5873   Fax:  +61 8 8302 5785
Email:   chris.br...@unisa.edu.au 
WEB page:   
CRICOS No 00121B 



Re: [Rd] Package options

2010-02-03 Thread Duncan Murdoch

On 03/02/2010 8:07 AM, Chris Brien wrote:

Dear all,

I am developing a package foo that has a namespace. I would like to be able to provide an option, say opt, that 


1) is set to a default value when the package is loaded
2) can be set to a different value by the package user
3) is used by functions within the package

How can I achieve this?



Do you want this option to persist to the next session if a user saves 
the workspace?  If so, then it should be stored in a variable in the 
global environment.  You can use assign(".FooOptions", value, 
envir=globalenv()) to set it, and get(".FooOptions", envir=globalenv())
to read it.

The obvious problem with this is that the user might already have a 
.FooOptions variable defined, and your code would stomp on it.  An 
alternative is to store the variable into the namespace.  You need to 
unlock it to change it.  See tools:::startDynamicHelp for code that does 
this for the variable httpdPort.
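
A minimal sketch of that namespace approach (an illustration only, not
code from this thread; the package name "foo" and the variable name
".fooOptions" are made up):

# defined somewhere in the package's R code:
.fooOptions <- list(opt = "default")

setFooOption <- function(value) {
    ns <- asNamespace("foo")
    unlockBinding(".fooOptions", ns)          # namespace bindings are locked
    assign(".fooOptions", list(opt = value), envir = ns)
    lockBinding(".fooOptions", ns)
    invisible(value)
}

getFooOption <- function() get(".fooOptions", envir = asNamespace("foo"))$opt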


Duncan Murdoch



Re: [Rd] Package options

2010-02-03 Thread Prof Brian Ripley
A number of packages do this.  See e.g. 'sm' and its function 
sm.options() for one implementation.
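
A rough sketch of that style of implementation (an added illustration,
not the actual 'sm' code; all names below are made up):

.foo.env <- new.env()                       # package-local environment
assign("opts", list(opt = 1), envir = .foo.env)

foo.options <- function(...) {
    opts <- get("opts", envir = .foo.env)
    new <- list(...)
    if (length(new) == 0L) return(opts)     # no arguments: return all options
    if (is.null(names(new)) && length(new) == 1L && is.list(new[[1L]]))
        new <- new[[1L]]                    # allow foo.options(list(opt = 2))
    opts[names(new)] <- new
    assign("opts", opts, envir = .foo.env)
    invisible(opts)
}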


On Wed, 3 Feb 2010, Duncan Murdoch wrote:


On 03/02/2010 8:07 AM, Chris Brien wrote:

Dear all,

I am developing a package foo that has a namespace. I would like to be able 
to provide an option, say opt, that 
1) is set to a default value when the package is loaded

2) can be set to a different value by the package user
3) is used by functions within the package

How can I achieve this?



Do you want this option to persist to the next session if a user saves the 
workspace?  If so, then it should be stored in a variable in the global 
environment.  You can use assign(".FooOptions", value, envir=globalenv()) to 
set it, and get(".FooOptions", envir=globalenv()) to read it.

The obvious problem with this is that the user might already have a 
.FooOptions variable defined, and your code would stomp on it.  An 
alternative is to store the variable into the namespace.  You need to unlock 
it to change it.  See tools:::startDynamicHelp for code that does this for 
the variable httpdPort.


Duncan Murdoch




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595



Re: [Rd] Error with cut.POSIXt and daylight savings time switchover dates

2010-02-03 Thread Brian Diggs
On 2/1/2010 3:57 PM, Brian Diggs wrote:
> The following code:
> 
> cut(as.POSIXct("2009-11-01 04:00:00", tz="America/Los_Angeles"), "1 day")
> 
> gives the error:
> 
> Error in seq.int(0, to - from, by) : 'to' must be finite
> 
> This is related to November 1st, 2009 being the switchover date from
> daylight savings time to standard time in the America/Los_Angeles
> time zone.  In particular, in cut.POSIXt, the starting time (start)
> is converted to a POSIXlt, and the individual members are
> manipulated.  Because a spacing of "1 day" is requested, the hour,
> minute, and second are manually set to 0.  In doing so, the
> represented time is now before the 2:00am PDT->PST change.  This
> value is passed to seq.int (as the argument from), which dispatches
> to seq.POSIXt.  seq.POSIXt eventually does from <-
> unclass(as.POSIXct(from)) which evaluates to NA because
> as.POSIXct(from) is NA.  The seq.int call in the next line then
> passes NA as the "to" argument, causing the output error (which comes
> from the C-code of seq.int).
> 
> Bringing it all together, the sequence of commands that causes the
> problems is:
> 
> tm <- as.POSIXlt("2009-11-01 04:00:00", tz="America/Los_Angeles") 
> tm$hour <- 0 
> as.POSIXct(tm)
> # [1] NA
> 
> Is there a timezone/daylight savings time safe way to get to the
> beginning of the day in cut.POSIXt so that invalid dates are not sent
> to the other functions?  Alternatively, can as.POSIXct.POSIXlt be
> made to handle these manually manipulated dates correctly?
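
One possible workaround (a sketch added for illustration, not from the
original report): after editing POSIXlt fields by hand, resetting the
isdst component to -1 lets as.POSIXct() redetermine whether DST applies:

tm <- as.POSIXlt("2009-11-01 04:00:00", tz = "America/Los_Angeles")
tm$hour <- 0
tm$isdst <- -1      # "unknown": let the conversion decide PDT vs PST
as.POSIXct(tm)      # should now give a valid midnight time instead of NA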

I realized I forgot my session info:

> sessionInfo()
R version 2.10.1 (2009-12-14) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 


> -- Brian Diggs, Ph.D. Senior Research Associate, Department of
> Surgery, Oregon Health & Science University



[Rd] mclapply on a set not divisible by number of cores (PR#14205)

2010-02-03 Thread o . heil
Full_Name: Oliver Heil
Version: 2.10.0
OS: debian squeeze
Submission from: (NULL) (193.174.58.251)


When running mclapply on a list of strings with a length of 618 on 10 cores the
resulting data is wrong every 10 entries starting with the 6th. Our machine has
16 cores.

You may reproduce the error using data provided here:


Together with the following code (R --vanilla): 

# For each probeid (618 probe ids), get the data points from the
# data frames 'control' and 'group',
# calculate mean, standard deviation and detection p-value for group and control,
# and calculate the p-value that the means of control and group differ.
#
# The result is a list (length 618) of 7-tuples.
#
# Have a look at x_sd_p.test[[6]], x_sd_p.test[[16]], ...
# It works fine using lapply or doing the function "by hand",
# for example with factor = probeids[6].
#

load("df.control.R")
load("df.group.R")
load("negative_bead.R")
load("probeids.R")

library("multicore")

x_sd_p.test <- mclapply(probeids, function(factor) {
    ## group: mean, sd and detection p-value against the negative bead
    idxg <- which(df.group$Factor %in% factor)
    mg <- NA; sdg <- NA; pg <- 1.0
    if (length(idxg) > 0) {
        lg  <- df.group$x[idxg]
        mg  <- mean(lg, na.rm = TRUE)
        sdg <- sd(lg, na.rm = TRUE)
        t   <- wilcox.test(lg, negative_bead, alternative = "greater", exact = TRUE)
        pg  <- t$p.value
    }
    ## control: the same statistics
    idxc <- which(df.control$Factor %in% factor)
    mc <- NA; sdc <- NA; pc <- 1.0
    if (length(idxc) > 0) {
        lc  <- df.control$x[idxc]
        mc  <- mean(lc, na.rm = TRUE)
        sdc <- sd(lc, na.rm = TRUE)
        t   <- wilcox.test(lc, negative_bead, alternative = "greater", exact = TRUE)
        pc  <- t$p.value
    }
    ## p-value that the group and control means differ
    p <- 1.0
    if (length(idxg) > 0 && length(idxc) > 0) {
        t <- wilcox.test(lg, lc, alternative = "two.sided", exact = TRUE)
        p <- t$p.value
    }
    c(mg, sdg, pg, mc, sdc, pc, p)
}, mc.cores = 10)

l=lapply(x_sd_p.test,function(x){length(x)})




> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] multicore_0.1-3

loaded via a namespace (and not attached):
[1] tools_2.10.0

> version
   _
platform   x86_64-pc-linux-gnu
arch   x86_64
os linux-gnu
system x86_64, linux-gnu
status
major  2
minor  10.0
year   2009
month  10
day26
svn rev50208
language   R
version.string R version 2.10.0 (2009-10-26)



[Rd] ctrl-C aborts R when compiling package code with ICC+openMP

2010-02-03 Thread Ernest Turro
Hi all,

I have some C++ code that I call from my package. In my main C++ loop, I check 
for user interrupts and return to the R shell after ensuring I've deallocated 
memory appropriately. This works fine when the code is compiled with gcc+openmp 
and with icc without openmp, but when I compile with icc and use openmp, the 
entire R session is immediately terminated when I hit ctrl-C. This happens even 
if I only have one thread and even if I set an openmp barrier just before 
checking for user interrupts.

When the R session terminates unexpectedly, I usually just get "Aborted" before 
returning to the bash prompt. Occasionally, though, I get this error:

OMP: Error #15: Initializing libguide.so, but found libguide.so already 
initialized.
OMP: Hint: This may cause performance degradation and correctness issues. Set 
environment variable KMP_DUPLICATE_LIB_OK=TRUE to ignore this problem and force 
the program to continue anyway. Please note that the use of 
KMP_DUPLICATE_LIB_OK is unsupported and using it may cause undefined behavior. 
For more information, please contact Intel(R) Premier Support.

But KMP_DUPLICATE_LIB_OK=TRUE changes nothing.

I had a look at:
http://software.intel.com/en-us/articles/opm-abort-initializing-libguide40dll/

which suggests there may be a conflict between libguide40 and libiomp5md, but I 
can't find any loaded R packages that link against libiomp5md...

Does anyone have any ideas?

Many thanks,

Ernest


[Rd] Fwd: ctrl-C aborts R when compiling package code with ICC+openMP

2010-02-03 Thread Ernest Turro
I omitted to mention that when I set KMP_DUPLICATE_LIB_OK=TRUE and 
OMP_NUM_THREADS=1, I consistently get this error instead:

OMP: Error #13: Assertion failure at kmp_csupport.c(465).
OMP: Hint: Please submit a bug report with this message, compile and run 
commands used, and machine configuration info including native compiler and 
operating system versions. Faster response will be obtained by including all 
program sources. For information on submitting this issue, please see 
http://www.intel.com/software/products/support/.

Thanks,
Ernest


Begin forwarded message:

> From: Ernest Turro 
> Date: 3 February 2010 20:49:42 GMT
> To: r-devel List 
> Subject: [Rd] ctrl-C aborts R when compiling package code with ICC+openMP
> x-mailer: Apple Mail (2.1077)
> 
> Hi all,
> 
> I have some C++ code that I call from my package. In my main C++ loop, I 
> check for user interrupts and return to the R shell after ensuring I've 
> deallocated memory appropriately. This works fine when the code is compiled 
> with gcc+openmp and with icc without openmp, but when I compile with icc and 
> use openmp, the entire R session is immediately terminated when I hit ctrl-C. 
> This happens even if I only have one thread and even if I set an openmp 
> barrier just before checking for user interrupts.
> 
> When the R session terminates unexpectedly, I usually just get "Aborted" 
> before return to the bash prompt. Occasionally, though, I get this error:
> 
> OMP: Error #15: Initializing libguide.so, but found libguide.so already 
> initialized.
> OMP: Hint: This may cause performance degradation and correctness issues. Set 
> environment variable KMP_DUPLICATE_LIB_OK=TRUE to ignore this problem and 
> force the program to continue anyway. Please note that the use of 
> KMP_DUPLICATE_LIB_OK is unsupported and using it may cause undefined 
> behavior. For more information, please contact Intel(R) Premier Support.
> 
> But KMP_DUPLICATE_LIB_OK=TRUE changes nothing.
> 
> I had a look at:
> http://software.intel.com/en-us/articles/opm-abort-initializing-libguide40dll/
> 
> which suggests there may be a conflict between libguide40 and libiomp5md, but 
> I can't find any loaded R packages that link against libiomp5md...
> 
> Does anyone have any ideas?
> 
> Many thanks,
> 
> Ernest



[Rd] Proposal unary - operator for factors

2010-02-03 Thread Hadley Wickham
Hi all,

Why not make the unary minus operator return the factor with levels
reversed?  This would make it much easier to sort factors in
descending order in part of an order statement.

Hadley

-- 
http://had.co.nz/



Re: [Rd] Proposal unary - operator for factors

2010-02-03 Thread William Dunlap

> -Original Message-
> From: r-devel-boun...@r-project.org 
> [mailto:r-devel-boun...@r-project.org] On Behalf Of Hadley Wickham
> Sent: Wednesday, February 03, 2010 3:10 PM
> To: r-devel@r-project.org
> Subject: [Rd] Proposal unary - operator for factors
> 
> Hi all,
> 
> Why not make the unary minus operator return the factor with levels
> reversed?  This would make it much easier to sort factors in
> descending order in part of an order statement.

It wouldn't make sense in the context of
   vector[-factor]

Wouldn't it be better to allow order's decreasing argument
to be a vector with one element per ... argument?  That
would work for numbers, factors, dates, and anything
else.  Currently order silently ignores decreasing[2] and
beyond.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Hadley
> 
> -- 
> http://had.co.nz/
> 



Re: [Rd] Proposal unary - operator for factors

2010-02-03 Thread Hadley Wickham
> It wouldn't make sense in the context of
>   vector[-factor]

True, but that doesn't work currently so you wouldn't lose anything.
However, it would make a certain class of problem that used to throw
errors become silent.

> Wouldn't it be better to allow order's decreasing argument
> to be a vector with one element per ... argument?  That
> would work for numbers, factors, dates, and anything
> else.  Currently order silently ignores decreasing[2] and
> beyond.

The problem is you might want to do something like order(a, -b, c, -d)

Hadley

-- 
http://had.co.nz/



Re: [Rd] Proposal unary - operator for factors

2010-02-03 Thread William Dunlap
> -Original Message-
> From: h.wick...@gmail.com [mailto:h.wick...@gmail.com] On 
> Behalf Of Hadley Wickham
> Sent: Wednesday, February 03, 2010 3:38 PM
> To: William Dunlap
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Proposal unary - operator for factors
> 
> > It wouldn't make sense in the context of
> >   vector[-factor]
> 
> True, but that doesn't work currently so you wouldn't lose anything.
> However, it would make a certain class of problem that used to throw
> errors become silent.
> 
> > Wouldn't it be better to allow order's decreasing argument
> > to be a vector with one element per ... argument?  That
> > would work for numbers, factors, dates, and anything
> > else.  Currently order silently ignores decreasing[2] and
> > beyond.
> 
> The problem is you might want to do something like order(a, -b, c, -d)

Currently, for numeric a you can do either
   order(-a)
or
   order(a, decreasing=TRUE)
For nonnumeric types like POSIXct and factors only
the latter works.

Under my proposal your
   order(a, -b, c, -d)
would be
   order(a, b, c, d, decreasing=c(FALSE,TRUE,FALSE,TRUE))
and it would work for any orderable class without modifications
to any classes.

Bill
 
> Hadley
> 
> -- 
> http://had.co.nz/
> 



Re: [Rd] Proposal unary - operator for factors

2010-02-03 Thread Duncan Murdoch

On 03/02/2010 6:49 PM, William Dunlap wrote:

-Original Message-
From: h.wick...@gmail.com [mailto:h.wick...@gmail.com] On 
Behalf Of Hadley Wickham

Sent: Wednesday, February 03, 2010 3:38 PM
To: William Dunlap
Cc: r-devel@r-project.org
Subject: Re: [Rd] Proposal unary - operator for factors


It wouldn't make sense in the context of
  vector[-factor]

True, but that doesn't work currently so you wouldn't lose anything.
However, it would make a certain class of problem that used to throw
errors become silent.


Wouldn't it be better to allow order's decreasing argument
to be a vector with one element per ... argument?  That
would work for numbers, factors, dates, and anything
else.  Currently order silently ignores decreasing[2] and
beyond.

The problem is you might want to do something like order(a, -b, c, -d)


Currently, for numeric a you can do either
   order(-a)
or
   order(a, decreasing=TRUE)
For nonnumeric types like POSIXct and factors only
the latter works.

Under my proposal your
   order(a, -b, c, -d)
would be
   order(a, b, c, d, decreasing=c(FALSE,TRUE,FALSE,TRUE))
and it would work for any orderable class without modifications
to any classes.


Why not use

 order(a, -xtfrm(b), c, -xtfrm(d))

??

Duncan Murdoch
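
As an illustration of why this works (an added sketch, not from the
thread): xtfrm() maps any sortable vector to numbers that sort the same
way, so negating it reverses the order. For a factor:

f <- factor(c("low", "high", "medium"), levels = c("low", "medium", "high"))
xtfrm(f)           # 1 3 2  -- numbers in the same order as the levels
order(-xtfrm(f))   # 2 3 1  -- indices in descending level order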



Re: [Rd] Proposal unary - operator for factors

2010-02-03 Thread Hadley Wickham
>> Currently, for numeric a you can do either
>>   order(-a)
>> or
>>   order(a, decreasing=TRUE)
>> For nonnumeric types like POSIXct and factors only
>> the latter works.
>>
>> Under my proposal your
>>   order(a, -b, c, -d)
>> would be
>>   order(a, b, c, d, decreasing=c(FALSE,TRUE,FALSE,TRUE))
>> and it would work for any orderable class without modifications
>> to any classes.
>
> Why not use
>
>  order(a, -xtfrm(b), c, -xtfrm(d))

That's a good suggestion.  You could make it even easier to read with
desc <- function(x) -xtfrm(x)

order(a, desc(b), c, desc(d))

Could you remind me what xtfrm stands for?

Thanks!

Hadley

-- 
http://had.co.nz/



Re: [Rd] Proposal unary - operator for factors

2010-02-03 Thread William Dunlap

> -Original Message-
> From: Duncan Murdoch [mailto:murd...@stats.uwo.ca] 
> Sent: Wednesday, February 03, 2010 4:17 PM
> To: William Dunlap
> Cc: Hadley Wickham; r-devel@r-project.org
> Subject: Re: [Rd] Proposal unary - operator for factors
> 
> On 03/02/2010 6:49 PM, William Dunlap wrote:
> >> -Original Message-
> >> From: h.wick...@gmail.com [mailto:h.wick...@gmail.com] On 
> >> Behalf Of Hadley Wickham
> >> Sent: Wednesday, February 03, 2010 3:38 PM
> >> To: William Dunlap
> >> Cc: r-devel@r-project.org
> >> Subject: Re: [Rd] Proposal unary - operator for factors
> >>
> >>> It wouldn't make sense in the context of
> >>>   vector[-factor]
> >> True, but that doesn't work currently so you wouldn't lose anything.
> >> However, it would make a certain class of problem that used to throw
> >> errors become silent.
> >>
> >>> Wouldn't it be better to allow order's decreasing argument
> >>> to be a vector with one element per ... argument?  That
> >>> would work for numbers, factors, dates, and anything
> >>> else.  Currently order silently ignores decreasing[2] and
> >>> beyond.
> >> The problem is you might want to do something like order(a, -b, c, -d)
> > 
> > Currently, for numeric a you can do either
> >    order(-a)
> > or
> >    order(a, decreasing=TRUE)
> > For nonnumeric types like POSIXct and factors only
> > the latter works.
> > 
> > Under my proposal your
> >    order(a, -b, c, -d)
> > would be
> >    order(a, b, c, d, decreasing=c(FALSE,TRUE,FALSE,TRUE))
> > and it would work for any orderable class without modifications
> > to any classes.
> 
> Why not use
> 
>   order(a, -xtfrm(b), c, -xtfrm(d))
> 
> ??

You could, if you can remember it.  I have been annoyed
that decreasing= was in order() but not as useful as it
could be since it is not vectorized.  The same goes for
na.last, although that seems less useful to me.

Here is a version of order (based on the
algorithm used in S+'s order) that
vectorizes the na.last and decreasing
arguments.  It calls the existing order
function to implement decreasing=TRUE/FALSE
and na.last=TRUE/FALSE for a single argument,
but order itself could be modified in this
way.

new.order <- function (..., na.last = TRUE, decreasing = FALSE)
{
    vectors <- list(...)
    nVectors <- length(vectors)
    stopifnot(nVectors > 0)
    na.last <- rep(na.last, length = nVectors)
    decreasing <- rep(decreasing, length = nVectors)
    keys <- seq_len(length(vectors[[1]]))
    for (i in nVectors:1) {
        v <- vectors[[i]]
        if (length(v) < length(keys))
            v <- rep(v, length = length(keys))
        keys <- keys[order(v[keys], na.last = na.last[i],
                           decreasing = decreasing[i])]
    }
    keys
}

With the following dataset

data <- data.frame(
  ct  = as.POSIXct(c("2009-01-01", "2010-02-03",
                     "2010-02-28"))[c(2,2,2,3,3,1)],
  dt  = as.Date(c("2009-01-01", "2010-02-03",
                  "2010-02-28"))[c(3,2,2,2,3,1)],
  fac = factor(c("Small","Medium","Large"),
               levels=c("Small","Medium","Large"))[c(1,3,2,3,3,1)],
  n   = c(11,12,12,11,12,12))

> data
          ct         dt    fac  n
1 2010-02-03 2010-02-28  Small 11
2 2010-02-03 2010-02-03  Large 12
3 2010-02-03 2010-02-03 Medium 12
4 2010-02-28 2010-02-03  Large 11
5 2010-02-28 2010-02-28  Large 12
6 2009-01-01 2009-01-01  Small 12
> data.frame(lapply(data,rank))
   ct  dt fac   n
1 3.0 5.5 1.5 1.5
2 3.0 3.0 5.0 4.5
3 3.0 3.0 3.0 4.5
4 5.5 3.0 5.0 1.5
5 5.5 5.5 5.0 4.5
6 1.0 1.0 1.5 4.5

we get (where my demos use rank because I couldn't remember
the name xtfrm):

> with(data, identical(order(ct,dt), new.order(ct,dt)))
[1] TRUE
> with(data, identical(order(fac,-n), new.order(fac,n,decreasing=c(FALSE,TRUE))))
[1] TRUE
> with(data, identical(order(ct,-rank(dt)), new.order(ct,dt,decreasing=c(FALSE,TRUE))))
[1] TRUE
> with(data, identical(order(ct,-rank(fac)), new.order(ct,fac,decreasing=c(FALSE,TRUE))))
[1] TRUE
> with(data, identical(order(n,-rank(fac)), new.order(n,fac,decreasing=c(FALSE,TRUE))))
[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  
> 
> Duncan Murdoch
> 



Re: [Rd] Proposal unary - operator for factors

2010-02-03 Thread Duncan Murdoch

On 03/02/2010 7:20 PM, Hadley Wickham wrote:

Currently, for numeric a you can do either
  order(-a)
or
  order(a, decreasing=TRUE)
For nonnumeric types like POSIXct and factors only
the latter works.

Under my proposal your
  order(a, -b, c, -d)
would be
  order(a, b, c, d, decreasing=c(FALSE,TRUE,FALSE,TRUE))
and it would work for any orderable class without modifications
to any classes.

Why not use

 order(a, -xtfrm(b), c, -xtfrm(d))


That's a good suggestion.  You could make it even easier to read with
desc <- function(x) -xtfrm(x)

order(a, desc(b), c, desc(d))

Could you remind me what xtfrm stands for?


No, I don't think I ever worked it out. :-)

Duncan Murdoch



[Rd] proposal for new axis.Date/axis.POSIXct

2010-02-03 Thread Felix Andrews
[reposting after holiday period]

-- Forwarded message --
From: Felix Andrews 
Date: 21 December 2009 23:44
Subject: proposal for new axis.Date/axis.POSIXct
To: r-devel@r-project.org


Hi R-devel.

I've noticed a couple of quirks in the current time/date axis
functions (axis.Date, axis.POSIXct, and the equivalents in lattice).
Looking at the code, it seems like a fairly ad-hoc approach, often
using pretty() on components of the time. This is not always ideal -
for example a one-hour interval gets cut into 10-minute chunks rather
than the more natural 15-minute chunks (since pretty() doesn't know
about the minutes in an hour, etc). Generally the number of tick marks
produced varies a lot, and there are a couple of strange cases: try
plot(0:1 ~ as.POSIXct(paste(2002:2003,"-02-01",sep="")))

So, I've written a function prettyDate() that attempts to act like
pretty(), but with dates and times. Like pretty, it takes arguments
'n' and 'min.n' which specify the desired and minimum number of ticks,
respectively.

http://pastie.org/751640

By the way, also listed there is an extension of trunc.POSIXt with
extra units "weeks", "months", "years", "decades", "centuries".

Following is a test of prettyDate() for axis labelling, drawn for a
sequence of different time intervals.

source("http://pastie.org/751640.txt";)

steps <-
   list("10 secs",
        "1 min", "5 mins", "10 mins", "15 mins", "30 mins",
        "1 hour", "3 hours", "6 hours", "12 hours",
        "1 DSTday", "1 week", "2 weeks",
        "1 month", "3 months", "6 months",
        "1 year", "2 years", "5 years", "10 years",
        "20 years", "50 years", "100 years")
names(steps) <- paste("span =", unlist(steps))

from <- as.POSIXct("2002-02-02 02:02")
devAskNewPage(TRUE)

lapply(steps, function(s) {
   times <- seq(from, by = s, length = 2)
   plot(0:1 ~ times, yaxt = "n", ylab = "")
   x <- mean(par("usr")[1:2])
   text(x, 0.5, paste("span:", s), cex = 2)
   text(x, 0.33, paste(format(times), collapse="\n"))
   text(x, 0.05, "current axis.POSIXct")
   text(x, 0.95, "proposed new prettyDate axis, n = 5")
   ## draw new proposed axis function at top of plot
   timelim <- par("usr")[1:2]
   mostattributes(timelim) <- attributes(from)
   axis(side = 3, at = prettyDate(timelim),
       labels = prettyDate(timelim, do.format=TRUE))
})

devAskNewPage(FALSE)


Is it appropriate / desirable for this to be incorporated into R?


-- 
Felix Andrews / 安福立
Postdoctoral Fellow
Integrated Catchment Assessment and Management (iCAM) Centre
Fenner School of Environment and Society [Bldg 48a]
The Australian National University
Canberra ACT 0200 Australia
M: +61 410 400 963
T: + 61 2 6125 4670
E: felix.andr...@anu.edu.au
CRICOS Provider No. 00120C
-- 
http://www.neurofractal.org/felix/



Re: [Rd] Proposal unary - operator for factors

2010-02-03 Thread Prof Brian Ripley

On Wed, 3 Feb 2010, Duncan Murdoch wrote:


On 03/02/2010 7:20 PM, Hadley Wickham wrote:

Currently, for numeric a you can do either
  order(-a)
or
  order(a, decreasing=TRUE)
For nonnumeric types like POSIXct and factors only
the latter works.

Under my proposal your
  order(a, -b, c, -d)
would be
  order(a, b, c, d, decreasing=c(FALSE,TRUE,FALSE,TRUE))
and it would work for any orderable class without modifications
to any classes.

Why not use

 order(a, -xtfrm(b), c, -xtfrm(d))


That's a good suggestion.  You could make it even easier to read with
desc <- function(x) -xtfrm(x)

order(a, desc(b), c, desc(d))

Could you remind me what xtfrm stands for?


No, I don't think I ever worked it out. :-)


The same logic as strxfrm.



Duncan Murdoch




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595



Re: [Rd] Proposal unary - operator for factors

2010-02-03 Thread Hadley Wickham
>>> Could you remind me what xtfrm stands for?
>>
>> No, I don't think I ever worked it out. :-)
>
> The same logic as strxfrm.

strxfrm is short for string transform.
=> stxfrm is short for string tansform
=> txfrm is short for tansform
=> xtfrm is short of snatform?

Hadley

-- 
http://had.co.nz/
