[Rd] The function cummax() seems to have a bug.

2015-05-17 Thread Dongcan Jiang
Hi,

The function cummax() seems to have a bug.

> x <- c(NA, 0)
> storage.mode(x) <- "integer"
> cummax(x)
[1] NA  0

The correct result of this case should be NA NA. The mistake in [
https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L130-L136] may be
the reason.

Best Regards,
Dongcan

-- 
Dongcan Jiang
Team of Search Engine & Web Mining
School of Electronic Engineering & Computer Science
Peking University, Beijing, 100871, P.R.China

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] That 'make check-all' problem with the survival package

2015-05-17 Thread Hin-Tak Leung
--
On Sat, May 16, 2015 2:33 PM BST Marc Schwartz wrote:

>
>> On May 16, 2015, at 6:11 AM, Hin-Tak Leung  
>> wrote:
>> 
>> 
>> 
>> --
>> On Sat, May 16, 2015 8:04 AM BST Uwe Ligges wrote:
>> 
>> Not sure why this goes to R-devel. You just could have asked the 
>> maintainer. Terry Therneau is aware of it and promised he will fix it.
>> 
>> 
>> The quickest fix is to add cmprsk to the recommended list , and that's is an 
>> R-devel issue.
>
>
>Actually, the easiest solution would be for Terry to either modify relevant 
>code according to:
>
>  
>http://cran.r-project.org/doc/manuals/r-release/R-exts.html#Suggested-packages
>
>or perhaps better, as noted, to remove the need for cmprsk at all.
>

We shall disagree on what is "quickest" - one other option I did not mention is 
to back
out the version upgrade to 2.38-1 in February, and ship the previous 2.37-7.

I trust that Terry made the addition for good reasons. So my thoughts are 
geared more towards
how to accommodate the new addition, rather than revert/remove it.
(though I'd be happier to isolate a smaller change than a bulk 2.37-7 -> 2.38.1 
upgrade
and think more about that. Hence the mention of the repositories.).

>The latter raises the philosophical issue as to whether or not a Recommended 
>package should or really needs to have any connections at all to third party 
>CRAN packages. It seems to me that the default R distribution of “Base” and 
>“Recommended” packages should be able to fully run in a stand alone manner 
>without any declared external connections to other non-default packages.
>
>The only other Recommended package that I found that has such a connection is 
>nlme, which has a Suggests for Frank’s Hmisc. However, based upon a grep 
>review of the package contents, there are only two code based references to 
>Hmisc that I located:
>
>./R/newFunc.R:264:      ## e.g. Hmisc's "labelled"
>./tests/augPred_lab.R:2:if(require("Hmisc")) {
>
>Neither of which is really needed (the first being a comment only, so benign) 
>and of course the latter goes against the current guidance in R-exts. The 
>second, within the if() code, uses the Hmisc label() function, which it seems 
>to me is not really needed here.
>
>Regards,
>
>Marc Schwartz
>
>
>> 
>> On 16.05.2015 07:22, Hin-Tak Leung wrote:
>>> 'make check-all' for current R has been showing this error in the middle
>>> for a few months now - any thought on fixing this? I think cmprsk
>>> should be either included in the recommended bundle, or
>>> the survival vignette to not depend on it. Having 'make check-all' showing
>>> glaring ERROR's for a few months seems to defeat the purpose of
>>> doing any checking at all via 'make check-all'.
>>> 
>>> FWIW, I did look at when/how the issue was introduced, but it appeared
>>> that svn://svn.r-forge.r-project.org/svnroot/survival is no longer being
>>> updated, and git://github.com/cran/survival.git only shows release jumps.
>>> Anyway, if first appears with survival 2.38-1 in February, and as the 
>>> previous
>>> 2.37-7 was 13 months older, this info is of no use to anybody.
>>> I didn't write earlier as I thought the issue would go away at some point;
>>> but obviously this isn't the case after 3 months.
>>> 
>>> ---
>>>  ERROR
>>> Errors in running code in vignettes:
>>> when running code in ‘compete.Rnw’
>>>   ...
>>> temp$fstat <- as.numeric(event)
>>> 
>>> temp$msex <- with(temp, 1 * (sex == "M"))
>>> 
>>> fgfit1 <- with(temp, crr(etime, fstat, cov1 = cbind(age,
>>> +     msex, mspike), failcode = 2, cencode = 1, variance = TRUE))
>>> 
>>>   When sourcing ‘compete.R’:
>>> Error: could not find function "crr"
>>> Execution halted
>>> 
>>> * checking re-building of vignette outputs ... NOTE
>>> Error in re-building vignettes:
>>>   ...
>>> Warning in coxph(Surv(futime, death) ~ group:age2 + sex + strata(group),  :
>>>   X matrix deemed to be singular; variable 23 24 25
>>> Loading required package: cmprsk
>>> Warning in library(package, lib.loc = lib.loc, character.only = TRUE, 
>>> logical.return = TRUE,  :
>>>   there is no package called ‘cmprsk’
>>> 
>>> Error: processing vignette 'compete.Rnw' failed with diagnostics:
>>>  chunk 15 (label = finegray)
>>> Error in eval(expr, envir, enclos) : could not find function "crr"
>>> Execution halted
>>> 
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] install.packages() / update.packages() sometimes outputs to stdout and sometimes to stderr [and menu() & readline()]

2015-05-17 Thread Henrik Bengtsson
I've noticed that install.packages()
[https://svn.r-project.org/R/trunk/src/library/utils/R/packages.R] and
update.packages()
[https://svn.r-project.org/R/trunk/src/library/utils/R/packages2.R]
sometimes output to stdout and sometimes to stderr.

It looks like stderr is used (e.g. via cat()) when the message is part
of querying the user, e.g.

update.packages <- function(lib.loc = NULL, repos = getOption("repos"),
[...]
cat(old[k, "Package"], ":\n",
"Version", old[k, "Installed"],
"installed in", old[k, "LibPath"],
if(checkBuilt) paste("built under R", old[k, "Built"]),
"\n",
"Version", old[k, "ReposVer"], "available at",
simplifyRepos(old[k, "Repository"], type))
cat("\n")
answer <- substr(readline("Update (y/N/c)?  "), 1L, 1L)
if(answer == "c" | answer == "C") {
cat("cancelled by user\n")
return(invisible())
}

but it is not consistently so, because some are sent to stderr (e.g.
via message()), e.g.

install.packages <-
function(pkgs, lib, repos = getOption("repos"),
[...]
if(action == "interactive" && interactive()) {
msg <-
ngettext(sum(later & hasSrc),
 "Do you want to install from sources
the package which needs compilation?",
 "Do you want to install from sources
the packages which need compilation?")
message(msg, domain = NA)
res <- readline("y/n: ")
if(res != "y") later <- later & !hasSrc
} else if (action == "never") {
cat("  Binaries will be installed\n")
later <- later & !hasSrc
}

Also, as one see in the latter example, it is not only interactive
user queries for which stdout is used.  It's simply not consistent -
at least I cannot see pattern.

If these are not intended behaviors, I'm happy to provide patches.
I'd prefer stderr for all user queries (see below) - but I can also
see how this is something that needs considered thoughts and made an
official design policy across the R base distribution.


[Related to the above but could deserve it's own thread (feel free to
move the below to its own thread)]

utils::menu(..., graphics=FALSE)
[https://svn.r-project.org/R/trunk/src/library/utils/R/menu.R] queries
the user via standard output, which becomes an issue when running
interactive report generators, which mostly captures stdout and makes
it part of the produced artifact.  Personally, I'd argue that querying
the user via stderr would be a better choice in more cases.

Also, it's a bit weird that base::readline(), which is used for the
actual prompting of the user, is sent neither to R's stdout nor
stderr, e.g.

> zz <- file("all.Rout", open = "wt")
> sink(zz); sink(zz, type = "message")
> ans <- menu(letters[1:3], title="Select one:", graphics=FALSE)
Selection: 1

> sink(type = "message"); sink()
> ans
[1] 1

> cat(readLines("all.Rout"), sep="\n")
Select one:

1: a
2: b
3: c

Note the only thing displayed to the user is the prompt "Selection: ",
which is generated by readline().  It does however output to the
system's stdout (verified on Linux and Windows), e.g.

$ Rscript -e "readline('Press ENTER: ')" > stdout.log
$ cat stdout.log
Press ENTER:
[1] ""

Compare this to how it works in, for instance, bash:

$ read -p "Press ENTER: " ans > stdout.log
Press ENTER:
$ read -p "Press ENTER: " ans > stderr.log
$ cat stderr.log
Press ENTER:

My preference would be that menu() and readline() and other messages
for querying the user would output to the same connection/stream.
Again, I'd favor stderr over stdout, but possibly even a third
alternative designed specifically for user queries, cf. my R-devel
post '[Rd] Output to "raw console" rather than stdout/stderr?' on
2015-02-01 (https://stat.ethz.ch/pipermail/r-devel/2015-February/070578.html).
Just like when using menu(..., graphics=TRUE), this would not clutter
up the output to stdout (or stderr).  But even without this third
alternative, I argue that stderr is better than stdout. I'm happy to
provide patches for this as well.


Thanks,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] The function cummax() seems to have a bug.

2015-05-17 Thread Henrik Bengtsson
Below is some further troubleshooting on this:

>From code inspection this bug happens for only:

* for integer values
* when the first element is NA_integer_ and the second is not.


Examples:

# Numeric/doubles works as expected
> cummax(c(NA_real_, 0, 1, 2, 3))
[1] NA NA NA NA NA

# It does not occur when the first value is non-NA
> cummax(c(0L, NA_integer_, 1L, 2L, 3L))
[1]  0 NA NA NA NA

# When first value is NA, it is not "remembered"
# (because internal for loop starts with 2nd element)
> cummax(c(NA_integer_, 0L, 1L, 2L, 3L))
[1] NA  0  1  2  3

The problem is not there for cummin():

> cummin(c(0L, NA_integer_, 1L, 2L, 3L))
[1]  0 NA NA NA NA
> cummin(c(NA_integer_, 0L, 1L, 2L, 3L))
[1] NA NA NA NA NA

but that is just "pure luck" due to the fact how NA_integer_ is
internally represented as the smallest possible 4-byte signed integer,
i.e.

LibExtern intR_NaInt;   /* NA_INTEGER:= INT_MIN currently */
#define NA_INTEGER  R_NaInt

Note the comment, which implies that code should not rely on the
assumption that NA_integer_ == NA_INTEGER == R_NaInt == INT_MIN; it
could equally well have been INT_MAX, which in case cummin()would
return the wrong result whereas cummax() wouldn't. So, cummin() makes
the same mistake ascummax(), where the for-loop skips the test for NA
of the first element, cf.
https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L145-L148

The simple solution is probably to do (cf. native icumsum):

[HB-X201]{hb}: svn diff src/main/cum.c
Index: src/main/cum.c
===
--- src/main/cum.c  (revision 68378)
+++ src/main/cum.c  (working copy)
@@ -130,7 +130,7 @@
 int *ix = INTEGER(x), *is = INTEGER(s);
 int max = ix[0];
 is[0] = max;
-for (R_xlen_t i = 1 ; i < xlength(x) ; i++) {
+for (R_xlen_t i = 0 ; i < xlength(x) ; i++) {
if(ix[i] == NA_INTEGER) break;
is[i] = max = (max > ix[i]) ? max : ix[i];
 }
@@ -142,7 +142,7 @@
 int *ix = INTEGER(x), *is = INTEGER(s);
 int min = ix[0];
 is[0] = min;
-for (R_xlen_t i = 1 ; i < xlength(x) ; i++ ) {
+for (R_xlen_t i = 0 ; i < xlength(x) ; i++ ) {
if(ix[i] == NA_INTEGER) break;
is[i] = min = (min < ix[i]) ? min : ix[i];
 }

/Henrik

On Sun, May 17, 2015 at 4:13 AM, Dongcan Jiang  wrote:
> Hi,
>
> The function cummax() seems to have a bug.
>
>> x <- c(NA, 0)
>> storage.mode(x) <- "integer"
>> cummax(x)
> [1] NA  0
>
> The correct result of this case should be NA NA. The mistake in
> [https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L130-L136] may be
> the reason.
>
> Best Regards,
> Dongcan
>
> --
> Dongcan Jiang
> Team of Search Engine & Web Mining
> School of Electronic Engineering & Computer Science
> Peking University, Beijing, 100871, P.R.China

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel