[Rd] Dots are not fixed by make.names()

2018-10-05 Thread Kirill Müller

Hi


It seems that names of the form "..#" and "..." are not fixed by 
make.names(), even though they are reserved words. The documentation reads:


> [...] Names such as ".2way" are not valid, and neither are the 
reserved words.


> Reserved words in R: [...] ... and ..1, ..2 etc, which are used to 
refer to arguments passed down from a calling function, see ?... .


I have pasted a reproducible example below.

I'd like to suggest converting these to "...#" and "", respectively. 
Happy to contribute a PR.



Best regards

Kirill


make.names(c("..1", "..13", "..."))
#> [1] "..1"  "..13" "..."
`..1` <- 1
`..13` <- 13
`...` <- "dots"

mget(c("..1", "..13", "..."))
#> $..1
#> [1] 1
#>
#> $..13
#> [1] 13
#>
#> $...
#> [1] "dots"
`..1`
#> Error in eval(expr, envir, enclos): the ... list does not contain any elements

`..13`
#> Error in eval(expr, envir, enclos): the ... list does not contain 13 elements

`...`
#> Error in eval(expr, envir, enclos): '...' used in an incorrect context

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bias issue in sample() (PR 17494)

2019-02-25 Thread Kirill Müller

Gabe


As mentioned on Twitter, I think the following behavior should be fixed 
as part of the upcoming changes:


R.version.string
## [1] "R Under development (unstable) (2019-02-25 r76160)"
.Machine$double.digits
## [1] 53
set.seed(123)
RNGkind()
## [1] "Mersenne-Twister" "Inversion"    "Rejection"
length(table(runif(1e6)))
## [1] 999863

I don't expect any collisions when using Mersenne-Twister to generate a 
million floating point values. I'm not sure what causes this behavior, 
but it's documented in ?Random:


"Do not rely on randomness of low-order bits from RNGs. Most of the 
supplied uniform generators return 32-bit integer values that are 
converted to doubles, so they take at most 2^32 distinct values and long 
runs will return duplicated values (Wichmann-Hill is the exception, and 
all give at least 30 varying bits.)"
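The 32-bit granularity described in ?Random can be checked directly. A quick sketch, assuming the default Mersenne-Twister kind is in effect:

```r
# With the default Mersenne-Twister, each runif() value is derived from a
# single 32-bit integer scaled by 2^-32, so u * 2^32 should be (nearly)
# an exact integer for every draw.
set.seed(1)
u <- runif(10)
max(abs(u * 2^32 - round(u * 2^32)))  # ~0 under a 32-bit source
```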


The "Wichman-Hill" bit is interesting:

RNGkind("Wichmann-Hill")
length(table(runif(1e6)))
## [1] 1000000
length(table(runif(1e6)))
## [1] 1000000

Mersenne-Twister has a much larger period than Wichmann-Hill; it would 
be great to see the same behavior for Mersenne-Twister as well. 
Thanks for considering.



Best regards

Kirill


On 20.02.19 08:01, Gabriel Becker wrote:

Luke,

I'm happy to help with this. It's great to see this get tackled (I've cc'ed
Kelli Ottoboni who helped flag this issue).

I can prepare a patch for the RNGkind related stuff and the doc update.

As for ???, what are your (and others') thoughts about the possibility of
a) a reproducibility API which takes either an R version (or maybe
alternatively a date) and sets the RNGkind to the default for that
version/date, and/or b) that sessionInfo be modified to capture (and
display) the RNGkind in effect.

Best,
~G


On Tue, Feb 19, 2019 at 11:52 AM Tierney, Luke 
wrote:


Before the next release we really should sort out the bias issue in
sample() reported by Ottoboni and Stark in
https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and
filed as a bug report by Duncan Murdoch at
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494.

Here are two examples of bad behavior through current R-devel:

  set.seed(123)
  m <- (2/5) * 2^32
  x <- sample(m, 1000000, replace = TRUE)
  table(x %% 2, x > m / 2)
  ##
  ##        FALSE   TRUE
  ##   0   300620 198792
  ##   1   200196 300392

  table(sample(2/7 * 2^32, 1000000, replace = TRUE) %% 2)
  ##
  ##      0      1
  ## 429054 570946

I committed a modification to R_unif_index to address this by
generating random bits (blocks of 16) and rejection sampling, but for
now this is only enabled if the environment variable R_NEW_SAMPLE is
set before the first call.
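The rejection approach described above can be sketched at the R level. This is an illustration of the idea only, not the actual C implementation in R_unif_index; the helper below assembles random bits in 16-bit blocks (via sample.int() on a small, unaffected range) and redraws whenever the result falls outside 0..m-1:

```r
# Sketch of unbiased sampling from 1..m via bit-level rejection.
# Hypothetical R-level code mirroring the R_unif_index change.
sample_one_unbiased <- function(m) {
  nbits <- ceiling(log2(m))                 # smallest b with 2^b >= m
  repeat {
    x <- 0
    b <- nbits
    while (b > 0) {                         # assemble b random bits
      take <- min(b, 16L)                   # ...in blocks of up to 16
      x <- x * 2^take + (sample.int(2^take, 1L) - 1L)
      b <- b - take
    }
    if (x < m) return(x + 1)                # accept: uniform on 1..m
  }                                         # otherwise reject and redraw
}
```

Because 2^nbits < 2 * m, the expected number of redraws is less than two per accepted value.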

Some things still needed:

- someone to look over the change and see if there are any issues
- adjustment of RNGkind to allow the old behavior to be selected
- make the new behavior the default
- adjust documentation
- ???

Unfortunately I don't have enough free cycles to do this, but I can
help if someone else can take the lead.

There are two other places I found that might suffer from the same
issue, in walker_ProbSampleReplace (pointed out by O & S) and in
src/nmath/wilcox.c.  Both can be addressed by using R_unif_index. I
have done that for walker_ProbSampleReplace, but the wilcox change
might need adjusting to support the standalone math library and I
don't feel confident enough I'd get that right.

Best,

luke


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and    Fax:   319-335-3017
 Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] bias issue in sample() (PR 17494)

2019-02-26 Thread Kirill Müller

Ralf


I don't doubt this is expected with the current implementation; I doubt 
the implementation is desirable. I suggest turning this into

pbirthday(1e6, classes = 2^53)
## [1] 5.550956e-05

(which is still non-zero, but much less likely to cause confusion).
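For reference, the expected number of colliding pairs follows directly from the birthday problem; a small sketch of the arithmetic behind the two pbirthday() figures:

```r
# Expected number of colliding pairs among n draws from k equally
# likely values is approximately choose(n, 2) / k.
n <- 1e6
choose(n, 2) / 2^32   # on the order of 100 collisions with 32-bit output
choose(n, 2) / 2^53   # essentially none if all 53 bits varied
```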


Best regards

Kirill

On 26.02.19 10:18, Ralf Stubner wrote:

Kirill,

I think some level of collision is actually expected! R uses a 32-bit MT
that can produce 2^32 different doubles. The probability for a collision
within a million draws is


pbirthday(1e6, classes = 2^32)

[1] 1

Greetings
Ralf


On 26.02.19 07:06, Kirill Müller wrote:

Gabe


As mentioned on Twitter, I think the following behavior should be fixed
as part of the upcoming changes:

R.version.string
## [1] "R Under development (unstable) (2019-02-25 r76160)"
.Machine$double.digits
## [1] 53
set.seed(123)
RNGkind()
## [1] "Mersenne-Twister" "Inversion"    "Rejection"
length(table(runif(1e6)))
## [1] 999863

I don't expect any collisions when using Mersenne-Twister to generate a
million floating point values. I'm not sure what causes this behavior,
but it's documented in ?Random:

"Do not rely on randomness of low-order bits from RNGs. Most of the
supplied uniform generators return 32-bit integer values that are
converted to doubles, so they take at most 2^32 distinct values and long
runs will return duplicated values (Wichmann-Hill is the exception, and
all give at least 30 varying bits.)"

The "Wichman-Hill" bit is interesting:

RNGkind("Wichmann-Hill")
length(table(runif(1e6)))
## [1] 1000000
length(table(runif(1e6)))
## [1] 1000000

Mersenne-Twister has a much larger period than Wichmann-Hill; it would
be great to see the same behavior for Mersenne-Twister as well.
Thanks for considering.


Best regards

Kirill


On 20.02.19 08:01, Gabriel Becker wrote:

Luke,

I'm happy to help with this. It's great to see this get tackled (I've
cc'ed
Kelli Ottoboni who helped flag this issue).

I can prepare a patch for the RNGkind related stuff and the doc update.

As for ???, what are your (and others') thoughts about the possibility of
a) a reproducibility API which takes either an R version (or maybe
alternatively a date) and sets the RNGkind to the default for that
version/date, and/or b) that sessionInfo be modified to capture (and
display) the RNGkind in effect.

Best,
~G


On Tue, Feb 19, 2019 at 11:52 AM Tierney, Luke 
wrote:


Before the next release we really should sort out the bias issue in
sample() reported by Ottoboni and Stark in
https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and
filed as a bug report by Duncan Murdoch at
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494.

Here are two examples of bad behavior through current R-devel:

   set.seed(123)
   m <- (2/5) * 2^32
   x <- sample(m, 1000000, replace = TRUE)
   table(x %% 2, x > m / 2)
   ##
   ##        FALSE   TRUE
   ##   0   300620 198792
   ##   1   200196 300392

   table(sample(2/7 * 2^32, 1000000, replace = TRUE) %% 2)
   ##
   ##      0      1
   ## 429054 570946

I committed a modification to R_unif_index to address this by
generating random bits (blocks of 16) and rejection sampling, but for
now this is only enabled if the environment variable R_NEW_SAMPLE is
set before the first call.

Some things still needed:

- someone to look over the change and see if there are any issues
- adjustment of RNGkind to allow the old behavior to be selected
- make the new behavior the default
- adjust documentation
- ???

Unfortunately I don't have enough free cycles to do this, but I can
help if someone else can take the lead.

There are two other places I found that might suffer from the same
issue, in walker_ProbSampleReplace (pointed out by O & S) and in
src/nmath/wilcox.c.  Both can be addressed by using R_unif_index. I
have done that for walker_ProbSampleReplace, but the wilcox change
might need adjusting to support the standalone math library and I
don't feel confident enough I'd get that right.

Best,

luke


--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa  Phone: 319-335-3386
Department of Statistics and    Fax:   319-335-3017
  Actuarial Science
241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Ellipsis and dot-dot-number [Re: Dots are not fixed by make.names()]

2019-03-08 Thread Kirill Müller

Hi


In addition to the inconsistency in make.names(), the text in ?Reserved 
seems incomplete:


"Reserved words outside quotes are always parsed to be references to the 
objects linked to in the ‘Description’, and hence they are not allowed 
as syntactic names (see make.names). They **are** allowed as 
non-syntactic names, e.g. inside backtick quotes."


`..1` and `...` can be assigned to, but the resulting bindings cannot be 
read back like ordinary variables. Example:


`..1` <- 1
`..13` <- 13
`...` <- "dots"
`..1`
#> Error: ..1 used in an incorrect context, no ... to look in
`..13`
#> Error: ..13 used in an incorrect context, no ... to look in
`...`
#> Error in eval(expr, envir, enclos): '...' used in an incorrect context

Does the ?Reserved help page need to mention this oddity, or link to 
more detailed documentation?



Best regards

Kirill


On 05.10.18 11:27, Kirill Müller wrote:

Hi


It seems that names of the form "..#" and "..." are not fixed by 
make.names(), even though they are reserved words. The documentation 
reads:


> [...] Names such as ".2way" are not valid, and neither are the 
reserved words.


> Reserved words in R: [...] ... and ..1, ..2 etc, which are used to 
refer to arguments passed down from a calling function, see ?... .


I have pasted a reproducible example below.

I'd like to suggest converting these to "...#" and "", 
respectively. Happy to contribute a PR.



Best regards

Kirill


make.names(c("..1", "..13", "..."))
#> [1] "..1"  "..13" "..."
`..1` <- 1
`..13` <- 13
`...` <- "dots"

mget(c("..1", "..13", "..."))
#> $..1
#> [1] 1
#>
#> $..13
#> [1] 13
#>
#> $...
#> [1] "dots"
`..1`
#> Error in eval(expr, envir, enclos): the ... list does not contain any elements

`..13`
#> Error in eval(expr, envir, enclos): the ... list does not contain 13 elements

`...`
#> Error in eval(expr, envir, enclos): '...' used in an incorrect context



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


r-devel@r-project.org

2019-08-19 Thread Kirill Müller

Hi everyone


The following behavior (in R 3.6.1 and R-devel r77040) caught me by 
surprise today:


truthy <- c(TRUE, FALSE)
falsy <- c(FALSE, TRUE, FALSE)

if (truthy) "check"
#> Warning in if (truthy) "check": the condition has length > 1 and only the
#> first element will be used
#> [1] "check"
if (falsy) "check"
#> Warning in if (falsy) "check": the condition has length > 1 and only the
#> first element will be used
if (FALSE || truthy) "check"
#> [1] "check"
if (FALSE || falsy) "check"
if (truthy || FALSE) "check"
#> [1] "check"
if (falsy || FALSE) "check"

The || operator gobbles the warning about a length > 1 vector. I wonder 
if the existing checks for length 1 can be extended to the operands of 
the || and && operators. Thanks (and apologies if this has been raised 
before).
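Until the operands themselves are checked, a defensive pattern is to reduce the condition to length one explicitly, so that no warning needs swallowing in the first place. A sketch:

```r
truthy <- c(TRUE, FALSE)

# Explicit reductions make the intent unambiguous and length-safe:
if (any(truthy)) "check"   # TRUE if any element is TRUE
if (all(truthy)) "check"   # TRUE only if every element is TRUE
isTRUE(truthy)             # FALSE: requires a single TRUE, no warning
```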



Best regards

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Profiling: attributing costs to place of invocation (instead of place of evaluation)?

2020-02-26 Thread Kirill Müller

Hi


Consider the following example:

f <- function(expr) g(expr)
g <- function(expr) {
  h(expr)
}
h <- function(expr) {
  expr # evaluation happens here
  i(expr)
}
i <- function(expr) {
  expr # already evaluated, no costs here
  invisible()
}

rprof <- tempfile()
Rprof(rprof)
f(replicate(1e2, sample.int(1e4)))
Rprof(NULL)
cat(readLines(rprof), sep = "\n")
#> sample.interval=2
#> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f"
#> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f"
#> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f"

The evaluation of the slow replicate() call is deferred to the execution 
of h(), but there's no replicate() call in h's definition. This makes 
parsing the profile much more difficult than necessary.


I have pasted an experimental patch below (off of 3.6.2) that produces 
the following output:


cat(readLines(rprof), sep = "\n")
#> sample.interval=2
#> "sample.int" "FUN" "lapply" "sapply" "replicate" "f"
#> "sample.int" "FUN" "lapply" "sapply" "replicate" "f"
#> "sample.int" "FUN" "lapply" "sapply" "replicate" "f"

This attributes the cost of the replicate() call to f(), where the call 
is actually defined. From my experience, this will give a much better 
understanding of the actual costs of each part of the code. The SIGPROF 
handler looks at sysparent and cloenv before deciding if an element of 
the call stack is to be included in the profile.


Is there interest in integrating a variant of this patch, perhaps with 
an optional argument to Rprof()?


Thanks!


Best regards

Kirill


Index: src/main/eval.c
===================================================================
--- src/main/eval.c	(revision 77857)
+++ src/main/eval.c	(working copy)
@@ -218,7 +218,10 @@
     if (R_Line_Profiling)
 	lineprof(buf, R_getCurrentSrcref());
 
+    SEXP sysparent = NULL;
+
     for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
+	if (sysparent != NULL && cptr->cloenv != sysparent && cptr->sysparent != sysparent) continue;
 
 	if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
 	    && TYPEOF(cptr->call) == LANGSXP) {
 	    SEXP fun = CAR(cptr->call);
@@ -292,6 +295,8 @@
 		else
 		    lineprof(buf, cptr->srcref);
 	    }
+
+	    sysparent = cptr->sysparent;
 	}
     }
 }

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R CMD check --as-cran without qpdf

2015-10-10 Thread Kirill Müller
Today, a package that has an HTML vignette (but no PDF vignette) failed 
R CMD check --as-cran on a system without qpdf. I think the warning 
originates here [1], due to a premature check for the existence of qpdf 
[2]. Setting R_QPDF=true (as in /bin/true) helped, but perhaps it's 
possible to check qpdf existence only when it matters.
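For reference, the workaround mentioned above as a one-liner; the package tarball name here is hypothetical:

```shell
# Point R_QPDF at /bin/true so the premature existence check passes;
# the PDF size-reduction check is then effectively a no-op.
R_QPDF=true R CMD check --as-cran mypackage_1.0.tar.gz
```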


I have attached a patch (untested) that could serve as a starting point. 
The code links correspond to SVN revision 69500. Thanks.



Best regards

Kirill


[1] 
https://github.com/wch/r-source/blob/f42ee5e7ecf89a245afd6619b46483f1e3594ab7/src/library/tools/R/check.R#L322-L326, 

[2] 
https://github.com/wch/r-source/blob/f42ee5e7ecf89a245afd6619b46483f1e3594ab7/src/library/tools/R/check.R#L4426-L4428
diff --git src/library/tools/R/check.R src/library/tools/R/check.R
index a508453..e4e5027 100644
--- src/library/tools/R/check.R
+++ src/library/tools/R/check.R
@@ -319,11 +319,7 @@ setRlibs <-
  paste("  file", paste(sQuote(miss[f]), collapse = ", "),
"will not be installed: please remove it\n"))
 }
-if (dir.exists("inst/doc")) {
-if (R_check_doc_sizes) check_doc_size()
-else if (as_cran)
-warningLog(Log, "'qpdf' is needed for checks on size reduction of PDFs")
-}
+if (R_check_doc_sizes && dir.exists("inst/doc")) check_doc_size()
 if (dir.exists("inst/doc") && do_install) check_doc_contents()
 if (dir.exists("vignettes")) check_vign_contents(ignore_vignettes)
 if (!ignore_vignettes) {
@@ -2129,12 +2125,18 @@ setRlibs <-
 
 check_doc_size <- function()
 {
-## Have already checked that inst/doc exists and qpdf can be found
+## Have already checked that inst/doc exists
 pdfs <- dir('inst/doc', pattern="\\.pdf",
 recursive = TRUE, full.names = TRUE)
 pdfs <- setdiff(pdfs, "inst/doc/Rplots.pdf")
 if (length(pdfs)) {
 checkingLog(Log, "sizes of PDF files under 'inst/doc'")
if (!nzchar(Sys.which(Sys.getenv("R_QPDF", "qpdf")))) {
+if (as_cran)
+warningLog(Log, "'qpdf' is needed for checks on size reduction of PDFs")
+return()
+}
+
 any <- FALSE
 td <- tempfile('pdf')
 dir.create(td)
@@ -4424,8 +4426,7 @@ setRlibs <-
 	config_val_to_logical(Sys.getenv("_R_CHECK_PKG_SIZES_", "TRUE")) &&
 nzchar(Sys.which("du"))
 R_check_doc_sizes <-
-	config_val_to_logical(Sys.getenv("_R_CHECK_DOC_SIZES_", "TRUE")) &&
-nzchar(Sys.which(Sys.getenv("R_QPDF", "qpdf")))
+	config_val_to_logical(Sys.getenv("_R_CHECK_DOC_SIZES_", "TRUE"))
 R_check_doc_sizes2 <-
 	config_val_to_logical(Sys.getenv("_R_CHECK_DOC_SIZES2_", "FALSE"))
 R_check_code_assign_to_globalenv <-

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] getParseData() for installed packages

2016-03-10 Thread Kirill Müller
I can't seem to reliably obtain parse data via getParseData() for 
functions from installed packages. The parse data seems to be available 
only for the *last* file in the package.


See [1] for a small example package with just two functions f and g in 
two files a.R and b.R. See [2] for a documented test run on installed 
package (Ubuntu 15.10, UTF-8 locale, R 3.2.3). Same behavior with 
r-devel (r70303).


The parse data helps reliable coverage analysis [3]. Please advise.
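For comparison, getParseData() behaves as expected on freshly parsed sources; the file contents below are a hypothetical stand-in for the example package's two functions:

```r
# Parse a small file with two functions, the way package R/ sources are
# parsed; getParseData() should return tokens for all of its contents.
src <- tempfile(fileext = ".R")
writeLines(c("f <- function(x) x + 1", "g <- function(x) x - 1"), src)
exprs <- parse(src, keep.source = TRUE)
pd <- getParseData(exprs)
unique(pd$text[pd$token == "SYMBOL"])   # should include both "f" and "g"
```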


Best regards

Kirill


[1] https://github.com/krlmlr/covr.dummy
[2] http://rpubs.com/krlmlr/getParseData
[3] https://github.com/jimhester/covr/pull/154

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] getParseData() for installed packages

2016-03-10 Thread Kirill Müller



On 10.03.2016 15:49, Duncan Murdoch wrote:

On 10/03/2016 8:27 AM, Kirill Müller wrote:

I can't seem to reliably obtain parse data via getParseData() for
functions from installed packages. The parse data seems to be available
only for the *last* file in the package.

See [1] for a small example package with just two functions f and g in
two files a.R and b.R. See [2] for a documented test run on installed
package (Ubuntu 15.10, UTF-8 locale, R 3.2.3). Same behavior with
r-devel (r70303).

The parse data helps reliable coverage analysis [3]. Please advise.


You don't say how you built the package.  Parse data is omitted by 
default.


Duncan Murdoch


I install using R CMD INSTALL ., and I have options(keep.source = TRUE, 
keep.source.pkgs = TRUE) in my .Rprofile . The srcrefs are all there, 
it's just that the parse data is not where I'd expect it to be.



-Kirill



Best regards

Kirill


[1] https://github.com/krlmlr/covr.dummy
[2] http://rpubs.com/krlmlr/getParseData
[3] https://github.com/jimhester/covr/pull/154





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] getParseData() for installed packages

2016-03-10 Thread Kirill Müller

On 10.03.2016 16:05, Duncan Murdoch wrote:

On 10/03/2016 9:53 AM, Kirill Müller wrote:


On 10.03.2016 15:49, Duncan Murdoch wrote:


I install using R CMD INSTALL ., and I have options(keep.source = TRUE,
keep.source.pkgs = TRUE) in my .Rprofile . The srcrefs are all there,
it's just that the parse data is not where I'd expect it to be.



Okay, I see what you describe.  I'm not going to have time to track 
this down for a while, so I'm going to post your message as a bug 
report, and hopefully will be able to get to it before 3.3.0.


Thanks. A related note: Would it be possible to make available all of 
first_byte/last_byte/first_column/last_column in the parse data, for 
easier srcref reconstruction?



-Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] DESCRIPTION file: Space after colon mandatory?

2016-03-29 Thread Kirill Müller
According to R-exts, DESCRIPTION is a DCF variant, and " Fields start 
with an ASCII name immediately followed by a colon: the value starts 
after the colon and a space." However, according to the linked 
https://www.debian.org/doc/debian-policy/ch-controlfields.html, 
horizontal space before and after a value is trimmed; this is also the 
behavior of read.dcf().


Is this an omission in the documentation, or is the space after the 
colon actually required? Thanks.
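The trimming behavior of read.dcf() is easy to check; a small sketch with a throwaway file:

```r
# read.dcf() trims horizontal whitespace around values, so a value with
# no space after the colon parses the same as one with padding.
tf <- tempfile()
writeLines(c("Package:test", "Title:   padded   "), tf)
d <- read.dcf(tf)
d[, "Package"]   # "test"   -- the space after the colon is not required
d[, "Title"]     # "padded" -- surrounding blanks are trimmed
```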



Best regards

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] S3 dispatch for S4 subclasses only works if variable "extends" is accessible from global environment

2016-04-18 Thread Kirill Müller
Scenario: An S3 method is declared for an S4 base class but called for 
an instance of a derived class.


Steps to reproduce:

> Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix <- 
function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())"

Error in UseMethod("test", x) :
  no applicable method for 'test' applied to an object of class "lsyMatrix"
Calls: 
1: MatrixDispatchTest::test(Matrix::Matrix())

> Rscript -e "extends <- 42; test <- function(x) UseMethod('test', x); 
test.Matrix <- function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())"

[1] "Hi"

To me, it looks like a sanity check in line 655 of src/main/attrib.c is 
making wrong assumptions, but there might be other reasons. 
(https://github.com/wch/r-source/blob/780021752eb83a71e2198019acf069ba8741103b/src/main/attrib.c#L655-L656)


Same behavior in R 3.2.4, R 3.2.5 and R-devel r70420.


Best regards

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] S3 dispatch for S4 subclasses only works if variable "extends" is accessible from global environment

2016-04-18 Thread Kirill Müller

Please omit "MatrixDispatchTest::" from the test scripts:

Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix <- 
function(x) 'Hi'; test(Matrix::Matrix())"


Rscript -e "extends <- 42; test <- function(x) UseMethod('test', x); 
test.Matrix <- function(x) 'Hi'; test(Matrix::Matrix())"



-Kirill


On 19.04.2016 01:35, Kirill Müller wrote:
Scenario: An S3 method is declared for an S4 base class but called for 
an instance of a derived class.


Steps to reproduce:

> Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix <- 
function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())"

Error in UseMethod("test", x) :
  no applicable method for 'test' applied to an object of class 
"lsyMatrix"

Calls: 
1: MatrixDispatchTest::test(Matrix::Matrix())

> Rscript -e "extends <- 42; test <- function(x) UseMethod('test', x); 
test.Matrix <- function(x) 'Hi'; 
MatrixDispatchTest::test(Matrix::Matrix())"

[1] "Hi"

To me, it looks like a sanity check in line 655 of src/main/attrib.c 
is making wrong assumptions, but there might be other reasons. 
(https://github.com/wch/r-source/blob/780021752eb83a71e2198019acf069ba8741103b/src/main/attrib.c#L655-L656)


Same behavior in R 3.2.4, R 3.2.5 and R-devel r70420.


Best regards

Kirill



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] S3 dispatch for S4 subclasses only works if variable "extends" is accessible from global environment

2016-04-18 Thread Kirill Müller
Thanks for looking into it, your approach sounds good to me. See also 
R_has_methods_attached() 
(https://github.com/wch/r-source/blob/42ecf5f492a005f5398cbb4c9becd4aa5af9d05c/src/main/objects.c#L258-L265).


I'm fine with Rscript not loading "methods", as long as everything works 
properly with "methods" loaded but not attached.



-Kirill


On 19.04.2016 04:10, Michael Lawrence wrote:

Right, the methods package is not attached by default when running R
with Rscript. We should probably remove that special case, as it
mostly just leads to confusion, but that won't happen immediately.

For now, the S4_extends() should probably throw an error when the
methods namespace is not loaded. And the check should be changed to
directly check whether R_MethodsNamespace has been set to something
other than the default (R_GlobalEnv). Agreed?

On Mon, Apr 18, 2016 at 4:35 PM, Kirill Müller
 wrote:

Scenario: An S3 method is declared for an S4 base class but called for an
instance of a derived class.

Steps to reproduce:


Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix <-
function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())"

Error in UseMethod("test", x) :
   no applicable method for 'test' applied to an object of class "lsyMatrix"
Calls: 
1: MatrixDispatchTest::test(Matrix::Matrix())


Rscript -e "extends <- 42; test <- function(x) UseMethod('test', x);
test.Matrix <- function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())"

[1] "Hi"

To me, it looks like a sanity check in line 655 of src/main/attrib.c is
making wrong assumptions, but there might be other reasons.
(https://github.com/wch/r-source/blob/780021752eb83a71e2198019acf069ba8741103b/src/main/attrib.c#L655-L656)

Same behavior in R 3.2.4, R 3.2.5 and R-devel r70420.


Best regards

Kirill




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Regression in match() in R 3.3.0 when matching strings with different character encodings

2016-05-09 Thread Kirill Müller

Hi


I think the following behavior is a regression from R 3.2.5:

> match(iconv(c("\u00f8", "A"), from = "UTF8", to = "latin1"), "\u00f8")
[1]  1 NA
> match(iconv(c("\u00f8"), from = "UTF8", to = "latin1"), "\u00f8")
[1] NA
> match(iconv(c("\u00f8"), from = "UTF8", to = "latin1"), "\u00f8", incomparables = NA)
[1] 1

I'm seeing this in R 3.3.0 on both Windows and Ubuntu 15.10.

The specific behavior makes me think this is related to the following 
NEWS entry:


match(x, table) is faster (sometimes by an order of magnitude) when x is 
of length one and incomparables is unchanged (PR#16491).
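A workaround while the regression stands is to normalize encodings on both sides before matching; a sketch:

```r
# Convert both arguments to UTF-8 so match() compares equal encodings
# regardless of how the inputs happen to be marked.
x <- iconv(c("\u00f8"), from = "UTF8", to = "latin1")
match(enc2utf8(x), enc2utf8("\u00f8"))   # 1
```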



Best regards

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R process killed when allocating too large matrix (Mac OS X)

2016-05-11 Thread Kirill Müller
My ulimit package exposes this API ([1]; I should finally submit it to 
CRAN); unfortunately this very API seems to be unsupported on OS X 
[2,3]. Last time I looked into it, neither of the documented settings 
achieved the desired effect.



-Kirill


[1] http://krlmlr.github.io/ulimit
[2] 
http://stackoverflow.com/questions/3274385/how-to-limit-memory-of-a-os-x-program-ulimit-v-neither-m-are-working
[3] 
https://developer.apple.com/library/ios/documentation/System/Conceptual/ManPages_iPhoneOS/man2/getrlimit.2.html



On 10.05.2016 01:08, Jeroen Ooms wrote:

On 05/05/2016 10:11, Uwe Ligges wrote:

Actually this also happens under Linux and I had my R processes killed
more than once (and much worse also other processes so that we had to
reboot a server, essentially).

I found that setting RLIMIT_AS [1] works very well on Linux. But this
requires that you cap memory to some fixed value.


library(RAppArmor)
rlimit_as(1e9)
rnorm(1e9)

Error: cannot allocate vector of size 7.5 Gb

The RAppArmor package has many other utilities to protect your server
such from a mis-behaving process such as limiting cpu time
(RLIMIT_CPU), fork bombs (RLIMIT_NPROC) and file sizes (RLIMIT_FSIZE).

[1] http://linux.die.net/man/2/getrlimit



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R process killed when allocating too large matrix (Mac OS X)

2016-05-13 Thread Kirill Müller

On 12.05.2016 09:51, Martin Maechler wrote:

 > My ulimit package exposes this API ([1], should finally submit it to
 > CRAN); unfortunately this very API seems to be unsupported on OS X
 > [2,3]. Last time I looked into it, neither of the documented settings
 > achieved the desired effect.

 > -Kirill

 > [1] http://krlmlr.github.io/ulimit
 > [2]
 > 
http://stackoverflow.com/questions/3274385/how-to-limit-memory-of-a-os-x-program-ulimit-v-neither-m-are-working
 > [3]
 > 
https://developer.apple.com/library/ios/documentation/System/Conceptual/ManPages_iPhoneOS/man2/getrlimit.2.html


...

In an ideal word, some of us,
 from R core, Jeroen, Kyrill, ,
 maintainer("microbenchmark>, ...
would sit together and devise an R function interface (based on
low level platform specific interfaces, specifically for at least
Linux/POSIX-compliant, Mac, and Windows) which would allow
something  like your rlimit(..) calls below.

We'd really need something to work on all platforms ideally,
to be used by R package maintainers
and possibly even better by R itself at startup, setting a
reasonable memory cap - which the user could raise even to +Inf (or lower
even more).

I haven't found a Windows API that allows limiting the address space, 
only one that limits the working set size; it seems likely that this is 
the best we can get on OS X, too, but then my experience with OS X is 
very limited.


mallinfo() is used on Windows and seems to be available on Linux, too, 
but not on OS X.



-Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] withAutoprint({ .... }) ?

2016-09-02 Thread Kirill Müller

On 02.09.2016 14:38, Duncan Murdoch wrote:

On 02/09/2016 7:56 AM, Martin Maechler wrote:

On R-help, with subject
   '[R] source() does not include added code'


Joshua Ulrich 
on Wed, 31 Aug 2016 10:35:01 -0500 writes:


> I have quantstrat installed and it works fine for me. If you're
> asking why the output of t(tradeStats('macross')) isn't being 
printed,

> that's because of what's described in the first paragraph in the
> *Details* section of help("source"):

> Note that running code via ‘source’ differs in a few respects from
> entering it at the R command line.  Since expressions are not
> executed at the top level, auto-printing is not done. So you will
> need to include explicit ‘print’ calls for things you want to be
> printed (and remember that this includes plotting by ‘lattice’,
> FAQ Q7.22).



> So you need:

> print(t(tradeStats('macross')))

> if you want the output printed to the console.

indeed, and "of course" ;-)

As my subject indicates, this is another case, where it would be
very convenient to have a function

   withAutoprint()

so the OP could have (hopefully) have used
   withAutoprint(source(..))
though that would have been equivalent to the already nicely existing

   source(.., print.eval = TRUE)

which works via the  withVisible(.) utility that returns for each
'expression' if it would auto print or not, and then does print (or
not) accordingly.

My own use cases for such a withAutoprint({...})
are demos and examples, sometimes even package tests which I want to 
print:


Assume I have a nice demo / example on a help page/ ...

foo(..)
(z <- bar(..))
summary(z)


where I carefully print some parts (and not others),
and suddenly I find I want to run that part of the demo /
example / test only in some circumstances, e.g., only when
interactive, but not in BATCH, or only if it is me, the package 
maintainer,


if( identical(Sys.getenv("USER"), "maechler") ) {
  foo(..)
  (z <- bar(..))
  summary(z)
  
}

Now all the auto-printing is gone, and

1) I have to find out which of these function calls do autoprint and
   wrap a print(..) around these, and

2) the result is quite ugly (for an example on a help page etc.)

What I would like in a future R is to be able to simply wrap the "{
.. }" above with a withAutoprint(.):

if( identical(Sys.getenv("USER"), "maechler") ) withAutoprint({
  foo(..)
  (z <- bar(..))
  summary(z)
  
})

Conceptually such a function could be written similarly to source(), with
an R-level for loop, treating each expression separately, calling eval(.),
etc. That may cost too much performance, ... still, having it would be
better than not having the possibility.
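Such an R-level loop could be sketched as follows -- a proof of concept
only, not the actual implementation; it uses withVisible() to decide
whether each expression would auto-print at top level, and omits
source()'s error handling and options:

```r
# Proof-of-concept: loop over the expressions of a braced block and
# print each result that would be visible at the top level.
withAutoprint0 <- function(expr, envir = parent.frame()) {
  expr <- substitute(expr)
  exprs <- if (is.call(expr) && identical(expr[[1L]], as.name("{")))
    as.list(expr)[-1L]
  else
    list(expr)
  for (e in exprs) {
    # withVisible() reports whether the result would auto-print
    res <- withVisible(eval(e, envir))
    if (res$visible)
      print(res$value)
  }
  invisible()
}

withAutoprint0({
  x <- 1:3   # assignment: invisible, nothing printed
  x + 1      # visible: printed as "[1] 2 3 4"
})
```

source(.., print.eval = TRUE) works through the same withVisible()
mechanism, evaluating each parsed expression in turn.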



If you read so far, you'd probably agree that such a function
could be a nice asset in R,
notably if it was possible to do this on the fast C level of R's main
REPL.

Have any of you looked into how this could be provided in R ?
If you know the source a little, you will remember that there's
the global variable  R_Visible  which is crucial here.
The problem with that is that it *is* global, and only available
as that; that the auto-printing "concept" is so linked to "toplevel 
context"
and that is not easy, and AFAIK not so much centralized in one place 
in the
source. Consequently, all kinds of (very) low level functions manipulate
R_Visible temporarily, and so a C level implementation of withAutoprint()
may need considerably more changes than just setting R_Visible to TRUE
in one place.

Have any efforts / experiments already happened towards providing such
functionality ?


I don't think the performance cost would matter.  If you're printing 
something, you're already slow.  So doing this at the R level would 
make most sense to me --- that's how Sweave and source and knitr do 
it, so it can't be that bad.


Duncan Murdoch

A C-level implementation would bring the benefit of a lean traceback() 
in case of an error. I suspect eval() could be enhanced to auto-print.


By the same token it would be extremely helpful to have a C-level 
implementation of local() which wouldn't litter the stack trace.



-Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] withAutoprint({ .... }) ?

2016-09-27 Thread Kirill Müller

On 25.09.2016 18:29, Martin Maechler wrote:

I'm now committing my version (including (somewhat incomplete)
documentation, so you (all) can look at it and try / test it further.
Thanks, that's awesome. Is `withAutoprint()` recursive? How about 
calling the new function in `example()` (instead of `source()` as it is 
now) so that examples are always rendered in auto-print mode? That may 
add some extra output to examples (which can be removed easily), but 
would solve the original problem in a painless way.



-Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Upgrading a package to which other packages are LinkingTo

2016-12-16 Thread Kirill Müller

Hi


I'd like to suggest making R more informative when a user updates a 
package A where there's at least one package B that has "LinkingTo: A" 
in its description.


To illustrate the problem, assume package A is updated so that its C/C++ 
header interface (in inst/include) is changed. For package B to pick up 
these changes, we need to reinstall package B. In extreme cases, if B 
also imports A and uses functions from A's shared library, failure to 
reinstall B may lead to all sorts of undefined behavior.


I've stumbled over this recently for A = Rcpp 0.12.8 and B = dplyr 0.5.0 
[1], with a bug fix available in Rcpp 0.12.8.2. Simply upgrading Rcpp to 
0.12.8.2 wasn't enough to propagate the bug fix to dplyr; we need to 
reinstall dplyr 0.5.0 too.


I've prepared an example with R-devel r71799. The initial configuration 
[2] is Rcpp 0.12.8 and dplyr 0.5.0. There is no warning from R after 
upgrading Rcpp to 0.12.8.2 [3], and no warning when loading the (now 
"broken") dplyr 0.5.0 linked against Rcpp 0.12.8 but importing Rcpp 
0.12.8.2 [4].


As a remedy, I'd like to suggest that upgrading Rcpp gives a warning 
about installed packages that are LinkingTo it [3], and that loading 
dplyr gives a warning that it has been built against a different version 
of Rcpp [4], just like the warning when packages are built against a 
different version of R.


Thanks.


Best regards

Kirill


[1] https://github.com/hadley/dplyr/issues/2308#issuecomment-267495075
[2] https://travis-ci.org/krlmlr/pkg.upgrade.test#L589-L593
[3] https://travis-ci.org/krlmlr/pkg.upgrade.test#L619-L645
[4] https://travis-ci.org/krlmlr/pkg.upgrade.test#L671-L703

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Upgrading a package to which other packages are LinkingTo

2016-12-16 Thread Kirill Müller

Thanks for discussing this.

On 16.12.2016 17:19, Dirk Eddelbuettel wrote:

On 16 December 2016 at 11:00, Duncan Murdoch wrote:
| On 16/12/2016 10:40 AM, Dirk Eddelbuettel wrote:
| > On 16 December 2016 at 10:14, Duncan Murdoch wrote:
| > | On 16/12/2016 8:37 AM, Dirk Eddelbuettel wrote:
| > | >
| > | > On 16 December 2016 at 08:20, Duncan Murdoch wrote:
| > | > | Perhaps the solution is to recommend that packages which export their
| > | > | C-level entry points either guarantee them not to change or offer
| > | > | (require?) version checks by user code.  So dplyr should start out by
| > | > | saying "I'm using Rcpp interface 0.12.8".  If Rcpp has a new version
| > | > | with a compatible interface, it replies "that's fine".  If Rcpp has
| > | > | changed its interface, it says "Sorry, I don't support that any more."
Sounds good to me; I was considering something similar. dplyr can simply 
query Rcpp's current version in .onLoad(), compare it to the version at 
installation time, and act accordingly.
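A sketch of such a check -- hypothetical code, not an existing dplyr or
Rcpp mechanism; in a real package the recorded version string would be
generated at install time (e.g. by a configure script), not hard-coded:

```r
# Version of Rcpp that this package was built against; in a real
# package this constant would be generated at install time.
.built_against_rcpp <- "0.12.8"

.onLoad <- function(libname, pkgname) {
  # Compare the Rcpp version available now with the recorded one
  current <- as.character(utils::packageVersion("Rcpp"))
  if (!identical(current, .built_against_rcpp)) {
    warning("Package was built against Rcpp ", .built_against_rcpp,
            ", but Rcpp ", current, " is installed; ",
            "please reinstall this package.", call. = FALSE)
  }
}
```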

| > | >
| > | > We try. But it's hard, and I'd argue, likely impossible.
| > | >
| > | > For example I even added a "frozen" package [1] in the sources / unit 
tests
| > | > to test for just this. In practice you just cannot hit every possible 
access
| > | > point of the (rich, in our case) API so the tests pass too often.
| > | >
| > | > Which is why we relentlessly test against reverse-depends to _at least 
ensure
| > | > buildability_ from our releases.
| >
| > I meant to also add:  "... against a large corpus of other packages."
| > The intent is to empirically answer this.
| >
| > | > As for seamless binary upgrade, I don't think in can work in practice.  
Ask
| > | > Uwe one day we he rebuilds everything every time on Windows. And for 
what it
| > | > is worth, we essentially do the same in Debian.
| > | >
| > | > Sometimes you just need to rebuild.  That may be the price of admission 
for
| > | > using the convenience of rich C++ interfaces.
| > | >
| > |
| > | Okay, so would you say that Kirill's suggestion is not overkill?  Every
| > | time package B uses LinkingTo: A, R should assume it needs to rebuild B
| > | when A is updated?
| >
| > Based on my experience it is a "halting problem" -- i.e. cannot know ex ante.
| >
| > So "every time" would be overkill to me.  Sometimes you know you must
| > recompile (but try to be very prudent with public-facing API).  Many times
| > you do not. It is hard to pin down.
I'd argue that recompiling/reinstalling B is cheap enough and the safest 
option. So unless there is a risk, why not simply do it every time A 
updates? This could be implemented with a (perhaps small) change in R: 
When installing A, treat all packages that have A in both LinkingTo and 
Imports as dependencies that need to be reinstalled.



-Kirill

| >
| > At work we have a bunch of servers with Rcpp and many packages against them
| > (installed system-wide for all users). We _very really_ needs rebuild.

Edit:  "We _very rarely_ need rebuilds" is what was meant there.
  
| So that comes back to my suggestion:  you should provide a way for a

| dependent package to ask if your API has changed.  If you say it hasn't,
| the package is fine.  If you say it has, the package should abort,
| telling the user they need to reinstall it.  (Because it's a hard
| question to answer, you might get it wrong and say it's fine when it's
| not.  But that's easy to fix:  just make a new release that does require

Sure.

We have always increased the higher-order version number when that is needed.

One problem with your proposal is that the testing code may run after the
package load, and in the case where it matters ... that very code may not get
reached because the package didn't load.

Dirk



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] source(), parse(), and foreign UTF-8 characters

2017-05-09 Thread Kirill Müller

Hi


I'm having trouble sourcing or parsing a UTF-8 file that contains 
characters that are not representable in the current locale ("foreign 
characters") on Windows. The source() function stops with an error, the 
parse() function reencodes all foreign characters using the <U+xxxx> 
notation. I have added a reproducible example below the message.


This seems well within the bounds of documented behavior, although the 
documentation to source() could mention that the file can't contain 
foreign characters. Still, I'd prefer if UTF-8 "just worked" in R, and 
I'm willing to invest substantial time to help with that. Before 
starting to write a detailed proposal, I feel that I need a better 
understanding of the problem, and I'm grateful for any feedback you 
might have.


I have looked into character encodings in the context of the dplyr 
package, and I have observed the following behavior:


- Strings are treated preferentially in the native encoding
- Only upon specific request (via translateCharUTF8() or enc2utf8() or 
...), they are translated to UTF-8 and marked as such

- On UTF-8 systems, strings are never marked as UTF-8
- ASCII strings are marked as ASCII internally, but this information 
doesn't seem to be available, e.g., Encoding() returns "unknown" for 
such strings
- Most functions in R are encoding-agnostic: they work the same 
regardless if they receive a native or UTF-8 encoded string if they are 
properly tagged
- One important difference is symbols, which must be in the native 
encoding (and are always converted to native encoding, using <U+xxxx> 
escapes)
- I/O is centered around the native encoding, e.g., writeLines() always 
reencodes to the native encoding

- There is the "bytes" encoding which avoids reencoding.
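Two of these observations can be checked directly at the prompt:

```r
# ASCII strings are flagged as ASCII only internally; Encoding()
# reports "unknown" for them:
Encoding("abc")
#> [1] "unknown"

# enc2utf8() translates on request and marks the result as UTF-8:
Encoding(enc2utf8("Gl\u00fcck"))
#> [1] "UTF-8"
```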

I haven't looked into serialization or plot devices yet.

The conclusion to the "UTF-8 manifesto" [1] suggests "... to use UTF-8 
narrow strings everywhere and convert them back and forth when using 
platform APIs that don’t support UTF-8 ...". (It is written in the 
context of the UTF-16 encoding used internally on Windows, but seems to 
apply just the same here for the native encoding.) I think that Unicode 
support in R could be greatly improved if we follow these guidelines. 
This seems to mean:


- Convert strings to UTF-8 as soon as possible, and mark them as such 
(also on systems where UTF-8 is the native encoding)
- Translate to native only upon specific request, e.g., in calls to API 
functions or perhaps for .C()

- Use UTF-8 for symbols
- Avoid the forced round-trip to the native encoding in I/O functions 
and for parsing (but still read/write native by default)

- Carefully look into serialization and plot devices
- Add helper functions that simplify mundane tasks such as 
reading/writing a UTF-8 encoded file
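For illustration, such helpers might look roughly like this (hypothetical
names, a sketch only: readLines(encoding = "UTF-8") declares the encoding
without translating, and writeBin() bypasses the round trip through the
native encoding that writeLines() would perform):

```r
# Read a file as UTF-8: declare (not translate) the encoding of the
# bytes that were read.
read_utf8 <- function(path) {
  readLines(path, warn = FALSE, encoding = "UTF-8")
}

# Write lines as UTF-8 bytes, bypassing the native-encoding round trip.
write_utf8 <- function(lines, path) {
  txt <- paste0(paste(enc2utf8(lines), collapse = "\n"), "\n")
  writeBin(charToRaw(txt), path)
}

p <- tempfile(fileext = ".txt")
write_utf8("Gl\u00fcck", p)
read_utf8(p)
```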


I'm sure I've missed many potential pitfalls, your input is greatly 
appreciated. Thanks for your attention.


Further resources: A write-up by Prof. Ripley [2], a section in R-ints 
[3], a blog post by Ista Zahn [4], a StackOverflow search [5].



Best regards

Kirill



[1] http://utf8everywhere.org/#conclusions

[2] https://developer.r-project.org/Encodings_and_R.html

[3] 
https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Encodings-for-CHARSXPs


[4] 
http://people.fas.harvard.edu/~izahn/posts/reading-data-with-non-native-encoding-in-r/


[5] 
http://stackoverflow.com/search?tab=votes&q=%5br%5d%20encoding%20windows%20is%3aquestion




# Use one of the following:
id <- "Gl\u00fcck"
id <- "\u5e78\u798f"
id <- "\u0441\u0447\u0430\u0441\u0442\u044c\u0435"
id <- "\ud589\ubcf5"

file_contents <- paste0('"', id, '"')
Encoding(file_contents)
raw_file_contents <- charToRaw(file_contents)

path <- tempfile(fileext = ".R")
writeBin(raw_file_contents, path)
file.size(path)
length(raw_file_contents)

# Escapes the string
parse(text = file_contents)

# Throws an error
print(source(path, encoding = "UTF-8"))

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] source(), parse(), and foreign UTF-8 characters

2017-05-09 Thread Kirill Müller

On 09.05.2017 13:19, Duncan Murdoch wrote:

On 09/05/2017 3:42 AM, Kirill Müller wrote:

Hi


I'm having trouble sourcing or parsing a UTF-8 file that contains
characters that are not representable in the current locale ("foreign
characters") on Windows. The source() function stops with an error, the
parse() function reencodes all foreign characters using the <U+xxxx>
notation. I have added a reproducible example below the message.

This seems well within the bounds of documented behavior, although the
documentation to source() could mention that the file can't contain
foreign characters. Still, I'd prefer if UTF-8 "just worked" in R, and
I'm willing to invest substantial time to help with that. Before
starting to write a detailed proposal, I feel that I need a better
understanding of the problem, and I'm grateful for any feedback you
might have.

I have looked into character encodings in the context of the dplyr
package, and I have observed the following behavior:

- Strings are treated preferentially in the native encoding
- Only upon specific request (via translateCharUTF8() or enc2utf8() or
...), they are translated to UTF-8 and marked as such
- On UTF-8 systems, strings are never marked as UTF-8
- ASCII strings are marked as ASCII internally, but this information
doesn't seem to be available, e.g., Encoding() returns "unknown" for
such strings
- Most functions in R are encoding-agnostic: they work the same
regardless if they receive a native or UTF-8 encoded string if they are
properly tagged
- One important difference is symbols, which must be in the native
encoding (and are always converted to native encoding, using <U+xxxx>
escapes)
- I/O is centered around the native encoding, e.g., writeLines() always
reencodes to the native encoding
- There is the "bytes" encoding which avoids reencoding.

I haven't looked into serialization or plot devices yet.

The conclusion to the "UTF-8 manifesto" [1] suggests "... to use UTF-8
narrow strings everywhere and convert them back and forth when using
platform APIs that don’t support UTF-8 ...". (It is written in the
context of the UTF-16 encoding used internally on Windows, but seems to
apply just the same here for the native encoding.) I think that Unicode
support in R could be greatly improved if we follow these guidelines.
This seems to mean:

- Convert strings to UTF-8 as soon as possible, and mark them as such
(also on systems where UTF-8 is the native encoding)
- Translate to native only upon specific request, e.g., in calls to API
functions or perhaps for .C()
- Use UTF-8 for symbols
- Avoid the forced round-trip to the native encoding in I/O functions
and for parsing (but still read/write native by default)
- Carefully look into serialization and plot devices
- Add helper functions that simplify mundane tasks such as
reading/writing a UTF-8 encoded file


Those are good long term goals, though I think the effort is easier 
than you think.  Rather than attempting to do it all at once, you 
should look for ways to do it gradually and submit self-contained 
patches.  In many cases it doesn't matter if strings are left in the 
local encoding, because the encoding doesn't matter.  The problems 
arise when UTF-8 strings are converted to the local encoding before 
it's necessary, because that's a lossy conversion.  So a simple way to 
proceed is to identify where these conversions occur, and remove them 
one-by-one.
Thanks, Duncan, this looks like a good start indeed. Did you really mean 
to say "the effort is easier than I think"? It would be great if I had 
overestimated the effort, I seldom do. That said, I'd be grateful if you 
could review/integrate/... future patches of mine towards parsing and 
sourcing of UTF-8 files with foreign characters, this problem seems to 
be self-contained (but perhaps not that easy).


I still think symbols should be in UTF-8, and this change might be 
difficult to split into smaller changes, especially if taking into 
account serialization and other potential pitfalls.




Currently I'm working on bug 16098, "Windows doesn't handle high 
Unicode code points".  It doesn't require many changes at all to 
handle input of those characters; all the remaining issues are 
avoiding the problems you identify above.  The origin of the issue is 
the fact that in Windows wchar_t is only 16 bits (not big enough to 
hold all Unicode code points).  As far as I know, Windows has no 
standard type to hold a Unicode code point, most of the run-time 
functions still use the 16 bit wchar_t.

I didn't mention non-BMP characters, they are an important issue as well.



I think once that bug is dealt with, 90+% of the remaining issues 
could be solved by avoiding translateChar on Windows.  This could be 
done by avoiding it everywhere, or by acting as though Windows is 
running in a UTF-8 locale unt

[Rd] Usage of PROTECT_WITH_INDEX in R-exts

2017-06-05 Thread Kirill Müller

Hi


I've noted a minor inconsistency in the documentation: Current R-exts reads

s = PROTECT_WITH_INDEX(eval(OS->R_fcall, OS->R_env), &ipx);

but I believe it has to be

PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env), &ipx);

because PROTECT_WITH_INDEX() returns void.


Best regards

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Usage of PROTECT_WITH_INDEX in R-exts

2017-06-06 Thread Kirill Müller



On 06.06.2017 10:07, Martin Maechler wrote:

Kirill Müller 
 on Mon, 5 Jun 2017 17:30:20 +0200 writes:

 > Hi I've noted a minor inconsistency in the documentation:
 > Current R-exts reads

 > s = PROTECT_WITH_INDEX(eval(OS->R_fcall, OS->R_env), &ipx);

 > but I believe it has to be

 > PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env), &ipx);

 > because PROTECT_WITH_INDEX() returns void.

Yes indeed, thank you Kirill!

note that the same is true for its partner function|macro REPROTECT()

However, as  PROTECT() is used a gazillion times  and
PROTECT_WITH_INDEX() is used about 100 x less, and PROTECT()
*does* return the SEXP,
I do wonder why PROTECT_WITH_INDEX() and REPROTECT() could not
behave the same as PROTECT()
(a view at the source code seems to suggest a change to be trivial).
I assume usual compiler optimization would not create less
efficient code in case the idiom   PROTECT_WITH_INDEX(s = ...)
is used, i.e., in case the return value is not used ?

Maybe this is mainly a matter of taste,  but I find the use of

SEXP s = PROTECT();

quite nice in typical cases where this appears early in a function.
Also for that reason -- but even more for consistency -- it
would also be nice if  PROTECT_WITH_INDEX()  behaved the same.
Thanks, Martin, this sounds reasonable. I've put together a patch for 
review [1], a diff for applying to SVN (via `cat | patch -p1`) would be 
[2]. The code compiles on my system.



-Kirill


[1] https://github.com/krlmlr/r-source/pull/5/files

[2] https://patch-diff.githubusercontent.com/raw/krlmlr/r-source/pull/5.diff




Martin

 > Best regards
 > Kirill


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Usage of PROTECT_WITH_INDEX in R-exts

2017-06-08 Thread Kirill Müller

On 06.06.2017 22:14, Kirill Müller wrote:



On 06.06.2017 10:07, Martin Maechler wrote:

Kirill Müller 
 on Mon, 5 Jun 2017 17:30:20 +0200 writes:

 > Hi I've noted a minor inconsistency in the documentation:
 > Current R-exts reads

 > s = PROTECT_WITH_INDEX(eval(OS->R_fcall, OS->R_env), &ipx);

 > but I believe it has to be

 > PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env), &ipx);

 > because PROTECT_WITH_INDEX() returns void.

Yes indeed, thank you Kirill!

note that the same is true for its partner function|macro REPROTECT()

However, as  PROTECT() is used a gazillion times  and
PROTECT_WITH_INDEX() is used about 100 x less, and PROTECT()
*does* return the SEXP,
I do wonder why PROTECT_WITH_INDEX() and REPROTECT() could not
behave the same as PROTECT()
(a view at the source code seems to suggest a change to be trivial).
I assume usual compiler optimization would not create less
efficient code in case the idiom   PROTECT_WITH_INDEX(s = ...)
is used, i.e., in case the return value is not used ?

Maybe this is mainly a matter of taste,  but I find the use of

SEXP s = PROTECT();

quite nice in typical cases where this appears early in a function.
Also for that reason -- but even more for consistency -- it
would also be nice if  PROTECT_WITH_INDEX()  behaved the same.
Thanks, Martin, this sounds reasonable. I've put together a patch for 
review [1], a diff for applying to SVN (via `cat | patch -p1`) would 
be [2]. The code compiles on my system.



-Kirill


[1] https://github.com/krlmlr/r-source/pull/5/files

[2] 
https://patch-diff.githubusercontent.com/raw/krlmlr/r-source/pull/5.diff


I forgot to mention that this patch applies cleanly to r72768.


-Kirill






Martin

 > Best regards
 > Kirill


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



Re: [Rd] Usage of PROTECT_WITH_INDEX in R-exts

2017-06-09 Thread Kirill Müller

On 09.06.2017 13:23, Martin Maechler wrote:

Kirill Müller 
 on Thu, 8 Jun 2017 12:55:26 +0200 writes:

 > On 06.06.2017 22:14, Kirill Müller wrote:
 >>
 >>
 >> On 06.06.2017 10:07, Martin Maechler wrote:
 >>>>>>>> Kirill Müller  on
 >>>>>>>> Mon, 5 Jun 2017 17:30:20 +0200 writes:
 >>> > Hi I've noted a minor inconsistency in the
 >>> documentation: > Current R-exts reads
 >>>
 >>> > s = PROTECT_WITH_INDEX(eval(OS->R_fcall, OS->R_env),
 >>> &ipx);
 >>>
 >>> > but I believe it has to be
 >>>
 >>> > PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env),
 >>> &ipx);
 >>>
 >>> > because PROTECT_WITH_INDEX() returns void.
 >>>
 >>> Yes indeed, thank you Kirill!
 >>>
 >>> note that the same is true for its partner
 >>> function|macro REPROTECT()
 >>>
 >>> However, as PROTECT() is used a gazillion times and
 >>> PROTECT_WITH_INDEX() is used about 100 x less, and
 >>> PROTECT() *does* return the SEXP, I do wonder why
 >>> PROTECT_WITH_INDEX() and REPROTECT() could not behave
 >>> the same as PROTECT() (a view at the source code seems
 >>> to suggest a change to be trivial).  I assume usual
 >>> compiler optimization would not create less efficient
 >>> code in case the idiom PROTECT_WITH_INDEX(s = ...)  is
 >>> used, i.e., in case the return value is not used ?
 >>>
 >>> Maybe this is mainly a matter of taste, but I find the
 >>> use of
 >>>
 >>> SEXP s = PROTECT();
 >>>
 >>> quite nice in typical cases where this appears early in
 >>> a function.  Also for that reason -- but even more for
 >>> consistency -- it would also be nice if
 >>> PROTECT_WITH_INDEX() behaved the same.
 >> Thanks, Martin, this sounds reasonable. I've put together
 >> a patch for review [1], a diff for applying to SVN (via
 >> `cat | patch -p1`) would be [2]. The code compiles on my
 >> system.
 >>
 >>
 >> -Kirill
 >>
 >>
 >> [1] https://github.com/krlmlr/r-source/pull/5/files
 >>
 >> [2]
 >> https://patch-diff.githubusercontent.com/raw/krlmlr/r-source/pull/5.diff

 > I forgot to mention that this patch applies cleanly to r72768.

Thank you, Kirill.
I've been a bit busy so did not get to reply more quickly.

Just to be clear: I did not ask for a patch but was _asking_ /
requesting comments about the possibility to do that.

In the mean time, within the core team, the opinions were
mixed and costs of the change (recompilations needed, C source level
check tools would need updating / depend on R versions) are
clearly non-zero.

As a consequence, we will fix the documentation, rather than changing the API.
Thanks for looking into this. The patch was more a proof of concept, I 
don't mind throwing it away.



-Kirill

Martin


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Trouble running Rtools31 on Wine

2013-11-15 Thread Kirill Müller
Hi

An attempt to use R and Rtools in Wine fails; see the bug report filed with Wine:

http://bugs.winehq.org/show_bug.cgi?id=34865

The people there say that Rtools uses an outdated Cygwin DLL with a 
custom patch. Is there any chance we can upgrade our Cygwin DLL to a 
supported upstream version? Thanks.


Cheers

Kirill


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Strategies for keeping autogenerated .Rd files out of a Git tree

2013-12-11 Thread Kirill Müller

Hi

Quite a few R packages are now available on GitHub long before they 
appear on CRAN, installation is simple thanks to 
devtools::install_github(). However, it seems to be common practice to 
keep the .Rd files (and NAMESPACE and the Collate section in the 
DESCRIPTION) in the Git tree, and to manually update it, even if they 
are autogenerated from the R code by roxygen2. This requires extra work 
for each update of the documentation and also binds package development 
to a specific version of roxygen2 (because otherwise lots of bogus 
changes can be added by roxygenizing with a different version).


What options are there to generate the .Rd files during build/install? 
In https://github.com/hadley/devtools/issues/43 the issue has been 
discussed, perhaps it can be summarized as follows:


- The devtools package is not the right place to implement 
roxygenize-before-build
- A continuous integration service would be better for that, but 
currently there's nothing that would be easy to use
- Roxygenizing via src/Makefile could work but requires further 
investigation and an installation of Rtools/xcode on Windows/OS X


Especially the last point looks interesting to me, but since this is not 
widely used there must be pitfalls I'm not aware of. The general idea 
would be:


- Place code that builds/updates the .Rd and NAMESPACE files into 
src/Makefile
- Users installing the package from source will require infrastructure 
(Rtools/make)
- For binary packages, the .Rd files are already generated and added to 
the .tar.gz during R CMD build before they are submitted to 
CRAN/WinBuilder, and they are also generated (in theory) by R CMD build 
--binary
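Such a src/Makefile could look roughly like the following -- an untested
sketch, assuming roxygen2 is installed on the machine building the
package; a real Makefile would also have to preserve R's usual
compilation of src/:

```make
# src/Makefile (sketch): regenerate Rd files and NAMESPACE from the
# roxygen2 comments before anything else is built.
all: roxygen

roxygen:
	"$(R_HOME)/bin$(R_ARCH_BIN)/Rscript" -e 'roxygen2::roxygenize("..")'
```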


I'd like to hear your opinion on that. I have also found a thread on 
package development workflow 
(https://stat.ethz.ch/pipermail/r-devel/2011-September/061955.html) but 
there was nothing on un-versioning .Rd files.



Cheers

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Strategies for keeping autogenerated .Rd files out of a Git tree

2013-12-13 Thread Kirill Müller

Gabor

I agree with you. There's Travis CI, and r-travis -- an attempt to 
integrate R package testing with Travis. Pushing back to GitHub is 
possible, but the setup is somewhat difficult. Also, this can be subject 
to race conditions because each push triggers a test run and they can 
happen in parallel even for the same repository. How do you handle branches?


It would be really great to be able to execute custom R code before 
building. Perhaps in a PreBuild: section in DESCRIPTION?



Cheers

Kirill



On 12/12/2013 02:21 AM, Gábor Csárdi wrote:

Hi,

this is maybe mostly a personal preference, but I prefer not to put
generated files in the vc repository. Changes in the generated files,
especially if there is many of them, pollute the diffs and make them
less useful.

If you really want to be able to install the package directly from
github, one solution is to
1. create another repository, that contains the complete generated
package, so that install_github() can install it.
2. set up a CI service, that can download the package from github,
build the package or the generated files (check the package, while it
is at it), and then push the build stuff back to github.
3. set up a hook on github, that invokes the CI after each commit.

I have used this setup in various projects with jenkins-ci and it
works well. Diffs are clean, the package is checked and built
frequently, and people can download it without having to install the
tools that generate the generated files.

The only downside is that you need to install a CI, so you need a
"server" for that. Maybe you can do this with travis-ci, maybe not, I
am not familiar with it that much.

Best,
Gabor

On Wed, Dec 11, 2013 at 7:39 PM, Kirill Müller
 wrote:

Hi

Quite a few R packages are now available on GitHub long before they appear
on CRAN, installation is simple thanks to devtools::install_github().
However, it seems to be common practice to keep the .Rd files (and NAMESPACE
and the Collate section in the DESCRIPTION) in the Git tree, and to manually
update it, even if they are autogenerated from the R code by roxygen2. This
requires extra work for each update of the documentation and also binds
package development to a specific version of roxygen2 (because otherwise
lots of bogus changes can be added by roxygenizing with a different
version).

What options are there to generate the .Rd files during build/install? In
https://github.com/hadley/devtools/issues/43 the issue has been discussed,
perhaps it can be summarized as follows:

- The devtools package is not the right place to implement
roxygenize-before-build
- A continuous integration service would be better for that, but currently
there's nothing that would be easy to use
- Roxygenizing via src/Makefile could work but requires further
investigation and an installation of Rtools/xcode on Windows/OS X

Especially the last point looks interesting to me, but since this is not
widely used there must be pitfalls I'm not aware of. The general idea would
be:

- Place code that builds/updates the .Rd and NAMESPACE files into
src/Makefile
- Users installing the package from source will require infrastructure
(Rtools/make)
- For binary packages, the .Rd files are already generated and added to the
.tar.gz during R CMD build before they are submitted to CRAN/WinBuilder, and
they are also generated (in theory) by R CMD build --binary

I'd like to hear your opinion on that. I have also found a thread on package
development workflow
(https://stat.ethz.ch/pipermail/r-devel/2011-September/061955.html) but
there was nothing on un-versioning .Rd files.


Cheers

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
_
ETH Zürich
Institute for Transport Planning and Systems
HIL F 32.2
Wolfgang-Pauli-Str. 15
8093 Zürich

Phone:   +41 44 633 33 17
Fax: +41 44 633 10 57
Secretariat: +41 44 633 31 05
E-Mail:  kirill.muel...@ivt.baug.ethz.ch



Re: [Rd] Strategies for keeping autogenerated .Rd files out of a Git tree

2013-12-13 Thread Kirill Müller
On 12/13/2013 12:50 PM, Romain Francois wrote:
> Pushing back to github is not so difficult. See e.g 
> http://blog.r-enthusiasts.com/2013/12/04/automated-blogging.html
Thanks for the writeup, I'll try this. Perhaps it's better to push the 
results of `R CMD build`, though.
> You can manage branches easily in travis. You could for example decide 
> to do something different if you are on the master branch ...
That's right. But then no .Rd files are built when I'm on a branch, so I 
can't easily preview the result.

The ideal situation would be:

1. I manage only R source files on GitHub, not Rd files, NAMESPACE nor 
the "Collate" section of DESCRIPTION. Machine-readable instructions on 
how to build those are provided with the package.
2. Anyone can install from GitHub using devtools::install_github(). This 
also should work for branches, forks and pull requests.
3. I can build the package so that the result can be accepted by CRAN.

The crucial point on that list is point 2, the others I can easily solve 
myself.

The way I see it, point 2 can be tackled by extending devtools or 
extending the ways packages are built. Extending devtools seems to be 
the inferior approach, although, to be honest, I'd be fine with that as 
well.


-Kirill

>
> Romain
>
> On 13 Dec 2013, at 12:03, Kirill Müller wrote:
>
>> Gabor
>>
>> I agree with you. There's Travis CI, and r-travis -- an attempt to 
>> integrate R package testing with Travis. Pushing back to GitHub is 
>> possible, but the setup is somewhat difficult. Also, this can be 
>> subject to race conditions because each push triggers a test run and 
>> they can happen in parallel even for the same repository. How do you 
>> handle branches?
>>
>> It would be really great to be able to execute custom R code before 
>> building. Perhaps in a PreBuild: section in DESCRIPTION?
>>
>>
>> Cheers
>>
>> Kirill
>>
>>
>> On 12/12/2013 02:21 AM, Gábor Csárdi wrote:
>>> Hi,
>>>
>>> this is maybe mostly a personal preference, but I prefer not to put
>>> generated files in the vc repository. Changes in the generated files,
>>> especially if there is many of them, pollute the diffs and make them
>>> less useful.
>>>
>>> If you really want to be able to install the package directly from
>>> github, one solution is to
>>> 1. create another repository, that contains the complete generated
>>> package, so that install_github() can install it.
>>> 2. set up a CI service, that can download the package from github,
>>> build the package or the generated files (check the package, while it
>>> is at it), and then push the build stuff back to github.
>>> 3. set up a hook on github, that invokes the CI after each commit.
>>>
>>> I have used this setup in various projects with jenkins-ci and it
>>> works well. Diffs are clean, the package is checked and built
>>> frequently, and people can download it without having to install the
>>> tools that generate the generated files.
>>>
>>> The only downside is that you need to install a CI, so you need a
>>> "server" for that. Maybe you can do this with travis-ci, maybe not, I
>>> am not familiar with it that much.
>>>
>>> Best,
>>> Gabor
>>>
>>> On Wed, Dec 11, 2013 at 7:39 PM, Kirill Müller
>>> wrote:
 Hi

 Quite a few R packages are now available on GitHub long before they 
 appear
 on CRAN, installation is simple thanks to devtools::install_github().
 However, it seems to be common practice to keep the .Rd files (and 
 NAMESPACE
 and the Collate section in the DESCRIPTION) in the Git tree, and to 
 manually
 update it, even if they are autogenerated from the R code by 
 roxygen2. This
 requires extra work for each update of the documentation and also binds
 package development to a specific version of roxygen2 (because 
 otherwise
 lots of bogus changes can be added by roxygenizing with a different
 version).

 What options are there to generate the .Rd files during 
 build/install? In
 https://github.com/hadley/devtools/issues/43 the issue has been 
 discussed,
 perhaps it can be summarized as follows:

 - The devtools package is not the right place to implement
 roxygenize-before-build
 - A continuous integration service would be better for that, but 
 currently
 there's nothing that would be easy to use
 - Roxygenizing via src/Makefile could work but requires further
 investigation and an installation of Rtools/xcode on Windows/OS X

 Especially the last point looks interesting to me, but since this 
 is not
 widely used there must be pitfalls I'm not aware of. The general 
 idea would
 be:

 - Place code that builds/updates the .Rd and NAMESPACE files into
 src/Makefile
 - Users installing the package from source will require infrastructure
 (Rtools/make)
 - For binary packages

Re: [Rd] Strategies for keeping autogenerated .Rd files out of a Git tree

2013-12-13 Thread Kirill Müller

Thanks a lot. This would indeed solve the problem. I'll try mkdist today ;-)

Is the NEWS file parsed before or after mkdist has been executed?

Would you be willing to share the code for the infrastructure, perhaps 
on GitHub?



-Kirill


On 12/13/2013 09:14 PM, Simon Urbanek wrote:

FWIW this is essentially what RForge.net provides. Each GitHub commit triggers a build 
(branches are supported as the branch info is passed in the WebHook) which can be either 
"classic" R CMD build or a custom shell script (hence you can do anything you 
want). The result is a tar ball (which includes the generated files) and that tar ball 
gets published in the R package repository. R CMD check is run as well on the tar ball 
and the results are published.
This way you don't need devtools, users can simply use install.packages() 
without requiring any additional tools.

There are some talks about providing the above as a cloud service, so that 
anyone can run and/or use it.

Cheers,
Simon


On Dec 13, 2013, at 8:51 AM, Kirill Müller wrote:


On 12/13/2013 12:50 PM, Romain Francois wrote:

Pushing back to github is not so difficult. See e.g
http://blog.r-enthusiasts.com/2013/12/04/automated-blogging.html

Thanks for the writeup, I'll try this. Perhaps it's better to push the
results of `R CMD build`, though.

You can manage branches easily in travis. You could for example decide
to do something different if you are on the master branch ...

That's right. But then no .Rd files are built when I'm on a branch, so I
can't easily preview the result.

The ideal situation would be:

1. I manage only R source files on GitHub, not Rd files, NAMESPACE nor
the "Collate" section of DESCRIPTION. Machine-readable instructions on
how to build those are provided with the package.
2. Anyone can install from GitHub using devtools::install_github(). This
also should work for branches, forks and pull requests.
3. I can build the package so that the result can be accepted by CRAN.

The crucial point on that list is point 2, the others I can easily solve
myself.

The way I see it, point 2 can be tackled by extending devtools or
extending the ways packages are built. Extending devtools seems to be
the inferior approach, although, to be honest, I'd be fine with that as
well.


-Kirill


Romain

On 13 Dec 2013, at 12:03, Kirill Müller wrote:


Gabor

I agree with you. There's Travis CI, and r-travis -- an attempt to
integrate R package testing with Travis. Pushing back to GitHub is
possible, but the setup is somewhat difficult. Also, this can be
subject to race conditions because each push triggers a test run and
they can happen in parallel even for the same repository. How do you
handle branches?

It would be really great to be able to execute custom R code before
building. Perhaps in a PreBuild: section in DESCRIPTION?


Cheers

Kirill


On 12/12/2013 02:21 AM, Gábor Csárdi wrote:

Hi,

this is maybe mostly a personal preference, but I prefer not to put
generated files in the vc repository. Changes in the generated files,
especially if there is many of them, pollute the diffs and make them
less useful.

If you really want to be able to install the package directly from
github, one solution is to
1. create another repository, that contains the complete generated
package, so that install_github() can install it.
2. set up a CI service, that can download the package from github,
build the package or the generated files (check the package, while it
is at it), and then push the build stuff back to github.
3. set up a hook on github, that invokes the CI after each commit.

I have used this setup in various projects with jenkins-ci and it
works well. Diffs are clean, the package is checked and built
frequently, and people can download it without having to install the
tools that generate the generated files.

The only downside is that you need to install a CI, so you need a
"server" for that. Maybe you can do this with travis-ci, maybe not, I
am not familiar with it that much.

Best,
Gabor

On Wed, Dec 11, 2013 at 7:39 PM, Kirill Müller
wrote:

Hi

Quite a few R packages are now available on GitHub long before they
appear
on CRAN, installation is simple thanks to devtools::install_github().
However, it seems to be common practice to keep the .Rd files (and
NAMESPACE
and the Collate section in the DESCRIPTION) in the Git tree, and to
manually
update it, even if they are autogenerated from the R code by
roxygen2. This
requires extra work for each update of the documentation and also binds
package development to a specific version of roxygen2 (because
otherwise
lots of bogus changes can be added by roxygenizing with a different
version).

What options are there to generate the .Rd files during
build/install? In
https://github.com/hadley/devtools/issues/43 the issue has been
discussed [...]

Re: [Rd] Strategies for keeping autogenerated .Rd files out of a Git tree

2013-12-13 Thread Kirill Müller

On 12/13/2013 06:09 PM, Brian Diggs wrote:
One downside I can see with this third approach is that by making the 
package documentation generation part of the build process, you must 
then make the package depend/require roxygen (or whatever tools you 
are using to generate documentation). This dependence, though, is just 
to build the package, not to actually use the package. And by pushing 
this dependency onto the end users of the package, you have 
transferred the problem you mentioned ("... and also binds package 
development to a specific version of roxygen2 ...") to the many end 
users rather than the few developers.
That's right. As outlined in another message, roxygen2 would be required 
for building from the "raw" source (hosted on GitHub) but not for 
installing from a source tarball (which would contain the .Rd files). 
Not sure if that's possible, though.




[Rd] Sweave trims console output in "tex" mode

2014-01-02 Thread Kirill Müller

Hi


In the example .Rnw file below, only the newline between c and d is 
visible in the resulting .tex file after running R CMD Sweave. What is 
the reason for this behavior? Newlines are important in LaTeX and should 
be preserved. In particular, this behavior leads to incorrect LaTeX code 
generated when using tikz(console=TRUE) inside a Sweave chunk, as shown 
in the tikzDevice vignette.


A similar question has been left unanswered before: 
https://stat.ethz.ch/pipermail/r-help/2010-June/242019.html . I am well 
aware of knitr, I'm looking for a solution for Sweave.



Cheers

Kirill


\documentclass{article}
\begin{document}
<<results=tex>>=
cat("a\n")
cat("b\n \n")
cat("c\nd")
@
\end{document}



Re: [Rd] Sweave trims console output in "tex" mode

2014-01-02 Thread Kirill Müller

On 01/03/2014 01:45 AM, Duncan Murdoch wrote:
You are running with the strip.white option set to TRUE.  That strips 
blank lines at the beginning and end of each output piece.  Just set 
strip.white=FALSE.
Thanks, the code below works perfectly. I have also found the 
documentation in ?RweaveLatex .


I'm not sure if the default setting is sensible for "results=tex", 
though. Has this changed in the recent past?



-Kirill


\documentclass{article}
\begin{document}
<<results=tex, strip.white=false>>=
cat("a\n")
cat("b\n \n")
cat("c\nd")
@
\end{document}



Re: [Rd] Sweave trims console output in "tex" mode

2014-01-02 Thread Kirill Müller

On 01/03/2014 01:59 AM, Duncan Murdoch wrote:
But results=tex is not the default.  Having defaults for one option 
depend on the setting for another is confusing, so I think the current 
setting is appropriate. 
True. On the other hand, I cannot imagine that "results=tex" is useful 
at all without "strip.white=FALSE". If the strip.white option would 
auto-adjust, things would "just work". Anyway, I'm not a very active 
user of Sweave.



-Kirill



Re: [Rd] Sweave trims console output in "tex" mode

2014-01-03 Thread Kirill Müller

On 01/03/2014 02:34 AM, Duncan Murdoch wrote:

Carriage returns usually don't matter in LaTeX
I'd rather say they do. One is like a space, two or more end a paragraph 
and start a new one. If newlines are stripped away, the meaning of the 
TeX code can change, in some cases dramatically (e.g. if comments are 
written to the TeX code).


Also, I don't understand why the option is called strip.white, at least 
for results=tex. The docs say that "blank lines at the beginning and end 
of output are removed", but the observed behavior is to remove the 
terminating carriage return of the output.



-Kirill



Re: [Rd] Sweave trims console output in "tex" mode

2014-01-03 Thread Kirill Müller
I'm sorry, I didn't mean to be rude. Do you prefer including the entire 
original message when replying? Or perhaps I misunderstood you when you 
wrote:


> Carriage returns usually don't matter in LaTeX, so I didn't even know 
about this option, though I use results=tex quite often. I had to look 
at the source to see where the newlines were going, and saw it there.


Could you please clarify? Thanks.


-Kirill


On 01/03/2014 11:39 AM, Duncan Murdoch wrote:

It's dishonest to quote me out of context.

Duncan Murdoch

On 14-01-03 3:40 AM, Kirill Müller wrote:

On 01/03/2014 02:34 AM, Duncan Murdoch wrote:

Carriage returns usually don't matter in LaTeX

I'd rather say they do. One is like a space, two or more end a paragraph
and start a new one. If newlines are stripped away, the meaning of the
TeX code can change, in some cases dramatically (e.g. if comments are
written to the TeX code).

Also, I don't understand why the option is called strip.white, at least
for results=tex. The docs say that "blank lines at the beginning and end
of output are removed", but the observed behavior is to remove the
terminating carriage return of the output.


-Kirill







Re: [Rd] Sweave trims console output in "tex" mode

2014-01-03 Thread Kirill Müller

On 01/03/2014 01:06 PM, Duncan Murdoch wrote:

On 14-01-03 5:47 AM, Kirill Müller wrote:

I'm sorry, I didn't mean to be rude. Do you prefer including the entire
original message when replying? Or perhaps I misunderstood you when you
wrote:


You don't need to include irrelevant material in your reply, but you 
should include explanatory material when you are arguing about a 
particular claim.  If you aren't sure whether it is relevant or not, 
then you should probably ask for clarification rather than arguing 
with the claim.


Thanks. In the future, I'll quote at least full sentences and everything 
they refer to, to avoid confusion and make sure that context is maintained.


  > Carriage returns usually don't matter in LaTeX, so I didn't even 
know

about this option, though I use results=tex quite often. I had to look
at the source to see where the newlines were going, and saw it there.

Could you please clarify? Thanks.


Single carriage returns are usually equivalent to spaces. Multiple 
carriage returns separate paragraphs, but they are rare in code chunk 
output in my Sweave usage.  I normally put plain text in the LaTeX 
part of the Sweave document.


Indeed, it only makes a difference for code that generates large 
portions of LaTeX (such as tikzDevice).
I have checked my own .Rnw files, and I have used results=tex about 
600 times, but never used strip.white.


I've also looked at the .Rnw files in CRAN packages, and 
strip.white=true and strip.white=all are used there about 140 times, 
but strip.white=false is only used 10 times.  I think only one package 
(SweaveListingUtils) uses strip.white=false in combination with 
results=tex.


So while I agree Martin's "adaptive" option would have been a better 
default than "true", I think it would be more likely to cause trouble 
than to solve it.


I agree, given this data and considering that trimming the terminal 
newline can be considered a feature. Perhaps comments are the only use 
case where the newline is really important. But then I don't see how to 
reliably detect comments, as the catcode for % can be changed, e.g., in 
a verbatim environment. I'll consider printing a \relax after the 
comment in tikzDevice, this should be robust and sufficient.



-Kirill



Re: [Rd] file.exists does not like path names ending in /

2014-01-17 Thread Kirill Müller

On 01/17/2014 02:56 PM, Gabor Grothendieck wrote:

At the moment I am using this to avoid the
problem:

File.exists <- function(x) {
  # On Windows, drop a trailing slash/backslash before testing
  if (.Platform$OS.type == "windows" && grepl("[/\\]$", x)) {
    file.exists(dirname(x))
  } else file.exists(x)
}

but it would be nice if that could be done by file.exists itself.
I think that ignoring a terminal slash/backslash on Windows would do no 
harm: it would improve consistency between platforms, and perhaps nobody 
really relies on the current behavior. It would also shorten the documentation.
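
For comparison, POSIX systems accept a trailing slash on a directory name, which is the cross-platform consistency argued for here. A small shell illustration of the same semantics (the path is made up):

```shell
# A directory path with a trailing slash resolves fine on POSIX.
mkdir -p /tmp/fe_demo
test -d /tmp/fe_demo/ && echo "exists with trailing slash"
rm -r /tmp/fe_demo
```

On a POSIX shell this prints "exists with trailing slash".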



-Kirill



Re: [Rd] file.exists does not like path names ending in /

2014-01-17 Thread Kirill Müller

On 01/17/2014 07:35 PM, William Dunlap wrote:

I think that ignoring a terminal slash/backslash on Windows would do no
>harm:

Windows makes a distinction between "C:" and "C:/": the former is
not a file (or directory) and the latter is.
But, according to the documentation, neither would be currently detected 
by file.exists, while the latter is a directory, as you said, and should 
be detected as such.



-Kirill



[Rd] $new cannot be accessed when running from Rscript and methods package is not loaded

2014-02-10 Thread Kirill Müller

Hi


Accessing the $new method for a class defined in a package fails if the 
methods package is not loaded. I have created a test package with the 
following single code file:


newTest <- function() {
  cl <- get("someClass")
  cl$new
}

someClass <- setRefClass("someClass")

(This is similar to code actually used in the testthat package.)

If methods is not loaded, executing the newTest function fails in the 
following scenarios:


- Package "depends" on methods (scenario "depend")
- Package "imports" methods and imports either the setRefClass function 
(scenario "import-setRefClass") or the whole package (scenario 
"import-methods")


It succeeds if the newTest function calls require(methods) (scenario 
"require").


The script at 
https://raw2.github.com/krlmlr/methodsTest/master/test-all.sh creates an 
empty user library in subdirectory r-lib of the current directory, 
installs devtools, and tests the four scenarios by repeatedly installing 
the corresponding version of the package and trying to execute newTest() 
from Rscript. I have attached the output. The package itself is on 
GitHub: https://github.com/krlmlr/methodsTest , there is a branch for 
each scenario.


Why does it seem to be necessary to load the methods package here?


Best regards

Kirill



Re: [Rd] $new cannot be accessed when running from Rscript and methods package is not loaded

2014-02-10 Thread Kirill Müller

On 02/11/2014 03:22 AM, Peter Meilstrup wrote:

Because "depends" is treated incorrectly (if I may place a value
judgement on it). I had an earlier thread on this, not sure if any
changes have taken place since then:

http://r.789695.n4.nabble.com/Dependencies-of-Imports-not-attached-td4666529.html

Peter


Thanks. Could you please clarify: the thread you mention refers to a 
scenario where a package uses another package that depends on methods. 
The issue I'm describing doesn't have this; there is only a single 
package that tries to use $new and fails.


On that note: A related discussion on R-devel advises depending on 
methods, but this doesn't seem to be enough in this case:


http://r.789695.n4.nabble.com/advise-on-Depends-td4678930.html


-Kirill



[Rd] Detect a terminated pipe

2014-03-14 Thread Kirill Müller

Hi

Is there a way to detect that the process that corresponds to a pipe has 
ended? On my system (Ubuntu 13.04), I see


> p <- pipe("true", "w"); Sys.sleep(1); system("ps -elf | grep true | grep -v grep"); isOpen(p)
[1] TRUE

The "true" process has long ended (as the filtered ps system call emits 
no output), still R believes that the pipe p is open.


Thanks for your input.


Best regards

Kirill



Re: [Rd] Detect a terminated pipe

2014-03-14 Thread Kirill Müller

On 03/14/2014 03:54 PM, Simon Urbanek wrote:

As far as R is concerned, the connection is open. In addition, pipes exist even 
without the process - you can close one end of a pipe and it will still exist 
(that’s what makes pipes useful, actually, because you can choose to close 
arbitrary combination of the R/W ends). Detecting that the other end of the 
pipe has closed is generally done by sending/receiving data to/from the end of 
interest - i.e. reading from a pipe that has closed the write end on the other 
side will yield 0 bytes read. Writing to a pipe that has closed the read end on 
the other side will yield SIGPIPE error (note that for text connections you 
have to call flush() to send the buffer):


> p = pipe("true", "r")
> readLines(p)
character(0)
> close(p)
> p = pipe("true", "w")
> writeLines("", p)
> flush(p)
Error in flush.connection(p) : ignoring SIGPIPE signal
> close(p)
Thanks for your reply. I tried this in an R console and received the 
error, just like you described. Unfortunately, the error is not thrown 
when trying the same in RStudio. Any ideas?



Cheers

Kirill



[Rd] Deep copy of factor levels?

2014-03-17 Thread Kirill Müller

Hi


It seems that selecting an element of a factor will copy its levels 
(Ubuntu 13.04, R 3.0.2). Below is the output of a script that creates a 
factor with 1e4 elements and then calls as.list() on it. The new 
object seems to use more than 700 MB, and inspection of the levels of 
the individual elements of the list suggest that they are distinct objects.


Perhaps some performance gain could be achieved by copying the levels 
"by reference", but I don't know R internals well enough to see if it's 
possible. Is there a particular reason for creating a full copy of the 
factor levels?


This has come up when looking at the performance of rbind.fill (in the 
plyr package) with factors: https://github.com/hadley/plyr/issues/206 .



Best regards

Kirill



> gc()
          used (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells  325977 17.5    1074393  57.4  10049951  536.8
Vcells 4617168 35.3   87439742 667.2 204862160 1563.0
> system.time(x <- factor(seq_len(1e4)))
   user  system elapsed
  0.008   0.000   0.007
> system.time(xx <- as.list(x))
   user  system elapsed
  4.263   0.000   4.322
> gc()
             used  (Mb) gc trigger  (Mb)  max used   (Mb)
Ncells     385991  20.7    1074393  57.4  10049951  536.8
Vcells  104672187 798.6  112367694 857.3 204862160 1563.0
> .Internal(inspect(levels(xx[[1]])))
@387f620 16 STRSXP g1c7 [MARK,NAM(2)] (len=1, tl=0)
  @144da4e8 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "1"
  @144da518 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "2"
  @27d1298 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "3"
  @144da548 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "4"
  @144da578 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "5"
  ...
> .Internal(inspect(levels(xx[[2]])))
@1b38cb90 16 STRSXP g1c7 [MARK,NAM(2)] (len=1, tl=0)
  @144da4e8 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "1"
  @144da518 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "2"
  @27d1298 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "3"
  @144da548 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "4"
  @144da578 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] "5"
  ...



Re: [Rd] Docker versus Vagrant for reproducability - was: The case for freezing CRAN

2014-03-22 Thread Kirill Müller


On 03/22/2014 02:10 PM, Nathaniel Smith wrote:

On 22 Mar 2014 12:38, "Philippe GROSJEAN" wrote:

On 21 Mar 2014, at 20:21, Gábor Csárdi wrote:

In my opinion it is somewhat cumbersome to use this for everyday work,
although good virtualization software definitely helps.

Gabor


Additional info: you access R into the VM from within the host by ssh.

You can enable x11 forwarding there and you also got GUI stuff. It works
like a charm, but there are still some problems on my side when I try to
disconnect and reconnect to the same R process. I can solve this with, say,
screen. However, if any X11 window is displayed while I disconnect, R
crashes immediately on reconnection.

You might find the program 'xpra' useful. It's like screen, but for x11
programs.

-n
I second that. However, by default, xpra and GNU Screen are not aware of 
each other. To connect to xpra from within GNU Screen, you usually need 
to set the DISPLAY environment variable manually. I have described a 
solution that automates this, so that GUI applications "just work" from 
within GNU Screen and also survive a disconnect: 
http://krlmlr.github.io/integrating-xpra-with-screen/ .
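
The manual step that the automation replaces can be sketched in two lines; ":100" is an assumed display number (xpra picks one when you start it, e.g. `xpra start :100`):

```shell
# Inside a GNU Screen session, point X11 clients at the xpra server so
# GUI programs survive a disconnect. :100 is a made-up display number.
export DISPLAY=:100
echo "X11 clients will now connect to $DISPLAY"
```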



-Kirill



[Rd] NOTE when detecting mismatch in output, and codes for NOTEs, WARNINGs and ERRORs

2014-03-26 Thread Kirill Müller

Dear list


It is possible to store expected output for tests and examples. From the 
manual: "If tests has a subdirectory Examples containing a file 
pkg-Ex.Rout.save, this is compared to the output file for running the 
examples when the latter are checked." And, earlier (written in the 
context of test output, but apparently applies here as well): "..., 
these two are compared, with differences being reported but not causing 
an error."


I think a NOTE would be appropriate here, in order to be able to detect 
this by only looking at the summary. Is there a reason for not flagging 
differences here?


The following is slightly related: Some compilers and static code 
analysis tools assign a numeric code to each type of error or warning 
they check for, and print it. Would that be possible to do for the 
anomalies detected by R CMD check? The most significant digit could 
denote the "severity" of the NOTE, WARNING or ERROR. This would further 
simplify (semi-)automated analysis of the output of R CMD check, e.g. in 
the context of automated tests.



Best regards

Kirill



Re: [Rd] NOTE when detecting mismatch in output, and codes for NOTEs, WARNINGs and ERRORs

2014-04-10 Thread Kirill Müller


On 03/26/2014 06:46 PM, Paul Gilbert wrote:



On 03/26/2014 04:58 AM, Kirill Müller wrote:

Dear list


It is possible to store expected output for tests and examples. From the
manual: "If tests has a subdirectory Examples containing a file
pkg-Ex.Rout.save, this is compared to the output file for running the
examples when the latter are checked." And, earlier (written in the
context of test output, but apparently applies here as well): "...,
these two are compared, with differences being reported but not causing
an error."

I think a NOTE would be appropriate here, in order to be able to detect
this by only looking at the summary. Is there a reason for not flagging
differences here?


The problem is that differences occur too often because this is a 
comparison of characters in the output files (a diff). Any output that 
is affected by locale, node name or Internet downloads, time, host, or 
OS, is likely to cause a difference. Also, if you print results to a 
high precision you will get differences on different systems, 
depending on OS, 32 vs 64 bit, numerical libraries, etc. A better test 
strategy when it is numerical results that you want to compare is to 
do a numerical comparison and throw an error if the result is not 
good, something like


  r <- result from your function
  rGood <- known good value
  fuzz <- 1e-12  #tolerance

  if (fuzz < max(abs(r - rGood))) stop('Test xxx failed.')

It is more work to set up, but the maintenance will be less, 
especially when you consider that your tests need to run on different 
OSes on CRAN.


You can also use try() and catch error codes if you want to check those.



Thanks for your input.

To me, this is a different kind of test, for which I'd rather use the 
facilities provided by the testthat package. Imagine a function that 
operates on, say, strings, vectors, or data frames, and that is expected 
to produce completely identical results on all platforms -- here, a 
character-by-character comparison of the output is appropriate, and I'd 
rather see a WARNING or ERROR if something fails.


Perhaps this functionality can be provided by external packages like 
roxygen and testthat: roxygen could create the "good" output (if asked 
for) and set up a testthat test that compares the example run with the 
"good" output. This would duplicate part of the work already done by 
base R; the duplication could be avoided if there was a way to specify 
the severity of a character-level difference between output and expected 
output, perhaps by means of an .Rout.cfg file in DCF format:


OnDifference: mute|note|warning|error
Normalize: [R expression]
Fuzziness: [number of different lines that are tolerated]

On that note: Is there a convenient way to create the .Rout.save files 
in base R? By "convenient" I mean a single function call, not checking 
and manually copying as suggested here: 
https://stat.ethz.ch/pipermail/r-help/2004-November/060310.html .



Cheers

Kirill



[Rd] UTC time zone on Windows

2014-08-06 Thread Kirill Müller

Hi


I'm having trouble running R CMD build and check with the UTC time zone 
setting on Windows Server 2012. I can't seem to get rid of the following 
warning:


  unable to identify current timezone 'C':
please set environment variable 'TZ'

However, setting TZ to either "Europe/London" or "GMT Standard Time" 
didn't help.


It seems to me that the warning originates in registryTZ.c 
(https://github.com/wch/r-source/blob/776708efe6003e36f02587ad47b2e19e2f69/src/extra/tzone/registryTZ.c#L363). 
I have therefore looked at 
HKLM\SYSTEM\CurrentControlSet\Control\TimeZoneInformation, to learn that 
TimeZoneKeyName is set to "UTC". This time zone is not defined in 
TZtable, but is present in this machine's 
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones. (Also, the 
(Also, the text of the warning suggests that only the first character of 
the time zone name makes it into the message -- in the code, a const 
wchar_t* is used for a %s placeholder.)


Below is a link to the log of such a failing run. The first 124 lines 
are registry dumps, output of R CMD * is near the end of the log at 
lines 212 and 224.


https://ci.appveyor.com/project/krlmlr/r-appveyor/build/1.0.36

This happens with R 3.1.1 and R-devel r66309.

Is there a workaround I have missed, short of updating TZtable? How can 
I help updating TZtable? Thanks!



Cheers

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Request to review a patch for rpart

2014-08-13 Thread Kirill Müller

Dear list


For my work, it would be helpful if rpart worked seamlessly with an 
empty model:


library(rpart); rpart(formula=y~0, data=data.frame(y=factor(1:10)))

Currently, an unrelated error (originating from na.rpart) is thrown.
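The failure can be reproduced and captured for inspection like this (the exact error message depends on the rpart version; 4.1-8 at the time of writing):

```r
library(rpart)

# Reproduce the error from the empty model and capture it with try()
fit <- try(rpart(formula = y ~ 0, data = data.frame(y = factor(1:10))),
           silent = TRUE)
if (inherits(fit, "try-error")) {
  cat("rpart failed:", conditionMessage(attr(fit, "condition")), "\n")
}
```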

At some point in the near future, I'd like to release a package to CRAN 
which uses rpart and relies on that functionality. I have prepared a 
patch (minor modifications at three places, and a test) which I'd like 
to propose for inclusion in the next CRAN release of rpart. The patch 
can be reviewed at https://github.com/krlmlr/rpart/tree/empty-model, the 
files (based on the current CRAN release 4.1-8) can be downloaded from 
https://github.com/krlmlr/rpart/archive/empty-model.zip.


Thanks for your attention.


With kindest regards

Kirill Müller

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Request to review a patch for rpart

2014-08-15 Thread Kirill Müller

Gabriel


Thanks for your feedback. Indeed, I was not particularly clear here. The 
empty model is just a very special case in a more general setting. I'd 
have to work around this deficiency in my code -- sure I can do that, 
but I thought a generic solution should be possible. In particular, I'm 
using predict.rpart(..., type = "prob") -- this just reflects the 
observed relative frequencies.
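For illustration, the same "observed relative frequencies" behaviour can be seen in a tree that makes no splits. This is a sketch with made-up data; under the proposed patch, the empty model would behave analogously with only the root node:

```r
library(rpart)

# Made-up data: 3 "a" and 7 "b" observations
d <- data.frame(y = factor(rep(c("a", "b"), c(3, 7))), x = 1:10)

# Forbid splits so that only the root node remains
fit <- rpart(y ~ x, data = d, control = rpart.control(minsplit = 100))

# The root node predicts the observed class frequencies: 0.3 and 0.7
predict(fit, d[1, , drop = FALSE], type = "prob")
```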



Cheers

Kirill


On 08/15/2014 06:44 PM, Gabriel Becker wrote:

Kirill,

Perhaps I'm just being obtuse, but what are you proposing rpart do in 
the case of an empty model?  Return a "tree" that always guesses the 
most common label, or doesn't guess at all (NA)? It doesn't seem like 
you'd need rpart for either of those.


~G


On Wed, Aug 13, 2014 at 3:51 AM, Kirill Müller 
<kirill.muel...@ivt.baug.ethz.ch> wrote:


Dear list


For my work, it would be helpful if rpart worked seamlessly with
an empty model:

library(rpart); rpart(formula=y~0, data=data.frame(y=factor(1:10)))

Currently, an unrelated error (originating from na.rpart) is thrown.

At some point in the near future, I'd like to release a package to
CRAN which uses rpart and relies on that functionality. I have
prepared a patch (minor modifications at three places, and a test)
which I'd like to propose for inclusion in the next CRAN release
of rpart. The patch can be reviewed at
https://github.com/krlmlr/rpart/tree/empty-model, the files (based
on the current CRAN release 4.1-8) can be downloaded from
https://github.com/krlmlr/rpart/archive/empty-model.zip.

    Thanks for your attention.


With kindest regards

Kirill Müller

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




--
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to test impact of candidate changes to package?

2014-09-10 Thread Kirill Müller
If you don't intend to keep the old business logic in the long run, 
perhaps a version control system such as Git can help you. If you use it 
in single-user mode, you can think of it as a backup system where you 
manually create each snapshot and give it a name, but it actually can do 
much more. For your use case, you can open a new *branch* where you 
implement your changes, and implement your testing logic simultaneously 
in both branches (using *merge* operations). The system handles 
switching between branches, so you can really perform invasive changes, 
and revert if you find that a particular change breaks something.


RStudio has Git support, but you probably need to use the shell to 
create a branch. On Windows or OS X the GitHub client helps you to get 
started.



Cheers

Kirill


On 09/10/2014 11:14 AM, Stephanie Locke wrote:

I have unit tests using testthat but these are typically of these types:
1) Check for correct calculation for a single set of valid inputs
2) Check for correct calculation for a larger set of valid inputs
3) Check for errors when providing incorrect inputs
4) Check for known frailties / past issues

This is more for where changes are needed to functions that apply various bits of 
business logic that can change over time, so there is no "one answer". A unit 
test (at least as I understand it) can be worked through to make sure that, given 
inputs, the output is computationally correct. What I'd like to do is assess the 
overall impact of a potential change by running a sample through version 1 of a 
function in a package, then through version 2, and comparing the results.

My difficulty so far is that I'm reluctant to make this change 
invasively by overwriting the relevant files in the R directory and then, 
say, using devtools to load and test it with testthat, as I risk 
producing incorrect states of my package and potentially releasing the wrong 
thing.  My preference would be a non-invasive method.  Currently, where I 
try to do this non-invasively, I source a new version of the function stored 
in a separate directory, but some of the functions dependent on it continue to 
reference the package version of the functions. This means that when I'm 
doing test #2 I have to load lots more functions and hope I've caught them all 
(or do some sort of dependency hunting programmatically).

I may be missing something about testthat, but what I'm doing now seems to be 
nowhere near optimal and I'd love to have a better solution.

Cheers

Stephanie Locke
BI & Credit Risk Analyst

-Original Message-
From: ONKELINX, Thierry [mailto:thierry.onkel...@inbo.be]
Sent: 10 September 2014 09:30
To: Stephanie Locke; r-devel@r-project.org
Subject: RE: How to test impact of candidate changes to package?

Dear Stephanie,

Have a look at the testthat package and the related article in the R Journal.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team 
Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 
25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-Oorspronkelijk bericht-
Van: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] 
Namens Stephanie Locke
Verzonden: woensdag 10 september 2014 9:55
Aan: r-devel@r-project.org
Onderwerp: [Rd] How to test impact of candidate changes to package?

I use a package to contain simple functions that can be handled by unit tests 
for correctness and more complex functions that combine the simple functions 
with business logic.  Where there are proposals to change either the simple 
functions or the business logic, a sample needs to be run before the change and 
then after it to understand the impact of the change.

I do this currently by
1. Using Rmarkdown documents
2. Loading the package as-is
3. Getting my sample
4. Running my sample through the package as-is and outputting a table of results
5. source()ing new copies of functions
6. Running my sample again and outputting a table of results
7. Reloading the package and source()ing different copies of functions as required

I really don't think this is a good way to do this as it risks missing 
downstream dependencies of the functions I'm trying to load into the global 
namespace to test.

Has anyone else had to do this sort of testing before on their packages? How 
did you do it? Am I missing an obvious package / framework that can do this?

Cheers,
Steph

--
Stepha

[Rd] xtabs and NA

2015-02-09 Thread Kirill Müller

Hi


I haven't found a way to produce a tabulation from factor data with NA 
values using xtabs. Please find a minimal example below, it's also on 
R-pubs [1]. Tested with R 3.1.2 and R-devel r67720.


It doesn't seem to be documented explicitly that it's not supported. 
From reading the code [2] it looks like the relevant call to table() 
doesn't set the "useNA" parameter, which I think is necessary to make 
NAs show up in the result.
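The effect of the missing useNA argument can be seen directly by calling table() both ways:

```r
data <- factor(letters[1:4], levels = letters[1:3])  # fourth value becomes NA

table(data)                   # NA is silently dropped, as in xtabs()
table(data, useNA = "ifany")  # the <NA> count shows up
```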


Am I missing anything? If this a bug -- would a patch be welcome? Do we 
need compatibility with the current behavior?


I'm aware of workarounds, I just prefer xtabs() over table() for its 
interface.


Thanks.


Best regards

Kirill


[1] http://rpubs.com/krlmlr/xtabs-NA
[2] 
https://github.com/wch/r-source/blob/780021752eb83a71e2198019acf069ba8741103b/src/library/stats/R/xtabs.R#L60



data <- factor(letters[1:4], levels = letters[1:3])
data
## [1] a    b    c    <NA>
## Levels: a b c
xtabs(~data)
## data
## a b c
## 1 1 1
xtabs(~data, na.action = na.pass)
## data
## a b c
## 1 1 1
xtabs(~data, na.action = na.pass, exclude = numeric())
## data
## a b c
## 1 1 1
xtabs(~data, na.action = na.pass, exclude = NULL)
## data
## a b c
## 1 1 1
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
##  [3] LC_TIME=de_CH.UTF-8LC_COLLATE=en_US.UTF-8
##  [5] LC_MONETARY=de_CH.UTF-8LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=de_CH.UTF-8   LC_NAME=C
##  [9] LC_ADDRESS=C   LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics  grDevices utils datasets  methods base
##
## other attached packages:
## [1] magrittr_1.5ProjectTemplate_0.6-1.0
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.8evaluate_0.5.7  formatR_1.0.3 htmltools_0.2.6
## [5] knitr_1.9.2 rmarkdown_0.5.1 stringr_0.6.2 tools_3.1.2
## [9] ulimit_0.0-2

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] xtabs and NA

2015-02-13 Thread Kirill Müller

On 09.02.2015 16:59, Gabor Grothendieck wrote:

On Mon, Feb 9, 2015 at 8:52 AM, Kirill Müller
 wrote:
Passing table the output of model.frame would still allow the use of a 
formula interface:

mf <- model.frame( ~ data, na.action = na.pass)
do.call("table", c(mf, useNA = "ifany"))

   a    b    c <NA> 
   1    1    1    1 


Fair enough, this qualifies as a workaround, and IMO this is how xtabs 
should handle it internally to allow writing xtabs(~data, na.action = 
na.pass) -- or at least xtabs(~data, na.action = na.pass, exclude = 
NULL) if backward compatibility is desired. Would anyone with write 
access to R's SVN repo care enough about this situation to review a 
patch? Thanks.



-Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] static pdf vignette

2015-02-27 Thread Kirill Müller

Perhaps the R.rsp package by Henrik Bengtsson [1,2] is an option.


Cheers

Kirill


[1] http://cran.r-project.org/web/packages/R.rsp/index.html
[2] https://github.com/HenrikBengtsson/R.rsp


On 27.02.2015 02:44, Wang, Zhu wrote:

Dear all,

In my package I have a computational expensive Rnw file which can't pass R CMD 
check. Therefore I set eval=FALSE in the Rnw file. But I would like to have the 
pdf vignette generated by the Rnw file with eval=TRUE. It seems to me a static 
pdf vignette is an option.  Any suggestions on this?

Thanks,

Zhu Wang


**Connecticut Children's Confidentiality Notice**

This e-mail message, including any attachments, is for...{{dropped:6}}


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] URL checks

2021-01-07 Thread Kirill Müller via R-devel

Hi


The URL checks in R CMD check test all links in the README and vignettes 
for broken or redirected links. In many cases this improves the 
documentation, but I see problems with this approach, which I have 
detailed below.


I'm writing to this mailing list because I think the change needs to 
happen in R's check routines. I propose to introduce an "allow-list" for 
URLs, to reduce the burden on both CRAN and package maintainers.


Comments are greatly appreciated.


Best regards

Kirill


# Problems with the detection of broken/redirected URLs

## 301 should often be 307, how to change?

Many web sites use a 301 redirection code that probably should be a 307. 
For example, https://www.oracle.com and https://www.oracle.com/ both 
redirect to https://www.oracle.com/index.html with a 301. I suspect the 
company still wants oracle.com to be recognized as the primary entry 
point of their web presence (to reserve the right to move the 
redirection to a different location later), I haven't checked with their 
PR department though. If that's true, the redirect probably should be a 
307, which should be fixed by their IT department which I haven't 
contacted yet either.


$ curl -i https://www.oracle.com
HTTP/2 301
server: AkamaiGHost
content-length: 0
location: https://www.oracle.com/index.html
...

## User agent detection

twitter.com responds with a 400 error for requests without a user agent 
string hinting at an accepted browser.


$ curl -i https://twitter.com/
HTTP/2 400
...
...Please switch to a supported browser..

$ curl -s -i https://twitter.com/ -A "Mozilla/5.0 (X11; Ubuntu; Linux 
x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" | head -n 1

HTTP/2 200

# Impact

While the latter problem *could* be fixed by supplying a browser-like 
user agent string, the former problem is virtually unfixable -- so many 
web sites should use 307 instead of 301 but don't. The above list is 
also incomplete -- think of unreliable links, HTTP links, other failure 
modes...


This affects me as a package maintainer, I have the choice to either 
change the links to incorrect versions, or remove them altogether.


I can also choose to explain each broken link to CRAN, but I think this 
subjects the team to undue burden. Submitting a package with NOTEs delays 
the release; for a package which I must release very soon to avoid having 
it pulled from CRAN, I'd rather not risk that -- hence I need to remove 
the link and put it back later.


I'm aware of https://github.com/r-lib/urlchecker, this alleviates the 
problem but ultimately doesn't solve it.


# Proposed solution

## Allow-list

A file inst/URL that lists all URLs where failures are allowed -- 
possibly with a list of the HTTP codes accepted for that link.


Example:

https://oracle.com/ 301
https://twitter.com/drob/status/1224851726068527106 400
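A sketch of how a check routine could consult such a file (the file format and the function names are part of the proposal, not existing R functionality):

```r
# Hypothetical: parse inst/URL into a named list mapping each URL to the
# HTTP status codes for which a failure is tolerated (empty = any failure)
read_url_allowlist <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]
  parts <- strsplit(lines, "[[:space:]]+")
  stats::setNames(
    lapply(parts, function(x) as.integer(x[-1])),
    vapply(parts, `[[`, character(1), 1)
  )
}

url_failure_allowed <- function(url, status, allowlist) {
  codes <- allowlist[[url]]
  !is.null(codes) && (length(codes) == 0 || status %in% codes)
}
```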

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] URL checks

2021-01-07 Thread Kirill Müller via R-devel
One other failure mode: SSL certificates that browsers trust but whose 
issuing CA certificate is not installed on the check machine, e.g. the 
"GEANT Vereniging" certificate from https://relational.fit.cvut.cz/ .



K


On 07.01.21 12:14, Kirill Müller via R-devel wrote:

[...]


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel