[Rd] plot.POSIXct uses wrong x axis (PR#14016)

2009-10-20 Thread karl
Full_Name: Karl Ove Hufthammer
Version: 2.10.0 beta
OS: Windows
Submission from: (NULL) (93.124.134.66)


When plotting a single POSIXct variable, 'plot' uses a nonsensical x axis. Here
is some example code:

set.seed(1)
x=seq(1,1e8,length=100)+round(runif(100)*1e8)
y=as.POSIXct(x,origin="2001-01-01")
plot(y)

The y axis correctly shows appropriate labels (years 2002 to 2006), but the x
axis shows only the single time '59:58' in the lower left corner.

Expected behaviour: The indices should be shown on the x axis, just like for
plot(x), where x is the x variable in the above example code.

Additional notes: While ?plot.POSIXct does not explicitly say that the second
variable ('y') is optional, the help for the generic, ?plot, does, and it seems
reasonable that it should be. Also, plot(POSIXct.variable) does produce a
'correct' plot, except for the labels on the x axis.
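For readers hitting the same issue, a workaround sketch (my addition, not part
of the original report): supplying the index explicitly produces the expected
axes.

```r
# Workaround sketch: pass the index explicitly so the x axis shows
# indices while the y axis keeps the date-time labels.
set.seed(1)
x <- seq(1, 1e8, length = 100) + round(runif(100) * 1e8)
y <- as.POSIXct(x, origin = "2001-01-01")
plot(seq_along(y), y, xlab = "Index")
```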

Output of sessionInfo():

R version 2.10.0 beta (2009-10-17 r50136) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252 
[2] LC_CTYPE=Norwegian-Nynorsk_Norway.1252   
[3] LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C 
[5] LC_TIME=Norwegian-Nynorsk_Norway.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_2.10.0

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] R on Windows crashes when using certain characters in strings in data frames (PR#14125)

2009-12-10 Thread karl
Full_Name: Karl Ove Hufthammer
Version: 2.10.0
OS: Windows XP
Submission from: (NULL) (93.124.134.66)


I have found a rather strange bug in R 2.10.0 on Windows, where the choice of
characters used in a string makes R crash (i.e., Windows shows a dialogue saying
that the application has a problem and must be closed).

I can reproduce the bug on two separate systems running Windows XP, and with
both R 2.10.0 and the latest R 2.10.1 RC.

The following commands trigger the crash for me:

n=1e5
k=10
x=sample(k,n,replace=TRUE)
y=sample(k,n,replace=TRUE)
xy=paste(x,y,sep=" × ")
z=sample(n)
d=data.frame(xy,z)

The last step takes a very long time, and R crashes before it finishes. Note
that if I reduce n, the problem disappears. Also, if I change the × (a
multiplication symbol) to an x (the letter), the problem also disappears (and
the last command takes almost no time to run).

I originally discovered this (or a related?) bug while using 'unique' on a data
frame similar to the 'd' data frame defined above, where R would often, but not
always, crash. 

> sessionInfo()
R version 2.10.0 (2009-10-26) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252 
[2] LC_CTYPE=Norwegian-Nynorsk_Norway.1252   
[3] LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C 
[5] LC_TIME=Norwegian-Nynorsk_Norway.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base



[Rd] segfault on functions with 'source' attribute set to a boolean or a number (PR#10437)

2007-11-19 Thread karl
Full_Name: Karl Ove Hufthammer
Version: 2.6.0
OS: Linux (Fedora 7)
Submission from: (NULL) (129.177.61.84)


When viewing a function that has its 'source' attribute set to a boolean or a
numeric, R crashes with a segfault. (Setting 'source' to a character vector does
not make R crash, however.)

Steps to reproduce:

> attr(lm,"source")=FALSE
> lm

 *** caught segfault ***
address 0x18, cause 'memory not mapped'



Re: [Rd] (PR#10437) segfault on functions with 'source' attribute

2007-11-21 Thread karl
For the record: the reason I used attr(myfun, "source") = FALSE is that I
misread the example 'Tidying R Code' in 'Writing R Extensions', which calls
for attr(myfun, "source") = NULL.

Somehow setting 'source' to FALSE seems more natural to me than 
setting it to NULL.

[EMAIL PROTECTED]:

> I am not sure why you would want to do that, but the C code does assume
> source attributes were put there by R, and changing tests from !isNull to
> isString in a few places will fix that.

-- 
Karl Ove Hufthammer



[Rd] colnames(tapply(...)) (PR#8539)

2006-01-29 Thread karl . thomaseth
I would like to bring to your attention the following error message,
which did not appear in previous versions (a long time ago?).

Thanks for all your effort

Karl

Version 2.2.1 Patched (2006-01-21 r37153)

 > f <- rep(c(1,2),each=5)
 > x <- tapply(f,f,sum)
 > colnames(x)
Error in dn[[2]] : subscript out of bounds
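A note on why this happens (my reading, not from the original report):
tapply() with a single grouping factor returns a one-dimensional array, so
there is no second dimension for colnames() to extract; names() is the
appropriate accessor here.

```r
f <- rep(c(1, 2), each = 5)
x <- tapply(f, f, sum)
dim(x)    # a single dimension of length 2, so no columns exist
names(x)  # "1" "2" -- use names() (or dimnames(x)[[1]]) instead
```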


---
Karl Thomaseth, Ph.D.
Research Director
National Research Council
Institute of Biomedical Engineering ISIB-CNR
Corso Stati Uniti 4
35127 Padova, ITALY
http://www.isib.cnr.it/~karl/
tel.: (+39) 049 8295762,  fax:  (+39) 049 8295763




[Rd] Fwd: warning or error upon type/storage mode coercion?

2010-09-15 Thread Karl Forner
-- Forwarded message --
From: Karl Forner 
Date: Wed, Sep 15, 2010 at 10:14 AM
Subject: Re: [Rd] warning or error upon type/storage mode coercion?
To: Stefan Evert 


I'm a Perl fan, and I really really miss the "use strict" feature. IMHO it's
very error-prone not to have this safety net.

Best,



On Wed, Sep 15, 2010 at 9:54 AM, Stefan Evert wrote:

>
> On 15 Sep 2010, at 03:23, Benjamin Tyner wrote:
>
> > 2. So, assuming the answer to (1) is a resounding "no", does anyone care
> to state an opinion regarding the philosophical or historical rationale for
> why this is the case in R/S, whereas certain other interpreted languages
> offer the option to perform strict type checking? Basically, I'm trying to
> explain to someone from a perl background why the (apparent) lack of a "use
> strict; use warnings;" equivalent is not a hindrance to writing bullet-proof
> R code.
>
> If they're from a Perl background, you might also want to point out to them
> that (base) Perl doesn't do _any_ type checking at all, and converts types
> as needed.  As in ...
>
> $x = "0.0";
> if ($x) ... # true
> if ($x+0) ... # false
>
> AFAIK, that's one of the main complaints that people have about Perl.  "use
> strict" will just make sure that all variables have to be declared before
> they're used, so you can't mess up by mistyping variable names.  Which is
> something I'd very much like to have in R occasionally ...
>
> Best,
> Stefan
>



Re: [Rd] Best way to manage configuration for openMP support

2010-09-15 Thread Karl Forner
Thanks a lot, I have implemented the configure stuff and it works perfectly!
Exactly what I was looking for.

I just added AC_PREREQ([2.62]), because AC_OPENMP is only supported from
that version on, and
 AC_MSG_WARN([NO OpenMP support detected. You should use gcc >= 4.2 !!!])
when no OpenMP support was detected.

Maybe this could be put into the Writing R Extensions manual.

Thanks again,

Karl



[Rd] Possible bug or annoyance with library.dynam.unload()

2010-09-16 Thread Karl Forner
Hello,

I have a package with a namespace. Because I use Roxygen, which overwrites the
NAMESPACE file each time it is run, I use an R/zzz.R file with
.onLoad() and .onUnload() functions to take care of loading and unloading
my shared library.

The problem: if I load my library from a local directory, then the unloading
of the package fails, e.g.:

# loads fine
>library(Foo, lib.loc=".Rcheck")

>unloadNamespace("Foo")
Warning message:
.onUnload failed in unloadNamespace() for 'Foo', details:
  call: library.dynam.unload("Foo", libpath)
  error: shared library 'Foo' was not loaded

# I traced it a little:
>library.dynam.unload("Foo", ".Rcheck/Foo")
Error in library.dynam.unload("Foo", ".Rcheck/Foo") :
  shared library 'Foo' was not loaded

# using an absolute path works
>library.dynam.unload("Foo", "/home/toto/.Rcheck/Foo")


So from what I understand, the problem is either that the relative libpath
is sent to the .onUnload() function instead of the absolute one,
or that library.dynam.unload() should be modified to handle the relative
paths.
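A possible work-around sketch (an untested assumption on my part, using the
hypothetical package name "Foo" from the report): normalize the path inside
.onUnload() before handing it to library.dynam.unload().

```r
# Hypothetical R/zzz.R for a package "Foo": converting libpath to an
# absolute path avoids the relative-libpath failure described above.
.onUnload <- function(libpath) {
  library.dynam.unload("Foo", normalizePath(libpath))
}
```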

Am I missing something? What should I do?

Thanks,


Karl



Re: [Rd] Possible bug or annoyance with library.dynam.unload()

2010-09-21 Thread Karl Forner
Hello,

I got no reply on this issue.
It is not critical and I can think of work-arounds, but it really looks
like a bug to me.
Should I file a bug report instead of posting to this list?

Thanks,

Karl




Re: [Rd] Possible bug or annoyance with library.dynam.unload()

2010-09-22 Thread Karl Forner
Thanks Duncan for your suggestion.

I could not find any package that uses a dynamic library and a namespace
without the useDynLib pragma, so I created a minimal package to demonstrate
the problem. Please find attached a very small package, foo (8.8k).

Steps to reproduce the problem:

* unarchive it ( tar zxvf foo_0.1.tar.gz )
* cd foo
* install it locally ( mkdir local; R CMD INSTALL -l local . )
* R
> library(foo, lib.loc="local/")
>.dynLibs()
# there you should be able to see the foo.so lib, in my case
/x05/people/m160508/workspace/foo/local/foo/libs/foo.so

> unloadNamespace("foo")
.onUnload, libpath= local/foo
Warning message:
.onUnload failed in unloadNamespace() for 'foo', details:
  call: library.dynam.unload("foo", libpath)
  error: shared library 'foo' was not loaded

#The libpath that the .onUnload() gets is "local/foo".

#This fails:
>library.dynam.unload("foo", "local/foo")
Error in library.dynam.unload("foo", "local/foo") :
  shared library 'foo' was not loaded

# but if you use the absolute path it works:
>library.dynam.unload("foo", "/x05/people/m160508/workspace/foo/local/foo")

Karl

On Tue, Sep 21, 2010 at 5:33 PM, Duncan Murdoch wrote:

>  On 21/09/2010 10:38 AM, Karl Forner wrote:
>
>> Hello,
>>
>> I got no reply on this issue.
>> It is not critical and I could think of work-around, but it really looks
>> like a bug to me.
>> Should I file a bug-report instead of posting in this list ?
>>
>
> I'd probably post instructions for a reproducible example first.  Pick some
> CRAN package, tell us what to do with it to trigger the error, and then we
> can see if it's something special about your package or Roxygen or a general
> problem.
>
> Duncan Murdoch
>

foo_0.1.tar.gz
Description: GNU Zip compressed data


Re: [Rd] Possible bug or annoyance with library.dynam.unload()

2010-09-22 Thread Karl Forner
> Your package depends on Rcpp, so I didn't try it in the alpha version of
> 2.12.0

That was a mistake; in fact it no longer depends on Rcpp. You can safely delete
the src/Makevars file.





[Rd] checking user interrupts in C(++) code

2010-09-28 Thread Karl Forner
Hello,

My problem is that I have a C++ extension that can be quite
time-consuming, and I'd like to make it interruptible.
The problem is that if I use the recommended R_CheckUserInterrupt() method, I
have no opportunity to clean up (e.g. free the memory).

I've seen an old thread about this, but I wonder if there's a new and
definitive answer.

I just do not understand why a simple R_CheckUserInterrupt()-like method
returning a boolean could not be used.
Please enlighten me!
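One pattern that has circulated for this problem (a sketch assuming the
standard R C API, not an official answer from this thread): run
R_CheckUserInterrupt() under R_ToplevelExec(), which contains the longjmp and
turns the check into a boolean the C++ code can act on, freeing its resources
before returning to R.

```c
#include <R.h>
#include <Rinternals.h>

/* R_CheckUserInterrupt() longjmps when an interrupt is pending, which
 * would skip any cleanup. R_ToplevelExec() establishes a context that
 * catches the jump, so the caller just gets a boolean back. */
static void check_interrupt_body(void *dummy) {
    R_CheckUserInterrupt();
}

int pending_interrupt(void) {
    /* FALSE from R_ToplevelExec means the body did not complete,
     * i.e. an interrupt was caught. */
    return R_ToplevelExec(check_interrupt_body, NULL) == FALSE;
}
```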

Karl



Re: [Rd] checking user interrupts in C(++) code

2010-09-29 Thread Karl Forner
Hi,

Thanks for your reply,


> There are several ways in which you can make your code respond to interrupts
> properly - which one is suitable depends on your application. Probably the
> most commonly used for interfacing foreign objects is to create an external
> pointer with a finalizer - that makes sure the object is released even if
> you pass it on to R later. For memory allocated within a call you can either
> use R's transient memory allocation (see Salloc) or use the on.exit handler
> to cleanup any objects you allocated manually and left over.
>

Using R's transient memory allocation is not really an option when you use
code, such as a library, that was not developed for R. Moreover, what about C++
and the new operator?

One related question: if the code is interrupted, are C++ local objects
freed? Otherwise it is very complex to free all allocated objects, and it
also depends on where the interruption happens.

Best,

Karl



[Rd] dendrogram plot does not draw long labels ?

2011-01-25 Thread Karl Forner
Hello,

It seems that the plot function for dendrograms does not draw labels when
they are too long.

> hc <- hclust(dist(USArrests), "ave")
> dend1 <- as.dendrogram(hc)
> dend2 <- cut(dend1, h=70)
> dd <- dend2$lower[[1]]
> plot(dd) # first label is drawn
> attr(dd[[1]], "label") <- "aa"
> plot(dd) # first label is NOT drawn

Is this expected ?
Is it possible to force the drawing ?

Thank you,

Karl



Re: [Rd] dendrogram plot does not draw long labels ?

2011-01-25 Thread Karl Forner
Hi Tobias and thank you for your reply,

Using your insight I managed to work around the issue (with some help) by
increasing the "mai" option of par().
For example, a "mai" with the first coordinate (bottom) set to 5 allows
~42 letters to be displayed.

We tried to change the xpd value in the text() call that you mentioned, but
it did not seem to fix the problem.

But I think this is very annoying: the dendrogram plot is meant to be the
common plotting method for all clustering output, and suddenly, if your labels
are just too long, nothing gets displayed, without even a warning.
I suppose the margins should be set dynamically, based on the maximum drawn
label width...

The hclust plot seemed to handle these long labels very nicely, but I need
to display colored labels, and the only way I found to do that is to use
plot.dendrogram.
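The dynamic-margin idea can be sketched like this (my own sketch, assuming the
default vertical labels at the bottom of the plot):

```r
# Measure the longest label and size the bottom margin accordingly.
hc <- hclust(dist(USArrests), "ave")
labs <- labels(as.dendrogram(hc))
plot.new()  # a device must be open before strwidth() can measure text
mai <- par("mai")
mai[1] <- max(strwidth(labs, units = "inches")) + 0.5
op <- par(mai = mai)
plot(as.dendrogram(hc))
par(op)
```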

Best,

Karl

On Tue, Jan 25, 2011 at 12:17 PM, Tobias Verbeke <
tobias.verb...@openanalytics.eu> wrote:

> Hi Karl,
>
>
> On 01/25/2011 11:27 AM, Karl Forner wrote:
>
>  It seems that the plot function for dendrograms does not draw labels when
>> they are too long.
>>
>>  hc<- hclust(dist(USArrests), "ave")
>>> dend1<- as.dendrogram(hc)
>>> dend2<- cut(dend1, h=70)
>>> dd<- dend2$lower[[1]]
>>> plot(dd) # first label is drawn
>>> attr(dd[[1]], "label")<- "aa"
>>> plot(dd) # first label is NOT drawn
>>>
>>
>> Is this expected ?
>>
>
> Reading the code of stats:::plotNode, yes.
>
> Clipping to the figure region is hard-coded.
>
> You can see it is clipping to the figure region as follows:
>
>
> hc <- hclust(dist(USArrests), "ave")
> dend1 <- as.dendrogram(hc)
> dend2 <- cut(dend1, h=70)
> dd <- dend2$lower[[1]]
> op <- par(oma = c(8,4,4,2)+0.1, xpd = NA)
>
> plot(dd) # first label is drawn
> attr(dd[[1]], "label") <- "abcdefghijklmnopqrstuvwxyz"
>
> plot(dd) # first label is NOT drawn
> box(which = "figure")
> par(op)
>
>
>  Is it possible to force the drawing ?
>>
>
> These are (from very quick reading -- not verified)
> the culprit lines in plotNode, I think:
>
> text(xBot, yBot + vln, nodeText, xpd = TRUE, # <- clipping hard-coded
>  cex = lab.cex, col = lab.col, font = lab.font)
>
> Best,
> Tobias
>



[Rd] Possible bug in cut.dendrogram when there are only 2 leaves in the tree ?

2011-01-28 Thread Karl Forner
Hello,

I noticed a behavior of the cut() function that does not seem right. In a
dendrogram with only 2 leaves in one cluster, if you cut()
at a height above this cluster, you end up with 2 cut clusters, one for each
leaf, instead of one.

But it seems to work fine for dendrograms with more than 2 objects.

For instance:

library(stats)
m <- matrix(c(0,0.1,0.1,0),nrow=2, ncol=2)
dd <- as.dendrogram(hclust(as.dist(m)))
#plot(dd)
print(cut(dd, 0.2)) # 2 clusters in $lower

m2 <- matrix(c(0,0.1,0.5,0.1,0,0.5,0.5,0.5,0),nrow=3, ncol=3)
dd <- as.dendrogram(hclust(as.dist(m2)))
print(cut(dd, 0.2)) # here 2 clusters in $lower, as expected

So the question is: is it expected behavior that the whole tree is not
reported in the $lower if it is itself under the threshold ?

Thank you,

Karl FORNER



[Rd] Error in svg() : cairo-based devices are not supported on this build

2011-05-19 Thread Karl Forner
Hello,

Sorry if this is not the right place.

I installed R 2.13.0 on an x86_64 Linux server.
All went fine, but the svg() function yells:
> svg()
Error in svg() : cairo-based devices are not supported on this build

I have the Cairo, cairoDevice, RSvgDevice packages installed, and running.

> Cairo.capabilities()
  png  jpeg  tiff   pdf   svgps   x11   win
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

I tried to google around unsuccessfully. The only thing I noticed in
config.log is:
r_cv_has_pangocairo=no
r_cv_cairo_works=yes
r_cv_has_cairo=yes
#define HAVE_WORKING_CAIRO 1
#define HAVE_CAIRO_PDF 1
#define HAVE_CAIRO_PS 1
#define HAVE_CAIRO_SVG 1


So, what can be wrong?
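One thing worth checking (my suggestion, hedged: svg() relies on R's built-in
cairo support, which is independent of the Cairo package's own capabilities):

```r
# If this reports FALSE, R itself was built without cairo support,
# regardless of what Cairo.capabilities() says about the Cairo package.
capabilities("cairo")
```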

Thank you

Karl



[Rd] Fwd: Error in svg() : cairo-based devices are not supported on this build

2011-06-06 Thread Karl Forner
Check what configure is saying when you build R and config.log. You may be
> simply missing something like pango-dev - Cairo doesn't use pango while R
> does - but it is usually optional (it works on my Mac without pango) so
> there may be more to it - config.log will tell you.
>

I managed to compile it successfully with pango-cairo support by editing the
configure script and adding the pangoxft module to the pkg-config list:
%diff -c configure.bak  configure
*** configure.bak   2011-05-31 16:16:55.0 +0200
--- configure   2011-05-31 16:17:21.0 +0200
***
*** 31313,31319 
  $as_echo "$r_cv_has_pangocairo" >&6; }
if test "x${r_cv_has_pangocairo}" = "xyes"; then
  modlist="pangocairo"
! for module in cairo-xlib cairo-png; do
if "${PKGCONF}" --exists ${module}; then
modlist="${modlist} ${module}"
fi
--- 31313,31319 
  $as_echo "$r_cv_has_pangocairo" >&6; }
if test "x${r_cv_has_pangocairo}" = "xyes"; then
  modlist="pangocairo"
! for module in cairo-xlib cairo-png pangoxft; do
if "${PKGCONF}" --exists ${module}; then
modlist="${modlist} ${module}"
fi


I do not know whether it is an error in the configure script or just a
peculiarity of my installation. All these libs (pango, cairo, gtk, glib)
were installed manually from tarballs.

Best,

Karl



[Rd] mcparallel (parallel:::mcexit) does not call finalizers

2016-06-16 Thread Karl Forner
Hello,

In the context of trying to cover package code that uses parallelized
tests with the covr package, I realized that code executed using
mcparallel() was not covered,
cf https://github.com/jimhester/covr/issues/189#issuecomment-226492623

From my understanding, it seems that the package finalizer set by covr (cf
https://github.com/jimhester/covr/blob/79f7e0434f3d14a48c6fea994b67b9814b34e4e5/R/covr.R#L348)
is not called, because the forked process exits using parallel:::mcexit,
which is a non-standard exit and does not call some of the cleanup code
(e.g. the R_CleanUp function is not called).

I was wondering if a modification of the parallel mcexit could be
considered, to make it call the finalizers, possibly triggered by a
parameter or an option, or if there are solid reasons not to do so.

Regards,
Karl Forner



[Rd] weird dir() behavior with broken symlinks

2016-10-18 Thread Karl Forner
I encountered some very weird behavior of the dir() function that I just
cannot understand.

Reproducible example:

docker run -ti rocker/r-base
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> # setup
> tmp <- tempfile()
> dir.create(tmp)
> setwd(tmp)
> file.symlink('from', 'to')

# First weirdness, the behavior of the recursive argument
> dir()
[1] "to"
> dir(recursive=TRUE)
character(0)

# include.dirs make it work again. The doc states: Should subdirectory
names be included in
# recursive listings?  (They always are in non-recursive ones).
>dir(recursive=TRUE, include.dirs=TRUE)
[1] "to"
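A way to confirm what is happening (my addition; assumes a POSIX platform):
the symlink is dangling, so tests that follow the link report the file as
absent.

```r
setwd(tempdir())
file.symlink("from", "to")  # 'from' does not exist: a dangling symlink
Sys.readlink("to")          # "from" -- 'to' is indeed a symlink
file.exists("to")           # FALSE -- the target is missing, which is
                            # presumably why recursive listings skip it
```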

Best,
Karl



Re: [Rd] weird dir() behavior with broken symlinks

2016-10-18 Thread Karl Forner
Here is another strange behavior, of list.dirs(), that seems related:
docker run -ti rocker/r-base

> setwd(tempdir())
> file.symlink('from', 'to')
[1] TRUE
> list.dirs(recursive=FALSE)
[1] "./to"

> file.symlink('C/non_existing.doc', 'broken.txt')
[1] TRUE
> list.dirs(recursive=FALSE)
[1] "./broken.txt"





[Rd] Bug in order function

2017-09-14 Thread Karl Nordström

Dear R-devel(opers),

I wanted to draw your attention to a small problem with the order 
function in base. According to the documentation, radix sort supports 
different orders for each argument. This breaks when one of the 
arguments is an object.


Please have a look to this stackoverflow question:

https://stackoverflow.com/questions/39737871/r-order-method-on-multiple-columns-gives-error-argument-lengths-differ

It describes the problem well and suggests a solution.

Although it is a niche case, it's a very easy thing to fix :)

Best regards,

Karl Nordström


[Rd] bug in package.skeleton(), and doc typo.

2013-06-04 Thread Karl Forner
Hi all,

I think there's a bug in package.skeleton(), when using the environment
argument:

Example:

env <- new.env()
env$hello  <- function() { print('hello') }
package.skeleton(name='mypkg', environment=env)

==> does not create any source in mypkg/R/*

By the way, package.skeleton(name='mypkg', environment=env, list="hello")
does not work either.

According to the documentation:
>The arguments list, environment, and code_files provide alternative ways
to initialize the package.
> If code_files is supplied, the files so named will be sourced to form the
environment, then used to generate the package skeleton.
>Otherwise list defaults to the non-hidden files in environment (those
whose name does not start with .), but can be supplied to select a subset
of the objects in that environment.

I believe I have found the problem: in the package.skeleton() body, the two
calls to dump():
> dump(internalObjs, file = file.path(code_dir, sprintf("%s-internal.R",
name)))
> dump(item, file = file.path(code_dir, sprintf("%s.R", list0[item])))
should use the extra argument envir=environment.

There's also a typo in the doc:
The sentence:
> Otherwise list defaults to the non-hidden **files** in environment (those
whose name does not start with .)
should be
> Otherwise list defaults to the non-hidden **objects** in environment
(those whose name does not start with .)
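Until this is fixed, a workaround sketch (my addition, based on the documented
code_files path, which sources the given files itself):

```r
# Dump the objects from the environment manually, then let
# package.skeleton() build the skeleton from the dumped file.
env <- new.env()
env$hello <- function() print("hello")
dump("hello", file = "hello.R", envir = env)
package.skeleton(name = "mypkg", code_files = "hello.R")
```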

Best,
Karl Forner



>  sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8  LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8   LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8   LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8  LC_NAME=en_US.UTF-8
 [9] LC_ADDRESS=en_US.UTF-8LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] rj_1.1.3-1

loaded via a namespace (and not attached):
[1] rj.gd_1.1.3-1 tools_3.0.1


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] sys.source() does not provide the parsing info to eval()

2013-06-24 Thread Karl Forner
Hello,

It seems that the parsing information attached to expressions parsed by the
parse() function when keep.source=TRUE is not provided to the eval()
function.

Please consider this code:

path <- tempfile()
code <- '(function() print( str( sys.calls() ) ))()'
writeLines(code, path)
sys.source(path, envir=globalenv(), keep.source=TRUE)

> OUTPUT:
Dotted pair list of 4
 $ : language sys.source(path, envir = globalenv(), keep.source = TRUE)
 $ : language eval(i, envir)
 $ : language eval(expr, envir, enclos)
 $ : language (function() print(str(sys.calls())))()
NULL

then:
eval(parse(text=code))
> OUTPUT:
Dotted pair list of 3
 $ : language eval(parse(text = code))
 $ : language eval(expr, envir, enclos)
 $ : length 1 (function() print(str(sys.calls())))()
  ..- attr(*, "srcref")=Class 'srcref'  atomic [1:8] 1 1 1 42 1 42 1 1
  .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'


As you can see, when using eval() directly, the expression/call has the
parsing information available in the "srcref" attribute, but not when using
sys.source()

Looking at sys.source() implementation, this seems to be caused by this
line:
for (i in exprs) eval(i, envir)

The "srcref" attribute is no longer available when "exprs" is subsetted
with "[[" (which is effectively what the for loop does), as illustrated
by the code below:

ex <- parse( text="1+1; 2+2")

attr(ex, 'srcref')
print(str(ex))
# length 2 expression(1 + 1, 2 + 2)
#  - attr(*, "srcref")=List of 2
#   ..$ :Class 'srcref'  atomic [1:8] 1 1 1 3 1 3 1 1
#   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'

#   ..$ :Class 'srcref'  atomic [1:8] 1 6 1 8 6 8 1 1
#   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'

#  - attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' 
#  - attr(*, "wholeSrcref")=Class 'srcref'  atomic [1:8] 1 0 2 0 0 0 1 2
#   .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'

# NULL

print( str(ex[[1]]))
#  language 1 + 1
# NULL

print( str(ex[1]))
# length 1 expression(1 + 1)
#  - attr(*, "srcref")=List of 1
#   ..$ :Class 'srcref'  atomic [1:8] 1 1 1 3 1 3 1 1
#   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'

# NULL


I suppose that the line "for (i in exprs) eval(i, envir)" could be replaced
by "eval(exprs, envir)" ?
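For illustration, a sketch of the two possible replacements (assumptions about the surrounding sys.source() code, not a tested patch):

```r
## in sys.source(), instead of:  for (i in exprs) eval(i, envir)

# Option 1: evaluate the whole expression object at once,
# so its "srcref" attribute stays attached
eval(exprs, envir)

# Option 2: keep iterating, but subset with "[" rather than "[[",
# since one-element expressions retain their "srcref" attribute
for (i in seq_along(exprs)) eval(exprs[i], envir)
```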

Best,

Karl Forner



P.S
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
...


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Comments requested on "changedFiles" function

2013-09-04 Thread Karl Millar
Hi Duncan,

I think this functionality would be much easier to use and understand if
you split the functionality of taking snapshots and comparing them
into separate functions.  In addition, the 'timestamp' functionality seems
both confusing and brittle to me.  I think it would be better to store file
modification times in the snapshot and use those instead of an external
file.  Maybe:

# Take a snapshot of the files.
takeFileSnapshot(directory, file.info = TRUE, md5sum = FALSE, full.names =
FALSE, recursive = TRUE, ...)

# Take a snapshot using the same options as used for snapshot.
retakeFileSnapshot(snapshot, directory = snapshot$directory) {
   takeFileSnapshot(directory, file.info = snapshot$file.info, md5sum =
snapshot$md5sum, etc)
}

compareFileSnapshots(snapshot1, snapshot2)
- or -
getNewFiles(snapshot1, snapshot2)   # These names are probably too
generic
getDeletedFiles(snapshot1, snapshot2)
getUpdatedFiles(snapshot1, snapshot2)
-or-
setdiff(snapshot1, snapshot2)  # Unclear how this should treat updated files


This approach does have the difficulty that users could attempt to compare
snapshots that were taken with different options and that can't be
compared, but that should be an easy error to detect.
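
For illustration, a rough sketch of what the comparison function could look like, assuming a hypothetical snapshot structure with an 'info' data frame keyed by file name (the names and fields are invented, not part of the proposal above):

```r
compareFileSnapshots <- function(s1, s2) {
  n1 <- rownames(s1$info)
  n2 <- rownames(s2$info)
  common <- intersect(n1, n2)
  # a file counts as updated when any recorded attribute differs
  updated <- common[vapply(common, function(f)
    !identical(s1$info[f, ], s2$info[f, ]), logical(1))]
  list(added   = setdiff(n2, n1),
       deleted = setdiff(n1, n2),
       updated = updated)
}
```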

Karl


On Wed, Sep 4, 2013 at 10:53 AM, Duncan Murdoch wrote:

> In a number of places internal to R, we need to know which files have
> changed (e.g. after building a vignette).  I've just written a general
> purpose function "changedFiles" that I'll probably commit to R-devel.
>  Comments on the design (or bug reports) would be appreciated.
>
> The source for the function and the Rd page for it are inline below.
>
> - changedFiles.R:
> changedFiles <- function(snapshot, timestamp = tempfile("timestamp"),
> file.info = NULL,
>  md5sum = FALSE, full.names = FALSE, ...) {
> dosnapshot <- function(args) {
> fullnames <- do.call(list.files, c(full.names = TRUE, args))
> names <- do.call(list.files, c(full.names = full.names, args))
> if (isTRUE(file.info) || (is.character(file.info) && length(
> file.info))) {
> info <- file.info(fullnames)
> rownames(info) <- names
> if (isTRUE(file.info))
> file.info <- c("size", "isdir", "mode", "mtime")
> } else
> info <- data.frame(row.names=names)
> if (md5sum)
> info <- data.frame(info, md5sum = tools::md5sum(fullnames))
> list(info = info, timestamp = timestamp, file.info = file.info,
>  md5sum = md5sum, full.names = full.names, args = args)
> }
> if (missing(snapshot) || !inherits(snapshot, "changedFilesSnapshot")) {
> if (length(timestamp) == 1)
> file.create(timestamp)
> if (missing(snapshot)) snapshot <- "."
> pre <- dosnapshot(list(path = snapshot, ...))
> pre$pre <- pre$info
> pre$info <- NULL
> pre$wd <- getwd()
> class(pre) <- "changedFilesSnapshot"
> return(pre)
> }
>
> if (missing(timestamp)) timestamp <- snapshot$timestamp
> if (missing(file.info) || isTRUE(file.info)) file.info <- snapshot$
> file.info
> if (identical(file.info, FALSE)) file.info <- NULL
> if (missing(md5sum))md5sum <- snapshot$md5sum
> if (missing(full.names)) full.names <- snapshot$full.names
>
> pre <- snapshot$pre
> savewd <- getwd()
> on.exit(setwd(savewd))
> setwd(snapshot$wd)
>
> args <- snapshot$args
> newargs <- list(...)
> args[names(newargs)] <- newargs
> post <- dosnapshot(args)$info
> prenames <- rownames(pre)
> postnames <- rownames(post)
>
> added <- setdiff(postnames, prenames)
> deleted <- setdiff(prenames, postnames)
> common <- intersect(prenames, postnames)
>
> if (length(file.info)) {
> preinfo <- pre[common, file.info]
> postinfo <- post[common, file.info]
> changes <- preinfo != postinfo
> }
> else changes <- matrix(logical(0), nrow = length(common), ncol = 0,
>dimnames = list(common, character(0)))
> if (length(timestamp))
> changes <- cbind(changes, Newer = file_test("-nt", common,
> timestamp))
> if (md5sum) {
> premd5 <- pre[common, "md5sum"]
> postmd5 <- post[common, "md5sum"]
> changes <- cbind(changes, md5sum = premd5 != postmd5)
> }
> changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = FALSE]
> changed <- 

Re: [Rd] Comments requested on "changedFiles" function

2013-09-05 Thread Karl Millar
Comments inline:


On Wed, Sep 4, 2013 at 6:10 PM, Duncan Murdoch  wrote:
>
> On 13-09-04 8:02 PM, Karl Millar wrote:
>>
>> Hi Duncan,
>>
>> I think this functionality would be much easier to use and understand if
>> you split it up the functionality of taking snapshots and comparing them
>> into separate functions.
>
>
> Yes, that's another possibility.  Some more comment below...
>
>
>
>  In addition, the 'timestamp' functionality
>>
>> seems both confusing and brittle to me.  I think it would be better to
>> store file modification times in the snapshot and use those instead of
>> an external file.  Maybe:
>
>
> You can do that, using file.info = "mtime", but the file.info snapshots are 
> quite a bit slower than using the timestamp file (when looking at a big 
> recursive directory of files).


Sorry, I completely failed to explain what I was thinking here.  There
are a number of issues here, but the biggest one is that you're
implicitly assuming that files that get modified will have mtimes that
come after the timestamp file was created.  This isn't always true,
with the most notable exception being if you download a package from
CRAN and untar it, the mtimes are usually well in the past (at least
with GNU tar on a linux system), so this code won't notice that the
files have changed.

It may be a good idea to store the file sizes as well, which would
help prevent false negatives in the (rare IIRC) cases where the
contents have changed but the mtime values have not.  Since you
already need to call file.info() to get the mtime, this shouldn't
increase the runtime, and the extra memory needed is fairly modest.

>>
>> # Take a snapshot of the files.
>> takeFileSnapshot(directory, file.info = TRUE, md5sum
>>
>> = FALSE, full.names = FALSE, recursive = TRUE, ...)
>>
>> # Take a snapshot using the same options as used for snapshot.
>> retakeFileSnapshot(snapshot, directory = snapshot$directory) {
>> takeFileSnapshot(directory, file.info =
>> snapshot$file.info, md5sum = snapshot$md5sum, etc)
>>
>> }
>>
>> compareFileSnapshots(snapshot1, snapshot2)
>> - or -
>> getNewFiles(snapshat1, snapshot2)   # These names are probably too
>> generic
>> getDeletedFiles(snapshot1, snapshot2)
>> getUpdatedFiles(snapshot1, snapshot2)
>> -or-
>> setdiff(snapshot1, snapshot2)  # Unclear how this should treat updated files
>>
>>
>> This approach does have the difficulty that users could attempt to
>> compare snapshots that were taken with different options and that can't
>> be compared, but that should be an easy error to detect.
>
>
> I don't want to add too many new functions.  The general R style is to have 
> functions that do a lot, rather than have a lot of different functions to 
> achieve different parts of related tasks.  This is better for interactive use 
> (fewer functions to remember, a simpler help system to navigate), though it 
> probably results in less readable code.


This is somewhat more nuanced and not particular to interactive use
IMHO.  Having functions that do a lot is good, _as long as the
semantics are always consistent_.  For example, lm() does a huge
amount and has a wide variety of ways that you can specify your data,
but it basically does the same thing no matter how you use it.  On the
other hand, if you have a function that does different things
depending on how you call it (e.g. reshape()) then it's easy to
remember the function name, but much harder to remember how to call it
correctly, harder to understand the documentation and less readable.

>
> I can see an argument for two functions (a get and a compare), but I don't 
> think there are many cases where doing two gets and comparing the snapshots 
> would be worth the extra runtime.  (It's extra because file.info is only a 
> little faster than list.files, and it would be unavoidable to call both twice 
> in that version.  Using the timestamp file avoids one of those calls, and 
> replaces the other with file_test, which takes a similar amount of time.  So 
> overall it's about 20-25% faster.)  It also makes the code a bit more 
> complicated, i.e. three calls (get, get, compare) instead of two (get, 
> compare).


I think a 'snapshotDirectory' and 'compareDirectoryToSnapshot'
combination might work well.

Thanks,

Karl

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Comments requested on "changedFiles" function

2013-09-06 Thread Karl Millar
Hi Duncan,

I like the interface of this version a lot better, but there's still a
bunch of implementation details that need fixing:

* As previously mentioned, there are important cases where the mtime
values change in ways that this code doesn't detect.
* If the timestamp file (which is usually in the temp directory) gets
deleted (which can happen after a moderate amount of time of
inactivity on some systems), then the file_test('-nt', ...) will
always return false, even if the file has changed.
* If files get added or deleted between the two calls to list.files in
fileSnapshot, it will fail with an error.
* If the path is on a remote file system, tempdir is local, and
there's significant clock skew, then you can get incorrect results.

Unfortunately, these aren't just theoretical scenarios -- I've had the
misfortune to run up against all of them in the past.

I've attached code that's loosely based on your implementation that
solves these problems AFAICT.  Alternatively, Hadley's code handles
all of these correctly, with the exception that compare_state doesn't
handle the case where safe_digest returns NA very well.

Regards,

Karl

On Fri, Sep 6, 2013 at 5:40 PM, Duncan Murdoch  wrote:
> On 13-09-06 7:40 PM, Scott Kostyshak wrote:
>>
>> On Fri, Sep 6, 2013 at 3:46 PM, Duncan Murdoch 
>> wrote:
>>>
>>> On 06/09/2013 2:20 PM, Duncan Murdoch wrote:
>>>>
>>>>
>>>> I have now put the code into a temporary package for testing; if anyone
>>>> is interested, for a few days it will be downloadable from
>>>>
>>>> fisher.stats.uwo.ca/faculty/murdoch/temp/testpkg_1.0.tar.gz
>>>
>>>
>>>
>>> Sorry, error in the URL.  It should be
>>>
>>> http://www.stats.uwo.ca/faculty/murdoch/temp/testpkg_1.0.tar.gz
>>
>>
>> Works well. A couple of things I noticed:
>>
>> (1)
>> md5sum is being called on directories, which causes warnings. (If this
>> is not viewed as undesirable, please ignore the rest of this comment.)
>> Should this be the responsibility of the user (by passing arguments to
>> list.files)? In the example, changing
>> fileSnapshot(dir, file.info=TRUE, md5sum=TRUE)
>> to
>> fileSnapshot(dir, file.info=TRUE, md5sum=TRUE, include.dirs=FALSE,
>> recursive=TRUE)
>>
>> gets rid of the warnings. But perhaps the user just wants to exclude
>> directories for the md5sum calculations. This can't be controlled from
>> fileSnapshot.
>
>
> I don't see the warnings, I just get NA values.  I'll try to see why there's
> a difference.  (One possibility is my platform (Windows); another is that
> I'm generally testing in R-patched and R-devel rather than the 3.0.1 release
> version.)  I would rather suppress the warnings than make the user avoid
> them.
>
>
>>
>> Or, should the "if (md5sum)" chunk subset "fullnames" using file_test
>> or file.info to exclude directories (and then fill in the directories
>> with NA)?
>>
>> (2)
>> If I run example(changedFiles) several times, sometimes I get:
>>
>> chngdF> changedFiles(snapshot)
>> File changes:
>>mtime md5sum
>> file2  TRUE   TRUE
>>
>> and other times I get:
>>
>> chngdF> changedFiles(snapshot)
>> File changes:
>>md5sum
>> file2   TRUE
>>
>> I wonder why.
>
>
> Sometimes the example runs so quickly that the new version has exactly the
> same modification time as the original.  That's the risk of the mtime check.
> If you put a delay between, you'll get consistent results.
>
> Duncan Murdoch
>
>
>>
>> Scott
>>
>>> sessionInfo()
>>
>> R Under development (unstable) (2013-08-31 r63780)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>   [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods   base
>>
>> other attached packages:
>> [1] testpkg_1.0
>>
>> loaded via a namespace (and not attached):
>> [1] tools_3.1.0
>>>
>>>
>>
>>
>> --
>> Scott Kostyshak
>> Economics PhD Candidate
>> Princeton University
>>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Comments requested on "changedFiles" function

2013-09-06 Thread Karl Millar
On Fri, Sep 6, 2013 at 7:03 PM, Duncan Murdoch  wrote:
> On 13-09-06 9:21 PM, Karl Millar wrote:
>>
>> Hi Duncan,
>>
>> I like the interface of this version a lot better, but there's still a
>> bunch of implementation details that need fixing:
>>
>> * As previously mentioned, there are important cases where the mtime
>> values change in ways that this code doesn't detect.
>> * If the timestamp file (which is usually in the temp directory) gets
>> deleted (which can happen after a moderate amount of time of
>> inactivity on some systems), then the file_test('-nt', ...) will
>> always return false, even if the file has changed.
>
>
> If that happened without user intervention, I think it would break other
> things in R -- the temp directory is supposed to last for the whole session.
> But I should be checking anyway.

Yes, it does break other things in R -- my experience has been that
the help system seems to be the one that is impacted the most by this.
 FWIW, I've never seen the entire R temp directory deleted, just
individual files and subdirectories in it, but even that probably
depends on how the machine is configured.  I suspect only a few users
ever notice this, but my R use is probably somewhat anomalous and I
think it only happens to R sessions that I haven't used for a few
days.

>> * If files get added or deleted between the two calls to list.files in
>> fileSnapshot, it will fail with an error.
>
>
> Yours won't work if path contains more than one directory.  This is probably
> a reasonable restriction, but it's inconsistent with list.files, so I'd like
> to avoid it if I can find a way.

I'm currently unsure what the behaviour when comparing snapshots with
multiple directories should be.

Presumably we should have the property that (horribly abusing notation
for succinctness):
  compareSnapshots(c(a1, a2),  c(a1, a2))
is the same as concatenating (in some form)
  compareSnapshots(a1, a1) and compareSnapshots(a2, a2)
and there's a bunch of ways we could concatenate -- we could return a
list of results, or a single result where each of the 'added, deleted,
modified' fields are a list, or where we concatenate the 'added,
deleted, modified' fields together into three simple vectors.
Concatenating the vectors together like this is appealing, but unless
you're using the full names, it doesn't include the information of
which directory the changes are in, and using the full names doesn't
work in the case where you're comparing different sets of directories,
e.g. compareSnapshots(c(a1, a2), c(b1, b2)), where there is no
sensible choice for a full name.  The list options don't have this
problem, but are harder to work with, particularly for the common case
where there's only a single directory.  You'd also have to be somewhat
careful with filenames that occur in both directories.

Maybe I'm just being dense, but I don't see a way to do this that's
clear, easy to use, and wouldn't confuse users at the moment.

Karl

> Duncan Murdoch
>
>
>> * If the path is on a remote file system, tempdir is local, and
>> there's significant clock skew, then you can get incorrect results.
>>
>> Unfortunately, these aren't just theoretical scenarios -- I've had the
>> misfortune to run up against all of them in the past.
>>
>> I've attached code that's loosely based on your implementation that
>> solves these problems AFAICT.  Alternatively, Hadley's code handles
>> all of these correctly, with the exception that compare_state doesn't
>> handle the case where safe_digest returns NA very well.
>>
>> Regards,
>>
>> Karl
>>
>> On Fri, Sep 6, 2013 at 5:40 PM, Duncan Murdoch 
>> wrote:
>>>
>>> On 13-09-06 7:40 PM, Scott Kostyshak wrote:
>>>>
>>>>
>>>> On Fri, Sep 6, 2013 at 3:46 PM, Duncan Murdoch
>>>> 
>>>> wrote:
>>>>>
>>>>>
>>>>> On 06/09/2013 2:20 PM, Duncan Murdoch wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> I have now put the code into a temporary package for testing; if
>>>>>> anyone
>>>>>> is interested, for a few days it will be downloadable from
>>>>>>
>>>>>> fisher.stats.uwo.ca/faculty/murdoch/temp/testpkg_1.0.tar.gz
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Sorry, error in the URL.  It should be
>>>>>
>>>>> http://www.stats.uwo.ca/faculty/murdoch/temp/testpkg_1.0.tar.gz
>>>

Re: [Rd] Using long long types in C++

2013-09-19 Thread Karl Millar
Romain,

Can you use int64_t and uint64_t instead?  IMHO that would be more useful
than long long anyway.

Karl
On Sep 19, 2013 5:33 PM, "Patrick Welche"  wrote:

> On Fri, Sep 20, 2013 at 12:51:52AM +0200, rom...@r-enthusiasts.com wrote:
> > In Rcpp we'd like to do something useful for types such as long long
> > and unsigned long long.
> ...
> > But apparently this is still not enough and on some versions of gcc
> > (e.g. 4.7 something), -pedantic still generates the warnings unless
> > we also use -Wno-long-long
>
> Can you also add -std=c++0x or is that considered as bad as adding
> -Wno-long-long?
>
> (and why not use autoconf's AC_TYPE_LONG_LONG_INT and
> AC_TYPE_UNSIGNED_LONG_LONG_INT for the tests?)
>
> Cheers,
>
> Patrick
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Possible problem with namespaceImportFrom() and methods for generic primitive functions

2013-10-18 Thread Karl Forner
Hi all,

I have a problem with a package that imports two other packages which both
export a method for the `[` primitive function.

I set up a reproducible example here:
https://github.com/kforner/namespaceImportFrom_problem.git

Basically, the testPrimitiveImport package imports testPrimitiveExport1 and
testPrimitiveExport2, which both export an S4 class and a `[` method for the
class.
Then:
R CMD INSTALL -l lib testPrimitiveExport1
R CMD INSTALL -l lib testPrimitiveExport2

The command:
R CMD INSTALL -l lib testPrimitiveImport

gives me:
Error in namespaceImportFrom(self, asNamespace(ns)) :
  trying to get slot "package" from an object of a basic class ("function")
with no slots

I get the same message if I check the package (with R CMD check), or even
if I try to load it using devtools::load_all()


I tried to investigate the problem, and I found that the error arises in
the base::namespaceImportFrom() function, and more precisely in
this block:
for (n in impnames) if (exists(n, envir = impenv, inherits = FALSE)) {
if (.isMethodsDispatchOn() && methods:::isGeneric(n,  ns)) {
genNs <- get(n, envir = ns)@package

Here n is '[', and the get(n, envir = ns) expression returns
.Primitive("["), which is a function and has no @package slot.

This will only occur if exists(n, envir = impenv, inherits = FALSE) returns
TRUE, i.e. if the '[' symbol is already in the imports env of the package.
In my case, the first call to namespaceImportFrom() is for the first import
of testPrimitiveExport1, which runs fine and populate the imports env with
'['.
But for the second call, exists(n, envir = impenv, inherits = FALSE) will
be TRUE, so that the offending line will be called.
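
The failing expression can be reproduced in isolation (a sketch):

```r
# "[" resolves to the primitive, which is a plain function with no
# S4 slots, so the slot access fails just as in the INSTALL error above
f <- get("[", envir = baseenv())
is.primitive(f)   # TRUE
try(f@package)    # error: no slot "package" in a basic class object
```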


I do not know if the problem is on my side, e.g. from a misconfiguration of
the NAMESPACE file, or if it is a bug and in which case what should be done.

Any feedback appreciated.

Karl Forner


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] unloadNamespace, getPackageName and "Created a package name xxx " warning

2013-10-29 Thread Karl Forner
Dear all,

Consider this code:
>library("data.table")
>unloadNamespace('data.table')

It produces some warnings
Warning in FUN(X[[1L]], ...) :
  Created a package name, ‘2013-10-29 17:05:51’, when none found
Warning in FUN(X[[1L]], ...) :
  Created a package name, ‘2013-10-29 17:05:51’, when none found
...

The warning is produced by the getPackageName() function.
e.g.
getPackageName(parent.env(getNamespace('data.table')))

I was wondering what could be done to get rid of these warnings, which I
believe in the case "unloadNamespace" case are irrelevant.

The stack of calls is:
# where 3: sapply(where, getPackageName)
# where 4: findClass(what, classWhere)
# where 5: .removeSuperclassBackRefs(cl, cldef, searchWhere)
# where 6: methods:::cacheMetaData(ns, FALSE, ns)
# where 7: unloadNamespace(pkgname)

So for instance:
>findClass('data.frame', getNamespace('data.table'))
generates a warning which once again seems irrelevant.

On the top of my head, I could imagine adding an extra argument to
getPackageName, say warning = TRUE, which would be set to FALSE in the
getPackageName call in findClass() body.

I also wonder whether, in the case of import namespaces, getPackageName()
could find a more appropriate name:
>parent.env(getNamespace('data.table'))

attr(,"name")
[1] "imports:data.table"

This namespace has a name that might be used to generate the package name.

My question is: what should be done ?

Thanks for your attention.

Karl Forner


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] problem using rJava with parallel::mclapply

2013-11-11 Thread Karl Forner
Dear all,

I got an issue trying to parse excel files in parallel using XLConnect, the
process hangs forever.
Martin Studer, the maintainer of XLConnect kindly investigated the issue,
identified rJava as a possible cause of the problem:

This does not work (hangs):
library(parallel)
require(rJava)
.jinit()
res <- mclapply(1:2, function(i) {
  J("java.lang.Runtime")$getRuntime()$gc()
  1
  }, mc.cores = 2)

but this works:
library(parallel)
res <- mclapply(1:2, function(i) {
  require(rJava)
  .jinit()
  J("java.lang.Runtime")$getRuntime()$gc()
  1
}, mc.cores = 2)

To cite Martin, it seems to work with mclapply when the JVM process is
initialized after forking.

Is this a bug or a limitation of rJava ?
Or is there a good practice for rJava clients to avoid this problem ?

Best,
Karl

P.S.
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.1


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] problem using rJava with parallel::mclapply

2013-11-11 Thread Karl Forner
Thanks Malcolm,

But it does not seem to solve the problem.




On Mon, Nov 11, 2013 at 6:48 PM, Cook, Malcolm  wrote:

> Karl,
>
> I have the following notes to self that may be pertinent:
>
> options(java.parameters=
>  ## Must precede `library(XLConnect)` in order to prevent "Java
>  ## requested System.exit(130), closing R." which happens when
>  ## rJava quits R upon trapping INT (control-c), as is done by
>  ## XLConnect (and playwith?), below. (c.f.:
>  ## https://www.rforge.net/bugzilla/show_bug.cgi?id=237)
>  "-Xrs")
>
>
> ~Malcolm
>
>
>
>  >-Original Message-
>  >From: r-devel-boun...@r-project.org [mailto:
> r-devel-boun...@r-project.org] On Behalf Of Karl Forner
>  >Sent: Monday, November 11, 2013 11:41 AM
>  >To: r-devel@r-project.org
>  >Cc: Martin Studer
>  >Subject: [Rd] problem using rJava with parallel::mclapply
>  >
>  >Dear all,
>  >
>  >I got an issue trying to parse excel files in parallel using XLConnect,
> the
>  >process hangs forever.
>  >Martin Studer, the maintainer of XLConnect kindly investigated the issue,
>  >identified rJava as a possible cause of the problem:
>  >
>  >This does not work (hangs):
>  >library(parallel)
>  >require(rJava)
>  >.jinit()
>  >res <- mclapply(1:2, function(i) {
>  >  J("java.lang.Runtime")$getRuntime()$gc()
>  >  1
>  >  }, mc.cores = 2)
>  >
>  >but this works:
>  >library(parallel)
>  >res <- mclapply(1:2, function(i) {
>  >  require(rJava)
>  >  .jinit()
>  >  J("java.lang.Runtime")$getRuntime()$gc()
>  >  1
>  >}, mc.cores = 2)
>  >
>  >To cite Martin, it seems to work with mclapply when the JVM process is
>  >initialized after forking.
>  >
>  >Is this a bug or a limitation of rJava ?
>  >Or is there a good practice for rJava clients to avoid this problem ?
>  >
>  >Best,
>  >Karl
>  >
>  >P.S.
>  >> sessionInfo()
>  >R version 3.0.1 (2013-05-16)
>  >Platform: x86_64-unknown-linux-gnu (64-bit)
>  >
>  >locale:
>  > [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  > [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>  > [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>  > [7] LC_PAPER=C LC_NAME=C
>  > [9] LC_ADDRESS=C   LC_TELEPHONE=C
>  >[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>  >
>  >attached base packages:
>  >[1] stats graphics  grDevices utils datasets  methods   base
>  >
>  >loaded via a namespace (and not attached):
>  >[1] tools_3.0.1
>  >
>  >
>  >__
>  >R-devel@r-project.org mailing list
>  >https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] How to catch warnings sent by arguments of s4 methods ?

2013-11-29 Thread Karl Forner
Hello,

I apologize if this has already been addressed, and I also submitted
this problem on SO:
http://stackoverflow.com/questions/20268021/how-to-catch-warnings-sent-during-s4-method-selection

Example code:
setGeneric('my_method', function(x) standardGeneric('my_method') )
setMethod('my_method', 'ANY', function(x) invisible())

withCallingHandlers(my_method(warning('argh')), warning = function(w)
{ stop('got warning:', w) })
# this does not catch the warning

It seems that warnings emitted during the evaluation of the
arguments of S4 methods cannot be caught using
withCallingHandlers().

Is this expected ? Is there a work-around ?

Best,
Karl Forner

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to catch warnings sent by arguments of s4 methods ?

2013-12-02 Thread Karl Forner
Hi,
Just to add some information and to clarify why I feel this is an
important issue.

If you have a S4 method with a default argument, it seems that you can
not catch the warnings
emitted during their evaluation. It matters because on some occasions
those warnings carry an essential information,
that your code needs to use.

Martin Morgan added some information about this issue on:
http://stackoverflow.com/questions/20268021/how-to-catch-warnings-sent-during-s4-method-selection
Basically, the C function R_dispatchGeneric uses R_tryEvalSilent to
evaluate the method arguments, which does not appear to invoke the
calling handlers.
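
One workaround that appears to behave as expected (a sketch, using the example from the first message): force the argument evaluation yourself while the calling handler is active, so dispatch receives an already-evaluated value:

```r
withCallingHandlers({
  x <- warning("argh")  # evaluated here, before method dispatch
  my_method(x)
}, warning = function(w) message("got warning: ", conditionMessage(w)))
```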

Best,
Karl


On Fri, Nov 29, 2013 at 11:30 AM, Karl Forner  wrote:
> Hello,
>
> I apologize if this has already been addressed, and I also submitted
> this problem on SO:
> http://stackoverflow.com/questions/20268021/how-to-catch-warnings-sent-during-s4-method-selection
>
> Example code:
> setGeneric('my_method', function(x) standardGeneric('my_method') )
> setMethod('my_method', 'ANY', function(x) invisible())
>
> withCallingHandlers(my_method(warning('argh')), warning = function(w)
> { stop('got warning:', w) })
> # this does not catch the warning
>
> It seems that the warnings emitted during the evaluation of the
> arguments of S4 methods can not get caught using
> withCallingHandlers().
>
> Is this expected ? Is there a work-around ?
>
> Best,
> Karl Forner

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Status of reserved keywords and builtins

2013-12-12 Thread Karl Millar
According to 
http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Reserved-words

  if else repeat while function for in next break
  TRUE FALSE NULL Inf NaN
  NA NA_integer_ NA_real_ NA_complex_ NA_character_
  ... ..1 ..2 etc.

are all reserved keywords.


However, in R 3.0.2 you can do things like:
   `if` <- function(cond, val1, val2) val2
after which
   if(TRUE) 1 else 2
returns 2.

Similarly, users can change the implementation of `<-`, `(`, `{`, `||` and `&&`.


Two questions:
  - Is this intended behaviour?

  - If so, would it be a good idea to change the language definition
to prevent this?  Doing so would both have the benefits that users
could count on keywords having their normal interpretation, and allow
R implementations to implement these more efficiently, including not
having to lookup the symbol each time.  It'd break any code that
assumes that this is valid, but hopefully there's little or no code
that does.

Thanks

Karl



[Rd] [PATCH] Code coverage support proof of concept

2014-03-05 Thread Karl Forner
Hello,

I submit a patch for review that implements code coverage tracing in
the R interpreter.
It records the lines that are actually executed and their associated
frequency for which srcref information is available.

I perfectly understand that this patch will not make its way into R
as it is; there are many concerns of stability, compatibility,
maintenance and so on.
I would like to have the code reviewed, and proper guidance on how to
get this feature available at one point in R, in base R or as a
package or patch if other people are interested.

Usage

Rcov_start()
# your code to trace here
res <- Rcov_stop()

res is currently a hashed env, with traced source filenames associated
with 2-column matrices holding the line numbers and their
frequencies.


How it works
-
I added a test in getSrcref() that records the line numbers when code
coverage is started.
The overhead should be minimal since, for a given file, subsequent
covered lines are stored in constant time. I use a hashed env to store
the occurrences by file.

I added two entry points in the utils package (Rcov_start() and Rcov_stop())
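As a rough illustration of that data structure, here is a pure-R toy version of the per-file store (my own sketch, not the C code from the patch):

```r
# Toy per-file line-frequency store, keyed by filename in a hashed env.
cov <- new.env(hash = TRUE)

record <- function(file, line) {
  m <- cov[[file]]
  if (is.null(m))
    m <- matrix(integer(0), ncol = 2,
                dimnames = list(NULL, c("line", "freq")))
  i <- match(line, m[, "line"])
  if (is.na(i)) m <- rbind(m, c(line, 1L))   # first hit on this line
  else m[i, "freq"] <- m[i, "freq"] + 1L     # subsequent hits: constant time-ish
  cov[[file]] <- m
}

record("colour-text.r", 19L)
record("colour-text.r", 19L)
record("colour-text.r", 40L)
cov[["colour-text.r"]]
#      line freq
# [1,]   19    2
# [2,]   40    1
```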


Example
-
* untar the latest R-devel and cd into it
* patch -p1 < rdev-cov-patch.txt
* ./configure [... ] && make && [sudo] make install
* install the devtools package
* run the following script using Rscript

library(methods)
library(devtools)
pkg  <- download.packages('testthat', '.', repos = "http://stat.ethz.ch/CRAN")
untar(pkg[1, 2])

Rcov_start()
test('testthat')
env <- Rcov_stop()

res <- lapply(ls(env), get, envir = env)
names(res) <- ls(env)
print(res)


This will hopefully output something like:
$`.../testthat/R/auto-test.r`
     [,1] [,2]
[1,]   33    1
[2,]   80    1

$`.../testthat/R/colour-text.r`
     [,1] [,2]
[1,]   18    1
[2,]   19  106
[3,]   20  106
[4,]   22  106
[5,]   23  106
[6,]   40    1
[7,]   59    1
[8,]   70    1
[9,]   71  106
...


Karl Forner


Disclaimer
-
There are probably bugs and ugly statements, but this is just a proof
of concept. It is untested and has only been run on Linux x86_64.
diff -ruN R-devel/src/library/utils/man/Rcov_start.Rd R-devel-cov/src/library/utils/man/Rcov_start.Rd
--- R-devel/src/library/utils/man/Rcov_start.Rd 1970-01-01 01:00:00.0 +0100
+++ R-devel-cov/src/library/utils/man/Rcov_start.Rd 2014-03-05 16:07:45.907596276 +0100
@@ -0,0 +1,26 @@
+% File src/library/utils/man/Rcov_start.Rd
+% Part of the R package, http://www.R-project.org
+% Copyright 1995-2010 R Core Team
+% Distributed under GPL 2 or later
+
+\name{Rcov_start}
+\alias{Rcov_start}
+\title{Start Code Coverage analysis of R's Execution}
+\description{
+  Start Code Coverage analysis of the execution of \R expressions.
+}
+\usage{
+Rcov_start(nb_lines = 1L, growth_rate = 2)
+}
+\arguments{
+  \item{nb_lines}{
+Initial max number of lines per source file. 
+  }
+  \item{growth_rate}{
+Growth factor of the line-number vectors per filename.
+If a reached line number L is greater than nb_lines, the vector will
+be reallocated with a provisional size of growth_rate * L.
+  }
+}
+
+\keyword{utilities}
diff -ruN R-devel/src/library/utils/man/Rcov_stop.Rd R-devel-cov/src/library/utils/man/Rcov_stop.Rd
--- R-devel/src/library/utils/man/Rcov_stop.Rd  1970-01-01 01:00:00.0 +0100
+++ R-devel-cov/src/library/utils/man/Rcov_stop.Rd  2014-03-03 16:14:25.883440716 +0100
@@ -0,0 +1,20 @@
+% File src/library/utils/man/Rcov_stop.Rd
+% Part of the R package, http://www.R-project.org
+% Copyright 1995-2010 R Core Team
+% Distributed under GPL 2 or later
+
+\name{Rcov_stop}
+\alias{Rcov_stop}
+\title{Stop Code Coverage analysis of R's Execution}
+\description{
+  Stop Code Coverage analysis of the execution of \R expressions and return the collected results.
+}
+\usage{
+Rcov_stop()
+}
+
+\value{
+  a named list of two-column integer matrices holding occurrence counts
+  (line number, frequency), named after the covered source file names.
+}
+\keyword{utilities}
diff -ruN R-devel/src/library/utils/NAMESPACE R-devel-cov/src/library/utils/NAMESPACE
--- R-devel/src/library/utils/NAMESPACE 2013-09-10 03:04:59.0 +0200
+++ R-devel-cov/src/library/utils/NAMESPACE 2014-03-03 16:18:48.407430952 +0100
@@ -1,7 +1,7 @@
 # Refer to all C routines by their name prefixed by C_
 useDynLib(utils, .registration = TRUE, .fixes = "C_")
 
-export("?", .DollarNames, CRAN.packages, Rprof, Rprofmem, RShowDoc,
+export("?", .DollarNames, CRAN.packages, Rcov_start, Rcov_stop, Rprof, Rprofmem, RShowDoc,
RSiteSearch, URLdecode, URLencode, View, adist, alarm, apropos,
aregexec, argsAnywhere, assignInMyNamespace, assignInNamespace,
as.roman, as.person, as.personList, as.relistable, aspell,
diff -ruN R-devel/src/library/utils/R/Rcov.R R-devel-cov/src/library/utils/R/Rcov.R
--- R-devel/src/library/utils/R/Rcov.R 

Re: [Rd] [PATCH] Code coverage support proof of concept

2014-03-07 Thread Karl Forner
Here's an updated version of the patch that fixes a stack imbalance bug.
N.B: the patch seems to work fine with R-3.0.2 too.

On Wed, Mar 5, 2014 at 5:16 PM, Karl Forner  wrote:
> Hello,
>
> I submit a patch for review that implements code coverage tracing in
> the R interpreter.
> It records the lines that are actually executed and their associated
> frequency for which srcref information is available.
>
> I perfectly understand that this patch will not make its way into R
> as it is; there are many concerns of stability, compatibility,
> maintenance and so on.
> I would like to have the code reviewed, and proper guidance on how to
> get this feature available at one point in R, in base R or as a
> package or patch if other people are interested.
>
> Usage
> 
> Rcov_start()
> # your code to trace here
> res <- Rcov_stop()
>
> res is currently a hashed env, with traced source filenames associated
> with 2-column matrices holding the line numbers and their
> frequencies.
>
>
> How it works
> -
> I added a test in getSrcref() that records the line numbers when code
> coverage is started.
> The overhead should be minimal since, for a given file, subsequent
> covered lines are stored in constant time. I use a hashed env to
> store the occurrences by file.
>
> I added two entry points in the utils package (Rcov_start() and Rcov_stop())
>
>
> Example
> -
> * untar the latest R-devel and cd into it
> * patch -p1 < rdev-cov-patch.txt
> * ./configure [... ] && make && [sudo] make install
> * install the devtools package
> * run the following script using Rscript
>
> library(methods)
> library(devtools)
> pkg  <- download.packages('testthat', '.', repos = "http://stat.ethz.ch/CRAN")
> untar(pkg[1, 2])
>
> Rcov_start()
> test('testthat')
> env <- Rcov_stop()
>
> res <- lapply(ls(env), get, envir = env)
> names(res) <- ls(env)
> print(res)
>
>
> This will hopefully output something like:
> $`.../testthat/R/auto-test.r`
>      [,1] [,2]
> [1,]   33    1
> [2,]   80    1
>
> $`.../testthat/R/colour-text.r`
>      [,1] [,2]
> [1,]   18    1
> [2,]   19  106
> [3,]   20  106
> [4,]   22  106
> [5,]   23  106
> [6,]   40    1
> [7,]   59    1
> [8,]   70    1
> [9,]   71  106
> ...
>
>
> Karl Forner
>
>
> Disclaimer
> -
> There are probably bugs and ugly statements, but this is just a proof
> of concept. It is untested and has only been run on Linux x86_64.
diff -urN -x '.*' R-devel/src/library/utils/man/Rcov_start.Rd R-develcov/src/library/utils/man/Rcov_start.Rd
--- R-devel/src/library/utils/man/Rcov_start.Rd 1970-01-01 01:00:00.0 +0100
+++ R-develcov/src/library/utils/man/Rcov_start.Rd  2014-03-07 18:41:33.117646470 +0100
@@ -0,0 +1,26 @@
+% File src/library/utils/man/Rcov_start.Rd
+% Part of the R package, http://www.R-project.org
+% Copyright 1995-2010 R Core Team
+% Distributed under GPL 2 or later
+
+\name{Rcov_start}
+\alias{Rcov_start}
+\title{Start Code Coverage analysis of R's Execution}
+\description{
+  Start Code Coverage analysis of the execution of \R expressions.
+}
+\usage{
+Rcov_start(nb_lines = 1L, growth_rate = 2)
+}
+\arguments{
+  \item{nb_lines}{
+Initial max number of lines per source file. 
+  }
+  \item{growth_rate}{
+Growth factor of the line-number vectors per filename.
+If a reached line number L is greater than nb_lines, the vector will
+be reallocated with a provisional size of growth_rate * L.
+  }
+}
+
+\keyword{utilities}
diff -urN -x '.*' R-devel/src/library/utils/man/Rcov_stop.Rd R-develcov/src/library/utils/man/Rcov_stop.Rd
--- R-devel/src/library/utils/man/Rcov_stop.Rd  1970-01-01 01:00:00.0 +0100
+++ R-develcov/src/library/utils/man/Rcov_stop.Rd   2014-03-07 18:41:33.117646470 +0100
@@ -0,0 +1,20 @@
+% File src/library/utils/man/Rcov_stop.Rd
+% Part of the R package, http://www.R-project.org
+% Copyright 1995-2010 R Core Team
+% Distributed under GPL 2 or later
+
+\name{Rcov_stop}
+\alias{Rcov_stop}
+\title{Stop Code Coverage analysis of R's Execution}
+\description{
+  Stop Code Coverage analysis of the execution of \R expressions and return the collected results.
+}
+\usage{
+Rcov_stop()
+}
+
+\value{
+  a named list of two-column integer matrices holding occurrence counts
+  (line number, frequency), named after the covered source file names.
+}
+\keyword{utilities}
diff -urN -x '.*' R-devel/src/library/utils/NAMESPACE R-develcov/src/library/utils/NAMESPACE
--- R-devel/src/library/utils/NAMESPACE 2013-09-10 03:04:59.0 +0200
+++ R-develcov/src/library/utils/NAMESPACE  2014-03-07 18:41:33.121646470 +0100
@@ -1,7 +1,7 @@
 # 

Re: [Rd] [RFC] A case for freezing CRAN

2014-03-19 Thread Karl Millar
I think what you really want here is the ability to easily identify
and sync to CRAN snapshots.

The easy way to do this is to set up a CRAN mirror, but back it up with
version control, so that it's easy to reproduce the exact state of
CRAN at any given point in time.  CRAN's not particularly large and
doesn't churn a whole lot, so most version control systems should be
able to handle that without difficulty.

Using svn, mod_dav_svn and (maybe) mod_rewrite, you could set up the
server so that e.g.:
   http://my.cran.mirror/repos/2013-01-01/
is a mirror of how CRAN looked at midnight 2013-01-01.

Users can then set their repository to that URL, and will have a
stable snapshot to work with, and can have all their packages built
with that snapshot if they like.  For reproducibility purposes, all
users need to do is to agree on the same date to use.  For publication
purposes, the date of the snapshot should be sufficient.
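From the user's side this would be a one-liner; the mirror URL below is hypothetical:

```r
# Hypothetical dated-snapshot mirror; the URL scheme is an assumption.
snapshot <- "http://my.cran.mirror/repos/2013-01-01"
options(repos = c(CRAN = snapshot))
# install.packages("ggplot2")  # would now resolve against the 2013-01-01 state
stopifnot(identical(getOption("repos")[["CRAN"]], snapshot))
```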

We'd need a version of update.packages() that force-syncs all the
packages to the version in the repository, even if they're downgrades,
but otherwise it ought to be fairly straightforward.
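Such a force-sync could look roughly like this (sync_to_repo is a hypothetical helper of my own, untested against a real mirror; it reinstalls any package whose installed version differs from the repository's current one):

```r
sync_to_repo <- function(lib = .libPaths()[1]) {
  avail  <- available.packages()               # state of the snapshot repo
  inst   <- installed.packages(lib.loc = lib)  # local state
  common <- intersect(rownames(avail), rownames(inst))
  stale  <- common[avail[common, "Version"] != inst[common, "Version"]]
  if (length(stale))
    install.packages(stale, lib = lib)         # up- or downgrade alike
  invisible(stale)
}
```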

FWIW, we do something similar internally at Google.  All the packages
that a user has installed come from the same source control revision,
where we know that all the package versions are mutually compatible.
It saves a lot of headaches, and users can rollback to any previous
point in time easily if they run into problems.


On Wed, Mar 19, 2014 at 7:45 PM, Jeroen Ooms  wrote:
> On Wed, Mar 19, 2014 at 6:55 PM, Michael Weylandt
>  wrote:
>> Reading this thread again, is it a fair summary of your position to say 
>> "reproducibility by default is more important than giving users access to 
>> the newest bug fixes and features by default?" It's certainly arguable, but 
>> I'm not sure I'm convinced: I'd imagine that the ratio of new work being 
>> done vs reproductions is rather high and the current setup optimizes for 
>> that already.
>
> I think that separating development from released branches can give us
> both reliability/reproducibility (stable branch) as well as new
> features (unstable branch). The user gets to pick (and you can pick
> both!). The same is true for r-base: when using a 'released' version
> you get 'stable' base packages that are up to 12 months old. If you
> want to have the latest stuff you download a nightly build of r-devel.
> For regular users and reproducible research it is recommended to use
> the stable branch. However if you are a developer (e.g. package
> author) you might want to develop/test/check your work with the latest
> r-devel.
>
> I think that extending the R release cycle to CRAN would result both
> in more stable released versions of R, as well as more freedom for
> package authors to implement rigorous change in the unstable branch.
> When writing a script that is part of a production pipeline, or sweave
> paper that should be reproducible 10 years from now, or a book on
> using R, you use stable version of R, which is guaranteed to behave
> the same over time. However when developing packages that should be
> compatible with the upcoming release of R, you use r-devel which has
> the latest versions of other CRAN and base packages.
>
>
>> What I'm trying to figure out is why the standard "install the following 
>> list of package versions" isn't good enough in your eyes?
>
> Almost nobody does this because it is cumbersome and impractical. We
> can do so much better than this. Note that in order to install old
> packages you also need to investigate which versions of dependencies
> of those packages were used. On win/osx, users need to manually build
> those packages which can be a pain. All in all it makes reproducible
> research difficult, expensive and error prone. At the end of the
> day most published results obtained with R just won't be reproducible.
>
> Also I believe that keeping it simple is essential for solutions to be
> practical. If every script has to be run inside an environment with
> custom libraries, it takes away much of its power. Running a bash or
> python script in Linux is so easy and reliable that entire
> distributions are based on it. I don't understand why we make our
> lives so difficult in R.
>
> In my estimation, a system where stable versions of R pull packages
> from a stable branch of CRAN will naturally resolve the majority of
> the reproducibility and reliability problems with R. And in contrast
> to what some people here are suggesting it does not introduce any
> limitations. If you want to get the latest stuff, you either grab a
> copy of r-devel, or just enable the testing branch and off you go.
> Debian 'testing' works in a similar way, see
> http://www.debian.org/devel/testing.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] The case for freezing CRAN

2014-03-20 Thread Karl Millar
Given the version / dated snapshots of CRAN, and an agreement that
reproducibility is the responsibility of the study author, the author
simply needs to sync all their packages to a chosen date, run the analysis
and publish the chosen date.  It is true that this doesn't include
compilers, OS, system packages etc, but in my experience those are
significantly more stable than CRAN packages.


Also, my previous description of how to serve up a dated CRAN was way too
complicated.  Since most of the files on CRAN never change, they don't need
version control.  Only the metadata about which versions are current really
needs to be tracked, and that's small enough that it could be stored in
static files.




On Thu, Mar 20, 2014 at 6:32 AM, Dirk Eddelbuettel  wrote:

>
> No attempt to summarize the thread, but a few highlighted points:
>
>  o Karl's suggestion of versioned / dated access to the repo by adding a
>layer to webaccess is (as usual) nice.  It works on the 'supply' side.
> But
>Jeroen's problem is on the demand side.  Even when we know that an
>analysis was done on 20xx-yy-zz, and we reconstruct CRAN that day, it
> only
>gives us a 'ceiling' estimate of what was on the machine.  In production
>or lab environments, installations get stale.  Maybe packages were
> already
>a year old?  To me, this is an issue that needs to be addressed on the
>'demand' side of the user. But just writing out version numbers is not
>good enough.
>
>  o Roger correctly notes that R scripts and packages are just one issue.
>Compilers, libraries and the OS matter.  To me, the natural approach
> these
>days would be to think of something based on Docker or Vagrant or (if
> you
>must, VirtualBox).  The newer alternatives make snapshotting very cheap
>(eg by using Linux LXC).  That approach reproduces a full environment as
>best as we can while still ignoring the hardware layer (and some readers
>may recall the infamous Pentium bug of two decades ago).
>
>  o Reproduciblity will probably remain the responsibility of study
>authors. If an investigator on a mega-grant wants to (or needs to)
> freeze,
>they do have the tools now.  Requiring the need of a few to push work on
>those already overloaded (ie CRAN) and changing the workflow of
> everybody
>is a non-starter.
>
>  o As Terry noted, Jeroen made some strong claims about exactly how flawed
>the existing system is and keeps coming back to the example of 'a JSS
>paper that cannot be re-run'.  I would really like to see empirics on
>this.  Studies of reproducibility appear to be publishable these days,
> so
>maybe some enterprising grad student wants to run with the idea of
>actually _testing_ this.  We may be above Terry's 0/30 and nearer to
>Kevin's 'low'/30.  But let's bring some data to the debate.
>
>  o Overall, I would tend to think that our CRAN standards of releasing with
>tests, examples, and checks on every build and release already do a much
>better job of keeping things tidy and workable than in most if not all
>other related / similar open source projects. I would of course welcome
>contradictory examples.
>
> Dirk
>
> --
> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Fwd: [RFC] A case for freezing CRAN

2014-03-21 Thread Karl Forner
Interesting and strategic topic indeed.

One other point is that reproducibility (and backwards compatibility) is
also very important in the industry. To get acceptance it can really help
if you can easily reproduce results.

Concerning the arguments that I read in this discussion:

- "do it yourself"
The point is to discuss to find the best way for the community, and
thinking collectively about this general problems can never hurt.
Once a consensus is reached we can think about the resources.

- "don't think the effort is worth it, instead install a specific version
of package" + "new sessionInfoPlus()":
This could work, meaning it would achieve the same result, but not at the
same price for users, because it would require each script writer to
include their sessionInfo() and to store it along the scripts in
repositories. And prior to running the scripts, you would have to install
the snapshot of packages, not to mention install problems and so on.

- "versions automatically at package build time (in DESCRIPTION)":
This does not really solve the problem, because if package A is submitted
with dependency B-1.0 and package C with dependency B-2.0, what do you do?

- "exact deps versions":
This will put a lot of burden on the developer.

- "I do not want to wait a year to get a new (or updated package)", "access
to bug fixes":

Installed packages are already set up as libraries. By default you have the
library inside the R installation, which contains the base packages plus
those installed by install.packages() if you have the proper permissions,
and the personal library otherwise.
Why not organize these libraries so that:
  - normal CRAN versions associated with the R version get installed along
the base packages
  - "critical updates", meaning important bugs found in normal CRAN
versions installed in the critical/ library
  - additional packages and updated package in another library.
This way, using the existing .libPaths() mechanism, or equivalently the
lib.loc option of library(), one could easily switch between the library
that ensures full compatibility and reproducibility with the R version,
add critical updates, or use the newer or updated packages.
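A minimal sketch of that switching (the critical/ and extra/ library paths are made up; note that .libPaths() silently drops directories that do not exist, hence the dir.create() calls):

```r
stable   <- .libPaths()[length(.libPaths())]   # base R library (frozen CRAN)
critical <- file.path(tempdir(), "critical")   # bug-fix overrides (made up)
extra    <- file.path(tempdir(), "extra")      # newer packages (made up)
dir.create(critical); dir.create(extra)

.libPaths(c(critical, stable))            # reproducible + critical fixes only
# .libPaths(c(extra, critical, stable))   # opt in to the newest packages
stopifnot(any(grepl("critical", .libPaths())))
```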

- new use case.
Here at Quartz Bio we have two architectures, and thus two R installations
for each R version. It is quite cumbersome to keep them consistent, because
the installed version depends on the moment you perform the
install.packages().

So I second Jeroen's proposal to have a snapshot of package versions tied
to a given R version, well tested altogether. This implies, as stated by
Hervé, keeping all package source versions, and will solve the BioC
reproducibility issue.

Best,
Karl Forner








On Tue, Mar 18, 2014 at 9:24 PM, Jeroen Ooms wrote:

> This came up again recently with an irreproducible paper. Below an
> attempt to make a case for extending the r-devel/r-release cycle to
> CRAN packages. These suggestions are not in any way intended as
> criticism of anyone or the status quo.
>
> The proposal described in [1] is to freeze a snapshot of CRAN along
> with every release of R. In this design, updates for contributed
> packages are treated the same as updates for base packages in the sense
> that they are only published to the r-devel branch of CRAN and do not
> affect users of "released" versions of R. Thereby all users, stacks
> and applications using a particular version of R will by default be
> using the identical version of each CRAN package. The bioconductor
> project uses similar policies.
>
> This system has several important advantages:
>
> ## Reproducibility
>
> Currently r/sweave/knitr scripts are unstable because of ambiguity
> introduced by constantly changing cran packages. This causes scripts
> to break or change behavior when upstream packages are updated, which
> makes reproducing old results extremely difficult.
>
> A common counter-argument is that script authors should document
> package versions used in the script using sessionInfo(). However even
> if authors would manually do this, reconstructing the author's
> environment from this information is cumbersome and often nearly
> impossible, because binary packages might no longer be available,
> dependency conflicts, etc. See [1] for a worked example. In practice,
> the current system causes many results or documents generated with R
> not to be reproducible, sometimes already after a few months.
>
> In a system where contributed packages inherit the r-base release
> cycle, scripts will behave the same across users/systems/time within a
> given version of R. This severely reduces ambiguity of R behavior, and
> has the potential of making reproducibility a natural part of the
> language, rather than a tedious exercise.
>
> ## Repository Management
>
> Just like scripts suffer

Re: [Rd] Fwd: [RFC] A case for freezing CRAN

2014-03-21 Thread Karl Forner
> On Fri, Mar 21, 2014 at 12:08 PM, Karl Forner wrote:
> [...]
>
> - "exact deps versions":
>> This will put a lot of burden on the developer.
>>
>
> Not really, in my opinion, if you have the proper tools. Most likely when
> you develop any given version of your package you'll use certain versions
> of other packages, probably the most recent at that time.
>
> If there is a build tool that just puts these version numbers into the
> DESCRIPTION file, you don't need to do anything extra.
>

I of course assumed that this part was automatic.



>
> In fact, it is easier for the developer, because if you work on your
> release for a month, at the end you don't have to make sure that your
> package works with packages that were updated in the meanwhile.
>

Hmm, what if your package depends on packages A and B, and A depends on C
v1.0 while B depends on C v1.1? This is just an example, but I imagine that
will lead to a lot of complexity.



>
> Gabor
>
> [...]
>




Re: [Rd] Fwd: [RFC] A case for freezing CRAN

2014-03-21 Thread Karl Forner
On Fri, Mar 21, 2014 at 6:27 PM, Gábor Csárdi wrote:

> On Fri, Mar 21, 2014 at 12:40 PM, Karl Forner wrote:
> [...]
>
>> Hmm, what if your package depends on packages A and B, and A depends
>> on C v1.0 while B depends on C v1.1? This is just an example, but I
>> imagine that will lead to a lot of complexity.
>>
>
> You'll have to be able to load (but not attach, of course!) multiple
> versions of the same package at the same time. The search paths are set up
> so that A imports v1.0 of C, B imports v1.1. This is possible to support
> with R's namespaces and imports mechanisms, I believe.
>

Not really: I think there are still cases (unfortunately) where you have to
use Depends, e.g. when defining S4 methods for classes implemented in other
packages.
But my point is that you would need really, really smart tools, AND to be
able to install precise versions of packages.



> It requires quite some work, though, so I am obviously not saying to
> switch to it tomorrow. Having a CRAN-devel seems simpler.
>

Indeed.




Re: [Rd] Rjulia: a package for R call Julia through Julia C API

2014-06-06 Thread Karl Forner
Excellent.
By any chance, are you aware of a Julia way to perform the opposite:
calling R from Julia?
Thanks


On Fri, Jun 6, 2014 at 7:23 AM, Yu Gong  wrote:

> Hello everyone. Recently I wrote a package for R to call Julia through the
> Julia C API:
> https://github.com/armgong/RJulia
> The package can currently do the following:
> 1. Basic type mapping is finished: int, boolean and double R vectors to
> Julia 1-D arrays work, and Julia int32/int64/float64/bool 1-D arrays to R
> vectors work as well.
> 2. R STRSXP to Julia string 1-D array, and Julia string array to STRSXP,
> is written, but I am not sure whether it is correct.
> 3. The Julia GC can be disabled at initJulia().
> To build RJulia you need the git master branch of Julia, and R.
> The package only implements very basic functionality so far and needs more
> work, so any comments and advice are welcome.
> Currently it can be used on Unix and the Windows console; on the Windows
> GUI it crashes.
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>




[Rd] regression bug with getParseData and/or parse in R-3.1.0

2014-06-12 Thread Karl Forner
Hi,

With R-3.1.0 I get:
> getParseData(parse(text = "{1}", keep.source = TRUE))
  line1 col1 line2 col2 id parent     token terminal text
7     1    1     1    3  7      9      expr    FALSE
1     1    1     1    1  1      7       '{'     TRUE    {
2     1    2     1    2  2      3 NUM_CONST     TRUE    1
3     1    2     1    2  3      5      expr    FALSE
4     1    3     1    3  4      7       '}'     TRUE    }

Which has two problems:
1) the parent of the first expression (id=7) should be 0
2) the parent of the expression with id=3 should be 7

For reference, with R-3.0.2:

> getParseData(parse(text = "{1}", keep.source = TRUE))
  line1 col1 line2 col2 id parent     token terminal text
7     1    1     1    3  7      0      expr    FALSE
1     1    1     1    1  1      7       '{'     TRUE    {
2     1    2     1    2  2      3 NUM_CONST     TRUE    1
3     1    2     1    2  3      7      expr    FALSE
4     1    3     1    3  4      7       '}'     TRUE    }

which is correct.
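For regression-testing purposes, a small self-check along these lines may be useful (my own sketch, assuming the fixed behaviour: exactly one root expr with parent 0, and every other node's parent present in the table):

```r
pd <- getParseData(parse(text = "{1}", keep.source = TRUE))
root <- pd$id[pd$parent == 0 & pd$token == "expr"]
stopifnot(length(root) == 1)                         # exactly one root expr
stopifnot(all(pd$parent[pd$id != root] %in% pd$id))  # all other parents exist
```

Under the buggy R-3.1.0 behaviour both checks fail, since the root's parent is a dangling id.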




Re: [Rd] regression bug with getParseData and/or parse in R-3.1.0

2014-06-12 Thread Karl Forner
Thank you Duncan.

I confirm:

R version 3.1.0 Patched (2014-06-11 r65921) -- "Spring Dance"


> getParseData(parse(text = "{1}", keep.source = TRUE))
  line1 col1 line2 col2 id parent     token terminal text
7     1    1     1    3  7      0      expr    FALSE
1     1    1     1    1  1      7       '{'     TRUE    {
2     1    2     1    2  2      3 NUM_CONST     TRUE    1
3     1    2     1    2  3      7      expr    FALSE
4     1    3     1    3  4      7       '}'     TRUE    }

Karl


On Thu, Jun 12, 2014 at 2:39 PM, Duncan Murdoch 
wrote:

> On 12/06/2014, 7:37 AM, Karl Forner wrote:
> > Hi,
> >
> > With R-3.1.0 I get:
> >> getParseData(parse(text = "{1}", keep.source = TRUE))
> >   line1 col1 line2 col2 id parent     token terminal text
> > 7     1    1     1    3  7      9      expr    FALSE
> > 1     1    1     1    1  1      7       '{'     TRUE    {
> > 2     1    2     1    2  2      3 NUM_CONST     TRUE    1
> > 3     1    2     1    2  3      5      expr    FALSE
> > 4     1    3     1    3  4      7       '}'     TRUE    }
> >
> > Which has two problems:
> > 1) the parent of the first expression (id=7) should be 0
> > 2) the parent of the expression with id=3 should be 7
>
> I believe this has been fixed in R-patched.  Could you please check?
>
> The problem was due to an overly aggressive optimization introduced in
> R-devel in June, 2013.  It assumed a vector was initialized to zeros,
> but in some fairly common circumstances it wasn't, so the parent
> calculation was wrong.
>
> Luckily 3.1.1 has been delayed by incompatible schedules of various
> people, or this fix might have missed that too.  As with some other
> fixes in R-patched, this is a case of a bug that sat there for most of a
> year before being reported.  Please people, test pre-release versions.
>
> Duncan Murdoch
>
>
> >
> > For reference, with R-3.0.2:
> >
> >> getParseData(parse(text = "{1}", keep.source = TRUE))
> >   line1 col1 line2 col2 id parent     token terminal text
> > 7     1    1     1    3  7      0      expr    FALSE
> > 1     1    1     1    1  1      7       '{'     TRUE    {
> > 2     1    2     1    2  2      3 NUM_CONST     TRUE    1
> > 3     1    2     1    2  3      7      expr    FALSE
> > 4     1    3     1    3  4      7       '}'     TRUE    }
> >
> > which is correct.
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
>




[Rd] isOpen() misbehaviour

2014-06-19 Thread Karl Forner
Hello,

From the doc, it says:
 "isOpen returns a logical value, whether the connection is currently open."

But actually it seems to die on closed connections:
> con <- file()
> isOpen(con)
[1] TRUE
> close(con)
> isOpen(con)
Error in isOpen(con) : invalid connection

Is it expected?
Tested on R-3.0.2 and R version 3.1.0 Patched (2014-06-11 r65921) on
linux x86_64
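Until the documentation or behaviour changes, a defensive wrapper matching the documented wording could look like this (my own sketch, treating a destroyed connection as simply not open):

```r
is_open_safe <- function(con)
  tryCatch(isOpen(con), error = function(e) FALSE)

con <- file()                  # anonymous file connection, opened on creation
stopifnot(is_open_safe(con))   # TRUE
close(con)                     # close() also destroys the connection
stopifnot(!is_open_safe(con))  # FALSE instead of "invalid connection" error
```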

Karl



Re: [Rd] isOpen() misbehaviour

2014-06-19 Thread Karl Forner
Thanks Joris, it makes sense now, though the doc is a bit misleading.

On Thu, Jun 19, 2014 at 3:22 PM, Joris Meys  wrote:
> Hi Karl,
>
> that is expected. The moment you close a connection, it is destroyed as well
> (see ?close). A destroyed connection cannot be tested. In fact, I've used
> isOpen() only in combination with the argument rw.
>
>> con <- file("clipboard",open="r")
>> isOpen(con,"write")
> [1] FALSE
>
> cheers
>
>
> On Thu, Jun 19, 2014 at 3:10 PM, Karl Forner  wrote:
>>
>> Hello,
>>
>> From the doc, it says:
>>  "isOpen returns a logical value, whether the connection is currently
>> open."
>>
>> But actually it seems to die on closed connections:
>> > con <- file()
>> > isOpen(con)
>> [1] TRUE
>> > close(con)
>> > isOpen(con)
>> Error in isOpen(con) : invalid connection
>>
>> Is it expected?
>> Tested on R-3.0.2 and R version 3.1.0 Patched (2014-06-11 r65921) on
>> linux x86_64
>>
>> Karl
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Patch for R to fix some buffer overruns and add a missing PROTECT().

2014-09-23 Thread Karl Millar
This patch is against current svn and contains three classes of fix:
   - Ensure the result is properly terminated after calls to strncpy()
   - Replace calls of sprintf() with snprintf()
   - Added a PROTECT() call in do_while which could cause memory
errors if evaluating the condition results in a warning.

Thanks,

Karl
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Patch for R to fix some buffer overruns and add a missing PROTECT().

2014-09-23 Thread Karl Millar
Bug submitted.  Thanks.

On Tue, Sep 23, 2014 at 12:42 PM, Duncan Murdoch
 wrote:
> On 23/09/2014 3:20 PM, Karl Millar wrote:
>>
>> This patch is against current svn and contains three classes of fix:
>> - Ensure the result is properly terminated after calls to strncpy()
>> - Replace calls of sprintf() with snprintf()
>> - Added a PROTECT() call in do_while which could cause memory
>> errors if evaluating the condition results in a warning.
>
>
> Nothing was attached.
>
> Generally fixes like this are best sent to bugs.r-project.org, and they
> receive highest priority if accompanied by code demonstrating why they are
> needed, i.e. crashes or incorrect results in current R.  Those will likely
> be incorporated as regression tests.
>
> Duncan Murdoch

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Making parent.env<- an error for package namespaces and package imports

2014-10-16 Thread Karl Millar
I'd like to propose a change to the R language so that calling
'parent.env<-' on a package namespace or package imports is a runtime
error.

Currently the documentation warns that it's dangerous behaviour and
might go away:
 The replacement function ‘parent.env<-’ is extremely dangerous as
 it can be used to destructively change environments in ways that
 violate assumptions made by the internal C code.  It may be
 removed in the near future.

This change would both eliminate some potential dangerous behaviours,
and make it significantly easier for runtime compilation systems to
optimize symbol lookups for code in packages.

The following patch against current svn implements this functionality.
It allows calls to 'parent.env<-' only until the namespace is locked,
allowing the namespace to be built correctly while preventing user
code from subsequently messing with it.

I'd also like to make calling parent.env<- on an environment on the
call stack an error, for the same reasons, but it's not so obvious to
me how to implement that efficiently right now.  Could we at least
document that as being 'undefined behaviour'?
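
For reference, the two conditions the proposed C check looks at (a locked namespace, and an enclosing environment whose name carries the "imports:" prefix) can be inspected from R; a sketch using the stats namespace:

```r
# Namespaces are locked once the package is loaded:
ns <- asNamespace("stats")
stopifnot(environmentIsLocked(ns))

# Its parent is the package's imports environment, whose "name" attribute
# carries the "imports:" prefix tested by R_IsImportsEnv():
imp <- parent.env(ns)
stopifnot(grepl("^imports:", attr(imp, "name")))
```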

Thanks,

Karl


Index: src/main/builtin.c
===================================================================
--- src/main/builtin.c (revision 66783)
+++ src/main/builtin.c (working copy)
@@ -356,6 +356,24 @@
 return( ENCLOS(arg) );
 }

+static Rboolean R_IsImportsEnv(SEXP env)
+{
+if (isNull(env) || !isEnvironment(env))
+return FALSE;
+if (ENCLOS(env) != R_BaseNamespace)
+return FALSE;
+SEXP name = getAttrib(env, R_NameSymbol);
+if (!isString(name) || length(name) != 1)
+return FALSE;
+
+const char *imports_prefix = "imports:";
+const char *name_string = CHAR(STRING_ELT(name, 0));
+if (!strncmp(name_string, imports_prefix, strlen(imports_prefix)))
+return TRUE;
+else
+return FALSE;
+}
+
 SEXP attribute_hidden do_parentenvgets(SEXP call, SEXP op, SEXP args, SEXP rho)
 {
 SEXP env, parent;
@@ -371,6 +389,10 @@
  error(_("argument is not an environment"));
 if( env == R_EmptyEnv )
  error(_("can not set parent of the empty environment"));
+if (R_EnvironmentIsLocked(env) && R_IsNamespaceEnv(env))
+  error(_("can not set the parent environment of a namespace"));
+if (R_EnvironmentIsLocked(env) && R_IsImportsEnv(env))
+  error(_("can not set the parent environment of package imports"));
 parent = CADR(args);
 if (isNull(parent)) {
  error(_("use of NULL environment is defunct"));


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] MAX_NUM_DLLS too low ?

2015-05-08 Thread Karl Forner
Hello,

My problem is that I hit the hard-coded MAX_NUM_DLLS (100) limit of the
number of loaded DLLs.
I have a number of custom packages which interface and integrate a lot of
CRAN and Bioconductor packages.

For example, on my installation:
 Rscript -e 'library(crlmm);print(length(getLoadedDLLs()))'
gives 28 loaded DLLs.

I am currently trying to work-around that by putting external packages in
Suggests: instead of Imports:, and lazy-load them, but still I am wondering
if that threshold value of 100 is still relevant nowadays, or would it be
possible to increase it.

Thanks,

Karl Forner

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] .Call in R

2011-11-18 Thread Karl Forner
Hi,

A probably very naive remark, but I believe that the probability of
sum( runif(10000) ) >= 5000 is exactly 0.5. So why not just test that, and
generate the uniform values only if needed?


Karl Forner

On Thu, Nov 17, 2011 at 6:09 PM, Raymond  wrote:

> Hi R developers,
>
>I am new to this forum and hope someone can help me with .Call in R.
> Greatly appreciate any help!
>
>Say, I have a vector called "vecA" of length 10000, I generate a vector
> called "vecR" with elements randomly generated from Uniform[0,1]. Both vecA
> and vecR are of double type. I want to replace elements vecA by elements in
> vecR only if sum of elements in vecR is greater than or equal to 5000.
> Otherwise, vecA remains unchanged. This is easy to do in R, which reads
>vecA<-something;
>vecR<-runif(10000);
>if (sum(vecR)>=5000){
>   vecA<-vecR;
>}
>
>
>Now my question is, if I am going to do the same thing in R using .Call.
> How can I achieve it in a more efficient way (i.e. less computation time
> compared with pure R code above.).  My c code (called "change_vecA.c")
> using
> .Call is like this:
>
>SEXP change_vecA(SEXP vecA){
> int i,vecA_len;
> double sum,*res_ptr,*vecR_ptr,*vecA_ptr;
>
> vecA_ptr=REAL(vecA);
> vecA_len=length(vecA);
> SEXP res_vec,vecR;
>
> PROTECT(res_vec=allocVector(REALSXP, vecA_len));
> PROTECT(vecR=allocVector(REALSXP, vecA_len));
> res_ptr=REAL(res_vec);
> vecR_ptr=REAL(vecR);
> GetRNGstate();
> sum=0.0;
> for (i=0;i<vecA_len;i++){
>  vecR_ptr[i]=runif(0,1);
>  sum+=vecR_ptr[i];
> }
> if (sum>=5000){
>/*copy vecR to the vector to be returned*/
>for (i=0;i<vecA_len;i++){
>  res_ptr[i]=vecR_ptr[i];
>}
> }
> else{
>/*copy vecA to the vector to be returned*/
>for (i=0;i<vecA_len;i++){
>  res_ptr[i]=vecA_ptr[i];
>}
> }
>
> PutRNGstate();
> UNPROTECT(2);
> return(res_vec);
> }
> My R wrapper function is
>change_vecA<-function(vecA){
>  dyn.load("change_vecA.so");
>  .Call("change_vecA",vecA);
>}
>
> Now my question is, due to two loops (one generates the random
> vector and one determines the vector to be returned), can .Call still be
> faster than pure R code (only one loop to copy vecR to vecA given condition
> is met)? Or, how can I improve my c code to avoid redundant loops if any.
> My
> concern is if vecA is large (say of length 1000000 or even bigger), loops
> in
> C code can slow things down.  Thanks for any help!
>
>
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Call-in-R-tp4080721p4080721.html
> Sent from the R devel mailing list archive at Nabble.com.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] .Call in R

2011-11-18 Thread Karl Forner
Yes indeed. My mistake.

On Fri, Nov 18, 2011 at 4:45 PM, Joris Meys  wrote:

> Because if you calculate the probability and then make uniform values,
> nothing guarantees that the sum of those uniform values actually is
> larger than 50,000. You only have 50% chance it is, in fact...
> Cheers
> Joris
>
> On Fri, Nov 18, 2011 at 4:08 PM, Karl Forner 
> wrote:
> > Hi,
> >
> > A probably very naive remark, but I believe that the probability of sum(
> > runif(10000) ) >= 5000 is exactly 0.5. So why not just test that, and
> > generate the uniform values only if needed ?
> >
> >
> > Karl Forner
> >
> > On Thu, Nov 17, 2011 at 6:09 PM, Raymond 
> wrote:
> >
> >> Hi R developers,
> >>
> >>I am new to this forum and hope someone can help me with .Call in R.
> >> Greatly appreciate any help!
> >>
> >>Say, I have a vector called "vecA" of length 10000, I generate a
> vector
> >> called "vecR" with elements randomly generated from Uniform[0,1]. Both
> vecA
> >> and vecR are of double type. I want to replace elements vecA by
> elements in
> >> vecR only if sum of elements in vecR is greater than or equal to 5000.
> >> Otherwise, vecR remain unchanged. This is easy to do in R, which reads
> >>vecA<-something;
> >>vecR<-runif(10000);
> >>if (sum(vecR)>=5000){
> >>   vecA<-vecR;
> >>}
> >>
> >>
> >>Now my question is, if I am going to do the same thing in R using
> .Call.
> >> How can I achieve it in a more efficient way (i.e. less computation time
> >> compared with pure R code above.).  My c code (called "change_vecA.c")
> >> using
> >> .Call is like this:
> >>
> >>SEXP change_vecA(SEXP vecA){
> >> int i,vecA_len;
> >> double sum,*res_ptr,*vecR_ptr,*vecA_ptr;
> >>
> >> vecA_ptr=REAL(vecA);
> >> vecA_len=length(vecA);
> >> SEXP res_vec,vecR;
> >>
> >> PROTECT(res_vec=allocVector(REALSXP, vec_len));
> >> PROTECT(vecR=allocVector(REALSXP, vec_len));
> >> res_ptr=REAL(res_vec);
> >> vecR_ptr=REAL(vecR);
> >> GetRNGstate();
> >> sum=0.0;
> >> for (i=0;i<vecA_len;i++){
> >>  vecR_ptr[i]=runif(0,1);
> >>  sum+=vecR_ptr[i];
> >> }
> >> if (sum>=5000){
> >>/*copy vecR to the vector to be returned*/
> >>for (i=0;i<vecA_len;i++){
> >>  res_ptr[i]=vecR_ptr[i];
> >>}
> >> }
> >> else{
> >>/*copy vecA to the vector to be returned*/
> >>for (i=0;i<vecA_len;i++){
> >>  res_ptr[i]=vecA_ptr[i];
> >>}
> >> }
> >>
> >> PutRNGstate();
> >> UNPROTECT(2);
> >> resturn(res);
> >> }
> >> My R wrapper function is
> >>change_vecA<-function(vecA){
> >>  dyn.load("change_vecA.so");
> >>  .Call("change_vecA",vecA);
> >>}
> >>
> >> Now my question is, due to two loops (one generates the random
> >> vector and one determines the vector to be returned), can .Call still be
> >> faster than pure R code (only one loop to copy vecR to vecA given
> condition
> >> is met)? Or, how can I improve my c code to avoid redundant loops if
> any.
> >> My
> >> concern is if vecA is large (say of length 1000000 or even bigger),
> loops
> >> in
> >> C code can slow things down.  Thanks for any help!
> >>
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://r.789695.n4.nabble.com/Call-in-R-tp4080721p4080721.html
> >> Sent from the R devel mailing list archive at Nabble.com.
> >>
> >> __
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >
> >[[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
>
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] OpenMP and random number generation

2012-02-22 Thread Karl Forner
Hello,

For your information, I plan to release "soon" a package with a fast,
multithread-aware RNG for C++ code in R packages.
It is currently part of one of my (not yet accepted) packages and I want to
extract it into its own package.
I plan to do some quick benchmarks too.

Of course I can not define exactly when it will be ready.

Best,
Karl

On Wed, Feb 22, 2012 at 9:23 AM, Mathieu Ribatet <
mathieu.riba...@math.univ-montp2.fr> wrote:

> Dear all,
>
> Now that R has OpenMP facilities, I'm trying to use it for my own package
> but I'm still wondering if it is safe to use random number generation
> within a OpenMP block. I looked at the R writing extension document  both
> on the OpenMP and Random number generation but didn't find any information
> about that.
>
> Could someone tell me if it is safe or not please ?
>
> Best,
> Mathieu
>
> -
> I3M, UMR CNRS 5149
> Universite Montpellier II,
> 4 place Eugene Bataillon
> 34095 Montpellier cedex 5   France
> http://www.math.univ-montp2.fr/~ribatet
> Tel: + 33 (0)4 67 14 41 98
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] RcppProgress: progress monitoring and interrupting c++ code, request for comments

2012-02-23 Thread Karl Forner
Hello,

I just created a little package, RcppProgress, to display a progress bar to
monitor the execution status of a C++ code loop, possibly multithreaded with
OpenMP.
I also implemented the possibility to check for user interruption, using
the work-around by Simon Urbanek.

I just uploaded the package on my R-forge project, so you should be able to
get the package from
https://r-forge.r-project.org/scm/viewvc.php/pkg/RcppProgress/?root=gwas-bin-tests

* The progress bar is displayed using REprintf, so that it works also in
the eclipse StatET console, provided that you disable the scroll lock.
* You should be able to nicely interrupt the execution by typing CTRL+C in
the R console, or by clicking the "cancel current task" in the StatET
console.
* I tried to write a small documentation, included in the package, but
basically you use it like this:

The main loop:

Progress p(max, display_progress); // create the progress monitor
#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < max; ++i) {
if ( ! p.is_aborted() ) { // the only way to exit an OpenMP loop
long_computation(nb);
p.increment(); // update the progress
}
}

and in your computation intensive function:

void long_computation(int nb) {
double sum = 0;
for (int i = 0; i < nb; ++i) {
if ( Progress::check_abort() )
return;
for (int j = 0; j < nb; ++j) {
sum += Rf_dlnorm(i+j, 0.0, 1.0, 0);
}
}
}

I provided two small R test functions so that you can see how it looks,
please see the doc.

 I would be extremely grateful if you could give me comments, criticisms
and other suggestions.

I try to release this in order to reuse this functionality in my other
packages.

Best regards,
Karl Forner

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] portable parallel seeds project: request for critiques

2012-03-02 Thread Karl Forner
> Some of the random number generators allow as a seed a vector,
> not only a single number. This can simplify generating the seeds.
> There can be one seed for each of the 1000 runs and then,
> the rows of the seed matrix can be
>
>  c(seed1, 1), c(seed1, 2), ...
>  c(seed2, 1), c(seed2, 2), ...
>  c(seed3, 1), c(seed3, 2), ...
>  ...
>
> There could be even only one seed and the matrix can be generated as
>
>  c(seed, 1, 1), c(seed, 1, 2), ...
>  c(seed, 2, 1), c(seed, 2, 2), ...
>  c(seed, 3, 1), c(seed, 3, 2), ...
>
> If the initialization using the vector c(seed, i, j) is done
> with a good quality hash function, the runs will be independent.
>
> What is your opinion on this?
>
> An advantage of seeding with a vector is also that there can
> be significantly more initial states of the generator among
> which we select by the seed than 2^32, which is the maximum
> for a single integer seed.
>
>

Hello,
I would also be in favor of using multiple seeds based on (seed,
task_number) for convenience (i.e. avoiding storing the seeds)
and with the possibility of having a dynamic number of tasks, but I am not
sure it is theoretically correct.
But I can refer you to this article:
http://www.agner.org/random/ran-instructions.pdf , section 6.1
where the author states:

For example, if we make 100 streams of 10^10 random numbers each from an
> SFMT
> generator with cycle length ρ = 2^11213, we have a probability of overlap
> p ≈ 10^-3362.
>

What do you think? I am very concerned about the correctness of this
approach, so I would appreciate any advice on that matter.
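
The (seed, task_number) layout under discussion, as plain bookkeeping (a sketch; base R's set.seed() accepts a single integer, so this only shows how per-task seed vectors would be derived, not an actual generator initialization — 'seed_for_task' is a hypothetical helper):

```r
# Derive one seed vector per task from a single master seed:
seed <- 42L
seed_for_task <- function(task) c(seed, task)

# Distinct tasks get distinct seed vectors, with nothing to store per task:
stopifnot(!identical(seed_for_task(1L), seed_for_task(2L)))
stopifnot(identical(seed_for_task(3L), c(42L, 3L)))
```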

Thanks
Karl

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] c/c++ Random Number Generators Benchmarks using OpenMP

2012-03-02 Thread Karl Forner
Dear R gurus,

I am interested in permutations-based cpu-intensive methods so I had to pay
a little attention to Random Number Generators (RNG).
For my needs, RNGs have to:
   1) be fast. I profiled my algorithms, and for some the bottleneck was
the RNG.
   2) be scalable. Meaning that I want the RNG to remain fast as I add
threads.
   3) offer a long cycle length. Some basic generators have a cycle length
so low that in a few seconds you can finish it, making further computations
useless and redundant
   4) be able to give reproducible results independent of the number of
threads used, i.e. I want my program to give the very same exact results
using one or 10 threads
   ( 4) "be good" of course )

I found an implementation that seems to meet my criterion and made a
preliminary package to test it.
In the meantime Petr Savicky contacted saying he was about to release a
similar package called rngOpenMP.

So I decided to perform some quick benchmarks. The benchmark code is
available as a R package "rngBenchmarks" here:
https://r-forge.r-project.org/scm/viewvc.php/pkg/?root=gwas-bin-tests
but it depends on some unpublished package, like rngOpenMP, and my
preliminary package, yet available from the same URL.

As a benchmark I implemented a Monte-Carlo computation of PI.
I tried to use the exact same computation method, using a template argument
for the RNG, and providing wrappers for the different available RNGs, except
for rngOpenMP, which is not instantiable, so I adapted the code specifically.
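
The benchmark computation in plain R (a sketch of the Monte-Carlo method, not the package's actual template code): estimate PI by the fraction of uniform points falling inside the unit quarter-circle.

```r
# Monte-Carlo estimate of PI from n uniform points in the unit square:
set.seed(1)
n <- 1e6
x <- runif(n); y <- runif(n)
pi_hat <- 4 * mean(x * x + y * y <= 1)  # P(inside quarter-circle) = pi/4
stopifnot(abs(pi_hat - pi) < 0.01)      # converges towards PI as n grows
```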

I included in the benchmark:
 - the C implementation used by the R package Rlecuyer
 - the (GNU) random_r RNG, available on GNU/Linux systems, which is reentrant
 - my RcppRandomSFMT, wrapping a modified version of the SIMD-oriented Fast
Mersenne Twister (SFMT) Random Number Generator provided by
http://www.agner.org/random (randomc)
 - rngOpenMP
I tried to include the rsprng RNG, but could not manage to use it in my
code.


My conclusions:
  - all the implementations work, meaning that the computed values converge
towards PI with the number of iterations
  - all the implementations are scalable.
  - RcppRandomSFMT and random_r are an order of magnitude faster than
rlecuyer and rngOpenMP
  - actually RcppRandomSFMT and random_r have very similar performance.

The problem with random_r is that its cycle length according to my manpage
is ~ 3E10, enabling for instance only 3 million permutations of a vector
of 10,000 elements,
to be compared with

Leaving the RcppRandomSFMT as best candidate. This implementation also
allows multiple seeds, solving my requirement number 4 (reproducible results
independent of the number of threads) if I use the task identifier as second
seed.

Of course I am probably biased, so please tell me if you have some better
ideas of benchmarks, tests of correctness, if you'd like some other
implementations to be included.

People interested in this topic could contact me so that we can
collaboratively propose an implementation suiting all needs.

Thanks,

Karl Forner

Annex:

I ran the benchmarks on a linux Intel(R) Xeon(R) with 2 cpus of 4 cores
each ( CPU  E5520  @ 2.27GHz).

         type threads     n        error    time time_per_chunk
1 lecuyer   1 1e+07 2.105472e-04   1.538 0.00153800
2 lecuyer   1 1e+08 4.441492e-05  15.265 0.00152650
3 lecuyer   1 1e+09 2.026819e-05 153.209 0.00153209
4 lecuyer   2 1e+07 3.182633e-04   0.821 0.00082100
5 lecuyer   2 1e+08 7.375036e-05   7.751 0.00077510
6 lecuyer   2 1e+09 9.290323e-06  76.476 0.00076476
7 lecuyer   4 1e+07 9.630351e-05   0.401 0.00040100
8 lecuyer   4 1e+08 1.263486e-05   3.887 0.00038870
9 lecuyer   4 1e+09 1.151515e-06  38.618 0.00038618
10lecuyer   8 1e+07 1.239703e-05   0.241 0.00024100
11lecuyer   8 1e+08 7.894518e-05   2.133 0.00021330
12lecuyer   8 1e+09 6.782041e-06  20.420 0.00020420
13   random_r   1 1e+07 7.898746e-05   0.137 0.00013700
14   random_r   1 1e+08 4.748343e-05   1.290 0.00012900
15   random_r   1 1e+09 1.685692e-05  12.844 0.00012844
16   random_r   2 1e+07 4.757590e-06   0.095 0.00009500
17   random_r   2 1e+08 7.389450e-05   0.663 0.00006630
18   random_r   2 1e+09 2.913732e-05   6.469 0.00006469
19   random_r   4 1e+07 1.664590e-04   0.037 0.00003700
20   random_r   4 1e+08 1.138106e-04   0.330 0.00003300
21   random_r   4 1e+09 3.734717e-05   3.209 0.00003209
22   random_r   8 1e+07 1.034678e-04   0.051 0.00005100
23   random_r   8 1e+08 4.733472e-05   0.167 0.00001670
24   random_r   8 1e+09 1.985413e-05   1.694 0.00001694
25 rng_openmp   1 1e+07 2.097492e-04   1.231 0.00123100
26 rng_openmp   1 1e+08 7.580436e-05  12.155 0.00121550
27 rng_openmp   1 1e+09 2.772810e-05 120.712 0.00120712
28 rng_openmp   2 1

Re: [Rd] portable parallel seeds project: request for critiques

2012-03-02 Thread Karl Forner
Thanks for your quick reply.

About the rngSetSeed package: is it usable at the C/C++ level?

The same can be said about initializations. Initialization is a random
> number generator, whose output is used as the initial state of some
> other generator. There is no proof that a particular initialization cannot
> be distinguished from truly random numbers in a mathematical sense for
> the same reason as above.
>
> A possible strategy is to use a cryptographically strong hash function
> for the initialization. This means to transform the seed to the initial
> state of the generator using a function, for which we have a good
> guarantee that it produces output, which is computationally hard to
> distinguish from truly random numbers. For this purpose, i suggest
> to use the package rngSetSeed provided currently at
>
>  http://www.cs.cas.cz/~savicky/randomNumbers/
>
> It is based on AES and Fortuna similarly as "randaes", but these
> components are used only for the initialization of Mersenne-Twister.
> When the generator is initialized, then it runs on its usual speed.
>
> In the notation of
>
>  http://www.agner.org/random/ran-instructions.pdf
>
> using rngSetSeed for initialization of Mersenne-Twister is Method 4
> in Section 6.1.
>


Hmm I had not paid attention to the last paragraph:

> The seeding procedure used in the
> present software use*s a separate random number* generator of a different
> design in order to
> avoid any interference. An extra feature is the RandomInitByArray function
> which makes
> it possible to initialize the random number generator with multiple seeds.
> We can make sure
> that the streams have different starting points by using the thread id as
> one of the seeds.
>

So it means that I am already using this solution (in RcppRandomSFMT, see
other post), and that I should be reasonably safe.


>
> I appreciate comments.
>
> Petr Savicky.
>
> P.S. I included some more comments on the relationship of provably good
> random number generators and P ?= NP question to the end of the page
>
>  http://www.cs.cas.cz/~savicky/randomNumbers/


Sorry but it's too involved for me.


>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] weird bug with parallel, RSQlite and tcltk

2012-12-31 Thread Karl Forner
Hello,

I spent a lot of time on a weird bug, and I just managed to narrow it down.

In parallel code (here with parallel::mclappy, but I got it
doMC/multicore too), if the library(tcltk) is loaded, R hangs when
trying to open a DB connection.
I got the same behaviour on two different computers, one dual-core,
and one 2 xeon quad-core.

Here's the code:

library(parallel)
library(RSQLite)
library(tcltk)
#unloadNamespace("tcltk")

res <- mclapply(1:2, function(x) {
db <- DBI::dbConnect("SQLite", ":memory:")
}, mc.cores=2)
print("Done")   

When I execute it (R --vanilla  < test_parallel_db.R), it hangs
forever, and I have to type several times CTRL+C to interrupt it. I
then get this message:

Warning messages:
1: In selectChildren(ac, 1) : error 'Interrupted system call' in select
2: In selectChildren(ac, 1) : error 'Interrupted system call' in select

Then, just remove library(tcltk), or uncomment
unloadNamespace("tcltk"), and it works fine again.

I guess there's a bug somewhere, but where exactly ?

Best,

Karl Forner

Further info:


R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

ubuntu 12.04 and 12.10

ubuntu package tk8.5

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] weird bug with parallel, RSQlite and tcltk

2013-01-03 Thread Karl Forner
Hello,

The point is that I do not use tcltk, it gets loaded probably as a
dependency of a dependency of a package.
When I unload it all work perfectly fine. I just found it because one
of my computer did not have tk8.5 installed, and did not exhibit the
mentioned bug. So I really think something should be done about this.
Maybe the "GUI loop" should not be run at the loading of the tcltk
package, but at the first function call, or something like this.

As you can see in my example code, the in-memory database is opened in
the parallel code...

Best,
Karl

On Mon, Dec 31, 2012 at 10:58 PM, Simon Urbanek
 wrote:
>
> On Dec 31, 2012, at 1:08 PM, Karl Forner wrote:
>
>> Hello,
>>
>> I spent a lot of a time on a weird bug, and I just managed to narrow it down.
>>
>
> First, tcltk and multicore don't mix well, see the warning in the 
> documentation (it mentions GUIs and AFAIR tcltk fires up a GUI event loop 
> even if you don't actually create GUI elements). Second, using any kind of 
> descriptors in parallel code is asking for trouble since those will be owned 
> by multiple processes. If you use databases files, etc. they must be opened 
> in the parallel code, they cannot be shared by multiple workers. The latter 
> is ok in your code so you're probably bitten by the former.
>
> Cheers,
> Simon
>
>
>
>> In parallel code (here with parallel::mclappy, but I got it
>> doMC/multicore too), if the library(tcltk) is loaded, R hangs when
>> trying to open a DB connection.
>> I got the same behaviour on two different computers, one dual-core,
>> and one 2 xeon quad-core.
>>
>> Here's the code:
>>
>> library(parallel)
>> library(RSQLite)
>> library(tcltk)
>> #unloadNamespace("tcltk")
>>
>> res <- mclapply(1:2, function(x) {
>>   db <- DBI::dbConnect("SQLite", ":memory:")
>> }, mc.cores=2)
>> print("Done")
>>
>> When I execute it (R --vanilla  < test_parallel_db.R), it hangs
>> forever, and I have to type several times CTRL+C to interrupt it. I
>> then get this message:
>>
>> Warning messages:
>> 1: In selectChildren(ac, 1) : error 'Interrupted system call' in select
>> 2: In selectChildren(ac, 1) : error 'Interrupted system call' in select
>>
>> Then, just remove library(tcltk), or uncomment
>> unloadNamespace("tcltk"), and it works fine again.
>>
>> I guess there's a bug somewhere, but where exactly ?
>>
>> Best,
>>
>> Karl Forner
>>
>> Further info:
>>
>>
>> R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
>> Copyright (C) 2012 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> ubuntu 12.04 and 12.10
>>
>> ubuntu package tk8.5
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] weird bug with parallel, RSQlite and tcltk

2013-01-07 Thread Karl Forner
Hello and thank you.
Indeed gsubfn is responsible for loading tcltk in my case.

On Thu, Jan 3, 2013 at 12:14 PM, Gabor Grothendieck
 wrote:
> options(gsubfn.engine = "R")

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Problem using raw vectors with inline cfunction

2013-02-01 Thread Karl Forner
Hello,

From what I understood from the documentation I found, when using the
inline cfunction with convention=".C",
R raw vectors should be given as unsigned char* to the C function.

But consider the following script:

library(inline)

testRaw <- cfunction(signature(raw='raw', len='integer')
, body='
int l = *len;
int i = 0;
Rprintf("sizeof(raw[0])=%i\\n", sizeof(raw[0]));
for (i = 0; i < l; ++i) Rprintf("%i, ", (int)raw[i]);
for (i = 0; i < l; ++i) raw[i] = i*10;
'
, convention=".C", language='C', verbose=TRUE
)

tt <- as.raw(1:10)
testRaw(tt, length(tt))


When I execute it:

$ R --vanilla --quiet < work/inline_cfunction_raw_bug.R

sizeof(raw[0])=1
192, 216, 223, 0, 0, 0, 0, 0, 224, 214,
 *** caught segfault ***
address (nil), cause 'unknown'

Traceback:
 1: .Primitive(".C")(, raw =
as.character(raw), len = as.integer(len))
 2: testRaw(tt, length(tt))
aborting ...
Segmentation fault (core dumped)


I was expecting to get in the C function a pointer to a byte array of
values (1,2,3,4,5,6,7,8,9,10).
Apparently that is not the case. I guess that the "raw =
as.character(raw)," printed in the traceback is responsible for the
observed behavior.

If it is expected behavior, how can I get a pointer to my array of bytes?
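
One possible work-around (my suggestion, not from the thread): with cfunction's default ".Call" convention, arguments arrive as SEXPs without any coercion, and RAW() yields the unsigned char* byte pointer directly. A sketch:

```r
library(inline)

# With convention=".Call" (cfunction's default) the argument is a RAWSXP
# and RAW() returns the unsigned char* byte pointer:
testRawCall <- cfunction(signature(raw = 'raw'), body = '
    int l = LENGTH(raw);
    unsigned char *p = RAW(raw);
    for (int i = 0; i < l; ++i) Rprintf("%i, ", (int)p[i]);
    return R_NilValue;
', language = 'C')

tt <- as.raw(1:10)
invisible(testRawCall(tt))  # prints the bytes 1..10
```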


Thanks.

Karl

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] How to avoid using gridextra via Depends instead of Imports in a package ?

2013-03-20 Thread Karl Forner
Hello,

I really need some insight on a problem we encountered using grid,
lattice and gridExtra.

I tried to reduce the problem, so the plot make no sense.

we have a package: gridextrabug

with:

DESCRIPTION
--
Package: gridextrabug
Title: gridextrabug
Version: 0.1
Author: toto
Maintainer: toto 
Description: gridextrabug
Imports:
grid,
gridExtra,
lattice,
latticeExtra,
reshape,
Depends:
R (>= 2.15),
methods
Suggests:
testthat,
devtools
License: GPL (>= 3)
Collate:
'zzz.R'
'plotFDR.R'

R/plotFDR.R

plot_fdr <- function(dt,qvalue_col,pvalue_col, zoom_x=NULL, zoom_y=NULL,
fdrLimit=0,overview_plot=FALSE,...)
{

frm <- as.formula(paste(qvalue_col,"~ rank(",pvalue_col,")"))
plt <- xyplot( frm ,
data=dt,
abline=list(h=fdrLimit,lty="dashed"),
pch=16,cex=1,
type="p",
panel=panelinplot2,
subscripts= TRUE,

)

return(plt)
}

panelinplot2 <- function(x,y,subscripts,cex,type,...){

panel.xyplot(x,y,subscripts=subscripts,
ylim=c(0,1),
type=type,
cex=cex,...)
pltoverview <- xyplot(y~x,xlab=NULL,
ylab=NULL,
type="l",
par.settings=qb_theme_nopadding(),
scales=list(draw=FALSE),
cex=0.6,...)
gr <- grob(p=pltoverview, ..., cl="lattice")


grid.draw(gr) # <---
problematic call
}

NAMESPACE
--
export(panelinplot2)
export(plot_fdr)
importFrom(grid,gpar)
importFrom(grid,grid.draw)
importFrom(grid,grid.rect)
importFrom(grid,grid.text)
importFrom(grid,grob)
importFrom(grid,popViewport)
importFrom(grid,pushViewport)
importFrom(grid,unit)
importFrom(grid,viewport)
importFrom(gridExtra,drawDetails.lattice)
importFrom(lattice,ltext)
importFrom(lattice,panel.segments)
importFrom(lattice,panel.xyplot)
importFrom(lattice,stripplot)
importFrom(lattice,xyplot)
importFrom(latticeExtra,as.layer)
importFrom(latticeExtra,layer)
importFrom(reshape,sort_df)

Then if you execute this script:

without_extra.R
--
library(gridextrabug)
p <- seq(10^-10,1,0.001)
p <- p[sample(1:length(p))]
q <- p.adjust(p, "BH")
df <- data.frame(p,q)


plt <-  plot_fdr(df,qvalue_col= "q", pvalue_col="p",
zoom_x=c(0,20),
fdrLimit=0.6,
overview_plot=TRUE)
X11()
print(plt)

you will not get the second plot corresponding to the call to panelinplot2().


If you execute this one:

with_extra.R
--
library(gridextrabug)
p <- seq(10^-10,1,0.001)
p <- p[sample(1:length(p))]
q <- p.adjust(p, "BH")
df <- data.frame(p,q)


plt <-  plot_fdr(df,qvalue_col= "q", pvalue_col="p",
zoom_x=c(0,20),
fdrLimit=0.6,
overview_plot=TRUE)
X11()

library(gridExtra)
print(plt)

you will have the second plot.


From what I understood, the last line of panelinplot2(), grid.draw(gr),
dispatches to grid:::grid.draw.grob(), which in turn calls
grid:::drawGrob(), which calls grid::drawDetails(), which is an S3
generic.
The gridExtra package defines the method drawDetails.lattice().
When the gridExtra package is loaded in the search() path, the
grid.draw(gr) call dispatches to gridExtra:::drawDetails.lattice().

We would rather avoid messing with the search path, which I believe is
considered best practice, so we tried hard to solve this using Imports.
But I came to realize that the dispatch happens in the grid namespace,
not in our package namespace.

I tested it with the following work-around:
parent.env(parent.env(getNamespace('grid'))) <- getNamespace('gridExtra')

which works.
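As an aside, one alternative work-around (an editor's sketch, not something confirmed in this thread) is to register an equivalent drawDetails method in the package's own namespace: S3 methods registered via S3method() are found by dispatch from the grid namespace once the package namespace is loaded, regardless of the search path. The method body below assumes that gridExtra's drawDetails.lattice simply prints the stored lattice object, which may not exactly match its actual implementation.

```r
## Hypothetical contents of R/zzz.R in the package -- register our own
## S3 method so grid's drawDetails() dispatch finds it without gridExtra
## being attached to the search path.
drawDetails.lattice <- function(x, recording = TRUE) {
    ## draw the lattice plot stored in the grob, without opening a new page
    ## (assumed to mirror what gridExtra's method does)
    print(x$p, newpage = FALSE)
}

## And in NAMESPACE:
## S3method(drawDetails, lattice)
```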

So my questions are:
  * Did we miss something obvious?
  * What is the proper way to handle this situation?


Thanks in advance for your wisdom.

Karl Forner

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] parallel::mclapply does not return try-error objects with mc.preschedule=TRUE

2013-04-11 Thread Karl Forner
Hello,

Consider this:

1)
library(parallel)
res <- mclapply(1:2, stop)
#Warning message:
#In mclapply(1:2, stop) :
# all scheduled cores encountered errors in user code

is(res[[1]], 'try-error')
#[1] FALSE


2)
library(parallel)
res <- mclapply(1:2, stop, mc.preschedule=FALSE)
#Warning message:
#In mclapply(1:2, stop, mc.preschedule = FALSE) :
#  2 function calls resulted in an error

is(res[[1]], 'try-error')
#[1] TRUE
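
A possible interim work-around (an editor's sketch, not part of the original report) is to capture errors inside the worker function itself, so the result does not depend on mc.preschedule:

```r
library(parallel)

## wrap a function so it returns the error condition instead of failing
safely <- function(f) function(...) tryCatch(f(...), error = function(e) e)

res <- mclapply(1:2, safely(stop))
sapply(res, inherits, "error")
#[1] TRUE TRUE
```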

The documentation states that:
'Each forked process runs its job inside try(..., silent = TRUE) so if
errors occur they will be stored as class "try-error" objects in the
return value and a warning will be given.'


Is this a bug?

Thanks
Karl


> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
[8] base

loaded via a namespace (and not attached):
[1] tools_2.15.3

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] parallel::mclapply does not return try-error objects with mc.preschedule=TRUE

2013-04-23 Thread Karl Forner
>
>> Is this a bug ?
>>
>
> Not in parallel.  Something else has changed, and I am about to commit a
> different version that still works as documented.
>
>
Thanks for replying.


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Catch SIGINT from user in backend C++ code

2013-05-06 Thread Karl Forner
Hello,

I once wrote a package called RcppProgress, which you can find here:
https://r-forge.r-project.org/R/?group_id=1230
I have not tried it in a long time, but it was developed to solve this
exact problem.
You can have a look at its companion package, RcppProgressExample.
Here's a link to the original announcement:
http://tolstoy.newcastle.edu.au/R/e17/devel/12/02/0443.html

Hope it helps.
Karl Forner
Quartz Bio

On Thu, May 2, 2013 at 1:50 AM, Jewell, Chris  wrote:
> Hi,
>
> I was wondering if anybody knew how to trap SIGINTs (ie Ctrl-C) in backend 
> C++ code for R extensions?  I'm writing a package that uses the GPU for some 
> hefty matrix operations in a tightly coupled parallel algorithm implemented 
> in CUDA.
>
> The problem is that once running, the C++ module cannot apparently be 
> interrupted by a SIGINT, leaving the user sat waiting even if they realise 
> they've launched the algorithm with incorrect settings.  Occasionally, the 
> SIGINT gets through and the C++ module stops.  However, this leaves the CUDA 
> context hanging, meaning that if the algorithm is launched again R dies.  If 
> I could trap the SIGINT, then I could make sure a) that the algorithm stops 
> immediately, and b) that the CUDA context is destructed nicely.
>
> Is there a "R-standard" method of doing this?
>
> Thanks,
>
> Chris
>
>
> --
> Dr Chris Jewell
> Lecturer in Biostatistics
> Institute of Fundamental Sciences
> Massey University
> Private Bag 11222
> Palmerston North 4442
> New Zealand
> Tel: +64 (0) 6 350 5701 Extn: 3586
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] umlaut in path name (PR#14119)

2009-12-09 Thread karl . schilling
Full_Name: Karl Schilling
Version: 2.10.0 patched
OS: Win XP
Submission from: (NULL) (131.220.251.8)


I am running R 2.10.0 patched under WinXP (German version).
When I use the command file.choose() and try to navigate to a target with an
umlaut (Ä, Ö, Ü) in the path, I get an error message "file not found". Also,
in the path name reproduced in the error message, the umlauts are replaced
by character combinations.

If I try to open files with no umlauts in the path name, everything is OK.

Any suggestions?

Thank you so much for your attention to this.

Karl Schilling

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] match function causing bad performance when using table function on factors with multibyte characters on Windows

2011-01-21 Thread Karl Ove Hufthammer
[I originally posted this on the R-help mailing list, and it was suggested
that R-devel would be a better place to discuss it.]

Running ‘table’ on a factor with levels containing non-ASCII characters
seems to result in extremely bad performance on Windows. Here’s a simple
example with benchmark results (I’ve reduced the number of replications to
make the function finish within reasonable time):

  library(rbenchmark)
  x.num=sample(1:2, 10^5, replace=TRUE)
  x.fac.ascii=factor(x.num, levels=1:2, labels=c("A","B"))
  x.fac.nascii=factor(x.num, levels=1:2, labels=c("Æ","Ø"))
  benchmark( table(x.num), table(x.fac.ascii), table(x.fac.nascii),
table(unclass(x.fac.nascii)), replications=20 )

                              test replications elapsed   relative user.self sys.self user.child sys.child
  4 table(unclass(x.fac.nascii))            20    1.53   4.636364      1.51     0.01         NA        NA
  2           table(x.fac.ascii)            20    0.33       1.00      0.33     0.00         NA        NA
  3          table(x.fac.nascii)            20  146.67 444.454545     38.52    81.74         NA        NA
  1                 table(x.num)            20    1.55   4.696970      1.53     0.01         NA        NA
  
  sessionInfo()
  R version 2.12.1 (2010-12-16)
  Platform: i386-pc-mingw32/i386 (32-bit)
  
  locale:
  [1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252  LC_CTYPE=Norwegian-Nynorsk_Norway.1252    LC_MONETARY=Norwegian-Nynorsk_Norway.1252
  [4] LC_NUMERIC=C                               LC_TIME=Norwegian-Nynorsk_Norway.1252
  
  attached base packages:
  [1] stats graphics  grDevices datasets  utils methods   base
  
  other attached packages:
  [1] rbenchmark_0.3

The timings are from R 2.12.1, but I also get comparable results
on the latest prelease (R 2.13.0 2011-01-18 r54032).

Running the same test (100 replications) on a Linux system with
R.12.1 Patched results in essentially no difference between the
performance on ASCII factors and non-ASCII factors:

                              test replications elapsed relative user.self sys.self user.child sys.child
  4 table(unclass(x.fac.nascii))           100   4.607 3.096102     4.455    0.092          0         0
  2           table(x.fac.ascii)           100   1.488     1.00     1.459    0.028          0         0
  3          table(x.fac.nascii)           100   1.616 1.086022     1.560    0.051          0         0
  1                 table(x.num)           100   4.504 3.026882     4.403    0.079          0         0

  sessionInfo()
  R version 2.12.1 Patched (2011-01-18 r54033)
  Platform: i686-pc-linux-gnu (32-bit)
  
  locale:
   [1] LC_CTYPE=nn_NO.UTF-8       LC_NUMERIC=C               LC_TIME=nn_NO.UTF-8
   [4] LC_COLLATE=nn_NO.UTF-8     LC_MONETARY=C              LC_MESSAGES=nn_NO.UTF-8
   [7] LC_PAPER=nn_NO.UTF-8       LC_NAME=C                  LC_ADDRESS=C
  [10] LC_TELEPHONE=C             LC_MEASUREMENT=nn_NO.UTF-8 LC_IDENTIFICATION=C
  
  attached base packages:
  [1] stats graphics  grDevices utils datasets  methods   base 

  other attached packages:
  [1] rbenchmark_0.3

Profiling the ‘table’ function indicates almost all the time is spent in
the ‘match’ function, which is used when ‘factor’ is called on a ‘factor’
inside ‘table’. Indeed, ‘x.fac.nascii = factor(x.fac.nascii)’ by itself
is extremely slow.

Is there any theoretical reason ‘factor’ on ‘factor’ with non-ASCII
characters must be so slow? And why doesn’t this happen on Linux?

Perhaps a fix for ‘table’ might be calculating the ‘table’ statistics
*including* all levels (not using the ‘factor’ function anywhere),
and then removing the ‘exclude’ levels in the end. For example,
something along these lines:

res = table.modified.to.not.use.factor(...)
ind = lapply(dimnames(res), function(x) !(x %in% exclude))
do.call("[", c(list(res), ind, drop=FALSE))

(I haven’t tested this very much, so there may be issues with this
way of doing things.)

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] table on numeric vector with exclude argument containing value missing from vector causes warning + "NaN" levels incorrectly removed from factors

2011-01-21 Thread Karl Ove Hufthammer
I *think* the following may be considered a bug or two, but would appreciate 
any comments before (not) filing an official bug report.

Possible bug 1: ‘table’ on numeric vector with ‘exclude’ argument containing 
value missing from vector causes warning
Possible bug 2: ‘table’ incorrectly tries to remove "NaN" levels

The help page for ‘table’ says the the first argument is ‘one or more 
objects which can be interpreted as factors (including character strings) 
[…]’. Does this include numeric vectors? Numeric vectors seems to work fine. 
Example:

  x = sample(1:3, 100, replace=TRUE)
  table(x)

The ‘exclude’ argument explicitly mentions factor levels, but seems to work 
fine for other objects too. Example:

  table(x, exclude=2)

It’s actually not clear from the help page what is meant by ‘levels to 
remove from all factors in ...’, but it seems like a character vector is 
expected. And indeed the following also works:

  table(x, exclude="2")

However, setting the ‘exclude’ argument to a value not contained in 
the vector to be tabulated,

  table(x, exclude="foo")

causes the following warning:

  In as.vector(exclude, typeof(x)) : NAs introduced by coercion

The correct result is produced, though. Note that none of the following
causes any warning:

  table(x, exclude=NA)
  table(x, exclude=NaN)
  table(factor(x), exclude="foo")
  table(as.character(x), exclude="foo")

I also wonder about the inclusion of ‘NaN’ in the definition of ‘table’:

table(..., exclude = if (useNA == "no") c(NA, NaN), useNA = c("no", 
"ifany", "always"), dnn = list.names(...), deparse.level = 1) 

A factor can’t include a NaN level, as level values are always
strings or NA. And having the above definition causes "NaN" (string)
levels to mysteriously disappear when run through ‘table’. Example:

  table(factor(c("NA",NA,"NcN","NbN", "NaN")))

Result:

 NA NbN NcN 
  1   1   1 

(The missing NA is not a bug; it’s caused by useNA="no".)



sessionInfo()
R version 2.12.1 Patched (2011-01-20 r54056)
Platform: i686-pc-linux-gnu (32-bit)

locale:
[1] C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base   

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] match function causing bad performance when using tablefunction on factors with multibyte characters on Windows

2011-01-25 Thread Karl Ove Hufthammer
Matthew Dowle wrote:

> I'm not sure, but note the difference in locale between
> Linux (UTF-8) and Windows (non UTF-8). As far as I
> understand it R much prefers UTF-8, which Windows doesn't
> natively support. Otherwise you could just change your
> Windows locale to a UTF-8 locale to make R happier.
> 
[...]
> 
> If anybody knows a way to trick R on Linux into thinking it has
> an encoding similar to Windows then I may be able to take a
> look if I can reproduce the problem in Linux.

Changing the locale to an ISO 8859-1 locale, i.e.:

export LC_ALL="en_US.ISO-8859-1"
export LANG="en_US.ISO-8859-1"

I could *not* reproduce it; that is, ‘table’ is as fast on the non-ASCII 
factor as it is on the ASCII factor.

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] match function causing bad performance when using tablefunction on factors with multibyte characters on Windows

2011-01-26 Thread Karl Ove Hufthammer
Simon Urbanek wrote:

>> I could *not* reproduce it; that is, ‘table’ is as fast on the non-ASCII
>> factor as it is on the ASCII factor.
> 
> Strange - are you sure you get the right locale names? Make sure it's
> listed in locale -a.

Yes, I managed to reproduce it now, using a locale listed in ‘locale -a’.
There is a performance hit, though *much* smaller than on Windows.

> FWIW if you care about speed you should use tabulate() instead - it's much
> faster and incurs no penalty:

Yes, that the solution I ended up using:

res = tabulate(x, nbins=nlevels(x)) # nbins needed for levels that don’t occur
names(res) = levels(x)
res

(Though I’m not sure it’s *guaranteed* that factors are internally stored in
a way that makes this work, i.e., as the integers 1, 2, ... for levels 1, 2, ...)
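
The tabulate() idiom can be wrapped in a small helper (an editor's sketch; as far as I can tell, ?factor does document the codes as the integers 1, ..., nlevels, which is what tabulate() relies on):

```r
fast_table <- function(f) {
    stopifnot(is.factor(f))
    res <- tabulate(f, nbins = nlevels(f))  # nbins keeps levels that don't occur
    names(res) <- levels(f)
    res
}

fast_table(factor(c("Æ", "Ø", "Æ"), levels = c("Æ", "Ø", "Å")))
# Æ Ø Å
# 2 1 0
```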

Anyway, do you think it’s worth trying to change the ‘table’ function the way I
outlined in my first post¹? This should eliminate the performance hit on all
platforms. However, it will introduce a performance hit (CPU and memory use)
if the elements of ‘exclude’ make up a large part of the factor(s).

¹ http://permalink.gmane.org/gmane.comp.lang.r.devel/26576

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] match function causing bad performance when using tablefunction on factors with multibyte characters on Windows

2011-01-26 Thread Karl Ove Hufthammer
Karl Ove Hufthammer wrote:

> Anyway, do you think it’s worth trying to change the ‘table’ function the
> way I outlined in my first post¹? This should eliminate the performance
> hit on all platforms.

Some additional notes: ‘table’ uses ‘factor’ directly, but also indirectly, 
in ‘addNA’. The definition of ‘addNA’ ends with:

if (!any(is.na(ll))) 
ll <- c(ll, NA)
factor(x, levels = ll, exclude = NULL)

Which is slow for non-ASCII levels. One *could* fix this by changing the 
last line to

  attr(x, "levels")=ll

But one soon ends up changing every function that uses ‘factor’ in this way, 
which seems like the wrong approach. The problems lies inside ‘factor’,
and that’s where it should be fixed, if feasible.

BTW, the definition of ‘addNA’ looks suboptimal in a different way. The last
line is always executed, even if the factor *does* contain NA values (and of
course NA levels). In that case it’s basically doing nothing, just taking
a very long time doing it (at least on Windows). Moving the last line inside
the ‘if’ clause and adding an ‘else return(x)’ would fix this (correct me if
I’m wrong).
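
For concreteness, here is a sketch of the suggested change (an editor's reading of the proposal; the name addNA_fast is hypothetical, and this has not been tested against R's actual sources):

```r
addNA_fast <- function(x, ifany = FALSE) {
    if (!is.factor(x)) x <- factor(x)
    if (ifany && !any(is.na(x))) return(x)
    ll <- levels(x)
    if (!any(is.na(ll))) {
        ll <- c(ll, NA)
        x <- factor(x, levels = ll, exclude = NULL)  # slow path, only when needed
    }
    x  # returned unchanged when NA was already among the levels
}
```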

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] How to get R to compile with PNG support

2011-04-18 Thread Karl-Dieter Crisman
Dear R devel list,

Good morning; I'm with the Sage (http://www.sagemath.org) project.
(Some of you might have seen my talk on this at last summer's useR
conference).

We have some rudimentary support for using R graphics in various
cases, which has proved useful to many of our users who want to go
back and forth between R and other capabilities within Sage.
Unfortunately, the way we originally implemented this was using the
png and plot functions in R itself, which perhaps isn't the best
(i.e., everyone uses ggplot now? but I digress).

That means that when people download a binary of ours, or compile
their own, whether R's plot and png functions work depends heavily on
the rather obscure (to users) issue of exactly what headers are
present on the compiling machine.

Unfortunately, it is *very* unclear what actually needs to be present!
There are innumerable places where this has come up for us, but
http://trac.sagemath.org/sage_trac/ticket/8868 and
http://ask.sagemath.org/question/192/compiling-r-with-png-support are
two of the current places where people have compiled information.

The FAQ says, "Unless you do not want to view graphs on-screen you
need ‘X11’ installed, including its headers and client libraries. For
recent Fedora distributions it means (at least) ‘libX11’,
‘libX11-devel’, ‘libXt’ and ‘libXt-devel’. On Debian we recommend the
meta-package ‘xorg-dev’. If you really do not want these you will need
to explicitly configure R without X11, using --with-x=no."

Well, we don't actually need to view graphs on-screen, but we do need
to be able to generate them and save them (as pngs, for instance) to
the correct directory in Sage for viewing.  But we have people who've
tried to do this in Ubuntu, with libpng and xorg-dev installed, and
the file /usr/include/X11/Xwindows.h exists, but all to no avail.
There are almost as many solutions people have found as there are
computers out there, it seems - slight hyperbole, but that's what it
feels like.

We've posted more than once (I think) to the r-help list, but have
gotten no useful feedback.  Is there *anywhere* that the *exact*
requirements R has for having

capabilities("png")
  png
FALSE

come out TRUE are documented?

Then, not only could we be smarter in how we compile R (currently
somewhat naively searching for /usr/include/X11/Xwindows.h to
determine whether we'll try for png support), but we would be able to
tell users something very precise to do (e.g., apt-get foo) if they
currently have R without PNG support in Sage.  Again, I emphasize that
apparently getting xorg-dev doesn't always do the trick.

We do realize that for most people wanting to use just R, it's best to
download a binary, which will behave nicely; Sage's "batteries
included" philosophy means that we are asking for more specialized
info from upstream, and for that I apologize in advance.  I also
apologize if I said something silly above, because I don't actually
know what all these files are - I've just looked into enough support
requests to have a decent idea of what's required. We are trying
not to have to parse the makefile to figure all this out, and possibly
making some mistake there as well.

Thank you SO much for any help with this,
Karl-Dieter Crisman
for the Sage team

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to get R to compile with PNG support

2011-04-20 Thread Karl-Dieter Crisman
> Message: 12
> Date: Wed, 20 Apr 2011 02:09:23 -0700 (PDT)
> From: Sharpie 
> To: r-devel@r-project.org
> Subject: Re: [Rd] How to get R to compile with PNG support
> Message-ID: <1303290563237-3462502.p...@n4.nabble.com>
> Content-Type: text/plain; charset=UTF-8
>
>
> Dear R devel list,
>
> Good morning; I'm with the Sage (http://www.sagemath.org) project.
> (Some of you might have seen my talk on this at last summer's useR
> conference).
>
> Thanks for stopping by, Karl! I have to say that I am a big fan of the Sage
> project---it is a very good idea and I really appreciate all the time you
> guys put into it. I may not be able to answer all of your questions
> concerning PNG support, but hopefully some of the following pointers will be
> useful.

Good morning, Charlie et al.,

Thanks for your words.  We like R, too!  We need to advertise it more,
and this thread is part of making sure that happens in the long run.

To the issue at hand. Our main concern is just not to have to spend
hours reading the configure script and makefile to figure out exactly
where things happen.


>>
>> We have some rudimentary support for using R graphics in various
>> cases, which has proved useful to many of our users who want to go
>> back and forth between R and other capabilities within Sage.
>> Unfortunately, the way we originally implemented this was using the
>> png and plot functions in R itself, which perhaps isn't the best
>> (i.e., everyone uses ggplot now? but I digress).
>>
>
> One important distinction to make is between R graphics functions such as
> plot and ggplot, and R graphics *devices*, such as png. The devices provide
> back ends that take the R-level function calls and actually execute the
> low-level "draw line from a to b, clip to rectangle A, insert left-justified
> text at x,y" primitives that get written to an output format.


True.  It's the device enabling that I'm talking about.  We enable
aqua on Mac, and png on Linux.

We ignore Cairo, and ignore X11 on Mac because it is too touchy (at
least, according to the FAQ on this - different weird instructions for
each type, and of course not everyone has X on Mac).

> Bottom line for Sage is that as long as you implement at least one device
> function, such as png, your users should be able to call plot, ggplot, and
> the rest of R's graphics functions to their heart's content, they just won't
> have a wide selection of output formats.
>

Great.  That is okay with us; we aren't expecting (yet) people to be
able to save R graphics in various output formats.  Our native
(matplotlib) graphics, we do expect this.


>> Then, not only could we be smarter in how we compile R (currently
>> somewhat naively searching for /usr/include/X11/Xwindows.h to
>> determine whether we'll try for png support), but we would be able to
>> tell users something very precise to do (e.g., apt-get foo) if they
>> currently have R without PNG support in Sage.  Again, I emphasize that
>> apparently getting xorg-dev doesn't always do the trick.
>>


> In the trac ticket you linked, the configure output shows PNG is enabled
> (I.E. the library was found) but you may be ending up with no support for an
> actual png() graphics device due to one of the following
>
>  - configure didn't find Xlib as X11 is not listed under Interfaces
>  - configure didn't find cairo as it is not listed under Additional
> capabilities
>
> So, although R has the PNG library that is only useful for writing PNG
> files. R also needs the Xlib or Cairo libraries to provide drawing
> primitives that will create the figures those files will contain.

Gotcha.  I suspect that the X11 not listed under Interfaces is the
problem (again, we ignore Cairo).

What is the *exact* file or directory that the R configure looks for
in trying to list X11 under Interfaces?   And is there any way around
this at all?  That is, is there any way for R to create but not
display a graphic if it has (for instance) png support, like the one
on the Trac ticket did?  We can always just search for the png file
and serve it up in our own viewers.

Note that we already search for /usr/include/X11/Xwindows.h, and
adding xorg-dev didn't help with the latest one (which may not be on
the Trac ticket).


> In the ask.sagemath question the problem appears to be that the user had X11
> installed but not libpng.

Yes, I just referenced that for reference, as it were.

Thank you, and I hope we can get this resolved!

Karl-Dieter

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to get R to compile with PNG support

2011-04-21 Thread Karl-Dieter Crisman
Followup with the specific issue in our most recent (non-posted, as of
yet) attempts on a certain box.  We now have xorg-dev, libcairo-dev,
and Xwindows.h and libpng (as below) on this machine, but R is not
compiling with support for any of these things.

Once again, any help knowing *exactly* what to pass to the
configuration script or anything else would be *greatly* appreciated.
We are planning to use R in Sage on several occasions with this
machine this summer if we can get this going (see
http://www.maa.org/prep/2011/sage.html).



R is now configured for i686-pc-linux-gnu

 Source directory:          .
 Installation directory:    /home/sageserver/sage/local
 C compiler:                gcc -std=gnu99 -I/home/sageserver/sage/local/include -L/home/sageserver/sage/local/lib/
 Fortran 77 compiler:       sage_fortran -g -O2
 C++ compiler:              g++ -g -O2
 Fortran 90/95 compiler:    sage_fortran -g -O2
 Obj-C compiler:
 Interfaces supported:      X11
 External libraries:        readline, BLAS(ATLAS), LAPACK(generic)
 Additional capabilities:   PNG, NLS
 Options enabled:           shared R library, R profiling
 Recommended packages:      yes

However:


> capabilities()
    jpeg      png     tiff    tcltk      X11     aqua http/ftp  sockets 
   FALSE    FALSE    FALSE    FALSE    FALSE    FALSE     TRUE     TRUE 

  libxml     fifo   cledit    iconv      NLS  profmem    cairo 
    TRUE     TRUE     TRUE     TRUE     TRUE    FALSE    FALSE 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to get R to compile with PNG support

2011-04-21 Thread Karl-Dieter Crisman
Thanks for your replies, Dirk and Matt.

On Thu, Apr 21, 2011 at 7:49 AM, Dirk Eddelbuettel  wrote:
>
> On 20 April 2011 at 12:16, Karl-Dieter Crisman wrote:
> | 
> |
> | R is now configured for i686-pc-linux-gnu
> |
> |  Source directory:          .
> |  Installation directory:    /home/sageserver/sage/local
> |  C compiler:                gcc -std=gnu99 -I/home/sageserver/sage/local/include -L/home/sageserver/sage/local/lib/
> |  Fortran 77 compiler:       sage_fortran -g -O2
> |  C++ compiler:              g++ -g -O2
> |  Fortran 90/95 compiler:    sage_fortran -g -O2
> |  Obj-C compiler:
> |  Interfaces supported:      X11
> |  External libraries:        readline, BLAS(ATLAS), LAPACK(generic)
> |  Additional capabilities:   PNG, NLS
> |  Options enabled:           shared R library, R profiling
> |  Recommended packages:      yes
> |
> |
> | However:
> |
> |
> | > capabilities()
> |    jpeg      png     tiff    tcltk      X11     aqua http/ftp  sockets
> |   FALSE    FALSE    FALSE    FALSE    FALSE    FALSE     TRUE     TRUE
> |
> |  libxml     fifo   cledit    iconv      NLS  profmem    cairo
> |    TRUE     TRUE     TRUE     TRUE     TRUE    FALSE    FALSE
>
> Random guess: did you connect via ssh without x11 forwarding?

Almost certainly, yes.  (I am an interlocutor right now for someone
who is actually doing this, my apologies.)
But it's a machine we just ssh into, I'm pretty sure, though it does
serve up web pages.

> I cannot see how configure find png.h and libpng but the binary fails. As all
> other X11 related formats are also shown false, methinks you are without a
> valid DISPLAY.

That is quite likely.  So it sounds like for png() to be set to use
the X11 device, there has to (somewhere) be a visual output -
presumably that is the part LOGICAL(ans)[i++] = X11; in Matt's answer.

> That is actually an issue related to your headless use---which is what Sage
> may default too; see the R FAQ on this and the prior discussion on the
> xvfb-run wrapper which 'simulates' an x11 environment (which you need for
> png).  So maybe you should revisit the Cairo devices---they allow you
> plotting without an x11 device (and also give you SVG).
>

Yeah, and I saw your SO answer on this (after the fact) as well.

In some sense, we are just trying to get graphics on one machine.
Note that we have installed the cairo devel package on this very
machine, but it's not being picked up - maybe it's looking in the
wrong place?  That is one of the reasons this is confusing.

But in a larger sense, because of Sage's "batteries included"
philosophy (which we know not everyone agrees with!), we would like to
have a one-shot way so that *everyone* will see R graphics, not just
people whose binary happens to have been compiled on a machine that
has X and a display.  If that means adding 22.5 MB to our tarball for
Cairo... maybe, maybe not.

I won't copy Matt's message here, but I appreciate the pointers to
exactly where these things are defined very much - without knowing
where to look, it would be a long slog.   Hopefully we'll have some
success!  Thanks for the replies, and for any other ideas.

Karl-Dieter

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to get R to compile with PNG support

2011-04-21 Thread Karl-Dieter Crisman
Thanks for all the feedback.  First, our update, then two responses.

>From Jason Grout:
+++
I finally got it working.  After mucking around in the R configure
file a bit and trying out some of the different tests, as well as
comparing a working system with our broken system, I realized that
`pkg-config --exists pangocairo` was working on the good system and
not working on the broken system.  So I installed libpango1.0-dev, and
now R picks up the cairo package, which in turn means that my
capabilities is now:

> capabilities()
    jpeg      png     tiff    tcltk      X11     aqua http/ftp  sockets 
    TRUE     TRUE    FALSE    FALSE    FALSE    FALSE     TRUE     TRUE 
  libxml     fifo   cledit    iconv      NLS  profmem    cairo 
    TRUE     TRUE     TRUE     TRUE     TRUE    FALSE     TRUE 

So in short, I think what I did was install libcairo-dev and
libpango1.0-dev.  There might have been other stuff in there that was
needed; I'm not sure.  When I build a new system again, I'll try just
installing those packages and see if it is sufficient.  For the
record, I had also installed xorg-dev as well.

+++
My comment: As someone who didn't know what configure scripts were a
couple years ago, this is maddening; I don't see anything about
libpango or whatever in the FAQs.  Luckily, Jason knows a lot more
than I do!

@Dirk:

> | Note that we have installed the cairo devel package on this very
> | machine, but it's not being picked up - maybe it's looking in the
> | wrong place?  That is one of the reasons this is confusing.
>
> You have to understand that even though this problem may seem urgent and
> novel to you and the Sage team,

Novel, yes; urgent only to us, certainly we don't assume it's urgent to you :)

> it is actually about as old as the web and R
> itself.  In a nutshell, we all (in the people reading r-help and r-devel
> sense) have been explaining to folks since the late 1990s that in order to
> run png() to 'just create some charts for a webserver' ... you need an X11
> server because that is where the font metrics come from. Or else no png for

It's true this is findable, but the difference between having X11 on
the system and having the display is arcane for those who just want to
use R.  But I understand your point.

> is life.  Systems such as Sage become so large because having things like
> this around on all deployment systems implies (at least to some degree)
> replicating fundamental OS-level features, because OSs unfortunately
> supply things missing or broken.

Yes, that is true.  We know of many people who download Sage because
it's the easiest way to install Z, where Z is some specific
mathematical program that is impossible to configure properly without
special knowledge.  Or, until fairly recently, to get Cython.

@Simon:

That's new to me that X11 is installed by default now, but it looks
like you are right.  However, we don't rely on this for Mac; we make
sure to configure for quartz when we build - which I assume is
separate from the other stuff?  But updating the FAQ about this would
be really great for future users :)

Also thanks for the hint on all the other (possibly) needed stuff.
Yikes!  AFAIK this is an Ubuntu machine we're talking about.

To all - if we come up with any more reliable way to make this work
universally, i.e. with *exact* instructions for what to download, we
will definitely pass that upstream. Thank you.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Invalid date-times and as.POSIXct problems (remotely related to DST issues)

2012-03-12 Thread Karl Ove Hufthammer
I think this should be handled as a bug, but I’m not sure which
platforms and versions it applies to, so I’m writing to this list. The
problem is that as.POSIXct on character strings behaves in a strange way
if one of the date-times is invalid: it converts all the date-times to
dates (i.e., it discards the time part).

Example, which I suspect only works in my locale, with the UTC+1/UTC+2
timezone:

  $ dates=c("2003-10-13 00:15:00", "2008-06-03 14:45:00", "2003-03-30 02:00:00")

Note that the last date-time doesn’t actually exist 
(due to daylight saving time):
http://www.timeanddate.com/worldclock/meetingtime.html?day=30&month=3&year=2003&p1=187&iv=0

  $ d12=as.POSIXct(dates)
  $ d123=as.POSIXct(dates[1:2])
  $ d12
  [1] "2003-10-13 CEST" "2008-06-03 CEST" "2003-03-30 CET"
  $ d123
  [1] "2003-10-13 00:15:00 CEST" "2008-06-03 14:45:00 CEST"

When I include all values, they are all converted to (POSIXct) *dates*,
but if I exclude the invalid one, the rest are properly converted to
(POSIXct) date-times. Note that this is not just a display issue:

 $ unclass(d12)
 [1] 1065996000 1212444000 1048978800
 attr(,"tzone")
 [1] ""
 $ unclass(d123)
 [1] 1065996900 1212497100
 attr(,"tzone")
 [1] ""

I can only reproduce this on Windows; on Linux all the strings are
converted to date-times (the last one to 2003-03-30 01:00:00 CET).
However, if one specifies a completely invalid time, e.g., 25:00, the
same thing does happen on Linux (2.14.2 Patched). I think the right/best
behaviour would be to convert the invalid date-time string to NA and
convert the other ones proper POSIXct date-times, and perhaps issue a
warning about NAs being generated.
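
Something along the lines of the following sketch would give the behaviour I
am suggesting (safe_as.POSIXct is a hypothetical helper name, not anything in
base R): parse with an explicit full date-time format so that strings which
cannot be converted become NA, with a warning, instead of the whole vector
being silently truncated to dates:

```r
# Hypothetical helper: convert date-time strings, turning unparseable
# ones into NA (with a warning) rather than truncating everything
# to dates.
safe_as.POSIXct <- function(x, format = "%Y-%m-%d %H:%M:%S", tz = "") {
  res <- as.POSIXct(strptime(x, format = format, tz = tz))
  if (anyNA(res) && !all(is.na(res)))
    warning("some date-time strings could not be converted; NAs generated")
  res
}
```

Whether the nonexistent DST time itself yields NA still depends on the
platform's strptime/mktime, so this only addresses the all-or-nothing
truncation, not the underlying C-level behaviour.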

(I originally discovered this problem on data from an Oracle database,
using sqlQuery() from the RODBC package, which automatically converts
date-times to date-times in the current timezone (unless you specify
as.is=TRUE), and was surprised that for some queries the date-times were
truncated to dates. A warning that parts of the data were invalid would
be very welcome.)


Version details (for Windows):

$ version
_
platform   i386-pc-mingw32
arch   i386
os mingw32
system i386, mingw32
status
major  2
minor  14.2
year   2012
month  02
day29
svn rev58522
language   R
version.string R version 2.14.2 (2012-02-29)

$ sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252 
LC_CTYPE=Norwegian-Nynorsk_Norway.1252   
LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C 
LC_TIME=Norwegian-Nynorsk_Norway.1252

attached base packages:
[1] stats graphics  grDevices datasets  utils methods   base

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Invalid date-times and as.POSIXct problems (remotely related to DST issues)

2012-03-14 Thread Karl Ove Hufthammer
Karl Ove Hufthammer wrote:
> I think this should be handled as a bug, but I’m not sure which
> platforms and versions it applies to, so I’m writing to this list.

No response, so I've filed a bug at
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14845
(with some additional info).

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Suggestion: Add links to NEWS and CHANGES on help.start() page

2009-11-13 Thread Karl Ove Hufthammer
On Fri, 13 Nov 2009 09:37:31 +0100 Henrik Bengtsson 
 wrote:
> I'd like to recommend that links to (local) NEWS and CHANGES are added
> to the help.start() overview pages.  help("NEWS")/help("CHANGE LOG")
> and help("CHANGES") could display/refer to them as well.

Are you talking of the NEWS and CHANGES for R itself, or for packages 
too? It would be very useful having a convenience function for this for 
packages too. Perhaps something like

library(news=MASS) (or MASS as a character string)
and
library(changes=spdep)

similar to library(help=MASS)

Or have I overlooked something, and a function for this already exists?

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Suggestion: Add links to NEWS and CHANGES on help.start() page

2009-11-13 Thread Karl Ove Hufthammer
On Fri, 13 Nov 2009 14:31:10 +0100 Romain Francois 
 wrote:
> > Or have I overlooked something, and a function for this already exists?
> 
> ?news

I know about the 'news' function, but that doesn't *show* the NEWS or 
CHANGES file for a package, at least not in any useful format.

The feature I'd prefer doesn't require any fancy parsing, just an 
ordinary listing of the contents of the text files NEWS/CHANGES (in a 
separate window, or perhaps opened in the user's browser).
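
For concreteness, something like this sketch is all I have in mind
(news_file is a made-up name, not an existing function; the list of file
names is just a guess at common conventions):

```r
# Hypothetical convenience function: locate the plain-text NEWS/CHANGES
# file shipped with an installed package, so it can then be displayed
# with file.show() or opened in a browser.
news_file <- function(pkg) {
  for (f in c("NEWS", "NEWS.md", "CHANGES")) {
    path <- system.file(f, package = pkg)
    if (nzchar(path)) return(path)
  }
  NA_character_   # package ships no such file
}
# e.g. file.show(news_file("somepackage"))
```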

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R on Windows crashes when using certain characters in strings in data frames (PR#14125)

2009-12-11 Thread Karl Ove Hufthammer
On Thu, 10 Dec 2009 10:20:09 +0100 (CET) k...@huftis.org 
 wrote:

> The following commands trigger the crash for me:
> 
> n=1e5
> k=10
> x=sample(k,n,replace=TRUE)
> y=sample(k,n,replace=TRUE)
> xy=paste(x,y,sep=" × ")
> z=sample(n)
> d=data.frame(xy,z)

Note: On the R Bug Tracking System Web site, the character causing the 
problem seems to be incorrectly displayed as a '.', though on the 
mailing list the correct character is used. The character should be the 
multiplication symbol, U+00D7, which looks similar to an 'x'. The 
character does exist in both ISO 8859-1 and Windows-1252.

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] PGF Device

2007-01-31 Thread Karl Ove Hufthammer
jtxx000 wrote:

> PGF is a package for LaTeX which works with both ps
> and pdf output without any nasty hacks like pictex.
> Is there any technical reason why there could not be a
> PGF graphic device for R?

Not that I can think of. PGF is certainly powerful enough 
for this.

> If not, I'm going to try to throw one together.

Sounds wonderful. I am sure this will be useful for a lot 
of people.

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ** operator

2008-05-16 Thread Karl Ove Hufthammer
Peter Dalgaard:

> Not really, just transcribed during the lexical analysis phase:
> 
> case '*':
> if (nextchar('*'))
> c='^';
> yytext[0] = c;
> yytext[1] = '\0';
> yylval = install(yytext);
> return c;
> 
> (There's no "->" function either...)

You can also use expression() to see what various expressions are parsed as:

  > expression(2**5)
  expression(2^5)

  > expression(3->x)
  expression(x <- 3)

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] significant digits (PR#9682)

2008-06-04 Thread Karl Ove Hufthammer
Duncan Murdoch:

> The number 0.12345 is not exactly representable, but (I think) it is
> represented by something slightly closer to 0.1235 than to 0.1234.

I like using formatC for checking such things. On my (Linux) system, I get:

$ formatC(.12345,digits=50)
[1] "0.12345417443857259058859199285507202148"

> So it looks as though Windows gets it right.
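
For reference, sprintf gives another quick way to inspect the stored binary
value, using only base R:

```r
# Print the double closest to 0.12345 with enough decimals to see the
# representation error; the 18th decimal is the first nonzero one after
# the literal digits.
sprintf("%.20f", 0.12345)

# The stored value is slightly above 0.12345, so rounding to four
# decimals gives 0.1235:
round(0.12345, 4)
```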

-- 
Karl Ove Hufthammer

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] digits in summary.default

2006-09-15 Thread Karl Ove Hufthammer
Martin Maechler wrote:

> Since I've now seen the code of summary.default in S-plus 6.2,
> I'm not in a good position to propose a code change here ---
> unless Insightful ``donates'' their 3 lines of implementation to
> R  {which I think would be quite fair given the recent flurry of
> things they've recently ported into S-plus 8.x}

It's also possible to be a bit smarter in specific cases. See for example
the LaTeX table functions for regression summaries in the Dmisc package[1],
which use the magnitude of the standard errors to determine the number of
digits shown for the estimates (so that the number of digits varies for
each row/estimate).

[1] Not on CRAN. See http://www.menne-biomed.de/download/download.html

-- 
Karl Ove Hufthammer
E-mail and Jabber: [EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] [patch] Support many columns in model.matrix

2016-02-26 Thread Karl Millar via R-devel
Generating a model matrix with very large numbers of columns overflows
the stack and/or runs very slowly, due to the implementation of
TrimRepeats().

This patch modifies it to use Rf_duplicated() to find the duplicates.
This makes the running time linear in the number of columns and
eliminates the recursive function calls.

Thanks
Index: src/library/stats/src/model.c
===
--- src/library/stats/src/model.c	(revision 70230)
+++ src/library/stats/src/model.c	(working copy)
@@ -1259,11 +1259,12 @@
 
 static int TermZero(SEXP term)
 {
-int i, val;
-val = 1;
-for (i = 0; i < nwords; i++)
-	val = val && (INTEGER(term)[i] == 0);
-return val;
+for (int i = 0; i < nwords; i++) {
+if (INTEGER(term)[i] != 0) {
+return 0;
+}
+}
+return 1;
 }
 
 
@@ -1271,11 +1272,12 @@
 
 static int TermEqual(SEXP term1, SEXP term2)
 {
-int i, val;
-val = 1;
-for (i = 0; i < nwords; i++)
-	val = val && (INTEGER(term1)[i] == INTEGER(term2)[i]);
-return val;
+for (int i = 0; i < nwords; i++) {
+if (INTEGER(term1)[i] != INTEGER(term2)[i]) {
+return 0;
+}
+}
+return 1;
 }
 
 
@@ -1303,18 +1305,37 @@
 
 
 /* TrimRepeats removes duplicates of (bit string) terms 
-   in a model formula by repeated use of ``StripTerm''.
+   in a model formula.
Also drops zero terms. */
 
 static SEXP TrimRepeats(SEXP list)
 {
-if (list == R_NilValue)
-	return R_NilValue;
-/* Highly recursive */
-R_CheckStack();
-if (TermZero(CAR(list)))
-	return TrimRepeats(CDR(list));
-SETCDR(list, TrimRepeats(StripTerm(CAR(list), CDR(list))));
+// Drop zero terms at the start of the list.
+while (list != R_NilValue && TermZero(CAR(list))) {
+	list = CDR(list);
+}
+if (list == R_NilValue || CDR(list) == R_NilValue)
+	return list;
+
+// Find out which terms are duplicates.
+SEXP all_terms = PROTECT(Rf_PairToVectorList(list));
+SEXP duplicate_sexp = PROTECT(Rf_duplicated(all_terms, FALSE));
+int* is_duplicate = LOGICAL(duplicate_sexp);
+int i = 0;
+
+// Remove the zero terms and duplicates from the list.
+for (SEXP current = list; CDR(current) != R_NilValue; i++) {
+	SEXP next = CDR(current);
+
+	if (is_duplicate[i + 1] || TermZero(CAR(next))) {
+	// Remove the node from the list.
+	SETCDR(current, CDR(next));
+	} else {
+	current = next;
+	}
+}
+
+UNPROTECT(2);
 return list;
 }
 
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [patch] Support many columns in model.matrix

2016-02-29 Thread Karl Millar via R-devel
Thanks.

Couldn't you implement model.matrix(..., sparse = TRUE)  with a small
amount of R code similar to MatrixModels::model.Matrix ?

On Mon, Feb 29, 2016 at 10:01 AM, Martin Maechler
 wrote:
>>>>>> Karl Millar via R-devel 
>>>>>> on Fri, 26 Feb 2016 15:58:20 -0800 writes:
>
> > Generating a model matrix with very large numbers of
> > columns overflows the stack and/or runs very slowly, due
> > to the implementation of TrimRepeats().
>
> > This patch modifies it to use Rf_duplicated() to find the
> > duplicates.  This makes the running time linear in the
> > number of columns and eliminates the recursive function
> > calls.
>
> Thank you, Karl.
> I've committed this (very slightly modified) to R-devel,
>
> (also after looking for a an example that runs on a non-huge
>  computer and shows the difference) :
>
> nF <- 11 ; set.seed(1)
> lff <- setNames(replicate(nF, as.factor(rpois(128, 1/4)), simplify=FALSE), 
> letters[1:nF])
> str(dd <- as.data.frame(lff)); prod(sapply(dd, nlevels))
> ## 'data.frame':128 obs. of  11 variables:
> ##  $ a: Factor w/ 3 levels "0","1","2": 1 1 1 2 1 2 2 1 1 1 ...
> ##  $ b: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 2 1 1 1 ...
> ##  $ c: Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 2 1 1 ...
> ##  $ d: Factor w/ 3 levels "0","1","2": 1 1 2 2 1 2 1 1 2 1 ...
> ##  $ e: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 2 1 ...
> ##  $ f: Factor w/ 2 levels "0","1": 2 1 2 1 2 1 1 2 1 2 ...
> ##  $ g: Factor w/ 4 levels "0","1","2","3": 2 1 1 2 1 3 1 1 1 1 ...
> ##  $ h: Factor w/ 4 levels "0","1","2","4": 1 1 1 1 2 1 1 1 1 1 ...
> ##  $ i: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
> ##  $ j: Factor w/ 3 levels "0","1","2": 1 2 3 1 1 1 1 1 1 1 ...
> ##  $ k: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ...
> ##
> ## [1] 139968
>
> system.time(mff <- model.matrix(~ . ^ 11, dd, contrasts = list(a = 
> "contr.helmert")))
> ##  user  system elapsed
> ## 0.255   0.033   0.287  --- *with* the patch on my desktop (16 GB)
> ## 1.489   0.031   1.522  --- for R-patched (i.e. w/o the patch)
>
>> dim(mff)
> [1]128 139968
>> object.size(mff)
> 154791504 bytes
>
> ---
>
> BTW: These example would gain tremendously if I finally got
>  around to provide
>
>model.matrix(, sparse = TRUE)
>
> which would then produce a Matrix-package sparse matrix.
>
> Even for this somewhat small case, a sparse matrix is a factor
> of 13.5 x smaller :
>
>> s1 <- object.size(mff); s2 <- object.size(M <- Matrix::Matrix(mff)); 
>> as.vector( s1/s2 )
> [1] 13.47043
>
> I'm happy to collaborate with you on adding such a (C level)
> interface to sparse matrices for this case.
>
> Martin Maechler

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Undocumented 'use.names' argument to c()

2016-09-20 Thread Karl Millar via R-devel
'c' has an undocumented 'use.names' argument.  I'm not sure if this is
a documentation or implementation bug.

> c(a = 1)
a
1
> c(a = 1, use.names = F)
[1] 1
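
If the argument were ever removed, the documented route to the same result
would be to drop the names after combining (plain base R, just less
efficient on large named vectors):

```r
# Combine first, then strip the names; for this simple case the result
# matches c(..., use.names = FALSE).
x <- c(a = 1, b = 2)
unname(x)
```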

Karl

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Undocumented 'use.names' argument to c()

2016-09-23 Thread Karl Millar via R-devel
I'd expect that a lot of the performance overhead could be eliminated
by simply improving the underlying code.  IMHO, we should ignore it in
deciding the API that we want here.

On Fri, Sep 23, 2016 at 10:54 AM, Henrik Bengtsson
 wrote:
> I'd vote for it to stay.  It could of course suprise someone who'd
> expect c(list(a=1), b=2, use.names = FALSE) to generate list(a=1, b=2,
> use.names=FALSE).   On the upside, is the performance gain from using
> use.names=FALSE.  Below benchmarks show that the combining of the
> names attributes themselves takes ~20-25 times longer than the
> combining of the integers themselves.  Also, at no surprise,
> use.names=FALSE avoids some memory allocations.
>
>> options(digits = 2)
>>
>> a <- b <- c <- d <- 1:1e4
>> names(c) <- c
>> names(d) <- d
>>
>> stats <- microbenchmark::microbenchmark(
> +   c(a, b, use.names=FALSE),
> +   c(c, d, use.names=FALSE),
> +   c(a, d, use.names=FALSE),
> +   c(a, b, use.names=TRUE),
> +   c(a, d, use.names=TRUE),
> +   c(c, d, use.names=TRUE),
> +   unit = "ms"
> + )
>>
>> stats
> Unit: milliseconds
>expr   minlq  mean medianuq   max neval
>  c(a, b, use.names = FALSE) 0.031 0.032 0.049  0.034 0.036 1.474   100
>  c(c, d, use.names = FALSE) 0.031 0.031 0.035  0.034 0.035 0.064   100
>  c(a, d, use.names = FALSE) 0.031 0.031 0.049  0.034 0.035 1.452   100
>   c(a, b, use.names = TRUE) 0.031 0.031 0.055  0.034 0.036 2.094   100
>   c(a, d, use.names = TRUE) 0.510 0.526 0.588  0.549 0.617 1.998   100
>   c(c, d, use.names = TRUE) 0.780 0.815 0.886  0.841 0.944 1.430   100
>
>> profmem::profmem(c(c, d, use.names=FALSE))
> Rprofmem memory profiling of:
> c(c, d, use.names = FALSE)
>
> Memory allocations:
>   bytes  calls
> 1 80040 
> total 80040
>
>> profmem::profmem(c(c, d, use.names=TRUE))
> Rprofmem memory profiling of:
> c(c, d, use.names = TRUE)
>
> Memory allocations:
>bytes  calls
> 1  80040 
> 2 160040 
> total 240080
>
> /Henrik
>
> On Fri, Sep 23, 2016 at 10:25 AM, William Dunlap via R-devel
>  wrote:
>> In Splus c() and unlist() called the same C code, but with a different
>> 'sys_index'  code (the last argument to .Internal) and c() did not consider
>> an argument named 'use.names' special.
>>
>>> c
>> function(..., recursive = F)
>> .Internal(c(..., recursive = recursive), "S_unlist", TRUE, 1)
>>> unlist
>> function(data, recursive = T, use.names = T)
>> .Internal(unlist(data, recursive = recursive, use.names = use.names),
>> "S_unlist", TRUE, 2)
>>> c(A=1,B=2,use.names=FALSE)
>>  A B use.names
>>  1 2 0
>>
>> The C code used sys_index==2 to mean 'the last  argument is the 'use.names'
>> argument, if sys_index==1 only the recursive argument was considered
>> special.
>>
>> Sys.funs.c:
>>  405 S_unlist(vector *ent, vector *arglist, s_evaluator *S_evaluator)
>>  406 {
>>  407 int which = sys_index; boolean named, recursive, names;
>>  ...
>>  419 args = arglist->value.tree; n = arglist->length;
>>  ...
>>  424 names = which==2 ? logical_value(args[--n], ent, S_evaluator)
>> : (which == 1);
>>
>> Thus there is no historical reason for giving c() the use.names argument.
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Fri, Sep 23, 2016 at 9:37 AM, Suharto Anggono Suharto Anggono via
>> R-devel  wrote:
>>
>>> In S-PLUS 3.4 help on 'c' (http://www.uni-muenster.de/
>>> ZIV.BennoSueselbeck/s-html/helpfiles/c.html), there is no 'use.names'
>>> argument.
>>>
>>> Because 'c' is a generic function, I don't think that changing formal
>>> arguments is good.
>>>
>>> In R devel r71344, 'use.names' is not an argument of functions 'c.Date',
>>> 'c.POSIXct' and 'c.difftime'.
>>>
>>> Could 'use.names' be documented to be accepted by the default method of
>>> 'c', but not listed as a formal argument of 'c'? Or, could the code that
>>> handles the argument name 'use.names' be removed?
>>> 
>>> >>>>> David Winsemius 
>>> >>>>> on Tue, 20 Sep 2016 23:46:48 -0700 writes:
>>>
>>> >> On Sep 20, 2016, at 7:18 PM, Karl Millar via

[Rd] Is importMethodsFrom actually needed?

2016-11-02 Thread Karl Millar via R-devel
IIUC, loading a namespace automatically registers all the exported
methods as long as the generic can be found when the namespace gets
loaded.  Generics can be exported and imported as regular functions.

In that case, code in a package should be able to simply import the
generic and the methods will automatically work correctly without any
need for importMethodsFrom.

Is there something that I'm missing here?  What breaks if you don't
explicitly import methods?
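
For concreteness, the two styles being compared look like this in a
NAMESPACE file (pkgA and show are placeholders, not a specific package):

```r
# Style 1: import only the generic as a regular function; rely on the
# methods being registered when pkgA's namespace is loaded.
importFrom(methods, show)

# Style 2: additionally import the S4 methods explicitly.
importMethodsFrom(pkgA, show)
```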

Thanks,

Karl

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Upgrading a package to which other packages are LinkingTo

2016-12-16 Thread Karl Millar via R-devel
A couple of points:
  - rebuilding dependent packages is needed if there is an ABI change,
not just an API change.  For packages like Rcpp which export inline
functions or macros that might have changed, this is potentially any
change to existing functions, but for packages like Matrix, it isn't
really an issue at all IIUC.

  - If we're looking into a way to check if package APIs are
compatible, then that's something that's relevant for all packages,
since they all export an R API.  I believe that CRAN only tests
package compatibility with the most recent versions of packages on
CRAN that import or depend on it.  There's no guarantee that a package
update won't contain API or behaviour changes that breaks older
versions of packages, packages not on CRAN or any scripts that use the
package, and these sorts of breakages do happen semi-regularly.

 - AFAICT, the only difference with packages like Rcpp is that you can
potentially have all of your CRAN packages at the latest version, but
some of them might have inlined code from an older version of Rcpp
even after running update.packages().  While that is an issue, in my
experience that's been a lot less trouble than the general case of
backwards compatibility.

Karl

On Fri, Dec 16, 2016 at 8:19 AM, Dirk Eddelbuettel  wrote:
>
> On 16 December 2016 at 11:00, Duncan Murdoch wrote:
> | On 16/12/2016 10:40 AM, Dirk Eddelbuettel wrote:
> | > On 16 December 2016 at 10:14, Duncan Murdoch wrote:
> | > | On 16/12/2016 8:37 AM, Dirk Eddelbuettel wrote:
> | > | >
> | > | > On 16 December 2016 at 08:20, Duncan Murdoch wrote:
> | > | > | Perhaps the solution is to recommend that packages which export 
> their
> | > | > | C-level entry points either guarantee them not to change or offer
> | > | > | (require?) version checks by user code.  So dplyr should start out 
> by
> | > | > | saying "I'm using Rcpp interface 0.12.8".  If Rcpp has a new version
> | > | > | with a compatible interface, it replies "that's fine".  If Rcpp has
> | > | > | changed its interface, it says "Sorry, I don't support that any 
> more."
> | > | >
> | > | > We try. But it's hard, and I'd argue, likely impossible.
> | > | >
> | > | > For example I even added a "frozen" package [1] in the sources / unit 
> tests
> | > | > to test for just this. In practice you just cannot hit every possible 
> access
> | > | > point of the (rich, in our case) API so the tests pass too often.
> | > | >
> | > | > Which is why we relentlessly test against reverse-depends to _at 
> least ensure
> | > | > buildability_ from our releases.
> | >
> | > I meant to also add:  "... against a large corpus of other packages."
> | > The intent is to empirically answer this.
> | >
> | > | > As for seamless binary upgrade, I don't think in can work in 
> practice.  Ask
> | > | > Uwe one day we he rebuilds everything every time on Windows. And for 
> what it
> | > | > is worth, we essentially do the same in Debian.
> | > | >
> | > | > Sometimes you just need to rebuild.  That may be the price of 
> admission for
> | > | > using the convenience of rich C++ interfaces.
> | > | >
> | > |
> | > | Okay, so would you say that Kirill's suggestion is not overkill?  Every
> | > | time package B uses LinkingTo: A, R should assume it needs to rebuild B
> | > | when A is updated?
> | >
> | > Based on my experience it is a "halting problem" -- i.e. cannot know ex ante.
> | >
> | > So "every time" would be overkill to me.  Sometimes you know you must
> | > recompile (but try to be very prudent with public-facing API).  Many times
> | > you do not. It is hard to pin down.
> | >
> | > At work we have a bunch of servers with Rcpp and many packages against 
> them
> | > (installed system-wide for all users). We _very really_ needs rebuild.
>
> Edit:  "We _very rarely_ need rebuilds" is what was meant there.
>
> | So that comes back to my suggestion:  you should provide a way for a
> | dependent package to ask if your API has changed.  If you say it hasn't,
> | the package is fine.  If you say it has, the package should abort,
> | telling the user they need to reinstall it.  (Because it's a hard
> | question to answer, you might get it wrong and say it's fine when it's
> | not.  But that's easy to fix:  just make a new release that does require
>
> Sure.
>
> We have always increased the higher-order version number when that is needed.
>
> One problem with your proposal is that the testing code may run after the
> package load, and in the case where it matters ... that very code may not get
> reached because the package didn't load.
>
> Dirk
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Request: Increasing MAX_NUM_DLLS in Rdynload.c

2016-12-20 Thread Karl Millar via R-devel
It's not always clear when it's safe to remove the DLL.

The main problem that I'm aware of is that native objects with
finalizers might still exist (created by R_RegisterCFinalizer etc).
Even if there are no live references to such objects (which would be
hard to verify), it still wouldn't be safe to unload the DLL until a
full garbage collection has been done.

If the DLL is unloaded, then the function pointer that was registered
now becomes a pointer into the memory where the DLL was, leading to an
almost certain crash when such objects get garbage collected.

A better approach would be to just remove the limit on the number of
DLLs, dynamically expanding the array if/when needed.


On Tue, Dec 20, 2016 at 3:40 AM, Jeroen Ooms  wrote:
> On Tue, Dec 20, 2016 at 7:04 AM, Henrik Bengtsson
>  wrote:
>> One reason for hitting the MAX_NUM_DLLS (= 100) limit is that some
>> packages don't unload their DLLs when they are being unloaded themselves.
>
> I am surprised by this. Why does R not do this automatically? What is
> the case for keeping the DLL loaded after the package has been
> unloaded? What happens if you reload another version of the same
> package from a different library after unloading?
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Request: Increasing MAX_NUM_DLLS in Rdynload.c

2016-12-21 Thread Karl Millar via R-devel
It does, but you'd still be relying on the R code ensuring that all of
these objects are dead prior to unloading the DLL, otherwise they'll
survive the GC.  Maybe if the package counted how many such objects
exist, it could work out when it's safe to remove the DLL.  I'm not
sure that it can be done automatically.

What could be done is to to keep the DLL loaded, but remove it from
R's table of loaded DLLs.  That way, there's no risk of dangling
function pointers and a new DLL of the same name could be loaded.  You
could still run into issues though as some DLLs assume that the
associated namespace exists.

Currently what I do is to never unload DLLs.  If I need to replace
one, then I just restart R.  It's less convenient, but it's always
correct.


On Wed, Dec 21, 2016 at 9:10 AM, Henrik Bengtsson
 wrote:
> On Tue, Dec 20, 2016 at 7:39 AM, Karl Millar  wrote:
>> It's not always clear when it's safe to remove the DLL.
>>
>> The main problem that I'm aware of is that native objects with
>> finalizers might still exist (created by R_RegisterCFinalizer etc).
>> Even if there are no live references to such objects (which would be
>> hard to verify), it still wouldn't be safe to unload the DLL until a
>> full garbage collection has been done.
>>
>> If the DLL is unloaded, then the function pointer that was registered
>> now becomes a pointer into the memory where the DLL was, leading to an
>> almost certain crash when such objects get garbage collected.
>
> Very good point.
>
> Does base::gc() perform such a *full* garbage collection and thereby
> trigger all remaining finalizers to be called?  In other words, do you
> think an explicit call to base::gc() prior to cleaning out left-over
> DLLs (e.g. R.utils::gcDLLs()) would be sufficient?
>
> /Henrik
>
>>
>> A better approach would be to just remove the limit on the number of
>> DLLs, dynamically expanding the array if/when needed.
>>
>>
>> On Tue, Dec 20, 2016 at 3:40 AM, Jeroen Ooms  
>> wrote:
>>> On Tue, Dec 20, 2016 at 7:04 AM, Henrik Bengtsson
>>>  wrote:
>>>> One reason for hitting the MAX_NUM_DLLS (= 100) limit is that some
>>>> packages don't unload their DLLs when they are being unloaded themselves.
>>>
>>> I am surprised by this. Why does R not do this automatically? What is
>>> the case for keeping the DLL loaded after the package has been
>>> unloaded? What happens if you reload another version of the same
>>> package from a different library after unloading?
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] unlicense

2017-01-17 Thread Karl Millar via R-devel
Please don't use 'Unlimited' or 'Unlimited + ...'.

Google's lawyers don't recognize 'Unlimited' as being open-source, so
our policy doesn't allow us to use such packages due to lack of an
acceptable license.  To our lawyers, 'Unlimited + file LICENSE' means
something very different than it presumably means to Uwe.

Thanks,

Karl

On Sat, Jan 14, 2017 at 12:10 AM, Uwe Ligges
 wrote:
> Dear all,
>
> from "Writing R Extensions":
>
> The string ‘Unlimited’, meaning that there are no restrictions on
> distribution or use other than those imposed by relevant laws (including
> copyright laws).
>
> If a package license restricts a base license (where permitted, e.g., using
> GPL-3 or AGPL-3 with an attribution clause), the additional terms should be
> placed in file LICENSE (or LICENCE), and the string ‘+ file LICENSE’ (or ‘+
> file LICENCE’, respectively) should be appended to the
> corresponding individual license specification.
> ...
> Please note in particular that “Public domain” is not a valid license, since
> it is not recognized in some jurisdictions."
>
> So perhaps you aim for
> License: Unlimited
>
> Best,
> Uwe Ligges
>
>
>
>
>
> On 14.01.2017 07:53, Deepayan Sarkar wrote:
>>
>> On Sat, Jan 14, 2017 at 5:49 AM, Duncan Murdoch
>>  wrote:
>>>
>>> On 13/01/2017 3:21 PM, Charles Geyer wrote:
>>>>
>>>>
>>>> I would like the unlicense (http://unlicense.org/) added to R
>>>> licenses.  Does anyone else think that worthwhile?
>>>>
>>>
>>> That's a question for you to answer, not to ask.  Who besides you thinks
>>> that it's a good license for open source software?
>>>
>>> If it is recognized by the OSF or FSF or some other authority as a FOSS
>>> license, then CRAN would probably also recognize it.  If not, then CRAN
>>> doesn't have the resources to evaluate it and so is unlikely to recognize
>>> it.
>>
>>
>> Unlicense is listed in https://spdx.org/licenses/
>>
>> Debian does include software "licensed" like this, and seems to think
>> this is one way (not the only one) of declaring something to be
>> "public domain".  The first two examples I found:
>>
>> https://tracker.debian.org/media/packages/r/rasqal/copyright-0.9.29-1
>>
>> https://tracker.debian.org/media/packages/w/wiredtiger/copyright-2.6.1%2Bds-1
>>
>> This follows the format explained in
>>
>> https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/#license-specification,
>> which does not explicitly include Unlicense, but does include CC0,
>> which AFAICT is meant to formally license something so that it is
>> equivalent to being in the public domain. R does include CC0 as a
>> shorthand (e.g., geoknife).
>>
>> https://www.debian.org/legal/licenses/ says that
>>
>> 
>>
>> Licenses currently found in Debian main include:
>>
>> - ...
>> - ...
>> - public domain (not a license, strictly speaking)
>>
>> 
>>
>> The equivalent for CRAN would probably be something like "License:
>> public-domain + file LICENSE".
>>
>> -Deepayan
>>
>>> Duncan Murdoch
>>>
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] unlicense

2017-01-17 Thread Karl Millar via R-devel
Unfortunately, our lawyers say that they can't give legal advice in
this context.

My question would be, what are people looking for that the MIT or
2-clause BSD license don't provide?  They're short, clear, widely
accepted and very permissive.  Another possibility might be to
dual-license packages with both an OSI-approved license and
whatever-else-you-like, e.g.  'MIT | ', but IIUC
there's a bunch more complexity there than just using an OSI-approved
license.

Karl


On Tue, Jan 17, 2017 at 3:35 PM, Uwe Ligges
 wrote:
>
>
> On 18.01.2017 00:13, Karl Millar wrote:
>>
>> Please don't use 'Unlimited' or 'Unlimited + ...'.
>>
>> Google's lawyers don't recognize 'Unlimited' as being open-source, so
>> our policy doesn't allow us to use such packages due to lack of an
>> acceptable license.  To our lawyers, 'Unlimited + file LICENSE' means
>> something very different than it presumably means to Uwe.
>
>
>
> Karl,
>
> thanks for this comment. What we would like to hear now is a suggestion for
> what the maintainer is supposed to do to get what he aims at, as we already
> know that "freeware" does not work at all, and it was hard enough to get to
> the "Unlimited" option.
>
> We have many CRAN requests asking what they should write instead of "freeware".
> Can we get an opinion from your lawyers on which standard license comes closest
> to what these maintainers probably aim at and will work more or less
> globally, i.e. not only in the US?
>
> Best,
> Uwe
>
>
>
>
>> Thanks,
>>
>> Karl
>>
>> On Sat, Jan 14, 2017 at 12:10 AM, Uwe Ligges
>>  wrote:
>>>
>>> Dear all,
>>>
>>> from "Writing R Extensions":
>>>
>>> The string ‘Unlimited’, meaning that there are no restrictions on
>>> distribution or use other than those imposed by relevant laws (including
>>> copyright laws).
>>>
>>> If a package license restricts a base license (where permitted, e.g.,
>>> using
>>> GPL-3 or AGPL-3 with an attribution clause), the additional terms should
>>> be
>>> placed in file LICENSE (or LICENCE), and the string ‘+ file LICENSE’ (or
>>> ‘+
>>> file LICENCE’, respectively) should be appended to the
>>> corresponding individual license specification.
>>> ...
>>> Please note in particular that “Public domain” is not a valid license,
>>> since
>>> it is not recognized in some jurisdictions."
>>>
>>> So perhaps you aim for
>>> License: Unlimited
>>>
>>> Best,
>>> Uwe Ligges
>>>
>>>
>>>
>>>
>>>
>>> On 14.01.2017 07:53, Deepayan Sarkar wrote:
>>>>
>>>>
>>>> On Sat, Jan 14, 2017 at 5:49 AM, Duncan Murdoch
>>>>  wrote:
>>>>>
>>>>>
>>>>> On 13/01/2017 3:21 PM, Charles Geyer wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> I would like the unlicense (http://unlicense.org/) added to R
>>>>>> licenses.  Does anyone else think that worthwhile?
>>>>>>
>>>>>
>>>>> That's a question for you to answer, not to ask.  Who besides you
>>>>> thinks
>>>>> that it's a good license for open source software?
>>>>>
>>>>> If it is recognized by the OSF or FSF or some other authority as a FOSS
>>>>> license, then CRAN would probably also recognize it.  If not, then CRAN
>>>>> doesn't have the resources to evaluate it and so is unlikely to
>>>>> recognize
>>>>> it.
>>>>
>>>>
>>>>
>>>> Unlicense is listed in https://spdx.org/licenses/
>>>>
>>>> Debian does include software "licensed" like this, and seems to think
>>>> this is one way (not the only one) of declaring something to be
>>>> "public domain".  The first two examples I found:
>>>>
>>>> https://tracker.debian.org/media/packages/r/rasqal/copyright-0.9.29-1
>>>>
>>>>
>>>> https://tracker.debian.org/media/packages/w/wiredtiger/copyright-2.6.1%2Bds-1
>>>>
>>>> This follows the format explained in
>>>>
>>>>
>>>> https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/#license-specification,
>>>> which does not explicitly include Unlicense, but does include CC0,
>>>> which AFAICT is meant to formally license something so that it is
>>>> equivalent to being in the public domain. R does include CC0 as a
>>>> shorthand (e.g., geoknife).
>>>>
>>>> https://www.debian.org/legal/licenses/ says that
>>>>
>>>> 
>>>>
>>>> Licenses currently found in Debian main include:
>>>>
>>>> - ...
>>>> - ...
>>>> - public domain (not a license, strictly speaking)
>>>>
>>>> 
>>>>
>>>> The equivalent for CRAN would probably be something like "License:
>>>> public-domain + file LICENSE".
>>>>
>>>> -Deepayan
>>>>
>>>>> Duncan Murdoch
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>


Re: [Rd] Control statements with condition with greater than one should give error (not just warning) [PATCH]

2017-03-07 Thread Karl Millar via R-devel
Is there anything that actually requires R core members to manually do
significant amounts of work here?

IIUC, you can do a CRAN run to detect the broken packages, and a simple
script can collect the emails of the affected maintainers, so you can send
a single email to them all.  If authors don't respond by fixing their
packages, then those packages should be archived, since there's high
probability of those packages being buggy anyway.

If you expect a non-trivial amount of questions regarding this change from
the affected package maintainers, then you can create a FAQ page for it,
which you can fill in as questions arrive, so you don't get too many
duplicated questions.

Karl
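
For context, the behaviour being debated -- if(x) where the condition has
length greater than one -- looked like this at the time (a sketch; the output
shown in comments reflects the pre-change warning behaviour this thread
describes):

```r
x <- c(TRUE, FALSE)
if (x) "only x[1] is used"
## [1] "only x[1] is used"
## Warning message:
## In if (x) "only x[1] is used" :
##   the condition has length > 1 and only the first element will be used
```

The patch under discussion turns exactly this warning into an error.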

On Mon, Mar 6, 2017 at 4:51 AM, Martin Maechler 
wrote:

> >>>>> Michael Lawrence 
> >>>>> on Sat, 4 Mar 2017 12:20:45 -0800 writes:
>
> > Is there really a need for these complications? Packages
> > emitting this warning are broken by definition and should be fixed.
>
> I agree and probably Henrik, too.
>
> (Others may disagree to some extent .. and find it convenient
>  that R does translate 'if(x)'  to  'if(x[1])'  for them albeit
>  with a warning .. )
>
> > Perhaps we could "flip the switch" in a test
> > environment and see how much havoc is wreaked and whether
> > authors are sufficiently responsive?
>
> > Michael
>
> As we have > 10'000 packages on CRAN alone, and people have
> started (mis)using suppressWarnings(.) in many places,  there
> may be considerably more packages affected than we optimistically assume...
>
> As R core member who would  "flip the switch"  I'd typically then
> have to be the one sending an e-mail to all package maintainers
> affected and in this case I'm very reluctant to volunteer
> for that and so, I'd prefer the environment variable where R
> core and others can decide how to use it .. for a while .. until
> the flip is switched for all.
>
> or have I overlooked an issue?
>
> Martin
>
> > On Sat, Mar 4, 2017 at 12:04 PM, Martin Maechler
> >  wrote:
>
> >> >>>>> Henrik Bengtsson 
> >> >>>>> on Fri, 3 Mar 2017 10:10:53 -0800 writes:
> >>
> >> > On Fri, Mar 3, 2017 at 9:55 AM, Hadley Wickham
> >> >  wrote:
> >> >>> But, how you propose a warning-to-error transition
> >> >>> should be made without wreaking havoc?  Just flip the
> >> >>> switch in R-devel and see CRAN and Bioconductor packages
> >> >>> break overnight?  Particularly Bioconductor devel might
> >> >>> become non-functional (since at times it requires R-devel).
> >> >>> For my own code / packages, I would be able to handle
> >> >>> such a change, but I'm completely out of control if
> >> >>> one of the packages I'm depending on does not provide
> >> >>> a quick fix (with the only option to remove package
> >> >>> tests for those dependencies).
> >> >>
> >> >> Generally, a package can not be on CRAN if it has any
> >> >> warnings, so I don't think this change would have any
> >> >> impact on CRAN packages.  Isn't this also true for
> >> >> bioconductor?
> >>
> >> > Having a tests/warn.R file with:
> >>
> >> > warning("boom")
> >>
> >> > passes through R CMD check --as-cran unnoticed.
> >>
> >> Yes, indeed.. you are right Henrik that many/most R
> >> warning()s would not produce R CMD check 'WARNING's ..
> >>
> >> I think Hadley and I fell into the same mental pit of
> >> concluding that such warning()s from
> >> if() ...  would not currently happen
> >> in CRAN / Bioc packages and hence turning them to errors
> >> would not have a direct effect.
> >>
> >> With your 2nd e-mail of saying that you'd propose such an
> >> option only for a few releases of R you've indeed
> >> clarified your intent to me.  OTOH, I would prefer using
> >> an environment variable (as you've proposed as an
> >> alternative) which is turned "active" at the beginning
> >> only manually or for the "CRAN incoming" checks of the
> >> CRAN team (and bioconductor submission checks?)  and
> >> later for '--as-cran' etc until it eventually becomes the
> >> unconditional behavior of R (and the env.variable is no
> >> longer used).
> >>
> >> Martin
> >>
> >>
>
> >   [[alternative HTML version deleted]]
>
>




Re: [Rd] segfault when trying to allocate a large vector

2014-12-18 Thread Karl Millar via R-devel
Hi Pierrick,

You're storing largevec on the stack, which is probably causing a stack
overflow.  Allocate largvec on the heap with malloc or one of the R memory
allocation routines instead and it should work fine.

Karl
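
As a standalone sketch of the fix Karl describes (plain C, outside R, so
malloc stands in for R's allocators; alloc_largevec is a name made up for
this example):

```c
#include <stdlib.h>
#include <string.h>

/* Heap-allocate and zero-fill a vector of n doubles, returning NULL on
 * failure.  An automatic array of 10^7 doubles (~80 MB) overflows a
 * typical 8 MB thread stack, which is what the original code ran into;
 * heap allocations are limited only by available memory. */
double *alloc_largevec(size_t n)
{
    double *v = malloc(n * sizeof *v);
    if (v != NULL)
        memset(v, 0, n * sizeof *v);
    return v;
}
```

Inside a .Call entry point, R_alloc(size, sizeof(double)) is usually the more
convenient choice, since that memory is reclaimed automatically when the call
returns, so no explicit free is needed.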

On Thu, Dec 18, 2014 at 12:00 AM, Pierrick Bruneau 
wrote:
>
> Dear R contributors,
>
> I'm running into trouble when trying to allocate some large (but in
> theory viable) vector in the context of C code bound to R through
> .Call(). Here is some sample code summarizing the problem:
>
> SEXP test() {
>
> int size = 10000000;
> double largevec[size];
> memset(largevec, 0, size*sizeof(double));
> return(R_NilValue);
>
> }
>
> If size is small enough (up to 10^6), everything is fine. When it
> reaches 10^7 as above, I get a segfault. As far as I know, a double
> value is represented with 8 bytes, which would make largevec above
> approx. 80 MB -> this is certainly large for a single variable, but
> should remain well below the limits of my machine... Also, doing a
> calloc for the same vector size leads to the same outcome.
>
> In my package, I would use large vectors that cannot be assumed to be
> sparse - so utilities for sparse matrices may not be considered.
>
> I run R on ubuntu 64-bit, with 8G RAM, and a 64-bit R build (3.1.2).
> As my problem looks close to that seen in
> http://r.789695.n4.nabble.com/allocMatrix-limits-td864864.html,
> following what I have seen in ?"Memory-limits" I checked that ulimit
> -v returns "unlimited".
>
> I guess I must miss something, like contiguity issues, or other. Does
> anyone have a clue for me?
>
> Thanks in advance,
> Pierrick
>
>




Re: [Rd] [PATCH] Makefile: add support for git svn clones

2015-01-19 Thread Karl Millar via R-devel
Fellipe,

CXXR development has moved to github, and we haven't fixed up the build for
using git yet.  Could you send a pull request with your change to the repo
at https://github.com/cxxr-devel/cxxr/?

Also, this patch may be useful for pqR too.
https://github.com/radfordneal/pqR

Thanks

On Mon, Jan 19, 2015 at 2:35 PM, Dirk Eddelbuettel  wrote:

>
> On 19 January 2015 at 17:11, Duncan Murdoch wrote:
> | The people who would have to maintain the patch can't test it.
>
> I don't understand this.
>
> The patch, as we may want to recall, was all of
>
>+GIT := $(shell if [ -d "$(top_builddir)/.git" ]; then \
>+echo "git"; fi)
>+
>
> and
>
>-  (cd $(srcdir); LC_ALL=C TZ=GMT svn info || $(ECHO) "Revision: -99") 2> /dev/null \
>+  (cd $(srcdir); LC_ALL=C TZ=GMT $(GIT) svn info || $(ECHO) "Revision: -99") 2> /dev/null \
>
> I believe you can test that builds works before applying the patch, and
> afterwards---even when you do not have git, or in this case a git checkout.
> The idiom of expanding a variable to "nothing" if not set is used all over
> the R sources and can be assumed common.  And if (hypothetically speaking)
> the build failed when a .git directory was present?  None of R Core's
> concern
> either as git was never supported.
>
> I really do not understand the excitement over this.  The patch is short,
> clean, simple, and removes an entirely unnecessary element of friction.
>
> Dirk
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
>



