Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-08 Thread Hervé Pagès

Hi Paul,

On 11-12-07 10:29 AM, Roebuck,Paul L wrote:
> Do this first and try again.
>
> R>  Sys.setlocale("LC_COLLATE", "C")

OK I see it now (in ?Sys.setlocale):

  Sys.setlocale("LC_COLLATE", "C")   # turn off locale-specific sorting,
                                     #  usually

Thanks all for the answers!

I never really realized how counter-intuitive some collating sequences
can be. For example, the fact that LC_COLLATE=en_CA.UTF-8 doesn't
preserve the order of the strings when a common suffix is added to
them is scary. Also, LC_COLLATE=en_CA.UTF-8 cannot simply ignore the
'_' (underscores) and the '.' (dots): that can only be the first pass,
after which it needs to break ties in a way that defines a total
order. So it looks like the exact definition of this collating
sequence is counter-intuitive and complicated.
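
To make the contrast concrete, here is a minimal sketch (not part of the
original exchange) that reruns the example from this thread under the C
collation. In the C locale strings are compared byte by byte, so appending a
common suffix cannot change their relative order. The variable names rx and
rxa are just for illustration.

```r
# Switch to the C collation for the duration of the demo, then restore.
old <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")

x  <- c("_1_", "1_9", "2_9")
xa <- paste(x, "a", sep = "")

rx  <- rank(x)    # digits sort before '_' in ASCII, so "_1_" ranks last
rxa <- rank(xa)   # same ranks: the common suffix does not matter here

Sys.setlocale("LC_COLLATE", old)

# Under the C collation the two rankings agree.
identical(rx, rxa)
```

Under en_CA.UTF-8 the two rankings reported in this thread differ, which is
exactly the behaviour being discussed.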

Maybe that's just how things are, and developers who want
portability and reproducibility of their code are already putting
a Sys.setlocale("LC_COLLATE", "C") statement somewhere in their package
to force all their users onto the same collating sequence.
That sounds a little bit drastic though, and it might introduce some
conflicts with other packages.

So maybe a better approach is to only alter LC_COLLATE temporarily
inside the functions where it matters i.e. where the returned value
actually depends on the collating sequence? If I don't do this, then
there is no way I can write a test for my function because the
test would work for me but fail for someone else.
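
A minimal sketch of that "temporary" approach, assuming a hypothetical helper
named sort_names_c (it is not the function from my package): it switches
LC_COLLATE to "C" for the duration of the call and uses on.exit() to put the
caller's setting back, even if the sort fails.

```r
# Hypothetical helper: sort a character vector under the C collation
# without leaving the session's LC_COLLATE changed afterwards.
sort_names_c <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  # Restore the caller's collation when the function exits, error or not.
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

before <- Sys.getlocale("LC_COLLATE")
res <- sort_names_c(c("_1_a", "1_9a", "2_9a"))
after <- Sys.getlocale("LC_COLLATE")
```

Because the result no longer depends on the user's locale, a test can compare
it against a fixed expected vector.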

Actually this is the situation I was facing when I did my first post:
I have a function that downloads a list of sequences from the Ensembl
FTP server, sorts them by name, and returns them to the user. I have
a test for that function and the test was working for me when I was
doing

  tools::testInstalledPackage("MyPackage", types="tests")

but it was failing when I was doing 'R CMD check'. It seems that
the latter alters LC_COLLATE before running the tests (maybe to
LC_COLLATE=C) but not the former. I fixed this by enforcing
LC_COLLATE=C inside my function.

A naive question: wouldn't everything be simpler if LC_COLLATE=C
was the default for everybody?

Thanks,
H.




> On 12/7/11 3:41 AM, "Hervé Pagès" wrote:
>
>> Hi,
>>
>> This looks OK:
>>
>>> x <- c("_1_", "1_9", "2_9")
>>> rank(x)
>> [1] 1 2 3
>>
>> But this does not:
>>
>>> xa <- paste(x, "a", sep="")
>>> xa
>> [1] "_1_a" "1_9a" "2_9a"
>>> rank(xa)
>> [1] 2 1 3
>>
>> Cheers,
>> H.
>>
>>> sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.14.0






--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] RcppArmadillo compilation error: R CMD SHLIB returns status 1

2011-12-08 Thread Paul Viefers
Having followed Duncan's suggestion, the shell returned literally
nothing: no error message and no *.so file anywhere on my HDD.
I also suspect that there is something wrong with my compiler setup, which I
have struggled with for quite a while.

Maybe this is a good opportunity to write a less technical setup manual than
Appendix D in the Developers Guide, because I must have messed up
installation and setup.

Let me do my homework once more and see how it works. Is there a more easily
comprehensible step-by-step manual to get C++ compilation working with R?

Cheers,

Paul


On Dec 6, 2011 8:30 AM, "Duncan Murdoch" <murdoch.dun...@gmail.com> wrote:
>
> On 05/12/2011 1:22 PM, Paul Viefers wrote:
>>
>> Dear all,
>>
>> running the example by D. Eddelbuettel
>> (http://dirk.eddelbuettel.com/blog/2011/04/23/) I get an error message.
>> Specifically, the R code I was taking from the above example is
>>
>> ### BEGIN EXAMPLE ###
>>
>> suppressMessages(require(RcppArmadillo))
>> suppressMessages(require(Rcpp))
>> suppressMessages(require(inline))
>> code <- '
>>   arma::mat coeff = Rcpp::as<arma::mat>(a);
>>   arma::mat errors = Rcpp::as<arma::mat>(e);
>>   int m = errors.n_rows; int n = errors.n_cols;
>>   arma::mat simdata(m,n);
>>   simdata.row(0) = arma::zeros<arma::mat>(1,n);
>>   for (int row=1; row<m; row++) {
>>     simdata.row(row) = simdata.row(row-1)*trans(coeff)+errors.row(row);
>>   }
>>   return Rcpp::wrap(simdata);
>> '
>> ## create the compiled function
>> rcppSim<- cxxfunction(signature(a="numeric",e="numeric"),
>> code,plugin="RcppArmadillo")
>>
>> ### END OF EXAMPLE ###
>>
>> Executing this inside R, returned the following:
>>
>> ERROR(s) during compilation: source code errors or compiler configuration errors!
>>
>> Program source:
>>   1:
>>   2: // includes from the plugin
>>   3: #include <RcppArmadillo.h>
>>   4: #include <Rcpp.h>
>>   5:
>>   6:
>>   7: #ifndef BEGIN_RCPP
>>   8: #define BEGIN_RCPP
>>   9: #endif
>>  10:
>>  11: #ifndef END_RCPP
>>  12: #define END_RCPP
>>  13: #endif
>>  14:
>>  15: using namespace Rcpp;
>>  16:
>>  17:
>>  18: // user includes
>>  19:
>>  20:
>>  21: // declarations
>>  22: extern "C" {
>>  23: SEXP file33765791( SEXP a, SEXP e) ;
>>  24: }
>>  25:
>>  26: // definition
>>  27:
>>  28: SEXP file33765791( SEXP a, SEXP e ){
>>  29: BEGIN_RCPP
>>  30:
>>  31:   arma::mat coeff = Rcpp::as<arma::mat>(a);
>>  32:   arma::mat errors = Rcpp::as<arma::mat>(e);
>>  33:   int m = errors.n_rows; int n = errors.n_cols;
>>  34:   arma::mat simdata(m,n);
>>  35:   simdata.row(0) = arma::zeros<arma::mat>(1,n);
>>  36:   for (int row=1; row<m; row++) {
>>  37:     simdata.row(row) = simdata.row(row-1)*trans(coeff)+errors.row(row);
>>  38:   }
>>  39:   return Rcpp::wrap(simdata);
>>  40:
>>  41: END_RCPP
>>  42: }
>>  43:
>>  44:
>> Error in compileCode(f, code, language = language, verbose = verbose) :
>>   Compilation ERROR, function(s)/method(s) not created!
>> Executing command 'C:/PROGRA~1/R/R-214~1.0/bin/i386/R CMD SHLIB file33765791.cpp 2> file33765791.cpp.err.txt' returned status 1
>>
>> I am working under R 2.14.0 and as the pros among you might guess, I am
>> new to using the C++ interfaces within R. I think all I have to do is to
>> edit some settings on my Windows 7 machine here, but the error message is
>> too cryptic to me. Alas, I could also not find any thread or help topic that
>> deals with this online. I appreciate any direct reply or reference where I
>> can find a solution to this.
>> Please let me know in case I am leaving out some essential details here.
>
>
> If you put the program source into a file (e.g. fn.cpp) and in a Windows
> cmd shell you run
>
> R CMD SHLIB fn.cpp
>
> what do you get?   I would guess you've got a problem with your setup of
> the compiler or other tools, and this would likely show it.

I don't think that will work because you need the appropriate -I option to
get the headers from the RcppArmadillo package.  It may be easier to use the
RcppArmadillo.package.skeleton function to create a package.




Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-08 Thread Gordon Brown
Hi, folks,

Underscores are, in fact, ignored in some collation orders, including (if I
recall correctly) en_CA.UTF-8.  It's caused me a bit of confusion now and
then.  No idea about "English_United States.1252", but from the fact that
Joris' example does not agree with Hervé's, it seems most likely that it
does not ignore them.

Cheers,

 - Gord Brown


On 2011/12/07 14:48, "Joris Meys"  wrote:

> @Barry : regardless of whether '_' comes before or after '1' , it
> should be consistent. Adding an 'a' shouldn't shift '_' from before
> '1' to between '1' and '2', that's clearly an error. The help files
> are not stating anything about that. The only thing I can imagine, is
> that '_' gets ignored (in that case 19a would rank before 1a).
> 
> This said, I can't reproduce.
> 
>> x <- c("_1_", "1_9", "2_9")
>> xa <- paste(x,'a',sep='')
>> rank(x)
> [1] 1 2 3
>> rank(xa)
> [1] 1 2 3
> 
>> sessionInfo()
> R version 2.14.0 Patched (2006-00-00 r0)
> Platform: i386-pc-mingw32/i386 (32-bit)
> 
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] grDevices datasets  splines   graphics  stats     tcltk     utils     methods   base
> 
> other attached packages:
> [1] svSocket_0.9-51 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3     survival_2.36-9
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.14.1  grid_2.14.0     lattice_0.19-33 svMisc_0.9-63   tools_2.14.0
> 
> 
> 2011/12/7 Hervé Pagès :
>> Hi,
>> 
>> This looks OK:
>> 
>>> x <- c("_1_", "1_9", "2_9")
>>> rank(x)
>> [1] 1 2 3
>> 
>> But this does not:
>> 
>>> xa <- paste(x, "a", sep="")
>>> xa
>> [1] "_1_a" "1_9a" "2_9a"
>>> rank(xa)
>> [1] 2 1 3
>> 
>> Cheers,
>> H.
>> 
>>> sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>> 
>> locale:
>>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>> 
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>> 
>> loaded via a namespace (and not attached):
>> [1] tools_2.14.0
>> 
>> 
>> --
>> Hervé Pagès
>> 
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>> 
>> E-mail: hpa...@fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>> 
> 
> 



Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-08 Thread Hadley Wickham
> Actually this is the situation I was facing when I did my first post:
> I have a function that downloads a list of sequences from the Ensembl
> FTP server, sorts them by name, and returns them to the user. I have
> a test for that function and the test was working for me when I was
> doing
>
>  tools::testInstalledPackage("MyPackage", types="tests")
>
> but it was failing when I was doing 'R CMD check'. It seems that
> the latter alters LC_COLLATE before running the tests (maybe to
> LC_COLLATE=C) but not the former. I fixed this by enforcing
> LC_COLLATE=C inside my function.
>
> A naive question: wouldn't everything be simpler if LC_COLLATE=C
> was the default for everybody?

Or if at least LC_COLLATE=C was the default for everyone when running
R CMD check.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



[Rd] object type changes after being used as an argument in .Internal(paste(...))

2011-12-08 Thread Joris Meys
OK, I realize I'm hacking away in R in a manner that was not intended,
but I found this interesting behaviour nonetheless, and I am not sure
whether this was intended to be so or not.

> x <- list(1:3,1:3,1:3)
> r1 <- do.call(paste,x) # the correct way
> sapply(x,typeof)
[1] "integer" "integer" "integer"

> r2 <- .Internal(paste(x,sep=" ",collapse=NULL))
> sapply(x,typeof)
[1] "character" "character" "character"

So although I don't change x explicitly, after the call to
.Internal(paste(...)) it suddenly is a list of characters instead of
integers. Is this supposed to happen? (Normally .Internal(paste(...))
takes an anonymous list(...) as argument, so it might very well be the
intended way of working.)

Cheers
Joris
-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-08 Thread Roebuck,Paul L
On 12/8/11 3:57 AM, "Hervé Pagès"  wrote:

> On 11-12-07 10:29 AM, Roebuck,Paul L wrote:
>> Do this first and try again.
>> 
>> R>  Sys.setlocale("LC_COLLATE", "C")
> 
> OK I see it now (in ?Sys.setlocale):
> 
>   Sys.setlocale("LC_COLLATE", "C")   # turn off locale-specific sorting,
>                                      #  usually
> 
> Thanks all for the answers!
> 
> I never really realized how far some collating sequence could go in
> terms of counter-intuitiveness e.g. the fact that LC_COLLATE=en_CA.UTF-8
> doesn't preserve the order of the strings when a common suffix is
> added to them is scary. Also it's not that LC_COLLATE=en_CA.UTF-8
> just ignores the '_' (underscores) and the '.' (dots), that can only be
> the first pass, then it needs to break ties in a way that defines a
> total order. So it looks like the exact definition of this collating
> sequence is counter-intuitive and complicated.
> 
> Maybe that's just how things are and the developers that want
> portability and reproducibility of their code are already putting
> a Sys.setlocale("LC_COLLATE", "C") statement somewhere in their package
> to force all their users to be on the same collating sequence.
> It sounds a little bit drastic though and it might introduce some
> conflicts with other packages.
> 
> So maybe a better approach is to only alter LC_COLLATE temporarily
> inside the functions where it matters i.e. where the returned value
> actually depends on the collating sequence? If I don't do this, then
> there is no way I can write a test for my function because the
> test would work for me but fail for someone else.
> 
> Actually this is the situation I was facing when I did my first post:
> I have a function that downloads a list of sequences from the Ensembl
> FTP server, sorts them by name, and returns them to the user. I have
> a test for that function and the test was working for me when I was
> doing
> 
>   tools::testInstalledPackage("MyPackage", types="tests")
> 
> but it was failing when I was doing 'R CMD check'. It seems that
> the latter alters LC_COLLATE before running the tests (maybe to
> LC_COLLATE=C) but not the former. I fixed this by enforcing
> LC_COLLATE=C inside my function.

Another developer here ran into the problem just two weeks ago, when
data processed on different machines (Linux, Windows) gave different
results due to sorting. From my standpoint, I'm very hesitant to make
changes that affect behavior globally, so we changed it at the function
level in the package: set the collation, do the sort, and reset to the
original value using on.exit().
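
For readers who would rather not touch LC_COLLATE at all, here is a hedged
alternative sketch. It assumes an R version much newer than the 2.14.0
discussed in this thread (method = "radix" for sort() and order() arrived in
R 3.3.0), so it is a swapped-in technique, not what we did. The radix method
is documented to sort character vectors in the C locale regardless of the
session's collation.

```r
# Assumes R >= 3.3.0: the radix method always uses C-locale ordering
# for character vectors, independent of Sys.getlocale("LC_COLLATE").
x  <- c("_1_", "1_9", "2_9")
xa <- paste(x, "a", sep = "")

sorted_xa <- sort(xa, method = "radix")    # byte-wise order, digits before '_'
order_xa  <- order(xa, method = "radix")   # permutation giving that order
order_x   <- order(x, method = "radix")    # same permutation without the suffix
```

Since no global state is changed, there is nothing to restore on exit.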

As far as analysis reports, I believe we may need to set the LC_COLLATE
to the POSIX locale in ALL our standard Sweave templates as well to
ensure reproducibility, which is a BIG deal here.

> 
> A naive question: wouldn't everything be simpler if LC_COLLATE=C
> was the default for everybody?

Sure, but where's the fun in that? :)

>> 
>> 
>> On 12/7/11 3:41 AM, "Hervé Pagès"  wrote:
>> 
>>> This looks OK:
>>> 
>>>> x<- c("_1_", "1_9", "2_9")
>>>> rank(x)
>>> [1] 1 2 3
>>> 
>>> But this does not:
>>> 
>>>> xa<- paste(x, "a", sep="")
>>>> xa
>>> [1] "_1_a" "1_9a" "2_9a"
>>>> rank(xa)
>>> [1] 2 1 3
>>> 
>>> Cheers,
>>> H.
>>> 
>>>> sessionInfo()
>>> R version 2.14.0 (2011-10-31)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>> 
>>> locale:
>>>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>>>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>>>  [7] LC_PAPER=C                 LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>> 
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.14.0
>>> 



Re: [Rd] bug in rank(), order(), is.unsorted() on character vector

2011-12-08 Thread Hervé Pagès

Hi Barry,

Hope you don't mind if I put this back on the list.

On 11-12-08 05:50 AM, Barry Rowlingson wrote:
> 2011/12/8 Hervé Pagès:
>>
>> A naive question: wouldn't everything be simpler if LC_COLLATE=C
>> was the default for everybody?
>
>   Yet when we Brits suggest everything would be simpler if the whole
> world spoke the Queen's English it causes all sorts of trouble...

:-) Sure I see your point.

But it's a programming language here, used by a lot of researchers.
And having the result of an analysis depend on a crazy collation
order causes all sorts of trouble too.

Note that trying to strike back the Empire is a lost battle anyway.
When you use R (as a user or a developer), any function name you
type (sort, rank, print, summary, etc...) is in Queen's English.
And so are their man pages.

Also note that I was just talking about the *default*. AFAIK other
very serious projects like Python or SQLite use *by default* a
collating sequence that behaves like LC_COLLATE=C on strings
that contain only ASCII chars. And they let you change that if you
want. Are they being imperialist? Most R users/developers are in
research or academia, where I suspect consistency and reproducibility
are an even bigger deal than in the Python or SQLite communities.

Cheers,
H.




> Barry



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



Re: [Rd] Unable to collate and parse R files with R CMD check

2011-12-08 Thread Sumukh Sathnur
Thanks! That fixed the problem, it turned out one of my source files was
incomplete.
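
For anyone hitting the same error, the check Uwe suggests can be sketched
roughly as below. The helper name find_unparsable and the file pattern are my
own assumptions, not anything from R CMD check itself; it uses parse() rather
than source() so that files are only checked for syntax, not executed.

```r
# Hypothetical helper: return the files under a package's R/ directory
# that fail to parse, printing the parser's message for each.
find_unparsable <- function(dir = "R") {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  bad <- character(0)
  for (f in files) {
    ok <- tryCatch({
      parse(file = f)   # syntax check only; nothing is evaluated
      TRUE
    }, error = function(e) {
      message(f, ": ", conditionMessage(e))
      FALSE
    })
    if (!ok) bad <- c(bad, f)
  }
  bad
}

# Small demonstration in a temporary directory with one truncated file.
demo_dir <- tempfile("pkgR")
dir.create(demo_dir)
writeLines("f <- function(x) x + 1", file.path(demo_dir, "good.R"))
writeLines("g <- function(x) {", file.path(demo_dir, "bad.R"))
bad_files <- basename(find_unparsable(demo_dir))
```

In a real package one would call find_unparsable("R") from the package's top
directory.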

Sumukh

2011/12/7 Uwe Ligges 

>
>
> On 06.12.2011 20:05, Sumukh Sathnur wrote:
>
>> Hi all,
>>
>> I'm trying to build a package on Windows 7 (64 bit). R CMD build
>> worked fine and I got pkg.tar.gz with no errors, but when I run
>> R CMD check everything turns out OK except for this error:
>>
>> "checking whether package 'pkg' can be installed ... ERROR.
>> Installation failed."
>>
>> It then refers me to the error file "00install.out" which reads as
>> follows:
>>
>> * installing *source* package 'pkg' ...
>> ** R
>> Error in parse(outFile) : 94:0: unexpected end of input
>> 92:
>> 93: dat
>>
>
> Oh, it means R cannot even parse the code. So try to source() each file in
> your ./R directory separately. You will find that at least one won't work.
>
> Uwe Ligges
>
>
> ^
>> ERROR: unable to collate and parse R files for package 'pkg'
>> * removing 'C:/PROGRA~1/R/R-214~1.0/bin/x64/PKG~1.RCH/pkg'
>>
>>
>> I have no idea what this means and I was not able to find any similar
>> errors online. Any help would be appreciated.
>>
>> Thanks in advance,
>> Sumukh
>>
>>
>



[Rd] Reference class finalize() fails with 'attempt to apply non-function'

2011-12-08 Thread Martin Morgan
This bug appears intermittently in R CMD check when reference classes 
have finalize methods. The problem is that garbage collection can be run 
after the methods package is no longer available. It affects 
(periodically) the Bioconductor AnnotationDbi package as well as 
packages that contain Rcpp classes. To reproduce:


  library(methods)
  a = setRefClass("A", methods=list(finalize=function() cat("A\n")))
  b = setRefClass("B", contains="A")

repeat b = setRefClass("B", contains="A") until finalize does not run 
(no garbage collection triggered during setRefClass)


  b = setRefClass("B", contains="A")
  b = setRefClass("B", contains="A")

and then

> detach("package:methods")
> gc()
Error in function (x)  : attempt to apply non-function
Error in function (x)  : attempt to apply non-function

> traceback()
1: function (x)
   x$.self$finalize()()

I believe a variant of the same type of problem generates an error

Error in function (x)  : no function to return from, jumping to top level

also seen in AnnotationDbi and Rcpp packages
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
