Re: [Rd] Runnable R packages

2019-01-07 Thread Murray Stokely
Some other major tech companies have in the past made wide use of Runnable R
Archives (".Rar" files), similar to Python .par files [1], and integrated
them completely into the proprietary R package build systems in use there.
I thought a few systems like this had made their way to CRAN or the useR!
conferences, but I don't have a link.

Building something specific to your organization on top of the Python .par
framework to archive up R, the packages/shared libraries you need, and other
dependencies, with a runner script to "R CMD RUN" your entry point in a
sandbox, is a pretty straightforward way to get control in a way that makes
sense for your environment.

  - Murray

[1] https://google.github.io/subpar/subpar.html
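As a stopgap before anything like an official entry-point convention exists, the idea can be approximated with Rscript alone. A minimal sketch, in which the package name "myapp" and the main() convention are purely illustrative:

```r
# Illustrative only: assumes a package "myapp" that exports main().
# Shell side of the runner:
#   Rscript -e 'myapp::main()'
#
# Package side: a main() that picks up its own command-line arguments.
main <- function(args = commandArgs(trailingOnly = TRUE)) {
  message("running myapp with args: ", paste(args, collapse = " "))
  invisible(0L)
}
```

The runner script can additionally install the package's declared dependencies into a private library before invoking the entry point.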

On Mon, Jan 7, 2019 at 12:53 PM David Lindelof  wrote:

> Dear all,
>
> I’m working as a data scientist in a major tech company. I have been using
> R for almost 20 years now and there’s one issue that’s been bugging me of
> late. I apologize in advance if this has been discussed before.
>
> R has traditionally been used for running short scripts or data analysis
> notebooks, but there’s recently been a growing interest in developing full
> applications in the language. Three examples come to mind:
>
> 1) The Shiny web application framework, which facilitates the development
> of rich, interactive web applications
> 2) The httr package, which provides lower-level facilities than Shiny for
> writing web services
> 3) Batch jobs run by data scientists according to, say, a cron schedule
>
> Compared with other languages, R’s support for such applications is rather
> poor. The Rscript program is generally used to run an R script or an
> arbitrary R expression, but I feel it suffers from a few problems:
>
> 1) It encourages developers of batch jobs to provide their code in a single
> R file (bad for code structure and unit-testability)
> 2) It provides no way to deal with dependencies on other packages
> 3) It provides no way to "run" an application provided as an R package
>
> For example, let’s say I want to run a Shiny application that I provide as
> an R package (to keep the code modular, to benefit from unit tests, and to
> declare dependencies properly). I would then need to a) uncompress my R
> package, b) somehow, ensure my dependencies are installed, and c) call
> runApp(). This can get tedious, fast.
>
> Other languages let the developer package their code in "runnable"
> artefacts, and let the developer specify the main entry point. The
> mechanics depend on the language but are remarkably similar, and suggest a
> way to implement this in R. Through declarations in some file, the
> developer can often specify dependencies and declare where the program’s
> "main" function resides. Consider Java:
>
> Artefact: .jar file
> Declarations file: Manifest file
> Entry point: declared as 'Main-Class'
> Executed as: java -jar 
>
> Or Python:
>
> Artefact: Python package, typically as .tar.gz source distribution file
> Declarations file: setup.py (which specifies dependencies)
> Entry point: special __main__() function
> Executed as: python -m 
>
> R has already much of this machinery:
>
> Artefact: R package
> Declarations file: DESCRIPTION
> Entry point: ?
> Executed as: ?
>
> I feel that R could benefit from letting the developer specify, possibly in
> DESCRIPTION, how to "run" the package. The package could then be run
> through, for example, a new R CMD command, for example:
>
> R CMD RUN  
>
> I’m sure there are plenty of wrinkles in this idea that need to be ironed
> out, but is this something that has ever been considered, or that is on R’s
> roadmap?
>
> Thanks for reading so far,
>
>
>
> David Lindelöf, Ph.D.
> +41 (0)79 415 66 41 or skype:david.lindelof
> http://computersandbuildings.com
> Follow me on Twitter:
> http://twitter.com/dlindelof
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



Re: [Rd] R-devel on FreeBSD: Support for C99 complex type is required

2011-02-06 Thread Murray Stokely
On Sun, Feb 6, 2011 at 8:50 AM, Rainer Hurling  wrote:
>> I think this is really a FreeBSD support question. In 2011, an OS really
>> should have support for a 1999 standard. Darwin, a FreeBSD derivative,
>> does and its help page says
>
> Hmm, on FreeBSD I really have no other piece of software which complains
> about lack of C99.

FreeBSD is planning to switch to a different compiler, llvm/clang, so its
version of gcc is stale, but it should still be more than sufficient to
support C99.  FreeBSD started a C99 effort a decade ago; I haven't heard
from that initiative in a long time, as I thought it had been completed.

http://www.freebsd.org/projects/c99/index.html

There is I believe experimental support for llvm/clang built into
FreeBSD 9, so you could try compiling with that instead of gcc.

> Ok, I understand. This seems consistent. I will try to contact FreeBSD
> support about it. Please do not change back the behaviour for FreeBSD
> (towards emulation code) until this is clarified.

Yes, please mail freebsd-standa...@google.com

I haven't looked at what autoconf is testing exactly, but I suspect that
simply another argument must be provided in the autoconf script to get it
to pull in the C99 math functions it's looking for.

  - Murray



Re: [Rd] R-devel on FreeBSD: Support for C99 complex type is required

2011-02-06 Thread Murray Stokely
On Sun, Feb 6, 2011 at 9:24 AM, Murray Stokely  wrote:
> Yes, please mail freebsd-standa...@google.com

Ugh, that should be freebsd-standa...@freebsd.org of course.  Silly brain-o.

- Murray



[Rd] Signal handling / alarm timeouts

2011-04-12 Thread Murray Stokely
What are the ramifications of setting up user signal handling to allow
the use of, e.g., alarm(2) to send a SIGALRM to the R process some number
of seconds in the future, for example to interrupt a routine that is
taking too long to complete?

I can't find any R language support for this (a timeout argument to
tryCatch() would be ideal), so I am wondering what kinds of problems are
to be expected if I do this with native C code in a package.

Are there other ways to accomplish timeouts for blocks of R code like this?
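[Editor's note: one way to get timeouts without touching signal handlers is base R's setTimeLimit(), which makes long-running code raise an error that tryCatch() can trap. A minimal sketch:]

```r
# Sketch: evaluate expr with an elapsed-time budget, returning NULL on
# timeout instead of raising.  transient = TRUE means the limit resets
# once evaluation returns to the top level.
with_timeout <- function(expr, seconds) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  on.exit(setTimeLimit(elapsed = Inf), add = TRUE)
  tryCatch(expr, error = function(e) NULL)
}

with_timeout(for (i in 1:1e9) sqrt(i), 1)  # NULL once the limit is hit
```

The limit is only checked at interruption points in the evaluator, so pure C code that never yields can still overrun it, which is where the signal-based approach in the question would differ.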

   - Murray



Re: [Rd] How to handle INT8 data

2017-01-20 Thread Murray Stokely
The lack of 64-bit integer support causes lots of problems when dealing
with certain types of data, where the loss of precision from coercing to
the 53 bits of a double is unacceptable.

Two packages were developed to deal with this:  int64 and bit64.

You may need to find archived versions of these packages if they've fallen
off CRAN.
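[Editor's note: a short sketch of the bit64 approach, assuming the package is installed; the identifier is taken from the example later in the thread:]

```r
# bit64 stores 64-bit integers in a double's bit pattern and defines
# exact integer64 arithmetic on top of it.
library(bit64)

x <- as.integer64("-1311071933951566764")
x + 1L                             # exact 64-bit arithmetic
as.integer64("9007199254740993")   # 2^53 + 1: not representable as double
```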

Murray (mobile phone)

On Jan 20, 2017 7:20 AM, "Gabriel Becker"  wrote:

I am not on R-core, so cannot speak to future plans to internally support
int8 (though my impression is that there aren't any, at least none that are
close to fruition).

The standard way of dealing with whole numbers too big to fit in an integer
is to put them in a numeric (double down in C land).  This can represent
integers up to 2^53 without loss of precision; see
http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double.
This is how long vector indices are (currently) implemented in R.  If it's
good enough for indices, it's probably good enough for whatever you need
them for.

Hope that helps.

~G


On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris 
wrote:

> Hello r users,
>
> I have to deal with int8 data with R. AFAIK R only handles int4 with the
> `as.integer` function [1]. I wonder:
> 1. what is the better approach to handle int8 ? `as.character` ?
> `as.numeric` ?
> 2. is there any plan to handle int8 in the future ? As you might know,
> int4 is too small to deal with the earth's population right now.
>
> Thanks for your ideas,
>
> int8 eg:
>
>  human_id
> --
>  -1311071933951566764
>  -4708675461424073238
>  -6865005668390999818
>   5578000650960353108
>  -3219674686933841021
>  -6469229889308771589
>   -606871692563545028
>  -8199987422425699249
>   -463287495999648233
>   7675955260644241951
>
> reference:
> 1. https://www.r-bloggers.com/r-in-a-64-bit-world/
>
> --
> Nicolas PARIS
>



--
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research



Re: [Rd] How to handle INT8 data

2017-01-20 Thread Murray Stokely
2^53 == 2^53 + 1
# [1] TRUE

Which makes joining or grouping data sets with 64 bit identifiers
problematic.
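[Editor's note: a concrete illustration of the join hazard, using an identifier from the earlier example:]

```r
# At magnitude ~1.3e18 the spacing between adjacent doubles is 2^8 = 256,
# so distinct 64-bit ids that differ by 1 can collapse to the same key.
a <- as.numeric("1311071933951566764")
b <- as.numeric("1311071933951566765")
a == b  # TRUE: both round to the same double, so a join would merge them
```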

Murray (mobile)

On Jan 20, 2017 9:15 AM, "Nicolas Paris"  wrote:

Le 20 janv. 2017 à 18h09, Murray Stokely écrivait :
> The lack of 64 bit integer support causes lots of problems when dealing
> with certain types of data where the loss of precision from coercing to
> 53 bits with double is unacceptable.

Hello Murray,
Do you mean that, e.g., -1311071933951566764 loses precision during the
as.numeric(-1311071933951566764) conversion?
Thanks,

--
Nicolas PARIS


Re: [Rd] regenerate Rscript after moving R installation

2013-09-23 Thread Murray Stokely
Simon, do you have some examples of packages with this attribute?  Removing
the hard-coding of paths in base R and Rscript is one of the many local
patches we've maintained in the R we use at my workplace since at least the
R 2.5 days.  We do this so that we can send R and all its dependencies off
to build farms, unit-test clusters, and production clusters for running
parallel computations, among other use cases where the path of the build
server is irrelevant to the server running the R code.

I don't recall running into any packages where an absolute path from the
build host was hard-coded into the package such that we had to update code
to get it to work.  But maybe I'm just not using those packages.
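[Editor's note: for anyone curious where paths get baked in, the installed R launcher on a Unix build is one easy place to look. A quick sketch, not an exhaustive audit, and not applicable on Windows:]

```r
# On Unix builds the "R" front end under R.home("bin") is a shell script
# with the install-time R_HOME_DIR baked in; Rscript embeds its path at
# compile time instead.
launcher <- file.path(R.home("bin"), "R")
grep("R_HOME_DIR", readLines(launcher), value = TRUE)
```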

  - Murray


On Sat, Sep 21, 2013 at 6:45 PM, Simon Urbanek
wrote:

> I forgot to mention that some packages bake-in paths as well, so even if
> you fix both R and Rscript, it will still not work in general.
>
> On Sep 22, 2013, at 3:42 AM, Simon Urbanek 
> wrote:
>
> > On Sep 21, 2013, at 8:43 PM, Tobias Verbeke <
> tobias.verb...@openanalytics.eu> wrote:
> >
> >> L.S.
> >>
> >> In this bug report
> >>
> >> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14493#c1
> >>
> >> it is mentioned that after moving an R installation
> >> one should regenerate the Rscript executable.
> >>
> >> Is there an easy way to do so (after an R installation has been
> >> moved)?
> >>
> >
> > You cannot move installed R. Once you run make install, there are
> > several places in which paths get baked in - mainly Rscript and the R
> > start script. What I typically do for deployment on the Labs machines is
> > to use make install rhome= where  is some path that I can always create
> > a symlink in (I also use DESTDIR so that path doesn't actually need to
> > exist on the build machine, and it avoids polluting --prefix, which is
> > not needed). That way you can move R wherever you want, as long as you
> > keep that one symlink up to date.
> >
> > Cheers,
> > Simon
> >
> >
> >> I have not found any information in the R installation and
> >> administration manual.
> >>
> >> Many thanks in advance for any pointer.
> >>
> >> Best wishes,
> >> Tobias
> >>
> >> P.S. The background to this question is the usage of Rscript
> >> calls in the Makevars files of some R packages on CRAN, so
> >> the 'broken' Rscript prevents installation of certain R packages.
> >>
> >> --
> >>
> >> Tobias Verbeke
> >> Manager
> >>
> >> OpenAnalytics BVBA
> >> Jupiterstraat 20
> >> 2600 Antwerp
> >> Belgium
> >>
> >> E tobias.verb...@openanalytics.eu
> >> M +32 499 36 33 15
> >> http://www.openanalytics.eu
> >>
> >
>



Re: [Rd] Determining files opened by an R session

2013-11-04 Thread Murray Stokely
Most operating systems have tools that allow you to audit the resources
used by a running process, for example the 'lsof' (list open files) command
on Unix and Mac OS X, or, for more complex dynamic tracing, the DTrace
framework, again on Mac OS X or BSD Unix.

I'm not sure what the Windows equivalent would be, or what platform you are
using, but given the number of ways that code in packages may access files
from C code, possibly based on environment variables or other configuration
parameters, I would lean heavily on the operating system's tools for things
like this rather than rely on parsing your R code looking for specific file
accesses.
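[Editor's note: a quick sketch of the lsof approach from inside the session itself; assumes the lsof utility is on the PATH, as it usually is on Linux and Mac OS X:]

```r
# Snapshot the files currently open by this R process.  Running this
# before and after the analysis script shows what it touched.
open_files <- system(paste("lsof -p", Sys.getpid()), intern = TRUE)
head(open_files)
```

For the regulated-environment use case, running lsof (or DTrace) from a wrapper script outside R gives a record that the R code itself cannot accidentally bypass.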

   - Murray



On Mon, Nov 4, 2013 at 1:32 PM, Martin Gregory  wrote:

> I'm using R in a regulated environment and one of the requirements is to
> be able to trace how a result is arrived at. I would like to be able to
> determine which files are opened in read or write mode by an R session, for
> example when a program uses source, sink, file, open, read.table,
> write.table or any of the other functions which can be used to read or
> write files. I'm also interested in output to graphics devices.
>
> I've looked in the documentation but only found information relating to
> profiling. Looking through the source code it seems that much file i/o is
> done via the C functions *_open in main/connections.c but don't see
> anything there that looks like logging.
>
> Could someone let me know if it is possible to log which files are opened?
>
> Regards,
> Martin Gregory
>



Re: [Rd] inflate zlib compressed data using base R or CRAN package?

2013-11-27 Thread Murray Stokely
I think none of these examples describes a zlib-compressed data block
inside a binary file, which is what the OP asked about; all of your
examples prepend, e.g., gzip or zip headers.
Greg, is memDecompress what you are looking for?
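[Editor's note: a quick round-trip check. Despite the type name, memCompress(type = "gzip") in R of this era emits a zlib-wrapped (RFC 1950) stream rather than a gzip file, which is exactly the format in question, so memDecompress() can inflate such a block read from a binary file:]

```r
# In-memory compress/decompress round trip on a raw vector.  In practice
# the raw zlib block would come from readBin() on the binary file.
z <- memCompress(charToRaw("hello, world"), type = "gzip")
rawToChar(memDecompress(z, type = "gzip"))
# "hello, world"
```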

  - Murray



On Wed, Nov 27, 2013 at 5:22 PM, Dirk Eddelbuettel  wrote:

>
> On 27 November 2013 at 18:38, Dirk Eddelbuettel wrote:
> |
> | On 27 November 2013 at 23:49, Dr Gregory Jefferis wrote:
> | | I have a binary file type that includes a zlib compressed data block
> (ie
> | | not gzip). Is anyone aware of a way using base R or a CRAN package to
> | | decompress this kind of data (from disk or memory). So far I have found
> | | Rcompression::decompress on omegahat, but I would prefer to keep
> | | dependencies on CRAN (or bioconductor). I am also trying to avoid
> | | writing yet another C level interface to part of zlib.
> |
> | Unless I am missing something, this is in base R; see help(connections).
> |
> | Here is a quick demo:
> |
> | R> write.csv(trees, file="/tmp/trees.csv")# data we all have
> | R> system("gzip -v /tmp/trees.csv")   # as I am lazy here
> | /tmp/trees.csv:50.5% -- replaced with /tmp/trees.csv.gz
> | R> read.csv(gzfile("/tmp/trees.csv.gz"))  # works out of the box
>
> Oh, and in case you meant zip file containing a data file, that also works.
>
> First converting what I did last
>
> edd@max:/tmp$ gunzip trees.csv.gz
> edd@max:/tmp$ zip trees.zip trees.csv
>   adding: trees.csv (deflated 50%)
> edd@max:/tmp$
>
> Then reading the csv from inside the zip file:
>
> R> read.csv(unz("/tmp/trees.zip", "trees.csv"))
> X Girth Height Volume
> 1   1   8.3 70   10.3
> 2   2   8.6 65   10.3
> 3   3   8.8 63   10.2
> 4   4  10.5 72   16.4
> 5   5  10.7 81   18.8
> 6   6  10.8 83   19.7
> 7   7  11.0 66   15.6
> 8   8  11.0 75   18.2
> 9   9  11.1 80   22.6
> 10 10  11.2 75   19.9
> 11 11  11.3 79   24.2
> 12 12  11.4 76   21.0
> 13 13  11.4 76   21.4
> 14 14  11.7 69   21.3
> 15 15  12.0 75   19.1
> 16 16  12.9 74   22.2
> 17 17  12.9 85   33.8
> 18 18  13.3 86   27.4
> 19 19  13.7 71   25.7
> 20 20  13.8 64   24.9
> 21 21  14.0 78   34.5
> 22 22  14.2 80   31.7
> 23 23  14.5 74   36.3
> 24 24  16.0 72   38.3
> 25 25  16.3 77   42.6
> 26 26  17.3 81   55.4
> 27 27  17.5 82   55.7
> 28 28  17.9 80   58.3
> 29 29  18.0 80   51.5
> 30 30  18.0 80   51.0
> 31 31  20.6 87   77.0
> R>
>
> Regards, Dirk
>
> --
> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
>



Re: [Rd] C API to get numrow of data frame

2014-03-31 Thread Murray Stokely
The simplest case would be:

   int num_rows = Rf_length(VECTOR_ELT(dataframe, 0));
   int num_columns = Rf_length(dataframe);

There may be edge cases for which this doesn't work; one would need to
look into how the dim primitive is implemented to be sure.

   - Murray


On Mon, Mar 31, 2014 at 4:40 PM, Sandip Nandi  wrote:
> Hi ,
>
> Is there any C API to the R API  nrow of dataframe ?
>
> x<- data.frame()
> n<- nrow(x)
> print(n)
> 0
>
>
> Example :
> My C function which deals with data frame looks like and I don't to send
> the  number of rows of data frame .I want to detect it from the function
> itself, my function take data frame as argument and do some on it. I want
> API equivalent to nrow. I tried Rf_nrows,Rf_ncols . No much help.
>
> SEXP  writeRR(SEXP dataframe) {
>
> }
>
>
> Any help is very appreciated.
>
> Thanks,
> Sandip
>
> [[alternative HTML version deleted]]
>


Re: [Rd] C API to get numrow of data frame

2014-03-31 Thread Murray Stokely
I didn't look at the row names because I believe that would be incorrect
if the row names were stored internally in compact form.

See ?.set_row_names (hat tip, Tim Hesterberg who showed me this years ago) :

 'row.names' can be stored internally in compact form.
 '.set_row_names(n)' generates that form for automatic row names of
 length 'n', to be assigned to 'attr(, "row.names")'.
 '.row_names_info' gives information on the internal form of the
 row names for a data frame: for details of what information see
 the argument 'type'.

The function I wrote obviously doesn't work for 0-row or 0-column
data.frames; you need to check for that.
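[Editor's note: a small R-level illustration of the compact form and of reading the row count safely from the row-names attribute:]

```r
x <- data.frame(a = 1:5)
.row_names_info(x, type = 0)  # internal form: compact, e.g. c(NA, -5L)
.row_names_info(x, type = 2)  # row count, regardless of storage form
```

At the C level, `.row_names_info(x, 2)` corresponds to inspecting the row.names attribute rather than taking the length of a column, which is why it keeps working for zero-column data frames.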

On Mon, Mar 31, 2014 at 6:12 PM, Gábor Csárdi  wrote:
> I think it is actually better to check the length of the row names. In case
> the data frame has zero columns. (FIXME, of course.)
>
> Gabor
>
>
> On Mon, Mar 31, 2014 at 8:04 PM, Murray Stokely  wrote:
>>
>> The simplest case would be:
>>
>>int num_rows = Rf_length(VECTOR_ELT(dataframe, 0));
>>int num_columns = Rf_length(dataframe);
>>
>> There may be edge cases for which this doesn't work; would need to
>> look into how the dim primitive is implemented to be sure.
>>
>>- Murray
>>
>>
>> On Mon, Mar 31, 2014 at 4:40 PM, Sandip Nandi 
>> wrote:
>> > Hi ,
>> >
>> > Is there any C API to the R API  nrow of dataframe ?
>> >
>> > x<- data.frame()
>> > n<- nrow(x)
>> > print(n)
>> > 0
>> >
>> >
>> > Example :
>> > My C function which deals with data frame looks like and I don't to send
>> > the  number of rows of data frame .I want to detect it from the function
>> > itself, my function take data frame as argument and do some on it. I
>> > want
>> > API equivalent to nrow. I tried Rf_nrows,Rf_ncols . No much help.
>> >
>> > SEXP  writeRR(SEXP dataframe) {
>> >
>> > }
>> >
>> >
>> > Any help is very appreciated.
>> >
>> > Thanks,
>> > Sandip
>> >
>
>



Re: [Rd] type.convert and doubles

2014-04-17 Thread Murray Stokely
On Thu, Apr 17, 2014 at 6:42 AM, McGehee, Robert
 wrote:
> Here's my use case: I have a function that pulls arbitrary financial data 
> from a web service call such as a stock's industry, price, volume, etc. by 
> reading the web output as a text table. The data may be either character 
> (industry, stock name, etc.) or numeric (price, volume, etc.), and the 
> function generally doesn't know the class in advance. The problem is that we 
> frequently get numeric values represented with more precision than actually 
> exists, for instance a price of "2.6999" rather than "2.70". The 
> numeric representation is exactly one digit too much for type.convert which 
> (in R 3.10.0) converts it to character instead of numeric (not what I want). 
> This caused a bunch of "non-numeric argument to binary operator" errors to 
> appear today as numeric data was now being represented as characters.
>
> I have no doubt that this probably will cause some unwanted RODBC side 
> effects for us as well. IMO, getting the class right is more important than 
> infinite precision. What use is a character representation of a number anyway 
> if you can't perform arithmetic on it? I would favor at least making the new 
> behavior optional, but I think many packages (like RODBC) potentially need to 
> be patched to code around the new feature if it's left in.

The uses of character representation of a number are many: unique
identifiers/user ids, hash codes, timestamps, or other values where
rounding results to the nearest value that can be represented as a
numeric type would completely change the results of any data analysis
performed on that data.

Database join operations are certainly an area where R's previous
behavior of silently dropping precision of numbers with type.convert
can get you into trouble.  For example, things like join operations or
group by operations performed in R code would produce erroneous
results if you are joining/grouping by a key without the full
precision of your underlying data.  Records can get joined up
incorrectly or aggregated with the wrong groups.

If you later want to do arithmetic on them, you can choose to lose
precision by using as.numeric() or use one of the large number
packages on CRAN (GMP, int64, bit64, etc.).  But once you've dropped
the precision with as.numeric you can never get it back, which is why
the previous behavior was clearly dangerous.

I think I had some additional examples in the original bug/patch I filed
about this issue a few years ago, but I'm unable to find it on
bugs.r-project.org, and it's not referenced in the changelist descriptions
or the NEWS file.

 - Murray



Re: [Rd] type.convert and doubles

2014-04-17 Thread Murray Stokely
On Thu, Apr 17, 2014 at 2:35 PM, Gabor Grothendieck
 wrote:
> Only if you knew that that column was supposed to be numeric. There is

The columns that are "supposed" to be numeric are those that can fit into
a numeric data type.  Previously that was not always the case: columns
that could not be represented exactly as a numeric were erroneously
coerced into truncated/rounded numerics.

> nothing in type.convert or read.table to allow you to override how it
> works (colClasses only works if you knew which columns are which in
> the first place) nor is there anything to allow you to know which
> columns were affected so that you know which columns to look at to fix
> it yourself afterwards.

You want a casting operation in your SQL query, or the equivalent, if you
want a rounded type that will always fit in a double: CAST or CONVERT
operators in SQL, or similar for however you are getting the data you want
to use with type.convert().  This is all application-specific and somewhat
beyond the scope of type.convert(), which now behaves as it has been
documented to behave.

In my code for this kind of thing I have typically introduced an option()
to let the user control casting behavior for, e.g., 64-bit ints in C++.
Should they be returned as truncated-precision numeric types, or as the
full-precision data in a character string representation?  In the RProtoBuf
package we let the user specify an option() choosing which behavior they
need for their application, as a shortcut to always returning the safer
character representation and making them coerce to numeric frequently.
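[Editor's note: the shape of that escape hatch, sketched with an illustrative option name; the real one in the RProtoBuf package is RProtoBuf.int64AsString:]

```r
# Sketch of the option() pattern: default to the lossless character
# representation, let the user opt in to numeric coercion.
as_r_int64 <- function(x) {
  if (isTRUE(getOption("myapp.int64AsString", TRUE))) {
    as.character(x)   # lossless; safe for joins and grouping keys
  } else {
    as.numeric(x)     # convenient arithmetic, but truncated beyond 2^53
  }
}
```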

- Murray



Re: [Rd] type.convert and doubles

2014-04-20 Thread Murray Stokely
Yes, I'm also strongly in favor of having an option for this.  If
there was an option in base R for controlling this we would just use
that and get rid of the separate RProtoBuf.int64AsString option we use
in the RProtoBuf package on CRAN to control whether 64-bit int types
from C++ are returned to R as numerics or character vectors.

I agree that reasonable people can disagree about the default, but I
found my original bug report about this, so I will counter Robert's
example with my favorite example of what was wrong with the previous
behavior :

tmp<-data.frame(n=c("72057594037927936", "72057594037927937"),
name=c("foo", "bar"))
length(unique(tmp$n))
# 2
write.csv(tmp, "/tmp/foo.csv", quote=FALSE, row.names=FALSE)
data <- read.csv("/tmp/foo.csv")
length(unique(data$n))
# 1
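[Editor's note: for completeness, the standard workaround when you do know which column is affected is to read it back explicitly as character; a sketch of the same round trip with a temp file:]

```r
# With colClasses the identifiers come back losslessly as strings.
tmp <- data.frame(n = c("72057594037927936", "72057594037927937"),
                  name = c("foo", "bar"))
f <- tempfile(fileext = ".csv")
write.csv(tmp, f, quote = FALSE, row.names = FALSE)
data <- read.csv(f, colClasses = c(n = "character"))
length(unique(data$n))  # 2: the distinct keys survive
```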

  - Murray


On Sat, Apr 19, 2014 at 10:06 AM, Simon Urbanek
 wrote:
> On Apr 19, 2014, at 9:00 AM, Martin Maechler  
> wrote:
>
>>> McGehee, Robert 
>>>on Thu, 17 Apr 2014 19:15:47 -0400 writes:
>>
 This is all application specific and
 sort of beyond the scope of type.convert(), which now behaves as it
 has been documented to behave.
>>
>>> That's only a true statement because the documentation was changed to 
>>> reflect the new behavior! The new feature in type.convert certainly does 
>>> not behave according to the documentation as of R 3.0.3. Here's a snippit:
>>
>>> The first type that can accept all the
>>> non-missing values is chosen (numeric and complex return values
>>> will represented approximately, of course).
>>
>>> The key phrase is in parentheses, which reminds the user to expect a 
>>> possible loss of precision. That important parenthetical was removed from 
>>> the documentation in R 3.1.0 (among other changes).
>>
>>> Putting aside the fact that this introduces a large amount of unnecessary 
>>> work rewriting SQL / data import code, SQL packages, my biggest conceptual 
>>> problem is that I can no longer rely on a particular function call 
>>> returning a particular class. In my example querying stock prices, about 5% 
>>> of prices came back as factors and the remaining 95% as numeric, so we had 
>>> random errors popping in throughout the morning.
>>
>>> Here's a short example showing us how the new behavior can be unreliable. I 
>>> pass a character representation of a uniformly distributed random variable 
>>> to type.convert. 90% of the time it is converted to "numeric" and 10% it is 
>>> a "factor" (in R 3.1.0). In the 10% of cases in which type.convert converts 
>>> to a factor the leading non-zero digit is always a 9. So if you were 
>>> expecting a numeric value, then 1 in 10 times you may have a bug in your 
>>> code that didn't exist before.
>>
>>> options(digits=16)
>>> cl <- NULL; for (i in 1:10000) cl[i] <- class(type.convert(format(runif(1))))
>>> table(cl)
>>> cl
>>>  factor numeric
>>>     990    9010
>>
>> Yes.
>>
>> Murray's point is valid, too.
>>
>> But in my view, with the reasoning we have seen here,
>> *and* with the well known software design principle of
>> "least surprise" in mind,
>> I also do think that the default for type.convert() should be what
>> it has been for > 10 years now.
>>
>
> I think there should be two separate discussions:
>
> a) have an option (argument to type.convert and possibly read.table) to 
> enable/disable this behavior. I'm strongly in favor of this.
>
> b) decide what the default for a) will be. I have no strong opinion, I can 
> see arguments in both directions
>
> But most importantly I think a) is better than the status quo - even if the 
> discussion about b) drags out.
>
> Cheers,
> Simon
>
>
>



Re: [Rd] bug in sum() on integer vector

2011-12-13 Thread Murray Stokely
FYI, the new int64 package on CRAN gets this right, but is of course
somewhat slower since it is not doing hardware 64-bit arithmetic.

 x <- c(rep(180003L, 1000), -rep(120002L, 1500))
 library(int64)
 sum(as.int64(x))
# [1] 0

 - Murray

2011/12/9 Hervé Pagès :
> Hi,
>
>  x <- c(rep(180003L, 1000), -rep(120002L, 1500))
>
> This is correct:
>
>  > sum(as.double(x))
>  [1] 0
>
> This is not:
>
>  > sum(x)
>  [1] 4996000
>
> Returning NA (with a warning) would also be acceptable for the latter.
> That would make it consistent with cumsum(x):
>
>  > cumsum(x)[length(x)]
>  [1] NA
>  Warning message:
>  Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
>
> Thanks!
> H.
>
>> sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
>  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>


Re: [Rd] CRAN policies

2012-03-27 Thread Murray Stokely
Lots of very sensible policies here.  I have one request, as someone
who has in several cases had to involve company lawyers over
intellectual-property issues with packages on CRAN: the first bullet
point, on ownership of copyright and intellectual property rights,
could be strengthened further.

To the existing text "The ownership of copyright and intellectual
property rights of all components of the package must be clear and
unambiguous (including from the authors specification in the
DESCRIPTION file). Where code is copied (or derived) from the work of
others (including from R itself), care must be taken that any
copyright statements are preserved and authorship is not
misrepresented.
Trademarks must be respected."

I would add a few additional points :

1. The text of the license itself should be included in the package in
a LICENSE or COPYING file.  Most of these licenses have fields (names,
dates, and so on) that need to be filled in, so merely referencing a
license name in the DESCRIPTION file is a poor way to record licensing
metadata on its own; it is, however, a useful complement to a full,
filled-out license in the package itself.

2. Per-file copyright comment headers can help immensely with ensuring
compliance and with catching the accidental incorporation of files
under a different license.  Comment header blocks giving the author
name and terms of distribution could be recommended for all source
files.
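[A concrete illustration of point 2; the names, package, and terms below are placeholders, not a mandated format:]

```r
## Copyright (C) 2012  Jane Developer <jane@example.org>
##
## This file is part of the 'examplepkg' package, distributed under
## the terms of the GNU General Public License, version 2 or later.
## See the LICENSE file at the package root for the full license text.
```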

   - Murray

On Tue, Mar 27, 2012 at 4:52 AM, Prof Brian Ripley
 wrote:
> CRAN has for some time had a policies page at
> http://cran.r-project.org/web/packages/policies.html
> and we would like to draw this to the attention of package maintainers.  In
> particular, please
>
> - always send a submission email to c...@r-project.org with the package
> name and version on the subject line.  Emails sent to individual members of
> the team will result in delays at best.
>
> - run R CMD check --as-cran on the tarball before you submit it.  Do
> this with the latest version of R possible: definitely R 2.14.2,
> preferably R 2.15.0 RC or a recent R-devel.  (Later versions of R are
> able to give better diagnostics, e.g. for compiled code and especially
> on Windows. They may also have extra checks for recently uncovered
> problems.)
>
> Also, please note that CRAN has a very heavy workload (186 packages were
> published last week) and to remain viable needs package maintainers to make
> its life as easy as possible.
>
> Kurt Hornik
> Uwe Ligges
> Brian Ripley
>


Re: [Rd] R-devel on FreeBSD: new C99 functions don't build

2012-05-15 Thread Murray Stokely
On Tue, May 15, 2012 at 10:05 AM, Rainer Hurling  wrote:
> About April 25th, there had been some changes within R-devel's
> src/nmath/pnbeta.c (and probably some other relevant places) and now
> building R-devel on FreeBSD 10.0-CURRENT (amd64) with gcc-4.6.4 and
> math/R-devel (selfmade forked port from math/R) fails like this:

> It seems that at least one new C99 function (log1pl) has been
> introduced in R-devel; see
>
> src/nmath/pnbeta.c:95
> return (double) (log_p ? log1pl(-ans) : (1 - ans));

AFAIK, Bruce Evans is not happy with the numerical accuracy of other
open-source implementations of log1pl and so has blocked their
inclusion in FreeBSD pending work on a better implementation.

Can you put a conditional FreeBSD check here and use log1p instead of
log1pl as a workaround?

I admire the FreeBSD libm maintainers' insistence on numerical
correctness, but it can be a bit of a pain for things like this.

 - Murray



Re: [Rd] r-devel fails tests for parallel

2012-05-17 Thread Murray Stokely
On Thu, May 17, 2012 at 8:09 AM, Prof Brian Ripley
 wrote:
> This is getting increasingly difficult.  GCC 4.6.x and 4.7.x detect a lot of
> errors (especially C++ errors) that earlier versions did not -- and that
> means CRAN gets a fair number of submissions that we cannot compile.  And
> there have been a lot of optimization advances since 4.1.x.

I would also point out that clang has significantly better error
detection and diagnostics compared to current GCC.  Installations
stuck with old GCC releases for GPL3 reasons should really migrate to
clang / llvm.

- Murray



Re: [Rd] [PATCH] R ignores PATH_MAX and fails in long directories (PR#14228)

2010-03-14 Thread Murray Stokely
Indeed, thanks to ripley@ for submitting it.  I don't see a note in
the NEWS file; it would be nice to point this out as fixed, since
others may have run into this problem.  Could someone submit something
like this as well?

Index: NEWS
===================================================================
--- NEWS	(revision 51276)
+++ NEWS	(working copy)
@@ -577,6 +577,8 @@

  o  read.fwf() works again when 'file' is a connection.

+o  R now works correctly with filesystem paths longer than 255
+   characters on platforms that support it (PR#14228).


CHANGES IN R VERSION 2.10.1


On Thu, Mar 11, 2010 at 9:12 AM, Seth Falcon  wrote:
> On 3/11/10 12:45 AM, Henrik Bengtsson wrote:
>>
>> Thanks for the troubleshooting,
>>
>> I just want to second this patch; it would be great if PATH_MAX could
>> be used everywhere.
>
> The patch, or at least something quite similar, was applied in r51229.
>
> + seth
>
> --
> Seth Falcon | @sfalcon | http://userprimary.net/
>


Re: [Rd] R in sandbox/jail (long question)

2010-05-20 Thread Murray Stokely
On Tue, May 18, 2010 at 7:38 PM, Assaf Gordon  wrote:
> I've found this old thread:
> http://r.789695.n4.nabble.com/R-in-a-sandbox-jail-td921991.html
> But for technical reasons I'd prefer not to setup a chroot jail.
>

I would also point out that the state of the art in the operating
system community has moved on significantly since 1982 when chroot was
added.  BSD Jails, Solaris Zones/Containers, SELinux, etc. all provide
much more control over the system calls, network connections, and file
and device access granted to applications in different jails/zones.

These operating-system capabilities solve some of the problems you are
trying to solve by painstakingly modifying R, and they do so in a more
secure and configurable manner.
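[For instance, a FreeBSD jail can confine an R batch job without any changes to R itself. A hedged sketch only: requires root on FreeBSD, and the jail name, path, and script below are placeholders for a pre-populated jail root.]

```shell
# Run an R script inside a jail with networking disabled (placeholders
# throughout; assumes /jails/rsandbox contains a usable userland + R).
jail -c name=rsandbox path=/jails/rsandbox \
     ip4=disable ip6=disable \
     command=/usr/local/bin/Rscript /home/analyst/job.R
```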

 - Murray
