Re: [Rd] type.convert and doubles

2014-04-22 Thread Martin Maechler
> McGehee, Robert 
> on Mon, 21 Apr 2014 09:24:13 -0400 writes:

> Agreed. Perhaps even a global option would make sense. We
> already have an option with a similar spirit:
> 'options("stringsAsFactors"=T/F)'. Perhaps
> 'options("exactNumericAsString"=T/F)' [or something else]
> would be desirable, with the option being the default
> value to the type.convert argument.

No, please, no, not a global option here!

Global options that influence the default behavior of basic
functions go too much against the principle of functional
programming, and my personal opinion has always been that
'stringsAsFactors' has been a mistake (as a global option, not
as an argument).

Note that with such global options, the output of sessionInfo()
would in principle have to contain all (such) global options in
addition to R and package versions in order to diagnose the
behavior of R functions.
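A small illustration of the diagnostic gap described above: sessionInfo() records R and package versions, but not the options() in force. The helper below, report_env(), is hypothetical (not part of base R), and the choice of which options to record is an assumption made only for illustration.

```r
# Hypothetical helper, not part of base R: bundle sessionInfo() with the
# current values of selected global options, which sessionInfo() omits.
report_env <- function(opts = c("na.action", "OutDec", "scipen")) {
  list(
    session = sessionInfo(),
    # getOption() looks up one option at a time; setNames() keeps the names
    options = lapply(setNames(opts, opts), getOption)
  )
}

info <- report_env()
str(info$options)
```

Recording the whole of options() would also work, but the result is long and mostly irrelevant; listing the options a script actually depends on keeps a bug report readable.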

I think we have more or less agreed that we'd like to have
a new function *argument* to type.convert(), passed "upstream"
to read.table() and, via ..., to the other read.*() functions
that call read.table().
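For reference, R 3.1.1 and later provide such a per-call argument: numerals, accepted by type.convert(), read.table() and the read.*() wrappers, with the values "allow.loss" (the default, matching the pre-3.1.0 behavior), "warn.loss" and "no.loss". The sketch below applies two of these settings to Murray's two values. as.is = TRUE is passed explicitly so that non-numeric results come back as character rather than factor.

```r
x <- c("72057594037927936", "72057594037927937")

# Default: accuracy loss allowed, so both strings become the same double
a <- type.convert(x, as.is = TRUE, numerals = "allow.loss")

# Lossy conversion refused, so the strings are kept as character
b <- type.convert(x, as.is = TRUE, numerals = "no.loss")

class(a)            # "numeric"
class(b)            # "character"
length(unique(a))   # 1
length(unique(b))   # 2
```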


> I also like Gabor's idea of a "distinguishing class". R
> doesn't natively support arbitrary precision numbers
> (AFAIK), but I think that's what Murray wants. I could
> imagine some kind of new class emerging here that
> initially looks just like a character/factor, but may
> evolve over time to accept arithmetic methods and act more
> like a number (e.g. knowing that "0.1", ".10" and "1e-1"
> are the same number, or that "-9" < "-0.2"). A class
> "bignum" perhaps?

That's another interesting idea. As maintainer of CRAN package
'Rmpfr' and co-maintainer of 'gmp', I'm even biased about this
issue.
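To make the "bignum" idea concrete: the CRAN package gmp mentioned above already provides an exact integer class, bigz. The sketch below assumes gmp may or may not be installed, so the exact part is guarded by requireNamespace(). It contrasts the collapse that happens in double precision with exact parsing via gmp::as.bigz().

```r
x <- c("72057594037927936", "72057594037927937")

# As doubles the two values collapse to the same number (both become 2^56)
as.numeric(x[1]) == as.numeric(x[2])    # TRUE

# 'gmp' is an optional CRAN package; the exact part runs only if installed
if (requireNamespace("gmp", quietly = TRUE)) {
  z <- gmp::as.bigz(x)   # arbitrary-precision integers parsed from the strings
  print(z[1] == z[2])    # FALSE: the values stay distinct
  print(z[2] - z[1])     # a bigz holding exactly 1
}
```

This covers only integers. Exact decimals such as "0.1" versus ".10" would need a rational or multiple-precision class (gmp's bigq, or Rmpfr), which this sketch does not attempt.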

Martin

> Cheers, Robert


> On 4/20/14, 3:24 AM, "Murray Stokely" wrote:

>> Yes, I'm also strongly in favor of having an option for
>> this.  If there was an option in base R for controlling
>> this we would just use that and get rid of the separate
>> RProtoBuf.int64AsString option we use in the RProtoBuf
>> package on CRAN to control whether 64-bit int types from
>> C++ are returned to R as numerics or character vectors.
>> 
>> I agree that reasonable people can disagree about the
>> default, but I found my original bug report about this,
>> so I will counter Robert's example with my favorite
>> example of what was wrong with the previous behavior:
>> 
>> tmp <- data.frame(n = c("72057594037927936",
>>                        "72057594037927937"),
>>                   name = c("foo", "bar"))
>> length(unique(tmp$n))  # 2
>> write.csv(tmp, "/tmp/foo.csv", quote = FALSE, row.names = FALSE)
>> data <- read.csv("/tmp/foo.csv")
>> length(unique(data$n))  # 1
>> 
>> - Murray
>> 
>> 
>> On Sat, Apr 19, 2014 at 10:06 AM, Simon Urbanek wrote:
>>> On Apr 19, 2014, at 9:00 AM, Martin Maechler wrote:
>>> 
> McGehee, Robert 
> on Thu, 17 Apr 2014 19:15:47 -0400 writes:
 
> This is all application specific and
> sort of beyond the scope of type.convert(), which now
> behaves as it has been documented to behave.
 
> That's only a true statement because the documentation
> was changed to reflect the new behavior! The new
> feature in type.convert certainly does not behave
> according to the documentation as of R 3.0.3. Here's a
> snippet:
 
> The first type that can accept all the non-missing
> values is chosen (numeric and complex return values
> will represented approximately, of course).
 
> The key phrase is in parentheses, which reminds the
> user to expect a possible loss of precision. That
> important parenthetical was removed from the
> documentation in R 3.1.0 (among other changes).
 
> Putting aside the fact that this introduces a large
> amount of unnecessary work rewriting SQL / data import
> code and SQL packages, my biggest conceptual problem is
> that I can no longer rely on a particular function
> call returning a particular class. In my example
> querying stock prices, about 5% of prices came back as
> factors and the remaining 95% as numeric, so we had
> random errors popping in throughout the morning.
 
> Here's a short example showing us how the new behavior
> can be unreliable. I pass a character representation
> of a uniformly distributed random variable to
> type.convert. 90% of the time it is converted to
> "numeric" and 10% it is a "factor" (in R 3.1.0). In
> the 10% of cases in which type.convert converts to a
> factor the leading non-zero digit is always a 9. So if
> you were expecting a numeric value, then 1 in 10 times
> you may have a bug in your code that didn't exist
> before.
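Murray's round-trip example from earlier in this thread can be written as a runnable sketch, using a temporary file instead of /tmp/foo.csv. Both integers lie above 2^53, where adjacent doubles are 16 apart, so both parse to 2^56. Passing colClasses (a documented read.csv()/read.table() argument) fixes each column's class up front. That also addresses Robert's concern that the returned class can vary with the data. numerals = "allow.loss" is spelled out so the result does not depend on the R version's default; it requires R 3.1.1 or later.

```r
tmp <- data.frame(n = c("72057594037927936", "72057594037927937"),
                  name = c("foo", "bar"),
                  stringsAsFactors = FALSE)
length(unique(tmp$n))                 # 2

f <- tempfile(fileext = ".csv")
write.csv(tmp, f, quote = FALSE, row.names = FALSE)

# Let read.csv() guess, allowing accuracy loss: both values parse to 2^56
guessed <- read.csv(f, numerals = "allow.loss")
length(unique(guessed$n))             # 1

# Fix the column classes up front: 'n' stays character and distinct, and
# the class of the result no longer depends on what is in the file
fixed <- read.csv(f, colClasses = "character")
length(unique(fixed$n))               # 2

unlink(f)
```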
   

Re: [Rd] R 3.1.0: 'R CMD Sweave' deletes non tex files created upon batch mode exit

2014-04-22 Thread Thomas Rusch

Hi Martin,

I use R CMD Sweave often as well, so thanks for looking into this. I 
have now tested some of my scripts with R-devel revision 65449 
(2014-04-22) on 64-Bit Linux Mint 14.


I can confirm Marc's report that R CMD Sweave no longer deletes 
graphics files and now retains the non-tex files it creates (eps and/or pdf).


What does not behave as before is the output printed to stdout, which 
previously listed the number and name of each code chunk processed, 
along with its options. It now shows only Sweave errors or messages 
from within R, and finishes with "Output file: foo.tex".


If everything works and no messages are printed, all that is written 
to stdout after R CMD Sweave has processed the .Rnw file is 
"Output file: foo.tex".


In R versions prior to 3.1.0, the output was, e.g.,

Writing to file foo.tex
Processing code chunks with options ...
 1 : keep.source (label = setup, foo.Rnw:22)
 2 : keep.source term verbatim (label = packages, foo.Rnw:90)
 3 : keep.source term verbatim (label = data, foo:148)
...

It still outputs this information when I run Sweave from within R, 
e.g.,


R> Sweave("foo.Rnw")

I'm not sure whether the new behavior of not printing this information 
to stdout for R CMD Sweave is intended, but I thought I'd report it 
along with confirming that R CMD Sweave now works again for me.


Best wishes
Thomas



> sessionInfo()
R Under development (unstable) (2014-04-22 r65449)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

> Martin Maechler <[hidden email]>
> on Thu, 17 Apr 2014 11:22:04 +0200 writes:

> [...]

> PS: I'm currently testing a patch where 'R CMD Sweave' will
> revert to not deleting anything after running the R code by
> default.


> Martin Maechler

Some may have noticed that R-devel, since svn revision 65401 (= 
2014-04-17 12:19:44 +0200), is now patched, with log message

> R CMD Sweave must not delete files by default; buildVignette(*, keep);
>  update (and fix/clarify) documentation;
> cosmetic (& speedup in buildVignettes())

The daily (source!) snapshots of R devel now also contain it.

We plan to port the patch to 'R 3.1.0 patched' (to become 3.1.1
in the future) after the Easter holidays...
and would be glad if some volunteers could support development
of R by testing this (or newer) version of R-devel.

Martin Maechler, ETH Zurich


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] type.convert and doubles

2014-04-22 Thread Therneau, Terry M., Ph.D.

"No global options"
I don't have an opinion about type.convert, but I must object to Martin's sweeping 
statement about global options, and stringsAsFactors in particular.  There have been only 
a few decisions in Splus/R that were so bad that our biostat group modified the core 
routines in order to return to sane behavior: automatic conversion of strings to factors 
was one of them --- not just when reading a data set but every bloody time you modified a 
data frame.  Addition of the global option was a blessing.


I work in a large biostatistics group whose mission is the advancement of medicine. 
Nothing frightens me more about the long term viability of R as a tool than sweeping 
announcements about "principles" which brush pragmatic considerations aside as irrelevant. 
Some of us need to get work done.  (S4 zealots can be particularly annoying in this 
regard.)


How many of you remember the original S decision to have all modeling functions fail upon 
seeing a missing value?  The na.action argument was only available within lm() etc. calls, 
with no global override, because "missing values are serious artifacts and should not be 
removed without thought".  Martin, should this be removed from the global options as well?
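The global override referred to here is the "na.action" entry of options(). Modelling functions such as lm() fall back to it when no na.action argument is given, and its value in a default session is "na.omit". The sketch below contrasts that global switch with the explicit per-call argument. The small data frame is made up for illustration.

```r
d <- data.frame(x = 1:5, y = c(2.1, 3.9, NA, 8.2, 9.8))

getOption("na.action")                 # "na.omit" in a default session

# Global switch: the identical lm() call now stops at the missing value
old <- options(na.action = "na.fail")
res <- tryCatch(lm(y ~ x, data = d), error = function(e) "failed")
options(old)                           # restore the previous setting

# Explicit argument: the NA handling is visible in the call itself
fit <- lm(y ~ x, data = d, na.action = na.omit)
nobs(fit)                              # 4: the row with NA was dropped
```

Both sides of the argument are visible here. The option is convenient for a whole session, but the same lm() line behaves differently depending on state that the call does not show.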


Terry T.






Re: [Rd] type.convert and doubles

2014-04-22 Thread Milan Bouchet-Valat
On Tuesday, 22 April 2014 at 12:18 -0500, Therneau, Terry M., Ph.D.
wrote:
> "No global options"
> I don't have an opinion about type.convert, but I must object to Martin's
> sweeping statement about global options, and stringsAsFactors in particular.
> There have been only a few decisions in Splus/R that were so bad that our
> biostat group modified the core routines in order to return to sane behavior:
> automatic conversion of strings to factors was one of them --- not just when
> reading a data set but every bloody time you modified a data frame.  Addition
> of the global option was a blessing.
> 
> I work in a large biostatistics group whose mission is the advancement of
> medicine.  Nothing frightens me more about the long term viability of R as a
> tool than sweeping announcements about "principles" which brush pragmatic
> considerations aside as irrelevant.  Some of us need to get work done.  (S4
> zealots can be particularly annoying in this regard.)
> 
> How many of you remember the original S decision to have all modeling
> functions fail upon seeing a missing value?  The na.action argument was only
> available within lm() etc. calls, with no global override, because "missing
> values are serious artifacts and should not be removed without thought".
> Martin, should this be removed from the global options as well?
Very interesting. Do you have any written references about this
behavior, and how it was eventually changed?

Thanks

> Terry T.


Re: [Rd] type.convert and doubles

2014-04-22 Thread Martin Maechler
> Therneau, Terry M , Ph D 
> on Tue, 22 Apr 2014 12:18:55 -0500 writes:

> "No global options"
> I don't have an opinion about type.convert, but I must object to Martin's
> sweeping statement about global options, and stringsAsFactors in particular.
> There have been only a few decisions in Splus/R that were so bad that our
> biostat group modified the core routines in order to return to sane behavior:
> automatic conversion of strings to factors was one of them --- not just when
> reading a data set but every bloody time you modified a data frame.  Addition
> of the global option was a blessing.

> I work in a large biostatistics group whose mission is the advancement of
> medicine.  Nothing frightens me more about the long term viability of R as a
> tool than sweeping announcements about "principles" which brush pragmatic
> considerations aside as irrelevant.  Some of us need to get work done.  (S4
> zealots can be particularly annoying in this regard.)

> How many of you remember the original S decision to have all modeling
> functions fail upon seeing a missing value?  The na.action argument was only
> available within lm() etc. calls, with no global override, because "missing
> values are serious artifacts and should not be removed without thought".
> Martin, should this be removed from the global options as well?

Terry, you are right that sweeping statements in general are
not something scientists should use often.

First note that I would not advocate abolishing existing global
options, because I often advocate backward compatibility even
more strongly than my colleagues do.

But I do maintain that global options are tempting but never
necessary.  Almost everyone agrees that for output, e.g. printing
(number of digits) or plotting, adapting to the "current state" is
something we do want for convenience.
But I'm still arguing that using an explicit 'stringsAsFactors'
*argument* -- or your own wrapper for read.table() with
different defaults -- would be much cleaner.  There are not so
many cases where you'd have to pass such an argument, and I
think the same holds for passing an 'na.action' argument to
modelling functions rather than getting it from a global option.
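A personal wrapper around read.table() with different defaults, as suggested above, might look like the sketch below. The name read_csv_mine() and its chosen defaults are hypothetical. stringsAsFactors and numerals are real read.table() arguments (numerals since R 3.1.1), and anything else is passed through via ... .

```r
# Hypothetical wrapper: the preferred defaults live in one visible function
# rather than in options(), so a script's behavior does not depend on
# hidden session state.
read_csv_mine <- function(file, ..., stringsAsFactors = FALSE,
                          numerals = "no.loss") {
  read.csv(file, ..., stringsAsFactors = stringsAsFactors,
           numerals = numerals)
}

f <- tempfile(fileext = ".csv")
writeLines(c("id,label", "72057594037927937,a", "2,b"), f)
d <- read_csv_mine(f)
str(d)   # 'id' kept as character (no silent precision loss), 'label' character
unlink(f)
```

Callers can still override either default per call, e.g. read_csv_mine(f, numerals = "allow.loss"), so the choice stays explicit at the call site.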

That said, yes, I would fight hard against introducing
*more* global options that influence basic R functionality
apart from *output* configuration.

Martin






