Re: [Rd] Base R examples that write to current working directory

2018-04-04 Thread Gabe Becker
Martin et al,

I have submitted a patch on bugzilla which fixes all of the examples I
could easily find which were not already writing only to temporary files or
switching to a temp directory before writing files to the working
directory. https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17403

Passes make check-devel so the examples run and the packages that were
modified pass check.
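For reference, the general shape of the fix (an illustrative sketch, not the actual patch on bugzilla) is to direct example output to tempfile()/tempdir() and clean up afterwards:

```r
## Sketch of the pattern such fixes follow (illustrative, not the patch
## itself): write example output under tempdir() instead of getwd().
tf <- tempfile(fileext = ".rds")   # e.g. instead of a bare "file.rds"
saveRDS(mtcars, tf)                # no longer touches the working directory
stopifnot(identical(readRDS(tf), mtcars))
unlink(tf)                         # tidy up the temporary file
```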

Best,
~G

On Tue, Apr 3, 2018 at 2:37 AM, Martin Maechler wrote:

> > Henrik Bengtsson on Fri, 30 Mar 2018 10:14:04 -0700 writes:
>
> > So, the proposal would then be to write to tempdir(),
> > correct?  If so, I see three alternatives:
>
> > 1. explicitly use file.path(tempdir(), filename), or
> > tempfile() everywhere.
>
> I think it should clearly be  '1.',
> as both '2.' and '3.' would need new functionality in R.
>
> Ideally we'd get the patch within a day or two, so we can safely
> apply it also to  R 3.5.0 alpha  (before it turns beta!).
>
> I think the 'eval.path' argument to example() is a nice idea,
> but also changing its default to  tempdir() is definitely out of
> the question for R 3.5.0.
>
> Martin
>
>
> > 2. wrap example code in a withTempDir({ ... }) call.
>
> > 3. Add an 'eval.path' (*) argument to example() and make
> > it default to eval.path = tempdir(). This would probably
> > be backward compatible and keep the code example clean.
> > The downside is when a user runs an example and can't
> > locate produced files. (*) or 'wd', 'workdir', ...
>
> > /Henrik
>
> > On Fri, Mar 30, 2018 at 9:25 AM, Uwe Ligges wrote:
> >>
> >>
> >> On 30.03.2018 00:08, Duncan Murdoch wrote:
> >>>
> >>> On 29/03/2018 5:23 PM, Hadley Wickham wrote:
> 
>  Hi all,
> 
>  Given the recent CRAN push to prevent examples writing
>  to the working directory, is there any interest in
>  fixing base R examples that write to the working
>  directory? A few candidates are the graphics devices,
>  file.create(), writeBin(), writeChar(), write(), and
>  saveRDS(). I'm sure there are many more.
> 
>  One way to catch these naughty examples would be to
>  search for unlink() in examples: e.g.,
> 
>  https://github.com/wch/r-source/search?utf8=✓&q=unlink+extension%3ARd&type= .
>  Of course, simply cleaning up after yourself is not
>  sufficient because if those files existed before the
>  examples were run, the examples will destroy them.
> 
> >>>
> >>> Why not put together a patch that fixes these?  This
> >>> doesn't seem to be something that needs discussion,
> >>> fixing the bad examples would be a good idea.
> >>
> >>
> >> Seconded. CRAN would not accept these base packages,
> >> hence we should urgently give better examples.
> >>
> >> Best, Uwe
> >>
> >>
> >>
> >>> Duncan Murdoch
> >>>
> >>> __
> >>> R-devel@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >>
>



-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research



Re: [Rd] Part of fastpass in 'sort.list' can make sorting unstable

2018-04-06 Thread Gabe Becker
Thanks for catching this. This is easy to take out without touching the
rest of the machinery. It also wouldn't be too hard to write a
still-faster-but-not-quite-as-much-path which correctly reverses the
sortedness of a sorted vector that includes ties. My suspicion, without
being the one who will ultimately make that decision, is that that wouldn't
go into 3.5.0 though.
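For concreteness, a sketch of such a stable reverse (the function name and approach are mine, not the eventual fix): order() with the original index as a tie-breaker reverses a sorted vector while keeping ties in their original relative order.

```r
## Sketch of a stable decreasing ordering (not the eventual fix):
## break ties on the original index so their relative order is kept.
stable_rev_order <- function(x)
  order(x, seq_along(x), decreasing = c(TRUE, FALSE), method = "radix")

x <- sort(c(1, 1, 3))
stable_rev_order(x)  # 3 1 2, as stability requires (not 3 2 1)
```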

Best,
~G

On Fri, Apr 6, 2018 at 3:03 PM, Suharto Anggono via R-devel wrote:

> In the code of functions 'order' and 'sort.list' in R 3.5.0 alpha (in
> https://svn.r-project.org/R/branches/R-3-5-branch/src/library/base/R/sort.R), in "fastpass, take advantage of ALTREP metadata",
> there is "try the reverse since that's easy too...". If it succeeds, ties
> are reordered, violating stability of sorting.
>
> Example:
> x <- sort(c(1, 1, 3))
> x  # 1 1 3
> sort.list(x, decreasing=TRUE)  # should be 3 1 2
>


Re: [Rd] mean(x) for ALTREP

2018-04-26 Thread Gabe Becker
Serguei,

The R 3.5.0 release includes the fundamental ALTREP framework but does not
include many 'hooks' within R's source code to make use of methods on the
ALTREP custom vector classes. I have implemented a fair number, including
for mean() to use the custom Sum method when available, in the ALTREP
branch but unfortunately we did not have time to test and port them to the
trunk in time for this release. The current plan, as I understand it, is
that we will continue to develop and test these, and other hooks, and then
when ready they will be ported into trunk/R-devel over the course this
current development cycle for inclusion in the next release of R.

My hope is that the end-user benefits of ALTREP will really show through
much more in future releases, but for now, things like mean() will
behave as they always have from a user perspective.
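For the arithmetic-sequence case Serguei raises below, the mean is indeed available in constant time; this is exactly the kind of shortcut an ALTREP hook can take (a sketch, assuming an unmodified from:to sequence):

```r
## Mean of an unmodified arithmetic sequence from:to in O(1) -- the
## kind of shortcut an ALTREP Sum/mean method can exploit (sketch).
seq_mean <- function(from, to) (from + to) / 2
seq_mean(1, 1e10)  # 5e+09, matching the looped mean(1:1e10) quoted below
```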

Best,
~G


On Thu, Apr 26, 2018 at 2:31 AM, Serguei Sokol wrote:

> Hi,
>
> By looking at a doc about ALTREP, https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
> (by the way, congratulations for that and for R-3.5.0 in general), I was a
> little bit surprised by the following example:
>
> > x <- 1:1e10
> > system.time(print(mean(x)))
> [1] 5e+09
>user  system elapsed
>  38.520   0.008  38.531
>
> Taking 38.520 s to calculate the mean of an arithmetic sequence seemed
> like a lot to me. It probably means the calculation is done in a for
> loop, while for an arithmetic sequence the mean can simply be computed
> as (b+e)/2, where b and e are the begin and end values respectively. Is
> it planned to take advantage of ALTREP in functions like mean(), sum(),
> min(), max() and some others, to avoid running a for loop wherever
> possible? It seems natural to me, but some implementation detail
> preventing this may have escaped me.
>
> Best,
> Serguei.
>


Re: [Rd] debugonce() functions are not considered as debugged

2018-05-01 Thread Gabe Becker
Gabor,

Others can speak to the origins of this more directly, but from what I
recall this has been true at least since I was working in this space on the
debugcall stuff a couple of years ago. I imagine the reasoning is what you
would expect: a single bit of course can't tell R both that a function is
debugged AND that it should be undebugged after the first call. I don't
know of any R-facing way to check for debugonce status, though it's
possible I missed it.

That said, it would be possible to alter how the two bits are used so that
debugonce sets both of them and debug (not once) sets only one, rather than
treating them as mutually exclusive. This would alter the behavior so
that debugonce'd functions that haven't been called yet are considered
debugged, e.g., by isdebugged().

This would not, strictly speaking, be backwards compatible, but by the very
nature of what debugging means, it would not break any existing script
code. It could, and likely would, affect code implementing GUIs, however.
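In the meantime, a user-level workaround is to record debugonce() calls yourself so they can be queried (purely a sketch; these names are mine, not an API proposal):

```r
## User-level workaround (a sketch; names are mine, not an API proposal):
## record debugonce() targets so their status can be queried later.
## Note the registry is not cleared automatically once the browser fires.
.donce <- new.env(parent = emptyenv())

debugonce2 <- function(fun) {
  assign(deparse(substitute(fun)), TRUE, envir = .donce)
  debugonce(fun)
}

is_debugonced <- function(fun) {
  isTRUE(get0(deparse(substitute(fun)), envir = .donce, ifnotfound = FALSE))
}
```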

R-core - is this a patch that you are interested in and would consider
incorporating? If so I can volunteer to work on it.

Best,
~G

On Sat, Apr 28, 2018 at 4:57 AM, Gábor Csárdi wrote:

> debugonce() sets a different flag (RSTEP), and this is not queried by
> isdebugged(), and it is also not unset by undebug().
>
> Is this expected? If yes, is there a way to query and unset the RSTEP flag
> from R code?
>
> ❯ f <- function() { }
> ❯ debugonce(f)
> ❯ isdebugged(f)
> [1] FALSE
>
> ❯ undebug(f)
> Warning message:
> In undebug(f) : argument is not being debugged
>
> ❯ f()
> debugging in: f()
> debug at #1: {
> }
> Browse[2]>
>


Re: [Rd] length of `...`

2018-05-03 Thread Gabe Becker
As of 3.5.0, the ...length() function does exactly what you are asking for.
Before that, I don't know of an easy way to get the length without
evaluation via R code. There may be one I'm not thinking of, though; I
haven't needed to do this myself.
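A quick illustration of both approaches (the stop() calls confirm nothing is forced):

```r
## ...length() (R >= 3.5.0) counts the ... arguments without forcing
## them; the substitute() idiom below works on older versions too.
f <- function(...) ...length()
g <- function(...) length(substitute(expression(...))) - 1L

f(1, 4, stop("never evaluated"))  # 3
g(1, 4, stop("never evaluated"))  # 3
```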

Hope that helps.

~G

On Thu, May 3, 2018 at 7:52 AM, Mark van der Loo wrote:

> This question is better aimed at the r-help mailinglist as it is not about
> developing R itself.
>
>
> having said that,
>
> I can only gues why you want to do this, but why not do something like
> this:
>
>
> f <- function(...){
>   L <- list(...)   # note: this evaluates the ... arguments
>   len <- length(L)
>   # you can still pass the ... along as follows:
>   do.call(someotherfunction, L)
> }
>
>
> -Mark
>
> On Thu, 3 May 2018 at 16:29, Dénes Tóth wrote:
>
> > Hi,
> >
> >
> > In some cases the number of arguments passed as ... must be determined
> > inside a function, without evaluating the arguments themselves. I use
> > the following construct:
> >
> > dotlength <- function(...) length(substitute(expression(...))) - 1L
> >
> > # Usage (returns 3):
> > dotlength(1, 4, something = undefined)
> >
> > How can I define a method for length() which could be called directly on
> > `...`? Or is it an intention to extend the base length() function to
> > accept ellipses?
> >
> >
> > Regards,
> > Denes
> >


Re: [Rd] Debugging "SETLENGTH() cannot be applied to an ALTVEC object."?

2018-05-04 Thread Gabe Becker
Tal,

I don't have a Debian machine at my fingertips, but I don't see that error
when installing heatmaply into a clean library in R 3.5.0 (that takes a
while...).

I suspect you're hitting old installed versions of packages in that build
on that machine, especially since the failure is not universal, but I don't
have any visibility into the internals of that system so I have no way of
knowing whether that's true or not.
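One quick local check for that hypothesis, using only base tooling (a sketch): look at which installed packages were built under an older R.

```r
## Sketch: list installed packages whose "Built" field predates 3.5.0.
## Stale builds like these are a common source of post-upgrade errors.
ip <- installed.packages()
stale <- ip[ip[, "Built"] < "3.5.0",
            c("Package", "LibPath", "Built"), drop = FALSE]
stale
```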

You can use Gabor's builder.r-hub.io to try checking your package on the
Debian VMs over there. If it can pass there that would be more evidence
that something more complicated is going on.

Sorry I couldn't be more direct help.
~G

On Fri, May 4, 2018 at 12:46 PM, Tal Galili  wrote:

> Hi all,
> I wish to push a new version of a package (heatmaply).
>
> I sent it to winbuild with no issues but after submitting it to CRAN I got
> an update that it breaks on Debian, see:
>
> package heatmaply_0.15.0.tar.gz does not pass the incoming checks
> automatically, please see the following pre-tests:
> Windows:  heatmaply_0.15.0_20180502_082353/Windows/00check.log>
> Status: OK
> Debian:  heatmaply_0.15.0_20180502_082353/Debian/00check.log>
> Status: 1 ERROR, 1 WARNING
>
>
>
> Looking at the errors I get, they are all of the type:
> "SETLENGTH() cannot be applied to an ALTVEC object."
> I assume this is somehow related to changes in R 3.5.0 (maybe related to
> this commit, aa4a2c?),
> but I'm not sure how to debug it (as I don't have this environment set up),
> nor am I sure what is actually causing the issue.
>
> Any suggestions would be most appreciated.
>
> (I debated if to post it here or on r-package-devel, and it seems a more
> general R issue than package development issue - but feel free to correct
> me about this if you think otherwise)
>
> Cheers,
> Tal
>
>
>
>
>
>
> Contact Details:
> Tal Galili, Ph.D. in Statistics
>
> tal.gal...@gmail.com
>
> www.r-statistics.com (English)
> www.biostatistics.co.il (Hebrew)  | www.talgalili.com (Hebrew)
> 
> --
>
>
>




Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-07 Thread Gabe Becker
Hey all,

I don't have a strong opinion about whether the default should ultimately
eventually change or not. Many people who use windows (a set which does not
include me) seem to think it would be better.

I will say that like Hugh, I'm strongly against making the argument
mandatory as an interim step. That is much less backwards compatible (ie it
will break much more existing code) than just changing the default would. I
would be for smarter heuristics, perhaps a warning, and eventually a change
instead if the change is ultimately decided on as the way forward.
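For concreteness, a sketch of the inverted heuristic discussed below (the extension list is illustrative, not exhaustive): default to binary and opt into text mode only for known text formats.

```r
## Sketch of the inverted heuristic (extension list illustrative only):
## default to "wb" and use "w" only for known text formats.
pick_mode <- function(url) {
  ext <- tolower(tools::file_ext(sub("[?#].*$", "", url)))
  if (ext %in% c("txt", "csv", "tsv", "dat", "r")) "w" else "wb"
}
pick_mode("https://example.com/report.CSV")  # "w"  (case-insensitive)
pick_mode("https://example.com/data.zip")    # "wb"
pick_mode("https://example.com/archive")     # "wb" (unknown: play it safe)
```

This way the heuristic can only fail in the harmless direction: a text file downloaded as binary keeps its line endings, rather than a binary file being corrupted.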

Best,
~G

On Mon, May 7, 2018 at 5:32 AM, Hugh Parsonage wrote:

> I'd add my support for mode = "wb" to (eventually) become the default,
> though I respect Tomas's comments about backwards-compatibility.
>
> Instead of making the argument mandatory (which would immediately
> break scripts -- even ones that won't be helped by changing to mode =
> 'wb') or otherwise changing behaviour, perhaps download.file could
> start to emit a message (not a warning) whenever the argument is
> missing on Windows. The message could say something like 'Using `mode
> = 'w'` which will corrupt non-text files. Set `mode = 'wb'` for binary
> downloads or see the help page for other options.' Emitting a message
> has the lightest impact on existing scripts, while alerting new users
> to future mistakes.
>
> On 7 May 2018 at 18:49, Joris Meys  wrote:
> > Martin, also from me a heartfelt thank you for taking care of this. Some
> > thoughts on Henrik's response:
> >
> > On Mon, May 7, 2018 at 2:28 AM, Henrik Bengtsson  wrote:
> >
> >>
> >> I still argue that the current behavior cause more harm than it helps.
> >>
> >
> > I agree with your analysis of the problems this legacy behaviour causes.
> >
> > Deprecating the default mode="w" on Windows can be done in steps, e.g.
> >> by making the argument mandatory for a while. This could be done on
> >> all platforms because we're already all affected, i.e. we need to
> >> specify 'mode' to avoid surprises.
> >>
> >
> > That sounds like a reasonable way to move away from this discrepancy
> > between OS.
> >
> >
> >> What about case-insensitive matching, e.g. data.ZIP and data.Rdata?
> >>
> >
> > Totally agree, and easily solved by eg adding ignore.case = TRUE to the
> > grep() call.
> >
> >
> >> A quick scan of the R source code suggests that R is also working with
> >> the following filename extensions (using various case styles):
> >>
> >> What about all the other file extensions that we know for sure are
> binary?
> >>
> >
> > If the default isn't changed, doesn't it make more sense to actually turn
> > the logic around? Text files that are downloaded over the internet are
> > almost always .txt, .csv, or a few other extensions used for text data .
> > Those are actually the only files where some people with very old Windows
> > programs for text processing can get into trouble. So instead of adding
> > every possible binary extension, one can put "wb" as default and change to
> > "w" if it is a text file instead of the other way around. That would not
> > change the concept of the behaviour, but ensures that the function doesn't
> > fail to detect a binary file. Not detecting a text file is far less of a
> > problem, as not converting the line endings doesn't destroy the file.
> >
> > Cheers
> > Joris
> >
> > --
> > Joris Meys
> > Statistical consultant
> >
> > Department of Data Analysis and Mathematical Modelling
> > Ghent University
> > Coupure Links 653, B-9000 Gent (Belgium)
> >
> > ---
> > Biowiskundedagen 2017-2018
> > http://www.biowiskundedagen.ugent.be/
> >
> > ---
> > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> >
>
>




Re: [Rd] Proposed speedup of ifelse

2018-05-07 Thread Gabe Becker
Hugh,

(Note I speak for myself only and not for R-core) Thanks for looking into
this. I think it's great to have community members that are interested in
contributing to R and helping it continue to get better.

And I think, and my local experiments bear out, that using anyNA as a
fastpass condition does allow us to get a significant speedup over what's
in there now. To do so, though, I took a somewhat different approach than
your proposal:

ifelse2 <- function(test, yes, no) {
    if (is.atomic(test)) {
        if (typeof(test) != "logical")
            storage.mode(test) <- "logical"
        if (length(test) == 1 && is.null(attributes(test))) {
            if (is.na(test))
                return(NA)
            else if (test) {
                if (length(yes) == 1) {
                    yat <- attributes(yes)
                    if (is.null(yat) ||
                        (is.function(yes) && identical(names(yat), "srcref")))
                        return(yes)
                }
            }
            else if (length(no) == 1) {
                nat <- attributes(no)
                if (is.null(nat) ||
                    (is.function(no) && identical(names(nat), "srcref")))
                    return(no)
            }
        }
    }
    else test <- if (isS4(test)) methods::as(test, "logical")
                 else as.logical(test)
    ## this is to ensure the documented behavior re: attributes of the result
    ans <- test
    len <- length(ans)
    if (nonas <- !anyNA(test)) {
        ypos <- test
        npos <- !test
    } else {
        ok <- !(nas <- is.na(test))
        ypos <- test & ok
        npos <- !test & ok
    }
    if (any(ypos, na.rm = TRUE))  ## equivalent to any(test[ok])
        ans[ypos] <- rep(yes, length.out = len)[ypos]
    if (any(npos, na.rm = TRUE))  ## equivalent to any(!test[ok])
        ans[npos] <- rep(no, length.out = len)[npos]
    ## This is in the original but I don't see why it's necessary,
    ## since ans is initialized to test. The NAs should already be there...
    if (!nonas)
        ans[nas] <- NA
    ans
}

On my machine, after an initial call to invoke the JIT and get the function
compiled, this is faster at lengths of test 100 and 1 (with the lengths
of yes and no at 10% of the length of test) by ~1.7x and ~2x respectively
for no NAs and ~1.3x and ~1.6x respectively for 10% NAs.

The key, from what I saw, is to avoid as much &ing and subsetting as we
can.  If there are no NAs none of the test&ok or test[ok] operations do
anything because ok has only TRUEs in it. Even when there are, we want to
do the & once and avoid test[ok].

There are further savings for the NAs present case if I'm correct about the
ans[nas] = NA being redundant and we're able to remove that as well.
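A quick sanity check that the rewrite agrees with base ifelse(), NAs included (assuming ifelse2 as defined above):

```r
## Sanity check: ifelse2 should agree with base ifelse(), NAs included.
set.seed(1)
test <- sample(c(TRUE, FALSE, NA), 1e5, replace = TRUE)
stopifnot(identical(ifelse(test, "yes", "no"),
                    ifelse2(test, "yes", "no")))
```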

I'm happy to submit this as a patch and share credit if that is ok with
you. Let me know.

Best,

On Thu, May 3, 2018 at 9:58 PM, Hugh Parsonage wrote:

> Thanks Radford. I concur with all your points. I've attempted to address
> the issues you raised through the github.io post.  The new method appears
> to be slower for test lengths < 100 and possibly longer lengths (not just <
> 10). Of course length(test) < 100 is very quick, so I simply added this to
> the conditions that cause the old ifelse method to be invoked. I'll leave
> it to R-core to decide whether or not the benefits for longer vectors are
> worth it.
>
>
>
>
>
>
> On Fri, 4 May 2018 at 01:01 Radford Neal  wrote:
>
> > > I propose a patch to ifelse that leverages anyNA(test) to achieve an
> > > improvement in performance. For a test vector of length 10, the change
> > > nearly halves the time taken and for a test of length 1 million, there
> > > is a tenfold increase in speed. Even for small vectors, the
> > > distributions of timings between the old and the proposed ifelse do
> > > not intersect.
> >
> > For smaller vectors, your results are significantly affected by your
> > invoking the old version via base::ifelse.  You could try defining
> > your new version as new_ifelse, and invoking the old version as just
> > ifelse.  There might still be some issues with the two versions having
> > different context w.r.t environments, and hence looking up functions
> > in different ways.  You could copy the code of the old version and
> > define it in the global environment just like new_ifelse.
> >
> > When using ifelse rather than base::ifelse, it seems the new version
> > is slower for vectors of length 10, but faster for long vectors.
> >
> > Also, I'd use system.time rather than microbenchmark.  The latter will
> > mix invocations of the two functions in a way where it is unclear that
> > garbage collection time will be fairly attributed.  Also, it's a bit
> > silly to plot the distributions of times, which will mostly reflect
> > variations in when garbage collections at various levels occur - just
> > the mean is what is relevant.
> >
> > Regards,
> >
> >Radford Neal
> >
>

Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()

2018-06-11 Thread Gabe Becker
Emil et al.,


On Mon, Jun 11, 2018 at 1:08 AM, Emil Bode  wrote:

> I don't think there's much wrong with is.na(as_date(Inf,
> origin='1970-01-01'))==FALSE, as there still is some "non-NA-ness" about
> the value (as difftime shows), but that the output when printing is
> confusing. The way cat is treating it is clearer: it does print Inf.
>
> So would this be a solution?
>
> format.Date <- function(x, ...)
> {
>   xx <- format(as.POSIXlt(x), ...)
>   names(xx) <- names(x)
>   bad <- is.na(xx) & !is.na(x)
>   xx[bad] <- paste('Invalid date:', as.numeric(x[bad]))
>   xx
> }
>
> Which causes this behaviour, which I think is clearer:
>
> environment(print.Date) <- .GlobalEnv
> x <- as_date(Inf, origin='1970-01-01')
> print(x)
> # [1] "Invalid date: Inf"
>

In my opinion, it's either invalid or it isn't. If it's actually invalid,
as_date (and the equivalent core function, which is what's actually
relevant on this list) should fail, because it's an invalid date.

If it *isn't* invalid, having the print method tell users it is seems
problematic.

And I think people seem to be leaning towards it not being invalid. A bit
surprising to me, as my personal first thought was that infinite dates
don't make any sense, but I don't really have a horse in this race and so
defer to the cooler heads that are saying having an infinite date perhaps
should not be disallowed explicitly. If it's not, though, it's not invalid
and we shouldn't confuse users by saying it is, imho.
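For reference, the behavior that started the thread, shown end to end:

```r
## The behavior under discussion: the underlying value is Inf, printing
## shows NA, and is.na() reports FALSE.
x <- as.Date(Inf, origin = "1970-01-01")
unclass(x)      # Inf -- the value is genuinely infinite
print(x)        # prints NA (format.Date goes through as.POSIXlt)
is.na(x)        # FALSE
Sys.Date() - x  # Time difference of -Inf days
```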

Best,
~G


>
> Best regards,
> Emil Bode
>
> Data-analyst
>
> +31 6 43 83 89 33
> emil.b...@dans.knaw.nl
>
> DANS: Netherlands Institute for Permanent Access to Digital Research
> Resources
> Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 |
> i...@dans.knaw.nl  | dans.knaw.nl
> 
> DANS is an institute of the Dutch Academy KNAW  and
> funding organisation NWO .
>
> Who will be the winner of the Dutch Data Prize 2018? Go to researchdata.nl
> to nominate.
>
> On 09/06/2018, 13:52, "R-devel on behalf of Joris Meys"  wrote:
>
> And now I've seen I copied the wrong part of ?is.na
>
> > The default method for is.na applied to an atomic vector returns a
> > logical vector of the same length as its argument x, containing TRUE for
> > those elements marked NA or, for numeric or complex vectors, NaN, and
> > FALSE otherwise.
>
> Key point being "atomic vector" here.
>
>
> > On Sat, Jun 9, 2018 at 1:41 PM, Joris Meys wrote:
>
> > Hi Werner,
> >
> > on ?is.na it says:
> >
> > > The default method for anyNA handles atomic vectors without a class
> > > and NULL.
> >
> > I hear you, and it is confusing to say the least. Looking deeper, the
> > culprit seems to be in the conversion of a Date to POSIXlt prior to
> the
> > formatting:
> >
> > > x <- as.Date(Inf,origin = '1970-01-01')
> > > is.na(as.POSIXlt(x))
> > [1] TRUE
> >
> > Given this implicit conversion, I'd argue that as.Date should really
> > return NA as well when passed an infinite value. The other option is to
> > provide an is.na method for the Date class, which is -given is.na is an
> > internal generic- rather trivial:
> >
> > > is.na.Date <- function(x) is.na(as.POSIXlt(x))
> > > is.na(x)
> > [1] TRUE
> >
> > This might be a workaround for your current problem without needing
> > changes to R itself. But this will give a "wrong" answer in the sense
> > that this still works:
> >
> > > Sys.Date() - x
> > Time difference of -Inf days
> >
> > I personally would go for NA as the "correct" date for an infinite value,
> > but given that this will have implications in other areas, there is a
> > possibility of breaking code and it should be investigated a bit further
> > imho.
> > Cheers
> > Joris
> >
> >
> >
> >
> > On Fri, Jun 8, 2018 at 11:21 PM, Werner Grundlingh  wrote:
> >
> >> Indeed. as_date is from lubridate, but the same holds for as.Date.
> >>
> >> The output and its interpretation should be consistent, otherwise it
> >> leads to confusion when programming. I understand that the difference
> >> exists after asking a question on Stack Overflow:
> >>   https://stackoverflow.com/q/50766089/914686
> >> This understanding is never mentioned in the documentation - that an Inf
> >> date is actually represented as NA:
> >>   https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/as.Date
> >> So I'm of the impression that the display should be fixed as a first
> >> option (thereby providing clarity/transparency in terms of back-end and
> >> output), or the documentation amended (to highlight this) as a second
> >> option.
> >>
> >>

Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()

2018-06-13 Thread Gabe Becker
Greg,

I see what you mean, but on the other hand, that's not how we think of real
numbers as working either, and doubles have that behavior generally. It
might be possible to put checks in (with a potentially non-trivial overhead
cost) to disallow that kind of thing, but again, R (and everyone else, I
think?) doesn't do so for regular doubles.

Also, I would expect the year 1e50 and the "year" Inf to be functionally
equivalent in meaning (and largely meaningless) in context.

Best,
~G

On Tue, Jun 12, 2018 at 4:23 PM, Greg Minshall  wrote:

> Martin, et al.,
>
> > I think we should allow 'year' to be "double" instead, and so it
> > could also be +Inf or -Inf and we'd nicely cover
> > the conversions from and to 'numeric' -- which is really used
> > internally for dates and date-times in  POSIXct.
>
> storing years as a double makes me worry slightly about
> 
> > year <- 1e50
> > (year+1)-year
> [1] 0
> 
> which is not how one thinks of years (or integers) as behaving.
>
> cheers, Greg
>
> ps -- sorry for the ">" overloading!
>


Re: [Rd] Bug in tools::md5sum - does not work when filepath contains tilde (ie home directory)

2018-06-29 Thread Gabe Becker
Dean,

I filed a patch for this in bugzilla yesterday so depending on reception
this should be fixed in devel soon.
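Until the fix lands, expanding the tilde before calling md5sum works around it (a sketch; the wrapper name is mine):

```r
## Workaround sketch until the fix lands: expand "~" before hashing.
md5sum_tilde <- function(files) tools::md5sum(path.expand(files))

f <- tempfile(); writeLines("x", f)
md5sum_tilde(f)  # same result as tools::md5sum(f)
```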

Best,
~G


On Fri, Jun 29, 2018, 3:58 AM Dean Attali  wrote:

> I've reproduced on ubuntu and winodws with R3.4.3
>
> When the filepath contains a tilde, the result is NA. But if the file path
> is expanded then the function works.
>
> Example:
> tools::md5sum("~/.Rprofile") returns NA
> tools::md5sum(normalizePath("~/.Rprofile")) returns the proper md5
>
>
> Perhaps this is expected behaviour because the documentation does say NA is
> returned for unreadable files, but I didn't think "~" would make a file
> unreadable to the function.
>


Re: [Rd] Parametrized Vignettest in R packages

2018-07-02 Thread Gabe Becker
Witold,

Vignettes, in the package sense, are and must be entirely self-contained as
far as I know. They are run automatically in completely clean R sessions.
I'm not sure a parameterized vignette makes a ton of sense within that
context.

Can you explain what you would want to have happen when the package is
built that would require parameterization?

~G

On Mon, Jul 2, 2018 at 7:30 AM, Witold E Wolski  wrote:

> Hello,
>
> I have a package which includes some parameterized r-markdown report
> which I would also like to build as package vignettes.
>
> Is there a way to run the parameterized vignette creation with the
> package build or package check?
>
> Thank you
>
> --
> Witold Eryk Wolski
>


Re: [Rd] Testing for vectors

2018-07-07 Thread Gabe Becker
Hadley,


> I was thinking primarily of completing the set of is.matrix() and
> is.array(), or generally, how do you say: is `x` a 1d dimensional
> thing?
>

Can you clarify what you mean in a dimensionality sense here, and
specifically what you mean by 1d?

You can have a 1d array which is different from what your proposed function
would call a vector. So is.null(dim(x)) doesn't seem the same as 1d, right?

> x = array(1:10)

> x

 [1]  1  2  3  4  5  6  7  8  9 10

> class(x)

[1] "array"

> dim(x)

[1] 10

> dim(1:10)

NULL


You can also have an n x 1 matrix, which *technically* has 2 dimensions but
conceptually is equivalent to a 1d array and/or a vector.

Also, are you including lists in your conception of 1d vectors here? I'm
with Duncan here, in that I'm having trouble understanding exactly what you
want to do without a bit more context.
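To make the disagreement concrete, the candidate checks from this thread give different answers on simple cases:

```r
## The candidate definitions disagree on simple cases:
v <- 1:10
a <- array(1:10)             # 1d array
m <- matrix(1:10, ncol = 1)  # n x 1 matrix

is.null(dim(v))       # TRUE  -- only the plain vector passes this one
is.null(dim(a))       # FALSE
length(dim(a)) <= 1   # TRUE  -- the 1d array passes this one
sum(dim(m) > 1) <= 1  # TRUE  -- even the n x 1 matrix passes this one
```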

Best,
~G



>
> (I don't have any feel for whether the check should be is.null(dim(x))
> vs. length(dim(x)) <= 1)
>
> Hadley
> --
> http://hadley.nz
>


Re: [Rd] Testing for vectors

2018-07-07 Thread Gabe Becker
Hadley,


On Sat, Jul 7, 2018 at 1:32 PM, Hadley Wickham  wrote:

> On Sat, Jul 7, 2018 at 1:50 PM, Gabe Becker  wrote:
> > Hadley,
> >
> >>
> >> I was thinking primarily of completing the set of is.matrix() and
> >> is.array(), or generally, how do you say: is `x` a 1d dimensional
> >> thing?
> >
> >
> > Can you clarify what you mean by dimensionality sense and specifically 1d
> > here?
>
> What do we call a vector that is not an array? (or matrix)
>
> What do we call an object that acts 1-dimensional? (i.e. has
> length(dim()) %in% c(0, 1)) ?
>


Right, or even (length(dim()) == 0 || sum(dim() > 1) <= 1)

but that is exactly my point: those two (or three) sets of things are not the
same. 1d arrays meet the second definition but not the first. Matrices and
arrays that don't meet either of yours would still meet mine. Which
definition are you proposing should strictly define what a vector is?
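To spell out the disagreement, here is a small sketch; the function names are
made up purely for illustration:

```r
# Two candidate "is a vector" tests from this thread (names hypothetical)
is_vec_nodim <- function(x) is.null(dim(x))
is_vec_1d <- function(x) {
  d <- dim(x)
  length(d) == 0 || sum(d > 1) <= 1
}

a <- array(1:10)             # 1d array
m <- matrix(1:10, ncol = 1)  # n x 1 matrix

is_vec_nodim(a)  # FALSE
is_vec_1d(a)     # TRUE
is_vec_nodim(m)  # FALSE
is_vec_1d(m)     # TRUE -- the two definitions disagree on both objects
```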

Before we have a function which performs the test, we need to actually know
what is being tested. So again I would echo Duncan's question: what code do
you have now, or are you planning to write, that needs to check this, and
how should it treat the various types of objects being discussed here?

In other words, in what cases do you actually need to strictly check if
something "is a vector"?

Another completely unrelated way to define vector, btw, is via the vector
interface (from what I recall this is roughly [, [[, length, and format
methods, though I'm probably forgetting some). This is (more or less)
equivalent to defining a vector as "a thing that can be the column of a
data.frame and have all the base-provided machinery work".

> > You can also have an n x 1 matrix, which *technically* has 2 dimensions but
> > conceptually is equivalent to a 1d array and/or a vector.
>
> Yes. You can also have array that's n x 1 x 1.
>
> > Also, are you including lists in your conceptions of 1d vector here? I'm
> > with Duncan here, in that i'm having trouble understanding exactly what
> you
> > want to do without a bit more context.
>
> Isn't it standard terminology that a vector is the set of atomic vectors +
> list?
>

Maybe. If by standard you mean commonly used/understood, though, I doubt
most R users would understand a list to be a vector. I think most people
think of atomic vectors exclusively when they hear "vector" unless they've
very specifically been trained not to do so.
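For what it's worth, base R's is.vector() already encodes the "atomic
vectors + list" convention, with the extra wrinkle that any attribute other
than names disqualifies the object:

```r
is.vector(1:3)           # TRUE: atomic vector
is.vector(list(1, 2))    # TRUE: lists count as vectors here
is.vector(array(1:3))    # FALSE: the dim attribute disqualifies it
is.vector(structure(1:3, foo = "bar"))  # FALSE: any non-name attribute does
```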

Best,
~G



>
> Hadley
>
> --
> http://hadley.nz
>



-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] vctrs: a type system for the tidyverse

2018-08-06 Thread Gabe Becker
Hadley,

Looks interesting, and like a fun project, from what you said in the email (I
don't have time right now to dig deep into the readme). A few thoughts.

First off, you are using the word "type" throughout this email. You seem to
mean class (judging by your Date and factor examples, and the fact that you
mention S3 dispatch) as opposed to type in the sense of what is returned by
R's typeof() function. I think it would be clearer if you called it class
throughout, unless that isn't actually what you mean (in which case I would
have other questions...)

More thoughts inline.

On Mon, Aug 6, 2018 at 9:21 AM, Hadley Wickham  wrote:

> Hi all,
>
> I wanted to share with you an experimental package that I’m currently
> working on: vctrs, . The motivation for
> vctrs is to think deeply about the output “type” of functions like
> `c()`, `ifelse()`, and `rbind()`, with an eye to implementing one
> strategy throughout the tidyverse (i.e. all the functions listed at
> ). Because this is
> going to be a big change, I thought it would be very useful to get
> comments from a wide audience, so I’m reaching out to R-devel to get
> your thoughts.
>
> There is quite a lot already in the readme
> (), so here I’ll try to motivate
> vctrs as succinctly as possible by comparing `base::c()` to its
> equivalent `vctrs::vec_c()`. I think the drawbacks of `c()` are well
> known, but to refresh your memory, I’ve highlighted a few at
> . I think they arise
> because of two main challenges: `c()` has to both combine vectors *and*
> strip attributes, and it only dispatches on the first argument.
>
> The design of vctrs is largely driven by a pair of principles:
>
> -   The type of `vec_c(x, y)` should be the same as `vec_c(y, x)`
>
> -   The type of `vec_c(x, vec_c(y, z))` should be the same as
> `vec_c(vec_c(x, y), z)`
>
> i.e. the type should be associative and commutative. I think these are
> good principles because they make types simpler to understand and to
> implement.
>
> Method dispatch for `vec_c()` is quite simple because associativity and
> commutativity mean that we can determine the output type only by
> considering a pair of inputs at a time. To this end, vctrs provides
> `vec_type2()` which takes two inputs and returns their common type
> (represented as zero length vector):
>
> str(vec_type2(integer(), double()))
> #>  num(0)
>
> str(vec_type2(factor("a"), factor("b")))
> #>  Factor w/ 2 levels "a","b":
>

What is the reasoning behind taking the union of the levels here? I'm not
sure that is actually the behavior I would want if I have a vector of
factors and I try to append some new data to it. I might want/ expect to
retain the existing levels and get either NAs or an error if the new data
has (present) levels not in the first data. The behavior as above doesn't
seem in-line with what I understand the purpose of factors to be (explicit
restriction of possible values).

I guess what I'm saying is that while I agree associativity is good for
most things, it doesn't seem like the right behavior to me in the case of
factors.
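Base R's own assignment semantics illustrate that value-restriction view:
assigning a value outside the level set yields NA with a warning rather than
silently widening the levels.

```r
f <- factor(c("a", "b"), levels = c("a", "b"))
f[3] <- "c"  # warning: invalid factor level, NA generated
f
# [1] a    b    <NA>
levels(f)    # still c("a", "b"): the level set is not widened
```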

Also, while we're on factors, what does

vec_type2(factor("a"), "a")

return, character or factor with levels "a"?



>
> # NB: not all types have a common/unifying type
> str(vec_type2(Sys.Date(), factor("a")))
> #> Error: No common type for date and factor
>

Why is this not a list? Do you have the additional constraint that vec_type2
must return the class of one of its operands? If so, what is the
justification for that? Are you not counting list as a "type of vector"?


>
> (`vec_type2()` currently implements double dispatch through a combination
> of S3 dispatch and if-else blocks, but this will change to a pure S3
> approach in the near future.)
>
> To find the common type of multiple vectors, we can use `Reduce()`:
>
> vecs <- list(TRUE, 1:10, 1.5)
>
> type <- Reduce(vec_type2, vecs)
> str(type)
> #>  num(0)
>
> There’s one other piece of the puzzle: casting one vector to another
> type. That’s implemented by `vec_cast()` (which also uses double
> dispatch):
>
> str(lapply(vecs, vec_cast, to = type))
> #> List of 3
> #>  $ : num 1
> #>  $ : num [1:10] 1 2 3 4 5 6 7 8 9 10
> #>  $ : num 1.5
>
> All up, this means that we can implement the essence of `vec_c()` in
> only a few lines:
>
> vec_c2 <- function(...) {
>   args <- list(...)
>   type <- Reduce(vec_type2, args)
>
>   cast <- lapply(args, vec_cast, to = type)
>   unlist(cast, recursive = FALSE)
> }
>
> vec_c(factor("a"), factor("b"))
> #> [1] a b
> #> Levels: a b
>
> vec_c(Sys.Date(), Sys.time())
> #> [1] "2018-08-06 00:00:00 CDT" "2018-08-06 11:20:32 CDT"
>
> (The real implementation is little more complex:
> )
>
>

Re: [Rd] vctrs: a type system for the tidyverse

2018-08-08 Thread Gabe Becker
Hadley,

Responses inline.

On Wed, Aug 8, 2018 at 7:34 AM, Hadley Wickham  wrote:

> >>> Method dispatch for `vec_c()` is quite simple because associativity and
> >>> commutativity mean that we can determine the output type only by
> >>> considering a pair of inputs at a time. To this end, vctrs provides
> >>> `vec_type2()` which takes two inputs and returns their common type
> >>> (represented as zero length vector):
> >>>
> >>> str(vec_type2(integer(), double()))
> >>> #>  num(0)
> >>>
> >>> str(vec_type2(factor("a"), factor("b")))
> >>> #>  Factor w/ 2 levels "a","b":
> >>
> >>
> >> What is the reasoning behind taking the union of the levels here? I'm
> not
> >> sure that is actually the behavior I would want if I have a vector of
> >> factors and I try to append some new data to it. I might want/ expect to
> >> retain the existing levels and get either NAs or an error if the new
> data
> >> has (present) levels not in the first data. The behavior as above
> doesn't
> >> seem in-line with what I understand the purpose of factors to be
> (explicit
> >> restriction of possible values).
> >
> > Originally (like a week ago 😀), we threw an error if the factors
> > didn't have the same level, and provided an optional coercion to
> > character. I decided that while correct (the factor levels are a
> > parameter of the type, and hence factors with different levels aren't
> > comparable), that this fights too much against how people actually use
> > factors in practice. It also seems like base R is moving more in this
> > direction, i.e. in 3.4 factor("a") == factor("b") is an error, whereas
> > in R 3.5 it returns FALSE.
>
> I now have a better argument, I think:
>
> If you squint your brain a little, I think you can see that each set
> of automatic coercions is about increasing resolution. Integers are
> low resolution versions of doubles, and dates are low resolution
> versions of date-times. Logicals are low resolution version of
> integers because there's a strong convention that `TRUE` and `FALSE`
> can be used interchangeably with `1` and `0`.
>
> But what is the resolution of a factor? We must take a somewhat
> pragmatic approach because base R often converts character vectors to
> factors, and we don't want to be burdensome to users.


I don't know, I personally just don't buy this line of reasoning. Yes, you
can convert between characters and factors, but that doesn't make factors
"a special kind of character", which you seem to be implicitly arguing they
are. Fundamentally they are different objects with different purposes. As I
said in my previous email, the primary semantic purpose of factors is value
restriction. You don't WANT to increase the set of levels when your set of
values has already been carefully curated. Certainly not automagically.


> So we say that a
> factor `x` has finer resolution than factor `y` if the levels of `y`
> are contained in `x`. So to find the common type of two factors, we
> take the union of the levels of each factor, given a factor that has
> finer resolution than both.


I'm not so sure. I think a more useful definition of resolution may be that
it is about increasing the precision of information. In that case, a factor
with 4 levels, each of which is present, has a *higher* resolution than the
same data with additional-but-absent levels on the factor object. Now that
may be different when the new levels are not absent, but my point is that
it's not clear to me that resolution is a useful way of talking about
factors.


> Finally, you can think of a character
> vector as a factor with every possible level, so factors and character
> vectors are coercible.
>



If users want unrestricted character type behavior, then IMHO they should
just be using characters, and it's quite easy for them to do so in any case
I can easily think of where they have somehow gotten their hands on a
factor. If, however, they want a factor, it must be - I imagine - because
they actually want the semantics and behavior *specific* to factors.

Best,
~G


>
> (extracted from the in-progress vignette explaining how to extend
> vctrs to work with your own vctrs, now that vctrs has been rewritten
> to use double dispatch)
>
> Hadley
>
> --
> http://hadley.nz
>



-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] vctrs: a type system for the tidyverse

2018-08-08 Thread Gabe Becker
Actually, I sent that too quickly; I should have let it stew a bit more.
I've changed my mind about the resolution argument I was trying to make.
There is more information, technically speaking, in the factor with empty
levels. I'm still not convinced that it's the right behavior, personally. It
may just be me, though, since Martin seems on board. Mostly I'm just very
wary of taking away the thing about factors that makes them fundamentally
not characters, and removing the effectiveness of the level restriction in
practice does that.

Best,
~G

On Wed, Aug 8, 2018 at 8:54 AM, Martin Maechler 
wrote:

> > Hadley Wickham
> > on Wed, 8 Aug 2018 09:34:42 -0500 writes:
>
>  Method dispatch for `vec_c()` is quite simple because
>  associativity and commutativity mean that we can
>  determine the output type only by considering a pair of
>  inputs at a time. To this end, vctrs provides
>  `vec_type2()` which takes two inputs and returns their
>  common type (represented as zero length vector):
> 
>  str(vec_type2(integer(), double())) #> num(0)
> 
>  str(vec_type2(factor("a"), factor("b"))) #> Factor w/ 2
>  levels "a","b":
> >>>
> >>>
> >>> What is the reasoning behind taking the union of the
> >>> levels here? I'm not sure that is actually the behavior
> >>> I would want if I have a vector of factors and I try to
> >>> append some new data to it. I might want/ expect to
> >>> retain the existing levels and get either NAs or an
> >>> error if the new data has (present) levels not in the
> >>> first data. The behavior as above doesn't seem in-line
> >>> with what I understand the purpose of factors to be
> >>> (explicit restriction of possible values).
> >>
> >> Originally (like a week ago 😀), we threw an error if the
> >> factors didn't have the same level, and provided an
> >> optional coercion to character. I decided that while
> >> correct (the factor levels are a parameter of the type,
> >> and hence factors with different levels aren't
> >> comparable), that this fights too much against how people
> >> actually use factors in practice. It also seems like base
> >> R is moving more in this direction, i.e. in 3.4
> >> factor("a") == factor("b") is an error, whereas in R 3.5
> >> it returns FALSE.
>
> > I now have a better argument, I think:
>
> > If you squint your brain a little, I think you can see
> > that each set of automatic coercions is about increasing
> > resolution. Integers are low resolution versions of
> > doubles, and dates are low resolution versions of
> > date-times. Logicals are low resolution version of
> > integers because there's a strong convention that `TRUE`
> > and `FALSE` can be used interchangeably with `1` and `0`.
>
> > But what is the resolution of a factor? We must take a
> > somewhat pragmatic approach because base R often converts
> > character vectors to factors, and we don't want to be
> > burdensome to users. So we say that a factor `x` has finer
> > resolution than factor `y` if the levels of `y` are
> > contained in `x`. So to find the common type of two
> > factors, we take the union of the levels of each factor,
> > given a factor that has finer resolution than
> > both. Finally, you can think of a character vector as a
> > factor with every possible level, so factors and character
> > vectors are coercible.
>
> > (extracted from the in-progress vignette explaining how to
> > extend vctrs to work with your own vctrs, now that vctrs
> > has been rewritten to use double dispatch)
>
> I like this argumentation, and find it very nice indeed!
> It confirms my own gut feeling which had lead me to agreeing
> with you, Hadley, that taking the union of all factor levels
> should be done here.
>
> As Gabe mentioned (and you've explained about) the term "type"
> is really confusing here.  As you know, the R internals are all
> about SEXPs, TYPEOF(), etc, and that's what the R level
> typeof(.) also returns.  As you want to use something slightly
> different, it should be different naming, ideally something not
> existing yet in the R / S world, maybe 'kind' ?
>
> Martin
>
>
> > Hadley
>
> > --
> > http://hadley.nz
>
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Gabe Becker
Hadley,

Overall this seems like a cool and potentially really useful idea. I do have
some thoughts/feedback, which I've put in-line below.

On Thu, Aug 23, 2018 at 11:31 AM, Hadley Wickham 
wrote:

>
> 
>

> conflicted applies a few heuristics to minimise false positives (at the
> cost of introducing a few false negatives). The overarching goal is to
> ensure that code behaves identically regardless of the order in which
> packages are attached.
>
> -   A number of packages provide a function that appears to conflict
> with a function in a base package, but they follow the superset
> principle (i.e. they only extend the API, as explained to me by
> Hervè Pages).
>
> conflicted assumes that packages adhere to the superset principle,
> which appears to be true in most of the cases that I’ve seen.


It seems that you may be able to strengthen this heuristic from a blanket
assumption to something more narrowly targeted by looking for one or more
of the following to confirm likely-superset adherence

   1. matching or purely extending formals (i.e. all the named arguments of
   base::fun match, including order, and there are new arguments in pkg::fun
   only if base::fun takes ...)
   2. explicit call to  base::fun in the body of pkg::fun
   3. UseMethod(funname) and at least one provided S3 method calls base::fun
   4. S4 generic creation using fun or base::fun as the seeding/default
   method body or called from at least one method
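As a rough sketch of what check 1 could look like, a hypothetical helper
(this is not anything conflicted actually implements, just an illustration of
the formals comparison):

```r
# Does pkg_fun's argument list extend base_fun's? Crude heuristic:
# base_fun's named arguments (minus ...) must appear first and in order,
# and extra arguments are allowed only if base_fun takes ... .
extends_formals <- function(pkg_fun, base_fun) {
  base_args <- setdiff(names(formals(base_fun)), "...")
  pkg_args <- names(formals(pkg_fun))
  identical(pkg_args[seq_along(base_args)], base_args) &&
    (length(pkg_args) == length(base_args) ||
       "..." %in% names(formals(base_fun)))
}

extends_formals(function(x, y, z, ...) NULL,
                function(x, y, ...) NULL)   # TRUE: extra args, base has ...
extends_formals(function(a, x) NULL,
                function(x, y, ...) NULL)   # FALSE: leading args renamed
```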



> For
> example, the lubridate package provides `as.difftime()` and `date()`
> which extend the behaviour of base functions, and provides S4
> generics for the set operators.
>
> conflict_scout(c("lubridate", "base"))
> #> 5 conflicts:
> #> * `as.difftime`: [lubridate]
> #> * `date`   : [lubridate]
> #> * `intersect`  : [lubridate]
> #> * `setdiff`: [lubridate]
> #> * `union`  : [lubridate]
>
> There are two popular functions that don’t adhere to this principle:
> `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
> special cases so they correctly generate conflicts. (I sure wish I’d
> known about the superset principle when creating dplyr!)
>
> conflict_scout(c("dplyr", "stats"))
> #> 2 conflicts:
> #> * `filter`: dplyr, stats
> #> * `lag`   : dplyr, stats
>
> -   Deprecated functions should never win a conflict, so conflicted
> checks for use of `.Deprecated()`. This rule is very useful when
> moving functions from one package to another. For example, many
> devtools functions were moved to usethis, and conflicted ensures
> that you always get the non-deprecated version, regardless of package
> attach order:
>

I would completely believe this rule is useful for refactoring as you
describe, but that is the "same function" case. For an end-user in the
"different function same symbol" case it's not at all clear to me that the
deprecated function should always win.

People sometimes use deprecated functions. It's not great, and eventually
they'll need to fix that for any given case, but imagine if you deprecated
the filter verb in dplyr (I know this will never happen, but I think it's
illustrative none the less).

Consider a piece of code someone wrote before this hypothetical deprecation
of filter. The fact that it's now deprecated certainly doesn't mean that
they secretly wanted stats::filter all along, right? Conflicted acting as
if it does will lead to them getting the exact kind of error you're looking
to protect them from, and with even less ability to understand why because
they are already doing "The right thing" to protect themselves by using
conflicted in the first place...


> Finally, as mentioned above, the user can declare preferences:
>
> conflict_prefer("select", "MASS")
> #> [conflicted] Will prefer MASS::select over any other package
> conflict_scout(c("dplyr", "MASS"))
> #> 1 conflict:
> #> * `select`: [MASS]
>
>
I deeply worry about people putting this kind of thing, or even just
library(conflicted), in their .Rprofile and thus making their scripts
*substantially* less reproducible. Is that a consequence you have thought
about for this kind of functionality?

Best,
~G


> I’d love to hear what people think about the general idea, and if there
> are any obviously missing pieces.
>
> Thanks!
>
> Hadley
>
>
> --
> http://hadley.nz
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
Best,
~G

-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Proposal: more accurate seq(from, to, length=n)

2018-09-07 Thread Gabe Becker
Suharto,

My 2c inline.

On Fri, Sep 7, 2018 at 2:34 PM, Suharto Anggono Suharto Anggono via R-devel
 wrote:

> In R,
> seq(0, 1, 0.1)
> gives the same result as
> (0:10)*0.1.
> It is not the same as
> c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1) ,
> as 0.1 is not represented exactly. I am fine with it.
>
> In R,
> seq(0, 1, length=11)
> gives the same result as
> seq(0, 1, 0.1).
> However, for
> seq(0, 1, length=11),
> it is more accurate to return
> c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1) .

It can be obtained by
> (0:10)/10.
>
> When 'from', 'to', and 'length.out' are specified and length.out > 2, I
> propose for function 'seq.default' in R to use something like
> from + ((0:(length.out - 1))/(length.out - 1)) * (to - from)
> instead of something like
> from + (0:(length.out - 1)) * ((to - from)/(length.out - 1)) .
>

In your example case, under R 3.5.0 on my system, these two expressions give
results which return TRUE from all.equal(), which is the accepted way of
comparing non-integer numerics in R for "sameness".
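To make the distinction concrete (all.equal() is the tolerance-based
comparison; the elementwise == result is typical of IEEE double arithmetic
but is shown here only for illustration):

```r
a <- seq(0, 1, length.out = 11)  # currently computed as from + (0:10) * by
b <- (0:10) / 10                 # the proposed, more accurate form

isTRUE(all.equal(a, b))  # TRUE: equal up to numeric tolerance
all(a == b)              # typically FALSE: e.g. 3 * 0.1 differs from 3 / 10
                         # in the last bit
```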

> from = 0
> to = 1
> length.out = 11
> all.equal(from + ((0:(length.out - 1))/(length.out - 1)) * (to - from),
+           from + (0:(length.out - 1)) * ((to - from)/(length.out - 1)))
[1] TRUE

Given that, I'm wondering what benefit you're looking for here that would
outweigh the very large set of existing code whose behavior would
technically change under this proposal. Then again, it wouldn't change with
respect to the accepted all.equal() test, so I guess you could argue either
that there's "no change" or that the change is ok?

I'd still like to know what practical problem you're trying to solve,
though. If you're looking for the ability to use == to compare non-integer
sequences generated in different ways, then as far as I understand the
answer is that you shouldn't expect to be able to do that.

Best,
~G


> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Bias in R's random integers?

2018-09-20 Thread Gabe Becker
Hi all,

On Thu, Sep 20, 2018 at 9:30 AM, Paul Gilbert  wrote:
>
>
> There are only two small problems that occur to me:
>
> 1/ Researchers that want to have reproducible results (all I hope) need to
> be aware the change has happened. In theory they should have recorded the
> RNG they were using, along with the seed (and, BTW, the number of nodes if
> they generate with a parallel generator). If they have not done that then
> they can figure out the RNG from knowing what version of R they used. If
> they haven't recorded that then they can figure it out by some
> experimentation and knowing roughly when they did the research. If none of
> this works then the research probably should be lost.
>
> As an exercise, researchers might also want to experiment with whether the
> new default qualitatively changes their results. That might lead to
> publishable research, so no one should complain.
>

I was going to suggest helper/convenience functions for this but looking at
?RNG I see that someone has already put in RNGversion which *sets* the RNG
kind to what was used by default in an older version. I do wonder if there
is still value in a function that would *return* it, e.g. for comparisons.
Perhaps RNGversionstr?
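For instance, a minimal sketch of such a helper (the name RNGversionstr and
the output format are purely hypothetical; it only reports, via RNGkind()
called with no arguments, rather than sets):

```r
RNGversionstr <- function() {
  # RNGkind() with no arguments returns the current kinds without
  # changing them; collapse them into one display string
  paste(RNGkind(), collapse = " / ")
}

RNGversionstr()
# e.g. "Mersenne-Twister / Inversion" (plus the sample kind in newer R)
```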

Also, would R-core members be interested in a small patch to sessionInfo()
and print.sessionInfo() which makes it so that the current RNGkind is
captured and displayed (respectively) by the sessionInfo machinery? I can
prepare one if so.

Best,
~G

-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R_ext/Altrep.h should be more C++-friendly

2018-10-09 Thread Gabe Becker
Michael,

Thanks for reaching out. This was brought up by Romaine Francois offline to
me as well. What he does as a workaround is


#define class klass
extern "C" {
  #include <R_ext/Altrep.h>
}
#undef class

While we consider changing Altrep.h, the above should work for you in the
immediate term.

Let me know if it doesn't.

~G





On Mon, Oct 8, 2018 at 4:17 PM, Michael Sannella via R-devel <
r-devel@r-project.org> wrote:

> I am not able to #include "R_ext/Altrep.h" from a C++ file.  I think
> it needs two changes:
>
> 1. add the same __cplusplus check as most of the other header files:
> #ifdef  __cplusplus
> extern "C" {
> #endif
> ...
> #ifdef  __cplusplus
> }
> #endif
>
> 2. change the line
> R_new_altrep(R_altrep_class_t class, SEXP data1, SEXP data2);
>  to
> R_new_altrep(R_altrep_class_t cls, SEXP data1, SEXP data2);
>  since C++ doesn't like an argument named 'class'
>
>   ~~ Michael Sannella
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] methods(class = class()) - improve for |cl.| > 1 ?

2018-10-19 Thread Gabe Becker
Martin and Kevin,

Perhaps a variant of methods which more directly addresses the use-case
Kevin mentions might be in order?

I am happy to prepare a patch which implements a methodsForObj function
(name very much negotiable), or a third obj argument to methods which takes
the actual object and answers the question "what methods would I hit for an
object just like this one?" directly. Is this something you (Martin, et al
in R core), are interested in and would consider?

I know people can always do methods(class = class(obj)) after the change
being discussed (and, contingent on that change, what I described could be
trivially implemented that way), but should they need to?
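A rough sketch of the idea (the name methodsForObj is hypothetical; this
leans on the row.names of the "info" attribute that methods() already
builds, which is the same structure the warnings Martin quoted refer to):

```r
# Collect S3 methods across every class an object inherits from
methodsForObj <- function(obj) {
  info <- do.call(rbind, lapply(class(obj), function(cl)
    attr(utils::methods(class = cl), "info")))
  sort(unique(row.names(info)))
}

# For a "glm" object this would pick up both glm and lm methods;
# an empty object with the right class attribute suffices to demonstrate:
fake_fit <- structure(list(), class = c("glm", "lm"))
head(methodsForObj(fake_fit))
```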

Best,
~G



On Fri, Oct 19, 2018 at 10:55 AM, Kevin Ushey  wrote:

> I think this would be a good change. I think most users use the
> 'methods(class = <...>)' function to answer the question, "what
> methods can I call on objects with these classes?", and in that
> context I think it would be sensible for the function to accept more
> than one class.
>
> Kevin
>
> On Wed, Oct 17, 2018 at 7:15 AM Martin Maechler
>  wrote:
> >
> > With new "strict" settings in R-devel, the following glm() example
> >
> > > data(anorexia, package = "MASS")
> > > fm <- glm(Postwt ~ Prewt + Treat + offset(Prewt), family = gaussian,
> data = anorexia)
> > > methods(class = class(fm))
> > Warning in grep(name, row.names(info)) :
> >   argument 'pattern' has length > 1 and only the first element will be
> used
> > Warning in gsub(name, "", row.names(info)) :
> >   argument 'pattern' has length > 1 and only the first element will be
> used
> > Warning in grep(pattern, all.names, value = TRUE) :
> >   argument 'pattern' has length > 1 and only the first element will be
> used
> > Warning in grep(pattern, all.names, value = TRUE) :
> >   argument 'pattern' has length > 1 and only the first element will be
> used
> > ...
> > ...
> > ...
> > [ca. 20 lines of warnings]
> >
> > and then shows the "glm" methods, but not "lm" ones.
> >
> > This is not a bug strictly, as  ?methods says that
> >
> >class: a symbol or character string naming a class: only used if
> >   ‘generic.function’ is not supplied.
> >
> > and so the use of
> >
> >methods(class = class())
> >
> > is a user error when  class()  is of length > 1.
> >
> > In the case of e.g. a randomForest() result, we also get 25
> > warnings, i.e. 50 lines, but then
> >
> > --->>>  no methods found
> >
> > because
> >
> >> class(rf.fit)
> >[1] "randomForest.formula" "randomForest"
> >
> > and no methods are  defined for "randomForest.formula".
> >
> > ---
> >
> > Of course, all this works fine with S4 classes:  There the full
> > inheritance is used and all methods are found.
> >
> > Still, would it make sense to improve the underlying .S3methods() ?
> >
> > I assume it will break *some* overzealous package checks out
> > there when .S3methods() and hence methods() would return *more*
> > in such case.
> >
> > Comments?
> >
> > --
> > Martin Maechler
> > ETH Zurich  and  R Core
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel