Re: [Rd] as.list method for by Objects

2018-02-01 Thread Martin Maechler
> Michael Lawrence 
> on Tue, 30 Jan 2018 15:57:42 -0800 writes:

> I just meant that the minimal contract for as.list() appears to be that it
> returns a VECSXP. To the user, we might say that is.list() will always
> return TRUE.

Indeed. I also agree with Hervé that the user-level
documentation should rather mention  is.list(.) |--> TRUE  than
VECSXP, and interestingly for the experts among us,
the  is.list() primitive gives TRUE not only for  VECSXP  but
also for LISTSXP (the good ole' pairlists).
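
A small sketch of that last point: both an ordinary list (VECSXP) and a pairlist (LISTSXP) pass is.list(), while typeof() tells them apart. The object names are made up for the illustration.

```r
# A generic vector (VECSXP) and a pairlist (LISTSXP); names are illustrative only.
vec_list  <- list(a = 1, b = "x")
pair_list <- pairlist(a = 1, b = "x")

typeof(vec_list)        # "list"
typeof(pair_list)       # "pairlist"

# is.list() is TRUE for both storage types
is.list(vec_list)       # TRUE
is.list(pair_list)      # TRUE

# is.pairlist() distinguishes them when that matters
is.pairlist(vec_list)   # FALSE
is.pairlist(pair_list)  # TRUE
```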

> I'm not sure we can expect consistency across methods
> beyond that, nor is it feasible at this point to match the
> semantics of the methods package. It deals in "class
> space" while as.list() deals in "typeof() space".

> Michael

Yes, and that *is* the extra complexity we have in R (inherited
from S, I'd say)  which ideally wasn't there and of course is
not there in much younger languages/systems such as julia.

And --- by the way let me preach, for the "class space" ---
do __never__ use

  if(class(obj) == "")

in your code (I see this so often, shockingly to me ...) but rather use

  if(inherits(obj, ""))

instead.
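
To illustrate why, here is a minimal sketch with a hypothetical two-level S3 object (the class names "child" and "parent" are invented). The comparison with == yields one result per entry of the class attribute, whereas inherits() always gives a single logical.

```r
# Hypothetical S3 object with two classes, made up for illustration.
obj <- structure(list(value = 1), class = c("child", "parent"))

class(obj) == "parent"           # c(FALSE, TRUE): one result per class entry
inherits(obj, "parent")          # TRUE: a single logical, safe inside if()

# Multi-entry class attributes are common in base R as well:
now <- Sys.time()
class(now)                       # c("POSIXct", "POSIXt")
length(class(now) == "POSIXct")  # 2
inherits(now, "POSIXct")         # TRUE
```

A condition of length two inside if() is at best a warning and, in recent R versions, an error, so the inherits() form is the robust one.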

Martin



> On Tue, Jan 30, 2018 at 3:47 PM, Hervé Pagès  wrote:

>> On 01/30/2018 02:50 PM, Michael Lawrence wrote:
>> 
>>> by() does not always return a list. In Gabe's example, it returns an
>>> integer, thus it is coerced to a list. as.list() means that it should be a
>>> VECSXP, not necessarily with "list" in the class attribute.
>>> 
>> 
>> The documentation is not particularly clear about what as.list()
>> means for list derivatives. IMO clarifications should stick to
>> simple concepts and formulations like "is.list(x) is TRUE" or
>> "x is a list or a list derivative" rather than "x is a VECSXP".
>> Coercion is useful beyond the use case of implementing a .C entry
>> point and calling as.numeric/as.list/etc... on its arguments.
>> 
>> This is why I was hoping that we could maybe discuss the possibility
>> of making the as.list() contract less vague than just "as.list()
>> must return a list or a list derivative".
>> 
>> Again, I think that 2 things weigh quite a lot in that discussion:
>> 1) as.list() returns an object of class "list" on a
>> data.frame (strict coercion). If all that as.list() needed to
>> do was to return a VECSXP, then as.list.default() already does
>> this on a data.frame so why did someone bother adding an
>> as.list.data.frame method that does strict coercion?
>> 2) The S4 coercion system based on as() does strict coercion by
>> default.
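
The first point can be checked with a short sketch (the data frame is invented for the example): the data.frame method strips the class, while the default method hands the object back unchanged because it is already a VECSXP.

```r
# Illustrative data frame; column names are arbitrary.
df <- data.frame(x = 1:2, y = c("a", "b"))

class(as.list(df))            # "list": the data.frame method coerces strictly
class(as.list.default(df))    # "data.frame": already a VECSXP, returned as is

# Both results satisfy the minimal contract discussed in this thread.
is.list(as.list(df))          # TRUE
is.list(as.list.default(df))  # TRUE
```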
>> 
>> H.
>> 
>> 
>>> Michael
>>> 
>>> 
>>> On Tue, Jan 30, 2018 at 2:41 PM, Hervé Pagès wrote:
>>> 
>>> Hi Gabe,
>>> 
>>> Interestingly the behavior of as.list() on by objects seems to
>>> depend on the object itself:
>>> 
>>> > b1 <- by(1:2, 1:2, identity)
>>> > class(as.list(b1))
>>> [1] "list"
>>> 
>>> > b2 <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
>>> > class(as.list(b2))
>>> [1] "by"
>>> 
>>> This is with R 3.4.3 and R devel (2017-12-11 r73889).
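
One way to look at the difference, as a sketch: inspect the storage type of each by object before coercion. The typeof() values in the comments follow the explanation given elsewhere in this thread (the first object is stored as an integer, the second as a list) and may depend on the R version.

```r
b1 <- by(1:2, 1:2, identity)
b2 <- by(warpbreaks[, 1:2], warpbreaks[, "tension"], summary)

typeof(b1)            # "integer", per the discussion in this thread
typeof(b2)            # "list"

# Whatever the class attribute ends up being, the coerced result is a list.
is.list(as.list(b1))  # TRUE
is.list(as.list(b2))  # TRUE
```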
>>> 
>>> H.
>>> 
>>> On 01/30/2018 02:33 PM, Gabriel Becker wrote:
>>> 
>>> Dario,
>>> 
>>> What version of R are you using? In my mildly old 3.4.0
>>> installation and in the version of R-devel I have lying around
>>> (also mildly old...)  I don't see the behavior I think you are
>>> describing
>>> 
>>> > b = by(1:2, 1:2, identity)
>>> 
>>> > class(as.list(b))
>>> 
>>> [1] "list"
>>> 
>>> > sessionInfo()
>>> 
>>> R Under development (unstable) (2017-12-19 r73926)
>>> 
>>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
>>> 
>>> Running under: OS X El Capitan 10.11.6
>>> 
>>> 
>>> Matrix products: default
>>> 
>>> BLAS:
>>> /Users/beckerg4/local/Rdevel/R
>>> .framework/Versions/3.5/Resources/lib/libRblas.dylib
>>> 
>>> LAPACK:
>>> /Users/beckerg4/local/Rdevel/R
>>> .framework/Versions/3.5/Resources/lib/libRlapack.dylib
>>> 
>>> 
>>> locale:
>>> 
>>> [1]
>>> en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>> 
>>> 
>>> attached base packages:
>>> 
>>> [1] stats graphics  grDevices utils datasets
>>> methods   base
>>> 
>>> 
>>> loaded via a namespace (and not attached):
>>> 
>>> [1] compiler_3.5.0
>>> 
>>> >
>>> 
>>> 
>>> As for by not having a class definition, no S3 class has an
>>> explicit definition, so this is somewhat par for the course
>>> here...
>>> 
>>> did I misunderstand something?
>>> 
>>> 
>>> ~G
>>> 
>>> On Tue, Jan 30, 2018 at 2:24 PM, Hervé Pagès
>>> mailto:hpa...@fredhutc

Re: [Rd] as.list method for by Objects

2018-02-01 Thread Martin Maechler
> Michael Lawrence 
> on Tue, 30 Jan 2018 10:37:38 -0800 writes:

> I agree that it would make sense for the object to have c("by", "list") as
> its class attribute, since the object is known to behave as a list.

Well, but that (list behavior) applies to most non-simple S3
classed objects, say "data.frame", say "lm" to start with real basic ones.

The later part of the discussion seems more relevant to me.
Adding "list" to the class attribute seems as wrong to me as
e.g. adding "double" to "Date" or "POSIXct" (and many more such cases).

For the present case, we should stay with focusing on  is.list()
being true after as.list() .. the same we would do with
as.numeric() and is.numeric().
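
A short sketch of the analogy, using an arbitrary date: a "Date" is stored as a double without "double" appearing in its class attribute, and the coercion functions only promise that the matching is.*() predicate holds afterwards.

```r
d <- as.Date("2018-02-01")   # arbitrary example date

typeof(d)                    # "double": the storage type
class(d)                     # "Date": no "double" in the class attribute
inherits(d, "double")        # FALSE

# The contract is stated via the predicate, not via the class attribute.
is.numeric(as.numeric(d))    # TRUE

b <- by(warpbreaks[, 1:2], warpbreaks[, "tension"], summary)
is.list(as.list(b))          # TRUE
```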

Martin

> However, it may be too disruptive to make this change at this point.
> Hard to predict.

> Michael

> On Mon, Jan 29, 2018 at 5:00 PM, Dario Strbenac 

> wrote:

>> Good day,
>> 
>> I'd like to suggest the addition of an as.list method for a by object that
>> actually returns a list of class "list". This would make it safer to do
>> type-checking, because is.list also returns TRUE for a data.frame variable
>> and using class(result) == "list" is an alternative that only returns TRUE
>> for lists. It's also confusing initially that
>> 
>> > class(x)
>> [1] "by"
>> > is.list(x)
>> [1] TRUE
>> 
>> since there's no explicit class definition for "by" and no mention if it
>> has any superclasses.
>> 
>> --
>> Dario Strbenac
>> University of Sydney
>> Camperdown NSW 2050
>> Australia
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> 



[Rd] Error message: 'Rscript' should not be used without a path

2018-02-01 Thread Michal Burda
Dear R-devel members,

recently, I ran into the following error message (R-devel 2018-01-31):

'Rscript' should not be used without a path -- see par. 1.6 of the manual

I would like to know more about it, why is it required to run Rscript with
a path, and where is that par. 1.6 of the manual.

I get this error message during Travis r-devel build of my package for
generating makefiles. I am developing a makefile generator package, which
contains testthat unit tests that generate and run various makefiles in
/tmp. These makefiles run several "Rscript -e" commands. Everything works
OK on R-stable on Linux as well as on Windows, the only problem is with
R-devel on that Travis cloud builder. Could someone give me more
information about that error? Is there any workaround or do I really need
to obtain somehow the full path of Rscript and put it into the makefiles
(as it may be tricky for such a makefile to work on Linux, macOS and Windows)?

Thanks, in advance.


Michal Burda



Re: [Rd] Error message: 'Rscript' should not be used without a path

2018-02-01 Thread Tomas Kalibera

Hi Michal,

On 02/01/2018 09:23 AM, Michal Burda wrote:

Dear R-devel members,

recently, I ran into the following error message (R-devel 2018-01-31):

'Rscript' should not be used without a path -- see par. 1.6 of the manual

I would like to know more about it, why is it required to run Rscript with
a path, and where is that par. 1.6 of the manual.

The manual is "Writing R Extensions"
https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Writing-portable-packages

"
Do not invoke R by plain R, Rscript or (on Windows) Rterm in your 
examples, tests, vignettes, makefiles or other scripts. As pointed out 
in several places earlier in this manual, use something like

"$(R_HOME)/bin/Rscript"
"$(R_HOME)/bin$(R_ARCH_BIN)/Rterm"
with appropriate quotes (as, although not recommended, R_HOME can 
contain spaces).

"

This is needed to make sure that one does not run Rscript from a 
different version of R installed in the system. The quotes are important 
and it works on all platforms supported by R.
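
For makefiles that are generated from R code, one possible approach (a sketch only, not something prescribed by the manual) is to resolve the Rscript path of the running R at generation time with R.home("bin") and write it, quoted, into the recipe. The target name and the R expression below are purely illustrative.

```r
# Full path to the Rscript binary of the running R installation.
rscript_name <- if (.Platform$OS.type == "windows") "Rscript.exe" else "Rscript"
rscript_path <- file.path(R.home("bin"), rscript_name)

# One makefile recipe line; shQuote() guards against spaces in the path.
# The target "hello" and the expression are hypothetical examples.
recipe   <- sprintf("\t%s -e %s", shQuote(rscript_path), shQuote("cat(1 + 1)"))
makefile <- c("hello:", recipe)

makefile_path <- file.path(tempdir(), "Makefile")
writeLines(makefile, makefile_path)
```

Inside a package's own makefiles, the "$(R_HOME)/bin/Rscript" form quoted above remains the documented recommendation.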


(for similar questions perhaps R-package-devel is a bit better list)

Best
Tomas





Re: [Rd] Best practices in developing package: From a single file

2018-02-01 Thread Duncan Murdoch

On 31/01/2018 6:59 AM, Duncan Murdoch wrote:

On 30/01/2018 11:39 PM, Hadley Wickham wrote:

 [ lots deleted ]

Personally, I don't find writing in comments any harder than writing
in .Rd files, especially now that you can write in markdown and have
it automatically translated to Rd formatting commands.


I didn't know about the possibility of Markdown.  That's a good thing.
You didn't say what editor you use, but RStudio is a good guess, and it
also makes it easier to write in comments.


I've taken a look at the Markdown support, and I think that is 
fantastic.  I'd rather it wasn't inline in the .R file (does it have to 
be?), but I'd say it tips the balance, and I'll certainly experiment 
with using that for new projects.
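
For readers who have not seen it, a minimal sketch of what such a markdown-flavoured roxygen2 block can look like. The function rescale01() is invented for the example, and the @md tag is assumed to be how markdown is switched on for this block (it can also be enabled package-wide in DESCRIPTION).

```r
#' Scale a numeric vector to the unit interval
#'
#' Linearly rescales `x` so that its minimum maps to 0 and its maximum to 1.
#' With markdown enabled, **bold** text and `code` spans are translated to
#' the corresponding Rd markup when the Rd file is generated.
#'
#' @md
#' @param x A numeric vector.
#' @return A numeric vector of the same length as `x`.
#' @examples
#' rescale01(c(2, 4, 6))
#' @export
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
```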


The only negative I see besides forcing inline docs is pretty minor:  I 
can see that supporting Rd markup within the Markdown text will on rare 
occasions cause lots of confusion (because users won't know why their 
backslashes are doing funny things).  I'd suggest that (at least 
optionally) you should escape anything that looks like Rd markup, so a 
user can put text like \item into the middle of a paragraph and not have 
the Rd parser see it.


Duncan Murdoch



Re: [Rd] Best practices in developing package: From a single file

2018-02-01 Thread Joris Meys
On Thu, Feb 1, 2018 at 1:29 PM, Duncan Murdoch 
wrote:

> On 31/01/2018 6:59 AM, Duncan Murdoch wrote:
>
>> On 30/01/2018 11:39 PM, Hadley Wickham wrote:
>>
>  [ lots deleted ]
>
>> Personally, I don't find writing in comments any harder than writing
>>> in .Rd files, especially now that you can write in markdown and have
>>> it automatically translated to Rd formatting commands.
>>>
>>
>> I didn't know about the possibility of Markdown.  That's a good thing.
>> You didn't say what editor you use, but RStudio is a good guess, and it
>> also makes it easier to write in comments.
>>
>
> I've taken a look at the Markdown support, and I think that is fantastic.
> I'd rather it wasn't inline in the .R file (does it have to be?), but I'd
> say it tips the balance, and I'll certainly experiment with using that for
> new projects.
>

You don't have to put the Rmarkdown in the .R file of the function, there
are ways to keep them in separate files. But keeping them in the same file
does make it easier for Rmarkdown to e.g. generate the correct usage section
and use the correct Rd markup etc. At least that's my understanding of it.
Hadley will hopefully correct me if I'm wrong.  I haven't checked all the
options and possibilities yet in the latest iterations of the package.

Cheers
Joris

-- 
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)


---
Biowiskundedagen 2017-2018
http://www.biowiskundedagen.ugent.be/

---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [Rd] Best practices in developing package: From a single file

2018-02-01 Thread Georgi Boshnakov
It is indeed a matter of what the developer is comfortable with and the 
one-stop solution provided by devtools is difficult to beat. 
This may also vary across projects. I use EMACS/ESS with and without roxygen2. 
In some cases EMACS/ESS+Org mode provides stunning benefits.

Updating "usage" statements in Rd files was mentioned several times. 
Rdpack::reprompt() does this and more for functions, methods and classes. 


Georgi Boshnakov

--
Date: Wed, 31 Jan 2018 07:53:18 -0800
From: Michael Lawrence 
To: Duncan Murdoch 
Cc: "Brian G. Peterson" , "Suzen, Mehmet"
, R-devel 
Subject: Re: [Rd] Best practices in developing package: From a single
file
Message-ID:

Content-Type: text/plain; charset="UTF-8"

I pretty much agree. I tried using roxygen when it was first released but
couldn't stand putting documentation in comments, especially for complex,
S4-based software. Rd is easy to read and write and lets me focus on the
task of writing documentation (focus is the hardest part of any task for
me). Probably the best feature of roxygen is that it automatically
generates \usage{}, which is otherwise completely redundant with the code.

I think the preceding systems like doxygen, javadoc, gtk-doc, qtdoc, etc,
found a nice compromise through templating, where the bulk of the details
are written into the template, and just the essentials (usage, arguments,
return value) were embedded in the source file. I think this is even more
important for R, since we're often describing complex algorithms, while
most C/C++/Java software is oriented around complex classes containing many
relatively simple methods.

Michael


On Tue, Jan 30, 2018 at 11:53 AM, Duncan Murdoch 
wrote:

> On 30/01/2018 11:29 AM, Brian G. Peterson wrote:
>
>> On Tue, 2018-01-30 at 17:00 +0100, Suzen, Mehmet wrote:
>>
>>> Dear R developers,
>>>
>>> I am wondering what are the best practices for developing an R
>>> package. I am aware of Hadley Wickham's best practice
>>> documentation/book (http://r-pkgs.had.co.nz/).  I recall a couple of
>>> years ago there were some tools for generating a package out of a
>>> single file, such as using package.skeleton, but no auto-generated
>>> documentation. Do you know a way to generate documentation and a
>>> package out of single R source file, or from an environment?
>>>
>>
>> Mehmet,
>>
>> This list is for development of the R language itself and closely
>> related tools.  There is a separate list, R-pkg-devel, for development
>> of packages.
>>
>> Since you're here, I'll try to answer your question.
>>
>> package.skeleton can create a package from all the R functions in a
>> specified environment.  So if you load all the functions that you want
>> in your new package into your R environment, then call
>> package.skeleton, you'll have a starting point.
>>
>> At that point, I would probably recommend moving to RStudio, and using
>> RStudio to generate markdown comments for roxygen for all your newly
>> created function files.  Then you could finish off the documentation by
>> writing it in these roxygen skeletons or copying and pasting from
>> comments in your original code files.
>>
>
> I'd agree about moving to RStudio, but I think Roxygen is the wrong
> approach for documentation.  package.skeleton() will have done the boring
> mechanical part of setting up your .Rd files; all you have to do is edit
> some content into them.  (Use prompt() to add a new file if you add a new
> function later, don't run package.skeleton() again.)
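
The workflow described above can be sketched as follows. The package name "demoPkg", the functions and the temporary directory are all invented for the example; only package.skeleton() and prompt() come from the discussion.

```r
# Functions for a hypothetical package, collected in their own environment.
pkg_env <- new.env()
pkg_env$add_one <- function(x) x + 1

# Create the skeleton (R/, man/, DESCRIPTION, ...) under a temporary directory.
pkg_root <- tempfile("skel")
dir.create(pkg_root)
package.skeleton(name = "demoPkg", list = "add_one",
                 environment = pkg_env, path = pkg_root)

# Later, a new function is added: write only its Rd stub with prompt()
# instead of running package.skeleton() again.
times_two <- function(x) 2 * x
rd_file <- file.path(pkg_root, "demoPkg", "man", "times_two.Rd")
prompt(times_two, filename = rd_file, name = "times_two")

list.files(file.path(pkg_root, "demoPkg", "man"))
```

The generated .Rd files are templates; the content still has to be edited in by hand.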
>
> This isn't the fashionable point of view, but I think it is easier to get
> good documentation that way than using Roxygen.  (It's easier to get bad
> documentation using Roxygen, but who wants that?)
>
> The reason I think this is that good documentation requires work and
> thought.  You need to think about the markup that will get your point
> across, you need to think about putting together good examples, etc.
> This is *harder* in Roxygen than if you are writing Rd files, because
> Roxygen is a thin front end to produce Rd files from comments in your .R
> files.  To get good stuff in the help page, you need just as much work as
> in writing the .Rd file directly, but then you need to add another layer on
> top to put it in a comment.  Most people don't bother.
>
> I don't know any packages with what I'd consider to be good documentation
> that use Roxygen.  It's just too easy to write minimal documentation that
> passes checks, so Roxygen users don't keep refining it.
>
> (There are plenty of examples of packages that write bad documentation
> directly to .Rd as well.  I just don't know of examples of packages with
> good documentation that use Roxygen.)
>
> Based on my criticism last week of git and Github, I expect to be called a
> grumpy old man for holding this point of view.  I'd actually like to be
> proven wrong.  So to anyone who disagrees with me:  rather than just
> calling me names, how about some examples of Roxygen-usi

Re: [Rd] Best practices in developing package: From a single file

2018-02-01 Thread Lionel Henry
On 31 janv. 2018, at 09:08, Gabriel Becker  wrote:

> it *actively discourages* the bits it doesn't directly support.

It may be discouraging to include Rd syntax in roxygen docs but only
because the LaTeX-like syntax of Rd is burdensome, not because of
roxygen. It is still handy to have inlined Rd as a backup and we do
use it for the cases where we need finer grained control.

I agree with your sentiment that roxygen encourages writing of
documentation for time-constrained users.

I'll add that the major problem of documentation is not fancy
formatting but the content getting out of sync with the codebase.
Having documentation sitting next to the code is the preferred
antidote to doc rot, e.g. docstrings in lisp languages, Julia and
Python, the Linux kernel-doc system, doxygen, javadoc, ...
It is true that R CMD check's extensive checks help a lot as well in
this regard, though only for things that can be checked automatically.

Best,
Lionel



Re: [Rd] as.list method for by Objects

2018-02-01 Thread Michael Lawrence
On Thu, Feb 1, 2018 at 1:21 AM, Martin Maechler 
wrote:

> > Michael Lawrence 
> > on Tue, 30 Jan 2018 10:37:38 -0800 writes:
>
> > I agree that it would make sense for the object to have c("by", "list") as
> > its class attribute, since the object is known to behave as a list.
>
> Well, but that (list behavior) applies to most non-simple S3
> classed objects, say "data.frame", say "lm" to start with real basic ones.
>
> The later part of the discussion, seems more relevant to me.
> Adding "list" to the class attribute seems as wrong to me as
> e.g. adding "double" to "Date" or "POSIXct" (and many more such cases).
>
>
There's a distinction though. Date and POSIXct should not really behave as
double values (an implementation detail), but "by" is expected to behave as
a list (when it is one).



Re: [Rd] Best practices in developing package: From a single file

2018-02-01 Thread Duncan Murdoch

On 01/02/2018 8:17 AM, Georgi Boshnakov wrote:

It is indeed a matter of what the developer is comfortable with and the 
one-stop solution provided by devtools is difficult to beat.
This may also vary across projects. I use EMACS/ESS with and without roxygen2. 
In some cases EMACS/ESS+Org mode provides stunning benefits.

Updating "usage" statements in Rd files was mentioned several times.
Rdpack::reprompt() does this and more for functions, methods and classes.


Thanks for pointing that out (and for writing it)!  I had forgotten 
about your package.


Duncan Murdoch



[Rd] Fwd: Re: Best practices in developing package:

2018-02-01 Thread Therneau, Terry M., Ph.D.

I'm not going to force anyone to use roxygen2. But I personally find it
easier to have the function right below the documentation, so that any
change to the function can immediately be documented as well. You prefer to
do this by keeping that strictly separated, which is absolutely fine. It's
just not my preferred workflow. Different animal, different habits I guess.



Lest Duncan be left standing all alone let me say that I am another who really dislikes
the roxygen style.  Joris' comment above is a key one though; the goal is good
documentation and any tool to that end is just a tool.  One driver for my preferences is
that I don't like editing large files, e.g., in the survival package every function is a
separate file.  A second is that I care a lot about documentation so my help files are
fairly long, so much so that the advantage of having the documentation of an argument
"close" to the declaration of said argument is lost.

The closeness argument works best when the documentation for each argument is a terse half
sentence, and in that sense roxygen encourages minimalist documentation.  But the real
issue with poor documentation is the orneriness of the writers: good documentation is hard
work and most don't bother.  For most R packages lines of code > lines of documentation >
lines of test suite, usually by a factor of 10 at each stage.  One of my goals for the
survival package has been to make them more equal with 'lines of documentation' the
largest.  I'm getting closer: currently 17395 lines in the R subdirectory, 8042 + 8841 in
man + vignettes, and 6000 in test.

A challenge for someone who is better at document analysis than me: what is the distribution
of the ratios above, across CRAN packages?  Is my 10:1 impression optimistic?
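
As a possible starting point for that challenge, here is a rough sketch that counts lines per category for one unpacked source package. The helper names and the file-name patterns are assumptions made for the example; they are not how the numbers above were obtained.

```r
# Count lines in all files under `dir` whose names match `pattern`.
# A missing directory simply contributes zero lines.
count_lines <- function(dir, pattern) {
  files <- list.files(dir, pattern = pattern, recursive = TRUE, full.names = TRUE)
  sum(vapply(files, function(f) length(readLines(f, warn = FALSE)), integer(1)))
}

# Lines of code, documentation and tests for one unpacked source package.
package_line_counts <- function(pkg_dir) {
  c(code  = count_lines(file.path(pkg_dir, "R"), "\\.[RrSs]$"),
    docs  = count_lines(file.path(pkg_dir, "man"), "\\.Rd$") +
            count_lines(file.path(pkg_dir, "vignettes"), "\\.(Rnw|Rmd)$"),
    tests = count_lines(file.path(pkg_dir, "tests"), "\\.[Rr]$"))
}
```

Applied over a directory of unpacked CRAN sources, the ratios code/docs and docs/tests could then be tabulated per package.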

 Terry T.


Re: [Rd] Fwd: Re: Best practices in developing package:

2018-02-01 Thread Lionel Henry

> On 1 févr. 2018, at 06:51, Therneau, Terry M., Ph.D.  
> wrote:
> 
> A second is that I care a lot about documentation so my help files are
> fairly long, so much so that the advantage of having the documentation of an argument
> "close" to the declaration of said argument is lost.

Good point. It suggests editors need folding support for roxygen sections.

Lionel

Re: [Rd] Best practices in developing package: From a single file

2018-02-01 Thread Duncan Murdoch

On 01/02/2018 7:44 AM, Joris Meys wrote:



On Thu, Feb 1, 2018 at 1:29 PM, Duncan Murdoch wrote:


On 31/01/2018 6:59 AM, Duncan Murdoch wrote:

On 30/01/2018 11:39 PM, Hadley Wickham wrote:

  [ lots deleted ]

Personally, I don't find writing in comments any harder than writing
in .Rd files, especially now that you can write in markdown and have
it automatically translated to Rd formatting commands.


I didn't know about the possibility of Markdown.  That's a good thing.
You didn't say what editor you use, but RStudio is a good guess, and it
also makes it easier to write in comments.


I've taken a look at the Markdown support, and I think that is
fantastic.  I'd rather it wasn't inline in the .R file (does it have
to be?), but I'd say it tips the balance, and I'll certainly
experiment with using that for new projects.


You don't have to put the Rmarkdown in the .R file of the function, 
there are ways to keep them in separate files. But keeping them in the 
same file does make it easier for Rmarkdown to e.g. generate the correct
usage section and use the correct Rd markup etc. At least that's my
understanding of it. Hadley will hopefully correct me if I'm wrong.  I 
haven't checked all the options and possibilities yet in the latest 
iterations of the package.


I don't see that in the Roxygen2 docs, so hopefully it is possible, and 
someone will point out how it's done.


Duncan Murdoch


Re: [Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31

2018-02-01 Thread Martin Maechler
> Hervé Pagès 
> on Tue, 30 Jan 2018 13:30:18 -0800 writes:

> Hi Martin, Henrik,
> Thanks for the follow up.

> @Martin: I vote for 2) without *any* hesitation :-)

> (and uniformity could be restored at some point in the
> future by having prod(), rowSums(), colSums(), and others
> align with the behavior of length() and sum())

As a matter of fact, I had procrastinated and worked at
implementing '2)' already a bit on the weekend and made it work
- more or less.  It needs a bit more work, and I had also been considering
replacing the numbers in the current overflow check

    if (ii++ > 1000) {                                              \
        ii = 0;                                                     \
        if (s > 9000L || s < -9000L) {                              \
            if(!updated) updated = TRUE;                            \
            *value = NA_INTEGER;                                    \
            warningcall(call, _("integer overflow - use sum(as.numeric(.))")); \
            return updated;                                         \
        }                                                           \
    }                                                               \

i.e. think of tweaking the '1000' and '9000L', 
but decided to leave these and add comments there about why. For
the moment.
They may look arbitrary, but are not at all: If you multiply
them (which looks correct, if we check the sum 's' only every 1000-th
time ...((still not sure they *are* correct))) you get  9*10^18
which is only slightly smaller than  2^63 - 1 which may be the
maximal "LONG_INT" integer we have.

So, in the end, at least for now, we do not quite go all the way
but overflow a bit earlier,... but do potentially gain a bit of
speed, notably with the ITERATE_BY_REGION(..) macros
(which I did not show above).
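
A rough check of the numbers discussed above, as a sketch at the R level rather than the actual C implementation. The threshold of 9e15 is an assumption inferred from the quoted product of 9*10^18 and the check interval of 1000; it is not copied from the snippet as shown.

```r
int_max <- .Machine$integer.max          # 2147483647, i.e. 2^31 - 1

# Integer result out of range: NA with a warning, as reported in this thread.
overflowed <- suppressWarnings(sum(c(int_max, 1L)))
is.na(overflowed)                        # TRUE

# Converting to double first avoids the overflow.
sum(as.numeric(c(int_max, 1L)))          # 2147483648

# Bound quoted in the text: 9 * 10^18 is slightly below 2^63.
# threshold = 9e15 is inferred (assumption), see the note above.
check_every <- 1000
threshold   <- 9e15
check_every * threshold < 2^63           # TRUE
```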

Will hopefully become available in R-devel real soon now.

Martin

> Cheers,
> H.


> On 01/27/2018 03:06 AM, Martin Maechler wrote:
>>> Henrik Bengtsson 
>>> on Thu, 25 Jan 2018 09:30:42 -0800 writes:
>> 
>> > Just following up on this old thread since matrixStats 0.53.0 is now
>> > out, which supports this use case:
>> 
>> >> x <- rep(TRUE, times = 2^31)
>> 
>> >> y <- sum(x)
>> >> y
>> > [1] NA
>> > Warning message:
>> > In sum(x) : integer overflow - use sum(as.numeric(.))
>> 
>> >> y <- matrixStats::sum2(x, mode = "double")
>> >> y
>> > [1] 2147483648
>> >> str(y)
>> > num 2.15e+09
>> 
>> > No coercion is taking place, so the memory overhead is zero:
>> 
>> >> profmem::profmem(y <- matrixStats::sum2(x, mode = "double"))
>> > Rprofmem memory profiling of:
>> > y <- matrixStats::sum2(x, mode = "double")
>> 
>> > Memory allocations:
>> > bytes calls
>> > total 0
>> 
>> > /Henrik
>> 
>> Thank you, Henrik, for the reminder.
>> 
>> Back in June, I had mentioned to Hervé and R-devel that
>> 'logical' should remain to be treated as 'integer' as in all
>> arithmetic in (S and) R. Hervé did mention the isum()
>> function in the C code which is relevant here .. which does have
>> a LONG INT counter already -- *but* if we consider that sum()
>> has '...' i.e. a conceptually arbitrary number of long vector
>> integer arguments that counter won't suffice even there.
>> 
>> Before talking about implementation / patch, I think we should
>> consider 2 possible goals of a change --- I agree the status quo
>> is not a real option
>> 
>> 1) sum(x) for logical and integer x  would return a double
>> in any case and overflow should not happen (unless for
>> the case where the result would be larger than
>> .Machine$double.xmax, which I think will not be possible
>> even with "arbitrary" nargs() of sum).
>> 
>> 2) sum(x) for logical and integer x  should return an integer in
>> all cases there is no overflow, including returning
>> NA_integer_ in case of NAs.
>> If there would be an overflow it must be detected "in time"
>> and the result should be double.
>> 
>> The big advantage of 2) is that it is back compatible in 99.x %
>> of use cases, and another advantage that it may be a very small
>> bit more efficient.  Also, in the case of "counting" (logical),
>> it is nice to get an integer instead of double when we can --
>> entirely analogously to the behavior of length() which returns
>> integer whenever possible.
>> 
>> The advantage of 1) is uniformity.
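
The semantics of option 2) can be emulated at the R level with a small sketch. The function name sum_option2() is hypothetical, and accumulating in double is a simplification made for the example; the change discussed in this thread would be implemented in C.

```r
# Hypothetical R-level emulation of option 2): integer when the result fits,
# double otherwise, NA_integer_ when NAs are present.
sum_option2 <- function(x) {
  stopifnot(is.logical(x) || is.integer(x))
  s <- sum(as.numeric(x))            # accumulate in double; exact below 2^53
  if (is.na(s)) return(NA_integer_)  # NAs propagate as integer NA
  if (abs(s) <= .Machine$integer.max) as.integer(s) else s
}

typeof(sum_option2(c(TRUE, TRUE, FALSE)))         # "integer"
typeof(sum_option2(c(.Machine$integer.max, 1L)))  # "double"
sum_option2(c(1L, NA))                            # NA
```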
>> 
>> We should (at least provisionally) decide between 1) and 2) and then go for that.
>> It could be that going for 1) may have bad
>> compatibility-consequences in package space, because indeed we
>> had documented sum() would be integer for logical and integer arguments.
>> 
>> I currently 

Re: [Rd] as.list method for by Objects

2018-02-01 Thread Martin Maechler
> Michael Lawrence 
> on Thu, 1 Feb 2018 06:12:20 -0800 writes:

> On Thu, Feb 1, 2018 at 1:21 AM, Martin Maechler 

> wrote:

>> > Michael Lawrence 
>> > on Tue, 30 Jan 2018 10:37:38 -0800 writes:
>> 
>> > I agree that it would make sense for the object to have c("by", "list") as
>> > its class attribute, since the object is known to behave as a list.
>> 
>> Well, but that (list behavior) applies to most non-simple S3
>> classed objects, say "data.frame", say "lm" to start with real basic ones.
>> 
>> The later part of the discussion, seems more relevant to me.
>> Adding "list" to the class attribute seems as wrong to me as
>> e.g. adding "double" to "Date" or "POSIXct" (and many more such cases).
>> 
>> 
> There's a distinction though. Date and POSIXct should not really behave as
> double values (an implementation detail), but "by" is expected to behave as
> a list (when it is one).

Yes, you are right.  As I'm "never"(*) using by(), I'm glad
to leave this issue to you.

Martin

---
*) Never  [James Bond, 1983]

> For the present case, we should stay with focusing on  is.list()
>> being true after as.list() .. the same we would do with
>> as.numeric() and is.numeric().
>> 
>> Martin
>> 
>> > However, it may be too disruptive to make this change at this point.
>> > Hard to predict.
>> 
>> > Michael
>> 
>> > On Mon, Jan 29, 2018 at 5:00 PM, Dario Strbenac <dstr7...@uni.sydney.edu.au> wrote:
>> 
>> >> Good day,
>> >>
>> >> I'd like to suggest the addition of an as.list method for a by object that
>> >> actually returns a list of class "list". This would make it safer to do
>> >> type-checking, because is.list also returns TRUE for a data.frame variable
>> >> and using class(result) == "list" is an alternative that only returns TRUE
>> >> for lists. It's also confusing initially that
>> >>
>> >> > class(x)
>> >> [1] "by"
>> >> > is.list(x)
>> >> [1] TRUE
>> >>
>> >> since there's no explicit class definition for "by" and no mention if it
>> >> has any superclasses.
>> >>
>> >> --
>> >> Dario Strbenac
>> >> University of Sydney
>> >> Camperdown NSW 2050
>> >> Australia
>> >>
>> >> __
>> >> R-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>> >>
>> >>
>> 
>> > [[alternative HTML version deleted]]


Re: [Rd] Fwd: Re: Best practices in developing package:

2018-02-01 Thread Michael Lawrence
Folding is a simple solution, but there are intrinsic problems, like the
need to embed the documentation in comments. If the user already has to
expand a fold to edit the docs, the IDE could instead just provide a link
or shortcut that jumps to a separate documentation file, written in
whatever language, Rd, markdown, docbook. For example, I could imagine
RStudio showing the rendered documentation in a side pane when the cursor
is on the function name/signature, and the user could somehow switch modes
to edit it. But there would be no need to mix two different languages in
the same file, and thus no ugly escaping, and no documentation obscuring
the code, or vice versa.

On Thu, Feb 1, 2018 at 7:20 AM, Lionel Henry  wrote:

>
> > On 1 Feb 2018, at 06:51, Therneau, Terry M., Ph.D. wrote:
> >
> > A second is that I care a lot about documentation so my help files are
> > fairly long, so much so that the advantage of having the documentation
> > of an argument "close" to the declaration of said argument is lost.
>
> Good point. It suggests editors need folding support for roxygen sections.
>
> Lionel

Re: [Rd] Best practices in developing package: From a single file

2018-02-01 Thread Hadley Wickham
On Thu, Feb 1, 2018 at 4:29 AM, Duncan Murdoch  wrote:
> On 31/01/2018 6:59 AM, Duncan Murdoch wrote:
>>
>> On 30/01/2018 11:39 PM, Hadley Wickham wrote:
>
>  [ lots deleted ]
>>>
>>> Personally, I don't find writing in comments any harder than writing
>>> in .Rd files, especially now that you can write in markdown and have
>>> it automatically translated to Rd formatting commands.
>>
>>
>> I didn't know about the possibility of Markdown.  That's a good thing.
>> You didn't say what editor you use, but RStudio is a good guess, and it
>> also makes it easier to write in comments.
>
>
> I've taken a look at the Markdown support, and I think that is fantastic.
> I'd rather it wasn't inline in the .R file (does it have to be?), but I'd
> say it tips the balance, and I'll certainly experiment with using that for
> new projects.

Please do let me know how it goes - often a fresh set of eyes reveals
problems that an experienced user is blind to.

> The only negative I see besides forcing inline docs is pretty minor:  I can
> see that supporting Rd markup within the Markdown text will on rare
> occasions cause lots of confusion (because users won't know why their
> backslashes are doing funny things).  I'd suggest that (at least optionally)
> you should escape anything that looks like Rd markup, so a user can put text
> like \item into the middle of a paragraph and not have the Rd parser see it.

Yes, that would certainly be nice. It's a little challenging because
we're using the commonmark parser, but it should be possible somehow.

Hadley

-- 
http://hadley.nz



Re: [Rd] Best practices in developing package: From a single file

2018-02-01 Thread Gabriel Becker
On Thu, Feb 1, 2018 at 5:24 AM, Lionel Henry  wrote:

> On 31 Jan 2018, at 09:08, Gabriel Becker wrote:
>
> > it *actively discourages* the bits it doesn't directly support.
>
> It may be discouraging to include Rd syntax in roxygen docs but only
> because the LaTeX-like syntax of Rd is burdensome, not because of
> roxygen. It is still handy to have inlined Rd as a backup and we do
> use it for the cases where we need finer grained control.
>

I only somewhat agree with this. Part of it is the Rd specifically, I
agree, but part of it is just the fact that it is a different syntax at
all. People who write roxygen documentation tend to think about and write
it in roxygen, I think. Any switch out to another syntax, thus introducing
two syntaxes side-by-side, is discouraged by the very fact that they are
thinking in roxygen comments.

Again, this is a "discouragement", not a disallowing. I know that people
who care deeply about writing absolutely top notch documentation, and who
also use roxygen will do the switch when called for, but the path of least
resistance, i.e. the pattern of behavior that is *encouraged* by roxygen2
is to not do that, and simply write documentation using only the supported
roxygen2 tags. I'm not saying this makes the system bad, per se. As I
pointed out, I use it in many of my packages (and it was my choice to do
so, not because I inherited code from someone who already did), but
pretending it doesn't encourage certain types of behavior doesn't seem like
the right way to go either.


>
> I agree with your sentiment that roxygen encourages writing of
> documentation for time-constrained users.
>

I do think it does that, but that was really only half of what I said. I
said it encourages time-constrained users to write middling (i.e. not
great) documentation. Another person pointed out that structurally it
really encourages terseness in the explanations of parameters, which I
think is very true and have heard independently from others when talking
about it as well. This is again not a requirement, but it is a real thing.


>
> I'll add that the major problem of documentation is not fancy
> formatting but the content getting out of sync with the codebase.
> Having documentation sitting next to the code is the preferred
> antidote to doc rot, e.g. docstrings in lisp languages, Julia and
> Python, the Linux kernel-doc system, doxygen, javadoc, ...
>

I mean, it is *an* antidote to doc rot. And sure, one that is used
elsewhere. You could easily imagine one that didn't require it though.
Perhaps doc files associated with objects (including closures) could embed
a hash of the object they document, then you could see which things have
changed since the documentation was updated and look at which documentation
is still ok and which needs updating. That's just off the top of my head,
I'm sure you could make the detection much more sophisticated.
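
To make the hash idea a bit more concrete, here is a rough sketch of how a stored fingerprint could flag documentation that has fallen behind its code. Everything in it is hypothetical: fn_fingerprint(), the area() function, and the notion that a doc file would carry the recorded hash are illustrations only, not part of roxygen2 or R's help system. It assumes only base R plus tools::md5sum(), which hashes files.

```r
# Hypothetical helper: fingerprint a function from its deparsed code.
# tools::md5sum() works on files, so the deparsed text goes to a temp file.
fn_fingerprint <- function(f) {
  stopifnot(is.function(f))
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp))
  writeLines(deparse(f), tmp)
  unname(tools::md5sum(tmp))
}

# Pretend this hash was stored in the doc file when the docs were last updated.
area <- function(r) pi * r^2
recorded <- fn_fingerprint(area)

# Later the code changes, but the docs are not touched.
area <- function(r, scale = 1) scale * pi * r^2

# A mismatch marks the documentation as needing review.
docs_stale <- !identical(fn_fingerprint(area), recorded)
docs_stale
```

A check like this only says that the code changed, not whether the documentation is actually wrong, so it would be a prompt for review rather than a verdict.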

Or perhaps you could imagine two help systems, akin to --help and man for
command line tools, one of which is minimalist showing usage, etc,
generated by roxygen comments, and one of which is much more extensive, and
not tied to (what could be extremely large) comments in the same .R file as
the code.

Best,
~G


> It is true that the extensive checks in R CMD check help a lot as well in
> this regard, though only for things that can be checked automatically.
>
> Best,
> Lionel
>
>


-- 
Gabriel Becker, PhD
Scientist (Bioinformatics)
Genentech Research



Re: [Rd] as.list method for by Objects

2018-02-01 Thread Henrik Bengtsson
On Thu, Feb 1, 2018 at 12:14 AM, Martin Maechler
 wrote:
>> Michael Lawrence 
>> on Tue, 30 Jan 2018 15:57:42 -0800 writes:
>
> > I just meant that the minimal contract for as.list() appears to be that it
> > returns a VECSXP. To the user, we might say that is.list() will always
> > return TRUE.
>
> Indeed. I also agree with Hervé that the user level
> documentation should rather mention  is.list(.) |--> TRUE  than
> VECSXP, and interestingly for the experts among us,
> the  is.list() primitive gives TRUE not only for  VECSXP  but
> also for LISTSXP (the good ole' pairlists).
>
> > I'm not sure we can expect consistency across methods
> > beyond that, nor is it feasible at this point to match the
> > semantics of the methods package. It deals in "class
> > space" while as.list() deals in "typeof() space".
>
> > Michael
>
> Yes, and that *is* the extra complexity we have in R (inherited
> from S, I'd say)  which ideally wasn't there and of course is
> not there in much younger languages/systems such as julia.
>
> And --- by the way let me preach, for the "class space" ---
> do __never__ use
>
>   if(class(obj) == "")
>
> in your code (I see this so often, shockingly to me ...) but rather use
>
>   if(inherits(obj, ""))
>
> instead.

Second this one.  But, soon (*) the former will at least give the
correct answer when length(class(obj)) == 1 and produce an error
otherwise.  So, several of these cases will be caught at run-time in a
near future.

(*) When _R_CHECK_LENGTH_1_CONDITION_=true becomes the default
behavior - hopefully by R 3.5.0.
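
For readers who have not hit this, a small sketch of why the class(obj) == ... idiom is fragile. The object and its "my_tbl" class are made up for illustration; only base R is used.

```r
# A hypothetical object with two classes, as is common for S3 objects.
obj <- structure(list(), class = c("my_tbl", "data.frame"))

class(obj) == "data.frame"     # c(FALSE, TRUE): one result per class entry
inherits(obj, "data.frame")    # TRUE: always a single logical

# Using the length-2 comparison as an if() condition is the failure mode:
# depending on the R version, only the first element is used (possibly with
# a warning) or an error is raised.
res <- tryCatch(
  if (class(obj) == "data.frame") "first element was TRUE" else "first element was FALSE",
  error = function(e) "error: condition has length > 1"
)
res
```

Either outcome is wrong for the intended question "is obj a data.frame?", whereas inherits() answers it regardless of where the class sits in the class vector.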

>
> Martin
>
>
>
> > On Tue, Jan 30, 2018 at 3:47 PM, Hervé Pagès wrote:
>
> >> On 01/30/2018 02:50 PM, Michael Lawrence wrote:
> >>
> >>> by() does not always return a list. In Gabe's example, it returns an
> >>> integer, thus it is coerced to a list. as.list() means that it should be a
> >>> VECSXP, not necessarily with "list" in the class attribute.
> >>>
> >>
> >> The documentation is not particularly clear about what as.list()
> >> means for list derivatives. IMO clarifications should stick to
> >> simple concepts and formulations like "is.list(x) is TRUE" or
> >> "x is a list or a list derivative" rather than "x is a VECSXP".
> >> Coercion is useful beyond the use case of implementing a .C entry
> >> point and calling as.numeric/as.list/etc... on its arguments.
> >>
> >> This is why I was hoping that we could maybe discuss the possibility
> >> of making the as.list() contract less vague than just "as.list()
> >> must return a list or a list derivative".
> >>
> >> Again, I think that 2 things weigh quite a lot in that discussion:
> >> 1) as.list() returns an object of class "list" on a
> >> data.frame (strict coercion). If all that as.list() needed to
> >> do was to return a VECSXP, then as.list.default() already does
> >> this on a data.frame so why did someone bother adding an
> >> as.list.data.frame method that does strict coercion?
> >> 2) The S4 coercion system based on as() does strict coercion by
> >> default.
> >>
> >> H.
> >>
> >>
> >>> Michael
> >>>
> >>>
> >>> On Tue, Jan 30, 2018 at 2:41 PM, Hervé Pagès wrote:
> >>>
> >>> Hi Gabe,
> >>>
> >>> Interestingly the behavior of as.list() on by objects seems to
> >>> depend on the object itself:
> >>>
> >>> > b1 <- by(1:2, 1:2, identity)
> >>> > class(as.list(b1))
> >>> [1] "list"
> >>>
> >>> > b2 <- by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
> >>> > class(as.list(b2))
> >>> [1] "by"
> >>>
> >>> This is with R 3.4.3 and R devel (2017-12-11 r73889).
> >>>
> >>> H.
> >>>
> >>> On 01/30/2018 02:33 PM, Gabriel Becker wrote:
> >>>
> >>> Dario,
> >>>
> >>> What version of R are you using? In my mildly old 3.4.0
> >>> installation and in the version of R-devel I have lying around
> >>> (also mildly old...)  I don't see the behavior I think you are
> >>> describing
> >>>
> >>> > b = by(1:2, 1:2, identity)
> >>> > class(as.list(b))
> >>> [1] "list"
> >>>
> >>> > sessionInfo()
> >>> R Under development (unstable) (2017-12-19 r73926)
> >>> Platform: x86_64-apple-darwin15.6.0 (64-bit)
> >>> Running under: OS X El Capitan 10.11.6
> >>>
> >>> Matrix products: default
> >>> BLAS: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRblas.dylib
> >>> LAPACK: /Users/beckerg4/local/Rdevel/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
> >>>
> >>> locale:
> >>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >>>
>