Re: [Rd] Use of C++ in Packages
Some of us are learning about development in R and use R in our work
data analysis pipelines. What is the best way to identify packages that
currently have these C++ problems? I would like to be able to help fix
the bugs but, more importantly, to avoid using these packages in
critical work pipelines. Are there any bug-squashing events for C++ R
packages out there?

Regards
Hugh

On Mon, Apr 1, 2019 at 6:23 PM Tomas Kalibera wrote:

> On 3/30/19 8:59 AM, Romain Francois wrote:
> > tl;dr: we need better C++ tools and documentation.
> >
> > We collectively know more now with the rise of tools like rchk and
> > improved documentation such as Tomas’s post. That’s a start, but it
> > appears that there still is a lot of knowledge that would deserve to
> > be promoted to actual documentation of best practices.
>
> Well, there is quite a bit of knowledge in Writing R Extensions and
> many problems could have been prevented had it been read more
> thoroughly by package developers. The problem that C++ runs some
> functions automatically (like destructors) should not be too hard to
> identify based on what WRE says about the need for protection against
> garbage collection.
>
> From my experience, one can learn most about R internals from debugging
> and reading source code. When debugging PROTECT errors and other memory
> errors/memory corruption, common problems caused by bugs in native
> C/C++ code, one needs to read and understand the source code involved
> at all layers, one needs to understand the documentation covering code
> at different layers, and one has to think about these things, forming
> hypotheses, narrowing down to smaller examples, etc.
>
> My suggestion for package authors who write native code and want to
> learn more, and who want to be responsible (these kinds of bugs affect
> other packages indirectly and can be woken up by inconsequential and
> correct code changes, even in the R runtime): test and debug your code
> hard - look at UBSAN/ASAN/valgrind/rchk checks from CRAN and run these
> tools yourself if needed. Run with strict barrier checking and with
> gctorture. Write more tests to increase the coverage. Specifically now,
> if you use C++ code, try to read all of your related code and check you
> do not have the problems I mentioned in my blog. Think of other related
> problems and if you find out about them, tell others. Make sure you
> only use the API from Writing R Extensions (and the R help system). If
> you really can't find anything wrong about your package, but still want
> to learn more, try to debug some bugs reported against the R runtime or
> against your favorite packages you use (or their CRAN check reports
> from various tools). In addition to learning more about R internals, by
> spending much more time on debugging you may also get a different
> perspective on some of the things about C++ I pointed to. Finally, it
> would help us with the problem we have now - that many R packages in
> C++ have serious bugs.
>
> Tomas
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
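The advice above to run with gctorture can be tried from plain R. The following is only a minimal sketch: `exercise_package()` is a hypothetical stand-in for whatever tests exercise a package's native code, and it is assumed here that a small pure-R workload is enough to show the pattern. `gctorture()` itself is a base R function that provokes a garbage collection on (nearly) every allocation, which makes missing PROTECT calls far more likely to fail visibly.

```r
# Hypothetical stand-in for tests that call into a package's native code.
exercise_package <- function() {
  x <- lapply(1:10, function(i) rnorm(5))
  length(x)
}

# Turn on GC torture, run the workload, and always turn it off again,
# even if the workload fails.
gctorture(TRUE)
result <- tryCatch(exercise_package(), finally = gctorture(FALSE))
```

Code runs much more slowly under gctorture, so it is usually applied to a small, targeted set of tests rather than a full test suite.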
Re: [Rd] should base R have a piping operator ?
How is your argument different to, say, "Should dplyr or data.table be
part of base R as they are the most popular data science packages and
they are used by a large number of users?"

Kind regards

On Sat, Oct 5, 2019 at 4:34 PM Ant F wrote:

> Dear R-devel,
>
> The most popular piping operator sits in the package `magrittr` and is
> used by a huge number of users, and imported/re-exported by more and
> more packages too.
>
> Many workflows don't even make much sense without pipes nowadays, so
> the examples in the doc will use pipes, as do the README, vignettes
> etc. I believe base R could have a piping operator so packages can use
> a pipe in their code or doc and stay dependency free.
>
> I don't suggest an operator based on complex heuristics, instead I
> suggest a very simple and fast one (more than 10 times faster than
> magrittr in my tests):
>
> `%.%` <- function (e1, e2) {
>   eval(substitute(e2), envir = list(. = e1), enclos = parent.frame())
> }
>
> iris %.% head(.) %.% dim(.)
> #> [1] 6 5
>
> The difference with magrittr is that the dots must all be explicit
> (which sits with the choice of the name), and that special magrittr
> features such as assignment in place and building functions with
> `. %>% head() %>% dim()` are not supported.
>
> Edge cases are not surprising:
>
> ```
> x <- "a"
> x %.% quote(.)
> #> .
> x %.% substitute(.)
> #> [1] "a"
>
> f1 <- function(y) function() eval(quote(y))
> f2 <- x %.% f1(.)
> f2()
> #> [1] "a"
> ```
>
> Looking forward to your thoughts on this,
>
> Antoine
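For readers who want to try the `%.%` operator from the quoted proposal, here is a small self-contained sketch. The operator definition is copied from the message. The second example, which pipes into `head` without a dot, is an added illustration of the "dots must all be explicit" point and is not taken from the thread.

```r
# Operator as proposed in the quoted message.
`%.%` <- function(e1, e2) {
  eval(substitute(e2), envir = list(. = e1), enclos = parent.frame())
}

# Each step receives the previous result as `.`.
res <- iris %.% head(.) %.% dim(.)
print(res)
#> [1] 6 5

# Without an explicit dot the right-hand side is evaluated as is,
# so this returns the function `head` itself rather than calling it.
no_dot <- iris %.% head
is.function(no_dot)
#> [1] TRUE
```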
Re: [Rd] should base R have a piping operator ?
I exaggerated the comparison for effect. However, it is not very
difficult to find functions in dplyr or data.table or indeed other
packages that one may wish to be in base R. Examples, for me, could
include data.table::fread, the dplyr::group_by & dplyr::summari[sz]e
combo, etc. Also, the "popularity" of magrittr::`%>%` is mostly
attributable to the tidyverse (an advanced superset of R). Many R users
don't even know that they are installing the magrittr package.

On Sat, Oct 5, 2019 at 6:30 PM Iñaki Ucar wrote:

> On Sat, 5 Oct 2019 at 17:15, Hugh Marera wrote:
> >
> > How is your argument different to, say, "Should dplyr or data.table
> > be part of base R as they are the most popular data science packages
> > and they are used by a large number of users?"
>
> Two packages with many features, dozens of functions and under heavy
> development to fix bugs, add new features and improve performance, vs.
> a single operator with a limited and well-defined functionality, and a
> reference implementation that hasn't changed in years (but certainly
> hackish in a way that probably could only be improved from R itself).
>
> Can't you really spot the difference?
>
> Iñaki