Re: [Rd] [patch] Documentation for list.files when no matches found

2019-01-07 Thread Tomas Kalibera

Thanks for the report, fixed in documentation in R-devel.

Best
Tomas

On 1/7/19 3:03 AM, Jonathan Carroll wrote:

Apologies in advance if this is already known but a search of the
r-devel archive did not immediately turn up any mentions.

list.files() (and thus dir()) returns character(0) when no files are
found in the requested path. This is useful and expected behaviour as
length(dir()) can be tested for success. The Value documentation,
however, indicates otherwise


A character vector containing the names of the files in the specified directories, or 
"" if there were no files.

which would be less useful and does not match current behaviour.
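A quick check of the current behaviour (a minimal sketch using a fresh temporary directory):

```r
# An empty directory yields character(0), not "":
d <- tempfile("emptydir")
dir.create(d)
res <- list.files(d)
identical(res, character(0))  # TRUE
length(res) == 0              # TRUE, so length() can be tested for success
```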

This appears to have been the case for the majority of the lifetime of
the software, so I'm not sure it's terribly important, but for the sake
of consistency I propose the following simple patch.

Kind regards,

- Jonathan.

--- a/src/library/base/man/list.files.Rd
+++ b/src/library/base/man/list.files.Rd
@@ -45,7 +45,7 @@ list.dirs(path = ".", full.names = TRUE, recursive = TRUE)
  }
  \value{
A character vector containing the names of the files in the
-  specified directories, or \code{""} if there were no files.  If a
+  specified directories, or \code{character(0)} if there were no files.  If a
path does not exist or is not a directory or is unreadable it
is skipped, with a warning.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




[Rd] Bug report with patch: `stats:::regularize.values()` always creates full copies of `x` and `y`

2019-01-07 Thread Evgeni Chasnovski
This is intended to be a bug report with proposed patch. I am posting to
this mailing list as described in NOTE in "Bug Reporting in R".

Function `stats:::regularize.values()` is meant to preprocess `x` and `y`
arguments to have "proper" values for later use during interpolation. If
input is already "proper", I would expect it to reuse the same objects
without creating new ones. However, this isn't the case, and it is the source
of unnecessary extra memory usage in `approx()` and others.

The root cause of this seems to be a forced reordering in lines 37-39 of
the 'approx.R' file. If reordering is done only when `x` is unsorted, no
copies are created. This also does not appear to break any existing code.

There is a patch attached.

Reproducible code:
x <- seq(1, 100, 1)
y <- seq(1, 100, 1)

reg_xy <- stats:::regularize.values(x, y, mean)

# Regularized versions of `x` and `y` are identical to the input but are
# stored at different places
identical(x, reg_xy[["x"]])
#> [1] TRUE
.Internal(inspect(x))
#> @15719b0 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
.Internal(inspect(reg_xy[["x"]]))
#> @2b84130 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,...

identical(y, reg_xy[["y"]])
#> [1] TRUE
.Internal(inspect(y))
#> @2c91be0 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
.Internal(inspect(reg_xy[["y"]]))
#> @2bb4880 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,...

# Differs from original only by using `if (is.unsorted(x))`
new_regularize.values <- function (x, y, ties) {
  x <- xy.coords(x, y, setLab = FALSE)
  y <- x$y
  x <- x$x
  if (any(na <- is.na(x) | is.na(y))) {
ok <- !na
x <- x[ok]
y <- y[ok]
  }
  nx <- length(x)
  if (!identical(ties, "ordered")) {
if (is.unsorted(x)) {
  o <- order(x)
  x <- x[o]
  y <- y[o]
}
if (length(ux <- unique(x)) < nx) {
  if (missing(ties))
warning("collapsing to unique 'x' values")
  y <- as.vector(tapply(y, match(x, x), ties))
  x <- ux
  stopifnot(length(y) == length(x))
}
  }
  list(x = x, y = y)
}

new_reg_xy <- new_regularize.values(x, y, mean)

# Output is still identical to the input and also references the same objects
identical(x, new_reg_xy[["x"]])
#> [1] TRUE
.Internal(inspect(x))
#> @15719b0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
.Internal(inspect(new_reg_xy[["x"]]))
#> @15719b0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,...

identical(y, new_reg_xy[["y"]])
#> [1] TRUE
.Internal(inspect(y))
#> @2c91be0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,...
.Internal(inspect(new_reg_xy[["y"]]))
#> @2c91be0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,...

# Current R version
R.version
#> platform       x86_64-pc-linux-gnu
#> arch           x86_64
#> os             linux-gnu
#> system         x86_64, linux-gnu
#> status
#> major          3
#> minor          5.2
#> year           2018
#> month          12
#> day            20
#> svn rev        75870
#> language       R
#> version.string R version 3.5.2 (2018-12-20)
#> nickname       Eggshell Igloo


-- 
Best regards,
Evgeni Chasnovski
Index: src/library/stats/R/approx.R
===
--- src/library/stats/R/approx.R	(revision 75926)
+++ src/library/stats/R/approx.R	(working copy)
@@ -34,9 +34,11 @@
 }
 nx <- length(x)
 if (!identical(ties, "ordered")) {
-	o <- order(x)
-	x <- x[o]
-	y <- y[o]
+	if (is.unsorted(x)) {
+	o <- order(x)
+	x <- x[o]
+	y <- y[o]
+	}
 	if (length(ux <- unique(x)) < nx) {
 	if (missing(ties))
 		warning("collapsing to unique 'x' values")
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




[Rd] unsorted - suggestion for performance improvement and ALTREP support for POSIXct

2019-01-07 Thread Harvey Smith
I believe the performance of isUnsorted() in sort.c could be improved by
calling REAL() once (outside of the for loop) rather than calling it twice
inside the loop.  As an aside, it is implemented in the faster way in
doSort() (sort.c line 401).  The example below shows the performance
improvement, for vectors of doubles, of moving REAL() outside the for loop.

# example as implemented in isUnsorted
body = "
R_xlen_t n, i;
n = XLENGTH(x);
for(i = 0; i+1 < n ; i++)
  if(REAL(x)[i] > REAL(x)[i+1])
return ScalarLogical(TRUE);
return ScalarLogical(FALSE);";
f1 = inline::cfunction(sig = signature(x='numeric'), body=body)
# example updated with only one call to REAL()
body = "
R_xlen_t n, i;
n = XLENGTH(x);
double* real_x = REAL(x);
for(i = 0; i+1 < n ; i++)
  if(real_x[i] > real_x[i+1])
return ScalarLogical(TRUE);
return ScalarLogical(FALSE);";
f2 = inline::cfunction(sig = signature(x='numeric'), body=body)
# unsorted
x.double = as.double(1:1e7) + 0
x.posixct = Sys.time() + x.double
microbenchmark::microbenchmark(
  f1(x.double),
  f2(x.double),  # faster due to one REAL()
  f1(x.posixct),
  f2(x.posixct), # faster due to one REAL()
  unit='ms', times=10)
Unit: milliseconds
          expr       min        lq      mean    median        uq      max neval
  f1(x.double) 35.737629 37.991785 43.004432 38.575525 39.198533 80.85625    10
  f2(x.double)  6.053373  6.064323  7.238750  6.092453  8.438550 10.69384    10
 f1(x.posixct) 36.315705 36.542253 42.349745 38.355395 39.378262 81.59857    10
 f2(x.posixct)  6.063946  6.070741  7.579176  6.138518  7.063024 13.94141    10



I would also like to suggest ALTREP support for POSIXct vectors, which are
treated as type REAL in the C code but do not gain the performance
benefits of real vectors.  Sorted vectors of timestamps are important for
joining time series and in calls to findInterval().
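As a small illustration of the sorted-timestamp use case (a sketch; the grid and event values below are arbitrary), findInterval() requires its second argument to be sorted and is a common way to align events to a time grid:

```r
# hourly grid of timestamps, already sorted
grid <- seq(from = as.POSIXct("2019-01-01 00:00:00", tz = "UTC"),
            by = "hour", length.out = 24)
event <- as.POSIXct("2019-01-01 05:30:00", tz = "UTC")
# index of the grid interval containing the event:
# the event falls between grid[6] (05:00) and grid[7] (06:00)
findInterval(event, grid)  # 6
```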

# unsorted vectors
x.double = as.double(1:1e7) + 0
x.posixct = Sys.time() + x.double
# sort for altrep benefit
x.double.sort <- sort(x.double)
x.posixct.sort <- sort(x.posixct)
microbenchmark::microbenchmark(
  is.unsorted(x.double),
  is.unsorted(x.double.sort), # faster due to altrep
  is.unsorted(x.posixct),
  is.unsorted(x.posixct.sort), # no altrep benefit
  unit='ms', times=10)
Unit: milliseconds
                        expr       min        lq       mean     median        uq        max neval
       is.unsorted(x.double) 16.987730 17.010008 17.1577173 17.0862785 17.308674  17.474432    10
  is.unsorted(x.double.sort)  0.000378  0.000756  0.0065327  0.0075525  0.010195   0.011706    10
      is.unsorted(x.posixct) 36.925876 37.084837 43.4125593 37.4695915 41.858589  78.742174    10
 is.unsorted(x.posixct.sort) 36.966654 37.031975 51.1228686 37.1235380 37.777319 153.270170    10


Since there do not appear to be any tests for is.unsorted(), here are some
tests that could be added for various types.

# integer sequence
x <- -10L:10L
stopifnot(!is.unsorted(x, na.rm = F, strictly = T))
stopifnot(!is.unsorted(x, na.rm = F, strictly = F))
# integer not strictly
x <- -10L:10L
x[2] <- x[3]
stopifnot( is.unsorted(x, na.rm = F, strictly = T))
stopifnot(!is.unsorted(x, na.rm = F, strictly = F))
# integer with NA
x <- -10L:10L
x[2] <- NA
stopifnot(!is.unsorted(x, na.rm = T, strictly = F))
stopifnot(is.na(is.unsorted(x, na.rm = F, strictly = F)))
# double
x <- seq(from = -10, to = 10, by=0.01)
stopifnot(!is.unsorted(x, na.rm = F, strictly = T))
stopifnot(!is.unsorted(x, na.rm = F, strictly = F))
# double not strictly
x <- seq(from = -10, to = 10, by=0.01)
x[2] <- x[3]
stopifnot( is.unsorted(x, na.rm = F, strictly = T))
stopifnot(!is.unsorted(x, na.rm = F, strictly = F))
# double with NA
x <- seq(from = -10, to = 10, by=0.01)
x[length(x)] <- NA
stopifnot(!is.unsorted(x, na.rm = T, strictly = F))
stopifnot(is.na(is.unsorted(x, na.rm = F, strictly = F)))
# logical
stopifnot(!is.unsorted( c(F, T, T), strictly = F))
stopifnot( is.unsorted( c(F, T, T), strictly = T))
stopifnot( is.unsorted( c(T, T, F), strictly = F))
stopifnot( is.unsorted( c(T, T, F), strictly = T))
# POSIXct
x <- seq(from=as.POSIXct('2018-1-1'), to=as.POSIXct('2019-1-1'), by='day')
stopifnot(!is.unsorted(x, na.rm = T, strictly = F))
stopifnot(!is.unsorted(x, na.rm = F, strictly = F))
# POSIXct not strictly
x <- seq(from=as.POSIXct('2018-1-1'), to=as.POSIXct('2019-1-1'), by='day')
x[2] <- x[3]
stopifnot( is.unsorted(x, na.rm = F, strictly = T))
stopifnot(!is.unsorted(x, na.rm = F, strictly = F))
# POSIXct with NA
x <- seq(from=as.POSIXct('2018-1-1'), to=as.POSIXct('2019-1-1'), by='day')
x[length(x)] <- NA
stopifnot(!is.unsorted(x, na.rm = T, strictly = F))
stopifnot(is.na(is.unsorted(x, na.rm = F, strictly = F)))

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Runnable R packages

2019-01-07 Thread David Lindelof
Dear all,

I’m working as a data scientist in a major tech company. I have been using
R for almost 20 years now and there’s one issue that’s been bugging me of
late. I apologize in advance if this has been discussed before.

R has traditionally been used for running short scripts or data analysis
notebooks, but there’s recently been a growing interest in developing full
applications in the language. Three examples come to mind:

1) The Shiny web application framework, which facilitates the development of
rich, interactive web applications
2) The httr package, which provides lower-level facilities than Shiny for
writing web services
3) Batch jobs run by data scientists according to, say, a cron schedule

Compared with other languages, R’s support for such applications is rather
poor. The Rscript program is generally used to run an R script or an
arbitrary R expression, but I feel it suffers from a few problems:

1) It encourages developers of batch jobs to provide their code in a single
R file (bad for code structure and unit-testability)
2) It provides no way to deal with dependencies on other packages
3) It provides no way to "run" an application provided as an R package

For example, let’s say I want to run a Shiny application that I provide as
an R package (to keep the code modular, to benefit from unit tests, and to
declare dependencies properly). I would then need to a) uncompress my R
package, b) somehow, ensure my dependencies are installed, and c) call
runApp(). This can get tedious, fast.
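A common workaround today (a sketch, not an official mechanism; the package name `myapp` and the function `run_app()` are hypothetical) is for the package to export its own entry point, so that once the package is installed its declared dependencies are too:

```r
# In R/run_app.R of a hypothetical package 'myapp'; shiny is listed in
# the Imports field of DESCRIPTION, so installing 'myapp' installs it too.
run_app <- function(...) {
  app_dir <- system.file("app", package = "myapp")  # app sources shipped in inst/app
  shiny::runApp(app_dir, ...)
}
```

The user then launches the application with a single expression, e.g. `Rscript -e 'myapp::run_app()'`, which is close to, but less declarative than, a hypothetical `R CMD RUN`.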

Other languages let the developer package their code in "runnable"
artefacts, and let the developer specify the main entry point. The
mechanics depend on the language but are remarkably similar, and suggest a
way to implement this in R. Through declarations in some file, the
developer can often specify dependencies and declare where the program’s
"main" function resides. Consider Java:

Artefact: .jar file
Declarations file: Manifest file
Entry point: declared as 'Main-Class'
Executed as: java -jar 

Or Python:

Artefact: Python package, typically as .tar.gz source distribution file
Declarations file: setup.py (which specifies dependencies)
Entry point: special __main__() function
Executed as: python -m 

R has already much of this machinery:

Artefact: R package
Declarations file: DESCRIPTION
Entry point: ?
Executed as: ?

I feel that R could benefit from letting the developer specify, possibly in
DESCRIPTION, how to "run" the package. The package could then be run
through, for example, a new R CMD command, for example:

R CMD RUN  
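Concretely, the entry point could be a new DESCRIPTION field; the field name `Main` and the package below are purely hypothetical illustrations, not an existing R feature:

```
Package: myapp
Version: 0.1.0
Imports: shiny
Main: run_app
```

`R CMD RUN myapp` would then check that the declared dependencies are installed and call `myapp::run_app()`.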

I’m sure there are plenty of wrinkles in this idea that need to be ironed
out, but is this something that has ever been considered, or that is on R’s
roadmap?

Thanks for reading so far,



David Lindelöf, Ph.D.
+41 (0)79 415 66 41 or skype:david.lindelof
http://computersandbuildings.com
Follow me on Twitter:
http://twitter.com/dlindelof

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Runnable R packages

2019-01-07 Thread Gergely Daróczi
Dear David, sharing some related (subjective) thoughts below.

On Mon, Jan 7, 2019 at 9:53 PM David Lindelof  wrote:
>
> Dear all,
>
> I’m working as a data scientist in a major tech company. I have been using
> R for almost 20 years now and there’s one issue that’s been bugging me of
> late. I apologize in advance if this has been discussed before.
>
> R has traditionally been used for running short scripts or data analysis
> notebooks, but there’s recently been a growing interest in developing full
> applications in the language. Three examples come to mind:
>
> 1) The Shiny web application framework, which facilitates the development of
> rich, interactive web applications
> 2) The httr package, which provides lower-level facilities than Shiny for
> writing web services
> 3) Batch jobs run by data scientists according to, say, a cron schedule
>
> Compared with other languages, R’s support for such applications is rather
> poor. The Rscript program is generally used to run an R script or an
> arbitrary R expression, but I feel it suffers from a few problems:
>
> 1) It encourages developers of batch jobs to provide their code in a single
> R file (bad for code structure and unit-testability)

I think it rather encourages developers to create (internal) R
packages and use those from the batch jobs. This way the structure is
pretty clean, sharing code between scripts is easy, unit testing can
be done within the package etc.

> 2) It provides no way to deal with dependencies on other packages

See above: create R package(s) and use those from the scripts.

> 3) It provides no way to "run" an application provided as an R package
>
> For example, let’s say I want to run a Shiny application that I provide as
> an R package (to keep the code modular, to benefit from unit tests, and to
> declare dependencies properly). I would then need to a) uncompress my R
> package, b) somehow, ensure my dependencies are installed, and c) call
> runApp(). This can get tedious, fast.

You can provide your app as a Docker image, so that the end-user
simply calls a "docker pull" and then "docker run" -- that can be done
from a user-friendly script as well.
Of course, this requires Docker to be installed, but if that's a
problem, it's probably better to "ship" the app as a web application and
share a URL with the user, e.g. backed by shinyproxy.io
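For the Docker route, a minimal Dockerfile might look like the sketch below (the base image choice, the tarball name, and the `run_app()` entry point are placeholder assumptions, not part of the original suggestion):

```dockerfile
# Sketch: containerising a Shiny app that is shipped as an R package
FROM rocker/r-base
COPY myapp_0.1.0.tar.gz /tmp/
# remotes::install_local() installs the package plus its declared dependencies
RUN R -e "install.packages('remotes', repos = 'https://cloud.r-project.org'); \
          remotes::install_local('/tmp/myapp_0.1.0.tar.gz')"
EXPOSE 3838
CMD ["R", "-e", "myapp::run_app(host = '0.0.0.0', port = 3838)"]
```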

>
> Other languages let the developer package their code in "runnable"
> artefacts, and let the developer specify the main entry point. The
> mechanics depend on the language but are remarkably similar, and suggest a
> way to implement this in R. Through declarations in some file, the
> developer can often specify dependencies and declare where the program’s
> "main" function resides. Consider Java:
>
> Artefact: .jar file
> Declarations file: Manifest file
> Entry point: declared as 'Main-Class'
> Executed as: java -jar 
>
> Or Python:
>
> Artefact: Python package, typically as .tar.gz source distribution file
> Declarations file: setup.py (which specifies dependencies)
> Entry point: special __main__() function
> Executed as: python -m 
>
> R has already much of this machinery:
>
> Artefact: R package
> Declarations file: DESCRIPTION
> Entry point: ?
> Executed as: ?
>
> I feel that R could benefit from letting the developer specify, possibly in
> DESCRIPTION, how to "run" the package. The package could then be run
> through, for example, a new R CMD command, for example:
>
> R CMD RUN  
>
> I’m sure there are plenty of wrinkles in this idea that need to be ironed
> out, but is this something that has ever been considered, or that is on R’s
> roadmap?
>
> Thanks for reading so far,
>
>
>
> David Lindelöf, Ph.D.
> +41 (0)79 415 66 41 or skype:david.lindelof
> http://computersandbuildings.com
> Follow me on Twitter:
> http://twitter.com/dlindelof
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Runnable R packages

2019-01-07 Thread Dirk Eddelbuettel


On 3 January 2019 at 11:43, David Lindelof wrote:
| Dear all,
| 
| I’m working as a data scientist in a major tech company. I have been using
| R for almost 20 years now and there’s one issue that’s been bugging me of
| late. I apologize in advance if this has been discussed before.
| 
| R has traditionally been used for running short scripts or data analysis
| notebooks, but there’s recently been a growing interest in developing full
| applications in the language. Three examples come to mind:
| 
| 1) The Shiny web application framework, which facilitates the development of
| rich, interactive web applications
| 2) The httr package, which provides lower-level facilities than Shiny for
| writing web services
| 3) Batch jobs run by data scientists according to, say, a cron schedule

That is a bit of a weird classification of "full applications". I have done
this for about as long as you, but I have also provided (at least as tests and demos)
  i)  GUI apps using tcl/tk (which comes with R), and
  ii) GUI apps with Qt (or even Wt); see my RInside package.

But my main weapon for 3) is littler. See

   https://cran.r-project.org/package=littler

and particularly the many examples at

   https://github.com/eddelbuettel/littler/tree/master/inst/examples
 
| Compared with other languages, R’s support for such applications is rather
| poor. The Rscript program is generally used to run an R script or an
| arbitrary R expression, but I feel it suffers from a few problems:
| 
| 1) It encourages developers of batch jobs to provide their code in a single
| R file (bad for code structure and unit-testability)
| 2) It provides no way to deal with dependencies on other packages
| 3) It provides no way to "run" an application provided as an R package

Err, no. See the examples/ directory above. About every single one uses
packages.

As illustrations I have long-running and somewhat visible cronjobs that are
implemented the same way: CRANberries (since 2007, now running hourly) and
CRAN Policy Watch (running once a day). Because both are 'hacks' I never
published the code but there is not that much to it. CRANberries just queries
CRAN, compares to what it had last, and writes out variants of the
DESCRIPTION file to text where a static blog engine (like Hugo, but older)
makes a feed and html pages out of it.  Oh, and we tweet because "why not?".
 
| For example, let’s say I want to run a Shiny application that I provide as
| an R package (to keep the code modular, to benefit from unit tests, and to
| declare dependencies properly). I would then need to a) uncompress my R
| package, b) somehow, ensure my dependencies are installed, and c) call
| runApp(). This can get tedious, fast.

Disagree here too. At work, I just write my code, organize it in packages,
update the packages and have shiny expose whatever makes sense.

| Other languages let the developer package their code in "runnable"
| artefacts, and let the developer specify the main entry point. The
| mechanics depend on the language but are remarkably similar, and suggest a
| way to implement this in R. Through declarations in some file, the
| developer can often specify dependencies and declare where the program’s
| "main" function resides. Consider Java:
| 
| Artefact: .jar file
| Declarations file: Manifest file
| Entry point: declared as 'Main-Class'
| Executed as: java -jar 
| 
| Or Python:
| 
| Artefact: Python package, typically as .tar.gz source distribution file
| Declarations file: setup.py (which specifies dependencies)
| Entry point: special __main__() function
| Executed as: python -m 
| 
| R has already much of this machinery:
| 
| Artefact: R package
| Declarations file: DESCRIPTION
| Entry point: ?
| Executed as: ?
| 
| I feel that R could benefit from letting the developer specify, possibly in
| DESCRIPTION, how to "run" the package. The package could then be run
| through, for example, a new R CMD command, for example:
| 
| R CMD RUN  
| 
| I’m sure there are plenty of wrinkles in this idea that need to be ironed
| out, but is this something that has ever been considered, or that is on R’s
| roadmap?

Hm. If _you_ have an itch to scratch here, why don't _you_ implement a draft?

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Runnable R packages

2019-01-07 Thread Murray Stokely
Some other major tech companies have in the past widely used Runnable R
Archive (".Rar") files, similar to Python .par files [1], and integrated
them completely into the proprietary R package build systems in use there.
I thought there were a few systems like this that had made their way to
CRAN or the UseR conferences, but I don't have a link.

Building something specific to your organization on top of the Python .par
framework, to archive up R, your needed packages/shared libraries, and other
dependencies, with a runner script to R CMD RUN your entry point in a
sandbox, is a pretty straightforward way to have control in a way that makes
sense for your environment.

  - Murray

[1] https://google.github.io/subpar/subpar.html

On Mon, Jan 7, 2019 at 12:53 PM David Lindelof  wrote:

> Dear all,
>
> I’m working as a data scientist in a major tech company. I have been using
> R for almost 20 years now and there’s one issue that’s been bugging me of
> late. I apologize in advance if this has been discussed before.
>
> R has traditionally been used for running short scripts or data analysis
> notebooks, but there’s recently been a growing interest in developing full
> applications in the language. Three examples come to mind:
>
> 1) The Shiny web application framework, which facilitates the development of
> rich, interactive web applications
> 2) The httr package, which provides lower-level facilities than Shiny for
> writing web services
> 3) Batch jobs run by data scientists according to, say, a cron schedule
>
> Compared with other languages, R’s support for such applications is rather
> poor. The Rscript program is generally used to run an R script or an
> arbitrary R expression, but I feel it suffers from a few problems:
>
> 1) It encourages developers of batch jobs to provide their code in a single
> R file (bad for code structure and unit-testability)
> 2) It provides no way to deal with dependencies on other packages
> 3) It provides no way to "run" an application provided as an R package
>
> For example, let’s say I want to run a Shiny application that I provide as
> an R package (to keep the code modular, to benefit from unit tests, and to
> declare dependencies properly). I would then need to a) uncompress my R
> package, b) somehow, ensure my dependencies are installed, and c) call
> runApp(). This can get tedious, fast.
>
> Other languages let the developer package their code in "runnable"
> artefacts, and let the developer specify the main entry point. The
> mechanics depend on the language but are remarkably similar, and suggest a
> way to implement this in R. Through declarations in some file, the
> developer can often specify dependencies and declare where the program’s
> "main" function resides. Consider Java:
>
> Artefact: .jar file
> Declarations file: Manifest file
> Entry point: declared as 'Main-Class'
> Executed as: java -jar 
>
> Or Python:
>
> Artefact: Python package, typically as .tar.gz source distribution file
> Declarations file: setup.py (which specifies dependencies)
> Entry point: special __main__() function
> Executed as: python -m 
>
> R has already much of this machinery:
>
> Artefact: R package
> Declarations file: DESCRIPTION
> Entry point: ?
> Executed as: ?
>
> I feel that R could benefit from letting the developer specify, possibly in
> DESCRIPTION, how to "run" the package. The package could then be run
> through, for example, a new R CMD command, for example:
>
> R CMD RUN  
>
> I’m sure there are plenty of wrinkles in this idea that need to be ironed
> out, but is this something that has ever been considered, or that is on R’s
> roadmap?
>
> Thanks for reading so far,
>
>
>
> David Lindelöf, Ph.D.
> +41 (0)79 415 66 41 or skype:david.lindelof
> http://computersandbuildings.com
> Follow me on Twitter:
> http://twitter.com/dlindelof
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Runnable R packages

2019-01-07 Thread Dirk Eddelbuettel


On 7 January 2019 at 22:09, Gergely Daróczi wrote:
| You can provide your app as a Docker image, so that the end-user
| simply calls a "docker pull" and then "docker run" -- that can be done
| from a user-friendly script as well.
| Of course, this requires Docker to be installed, but if that's a
| problem, probably better to "ship" the app as a web application and
| share a URL with the user, eg backed by shinyproxy.io

Excellent suggestion.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Runnable R packages

2019-01-07 Thread Iñaki Ucar
On Mon, 7 Jan 2019 at 22:09, Gergely Daróczi  wrote:
>
> Dear David, sharing some related (subjective) thoughts below.
>
> You can provide your app as a Docker image, so that the end-user
> simply calls a "docker pull" and then "docker run" -- that can be done
> from a user-friendly script as well.
> Of course, this requires Docker to be installed, but if that's a
> problem, probably better to "ship" the app as a web application and
> share a URL with the user, eg backed by shinyproxy.io

If Docker is a problem, you can also try podman: same usage,
compatible with Dockerfiles and daemon-less, no admin rights required.

https://podman.io/

Iñaki

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] unsorted - suggestion for performance improvement and ALTREP support for POSIXct

2019-01-07 Thread Gabriel Becker
Hi Harvey,

It's exciting to see people thinking about and looking at ALTREP speedups
"in the wild" :).   You're absolutely right that pulling out the REAL call
will give you a significant speedup, but ALTREP does add a little wrinkle
(and a solution to it!). Detailed responses and comments inline:

On Mon, Jan 7, 2019 at 11:58 AM Harvey Smith  wrote:

> I believe the performance of isUnsorted() in sort.c could be improved by
> calling REAL() once (outside of the for loop), rather than calling it twice
> inside the loop.   As an aside, it is implemented in the faster way in
> doSort() (sort.c line 401).  The example below shows the performance
> improvement for a vectors of double of moving REAL() outside the for loop.
>
> 
>
>
In light of ALTREP's inclusion in the R internals, it's best to avoid asking
things for their full data vector when you don't need to. Instead, you can
use the ITERATE_BY_REGION macro R provides (courtesy of Luke, I believe?)
in R_ext/Itermacros.h. This is particularly true of R's
internals, which preferably won't "explode"/invalidate an ALTREP
(which asking for a writable pointer does) when they don't need to. Most
internal functions haven't been converted to this yet, as you see with
is.unsorted (and it's not a high priority to do the conversion until it
becomes an issue for any given case), but this is what, e.g., R's own sum
function now does.

ITERATE_BY_REGION is based on *_GET_REGION, which was added to the C API
as part of ALTREP, but it works on ALTREP and normal vectors, and won't explode
in corner cases where materializing a full ALTREP vector would be
problematic. The core concept of ITERATE_BY_REGION is to grab regions (a
quick glance tells me it's 512 elements at a time) of a vector, copying them
into a buffer, and using the same trick your code does by avoiding pointer
lookup inside the inner tight loop. Do note that as of now I had to compile
my function with language="C", rather than the default "C++", to avoid an
error about initializing a const double * with a const void * value.

On my machine, at least, you actually get *nearly* the same speedup with all
the added safety. Eyeballing it, I'm not convinced the difference is
statistically significant, to be honest, but even if it is, you get most of
the benefit...


body = "
R_xlen_t n, i;
n = XLENGTH(x);
for(i = 0; i+1 < n ; i++)
  if(REAL(x)[i] > REAL(x)[i+1])
return ScalarLogical(TRUE);
return ScalarLogical(FALSE);";
f1 = inline::cfunction(sig = signature(x='numeric'), body=body)

body = "
R_xlen_t n, i;
n = XLENGTH(x);
double* real_x = REAL(x);
for(i = 0; i+1 < n ; i++)
  if(real_x[i] > real_x[i+1])
return ScalarLogical(TRUE);
return ScalarLogical(FALSE);";
f2 = inline::cfunction(sig = signature(x='numeric'), body=body)

body = "
double tmp = -DBL_MAX; // minimum possible double value
ITERATE_BY_REGION(x, xptr, i, nbatch, double, REAL, {
  if(xptr[0] < tmp) //deal with batch barriers, tmp is end of last batch
return ScalarLogical(TRUE);
  for(R_xlen_t k = 0; k < nbatch - 1; k++) {
  if(xptr[k] > xptr[k+1])
return ScalarLogical(TRUE);
  }
  tmp = xptr[nbatch - 1];
});
return ScalarLogical(FALSE);";
f3 = inline::cfunction(sig = signature(x='numeric'), body=body, includes =
'#include "R_ext/Itermacros.h"',
   language = "C")

x.double = as.double(1:1e7) + 0
x.posixct = Sys.time() + x.double
microbenchmark::microbenchmark(
f1(x.double),
f2(x.double), # one REAL call
f3(x.double),  # ITERATE_BY_REGION
f1(x.posixct),
f2(x.posixct), # one REAL call
f3(x.posixct), # ITERATE_BY_REGION
unit='ms', times=100)



Unit: milliseconds
          expr       min        lq      mean    median        uq        max neval
  f1(x.double) 26.377432 27.234192 28.156993 27.774590 28.602643  32.213378    10
  f2(x.double)  4.722712  4.854300  5.011549  4.991388  5.127996   5.523156    10
  f3(x.double)  4.759537  4.788137  5.408925  5.373667  5.713877   6.694330    10
 f1(x.posixct) 77.975030 78.853724 85.867995 82.530822 83.557849 123.546206    10
 f2(x.posixct)  4.637912  4.660033  4.872892  4.750513  4.880569   5.907149    10
 f3(x.posixct)  4.643806  4.665936  5.094212  5.085454  5.384414   5.778274    10



To be extra careful, we can check that we're getting all the edges right,
just in case, since the code is admittedly harder to follow and a bit more
arcane:

> x.double2 = x.double
> x.double2[512] = x.double[1] # unsorted at end of first batch
> stopifnot(f3(x.double2))
>
> x.double2a = x.double
> x.double2a[513] = x.double[1] # unsorted at beginning of 2nd batch
> stopifnot(f3(x.double2a))
>
> ## check edges
> x.double3 = x.double
> x.double3[length(x.double3)] = x.double3[1] # unsorted at last element
> stopifnot(f3(x.double3))
>
> x.double4 = x.double
> x.double4[1] = x.double[5] # unsorted at first element
> stopifnot(f3(x.double4))