[Rd] Appetite for eliminating dependency on Perl

2019-08-09 Thread Ken Williams
Preamble: I am in no way opposed to Perl in general - I love Perl and
probably always will.

R currently has Perl as both a build-time and run-time dependency.  This
adds about 200 MB, give or take, to the required environment size (as
measured in CentOS - looks like it might be a bit smaller in Ubuntu?).

Not such a huge deal, really, but the actual benefit R gets from the
dependency is quite small.  From my poking around in the R sources (using
`git grep -P '\bperl\b(?! ?= ?(?:TRUE|FALSE))' ` as a filter), it looks
like it's only used in the following nooks & crannies:

* tools/help2man.pl
* tools/install-info.pl
* configure:  INSTALL_INFO="perl \$(top_srcdir)/tools/install-info.pl"
* m4/R.m4:  INSTALL_INFO="perl \$(top_srcdir)/tools/install-info.pl"

Ultimately that's only two scripts.  `help2man.pl` seems like it's part of
the build process, but not used at runtime.  `install-info.pl` seems like
maybe it's runnable at runtime, but requires user initiation to run, at
which point the user is expected to have perl installed.  Either one of
them could probably be ported to another language pretty easily, maybe even
R.

Anything else I missed?

If someone were to volunteer the porting work, would there be any appetite
for eliminating the dependency?

  -Ken

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Bug filed on unzip() function

2010-12-21 Thread Ken Williams
Hi,

A few days ago I filed a bug report on the unzip() function:

  https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14462

I haven't gotten any comments yet, so I thought I'd ask for comments
here.  I also see on the description of R-devel that the list "also
receives all (filtered, i.e. non-spam!) bug reports from R-bugs", but
I don't see it here.

Eventually I would like to help unzip() gain large-file support, such
as is offered by http://info-zip.org/UnZip.html version 6.0.  A
corresponding zip() function would be nice too.

Thanks.

 -Ken



Re: [Rd] Bug filed on unzip() function

2010-12-24 Thread Ken Williams
On Thu, Dec 23, 2010 at 11:22 PM, Marc Schwartz wrote:

> Also, I don't know what the typical response time has been on Bugzilla once
> a bug report is filed. Perhaps something could be noted there so that bug
> reporters might have some expectation that a comment/reply might be
> forthcoming within X days of filing. After that time frame, some recommended
> form of follow up communication could take place as a tickler/reminder of
> sorts.
>

Well, as a concrete data point - nobody's yet commented on the bug report,
or on this list, about the original issue I brought up:
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14462

I haven't filed bug reports before, but in your experience does Warnocking
like this happen frequently?

 -Ken



[Rd] [patch] giving library() a 'version' argument

2012-04-11 Thread Ken Williams
I've made a small enhancement to R that would help developers better control 
what versions of code we're using where.  Basically, to load a package in R, 
one currently does:

library(whateverPackage)

and with the enhancement, you can ensure that you're getting at least version X 
of the package:

library(whateverPackage, version=3.14)

Reasons one might want this include:

  * you know that in version X some bug was fixed
  * you know that in version X some feature was added
  * that's the first version you've actually tested it with & you don't want to 
vouch for earlier versions without testing
  * you develop on one machine & deploy on another machine you don't control, 
and you want runtime checks that the sysadmin installed what they were supposed 
to install
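
One caveat about the numeric form `version=3.14` shown above: a numeric literal loses trailing zeros before it is ever compared, so a version string is probably the safer spelling. A minimal sketch using only base R (the version numbers are illustrative, not taken from any real package):

```r
# A numeric literal drops trailing zeros when converted to text,
# so 3.10 silently becomes "3.1".
as_text <- as.character(3.10)

# Version strings keep every component intact.
cmp_string  <- compareVersion("3.10", "3.2")   # compares components 10 and 2
cmp_numeric <- compareVersion(as_text, "3.2")  # compares components 1 and 2

print(as_text)
print(cmp_string)
print(cmp_numeric)
```

The two comparisons disagree about which version is newer, which is the reason to pass versions as strings.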

In general, I have an interest in helping R get better at various things that 
would help it play in a "production environment", for various values of that 
term. =)

The attached patch is made against revision 58980 of 
https://svn.r-project.org/R/trunk .  I think this is the first patch I've 
submitted to the R core, so please let me know if anything's amiss, or of 
course if there are reservations about the approach.

Thanks.

--
Ken Williams, Senior Research Scientist
WindLogics
http://windlogics.com



CONFIDENTIALITY NOTICE: This e-mail message is for the sole use of the intended 
recipient(s) and may contain confidential and privileged information. Any 
unauthorized review, use, disclosure or distribution of any kind is strictly 
prohibited. If you are not the intended recipient, please contact the sender 
via reply e-mail and destroy all copies of the original message. Thank you.


Re: [Rd] [patch] giving library() a 'version' argument

2012-04-11 Thread Ken Williams
Apparently the patch file got eaten.  Let me try again with a .txt extension.

 -Ken

> -----Original Message-----
> From: Ken Williams
> Sent: Wednesday, April 11, 2012 10:28 AM
> To: r-devel@r-project.org
> Subject: [patch] giving library() a 'version' argument
>
> I've made a small enhancement to R that would help developers better
> control what versions of code we're using where.
> [...]


Index: src/library/base/man/library.Rd
===
--- src/library/base/man/library.Rd (revision 58980)
+++ src/library/base/man/library.Rd (working copy)
@@ -21,7 +21,7 @@
 character.only = FALSE, logical.return = FALSE,
 warn.conflicts = TRUE, quietly = FALSE,
 keep.source = getOption("keep.source.pkgs"),
-verbose = getOption("verbose"))
+verbose = getOption("verbose"), version)
 
 require(package, lib.loc = NULL, quietly = FALSE,
 warn.conflicts = TRUE,
@@ -59,6 +59,9 @@
   \item{quietly}{a logical.  If \code{TRUE}, no message confirming
 package loading is printed, and most often, no errors/warnings are
 printed if package loading fails.}
+  \item{version}{the minimum acceptable version of the package to load.
+If a lesser version is found, the package will not be loaded and an
+exception will be thrown.}
 }
 \details{
   \code{library(package)} and \code{require(package)} both load the
@@ -189,6 +192,10 @@
 search()# "splines", too
 detach("package:splines")
 
+# To require a specific minimum version:
+library(splines, version = '2.14')
+detach("package:splines")
+
 # if the package name is in a character vector, use
 pkg <- "splines"
 library(pkg, character.only = TRUE)
Index: src/library/base/R/library.R
===
--- src/library/base/R/library.R (revision 58980)
+++ src/library/base/R/library.R (working copy)
@@ -32,7 +32,7 @@
 function(package, help, pos = 2, lib.loc = NULL, character.only = FALSE,
  logical.return = FALSE, warn.conflicts = TRUE,
 quietly = FALSE, keep.source = getOption("keep.source.pkgs"),
- verbose = getOption("verbose"))
+ verbose = getOption("verbose"), version)
 {
 if (!missing(keep.source))
 warning("'keep.source' is deprecated and will be ignored")
@@ -276,6 +276,11 @@
stop(gettextf("%s is not a valid installed package",
   sQuote(package)), domain = NA)
 pkgInfo <- readRDS(pfile)
+if (!missing(version)) {
+pver <- pkgInfo$DESCRIPTION["Version"]
+if (compareVersion(pver, as.character(version)) < 0)
+stop("Version ", version, " of '", package, "' required, but only ", pver, " is available")
+}
 testRversion(pkgInfo, package, pkgpath)
 ## avoid any bootstrapping issues by these exemptions
 if(!package %in% c("datasets", "grDevices", "graphics", "methods",
@@ -332,10 +337,18 @@
 stop(gettextf("package %s does not have a NAMESPACE and should be re-installed",
   sQuote(package)), domain = NA)
}
-   if (verbose && !newpackage)
-warning(gettextf("package %s already present in search()",
- sQuote(package)), domain = NA)
+if (!newpackage) {
+   if (verbose)
+   warning(gettextf("package %s already present in search()",
+sQuote(package)), domain = NA)
+   if (!missing(version)) {
+   pver <- packageVersion(package)
+   if (compareVersion(as.character(pver), as.character(version)) < 0)
+   stop("Version ", version, " of '", package,"' required, ",
+"but a lesser version ", pver, " is already loaded")
 
+   }
+   }
 }
 else if(!missing(help)) {
if(!character.only)


Re: [Rd] [patch] giving library() a 'version' argument

2012-04-12 Thread Ken Williams

> -----Original Message-----
> From: Prof Brian Ripley [mailto:rip...@stats.ox.ac.uk]
> Sent: Thursday, April 12, 2012 7:54 AM
> To: Duncan Murdoch
> Cc: Ken Williams; r-devel@r-project.org
> Subject: Re: [Rd] [patch] giving library() a 'version' argument
>
> A very important point is that library() *had* a 'version' argument for
> several years, and this is not what it did.

That is unfortunate.  So such a mechanism would need to use a different 
argument name.

For completeness in this thread, I dug up the fact that it seems to have been 
removed in the 2.9.0 release:

o   Support for versioned installs (R CMD INSTALL --with-package-versions
and install.packages(installWithVers = TRUE)) has been removed.
Packages installed with versioned names will be ignored.

I'll address Duncan's concerns in a separate message.

 -Ken




Re: [Rd] [patch] giving library() a 'version' argument

2012-04-12 Thread Ken Williams


> -----Original Message-----
> From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
> Sent: Thursday, April 12, 2012 7:22 AM
> To: Ken Williams
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] [patch] giving library() a 'version' argument
>
> On 12-04-11 11:28 AM, Ken Williams wrote:
> >
> > Reasons one might want this include:
> >
> >   * you know that in version X some bug was fixed
> >   * you know that in version X some feature was added
> >   * that's the first version you've actually tested it with & you don't
> >     want to vouch for earlier versions without testing
> >   * you develop on one machine & deploy on another machine you don't
> >     control, and you want runtime checks that the sysadmin installed what
> >     they were supposed to install
>
> I don't really see the need for this.  Packages already have a scheme for
> requiring a particular version of a package, so this would only be useful in
> scripts run outside of packages.

The main distinction here is that the existing package mechanism enforces 
version requirements at *install* time, but this mechanism enforces it at *run* 
time.  So this indeed applies well to scripts run outside packages, but it's 
also useful inside packages when they're loading their dependencies at runtime. 
 I was trying to illustrate that with the 4 bullet points above (especially the 
last one) but I should have said so explicitly.

It can happen very easily that constraints that were satisfied at install time 
get out of whack by subsequent package installations, but the violations go 
undetected.  The result can be breakage, whether dramatic or subtle.

The main hats targeted here are really people (like me, of course) who are 
trying to "productionize" results, not so much people who are doing offline 
analysis.  In a production system

> But what if your script requires a particular
> (perhaps obsolete) version of a package?  This change only puts a lower
> bound on the version number, and version requirements can be more
> elaborate than that.

Certainly true; this was meant as a first iteration, and support for the more 
elaborate requirements specifications could certainly be added.

The more elaborate specs actually illustrate the need for a runtime mechanism 
nicely - if code X (which may be a package, or a script, it doesn't matter) 
requires exactly version 3.14 of package B, and someone in the production team 
upgrades version 3.14 to version 3.78 because "it's faster" or "it's less 
buggy" or "we just like to have the latest version of everything all the time", 
then someone needs to be alerted to the problem.  One alternative solution 
would be to use a full-fledged package management system like RPM or Deb to 
track all the dependencies, but yikes, that doesn't sound fun.
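
As a rough illustration of what a more elaborate runtime check might look like, here is a minimal sketch. The helper name `version_satisfies` and its arguments are hypothetical and not part of R or of the patch; it only relies on base R's `numeric_version` class.

```r
# Hypothetical helper (illustrative name): test an installed version string
# against a requirement such as ">= 3.14" or "== 3.14".
version_satisfies <- function(installed, op, required) {
  inst <- numeric_version(installed)
  req  <- numeric_version(required)
  # numeric_version objects compare component by component
  switch(op,
         "<"  = inst <  req,
         "<=" = inst <= req,
         "==" = inst == req,
         ">=" = inst >= req,
         ">"  = inst >  req,
         stop("unsupported operator: ", op))
}

# The 3.14 -> 3.78 upgrade scenario from the paragraph above:
lower_bound_ok <- version_satisfies("3.78", ">=", "3.14")
exact_pin_ok   <- version_satisfies("3.78", "==", "3.14")
print(lower_bound_ok)
print(exact_pin_ok)
```

A lower bound is still satisfied after the upgrade, while an exact pin is not, which is the case that would need an alert.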

> I think my advice would be:
>
> 1.  Put your code in a package, and use the version specifications there.
>
> 2.  If you must write it in a script, then put a version test at the top,
>     using packageVersion().

Certainly those are alternatives, but to us they are somewhat unsatisfactory.  
The first option doesn't help with the crux of the problem, which is runtime 
enforcement.  The second is essentially the same solution I've proposed, but 
doesn't help anyone outside our organization who has the same problem.
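
For reference, the script-level check being discussed could be wrapped up roughly as follows. This is only a sketch: `require_min_version` is a hypothetical name, not an existing base R function, and the error text simply mirrors the wording used in the patch.

```r
# Hypothetical helper (not part of base R): stop with a readable message
# when an installed package is older than the required minimum.
require_min_version <- function(pkg, min_version) {
  found <- packageVersion(pkg)
  # package_version objects can be compared directly with a version string
  if (found < min_version) {
    stop(sprintf("Version %s of '%s' required, but only %s is available",
                 min_version, pkg, as.character(found)), call. = FALSE)
  }
  invisible(found)
}

# 'stats' ships with R, so it is a convenient package to try this on.
ok <- require_min_version("stats", "1.0")
failure <- tryCatch(require_min_version("stats", "9999.0"),
                    error = function(e) conditionMessage(e))
print(ok)
print(failure)
```

A script would call this right after `library()`; it is the same idea as the proposed argument, just kept outside of `library()` itself.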

 -Ken



Re: [Rd] [patch] giving library() a 'version' argument

2012-04-12 Thread Ken Williams


> -----Original Message-----
> From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
> Sent: Thursday, April 12, 2012 12:27 PM
> To: Ken Williams
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] [patch] giving library() a 'version' argument
>
> I haven't tested it, but according to the documentation in Writing R
> Extensions, the dependencies are enforced at the time library() is called.

Oh, I hadn't suspected that.  I can look into testing it; if it's true, then 
of course that changes all of this.  I probably won't be able to do that for a 
few days, though, because I'll be traveling.

I've never noticed a package failing to load at runtime because its 
prereq-version dependency wasn't met though.

> [...]
> But a single line at the top of the script would fix this:
>
> stopifnot(packageVersion("foo") == "3.14")

For the most common use case, that would look more like:

stopifnot(compareVersion(as.character(packageVersion("foo")), "3.14") >= 0)

which gets less declarative, and I'd argue less clear about exactly what it's 
trying to enforce.

And I can see myself (& presumably others) getting that comparison operator 
backwards a lot, having to look it up each time or copy-paste it from other 
code.
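
Since the direction of the comparison is easy to get wrong, here is a small sketch of how `compareVersion()` reports its result. The version strings are made up for illustration.

```r
# compareVersion(a, b) returns -1 if a is older than b, 0 if the two are
# equal, and 1 if a is newer than b. Both arguments are character strings.
older <- compareVersion("3.2",  "3.14")
same  <- compareVersion("3.14", "3.14")
newer <- compareVersion("3.20", "3.14")

# "installed is at least the required version" therefore reads as >= 0.
installed <- "3.20"
meets_minimum <- compareVersion(installed, "3.14") >= 0

print(c(older, same, newer))
print(meets_minimum)
```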

And then that still doesn't add nice error messages; that would be yet more 
code.

*And*, it doesn't actually behave correctly if the package is already loaded by 
other code, because it might have been loaded from a different location than 
the one that would be found in the packageVersion() call.  (Or am I maybe wrong 
about what packageVersion() does in that case?  I don't think the docs specify 
that behavior.)


For prior art on this whole concept, a useful precedent is the 'use()' function 
in Perl, which accepts a version argument, even though there is also robust 
version checking at installation/testing time.

>
> Another problem with putting this into library() is that packages aren't
> always loaded by library():  there is require(), and there are implicit
> loads triggered by dependencies of other packages.

That's not really a problem.  If someone wants to enforce a runtime dependency, 
they stick the enforcement line into their code, and it will correctly stop if 
the criterion is not met.

 -Ken



Re: [Rd] [patch] giving library() a 'version' argument

2012-04-12 Thread Ken Williams


> -----Original Message-----
> From: Roebuck,Paul L [mailto:proeb...@mdanderson.org]
> Sent: Thursday, April 12, 2012 1:03 PM
> To: R-devel
> Cc: Ken Williams
> Subject: Re: [Rd] [patch] giving library() a 'version' argument
>
> On 4/12/12 10:11 AM, Ken Williams wrote:
>
> >> On 4/12/12 7:22 AM, Duncan Murdoch wrote:
> > [SNIP]
> > ...
> > The main hats targeted here are really people (like me, of course) who
> > are trying to "productionize" results, not so much people who are
> > doing offline analysis.  In a production system
> >
> >> But what if your script requires a particular (perhaps obsolete)
> >> version of a package?  This change only puts a lower bound on the
> >> version number, and version requirements can be more elaborate than
> >> that.
> >
> > Certainly true; this was meant as a first iteration, and support for
> > the more elaborate requirements specifications could certainly be added.
> >
> > The more elaborate specs actually illustrate the need for a runtime
> > mechanism nicely - if code X (which may be a package, or a script, it
> > doesn't matter) requires exactly version 3.14 of package B, and
> > someone in the production team upgrades version 3.14 to version 3.78
> > because "it's faster" or "it's less buggy" or "we just like to have
> > the latest version of everything all the time", then someone needs to
> > be alerted to the problem.  One alternative solution would be to use a
> > full-fledged package management system like RPM or Deb to track all the
> > dependencies, but yikes, that doesn't sound fun.
>
> I appreciate your contribution of both time and energy.
>
> But I think the existing library() method is sufficient without this
> modification.  It's essentially syntactic sugar for:
>
> library(MASS); stopifnot(packageVersion("MASS") >= "7.3")

I was about to write back & say "that's not correct, if '7.10' is installed, a 
string comparison will do the wrong thing."

But apparently it does the *right* thing, because the 'numeric_version' class 
implements the comparison operators.

I'd still prefer to "Huffman-code it" to something shorter, to encourage people 
to use it, but I can see why others could consider it good enough.

I could contribute a doc patch to the 'numeric_version' man page to make it 
clearer what's available.  The 3 comparisons there happen to turn out the same 
way when done as a string comparison.
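
To make the '7.10' case concrete, here is a minimal sketch contrasting the two kinds of comparison. It uses only base R; the version numbers are the illustrative ones from this message.

```r
# Plain string comparison goes character by character, so "7.10" sorts
# before "7.3" because "1" sorts before "3".
string_result <- "7.10" >= "7.3"

# numeric_version compares component by component (10 vs 3); the character
# operand on the right is coerced to a version automatically.
version_result <- numeric_version("7.10") >= "7.3"

print(string_result)
print(version_result)
```

An example like this, where the two comparisons disagree, is the kind of thing a doc patch could add.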

I also do still have a question about what packageVersion() does when a package 
is already loaded - does it go look for it again, or does it check the version 
of what's already loaded?  A doc patch could help here too.
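
One way to sidestep the question, sketched here under the assumption that the loaded namespace is what matters, is to ask the namespace directly and, if needed, point `packageVersion()` at the library the namespace was loaded from. `getNamespaceVersion()` and `getNamespaceInfo()` are base R functions; using `stats` as the example package is just a convenience.

```r
# 'stats' is normally loaded in every R session, which makes it a
# convenient stand-in for "a package that is already loaded".
loaded_version <- getNamespaceVersion("stats")       # version of the loaded namespace
loaded_from    <- getNamespaceInfo("stats", "path")  # directory it was loaded from

# Look the version up again, restricted to the library the namespace came from.
lookup_version <- packageVersion("stats", lib.loc = dirname(loaded_from))

print(loaded_version)
print(lookup_version)
```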

 -Ken



Re: [Rd] [patch] giving library() a 'version' argument

2012-04-13 Thread Ken Williams


From: Martin Maechler [maech...@stat.math.ethz.ch]

> Indeed nowadays,  packageDescription()  *)  *does*
> use the correct package version, by inspecting the "path"
> attribute of the package, in the same way as
>   searchpaths()

Yeah, that's what I suspected, but only from reading the code of 
packageDescription().  It doesn't seem to mention this in the docs.  And I 
wasn't 100% confident from reading the code, in case it was a 'promise' or 
something like that.

I'm willing to write a doc patch, but it'll take a few days since I'm traveling.

---
Ken Williams, Senior Research Scientist
Applied Mathematics Group, WindLogics Inc.



[Rd] data.frame() args in transform()

2012-10-29 Thread Ken Williams
Since SVN revision 47035 (which shows up in the R-2-9-0 line), 
transform.data.frame() has accepted arguments like 'row.names' and 
'stringsAsFactors' and passed them through to the data.frame() function.  It 
looks like this was an unintentional side effect of the change that let 
multiple columns be added properly.

Given that this has been implemented for quite a while, should it now be 
documented?  It's a little strange to support it though - the 
'stringsAsFactors' argument might be handy, but 'check.names', 'row.names', and 
'check.rows' are perhaps questionable.  If it's desirable to document the 
behavior, here's a possible patch.
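
For anyone who wants to see the pass-through in action, here is a minimal sketch. The data frame and column names are made up, and the behavior shown is the one described above, so it may differ in R versions outside the range discussed.

```r
df <- data.frame(x = 1:2)

# 'stringsAsFactors' is not a column of df, so transform() hands it on to
# data.frame() together with the new column 'z'.
as_factor <- transform(df, z = c("a", "b"), stringsAsFactors = TRUE)
as_string <- transform(df, z = c("a", "b"), stringsAsFactors = FALSE)

print(class(as_factor$z))
print(class(as_string$z))
print(names(as_factor))
```

Note that the argument name does not end up as a column; it is consumed by data.frame().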

-
Index: src/library/base/man/transform.Rd
===
--- src/library/base/man/transform.Rd  (revision 61043)
+++ src/library/base/man/transform.Rd   (working copy)
@@ -27,6 +27,10 @@
   \code{_data}.  The tags are matched against \code{names(_data)}, and for
   those that match, the value replace the corresponding variable in
   \code{_data}, and the others are appended to \code{_data}.
+
+  \code{transform.data.frame} also accepts the additional named
+  arguments that the \code{data.frame} function accepts,
+  e.g. \code{stringsAsFactors}.
}
 \value{
   The modified value of \code{_data}.
-

--
Ken Williams, Senior Research Scientist
WindLogics
http://windlogics.com





[Rd] Doc patch for Sys.time and system.time

2012-12-27 Thread Ken Williams
Here’s a patch that adds ‘seealso’ entries to Sys.time and system.time
docs, to help people who forget what the distinction is between them.
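
As a quick reminder of the distinction, here is a small sketch using only base R. The loop being timed is arbitrary.

```r
# Sys.time() answers "what time is it now?" and returns a POSIXct value.
now <- Sys.time()

# system.time() answers "how long did this expression take?" and returns
# a 'proc_time' object with user, system and elapsed seconds.
timing <- system.time(for (i in 1:1e5) sqrt(i))

print(class(now))
print(timing[["elapsed"]])
```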

Patch was made against https://svn.r-project.org/R/trunk@61454 .

 -Ken


Re: [Rd] Doc patch for Sys.time and system.time

2012-12-27 Thread Ken Williams
Duncan noticed that either the sending server (Gmail - shouldn't be the
case) or receiving server stripped out the attachment.  Here it is again,
inline.

 -Ken

===
From 99766dd8f16804ecddc73f6169be3e42b916b8fa Mon Sep 17 00:00:00 2001
From: Ken Williams 
Date: Thu, 27 Dec 2012 09:58:21 -0600
Subject: [PATCH] Add system.time link to Sys.time documentation, and vice
 versa.


diff --git a/src/library/base/man/Sys.time.Rd b/src/library/base/man/Sys.time.Rd
index d34571b..f0b0c50 100644
--- a/src/library/base/man/Sys.time.Rd
+++ b/src/library/base/man/Sys.time.Rd
@@ -41,6 +41,8 @@ Sys.Date()
   string.

   \code{\link{Sys.timezone}}.
+
+  \code{\link{system.time}} for measuring elapsed/CPU time of expressions.
 }
 \examples{\donttest{
 Sys.time()
diff --git a/src/library/base/man/system.time.Rd b/src/library/base/man/system.time.Rd
index 5cd79b7..ad21267 100644
--- a/src/library/base/man/system.time.Rd
+++ b/src/library/base/man/system.time.Rd
@@ -38,6 +38,8 @@ unix.time(expr, gcFirst = TRUE)
 }
 \seealso{
   \code{\link{proc.time}}, \code{\link{time}} which is for time series.
+
+  \code{\link{Sys.time}} to get the current date & time.
 }
 \examples{
 require(stats)
-- 
1.7.9
===



On Thu, Dec 27, 2012 at 10:08 AM, Ken Williams  wrote:

> Here’s a patch that adds ‘seealso’ entries to Sys.time and system.time
> docs, to help people who forget what the distinction is between them.
>
> Patch was made against https://svn.r-project.org/R/trunk@61454 .
>
>  -Ken
