Re: [Rd] How to handle INT8 data

2017-01-20 Thread Peter Haverty
For what it is worth, I would be extremely pleased to see R's integer type go to 64-bit. A signed 32-bit integer is just a bit too small to index into the ~3 billion position human genome. The "workarounds" that have arisen for this specific issue are surprisingly complex. Pete
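A small sketch of the limit described above (not part of the original post; the value 3e9 is only the approximate genome length mentioned in the message, and the variable names are illustrative):

```r
# Largest value an R integer (signed 32-bit) can hold.
max_int <- .Machine$integer.max            # 2147483647, i.e. 2^31 - 1

# Approximate human genome length from the post, stored as a double.
genome_length <- 3e9

# The position count does not fit in an integer ...
fits_in_integer <- genome_length <= max_int

# ... and coercing it gives NA (R also warns; suppressed here).
coerced <- suppressWarnings(as.integer(genome_length))

# Doubles hold whole numbers exactly up to 2^53, which is one common
# workaround for positions beyond the integer range.
exact_as_double <- (genome_length + 1) - genome_length == 1
```

Storing positions as doubles avoids the overflow but loses the integer type, which is part of why the workarounds end up more involved than one might expect.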

Re: [Rd] Regression in match() in R 3.3.0 when matching strings with different character encodings

2016-05-09 Thread Peter Haverty
Dear Kirill, You are correct, that is a new bug introduced in PR16491. The appropriate fix and regression tests have been added via PR16885, which has been merged into trunk. I believe that means the fix will be released with R 3.3.1. I checked your example and the second "match" now properly ret

[Rd] RFC: deprecating some complexity in key methods package functions

2015-08-21 Thread Peter Haverty
Dear R community, I have been working on speeding up a few parts of the methods package and related environment/namespace functionality. I would like to get some community input on a few proposed changes that are rather small, but do involve deprecations. These changes would simplify and speed up

Re: [Rd] names function for environments?

2015-01-27 Thread Peter Haverty
something like keySet(). But do we really need this? On Tue, Jan 27, 2015 at 7:11 AM, Martin Maechler wrote: >>>>> Peter Haverty >>>>> on Sun, 25 Jan 2015 12:21:04 -0800 writes:

[Rd] names function for environments?

2015-01-25 Thread Peter Haverty
Hi all, The "ls" function wears two hats. It allows users to inspect an environment interactively and also serves deeper in code as the accessor for an environment's names/keys. I propose that we separate these two conflicting goals, keeping ls for interactive use and adding names for a quick list
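To make the two roles concrete, here is a minimal sketch, assuming R 3.2.0 or later, where names() accepts an environment. The environment and variable names are made up for illustration and do not come from the post:

```r
# A small environment with two ordinary keys and one dot-prefixed key.
e <- new.env()
assign("b", 2, envir = e)
assign("a", 1, envir = e)
assign(".hidden", 3, envir = e)

# Interactive-style listing: sorted, and dot-names are omitted by default.
interactive_view <- ls(e)

# Complete listing through ls(), still sorted.
all_sorted <- ls(e, all.names = TRUE)

# Accessor-style listing: every key, in no guaranteed order.
keys <- names(e)
```

Because names() skips the sorting step, code that only needs the set of keys should not rely on their order.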

Re: [Rd] speedbump in library

2015-01-23 Thread Peter Haverty
; > I can't speak to whether there are any pitfalls in changing the > library path searching, though. > > -Winston > > > On Thu, Jan 22, 2015 at 12:25 PM, Peter Haverty > wrote: >> Hi all, >> >> Profiling turned up a bit of a speedbump in the lib

Re: [Rd] :: and ::: as .Primitives?

2015-01-22 Thread Peter Haverty
seeing good arguments for use in >> ways that would be performance-critical, but I'm happy to be convinced >> otherwise. If there is a need for a faster :: then going to a >> SPECIALSXP is fine; it would also be good to make the byte code >> compiler aware of it, and pos

[Rd] :: and ::: as .Primitives?

2015-01-22 Thread Peter Haverty
Hi all, When S4 methods are defined on a base function (say, "match"), the function becomes a method with the body "base::match(x,y)". A call to such a function often spends more time doing "::" than in the function itself. I always assumed that "::" was a very low-level thing, but it turns out to

[Rd] speedbump in library

2015-01-22 Thread Peter Haverty
Hi all, Profiling turned up a bit of a speedbump in the library function. I submitted a patch to the R bug tracker as bug 16168 and I've also included it below. The alternate code is simpler and easier to read/maintain, I believe. Any thoughts on other ways to write this? Index: src/library/base

Re: [Rd] reducing redundant work in methods package

2015-01-21 Thread Peter Haverty
from the use of elNamed(), given that [[ > now uses exact matching. > > Have you tried patching methods to use .BasicFunsList directly as in > setMethod? > > > On Wed, Jan 21, 2015 at 10:41 AM, Peter Haverty > wrote: > >> Hi all, >> >> The function call

[Rd] reducing redundant work in methods package

2015-01-21 Thread Peter Haverty
Hi all, The function call series genericForPrimitive -> .findBasicFuns -> .findAll happens 4400 times while the GenomicRanges package is loading. Each time .findAll follows a chain of environments to determine that the methods namespace is the only one that holds a variable called .BasicFunsList.

Re: [Rd] default min-v/nsize parameters

2015-01-19 Thread Peter Haverty
Hi All, This is a very important issue. It would be very sad to leave most users unaware of a free speedup of this size. These options don't appear in the R --help output. They really should be added there. Additionally, if the garbage collector is working very hard, might it emit a note about be

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-09 Thread Peter Haverty
Here are some quick measurements of Martin's accomplishment with "get0": In loading the package GenomicRanges, 30K calls to "exists" have been skipped. (However 99K still remain!) Overall, the current usage of "get0" seems to save us 10% in package loading time (no error bars on that measurement)
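For readers unfamiliar with the pattern being replaced, a minimal sketch follows. The helper names lookup_twice and lookup_once are hypothetical and only illustrate the difference; get0() itself is in base R from 3.2.0 onward:

```r
e <- new.env()
assign("a", 42, envir = e)

# Older idiom: the environment is searched twice, once by exists()
# and once more by get().
lookup_twice <- function(name, env) {
  if (exists(name, envir = env, inherits = FALSE)) {
    get(name, envir = env, inherits = FALSE)
  } else {
    NULL
  }
}

# get0() does a single search and returns `ifnotfound` on a miss.
lookup_once <- function(name, env) {
  get0(name, envir = env, inherits = FALSE, ifnotfound = NULL)
}
```

Both helpers return the same values; the second simply avoids the extra call to exists().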

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-09 Thread Peter Haverty
Fantastic. I'm eager to try it out. Thanks for seeing this through. Regards, Pete Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com On Fri, Jan 9, 2015 at 7:37 AM, Martin Maechler wrote: > > Martin Maechler > > on Fri, 9 Jan 2015 14:00:38 +0100 write

Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Peter Haverty
, Ph.D. Genentech, Inc. phave...@gene.com On Thu, Jan 8, 2015 at 2:06 PM, Peter Haverty wrote: > How about unique them both and compare the lengths? It's less work, > especially allocation. > > > > Pete > > > Peter M. Haverty, Ph.D. > Genent

Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Peter Haverty
Try this out. It looks like a 2X speedup for some cases and a wash in others. "unique" does two allocations, but skipping the "> 0L" allocation could make up for it. library(microbenchmark) library(RUnit) x = sample.int(1e4, 1e5, TRUE) y = sample.int(1e4, 1e5, TRUE) set_equal <- function(x, y)

Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Peter Haverty
How about unique them both and compare the lengths? It's less work, especially allocation. Pete Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com On Thu, Jan 8, 2015 at 1:30 PM, peter dalgaard wrote: > If you look at the definition of %in%, you'll find that it i
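One way to read this suggestion is sketched below. This is not the code posted in the thread: the name set_equal_sketch is hypothetical, and the match() step is an added assumption, since equal numbers of unique values alone do not prove that two sets are equal:

```r
# Sketch: deduplicate both inputs, compare the counts, and only then
# check that every unique value of x also occurs in y.
set_equal_sketch <- function(x, y) {
  ux <- unique(as.vector(x))
  uy <- unique(as.vector(y))
  # With equal counts, "ux is contained in uy" implies the sets are equal,
  # so a single match() call is enough.
  length(ux) == length(uy) && !anyNA(match(ux, uy))
}
```

The length comparison is cheap and lets unequal inputs return FALSE without calling match() at all.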

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-08 Thread Peter Haverty
Michael's idea has an interesting bonus that he and I discussed earlier. It would be very convenient to have a container of key/value pairs. I imagine many people often write this: x <- mapply( names(x), x, FUN=function(k,v) { # work with key and value } especially ex-Perl people accustomed to w

Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

2015-01-08 Thread Peter Haverty
For what it's worth, I think we would need a new function if the default behavior changes. Since we already have "get" and "mget", maybe "cget" for "conditional get"? "if get", "safe get", ... I like the idea of keeping the original "not found" behavior if the "if.not.found" arg is missing. Howe

Re: [Rd] we need an exists/get hybrid

2014-12-03 Thread Peter Haverty
1.2015 exists("a", e, inherits = FALSE) > #> 2 1.0545 exists("a", envir = e, inherits = FALSE) > #> 3 0.3615 .Internal(exists("a", e, "any", FALSE)) > #> 4 7.6345 "a" %in% ls(e, all.names = TRUE) > #&

[Rd] we need an exists/get hybrid

2014-12-03 Thread Peter Haverty
Hi All, I've been looking into speeding up the loading of packages that use a lot of S4. After profiling I noticed the "exists" function accounts for a surprising fraction of the time. I have some thoughts about speeding up exists (below). More to the point of this post, Martin Mächler noted tha