t in the operator priority? E.g. how is
>
> a ! b > c
>
> parsed?
As (a ! b) > c.
Their precedence is between that of + and - and that of < and >.
So "x" ! 1+2 evalates to "x3" and "x" ! 1+2 < "x4" is TRUE.
(Actually, pqR also has a .. operator that fixes the problems with
generating sequences with the : operator, and it has precedence lower
than + and - and higher than ! and !!, but that's not relevant if you
don't have the .. operator.)
Radford Neal
going on by having a different operator
for string concatenation.
Plus, ! and !! seem natural for representing paste0 and paste, whereas
using ++ for paste (with + for paste0) would look rather strange.
Radford Neal
d even aside from whatever LAPACK is doing.
Regards,
Radford Neal
nt to this discussion
is that the "merge" method is substantially faster than "shell" for
these character vectors with a million elements, while retaining the
stability and collation properties of "shell" (whereas "radix" only
does C collation).
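For reference, the two methods base R currently offers for character
vectors can be compared with something like the following (an
illustrative sketch only; the "merge" method discussed above exists
only in pqR, so it is not shown):

    set.seed(1)
    x <- format(runif(1e6))                             # a million character strings
    system.time(s1 <- sort.int(x, method = "shell"))    # stable, locale collation
    system.time(s2 <- sort.int(x, method = "radix"))    # faster, but C collation only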
It would probably not be too hard to take the merge sort code from pqR
and use it in R core's implementation.
The merge sort in pqR doesn't exploit parallelism at the moment, but
merge sort is potentially quite parallelizable (though I think the
storage allocation strategy I use would have to be modified).
Regards,
Radford Neal
))
   user  system elapsed
  0.041   0.000   0.041
Some of this may be due to pqR's faster garbage collector - R Core
implementations have a particular GC problem with strings, as explained at
https://radfordneal.wordpress.com/2018/11/29/faster-garbage-col
sage from what is produced for any other list, while still retaining
the massive slowdown.
There is no need for you to write $.data.frame in C. You just need
to delete the version written in R.
Radford Neal
ke df$xyz by a factor of
seven achieves now is to add the words "in data frame" to the warning
message (while making the earlier part of the message less intelligible).
I think you might want to just delete the definition of $.data.frame,
reverting to the situation before R-3.1.0.
Radford Neal
me row name to all
the elements would be rather strange...).
After v <- a[1,_], the program may well have an expression like v[nc]
where nc is a column name. We want this to still work if there
happens to be only one column. That will happen only if a[1,_]
attaches a column name, not a row name, when a has only one column.
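For comparison, a sketch of what base R does here, using ordinary
a[1, ] indexing since the `_` subscript is pqR-only (the naming
behaviour noted in the comments is current base R's, as I understand it):

    a <- matrix(1:2, nrow = 1, dimnames = list("r1", c("c1", "c2")))
    v <- a[1, ]                      # two columns: v keeps the column names
    v["c1"]                          # works, gives 1

    a1 <- a[, "c1", drop = FALSE]    # now a matrix with a single column
    v1 <- a1[1, ]                    # 1x1 case: both dims drop, and since row
                                     # and column names are both present the
                                     # result gets no names at all
    v1["c1"]                         # NA - lookup by column name silently fails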
Radford Neal
-rs-design-flaws-in-a-new-version-of-pqr/
That was written when most of these features were introduced,
though getting your specific example right relies on another
change introduced in the most recent version.
Radford Neal
posted
a fast hashing algorithm that produces the same results as the simple
algorithm, here:
https://stat.ethz.ch/pipermail/r-devel/2017-October/075012.html
The latest version of this will be in the soon-to-be-released new
version of pqR, and will of course be enabled automatically whenever
it seems desirable,
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> set.seed(123)
> m <- (2/5)*2^32
> m > 2^31
[1] FALSE
> x <- sample(m, 1000000, replace = TRUE)
> table(x %% 2)
     0      1
499412 500588
So I dou
eeds up
existing code than to implement more and more special-case functions
like anyNA or some special function to allow length(unclass(x)) to be
done quickly.
The variant result mechanism has extremely low overhead, and is not
hard to implement.
Radford Neal
t's a bit
silly to plot the distributions of times, which will mostly reflect
variations in when garbage collections at various levels occur - just
the mean is what is relevant.
Regards,
Radford Neal
t number
in the sequence, but rather (say) the 10th, might be preferable.
Radford Neal
> > seeds = c(86548915L, 86551615L, 86566163L, 86577411L, 86584144L,
> 86584272L,
> + 86620568L, 86724613L, 86756002L, 86768593L, 86772411L, 86781516L,
> + 86794389L, 86805854L, 86814600L, 8
is does work as advertised.
Not always. As I reported on bugzilla three years ago (PR#15878), it
only works if the logical argument does not have to be copied. The
bug has been fixed in pqR since pqR-2014-09-30.
Radford Neal
until after search for entry n-i */
}
VMAXSET(vmax);
}
return r;
}
This will be in a new version of pqR that will have many other performance
improvements as well, which I expect to release in a few weeks.
Radford Neal
iciently handling things
like x[-1] and x[2:length(x)], so I added the x[v] test to see what
performance is like when this special handling isn't invoked.
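A rough sketch of that comparison (illustrative only; v here is just
the same index sequence precomputed into a variable, which is what
defeats any special handling of the 2:length(x) idiom):

    x <- rnorm(1e7)
    v <- 2:length(x)
    system.time(for (i in 1:20) y <- x[-1])
    system.time(for (i in 1:20) y <- x[2:length(x)])
    system.time(for (i in 1:20) y <- x[v])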
There's no particular reason pqR's code for these operations couldn't
be adapted for use in the R Core implementation, though there are
probably a few issues involving large vectors, and the special
handling of x[2:length(x)] would require implementing pqR's internal
"variant result" mechanism. pqR also has much faster code for some
other subset and subset assignment operations.
Radford Neal
I, I'd suggest that you
produce clear and complete documentation on the new scheme.
Radford Neal
ON RELEASED 2015-09-14. Some are fixed
in R-3.4.0, but most remain.
Radford Neal
butes on symbols are not saved when the workspace
is saved with q("yes").
Radford Neal
on-style
literals would do is allow you to put big blocks of literal text in
your program, without having to put quotes around each line. But
shouldn't such text really be stored in a separate file that gets
read, rather than in the program source?
Radford Neal
> :
> I don't think it is reasonable to change the parser this way. This is
> currently valid R code:
>
> a <- "foo"
> "bar"
>
> and with the new syntax, it is also valid, but with a different
> meaning.
Yes. The meaning of that would certainly need to stay the same.
However, the following i
for a new unary @ operator. I'm not necessarily advocating
that particular use (my ideas in this respect are still undergoing
revisions), but the overall point is that there may well be several
good uses of a unary @ operator (and there aren't many other good
characters to use for a unary op
ce improvements are implemented using pqR's "variant
result" mechanism, which also allows many other optimizations. See
https://radfordneal.wordpress.com/2013/06/30/how-pqr-makes-programs-faster-by-not-doing-things/
for some explanation. There is no particular reason this mecha
> From: "Cohn, Robert S"
>
> I am using R to multiply some large (30k x 30k double) matrices on a
> 64 core machine (xeon phi). I added some timers to
> src/main/array.c to see where the time is going. All of the time is
> being spent in the matprod function, most of that time is spent in
> d
would justify imposing this burden on users. A language
specification shouldn't really be changing all the time for no
particular reason.
Radford Neal
sample(x) when x is numeric of length 1,
There's a difference between these two. Giving an error when using a
1x1 matrix as a scalar may detect some programming bugs, but not
giving an error doesn't introduce a bug. Whereas sample(2:n) behaving
differently when n is 2 than when n i
.
Regarding the 0-length vector issue, I agree with other posters that
after a<-numeric(0), it has to be allowable to write a<1. To not
allow this would be highly destructive of code reliability. And for
similar reason, after a<-c(), a<1 needs to be allowed, wh
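For reference, what current R gives in these cases:

    a <- numeric(0)
    a < 1            # logical(0) -- allowed, and much vectorized code relies on it
    a <- c()         # NULL
    a < 1            # also logical(0) at present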
> maybe there's a reason for it, but the discrepancy between the
> handling of `[` and `$` in deparsing seems odd to me:
>
> > substitute(a[1], list(a = quote(x * y)))
> x * y[1]
>
> > substitute(a$b, list(a = quote(x * y)))
> (x * y)$b
>
> The former is still executed in the right order (`*` firs
to anyone trying to understand the code.
I have tested the new parser on the 5018 packages in the pqR repository,
but of course it's possible that problems might show up in some other
CRAN packages. I'm willing to help in resolving any such problems as
we
One thing I forgot in my previous message about problems with
psignal.c on Rtools for Windows...
One also needs to change src/gnuwin32/fixed/h/psignal.h
At a minimum, one needs the following changes:
@@ -122,8 +129,8 @@ typedef struct
/* Prototype stuff
= b-1.0; printf("M: %.1e\n",d);
return 0;
}
At least on a 32-bit Windows 7 system, the output of "d" is 1e-17 for
all threads except thread 1, for which it is 0. That's the new thread
for which the __asm__("fninit") wasn
> On Fri, Aug 21, 2015 at 8:38 PM, Radford Neal wrote:
>
> > The reason for this appears to be that the omp.h file included is
> > appropriate only for the 32-bit version of the openMP library. As
> > far as I can see, there is no 64-bit version of this include file
&g
R (branch 44 at the moment). R Core might want to take a
look there, or at the next version of pqR, which will be released
soon.
Radford Neal
from outside R), and hence avoid
most of the extra procedure call overhead, but I haven't attempted this.
Radford Neal
ally?) avoid the problem by not
actually using mingw's sigset_t type, but instead always using int.
But it's possible that there is some reason to want to use the type
defined in sys/types.h.
Radford Neal
pplications may not do anything for
which the incorrect omp.h include file makes a difference.
Radford Neal
o-do list for pqR is to re-implement R's
garbage collector in a way that will avoid this (as well as having
various other advantages, including less memory overhead per object).
Radford Neal
On Tue, Jul 14, 2015 at 07:52:56PM -0400, Duncan Murdoch wrote:
> On 14/07/2015 6:08 PM, Radford Neal wrote:
> > In testing pqR on Solaris SPARC systems, I have found two bugs that
> > are also present in recent R Core versions. You can see the bugs and
> > fixes
error, depending on
details of the compiler. It showed up with gcc 4.9.2 on a SPARC
system. The fix slightly changes the error behaviour, signaling an
error on inappropriate reads before reading any data, rather than
after reading one (but not all) items as before.
Radford Neal
a dim attribute.
Allowing a dim attribute of length zero (presently disallowed) might
also be a good idea, because it could be used to explicitly mark a
single number as a scalar, not a vector of length one.
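For reference, what base R currently does (illustrative; the exact
error message may vary by version):

    x <- 3.14
    try(dim(x) <- integer(0))   # currently an error; under the proposal this
                                # would mark x as a true scalar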
Radford Neal
concatenation.
> Yes, even Fortran has one, and in C, I can simply write "literal1"
> "literal2" and they'll be concatenated. It is only for literals, but
> still very useful.
Concatenation of literal strings could eas
ed as a one-dimensional array, which
would be part of a solution to several other problems as well, as I
propose at http://www.cs.utoronto.ca/~radford/ftp/R-lang-ext.pdf
Radford Neal
vel within about an hour of finding
what caused it. (I'd noticed the symptoms a few days before, but
hadn't isolated the cause.)
Radford Neal
-tests-2.R:
x <- c("a", NA, "b")
factor(x)
factor(x, exclude="")
The solution (kludgy, but the whole concept is kludgy) is to forward
R_print.na_string and R_print.na_string_noquote with the other "roots"
in RunGenCollect (after the comment
ods for it are NOT required to
convert matrices to vectors.
So you're advocating slowing down all ordinary uses of diag to
accommodate a usage that nobody thought was important enough to
actually document.
Radford Neal
however (where help isn't needed,
since there's nothing wrong with the mod in my previous message).
Radford Neal
be fixed as well.) There are also other optimizations in pqR
for these functions but the code is still quite similar to R-3.1.2.
Radford Neal
ifying the R code that was used for the previous
function call. Nothing in the documentation for lapply justifies
doing this. It is certainly not desired behaviour,
except in so far as it allows a slight savings in time (which is
minor, given the t
Radford Neal:
> > there's a danger of getting carried away and essentially rewriting
> > malloc. To avoid this, one might try just calling "free" on the
> > no-longer-needed object, letting "malloc" then figure out when it can
> > be re-use
lace for statements such as w = w * Q, but not
currently when the LHS variable does not appear on the RHS.
Regards,
Radford Neal
brackets).
I've considered changing how this works in pqR, using pqR's "variant
return" mechanism to pass on the information that an inner statement
should be debugged. This would also allow one to avoid entering the
debugger for "if" when it is used as an expression rat
> so I guess you're looking at a modified version of the function... The
> implementation detail is in the comment -- FUN(X[i], ...) is evaluated in the
> frame of lapply.
>
> Martin Morgan
You may find it interesting that in the latest version of pqR, lapply is made
faster in the case where no
> The above question still stands, but otherwise, I overlooked the most
> obvious solution:
>
> dim_1 <- function(x) dim(x)
>
> Unit: nanoseconds
>   expr min lq mean median uq max neval cld
> dim(x) 0 172.941 1 12696 1000
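The timings quoted above look like microbenchmark output; a sketch of
the kind of comparison being made (the definition of x is an assumption
here, and dim_1 is the wrapper defined above):

    library(microbenchmark)
    x <- matrix(0, 3, 4)
    dim_1 <- function(x) dim(x)
    microbenchmark(dim(x), dim_1(x), times = 1000)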
. (Similarly, there are ScalarRealMaybeShared, etc.)
Radford Neal
/* <<--- added in pqR */
#define ENCODE_LEVELS(v) ((v) << 12)
#define DECODE_LEVELS(v) ((v) >> 12)
#define DECODE_TYPE(v) ((v) & 255)
Please let me know if you see any problem with this, or if for some
reason you'd prefer that I use o
a logical matrix with FALSE off the
diagonal does not strike me as too odd. It makes sense if you think
of (FALSE, TRUE) as forming a field under operations of sum=XOR and
product=AND. And of course it will typically be converted to a 0/1
matrix later if th
to more than one variable, for instance), but
code like that above for .Call("myfunction",x) would be guaranteed to
work.
Radford Neal
e moment, but would be if code that allocates more
memory is added later, so let's be safe and do it now too". Otherwise,
the reader may infer an incorrect model of when to PROTECT, such as
"you should PROTECT every allocated object, then UNPROTECT at the end of
the procedure - do
t, the code isn't
safe because C doesn't specify the order of evaluation of arguments,
so mkans(x) might be called before install("x").
One should also note that the PROTECT within mkans is unnecessary,
and must surely be confusing to anyone who thought (correctly)
that they had u
> Thank you for the wonderful suggestion . do you suggest to protect
> ScalarInteger(1)
> before calling lang3 ?
Yes, either the result of ScalarInteger(1) or the result of install
should be protected (before calling lang3 - don't try to do it inside
the argument list of lang3).
re this is the source of the actual problem, since both
"data.frame" and "head" presumably already exist, so the install won't
actually allocate memory. But this is not a safe coding method in
general.
Radford Neal
e latter works in some cases where the former gives an error.
So it's a bug.
Furthermore, crossprod(x,y) ought to be equivalent to t(x) %*% y,
even in cases where a vector needs to get promoted to a matrix,
because such cases can arise from inadvertent dropping of dimensions
when subscripting a m
you already have so
> they are easier to fix?
I did report at least one bug not long ago (which got fixed), after
seeing (as now) that an R Core release was imminent, and therefore
thinking it would be best if a fix was put in before it went out.
es to find where I made changes.
If your complaint is that your time is more valuable than my time, so
I should be producing a nice patch for you, then I don't agree. I went
to the effort of verifying that the bugs are still present in r66002.
That's as much as I'm willing to do.
ow, only the second warning message is produced.
o A bug has been fixed in which rbind would not handle non-vector
objects such as function closures, whereas cbind did handle them,
and both were documented to do so.
Regards,
Radford Neal
,s) rather than having
to arduously type seq(a,b,by=s)? Maybe we should have glm_binomial,
glm_poisson, etc. so we don't have to remember the "family" argument?
This way lies madness...
Radford Neal
EED environment variable is set, the random
seed is initialized to its value, rather than from the time and
process id. This was motivated by exactly this problem - I can now
just set R_SEED to something before running all the package checks.
Radford Neal
On Wed, Nov 06, 2013 at 02:40:59PM -0300, George Vega Yon wrote:
> Hi! You are right, what I actually use is SET_LENGTH... Is that ok?
> On 06/11/2013 at 14:00, "Radford Neal" wrote:
Is SET_LENGTH a documented feature of the API? Not that I can see.
However, it is indee
nd getting the length right to start with.
Radford Neal
> > ..., my previous
> > dev-lang/R-2.10.1 ebuild package has been upgraded to R-3.0.1.
> > While compiling it, I have got the following compiler warning:
> >
> > * QA Notice: Package triggers severe warnings which indicate that it
> > *may exhibit random runtime failures.
> > * m
bable complete loss of accuracy in modulus
I think issuing a warning for this is probably not a good idea, but if
a warning is issued, it certainly shouldn't be this one.
Radford Neal
I have released a new, faster, version of R, which I call pqR (for
"pretty quick" R), based on R-2.15.0. Among many other improvements,
pqR supports automatic use of multiple cores to perform numerical
computations in parallel with other numerical computations, and with
the interpretive thread. I
e to protect
the result of eval (only if you will be doing further allocations
while still using sx).
Radford Neal
Several bugs are present in R-2.15.3 and R-alpha due to
naive copying of list elements.
The bug below is due to naive copying in subset.c (with
similar bugs for matrices and arrays):
a <- list(c(1,2), c(3,4), c(5,6))
b <- a[2:3]
a[[2]][2] <- 9
print(b[[1]][2])   # should print 4; with the naive copying it prints 9
Naive copying in mapply.c leads to the foll
Is this a SPARC system? On at least some SPARC systems, the "long double"
type in C is implemented very slowly in software, and it seems that it is
used for the sums done when calculating standard deviations with "sd".
Radford Neal
> Date: Wed, 8 Aug 2012 18:55:3
The Rprofmem facility is currently enabled only if the configuration
option --enable-memory-profiling is used. However, the overhead of
having it enabled is negligible when profiling is not actually being
done, and can easily be made even smaller. So I think it ought to be
enabled all the time.
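For anyone who hasn't used it, a sketch of how the facility is driven
from R (it records something only in builds configured with
--enable-memory-profiling):

    Rprofmem("memprof.out", threshold = 1000)   # log allocations of 1000 bytes or more
    x <- numeric(1e6)
    y <- x + 1
    Rprofmem(NULL)                              # turn profiling off
    readLines("memprof.out", n = 5)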
> But the whole point of separating VECTOR_SEXPREC from the other
> SEXPRECs is that they are _shorter_. A vecsxp is only going to be
> larger than (say) an envsxp if 2 R_len_t's are more than 3 pointers,
> which is quite unlikely since R_len_t variables holds things that
> one might add to pointer
There seems to be a latent flaw in the definition of struct SEXPREC
in Rinternals.h, which likely doesn't cause problems now, but could
if the relative sizes of data types change.
The SEXPREC structure contains a union that includes a primsxp,
symsxp, etc, but not a vecsxp. However, in allocVect
You'll find that the time printed after b:, d:, and g: is near zero,
but that there is non-negligible time for f:. This is because sqrt
is primitive but t is not, so the modification to A after the call
t(A) requires that a copy be made.
Radford Neal
be of
substantial benefit to the R community. The R core team is welcome to
incorporate them into versions of R that they release in the future.
Radford Neal
e an error.
Radford Neal
enable-checking=release --build=i486-linux-gnu
--host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu4)
This R-2.12.0 installation generally works (based on testing a few
trivial things). The same error occurs
I wonder what is the history of "seq" and "seq.int"?
>From "help(seq)", one reads that "'seq.int' is an internal generic
which can be much faster but has a few restrictions". And indeed,
"seq.int(1,99,by=2)" is over 40 times faster than "seq(1,99,by=2)" in
a quick test I just did. This is not
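Something along these lines reproduces that comparison (illustrative
timing sketch only):

    system.time(for (i in 1:100000) seq(1, 99, by = 2))
    system.time(for (i in 1:100000) seq.int(1, 99, by = 2))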
Luke - Thanks for your comments on the speed patches I submitted.
I'm glad you like patch-transpose, patch-for, patch-evalList, and
patch-vec-arith. I'll be interested to hear what you or other people
think about patch-dollar and patch-vec-subset after you've had more
time to consider them. (Re
I see that some of the speed patches that I posted have been
incorporated into the current development version (eg, my patch-for,
patch-evalList, and patch-vec-arith).
My patch for speeding up x^2 has been addressed in an inadvisable way,
however. This was a simple addition of four lines of code
I found a bug in one of the fourteen speed patches I posted, namely in
patch-vec-subset. I've fixed this (I now see one does need to
duplicate index vectors sometimes, though one can avoid it most of the
time). I also split this patch in two, since it really has two
different and independent parts
though,
including information on how much they speed up typical programs, on
various machines.
Radford Neal
---
These patches to the R source for improving speed were written by
Radford M. Neal, Sept. 2010.
See the README file for
> I've appended below the new version of the modified part of the
> do_transpose function in src/main/array.c.
A quick correction... A "break" went missing from the case INTSXP:
section of the code I posted. Corrected version below
nd divides do not
take constant time, but are faster when, for instance, dividing by 1.)
I've appended below the new version of the modified part of the
do_transpose function in src/main/array.c.
Radford Neal
--
ed here only because the
procedures for getting values from variables check if NAMED is 0, and
if so fix it up to being 1, which is the minimum that it ought to be
for a value that's stored in a variable.
Is my understanding of this correct? Or have I missed something?
Radford Neal
> On Aug 23, 2010, at 7:39 PM, Radford Neal wrote:
> > In particular, all matrix x vector and vector x matrix products will
> > in this version be done in the matprod routine, not the Fortran routine.
> > And this is the right thing to do, since the time for the ISNAN che
ally know how the bug report system is supposed to work.
Radford Neal
idn't change any R configuration options regarding compilation.
Below is my modified version of the matprod function in src/main/array.c.
Radford Neal
static void matprod(double *x, int nrx, int ncx,
            if (!ISNAN(x[i]))
                s += x[i];
        }
    } else {
        for (i = 0; i < n; i++)
            s += x[i];
        if (n > 0) updated = TRUE;
    }
    *value = s;
    return(updated);
}
An entirely analogous improvement can be made to the "prod" function.
Radfo
sources via
subversion, but while I figure that out, and how to post up "diffs"
for changes, I'll put the revised evalList code below for anyone
interested...
Radford Neal
--
/* Used in eval and applyMethod (object
grams that aren't dominated by large
operations like multiplying big matrices by about 10%. The effect is
cumulative with the change I mentioned in my previous message about
avoiding extra CONS operations in evalList, for a total speedup of
about 15%.
Radford Neal
pect that an average improvement of maybe 15% is
possible, with some programs probably speeding up by a factor of two.
For now, though, I'll just give the revised versions of evalList and
evalListKeepMissing, below.
Radford Neal
--