Re: [Rd] xftrm is more than 100x slower for AsIs than for character vectors

2024-07-14 Thread Ivan Krylov via R-devel
On Fri, 12 Jul 2024 17:35:19 +0200
Hilmar Berger via R-devel wrote:

> This can be finally traced to base::rank() (called from
> xtfrm.default), where I found that
> 
> "NB: rank is not itself generic but xtfrm is, and rank(xtfrm(x), )
> will have the desired result if there is a xtfrm method. Otherwise,
> rank will make use of ==, >, is.na and extraction methods for classed
> objects, possibly rather slowly. "

The problem is indeed that the vector reaches base::rank in both cases,
but since it has a class, the function has to construct and evaluate a
call to .gt every time it wants to compare two elements.
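
A minimal way to see the effect, as a sketch only: the vector size and
contents below are arbitrary assumptions, and the measured ratio will
depend on the machine and the R version. Both calls should return the
same ranks; only the elapsed time should differ.

```r
# Compare xtfrm() on a plain character vector and on the same data
# wrapped in I(). n is an arbitrary size chosen for illustration.
set.seed(1)
n  <- 5000L
x  <- as.character(sample.int(n))  # plain character vector
xi <- I(x)                         # same data, class "AsIs"

t_plain <- system.time(r_plain <- xtfrm(x))[["elapsed"]]
t_asis  <- system.time(r_asis  <- xtfrm(xi))[["elapsed"]]

# The ranks must agree; the class only changes how they are computed.
stopifnot(all(r_plain == r_asis))
cat("plain:", t_plain, "s; AsIs:", t_asis, "s\n")
```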

xtfrm.AsIs even tries to remove the 'AsIs' class before continuing the
method dispatch process:

    if (length(cl <- class(x)) > 1) oldClass(x) <- cl[-1L]

It doesn't work in the (very contrived) case when 'AsIs' is not the
first class and it doesn't remove 'AsIs' as the only class (making
static int equal(...) take the slower branch). What's going to break if
we allow removing the class attribute altogether? This seems to speed
up xtfrm(I(x)) and survive LC_ALL=C.UTF-8 make check-devel:

Index: src/library/base/R/sort.R
===================================================================
--- src/library/base/R/sort.R   (revision 86895)
+++ src/library/base/R/sort.R   (working copy)
@@ -297,7 +297,8 @@
 
 xtfrm.AsIs <- function(x)
 {
-    if(length(cl <- class(x)) > 1) oldClass(x) <- cl[-1L]
+    cl <- oldClass(x)
+    oldClass(x) <- cl[cl != 'AsIs']
     NextMethod("xtfrm")
 }
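
To illustrate what the two variants do to the class attribute, here is a
small sketch. The helpers strip_old() and strip_new() are hypothetical
names introduced only for this example; they copy the class handling
from the old and the patched xtfrm.AsIs without the NextMethod() call.

```r
# Hypothetical helpers: class handling only, no method dispatch.
strip_old <- function(x) {
    if (length(cl <- class(x)) > 1) oldClass(x) <- cl[-1L]
    x
}
strip_new <- function(x) {
    cl <- oldClass(x)
    oldClass(x) <- cl[cl != "AsIs"]
    x
}

x <- I(c("b", "a"))                                     # 'AsIs' is the only class
y <- structure(c("b", "a"), class = c("foo", "AsIs"))   # contrived ordering

oldClass(strip_old(x))  # 'AsIs' is kept, so rank() sees a classed object
oldClass(strip_new(x))  # class attribute removed altogether
oldClass(strip_old(y))  # drops 'foo' and keeps 'AsIs'
oldClass(strip_new(y))  # drops 'AsIs' and keeps 'foo'
```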
 

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] xftrm is more than 100x slower for AsIs than for character vectors

2024-07-14 Thread HB via R-devel
Dear Ivan, 

thanks for the confirmation and the proposed patch. 

I just wanted to add some notes on why this matters: base::merge with
by.x=0 or by.y=0 (i.e. matching on row.names) automatically adds a column
Row.names, set to I(row.names(x)), to the corresponding input table (I() has
been used since revision 39026 to avoid converting character to factor).
When this column is used for sorting (sort=TRUE is the default in merge; this
should happen at least if all.x=TRUE or all.y=TRUE), execution becomes slower.
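
A small sketch of how the AsIs column arises; the data frames a and b
and their column names are made up for illustration.

```r
# Two toy data frames that share row names but store them in a different order.
a <- data.frame(u = 1:3, row.names = c("r1", "r2", "r3"))
b <- data.frame(v = 4:6, row.names = c("r3", "r1", "r2"))

# Merging on row names (by = 0) adds a Row.names column built with I().
m <- merge(a, b, by = 0, all.x = TRUE)

class(m$Row.names)  # the column carries class "AsIs"
m                   # rows come back sorted by Row.names (sort = TRUE)
```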

xtfrm.AsIs has been unchanged since it was added in r50992 (likely unrelated
to the former).

So I guess this simply went unnoticed, since it does not cause problems on
small data frames.

Best regards

Hilmar
 


[Rd] R-patched on CRAN is R-4.3.3

2024-07-14 Thread Peter Langfelder
Hi all,

apologies if I missed something here. Just downloaded and compiled
R-patched from https://stat.ethz.ch/R/daily/ but it reports as R-4.3.3
(2024-04-09 r86895) -- "Angel Food Cake". The last dated R-patched is
from 2024-04-09, about 3 months old. Is R-patched no longer updated,
or am I looking at the wrong directory or even the wrong server? The
current R Installation and Administration manual
(https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Getting-patched-and-development-versions)
suggests that the current R-patched should be where I looked for it:

A patched version of the current release, ‘r-patched’, and the current
development version, ‘r-devel’, are available as daily tarballs and
via access to the R Subversion repository. (For the two weeks prior to
the release of a minor (4.x.0) version, ‘r-patched’ tarballs may refer
to beta/release candidates of the upcoming release, the patched
version of the current release being available via Subversion.)

The tarballs are available from https://stat.ethz.ch/R/daily/.
Download R-patched.tar.gz or R-devel.tar.gz (or the .tar.bz2 versions)
and unpack as described in the previous section. They are built in
exactly the same way as distributions of R releases.

Thanks,

Peter
