[Rd] Regression in match() in R 3.3.0 when matching strings with different character encodings

2016-05-09 Thread Kirill Müller

Hi


I think the following behavior is a regression from R 3.2.5:

> match(iconv(c("\u00f8", "A"), from = "UTF8", to = "latin1"), "\u00f8")
[1]  1 NA
> match(iconv(c("\u00f8"), from = "UTF8", to = "latin1"), "\u00f8")
[1] NA
> match(iconv(c("\u00f8"), from = "UTF8", to = "latin1"), "\u00f8",
        incomparables = NA)
[1] 1

I'm seeing this in R 3.3.0 on both Windows and Ubuntu 15.10.
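
A possible stopgap on an affected build, sketched under the assumption that the problem only arises when the two arguments carry different declared encodings: convert both sides to UTF-8 with enc2utf8() before calling match(). This is an illustration only, not the fix that was applied to R; the variable names are made up for the example.

```r
# Build a latin1-encoded copy of the character U+00F8, as in the report.
x <- iconv("\u00f8", from = "UTF-8", to = "latin1")
tab <- "\u00f8"

# Normalise both sides to UTF-8 so that match() compares like with like.
idx <- match(enc2utf8(x), enc2utf8(tab))
idx
```

Because both strings then hold identical bytes with the same encoding mark, the lookup should not depend on how match() handles mixed encodings.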

The specific behavior makes me think this is related to the following 
NEWS entry:


match(x, table) is faster (sometimes by an order of magnitude) when x is 
of length one and incomparables is unchanged (PR#16491).



Best regards

Kirill

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Regression in match() in R 3.3.0 when matching strings with different character encodings

2016-05-09 Thread Peter Haverty
Dear Kirill,

You are correct, that is a new bug introduced in PR16491. The appropriate
fix and regression tests have been added via PR16885, which has been merged
into trunk. I believe that means the fix will be released with R 3.3.1.

I checked your example and the second "match" now properly returns 1 with
the patched code.

Please have a look at
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16885
http://developer.r-project.org/blosxom.cgi/R-devel/NEWS

Thank you for your report. I hope the benefits of this speedup will
eventually outweigh this unfortunate bug in my PR16491.

Regards,

Pete


Peter M. Haverty, Ph.D.



Re: [Rd] R process killed when allocating too large matrix (Mac OS X)

2016-05-09 Thread Jeroen Ooms
On 05/05/2016 10:11, Uwe Ligges wrote:
> Actually this also happens under Linux and I had my R processes killed
> more than once (and much worse also other processes so that we had to
> reboot a server, essentially).

I found that setting RLIMIT_AS [1] works very well on Linux. But this
requires that you cap memory to some fixed value.

> library(RAppArmor)
> rlimit_as(1e9)
> rnorm(1e9)
Error: cannot allocate vector of size 7.5 Gb
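
The size in that error message can be checked by hand. A small sketch, assuming 8 bytes per double and that the value given to rlimit_as() is in bytes (as RLIMIT_AS is described in getrlimit(2)); the variable names are only for illustration:

```r
n <- 1e9                 # number of doubles requested by rnorm(1e9)
bytes <- n * 8           # a double occupies 8 bytes
gib <- bytes / 1024^3    # size in units of 2^30 bytes, which R reports as "Gb"
limit_bytes <- 1e9       # the cap passed to rlimit_as() above

round(gib, 1)            # the figure shown in the error message
bytes > limit_bytes      # TRUE: the request exceeds the cap, so it fails cleanly
```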

The RAppArmor package has many other utilities to protect your server
from a misbehaving process, such as limits on CPU time (RLIMIT_CPU),
on the number of processes, to guard against fork bombs (RLIMIT_NPROC),
and on file sizes (RLIMIT_FSIZE).

[1] http://linux.die.net/man/2/getrlimit



Re: [Rd] Regression in match() in R 3.3.0 when matching strings with different character encodings

2016-05-09 Thread Martin Maechler
> Peter Haverty 
> on Mon, 9 May 2016 09:47:48 -0700 writes:

> Dear Kirill,
> You are correct, that is a new bug introduced in PR16491. The appropriate
> fix and regression tests have been added via PR16885, which has been merged
> into trunk. I believe that means the fix will be released with R 3.3.1.

Yes, definitely.
Kirill, as you seem to use code which does trigger the bug, you may want to
switch to using 'R-patched', i.e., 

  > R.version.string
  [1] "R version 3.3.0 Patched (2016-05-09 r70591)"

   ( where the subversion revision must be >= 70591 )
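
The revision can also be checked from code. A minimal sketch, assuming the threshold 70591 quoted above; the names min_rev, svn_rev and has_rev are made up for the example. The revision number alone does not say which branch a build came from, so it is best read together with R.version.string.

```r
# Minimum Subversion revision quoted in the message above.
min_rev <- 70591L

# R.version[["svn rev"]] is a character string; it may be "unknown" for
# builds made outside a Subversion checkout, hence the NA handling.
svn_rev <- suppressWarnings(as.integer(R.version[["svn rev"]]))
has_rev <- !is.na(svn_rev) && svn_rev >= min_rev

R.version.string
has_rev
```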

> I checked your example and the second "match" now properly returns 1 with
> the patched code.

> Please have a look at
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16885
> http://developer.r-project.org/blosxom.cgi/R-devel/NEWS

> Thank you for your report. I hope the benefits of this speedup will
> eventually outweigh this unfortunate bug in my PR16491.

I'm pretty sure that your hope will be fulfilled.

> Regards,
> Pete
> 
> Peter M. Haverty, Ph.D.

Martin Maechler, ETH Zurich



Re: [Rd] Is it possible to increase MAX_NUM_DLLS in future R releases?

2016-05-09 Thread Martin Maechler
> Qin Zhu 
> on Fri, 6 May 2016 11:33:37 -0400 writes:

> Thanks for all your great answers. 
> The app I’m working on is indeed an exploratory data analysis tool for
> gene expression, which requires a bunch of Bioconductor packages.

> I guess for now, my best solution is to divide my app into modules and
> load/unload packages as the user switches from one module to another.

> This brought me another question: it seems that unloading a package with
> the detach/unloadNamespace function does not unload the DLLs, or in the
> case of the "SCDE" package, not all dependent DLLs:

>> length(getLoadedDLLs())
> [1] 9
>> requireNamespace("scde")
> Loading required namespace: scde
>> length(getLoadedDLLs())
> [1] 34
>> unloadNamespace("scde")
> now dyn.unload("/Library/Frameworks/R.framework/Versions/3.3/Resources/library/scde/libs/scde.so") ...
>> length(getLoadedDLLs())
> [1] 33

> Does that mean I should use dyn.unload to unload whatever I think is
> associated with that package when the user’s done using it? I’m a little
> nervous about this because this seems to be OS dependent and previous
> versions of my app are running on both Windows and Macs.

Hmm, I thought that  dyn.unload() would typically work on all
platforms, but did not research the question now, and am happy
to learn more by being corrected.
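
To see which DLLs stay behind after a namespace is unloaded, one can compare getLoadedDLLs() before and after. The following is only a sketch: dlls_left_behind() is a hypothetical helper, not part of R, and it only reports the leftovers rather than calling dyn.unload() on them, since whether that is safe depends on what else still uses the DLL.

```r
# Hypothetical helper: load a namespace, unload it again, and report
# which DLLs it brought in and which of those are still loaded afterwards.
dlls_left_behind <- function(pkg) {
  before <- names(getLoadedDLLs())
  if (!requireNamespace(pkg, quietly = TRUE))
    stop("package not available: ", pkg)
  loaded <- setdiff(names(getLoadedDLLs()), before)

  # unloadNamespace() fails if another loaded namespace imports 'pkg',
  # so the error is swallowed here and the DLLs simply remain.
  try(unloadNamespace(pkg), silent = TRUE)

  after <- names(getLoadedDLLs())
  list(loaded = loaded, left = intersect(loaded, after))
}

# Usage with a package that ships with R; replace with the package of interest.
res <- dlls_left_behind("splines")
res$left
```

If the package was already loaded before the call, `loaded` is empty, so the helper is most informative in a fresh session.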

Even if we increase MAX_NUM_DLLS in the future, a considerable
portion of your app's users will not use that future version of R yet,
and so you should try to "fight" the problem now.

> Any suggestions would be appreciated, and I’d appreciate if the
> MAX_NUM_DLLS can be increased.

> Thanks,
> Qin


>> On May 4, 2016, at 9:17 AM, Martin Morgan wrote:
>> 
>> 
>> 
>> On 05/04/2016 05:15 AM, Prof Brian Ripley wrote:
>>> On 04/05/2016 08:44, Martin Maechler wrote:
> Qin Zhu 
> on Mon, 2 May 2016 16:19:44 -0400 writes:
 
 > Hi,
 > I’m working on a Shiny app for statistical analysis. I ran into
 > this "maximal number of DLLs reached" issue recently because my app
 > requires importing many other packages.
 
 > I’ve posted my question on stackoverflow
 > (http://stackoverflow.com/questions/36974206/r-maximal-number-of-dlls-reached).
 
 > I’m just wondering is there any reason to set the maximal
 > number of DLLs to be 100, and is there any plan to increase it/not
 > hardcoding it in the future? It seems many people are also running
 > into this problem. I know I can work around this problem by modifying
 > the source, but since my package is going to be used by other people,
 > I don’t think this is a feasible solution.
 
 > Any suggestions would be appreciated. Thanks!
 > Qin
 
 Increasing that number is of course "possible"... but it also
 costs a bit (adding to the fixed memory footprint of R).
>>> 
>>> And not only that.  At the time this was done (and it was once 50) the
>>> main cost was searching DLLs for symbols.  That is still an issue, and
>>> few packages exclude their DLL from symbol search so if symbols have to
>>> be searched for, a lot of DLLs will be searched.  (Registering all the
>>> symbols needed in a package avoids a search, and nowadays by default
>>> searches from a namespace are restricted to that namespace.)
>>> 
>>> See
>>> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Registering-native-routines
>>> for some further details about the search mechanism.
>>> 
 I did not set that limit, but I'm pretty sure it was also meant
 as reminder for the useR to "clean up" a bit in her / his R
 session, i.e., not load package namespaces unnecessarily. I
 cannot yet imagine that you need > 100 packages | namespaces
 loaded in your R session. OTOH, some packages nowadays have a
 host of dependencies, so I agree that this at least may happen
 accidentally more frequently than in the past.
>>> 
>>> I am not convinced that it is needed.  The OP says he imports many
>>> packages, and I doubt that more than a few are required at any one time.
>>> Good practice is to load namespaces as required, using requireNamespace.
>> 
>> Extensive package dependencies in Bioconductor make it pretty easy to
>> end up with dozens of packages attached or loaded. For instance
>> 
>> library(GenomicFeatures)
>> library(DESeq2)
>> 
>> > length(loadedNamespaces())
>> [1] 63
>> > length(getLoadedDLLs())
>> [1] 41
>> 
>> Qin's use case is a shiny app, presumably trying to provide relatively 
comprehensive access to a particular domain. Even if the app were to load / 
requireNamespace() (this r