[Rd] Possible bug: file.exists() always returns TRUE for prn.us.txt

2018-03-24 Thread Joris Meys
Dear all,

while preparing some exercises I came across some highly surprising
behaviour of file.exists(). The specific value "prn.us.txt" always returns
TRUE, even though that file is nowhere to be found on my system.

In a fresh R session 3.4.4 installed on Windows 10:

> grep("prn.us.txt", dir(recursive = TRUE))
integer(0)
> file.exists("prn.us.txt")
[1] TRUE
> file.exists("pnr.us.txt")
[1] FALSE
> file.exists("prn\\.us\\.txt")
[1] FALSE

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.4

This also happens in 3.4.3, 3.4.2 and 3.4.1. Roman Lustrik has confirmed
it on his system as well:
https://twitter.com/romunov/status/977486929380995072

I suspect this is a bug, or I must be missing something completely.

Cheers
Joris

-- 
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)


---
Biowiskundedagen 2017-2018
http://www.biowiskundedagen.ugent.be/

---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Possible bug: file.exists() always returns TRUE for prn.us.txt

2018-03-24 Thread Duncan Murdoch

On 24/03/2018 6:16 AM, Joris Meys wrote:

> Dear all,
>
> while preparing some exercises I came across some highly surprising
> behaviour of file.exists(). The specific value "prn.us.txt" always returns
> TRUE, even though that file is nowhere to be found on my system.


That's a Windows "bug", not an R bug.  Any name starting "prn" (upper or 
lowercase), followed by an extension (i.e. a dot and characters) is 
taken to be the DOS printer device.  According to Writing R Extensions, 
names starting with ‘con’, ‘prn’, ‘aux’, ‘clock$’, ‘nul’, ‘com1’ to 
‘com9’, and ‘lpt1’ to ‘lpt9’ (possibly followed by extensions) are also 
bad.  You can Google "PRN filename in Windows" to find lots of people 
confused by this.  One page I get is


https://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx

but there's no guarantee that will work five minutes from now.

Duncan Murdoch
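
The rule described above can be checked before a file name is ever handed to
Windows. The following is only a sketch: is_reserved_name() is a hypothetical
helper, not part of base R, and the list of device names is taken from the
passage of Writing R Extensions quoted in this message, so it may not be
exhaustive.

```r
# Device names as quoted above from Writing R Extensions (assumed list).
reserved <- c("con", "prn", "aux", "clock$", "nul",
              paste0("com", 1:9), paste0("lpt", 1:9))

# Hypothetical helper: TRUE when the last path component, cut at its
# first dot and lower-cased, is one of the device names.
is_reserved_name <- function(path) {
  stem <- sub("\\..*$", "", basename(path))
  tolower(stem) %in% reserved
}

is_reserved_name(c("prn.us.txt", "pnr.us.txt", "PRN", "data/aux.csv"))
# TRUE FALSE TRUE TRUE
```

The check is purely textual, so it gives the same answer on every platform
and does not touch the file system.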





Re: [Rd] Possible bug: file.exists() always returns TRUE for prn.us.txt

2018-03-24 Thread Joris Meys
Thank you. I was just replying to my own message with the same information.
Sorry for not doing the research properly before filing.

Cheers
Joris



Re: [Rd] Possible bug: file.exists() always returns TRUE for prn.us.txt

2018-03-24 Thread Joris Meys
Sorry for coming back to this, but would it make sense to have
file.exists() return FALSE when it only finds one of these device names?
Using backslashes to escape the dots makes file.exists() return the correct
result. I got caught by this when I created file names based on a set of
stock market tickers, so I can imagine this could happen to other people
too.

Cheers
Joris
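
Until something like this exists in base R, the behaviour asked for above can
be approximated in user code. This is a sketch under assumptions:
file_exists_strict() and safe_file_name() are hypothetical names, the
device-name list is the one quoted earlier in the thread, and the "_" prefix
is an arbitrary choice. Note also that a backslash in a Windows path is
normally a directory separator rather than an escape, so "prn\\.us\\.txt"
most likely names a different path instead of escaping the dots.

```r
# Device names as quoted earlier in the thread (assumed list).
device_names <- c("con", "prn", "aux", "clock$", "nul",
                  paste0("com", 1:9), paste0("lpt", 1:9))

# Hypothetical wrapper: FALSE whenever the name would resolve to a device.
file_exists_strict <- function(paths) {
  stems <- tolower(sub("\\..*$", "", basename(paths)))
  file.exists(paths) & !(stems %in% device_names)
}

# Hypothetical helper for ticker-based names: prefix the colliding ones.
safe_file_name <- function(ticker, ext = ".txt") {
  stems <- tolower(sub("\\..*$", "", ticker))
  prefix <- ifelse(stems %in% device_names, "_", "")
  paste0(prefix, ticker, ext)
}

safe_file_name(c("prn.us", "aapl.us"))
# "_prn.us.txt" "aapl.us.txt"
```

Renaming at creation time avoids the problem altogether, whereas the wrapper
only hides it from later existence checks.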



Re: [Rd] Function 'factor' issues

2018-03-24 Thread Martin Maechler
> Suharto Anggono Suharto Anggono via R-devel 
> on Sat, 24 Mar 2018 00:52:02 + writes:

> I am trying once again.

> By just changing
> f <- match(xlevs[f], nlevs)
> to
> f <- match(xlevs, nlevs)[f]
> , function 'factor' in R devel could be made more consistent and
> back-compatible. Why not pick it?

Thank you for persevering,

I'll have a hard look...  You have been right before ;-)
So I will check this small change for both  `factor`  and `levels<-.factor`

Martin
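
For readers following along, the two remapping expressions can be compared
on plain vectors without calling factor() at all. The values below are an
illustrative assumption, modelled on the c(NA,2,3) example quoted further
down: xlevs holds the labels as character, nlevs the de-duplicated labels,
and f the integer codes, with NA for a value that matched no level.

```r
xlevs <- c("2", NA)     # labels as character, one of them NA
nlevs <- unique(xlevs)  # labels after merging duplicates
f <- c(2L, 1L, NA)      # codes into xlevs; NA means "no level"

# Form quoted as the current one: index first, then match.
per_element <- match(xlevs[f], nlevs)

# Proposed form: match once per level, then index.
per_level <- match(xlevs, nlevs)[f]

per_element  # 2 1 2   (the unmatched value is folded into the NA level)
per_level    # 2 1 NA  (the unmatched value stays NA)
```

The two forms agree whenever f has no NA; they differ only when the labels
contain NA and f does too. The second form also calls match() on
length(xlevs) values rather than length(f), which may be cheaper for long
inputs.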

> 
> On Sat, 25/11/17, Suharto Anggono Suharto Anggono wrote:

> Subject: Re: [Rd] Function 'factor' issues
> To: r-devel@r-project.org
> Date: Saturday, 25 November, 2017, 6:03 PM

> From commits to R devel, I saw attempts to speed up subsetting and
> 'match', and to cache results of conversion of small nonnegative integers to
> character string. That's good.

> I am sorry for pushing, still.

> Is the partial new behavior of function 'factor' with respect to NA
> really worth it?

> match(xlevs, nlevs)[f]  looks nice, too.

> - Using
> f <- match(xlevs, nlevs)[f]
> instead of
> f <- match(xlevs[f], nlevs)
> for remapping
> - Remapping only if length(nlevs) differs from length(xlevs)
> Applying changes similar to above to function 'levels<-.factor' will not
> change 'levels<-.factor' result at all. So, the corresponding part of functions
> 'factor' and 'levels<-.factor' can be kept in sync.

> 
> On Sun, 22/10/17, Suharto Anggono Suharto Anggono wrote:

> Subject: Re: [Rd] Function 'factor' issues
> To: r-devel@r-project.org
> Date: Sunday, 22 October, 2017, 6:43 AM

> My idea (like in https://bugs.r-project.org/bugzilla/attachment.cgi?id=1540 ):
> - For remapping, use
> f <- match(xlevs, nlevs)[f]
> instead of
> f <- match(xlevs[f], nlevs)
> (I have mentioned it).
> - Remap only if length(nlevs) differs from length(xlevs) .


> [snip]

> 
> On Wed, 18/10/17, Martin Maechler  wrote:

> Subject: Re: [Rd] Function 'factor' issues

> Cc: r-devel@r-project.org
> Date: Wednesday, 18 October, 2017, 11:54 PM

> Suharto Anggono Suharto Anggono via R-devel 
>>     on Sun, 15 Oct 2017 16:03:48 + writes:


>     > In R devel, function 'factor' has been changed, allowing and
>     > merging duplicated 'labels'.

> Indeed.  That had been asked for and discussed a bit on this
> list from June 14 to June 23, starting at
>   https://stat.ethz.ch/pipermail/r-devel/2017-June/074451.html

>     > Issue 1: Handling of specified 'labels' without duplicates is
>     > slower than before.
>     > Example:
>     > x <- rep(1:26, 4)
>     > system.time(factor(x, levels=1:26, labels=letters))

>     > Function 'factor' is already rather slow because of conversion to
>     > character. Please don't add slowdown.

> Indeed, I do see a ~ 20%  performance loss for the example
> above, and I may get to look into this.
> However, in R-devel there have been important internal
> changes (ALTREP additions) some of which are currently giving
> some performance losses in some cases (but they have the
> potential to give big performance _gains_ e.g. for simple
> indexing into large vectors which may apply here !).
> For factor(), these C level "ALTREP" changes may not be the reason at
> all for the slow down;
> I may find time to investigate further.

> {{ For the ALTREP-change slowdowns I've noticed in some
>   indexing/subset operations, we'll definitely have time to look into
>   before R-devel is going to be released next spring... and as mentioned,
>   these operations may even become considerably faster *thanks*
>   to ALTREP ... }}

>     > Issue 2: While default 'labels' is 'levels', not specifying
>     > 'labels' may be different from specifying 'labels' to be the same as 'levels'.

>     > Example 1:
>     > as.integer(factor(c(NA,2,3), levels = c(2, NA), exclude = NULL))
>     > is different from
>     > as.integer(factor(c(NA,2,3), levels = c(2, NA), labels = c(2, NA),
>     >                   exclude = NULL))

> You are right.  But this is not so exceptional and part of the new feature of
> 'labels' allowing to "fix up" things in such cases.  While it
> would be nice if this was not the case the same phenomenon
> happens in other functions as well because of lazy evaluation.
> I think I had noticed that already and at the time found
> "not easy" to work around.
> (There are many aspects about changing such important base functions:
> 1. not breaking back compatibility ((unless in rare
>     border cases, where we are sure it's worth))
> 2. Keeping code relatively transparent
> 3. Keep the semantics "simple" to document and as intuiti

[Rd] aggregate() naming -- bug or feature

2018-03-24 Thread lmo via R-devel
Be aware that the object aggregate() returns with bar() is more complicated
than you might think.
str(aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = bar))
'data.frame':    3 obs. of  2 variables:
 $ Group.1: Factor w/ 3 levels "setosa","versicolor",..: 1 2 3
 $ x  : num [1:3, 1:2] 5.006 5.936 6.588 0.352 0.516 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "mean" "sd"
So you get a two column data.frame whose second column is a matrix.
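
If ordinary columns are wanted instead, the matrix column can be spread out
afterwards. This is a sketch only: the definition of bar() is not shown in
this message, so the version below is an assumption that merely reproduces
the "mean" and "sd" column names seen in the str() output.

```r
# Assumed definition; the original bar() is not shown in this message.
bar <- function(x) c(mean = mean(x), sd = sd(x))

agg <- aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = bar)
ncol(agg)        # 2: Group.1 and the matrix column x
is.matrix(agg$x) # TRUE

# Rebuilding the data frame from its columns expands the matrix.
flat <- do.call(data.frame, agg)
names(flat)      # "Group.1" "x.mean" "x.sd"
```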



Re: [Rd] Integrate erros on certain functions

2018-03-24 Thread John P. Nolan
Dear John,

Two issues.  First, the default action is stop.on.error=TRUE, so anytime the 
integrate function can determine an error, it will stop.  It doesn't detect an 
error, so no error is produced (whether you set stop.on.error=TRUE or FALSE).

The second issue is the real problem:  what automatic numerical integration can 
do or not do.  When the upper bound is Inf, integrate (which is based on the 
QUADPACK fortran code) does a change of variable to make the region of 
integration be a finite interval, then tries to evaluate that transformed 
integral.  When the upper bound is a large finite number, integrate tries to 
evaluate the integral directly.  In this case, the integrand is evaluated at 
multiple x values in the interval [0,13000].   Those x values are large, and 
the resulting function values are very near 0.  You can verify this by putting 
some trace statements into your integrand function, e.g. 

> f1 <- function( x ) { y <- exp(-x); print( rbind(x,y) ); return(y) }
> integrate( f1, lower = 0, upper =13000)

The quadrature rule "sees" an integrand near 0 and returns a value for the 
integral near 0.  It does not detect an error, so it does not report anything 
to you.  It does not know how the integrand function behaves on regions where 
it does not evaluate it.  This is a well-known problem in numerical 
integration: there is no way the integrate function can know what region to 
focus on in a general problem.  Using an upper bound=Inf does not guarantee 
that you will get the correct value, but sometimes it works.  
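
One way to see the point about regions is to hand integrate() the part of
the range where the integrand actually lives as a separate piece. This is a
sketch, not a general remedy: the split point 50 is an assumption that
relies on knowing in advance that exp(-50) is negligible.

```r
f <- function(x) exp(-x)

# Explicit infinite upper bound, as ?integrate recommends.
whole <- integrate(f, lower = 0, upper = Inf)$value

# Hypothetical split point; it only works because we know where the
# mass of this particular integrand sits.
split_at <- 50
pieces <- integrate(f, lower = 0, upper = split_at)$value +
  integrate(f, lower = split_at, upper = 13000)$value

c(whole = whole, pieces = pieces)  # both should be close to 1
```

For an integrand whose shape is not known beforehand there is no such
shortcut, which is the limitation described above.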

Hope this helps. 

John

……..
John P. Nolan
Math/Stat Dept., American University
106J Myers Hall, 4400 Massachusetts Ave, NW, Washington, DC 20016-8050
Phone: 202-885-3140   E-mail:  jpno...@american.edu
Web:   http://fs2.american.edu/jpnolan/www/





-Original Message-
From: R-devel  On Behalf Of John Muschelli
Sent: Friday, March 23, 2018 6:52 PM
To: r-devel@r-project.org
Subject: [Rd] Integrate erros on certain functions

In the help for ?integrate:

> When integrating over infinite intervals do so explicitly, rather than
> just using a large number as the endpoint. This increases the chance of a
> correct answer – any function whose integral over an infinite interval is
> finite must be near zero for most of that interval.

I understand that and there are examples such as:

## a slowly-convergent integral
integrand <- function(x) {1/((x+1)*sqrt(x))}
integrate(integrand, lower = 0, upper = Inf)

## don't do this if you really want the integral from 0 to Inf
integrate(integrand, lower = 0, upper = 100, stop.on.error = FALSE)
#> failed with message ‘the integral is probably divergent’

which gives an error message if stop.on.error = FALSE. But what happens on 
something like the function below:
integrate(function(x) exp(-x), lower = 0, upper = Inf)
#> 1 with absolute error < 5.7e-05
integrate(function(x) exp(-x), lower = 0, upper = 13000)
#> 2.819306e-05 with absolute error < 5.6e-05
integrate(function(x) exp(-x), lower = 0, upper = 13000, stop.on.error = FALSE)
#> 2.819306e-05 with absolute error < 5.6e-05

I'm not sure this is a bug or misuse of the function, but I would assume the 
last integrate to give an error if stop.on.error = FALSE.
