[Rd] Array changing address unexpectedly

2017-11-12 Thread lille stor
Hi,

Given the following R code:

 library(pryr)

 data <- array(dim = c(5))

 for(x in 1:5)
 {
  data[x] <- as.integer(x * 2)
 }

 add = address(data) # save address of "data"

 for(x in 1:5)
 {
  data[x] <- as.integer(0)
 }

 if (add == address(data))
 {
print("Address did not change")
 }
 else
 {
print("Address changed")
 }

If one runs this code, the message "Address changed" is printed. However, if one 
comments out the line "data[x] <- as.integer(0)", the address of "data" does not 
change and the message "Address did not change" is printed instead. Why? The 
datatype of the array should not change with this line, so there should be no 
need for R to convert the array to a different type (and change the array's 
address in the process).
 
Thank you!

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Array changing address unexpectedly

2017-11-12 Thread David Winsemius

> On Nov 12, 2017, at 8:47 AM, lille stor  wrote:
> 
> Hi,
> 
> Given the following R code:
> 
> library(pryr)
> 
> data <- array(dim = c(5))
> 
> for(x in 1:5)
> {
>  data[x] <- as.integer(x * 2)
> }
> 
> add = address(data) # save address of "data"
> 
> for(x in 1:5)
> {
>  data[x] <- as.integer(0)
> }
> 
> if (add == address(data))
> {
>print("Address did not change")
> }
> else
> {
>print("Address changed")
> }
> 
> If one runs this code, message "Address changed" is printed. However, if one 
> comments line "data[x] <- as.integer(0)" the address of "data" does not 
> change and message "Address did not change" is printed instead. Why? The 
> datatype of the array should not change with this line and hence no need for 
> R to convert the array to a different type (and have the array's address 
> changing in the process).

I'm guessing you didn't take note of the error message:

> else
Error: unexpected 'else' in "else"

It's always good practice to investigate errors. At top level, the "else" keyword 
needs to come on the same line as the closing "}" of the "if" block.
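
A small sketch of the rule, using base R's parse() so the syntax error can be 
inspected without aborting the script. The strings bad_src and ok_src and the 
helper parses_ok are made up for this illustration and are not part of the 
original code.

```r
# At top level, R treats "if (...) { ... }" followed by a newline as a
# complete expression, so an "else" on the next line has nothing to attach to.
bad_src <- "if (TRUE) {\n  1\n}\nelse {\n  2\n}"
ok_src  <- "if (TRUE) {\n  1\n} else {\n  2\n}"

# Hypothetical helper: TRUE if the text parses, FALSE on a syntax error.
parses_ok <- function(src) {
  tryCatch({ parse(text = src); TRUE }, error = function(e) FALSE)
}

parses_ok(bad_src)  # FALSE: unexpected 'else'
parses_ok(ok_src)   # TRUE

# Inside braces (e.g. a function body) the parser keeps reading past the
# newline, so the first layout is accepted there.
parses_ok(paste0("{\n", bad_src, "\n}"))  # TRUE
```

This is why the same layout can work inside a function but fail when pasted 
at the console or run line by line from a script.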

Here's a more complete test of what I take to be your question:

library(pryr)

data <- array(dim = c(5))
add = address(data)
for(x in 1:5)
{
 data[x] <- as.integer(x * 2)
}
 if (add == address(data))
{
   print("Address did not change")
} else {
   print("Address changed")
}


data <- array(dim = c(5))   # reset
add = address(data)
for(x in 1:5)
{
 data[x] <- as.integer(0)
}

if (add == address(data))
{
   print("Address did not change")
} else {
   print("Address changed")
}

# changes in both situations.



David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   
-Gehm's Corollary to Clarke's Third Law



Re: [Rd] Array changing address unexpectedly

2017-11-12 Thread lille stor
Hi David,
 
Thanks for the correction concerning the "else" issue.
 
Taking your code and removing some lines (to increase readability):
 

library(pryr)
 
data <- array(dim = c(5))
for(x in 1:5)
{
   data[x] <- as.integer(x * 2)
}
 
#print(data)
 
add = address(data)
for(x in 1:5)
{
   data[x] <- as.integer(0)
}
 
if (add == address(data))
{
print("Address did not change")
} else {
print("Address changed")
}
 
 
If one runs this code, everything works as expected, i.e. the message "Address did 
not change" is printed. However, if one uncomments the line "#print(data)", the 
message "Address changed" is printed instead. Any idea why this happens? It is a 
bit counter-intuitive. Does it have something to do with some kind of 
lazy-evaluation mechanism in R that fills the array only when needed (in this 
case, when printing it), thus changing the array's address?
 
Thank you once more!
 


 


Re: [Rd] Array changing address unexpectedly

2017-11-12 Thread luke-tierney

A simpler version with annotation:

library(pryr)

data <- 1:5           # new allocation, referenced from one variable

add <- address(data)

data[1] <- 6L         # only one reference; don't need to duplicate

add == address(data)  # TRUE

print(data)           # inside the print call the local variable also references
                      # the object, so there are two references

data[1] <- 7L         # there have been two references to the object, so need
                      # to duplicate

add == address(data)  # FALSE

R objects are supposed to be immutable, so conceptually
assignments create a new object. If R is sure it is safe to do
so, it will avoid making a copy and re-use the original
object. In the first assignment there is only one reference to
the object, the variable data, and it is safe to re-use the
object, so the address of the new value of `data` is the same as
the address of the old value. It would not be safe to do this if
the assignment happened inside print(), since there are then two
references -- the original variable and the local variable for
the argument. R currently doesn't have enough information to know
that there is only one reference after the call to print()
returns, so it has to be safe and make a copy, and the new value
has a different address than the old one. R may be able to be
less conservative in the future, but for now it cannot.
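
The same idea can be observed without comparing addresses. The following is 
only a sketch in base R, not part of the example above: the variable names are 
made up, tracemem() is only available when R was built with memory profiling 
(hence the capabilities() check), and exactly when a copy is reported may 
differ between R versions.

```r
x <- c(1L, 2L, 3L, 4L, 5L)   # one object, one reference (x)
y <- x                        # second reference to the same object, no copy yet

if (capabilities("profmem")) tracemem(x)   # report duplications of this object

y[1] <- 99L                   # two references exist, so R must duplicate first

# Value semantics are preserved whatever R does internally:
identical(x, c(1L, 2L, 3L, 4L, 5L))    # TRUE, x is untouched
identical(y, c(99L, 2L, 3L, 4L, 5L))   # TRUE

if (capabilities("profmem")) untracemem(x)
```

When tracing is available, the assignment to y[1] should print a tracemem 
line showing the old and new address of the duplicated object.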

[As an exercise you can try to work out why the calls to
address() don't play a role in this.]

As a rule you shouldn't need to worry about addresses, and R might
make changes in the future that would: cause the address in this
version of the example to change on the first assignment; cause the
address on the second assignment to not change; cause the information
provided by pryr to be inaccurate or misleading; or cause the use of
a tool like pryr::address to change the internal structure of the
object.

Best,

luke



--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of 

Re: [Rd] special latin1 do not print as glyphs in current devel on windows

2017-11-12 Thread Patrick Perry
Just following up on this since the associated bug report just got 
closed (https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17329) 
because my original bug report was incomplete and did not include 
sessionInfo() or LC_CTYPE.

Admittedly, my original bug report was a little confused. I have since 
gained a better understanding of the issue. I want to (a) confirm that 
this is a real bug in base R, not RStudio, and (b) provide more context. 
It looks like the real issue is that R marks native strings as "latin1" 
when the declared character locale is Windows-1252. This causes problems 
when converting to UTF-8. See Daniel Possenriede's email below for much 
more detail, including his sessionInfo() and a reproducible example.
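
To illustrate why the marking matters, here is a minimal sketch that decodes 
the single byte 0x80 under both encodings with iconv(). It assumes the 
platform's iconv knows the names "CP1252" and "ISO-8859-1"; the variable names 
are made up for this example.

```r
# Byte 0x80: the euro sign in Windows-1252, a C1 control character in latin1.
euro_byte <- rawToChar(as.raw(0x80))

as_cp1252 <- iconv(euro_byte, from = "CP1252",     to = "UTF-8")
as_latin1 <- iconv(euro_byte, from = "ISO-8859-1", to = "UTF-8")

utf8ToInt(as_cp1252)   # 8364, i.e. U+20AC (euro sign)
utf8ToInt(as_latin1)   # 128,  i.e. U+0080 (control character)
```

A string holding CP1252 bytes but marked as "latin1" would take the second 
path, which matches the "\u0080" escapes shown in the report below.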

The development version of the `stringi` package and the CRAN version of 
the `utf8` package both have workarounds for this bug. (See, e.g. 
https://github.com/gagolews/stringi/issues/287 and the links to the 
related issues).


Patrick

> Patrick Perry 
> September 14, 2017 at 7:47 AM
> This particular issue has a simple fix. Currently, the 
> "R_check_locale" function includes the following code starting at line 
> 244 in src/main/platform.c:
>
> #ifdef Win32
> {
> char *ctype = setlocale(LC_CTYPE, NULL), *p;
> p = strrchr(ctype, '.');
> if (p && isdigit(p[1])) localeCP = atoi(p+1); else localeCP = 0;
> /* Not 100% correct, but CP1252 is a superset */
> known_to_be_latin1 = latin1locale = (localeCP == 1252);
> }
> #endif
>
> The "1252" should be "28591"; see 
> https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx.
>
>
> Daniel Possenriede 
> September 14, 2017 at 3:40 AM
> This is a follow-up on my initial posts regarding character encodings 
> on Windows 
> (https://stat.ethz.ch/pipermail/r-devel/2017-August/074728.html) and 
> Patrick Perry's reply 
> (https://stat.ethz.ch/pipermail/r-devel/2017-August/074830.html) in 
> particular (thank you for the links and the bug report!). My initial 
> posts were quite chaotic (and partly wrong), so I am trying to clear 
> things up a bit.
>
> Actually, the title of my original message "special latin1 
> [characters] do not print as glyphs in current devel on windows" is 
> already wrong, because the problem exists with characters with CP1252 
> encoding in the 80-9F (hex) range. Like Brian Ripley rightfully 
> pointed out, latin1 != CP1252. The characters in the 80-9F code point 
> range are not even part of ISO/IEC 8859-1 a.k.a. latin1, see for 
> example https://en.wikipedia.org/wiki/Windows-1252. R treats them as 
> if they were, however, and that is exactly the problem, IMHO.
>
> Let me show you what I mean. (All output from R 3.5 r73238, see 
> sessionInfo at the end)
>
> > Sys.getlocale("LC_CTYPE")
> [1] "German_Germany.1252"
> > x <- c("€", "ž", "š", "ü")
> > sapply(x, charToRaw)
> \u0080 \u009e \u009a      ü
>     80     9e     9a     fc
>
> "€", "ž", "š" serve as examples in the 80-9F range of CP1252. I also 
> show the "ü" just as an example of a non-ASCII character outside that 
> range (and because Patrick Perry used it in his bug report which might 
> be a (slightly) different problem, but I will get to that later.)
>
> > print(x)
> [1] "\u0080" "\u009e" "\u009a" "ü"
>
> "€", "ž", and "š" are printed as (incorrect) unicode escapes. "€" for 
> example should be \u20ac not \u0080.
> (In R 3.4.1, print(x) shows the glyphs and not the unicode escapes. 
> Apparently, as of v3.5, print() calls enc2utf8() (or its equivalent in 
> C (translateCharUTF8?))?)
>
> > print("\u20ac")
> [1] "€"
>
> The characters in x are marked as "latin1".
>
> > Encoding(x)
> [1] "latin1" "latin1" "latin1" "latin1"
>
> Looking at the CP1252 table (e.g. link above), we see that this is 
> incorrect for "€", "ž", and "š", which simply do not exist in latin1.
>
> As per the documentation, "enc2utf8 convert[s] elements of character 
> vectors to [...] UTF-8 [...], taking any marked encoding into 
> account." Since the marked encoding is wrong, so is the output of 
> enc2utf8().
>
> > enc2utf8(x)
> [1] "\u0080" "\u009e" "\u009a" "ü"
>
> Now, when we set the encoding to "unknown" everything works fine.
>
> > x_un <- x
> > Encoding(x_un) <- "unknown"
> > print(x_un)
> [1] "€" "ž" "š" "ü"
> > (x_un2utf8 <- enc2utf8(x_un))
> [1] "€" "ž" "š" "ü"
>
> Long story short: The characters in the 80 to 9F range should not be 
> marked as "latin1" on CP1252 locales, IMHO.
>
> As a side-note: the output of localeToCharset() is also problematic, 
> since ISO8859-1 != CP1252.
>
> > localeToCharset()
> [1] "ISO8859-1"
>
> Finally on to Patrick Perry's bug report 
> (https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17329): 'On 
> Windows, enc2utf8("ü") yields "|".'
>
> Unfortunately, I cannot reproduce this with the CP1252 locale, as can 
> be seen above. Probably, because the bug applies to the C locale 
> (sorry if this is somewhere apparent in the bug report and I missed it