Re: [Rd] duplicated factor labels.

2017-06-16 Thread Joris Meys
To extend on Martin's explanation:

In factor(), levels are the unique input values and labels the unique
output values. So the function levels() actually displays the labels.
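
A minimal sketch of that distinction (the values "lo"/"hi" and the labels "Low"/"High" are made up for illustration, not taken from the thread):

```r
# Input values are matched against `levels`; `labels` renames them on output.
x <- c("lo", "hi", "lo")
f <- factor(x, levels = c("lo", "hi"), labels = c("Low", "High"))

levels(f)       # shows the labels, not the original input values
as.integer(f)   # integer codes that index into levels(f)
```

So after construction the original input values are gone; only the codes and the labels remain.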

Cheers
Joris


On 15 Jun 2017 17:15, "Martin Maechler"  wrote:

> Paul Johnson 
> on Wed, 14 Jun 2017 19:00:11 -0500 writes:

> Dear R devel
> I've been wondering about this for a while. I am sorry to ask for your
> time, but can one of you help me understand this?

> This concerns duplicated labels, not levels, in the factor function.

> I think it is hard to understand that factor() fails, but levels()
> after does not

>> x <- 1:6
>> xlevels <- 1:6
>> xlabels <- c(1, NA, NA, 4, 4, 4)
>> y <- factor(x, levels = xlevels, labels = xlabels)
> Error in `levels<-`(`*tmp*`, value = if (nl == nL)
> as.character(labels) else paste0(labels,  :
> factor level [3] is duplicated
>> y <- factor(x, levels = xlevels)
>> levels(y) <- xlabels
>> y
> [1] 1    <NA> <NA> 4    4    4
> Levels: 1 4

> If the latter use of levels() causes a good, expected result, couldn't
> factor(..., labels = xlabels) be made to the same thing?

I may misunderstand, but I think you are confusing 'labels' and 'levels'
here (and you are not alone in this!), mostly because R's
factor() function treats them as arguments in a way that can be
confusing. (But I don't think we'd want to change that; it's
been documented and in use for > 25 years in S, S+, and R.)

Note that after the above,

> dput(y)
structure(c(1L, NA, NA, 2L, 2L, 2L), .Label = c("1", "4"), class = "factor")

and that of course _is_ a valid factor .. which you can easily
get directly via e.g.

> identical(y, factor(c(1,NA,NA,4,4,4)))
[1] TRUE

or also  via

> identical(y, factor(c("1",NA,NA,"4","4","4")))
[1] TRUE

I really don't see a need for a change of factor().
It should remain as simple as possible (but not simpler :-).

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



[Rd] 'ordered' destroyed to 'factor'

2017-06-16 Thread Jens Oehlschlägel
Dear all,
 
I don't know if you consider this a bug or feature, but it breaks reasonable 
code: 'unlist' and 'sapply' convert 'ordered' to 'factor' even if all levels 
are equal. Here is a simple example:

o <- ordered(letters)
o[[1]]
lapply(o, min)[[1]]  # ordered factor
unlist(lapply(o, min))[[1]]  # no longer ordered
sapply(o, min)[[1]]  # no longer ordered

Jens Oehlschlägel
 
 
P.S: The above examples are silly for simple reproduction. The current behavior 
broke my use-case which had a structure like this
 
# have some data
x <- 1:20
# apply some function to each element
somefunc <- function(x){
  # do something and return an ordinal level
  sample(o, 1)
}
x <- sapply(x, somefunc)
# get minimum result
min(x)
# Error in Summary.factor(c(2L, 26L), na.rm = FALSE) :
#   ‘min’ not meaningful for factors
 
 
> version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          4.0
year           2017
month          04
day            21
svn rev        72570
language       R
version.string R version 3.4.0 (2017-04-21)
nickname       You Stupid Darkness


Re: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines

2017-06-16 Thread Hadley Wickham
>> I don't think it is reasonable to change the parser this way. This is
>> currently valid R code:
>>
>> a <- "foo"
>> "bar"
>>
>> and with the new syntax, it is also valid, but with a different
>> meaning. Or you can even consider
>>
>> a <- "foo"
>> bar %>% func() %>% print()
>>
>> etc.
>>
>> I like the idea of string literals, but the C/C++ way clearly does not
>> work. The Python/Julia way might, i.e.:
>>
>> """this is a
>> multi-line
>> literal"""
>
>
> This does look like a promising option; some more careful checking
> would be needed to make sure there aren't cases where currently
> working code would be broken.
>
> Another Python idea worth considering is the raw string notation
> r"xyx" that does not process escape sequences -- this would make
> writing things like regular expressions easier.

If this is something you would consider, we'd be happy to put together
a patch for review.

Hadley


-- 
http://hadley.nz



Re: [Rd] 'ordered' destroyed to 'factor'

2017-06-16 Thread Robert McGehee
Hi,
It's been my experience that when you combine or aggregate vectors of factors 
using a function, you should be prepared for surprises, as it's not obvious 
what the "right" way to combine factors is (ordered or not), especially if two 
vectors of factors have different levels or (if ordered) are ordered in a 
different way.

For instance, what would you expect to get from unlist() if each element of the 
list had different levels, or were both ordered, but in a different way, or if 
some elements of the list were factors and others were ordered factors?
> unlist(list(ordered(c("a","b")), ordered(c("b","a"))))
[1] ?

Honestly, my biggest surprise from your question was that unlist even returned 
a factor at all. For example, the c() function just converts factors to 
integers.
> c(ordered(c("a","b")), ordered(c("a","b")))
[1] 1 2 1 2

And here's one that's especially weird. When rbind() data frames with an 
ordered factor, you still get an ordered factor back, but the order may be 
different from either of the original orders:

> x1 <- data.frame(a=ordered(c("b","c")))
> x2 <- data.frame(a=ordered(c("a","b","c")))
> str(rbind(x1,x2)) #  Note b < a
 'data.frame':  5 obs. of  1 variable:
 $ a: Ord.factor w/ 3 levels "b"<"c"<"a": 1 2 3 1 2

Should rbind just have returned an integer like c(), or returned a factor like 
unlist(), or should it have kept the result as an ordered factor, but ordered the 
result in a different way? I have no idea.

So in short, IMO, there are definitely inconsistencies in how ordered/factors 
are handled across functions, but I think it would be hard to point to any 
single function and say it is wrong or needs to be changed. My best advice is 
to just be careful when combining or aggregating factors.
--Robert


Re: [Rd] 'ordered' destroyed to 'factor'

2017-06-16 Thread Joris Meys
This can be traced back to the following line in unlist():

structure(res, levels = lv, names = nm, class = "factor")

The Details section of ?unlist states specifically how it treats factors,
so this is documented and expected behaviour.

This is also the appropriate behaviour. In your case one could argue that
unlist should maintain the order, as there's only a single factor. However,
the moment you have 2 ordered factors, there's no guarantee that the levels
are the same, or even in the same order. Hence it is impossible to
determine what should be the correct order. For this reason, the only
logical object to be returned in case of a list of factors, is an unordered
factor.

In your use case (so with a list of factors with identical ordered levels)
the solution is one extra step:

x <- list(
  factor(c("a","b"),
         levels = c("a","b","c"),
         ordered = TRUE),
  factor(c("b","c"),
         levels = c("a","b","c"),
         ordered = TRUE)
)
res <- sapply(x, min)
res <- ordered(res, levels = levels(res))
min(res)
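
An alternative sketch for the same situation, assuming every result shares the level set of one known ordered factor: collect the integer codes and rebuild the ordered factor afterwards, so unlist() never sees a factor. The names somefunc, res and codes are illustrative only, and somefunc is a hypothetical stand-in for the original poster's function.

```r
o <- ordered(letters)

# Hypothetical stand-in: returns a single ordered value per call.
somefunc <- function(i) sample(o, 1)

# lapply() keeps each result as an ordered factor of length one.
res <- lapply(1:20, somefunc)

# Extract the integer codes, then rebuild one ordered factor
# from the known, shared level set.
codes <- vapply(res, as.integer, integer(1))
x <- ordered(levels(o)[codes], levels = levels(o))

is.ordered(x)
min(x)
```

This only makes sense when the levels really are identical across all elements; otherwise the codes are not comparable.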


I hope this explains

Cheers
Joris



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


[Rd] R history: Why 'L; in suffix character ‘L’ for integer constants?

2017-06-16 Thread Henrik Bengtsson
I'm just curious (no complaints), what was the reason for choosing the
letter 'L' as a suffix for integer constants?  Does it stand for
something (literal?), is it because it visually stands out, ..., or no
specific reason at all?

/Henrik



Re: [Rd] R history: Why 'L; in suffix character ‘L’ for integer constants?

2017-06-16 Thread Serguei Sokol

Le 16/06/2017 à 17:54, Henrik Bengtsson a écrit :

I'm just curious (no complaints), what was the reason for choosing the
letter 'L' as a suffix for integer constants?  Does it stand for
something (literal?), is it because it visually stands out, ..., or no
specific reason at all?

My guess is that it is inherited from the C "long integer" type (as opposed to "short integer" 
or simply "integer")
https://en.wikipedia.org/wiki/C_data_types


Re: [Rd] duplicated factor labels.

2017-06-16 Thread Paul Johnson
On Fri, Jun 16, 2017 at 2:35 AM, Joris Meys  wrote:
> To extend on Martin's explanation:
>
> In factor(), levels are the unique input values and labels the unique output
> values. So the function levels() actually displays the labels.
>

Dear Joris

I think we agree. Currently, factor insists both levels and labels be unique.

I wish that it would accept nonunique labels. I also understand it
is impractical to change this now in base R.

I don't think I succeeded in explaining why this would be nicer.
Here's another example. Fairly often, we see input data like

x <- c("Male", "Man", "male", "Man", "Female")

The first four represent the same value.  I'd like to go in one step
to a new factor variable with enumerated types "Male" and "Female".
This fails

xf <- factor(x, levels = c("Male", "Man", "male", "Female"),
             labels = c("Male", "Male", "Male", "Female"))

Instead, we need 2 steps.

xf <- factor(x, levels = c("Male", "Man", "male", "Female"))
levels(xf) <- c("Male", "Male", "Male", "Female")

I think it is quirky that `levels<-.factor` allows the duplicated
labels, whereas factor does not.

I wrote a function rockchalk::combineLevels to simplify combining
levels, but most of the students here like plyr::mapvalues to do it.
The use of levels() can be tricky because one must enumerate all
values, not just the ones being changed.
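
For what it's worth, here is a sketch of a slightly shorter two-step route. `levels<-` on a factor also accepts a named list mapping each new level to the old levels it should absorb (see ?levels). As far as I can tell, any old level left out of the list ends up as NA, so the result is worth checking.

```r
x <- c("Male", "Man", "male", "Man", "Female")
xf <- factor(x)

# Named-list form: new level name = old levels to merge into it.
levels(xf) <- list(Male = c("Male", "Man", "male"), Female = "Female")

levels(xf)
table(xf)
```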

But I do understand Martin's point. It's been this way 25 years, it
won't change. :).

> Cheers
> Joris
>
>


-- 
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.



Re: [Rd] duplicated factor labels.

2017-06-16 Thread Joris Meys
Hi Paul,

Now I see what you're getting at. I misread your original mail completely.
So we definitely agree, and wholeheartedly even.

The use case you just gave is definitely in my top 5 of frustrations about
R. I would like to be able to assign the same label to multiple levels
without having to use e.g. dplyr::recode_factor() or some other vectorized
switch statement to recode all data first.

I understand "it's been like that 25 years", but I've looked hard to find a
use case where adding this behaviour would invalidate existing code and
couldn't come up with anything.

So I add my (totally insignificant) vote for adding the possibility of
assigning the same label to multiple levels in factor() itself.

Cheers and thank you for bringing this up!



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [Rd] 'ordered' destroyed to 'factor'

2017-06-16 Thread peter dalgaard

> On 16 Jun 2017, at 15:59 , Robert McGehee  wrote:
> 
> For instance, what would you expect to get from unlist() if each element of 
> the list had different levels, or were both ordered, but in a different way, 
> or if some elements of the list were factors and others were ordered factors?
>> unlist(list(ordered(c("a","b")), ordered(c("b","a"))))
> [1] ?

Those actually have the same levels in the same order: a < b

Possibly, this brings the point home more clearly

unlist(list(ordered(c("a","c")), ordered(c("b","d"))))

(Notice that alphabetical order is largely irrelevant, so all of these level 
orderings are equally possible:

a < c < b < d
a < b < c < d
a < b < d < c
b < a < c < d
b < a < d < c
b < d < a < c

).
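
A quick sketch to check what unlist() actually does with that input; it only inspects the result and makes no claim about which ordering would be "right":

```r
u <- unlist(list(ordered(c("a", "c")), ordered(c("b", "d"))))

is.factor(u)    # still a factor
is.ordered(u)   # but the ordering has been dropped
levels(u)       # union of the levels of both inputs
```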

-pd
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] R history: Why 'L; in suffix character ‘L’ for integer constants?

2017-06-16 Thread Yihui Xie
Yeah, that was what I heard from our instructor when I was a graduate
student: L stands for Long (integer).

Regards,
Yihui
--
https://yihui.name


On Fri, Jun 16, 2017 at 11:00 AM, Serguei Sokol  wrote:
> Le 16/06/2017 à 17:54, Henrik Bengtsson a écrit :
>>
>> I'm just curious (no complaints), what was the reason for choosing the
>> letter 'L' as a suffix for integer constants?  Does it stand for
>> something (literal?), is it because it visually stands out, ..., or no
>> specific reason at all?
>
> My guess is that it is inherited from C "long integer" type (contrary to
> "short integer" or simply "integer")
> https://en.wikipedia.org/wiki/C_data_types


Re: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines

2017-06-16 Thread Radford Neal
> On Wed, 14 Jun 2017, Gábor Csárdi wrote:
>
> > I like the idea of string literals, but the C/C++ way clearly does not
> > work. The Python/Julia way might, i.e.:
> >
> > """this is a
> > multi-line
> > literal"""
> 
> luke-tier...@uiowa.edu:

> This does look like a promising option; some more careful checking
> would be needed to make sure there aren't cases where currently
> working code would be broken.

I don't see how this proposal solves any problem of interest.

String literals can already be as long as you like.  The problem is
that they will get wrapped around in an editor (or not all be
visible), destroying the nice formatting of your program.

With the proposed extension, you can write long string literals with
short lines only if they were long only because they consisted of
multiple lines.  Getting a string literal that's 79 characters long
with no newlines (a perfectly good error message, for example) to fit
in your 80-character-wide editing window would still be impossible.

Furthermore, these Python-style literals have to have their second
and later lines start at the left edge, destroying the indentation
of your program (supposing you actually wanted to use one).

In contrast, C-style concatenation (by the parser) of consecutive
string literals works just fine for what you'd want to do in a
program.  The only thing they wouldn't do that the Python-style
literals would do is allow you to put big blocks of literal text in
your program, without having to put quotes around each line.  But
shouldn't such text really be stored in a separate file that gets
read, rather than in the program source?
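
For reference, the objection raised earlier in the thread to C-style concatenation is that two string literals on consecutive lines are already valid R with a different meaning. A small sketch to confirm how the current parser treats them (the variable names exprs and env are illustrative only):

```r
# Two string literals on consecutive lines: the parser sees two
# separate top-level expressions, not one concatenated string.
exprs <- parse(text = 'a <- "foo"\n"bar"')
length(exprs)

# Evaluating them in a fresh environment leaves `a` holding only "foo".
env <- new.env()
for (e in exprs) eval(e, envir = env)
get("a", envir = env)
```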

   Radford Neal



Re: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines

2017-06-16 Thread Gábor Csárdi
On Fri, Jun 16, 2017 at 7:04 PM, Radford Neal  wrote:
>> On Wed, 14 Jun 2017, Gábor Csárdi wrote:
>>
>> > I like the idea of string literals, but the C/C++ way clearly does not
>> > work. The Python/Julia way might, i.e.:
>> >
>> > """this is a
>> > multi-line
>> > literal"""
>>
>> luke-tier...@uiowa.edu:
>
>> This does look like a promising option; some more careful checking
>> would be needed to make sure there aren't cases where currently
>> working code would be broken.
>
> I don't see how this proposal solves any problem of interest.
>
> String literals can already be as long as you like.  The problem is
> that they will get wrapped around in an editor (or not all be
> visible), destroying the nice formatting of your program.

From the Python docs:

String literals can span multiple lines. One way is using
triple-quotes: """...""" or '''...'''. End of lines are automatically
included in the string, but it’s possible to prevent this by adding a
\ at the end of the line.

[...]

Gabor


Re: [Rd] R history: Why 'L; in suffix character ‘L’ for integer constants?

2017-06-16 Thread William Dunlap via R-devel
But R "integers" are C "ints", as opposed to S "integers", which are C
"long ints".  (I suppose R never had to run on ancient hardware with 16 bit
ints.)

Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: [Rd] R history: Why 'L; in suffix character ‘L’ for integer constants?

2017-06-16 Thread peter dalgaard
Wikipedia claims that C ints are still only guaranteed to be at least 16 bits, 
and longs are at least 32 bits. So no, R's integers are long.

-pd


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [Rd] R history: Why 'L; in suffix character ‘L’ for integer constants?

2017-06-16 Thread William Dunlap via R-devel
"Writing R Extensions" says "int":

R storage mode   C type           FORTRAN type
logical          int*             INTEGER
integer          int*             INTEGER
double           double*          DOUBLE PRECISION
complex          Rcomplex*        DOUBLE COMPLEX
character        char**           CHARACTER*255
raw              unsigned char*   none

Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines

2017-06-16 Thread Hadley Wickham
On Fri, Jun 16, 2017 at 1:14 PM, Gábor Csárdi  wrote:
> On Fri, Jun 16, 2017 at 7:04 PM, Radford Neal  wrote:
>>> On Wed, 14 Jun 2017, Gábor Csárdi wrote:
>>>
>>> > I like the idea of string literals, but the C/C++ way clearly does not
>>> > work. The Python/Julia way might, i.e.:
>>> >
>>> > """this is a
>>> > multi-line
>>> > literal"""
>>>
>>> luke-tier...@uiowa.edu:
>>
>>> This does look like a promising option; some more careful checking
>>> would be needed to make sure there aren't cases where currently
>>> working code would be broken.
>>
>> I don't see how this proposal solves any problem of interest.
>>
>> String literals can already be as long as you like.  The problem is
>> that they will get wrapped around in an editor (or not all be
>> visible), destroying the nice formatting of your program.
>
> From the Python docs:
>
> String literals can span multiple lines. One way is using
> triple-quotes: """...""" or '''...'''. End of lines are automatically
> included in the string, but it’s possible to prevent this by adding a
> \ at the end of the line.

And additionally, in Julia triple quoted strings:

Trailing whitespace is left unaltered. They can contain " symbols
without escaping. Triple-quoted strings are also dedented to the level
of the least-indented line. This is useful for defining strings within
code that is indented. For example:

Hadley

-- 
http://hadley.nz


Re: [Rd] Simplify and By Convert Factors To Numeric Values

2017-06-16 Thread Charles C. Berry

On Fri, 16 Jun 2017, Dario Strbenac wrote:

> Good day,
>
> It's not described anywhere in the help page, but tapply and by
> functions will, by default, convert factors into numeric values. Perhaps
> this needs to be documented or the behaviour changed.

It *is* described in the help page.

This returns a list of objects, each of class "factor":

tapply(rep(1:2,2), rep(1:2,2),
  function(x) factor(LETTERS[x], levels = LETTERS))

and this

tapply(1:3, 1:3, function(x) factor(LETTERS[x], levels = LETTERS))

1 2 3
1 2 3

returns a vector object with no class.

> The documentation states "... tapply returns a multi-way array
> containing the values ..." but doesn't mention anything about converting
> factors into integers. I'd expect the values to be of the same type.

and also states

"If FUN returns a single atomic value for each such cell ... and when
simplify is TRUE ...  if the return value has a class (e.g., an object of
class "Date") the class is discarded."

which is what just happened in your example.

Maybe you want:

unlist(tapply(1:3, 1:3, function(x) factor(LETTERS[x],
  levels = LETTERS),simplify=FALSE))

Trying to preserve class worked here in a way you might have 
hoped/expected, but might lead to difficulties in other uses.
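
A small sketch contrasting the two calls side by side; the helper name f and the result names are illustrative only:

```r
f <- function(x) factor(LETTERS[x], levels = LETTERS)

# Default simplify = TRUE: one atomic value per cell, class discarded.
simplified <- tapply(1:3, 1:3, f)

# simplify = FALSE keeps each cell's factor; unlist() then combines them.
kept <- unlist(tapply(1:3, 1:3, f, simplify = FALSE))

is.factor(simplified)
is.factor(kept)
as.character(kept)
```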


HTH,

Chuck



Re: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines

2017-06-16 Thread Duncan Murdoch

On 16/06/2017 2:04 PM, Radford Neal wrote:
>> On Wed, 14 Jun 2017, Gábor Csárdi wrote:
>>
>>> I like the idea of string literals, but the C/C++ way clearly does not
>>> work. The Python/Julia way might, i.e.:
>>>
>>> """this is a
>>> multi-line
>>> literal"""
>>
>> luke-tier...@uiowa.edu:
>>
>> This does look like a promising option; some more careful checking
>> would be needed to make sure there aren't cases where currently
>> working code would be broken.
>
> I don't see how this proposal solves any problem of interest.
>
> String literals can already be as long as you like.  The problem is
> that they will get wrapped around in an editor (or not all be
> visible), destroying the nice formatting of your program.
>
> With the proposed extension, you can write long string literals with
> short lines only if they were long only because they consisted of
> multiple lines.  Getting a string literal that's 79 characters long
> with no newlines (a perfectly good error message, for example) to fit
> in your 80-character-wide editing window would still be impossible.
>
> Furthermore, these Python-style literals have to have their second
> and later lines start at the left edge, destroying the indentation
> of your program (supposing you actually wanted to use one).
>
> In contrast, C-style concatenation (by the parser) of consecutive
> string literals works just fine for what you'd want to do in a
> program.  The only thing they wouldn't do that the Python-style
> literals would do is allow you to put big blocks of literal text in
> your program, without having to put quotes around each line.  But
> shouldn't such text really be stored in a separate file that gets
> read, rather than in the program source?


I agree with most of this, but I still don't see the need for a syntax 
change.  That's a lot of work just to avoid typing "paste0" and some 
commas in


 paste0("this is the first part",
        "this is the second part")

If the rather insignificant amount of time it takes to execute this 
function call really matters (and I'm not convinced of that), then 
shouldn't it be solved by the compiler applying constant folding to 
paste0()?
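
For illustration, here is a minimal sketch of the paste0() approach, compared with a literal that is simply continued on the next source line. The message text and the names `msg`, `one_line`, `wrapped`, `same` and `has_newline` are made up for this example; only base R is assumed.

```r
# Hypothetical long message, written as short source lines.
# paste0() joins its arguments with no separator, so any spaces
# between the pieces must be written inside the strings.
msg <- paste0("this is the first part, ",
              "this is the second part")

# The same text written as one long literal, for comparison.
one_line <- "this is the first part, this is the second part"

# Both are ordinary length-one character vectors with equal contents.
same <- identical(msg, one_line)
print(same)

# By contrast, a literal that is continued on the next line keeps the
# newline and the indentation as part of the string.
wrapped <- "this is the first part,
    this is the second part"
has_newline <- grepl("\n", wrapped, fixed = TRUE)
print(has_newline)
```

The paste0() form keeps the source lines short and indented without changing the resulting string, which is the property the thread is asking for.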


(Some syntax like r"xyz" to make it easier to type strings containing 
backslashes and quotes would actually be useful, but that's a different 
issue.)


Duncan Murdoch



Re: [Rd] R history: Why 'L; in suffix character ‘L’ for integer constants?

2017-06-16 Thread Jim Hester
The relevant sections of the C standard are
http://c0x.coding-guidelines.com/5.2.4.2.1.html, which specifies that C
ints are only guaranteed to be 16 bits, C long ints at least 32 bits in
size, as Peter mentioned. Also http://c0x.coding-guidelines.com/6.4.4.1.html
specifies l or L as the suffix for long int constants.

However R does define integers as `int` in its source code, so use of L is
not strictly correct if a compiler uses 16 bit int types. I guess this
ambiguity is why the `int32_t` typedef exists.

On Fri, Jun 16, 2017 at 3:01 PM, William Dunlap via R-devel <
r-devel@r-project.org> wrote:

> "Writing R Extensions" says "int":
>
> R storage mode   C type           FORTRAN type
> logical          int*             INTEGER
> integer          int*             INTEGER
> double           double*          DOUBLE PRECISION
> complex          Rcomplex*        DOUBLE COMPLEX
> character        char**           CHARACTER*255
> raw              unsigned char*   none
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Jun 16, 2017 at 11:53 AM, peter dalgaard  wrote:
> >
> > Wikipedia claims that C ints are still only guaranteed to be at least 16
> > bits, and longs are at least 32 bits. So no, R's integers are long.
> >
> > -pd
> >
> > > On 16 Jun 2017, at 20:20 , William Dunlap via R-devel <r-devel@r-project.org> wrote:
> > > >
> > > > But R "integers" are C "ints", as opposed to S "integers", which are C
> > > > "long ints".  (I suppose R never had to run on ancient hardware with 16
> > > > bit ints.)
> > >
> > > Bill Dunlap
> > > TIBCO Software
> > > wdunlap tibco.com
> > >
> > > On Fri, Jun 16, 2017 at 10:47 AM, Yihui Xie  wrote:
> > >
> > >> Yeah, that was what I heard from our instructor when I was a graduate
> > >> student: L stands for Long (integer).
> > >>
> > >> Regards,
> > >> Yihui
> > >> --
> > >> https://yihui.name
> > >>
> > >>
> > >> On Fri, Jun 16, 2017 at 11:00 AM, Serguei Sokol <so...@insa-toulouse.fr>
> > >> wrote:
> > >>> On 16/06/2017 at 17:54, Henrik Bengtsson wrote:
> > >>>>
> > >>>> I'm just curious (no complaints), what was the reason for choosing the
> > >>>> letter 'L' as a suffix for integer constants?  Does it stand for
> > >>>> something (literal?), is it because it visually stands out, ..., or no
> > >>>> specific reason at all?
> > >>>
> > >>> My guess is that it is inherited from C "long integer" type (contrary to
> > >>> "short integer" or simply "integer")
> > >>> https://en.wikipedia.org/wiki/C_data_types
> > >>
> >
> > --
> > Peter Dalgaard, Professor,
> > Center for Statistics, Copenhagen Business School
> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > Phone: (+45)38153501
> > Office: A 4.23
> > Email: pd@cbs.dk  Priv: pda...@gmail.com

Re: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines

2017-06-16 Thread peter dalgaard

> On 16 Jun 2017, at 21:17 , Duncan Murdoch  wrote:
> 
> paste0("this is the first part",
>        "this is the second part")
> 
> If the rather insignificant amount of time it takes to execute this function 
> call really matters (and I'm not convinced of that), then shouldn't it be 
> solved by the compiler applying constant folding to paste0()?

And, of course, if it is equivalent to a literal, it can be precomputed. There 
is no point in having it in the middle of a tight loop. 
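
As a rough sketch of that point, the constant string can be built once and then reused inside the loop. The function `warn_many()` and its message text are hypothetical, made up for this example; only base R is assumed.

```r
# Hypothetical helper that produces the same fixed message n times.
warn_many <- function(n) {
  # The message is constant, so it is computed once, before the loop.
  msg <- paste0("value out of range; ",
                "falling back to the default")
  out <- character(n)
  for (i in seq_len(n)) {
    out[i] <- msg   # no paste0() call inside the loop
  }
  out
}

res <- warn_many(3)
print(unique(res))
```

With the call hoisted out of the loop, the cost of paste0() is paid once per call to `warn_many()`, however many iterations follow.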

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] R history: Why 'L; in suffix character ‘L’ for integer constants?

2017-06-16 Thread Prof Brian Ripley

On 16/06/2017 20:37, Jim Hester wrote:
> The relevant sections of the C standard are
> http://c0x.coding-guidelines.com/5.2.4.2.1.html, which specifies that C

There is more than one C standard, but that is none of them.

> ints are only guaranteed to be 16 bits, C long ints at least 32 bits in
> size, as Peter mentioned. Also http://c0x.coding-guidelines.com/6.4.4.1.html
> specifies l or L as the suffix for long int constants.
>
> However R does define integers as `int` in its source code, so use of L is
> not strictly correct if a compiler uses 16 bit int types. I guess this
> ambiguity is why the `int32_t` typedef exists.

However, R checks that the compiler uses 32-bit ints in its build
(configure and src/main/arithmetic.c) and documents that in R-admin.
In any case, the C standard does not apply to the R language.


Also, int32_t

- postdates R (it was introduced in C99, a few OSes having it earlier)
- is optional in the C99 and C11 standards (§7.20.1.1 in C11).
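
Seen from the R side, the 32-bit integer type can be checked with a short sketch like the one below. It uses only base R; the names `int_max` and `overflow` are made up for this example.

```r
# A numeric constant without a suffix is a double; with L it is an integer.
print(typeof(1))    # "double"
print(typeof(1L))   # "integer"

# R integers are 32-bit: the largest representable value is 2^31 - 1.
int_max <- .Machine$integer.max
print(int_max == 2147483647)

# Going past that limit does not wrap around; R returns NA and warns
# (the warning is suppressed here to keep the output short).
overflow <- suppressWarnings(int_max + 1L)
print(is.na(overflow))
```

The L suffix therefore selects R's integer storage mode, whatever the historical origin of the letter.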








--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
