Re: [Rd] duplicated factor labels.
To extend Martin's explanation: in factor(), levels are the unique input values and labels are the unique output values. So the function levels() actually displays the labels.

Cheers
Joris

On 15 Jun 2017 17:15, "Martin Maechler" wrote:

> Paul Johnson on Wed, 14 Jun 2017 19:00:11 -0500 writes:
> Dear R devel,
> I've been wondering about this for a while. I am sorry to ask for your
> time, but can one of you help me understand this?
> This concerns duplicated labels, not levels, in the factor function.
> I think it is hard to understand that factor() fails, but levels()
> after does not:
>
>     > x <- 1:6
>     > xlevels <- 1:6
>     > xlabels <- c(1, NA, NA, 4, 4, 4)
>     > y <- factor(x, levels = xlevels, labels = xlabels)
>     Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
>       factor level [3] is duplicated
>     > y <- factor(x, levels = xlevels)
>     > levels(y) <- xlabels
>     > y
>     [1] 1    <NA> <NA> 4    4    4
>     Levels: 1 4
>
> If the latter use of levels() causes a good, expected result, couldn't
> factor(..., labels = xlabels) be made to do the same thing?

I may misunderstand, but I think you are confusing 'labels' and 'levels' here (and you are not alone in this!), mostly because R's factor() function treats them as arguments in a way that can be confusing. (But I don't think we'd want to change that; it's been documented and in use for > 25 years, in S, S+, and R.)

Note that after the above,

    > dput(y)
    structure(c(1L, NA, NA, 2L, 2L, 2L), .Label = c("1", "4"), class = "factor")

and that of course _is_ a valid factor, which you can easily get directly via e.g.

    > identical(y, factor(c(1,NA,NA,4,4,4)))
    [1] TRUE

or also via

    > identical(y, factor(c("1",NA,NA,"4","4","4")))
    [1] TRUE

I really don't see a need for a change of factor(). It should remain as simple as possible (but not simpler :-).

Martin
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
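A minimal base-R sketch of Martin's point, reusing Paul's variable names: the two-step route through `levels<-` produces exactly the factor that a direct call on the recoded values gives.

```r
x <- 1:6
xlabels <- c(1, NA, NA, 4, 4, 4)

# Step 1: one level per distinct input value
y <- factor(x, levels = 1:6)

# Step 2: `levels<-` merges duplicated new labels; NA labels become NA codes
levels(y) <- xlabels

print(levels(y))      # "1" "4"
print(as.integer(y))  # 1 NA NA 2 2 2

# The same object as a factor built directly from the recoded values
print(identical(y, factor(c(1, NA, NA, 4, 4, 4))))  # TRUE
```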
[Rd] 'ordered' destroyed to 'factor'
Dear all,

I don't know if you consider this a bug or a feature, but it breaks reasonable code: 'unlist' and 'sapply' convert 'ordered' to 'factor' even if all levels are the same. Here is a simple example:

    o <- ordered(letters)
    o[[1]]
    lapply(o, min)[[1]]          # ordered factor
    unlist(lapply(o, min))[[1]]  # no longer ordered
    sapply(o, min)[[1]]          # no longer ordered

Jens Oehlschlägel

P.S.: The above examples are deliberately trivial, for easy reproduction. The current behavior broke my use case, which had a structure like this:

    # have some data
    x <- 1:20
    # apply some function to each element
    somefunc <- function(x){
      # do something and return an ordinal level
      sample(o, 1)
    }
    x <- sapply(x, somefunc)
    # get minimum result
    min(x)
    # Error in Summary.factor(c(2L, 26L), na.rm = FALSE) :
    #   ‘min’ not meaningful for factors

    > version
                   _
    platform       x86_64-pc-linux-gnu
    arch           x86_64
    os             linux-gnu
    system         x86_64, linux-gnu
    status
    major          3
    minor          4.0
    year           2017
    month          04
    day            21
    svn rev        72570
    language       R
    version.string R version 3.4.0 (2017-04-21)
    nickname       You Stupid Darkness
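A minimal sketch of Jens's use case together with the workaround Joris describes later in the thread: re-wrap the simplified result with `ordered()`, taking the levels from the original factor. The `set.seed()` call is only there to make this sketch reproducible.

```r
o <- ordered(letters)

# Jens's pattern: sapply() over calls that each return a length-one ordered factor
set.seed(1)
x <- sapply(1:20, function(i) sample(o, 1))

# The simplified result is a plain factor, which is why min(x) fails
print(class(x))

# Workaround: restore the ordering from the levels of the source factor
x <- ordered(x, levels = levels(o))
print(min(x))
```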
Re: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines
>> I don't think it is reasonable to change the parser this way. This is
>> currently valid R code:
>>
>> a <- "foo"
>> "bar"
>>
>> and with the new syntax, it is also valid, but with a different
>> meaning. Or you can even consider
>>
>> a <- "foo"
>> bar %>% func() %>% print()
>>
>> etc.
>>
>> I like the idea of string literals, but the C/C++ way clearly does not
>> work. The Python/Julia way might, i.e.:
>>
>> """this is a
>> multi-line
>> literal"""
>
> This does look like a promising option; some more careful checking
> would be needed to make sure there aren't cases where currently
> working code would be broken.
>
> Another Python idea worth considering is the raw string notation
> r"xyx" that does not process escape sequences -- this would make
> writing things like regular expressions easier.

If this is something you would consider, we'd be happy to put together a patch for review.

Hadley

--
http://hadley.nz
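The ambiguity raised above can be checked with `parse()`: the two-line snippet already parses as two separate top-level expressions, so a parser that glued adjacent string literals together would silently change its meaning. A small base-R sketch:

```r
# The two-line snippet from the thread, as source text
src <- 'a <- "foo"\n"bar"'

# Today R parses this as two separate top-level expressions
exprs <- parse(text = src)
print(length(exprs))  # 2

# Evaluating both: `a` keeps "foo"; the bare "bar" is a separate value
env <- new.env()
for (e in exprs) eval(e, env)
print(get("a", envir = env))  # "foo"
```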
Re: [Rd] 'ordered' destroyed to 'factor'
Hi,

It's been my experience that when you combine or aggregate vectors of factors using a function, you should be prepared for surprises, as it's not obvious what the "right" way to combine factors is (ordered or not), especially if two vectors of factors have different levels or (if ordered) are ordered in a different way.

For instance, what would you expect to get from unlist() if each element of the list had different levels, or were both ordered, but in a different way, or if some elements of the list were factors and others were ordered factors?

    > unlist(list(ordered(c("a","b")), ordered(c("b","a"))))
    [1] ?

Honestly, my biggest surprise from your question was that unlist even returned a factor at all. For example, the c() function just converts factors to integers:

    > c(ordered(c("a","b")), ordered(c("a","b")))
    [1] 1 2 1 2

And here's one that's especially weird. When you rbind() data frames with an ordered factor, you still get an ordered factor back, but the order may be different from either of the original orders:

    > x1 <- data.frame(a=ordered(c("b","c")))
    > x2 <- data.frame(a=ordered(c("a","b","c")))
    > str(rbind(x1,x2)) # Note b < a
    'data.frame':   5 obs. of  1 variable:
     $ a: Ord.factor w/ 3 levels "b"<"c"<"a": 1 2 3 1 2

Should rbind just have returned an integer like c(), or returned a factor like unlist(), or should it have kept the result as an ordered factor, but ordered the result in a different way? I have no idea.

So in short, IMO, there are definitely inconsistencies in how ordered/factors are handled across functions, but I think it would be hard to point to any single function and say it is wrong or needs to be changed. My best advice is to just be careful when combining or aggregating factors.
--Robert

-----Original Message-----
From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of "Jens Oehlschlägel"
Sent: Friday, June 16, 2017 9:04 AM
To: r-devel@r-project.org
Cc: jens.oehlschlae...@truecluster.com
Subject: [Rd] 'ordered' destroyed to 'factor'
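One way to follow Robert's advice is to fix the levels explicitly before combining. A sketch, assuming both data frames are meant to share the order a < b < c: when every piece carries the same ordered levels, the union of levels is unambiguous and rbind() keeps that order instead of the "b" < "c" < "a" seen in the example above.

```r
lv <- c("a", "b", "c")  # the intended order a < b < c

# Give both columns the full, explicitly ordered level set up front
x1 <- data.frame(a = ordered(c("b", "c"), levels = lv))
x2 <- data.frame(a = ordered(c("a", "b", "c"), levels = lv))

r <- rbind(x1, x2)
str(r)
```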
Re: [Rd] 'ordered' destroyed to 'factor'
This can be traced back to the following line in unlist():

    structure(res, levels = lv, names = nm, class = "factor")

The Details section of ?unlist states specifically how it treats factors, so this is documented and expected behaviour. It is also the appropriate behaviour. In your case one could argue that unlist should maintain the order, as there's only a single factor. However, the moment you have 2 ordered factors, there's no guarantee that the levels are the same, or even in the same order. Hence it is impossible to determine what the correct order should be. For this reason, the only logical object to return for a list of factors is an unordered factor.

In your use case (a list of factors with identical ordered levels) the solution is one extra step:

    x <- list(
      factor(c("a","b"), levels = c("a","b","c"), ordered = TRUE),
      factor(c("b","c"), levels = c("a","b","c"), ordered = TRUE)
    )
    res <- sapply(x, min)
    res <- ordered(res, levels = levels(res))
    min(res)

I hope this explains it.

Cheers
Joris

On Fri, Jun 16, 2017 at 3:03 PM, "Jens Oehlschlägel" <jens.oehlschlae...@truecluster.com> wrote:
> Dear all,
>
> I don't know if you consider this a bug or feature, but it breaks
> reasonable code: 'unlist' and 'sapply' convert 'ordered' to 'factor' even
> if all levels are equal.

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 (0)9 264 61 79
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
[Rd] R history: Why 'L' in suffix character ‘L’ for integer constants?
I'm just curious (no complaints), what was the reason for choosing the letter 'L' as a suffix for integer constants? Does it stand for something (literal?), is it because it visually stands out, ..., or no specific reason at all?

/Henrik
Re: [Rd] R history: Why 'L' in suffix character ‘L’ for integer constants?
On 16/06/2017 at 17:54, Henrik Bengtsson wrote:
> I'm just curious (no complaints), what was the reason for choosing the
> letter 'L' as a suffix for integer constants? Does it stand for
> something (literal?), is it because it visually stands out, ..., or no
> specific reason at all?

My guess is that it is inherited from the C "long integer" type (as opposed to "short integer" or plain "integer"): https://en.wikipedia.org/wiki/C_data_types
Re: [Rd] duplicated factor labels.
On Fri, Jun 16, 2017 at 2:35 AM, Joris Meys wrote:
> To extend Martin's explanation:
>
> In factor(), levels are the unique input values and labels the unique output
> values. So the function levels() actually displays the labels.

Dear Joris

I think we agree. Currently, factor insists both levels and labels be unique. I wish that it would accept nonunique labels. I also understand it is impractical to change this now in base R.

I don't think I succeeded in explaining why this would be nicer. Here's another example. Fairly often, we see input data like

    x <- c("Male", "Man", "male", "Man", "Female")

The first four represent the same value. I'd like to go in one step to a new factor variable with enumerated types "Male" and "Female". This fails:

    xf <- factor(x, levels = c("Male", "Man", "male", "Female"),
                 labels = c("Male", "Male", "Male", "Female"))

Instead, we need 2 steps:

    xf <- factor(x, levels = c("Male", "Man", "male", "Female"))
    levels(xf) <- c("Male", "Male", "Male", "Female")

I think it is quirky that `levels<-.factor` allows the duplicated labels, whereas factor does not.

I wrote a function rockchalk::combineLevels to simplify combining levels, but most of the students here like plyr::mapvalues to do it. The use of levels() can be tricky because one must enumerate all values, not just the ones being changed.

But I do understand Martin's point. It's been this way for 25 years; it won't change. :)

> Cheers
> Joris

--
Paul E. Johnson   http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis   http://crmda.ku.edu

To write to me directly, please address me at pauljohn at ku.edu.
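Paul's two-step recipe as a runnable base-R sketch, next to the named-list form of `levels<-` described in ?levels, which maps each new level to the old levels it absorbs. (According to the NEWS for R 3.5.0, factor() itself later began accepting duplicated labels; both forms below also work in earlier versions.)

```r
x <- c("Male", "Man", "male", "Man", "Female")

# Step 1: list every spelling that occurs, in a fixed order
xf <- factor(x, levels = c("Male", "Man", "male", "Female"))

# Step 2a: positional relabelling; duplicated labels are merged
levels(xf) <- c("Male", "Male", "Male", "Female")
print(levels(xf))  # "Male" "Female"

# Step 2b (alternative): a named list maps each new level to its old levels
xg <- factor(x, levels = c("Male", "Man", "male", "Female"))
levels(xg) <- list(Male = c("Male", "Man", "male"), Female = "Female")
print(identical(xf, xg))  # TRUE
```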
Re: [Rd] duplicated factor labels.
Hi Paul,

Now I see what you're getting at. I misread your original mail completely. So we definitely agree, and wholeheartedly even. The use case you just gave is definitely in my top 5 of frustrations about R. I would like to be able to assign the same label to multiple levels without having to use e.g. dplyr::recode_factor() or some other vectorized switch statement to recode all data first.

I understand "it's been like that 25 years", but I've looked hard for a use case where adding this behaviour would invalidate existing code and couldn't come up with one. So I add my (totally insignificant) vote for adding the possibility of assigning the same label to multiple levels in factor() itself.

Cheers and thank you for bringing this up!

On Fri, Jun 16, 2017 at 6:02 PM, Paul Johnson wrote:
> I think it is quirky that `levels<-.factor` allows the duplicated
> labels, whereas factor does not.

--
Joris Meys
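Paul notes that `levels<-` forces you to spell out every level, not just the ones being changed. A small base-R wrapper can hide that. `relabel()` below is a hypothetical name used only for illustration; it is not a function in base R or in the packages mentioned in the thread.

```r
# Hypothetical helper (not part of base R): rename only the levels named in
# `map` (old = "new"); other levels are left alone, and levels that end up
# with the same new name are merged by `levels<-`. Names in `map` that are
# not current levels are ignored.
relabel <- function(f, map) {
  lv <- levels(f)
  hit <- lv %in% names(map)
  lv[hit] <- map[lv[hit]]
  levels(f) <- lv
  f
}

x <- c("Male", "Man", "male", "Man", "Female")
g <- relabel(factor(x), c(Man = "Male", male = "Male"))
print(table(g))
```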
Re: [Rd] 'ordered' destroyed to 'factor'
> On 16 Jun 2017, at 15:59 , Robert McGehee wrote:
>
> For instance, what would you expect to get from unlist() if each element of
> the list had different levels, or were both ordered, but in a different way,
> or if some elements of the list were factors and others were ordered factors?
>> unlist(list(ordered(c("a","b")), ordered(c("b","a"))))
> [1] ?

Those actually have the same levels in the same order: a < b

Possibly, this brings the point home more clearly:

    unlist(list(ordered(c("a","c")), ordered(c("b","d"))))

(Notice that alphabetical order is largely irrelevant, so all of these level orderings are equally possible:

    a < c < b < d
    a < b < c < d
    a < b < d < c
    b < a < c < d
    b < a < d < c
    b < d < a < c
)

-pd

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com
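Peter's example suggests a conservative rule for combining ordered factors: accept only the unambiguous case and refuse otherwise. A minimal sketch; `combine_ordered()` is a hypothetical helper written for illustration, not a base R function.

```r
# Hypothetical helper: combine a list of ordered factors, but only when
# every element is ordered with exactly the same levels, so the result's
# order is unambiguous; otherwise refuse instead of guessing.
combine_ordered <- function(lst) {
  lv <- levels(lst[[1]])
  same <- vapply(lst, function(f) is.ordered(f) && identical(levels(f), lv),
                 logical(1))
  if (!all(same))
    stop("all elements must be ordered factors with identical levels")
  ordered(unlist(lapply(lst, as.character)), levels = lv)
}

# Same levels in the same order (a < b): the ordering survives
ok <- combine_ordered(list(ordered(c("a", "b")), ordered(c("b", "a"))))
print(ok)

# Peter's example: levels {a, c} and {b, d} admit six interleavings, so refuse
res <- tryCatch(
  combine_ordered(list(ordered(c("a", "c")), ordered(c("b", "d")))),
  error = function(e) conditionMessage(e)
)
print(res)
```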
Re: [Rd] R history: Why 'L' in suffix character ‘L’ for integer constants?
Yeah, that was what I heard from our instructor when I was a graduate student: L stands for Long (integer).

Regards,
Yihui
--
https://yihui.name

On Fri, Jun 16, 2017 at 11:00 AM, Serguei Sokol wrote:
> My guess is that it is inherited from the C "long integer" type (as opposed to
> "short integer" or plain "integer")
> https://en.wikipedia.org/wiki/C_data_types
Re: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines
> On Wed, 14 Jun 2017, Gábor Csárdi wrote:
>
> > I like the idea of string literals, but the C/C++ way clearly does not
> > work. The Python/Julia way might, i.e.:
> >
> > """this is a
> > multi-line
> > literal"""
>
> luke-tier...@uiowa.edu:
> This does look like a promising option; some more careful checking
> would be needed to make sure there aren't cases where currently
> working code would be broken.

I don't see how this proposal solves any problem of interest.

String literals can already be as long as you like. The problem is that they will get wrapped around in an editor (or not all be visible), destroying the nice formatting of your program.

With the proposed extension, you can write long string literals with short lines only if they were long only because they consisted of multiple lines. Getting a string literal that's 79 characters long with no newlines (a perfectly good error message, for example) to fit in your 80-character-wide editing window would still be impossible.

Furthermore, these Python-style literals have to have their second and later lines start at the left edge, destroying the indentation of your program (supposing you actually wanted to use one).

In contrast, C-style concatenation (by the parser) of consecutive string literals works just fine for what you'd want to do in a program. The only thing they wouldn't do that the Python-style literals would do is allow you to put big blocks of literal text in your program, without having to put quotes around each line. But shouldn't such text really be stored in a separate file that gets read, rather than in the program source?

Radford Neal
Re: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines
On Fri, Jun 16, 2017 at 7:04 PM, Radford Neal wrote:
> I don't see how this proposal solves any problem of interest.
>
> String literals can already be as long as you like. The problem is
> that they will get wrapped around in an editor (or not all be
> visible), destroying the nice formatting of your program.

From the Python docs:

String literals can span multiple lines. One way is using triple-quotes: """...""" or '''...'''. End of lines are automatically included in the string, but it’s possible to prevent this by adding a \ at the end of the line. [...]

Gabor
Re: [Rd] R history: Why 'L' in suffix character ‘L’ for integer constants?
But R "integers" are C "ints", as opposed to S "integers", which are C "long ints". (I suppose R never had to run on ancient hardware with 16-bit ints.)

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Jun 16, 2017 at 10:47 AM, Yihui Xie wrote:
> Yeah, that was what I heard from our instructor when I was a graduate
> student: L stands for Long (integer).
Re: [Rd] R history: Why 'L' in suffix character ‘L’ for integer constants?
Wikipedia claims that C ints are still only guaranteed to be at least 16 bits, and longs are at least 32 bits. So no, R's integers are long.

-pd

> On 16 Jun 2017, at 20:20 , William Dunlap via R-devel wrote:
>
> But R "integers" are C "ints", as opposed to S "integers", which are C
> "long ints". (I suppose R never had to run on ancient hardware with 16 bit
> ints.)
Re: [Rd] R history: Why 'L' in suffix character ‘L’ for integer constants?
"Writing R Extensions" says "int":

    R storage mode   C type            FORTRAN type
    logical          int*              INTEGER
    integer          int*              INTEGER
    double           double*           DOUBLE PRECISION
    complex          Rcomplex*         DOUBLE COMPLEX
    character        char**            CHARACTER*255
    raw              unsigned char*    none

Bill Dunlap

On Fri, Jun 16, 2017 at 11:53 AM, peter dalgaard wrote:
>
> Wikipedia claims that C ints are still only guaranteed to be at least 16 bits, and longs are at least 32 bits. So no, R's integers are long.
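Whatever the letter originally stood for, the width of R's integer type can be checked from R itself; ?integer notes that current implementations of R use 32-bit integers for integer vectors. A short base-R sketch:

```r
# Integer constants need the L suffix; plain numeric literals are doubles
print(typeof(1L))  # "integer"
print(typeof(1))   # "double"

# Largest representable integer: 2^31 - 1, i.e. a 32-bit signed type
print(.Machine$integer.max)  # 2147483647

# Overflow does not wrap around: the result is NA, with a warning
res <- suppressWarnings(.Machine$integer.max + 1L)
print(res)  # NA
```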
Re: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines
On Fri, Jun 16, 2017 at 1:14 PM, Gábor Csárdi wrote:
> From the Python docs:
>
> String literals can span multiple lines. One way is using
> triple-quotes: """...""" or '''...'''. End of lines are automatically
> included in the string, but it’s possible to prevent this by adding a
> \ at the end of the line.

And additionally, in Julia triple quoted strings:

Trailing whitespace is left unaltered. They can contain " symbols without escaping. Triple-quoted strings are also dedented to the level of the least-indented line. This is useful for defining strings within code that is indented. For example:

Hadley
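Ordinary R string literals can already span several source lines; what they lack is the Julia-style dedenting described above. A rough sketch of that behaviour as a plain function; `dedent()` is a hypothetical helper, not part of base R, and it counts a tab as a single column.

```r
# Hypothetical helper: remove the indentation shared by all non-blank lines,
# roughly what Julia does for triple-quoted strings.
dedent <- function(s) {
  lines <- strsplit(s, "\n", fixed = TRUE)[[1]]
  nonblank <- lines[grepl("[^[:space:]]", lines)]
  if (length(nonblank) == 0L) return(s)
  # width of the leading run of spaces/tabs on each non-blank line
  indent <- min(nchar(sub("^([ \t]*).*$", "\\1", nonblank)))
  paste(substring(lines, indent + 1L), collapse = "\n")
}

# An ordinary R string literal may already contain newlines
msg <- "
    first line
      indented line
    last line"
cat(dedent(msg), "\n")
```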
Re: [Rd] Simplify and By Convert Factors To Numeric Values
On Fri, 16 Jun 2017, Dario Strbenac wrote:

> Good day,
>
> It's not described anywhere in the help page, but tapply and by functions
> will, by default, convert factors into numeric values. Perhaps this needs to
> be documented or the behaviour changed.

It *is* described in the help page.

> This returns a list of objects and each object has class "factor"
>
>     tapply(rep(1:2,2), rep(1:2,2), function(x) factor(LETTERS[x], levels = LETTERS))
>
> and this
>
>     tapply(1:3, 1:3, function(x) factor(LETTERS[x], levels = LETTERS))
>     1 2 3
>     1 2 3
>
> returns a vector object with no class. The documentation states "... tapply
> returns a multi-way array containing the values ..." but doesn't mention
> anything about converting factors into integers. I'd expect the values to be
> of the same type.

and also states

"If FUN returns a single atomic value for each such cell ... and when simplify is TRUE ... if the return value has a class (e.g., an object of class "Date") the class is discarded."

which is what just happened in your example.

Maybe you want:

    unlist(tapply(1:3, 1:3, function(x) factor(LETTERS[x], levels = LETTERS),
                  simplify = FALSE))

Trying to preserve class worked here in a way you might have hoped/expected, but might lead to difficulties in other uses.

HTH,

Chuck
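Dario's call and Chuck's suggested fix side by side, as a short base-R sketch (`f` is just a local name for the anonymous function used in the thread).

```r
f <- function(x) factor(LETTERS[x], levels = LETTERS)

# Default simplify = TRUE: each cell is a length-one factor, so tapply()
# builds an atomic array and, as ?tapply says, the class is discarded
d <- tapply(1:3, 1:3, f)
print(class(d))

# Chuck's suggestion: keep the list of factors, then let unlist() combine them
r <- unlist(tapply(1:3, 1:3, f, simplify = FALSE))
print(r)
```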
Re: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines
On 16/06/2017 2:04 PM, Radford Neal wrote:
> In contrast, C-style concatenation (by the parser) of consecutive string
> literals works just fine for what you'd want to do in a program. The only
> thing they wouldn't do that the Python-style literals would do is allow you
> to put big blocks of literal text in your program, without having to put
> quotes around each line. But shouldn't such text really be stored in a
> separate file that gets read, rather than in the program source?

I agree with most of this, but I still don't see the need for a syntax change. That's a lot of work just to avoid typing "paste0" and some commas in

    paste0("this is the first part",
           "this is the second part")

If the rather insignificant amount of time it takes to execute this function call really matters (and I'm not convinced of that), then shouldn't it be solved by the compiler applying constant folding to paste0()?

(Some syntax like r"xyz" to make it easier to type strings containing backslashes and quotes would actually be useful, but that's a different issue.)

Duncan Murdoch
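Duncan's alternative in runnable form, assuming the goal is simply to keep source lines short. The `stop()` case speaks to Radford's long-error-message example: ?stop documents that its arguments are pasted together with no separator, so no paste0() is needed there.

```r
# Split a long string across source lines; paste0() joins the pieces
msg <- paste0("this is the first part, ",
              "this is the second part")
print(msg)

# stop() pastes its arguments with no separator, so a long error
# message can be split the same way without calling paste0()
res <- tryCatch(
  stop("this error message is long enough that it ",
       "does not fit comfortably on one source line"),
  error = conditionMessage
)
print(res)
```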
Re: [Rd] R history: Why 'L' in suffix character ‘L’ for integer constants?
The relevant sections of the C standard are http://c0x.coding-guidelines.com/5.2.4.2.1.html, which specifies that C ints are only guaranteed to be at least 16 bits and C long ints at least 32 bits in size, as Peter mentioned, and http://c0x.coding-guidelines.com/6.4.4.1.html, which specifies l or L as the suffix for long int constants.

However, R does define integers as `int` in its source code, so use of L is not strictly correct if a compiler uses 16-bit int types. I guess this ambiguity is why the `int32_t` typedef exists.

On Fri, Jun 16, 2017 at 3:01 PM, William Dunlap via R-devel <r-devel@r-project.org> wrote:
> "Writing R Extensions" says "int":
Re: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines
> On 16 Jun 2017, at 21:17, Duncan Murdoch wrote:
>
> paste0("this is the first part",
>        "this is the second part")
>
> If the rather insignificant amount of time it takes to execute this function
> call really matters (and I'm not convinced of that), then shouldn't it be
> solved by the compiler applying constant folding to paste0()?

And, of course, if it is equivalent to a literal, it can be precomputed. There is no point in having it in the middle of a tight loop.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com
Re: [Rd] R history: Why 'L' in suffix character ‘L’ for integer constants?
On 16/06/2017 20:37, Jim Hester wrote:
> The relevant sections of the C standard are
> http://c0x.coding-guidelines.com/5.2.4.2.1.html, which specifies that C

There is more than one C standard, but that is none of them.

> ints are only guaranteed to be 16 bits, C long ints at least 32 bits in
> size, as Peter mentioned. Also, http://c0x.coding-guidelines.com/6.4.4.1.html
> specifies l or L as the suffix for long int constants. However, R does
> define integers as `int` in its source code, so use of L is not strictly
> correct if a compiler uses 16-bit int types. I guess this ambiguity is why
> the `int32_t` typedef exists.

However, R checks that the compiler uses 32-bit ints in its build (configure and src/main/arithmetic.c) and documents that in R-admin. In any case, the C standard does not apply to the R language.

Also, int32_t
- postdates R (it was introduced in C99, a few OSes having it earlier), and
- is optional in the C99 and C11 standards (§7.20.1.1 in C11).

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford