[Rd] possible internal (un)tar bug
This is a not too old R-devel on Linux, it already fails in R 3.4.4, and on macOS as well.

The tar file seems valid, external tar can untar it, so maybe an untar() bug.

setwd(tempdir())
dir.create("pkg")
cat("foobar\n", file = file.path("pkg", "NAMESPACE"))
cat("this: that\n", file = file.path("pkg", "DESCRIPTION"))
tar("pkg_1.0.tar.gz", "pkg", compression = "gzip", tar = "internal")
unlink("pkg", recursive = TRUE)

con <- file("pkg_1.0.tar.gz", open = "rb")
ex <- tempfile()
untar(con, files = "pkg/DESCRIPTION", exdir = ex)
#> Error in untar2(tarfile, files, list, exdir, restore_times) :
#>   incomplete block on file

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] issue with model.frame()
A user sent me an example where coxph fails, and the root of the failure is a case where names(mf) is not equal to the term.labels attribute of the formula -- the latter has an extraneous newline. Here is an example that does not use the survival library.

# first create a data set with many long names
n <- 30  # number of rows for the dummy data set
vname <- vector("character", 26)
for (i in 1:26) vname[i] <- paste(rep(letters[1:i],2), collapse='')  # long variable names

tdata <- data.frame(y=1:n, matrix(runif(n*26), nrow=n))
names(tdata) <- c('y', vname)

# Use it in a formula
myform <- paste("y ~ cbind(", paste(vname, collapse=", "), ")")
mf <- model.frame(formula(myform), data=tdata)

match(attr(terms(mf), "term.labels"), names(mf))   # gives NA

In the user's case the function is ridge(x1, x2, ) rather than cbind, but the effect is the same.
Any ideas for a work around?

Aside: the ridge() function is very simple, it was added as an example to show how a user can add their own penalization to coxph. I never expected serious use of it. For this particular user the best answer is to use glmnet instead. He/she is trying to apply an L2 penalty to a large number of SNP * covariate interactions.

Terry T.
Re: [Rd] possible internal (un)tar bug
Gábor Csárdi on Tue, 1 May 2018 12:05:32 + writes:

> This is a not too old R-devel on Linux, it already fails
> in R 3.4.4, and on macOS as well.

and fails in considerably older R versions, too.

Basically untar() seems to fail on a connection, but works fine on a plain file name.

This is a bug --> Thank you for the report, Gábor!

I'm investigating.
Martin

--- my version of your reprex:

setwd(tempdir())
dir.create("pkg")
cat("this: that\n", file = file.path("pkg", "DESCRIPTION"))
tf <- "pkg_1.0.tar.gz"
tar(tf, "pkg", compression = "gzip", tar = "internal")
unlink("pkg", recursive = TRUE)

## MM: tar *file* is good
stopifnot(identical(untar(tf, list=TRUE), "pkg/DESCRIPTION"))
untar(tf, files = (f <- "pkg/DESCRIPTION"))  # no problem
stopifnot(file.exists(f))
unlink("pkg", recursive = TRUE)

## Now with a connection -- "nothing works":
con <- file(tf, open = "rb"); try( untar(con, list = TRUE) ) ## -> Error
con <- file(tf, open = "rb"); try( untar(con, files = "pkg/DESCRIPTION") )
## The error message is the same in both cases:
' Error in untar2(tarfile, files, list, exdir, restore_times) :
  incomplete block on file
'
Re: [Rd] possible internal (un)tar bug
Martin Maechler on Tue, 1 May 2018 16:14:43 +0200 writes:

> Gábor Csárdi on Tue, 1 May 2018 12:05:32 + writes:
>> This is a not too old R-devel on Linux, it already fails
>> in R 3.4.4, and on macOS as well.
> and fails in considerably older R versions, too.
> Basically untar() seems to fail on a connection, but works
> fine on a plain file name.

Well, there's an easy workaround:

If you want to use a connection (instead of a simple filename) with untar() and want to use compression (as in the example), you can currently do that easily when you ensure the connection is a "gzcon" one:

##=> Workaround for now:

## Create :
setwd(tempdir()) ; dir.create("pkg")
cat("this: that\n", file = file.path("pkg", "DESCRIPTION"))
tf <- "pkg_1.0.tar.gz"
tar(tf, "pkg", compression = "gzip", tar = "internal")
unlink("pkg", recursive = TRUE)

## As it is a compressed tar file, use it via a gzcon() connection,
## and both cases work fine:
con <- gzcon(file(tf, open = "rb")) ; (f <- untar(con, list = TRUE))
## ~
con <- gzcon(file(tf, open = "rb")) ; untar(con, files = f)
stopifnot(identical(f, "pkg/DESCRIPTION"),
          file.exists(f))
unlink(c(tf,"pkg"), recursive = TRUE) # clean after me

Of course, ideally untar() should do that for us and I'm testing a simple patch to do that.

Martin
Re: [Rd] issue with model.frame()
> On May 1, 2018, at 6:11 AM, Therneau, Terry M., Ph.D. via R-devel wrote:
>
> A user sent me an example where coxph fails, and the root of the failure is a
> case where names(mf) is not equal to the term.labels attribute of the formula
> -- the latter has an extraneous newline. Here is an example that does not use
> the survival library.
>
> # first create a data set with many long names
> n <- 30  # number of rows for the dummy data set
> vname <- vector("character", 26)
> for (i in 1:26) vname[i] <- paste(rep(letters[1:i],2), collapse='')  # long variable names
>
> tdata <- data.frame(y=1:n, matrix(runif(n*26), nrow=n))
> names(tdata) <- c('y', vname)
>
> # Use it in a formula
> myform <- paste("y ~ cbind(", paste(vname, collapse=", "), ")")
> mf <- model.frame(formula(myform), data=tdata)
>
> match(attr(terms(mf), "term.labels"), names(mf))   # gives NA
>
> In the user's case the function is ridge(x1, x2, ) rather than cbind, but
> the effect is the same.
> Any ideas for a work around?

Maybe add a `yourclass' class to mf and dispatch to a model.frame.yourclass method where the width cutoff arg here (around lines 57-58 of model.frame.default) is made larger:

varnames <- sapply(vars, function(x) paste(deparse(x, width.cutoff = 500),
    collapse = " "))[-1L]

??

> Aside: the ridge() function is very simple, it was added as an example to
> show how a user can add their own penalization to coxph. I never expected
> serious use of it. For this particular user the best answer is to use glmnet
> instead. He/she is trying to apply an L2 penalty to a large number of SNP *
> covariate interactions.
>
> Terry T.

HTH,

Chuck
Re: [Rd] possible internal (un)tar bug
TLDR: Use gzfile(), not file() .. and you have no problems.

Martin Maechler on Tue, 1 May 2018 16:39:57 +0200 writes:

> Well, there's an easy workaround: If you want to use a
> connection (instead of a simple filename) with untar() and want
> to use compression (as in the example), you
> can currently do that easily when you ensure the connection is
> a "gzcon" one :
>
> con <- gzcon(file(tf, open = "rb")) ; (f <- untar(con, list = TRUE))
> con <- gzcon(file(tf, open = "rb")) ; untar(con, files = f)

Actually, much better than gzcon(file()) is gzfile().

The latter works for all compression types that are supported by tar(), not just for gzip compression.

In the end, I'd conclude for now that the bug is mostly in the documentation and the unhelpful error message.

We could try to "fix" your use case by wrapping the connection by gzcon(.), and that is okay also for uncompressed tar files. However, it fails for the newer compression schemes, which are all supported via gzfile().

I propose to commit the following change:

1) change the documentation of untar() to say that a connection to a compressed tar file should be created by gzfile().

2) in the case of a connection which gave the "block error", the error would newly be more helpful, mentioning gzfile(). Currently:

> con <- file(tf, open = "rb"); try( untar(con, list = TRUE) ) ## -> Error
Error in untar2(tarfile, files, list, exdir, restore_times) :
  incomplete block: rather use gzfile(.) created connection?
>

Feedback (by anyone) ??

Martin
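A minimal sketch of the gzfile() advice above, reusing the toy archive from the reprex. The scratch directory `ex` and the result names `listing` and `extracted` are illustrative only. The sketch assumes that untar() leaves a caller-supplied connection open, so it closes the connection itself and tolerates the case where it is already closed.

```r
# Build the same small gzip-compressed archive as in the reprex.
old <- setwd(tempdir())
dir.create("pkg")
cat("this: that\n", file = file.path("pkg", "DESCRIPTION"))
tf <- "pkg_1.0.tar.gz"
tar(tf, "pkg", compression = "gzip", tar = "internal")
unlink("pkg", recursive = TRUE)

# List the archive through a gzfile() connection instead of file().
con <- gzfile(tf, open = "rb")
listing <- untar(con, list = TRUE)
try(close(con), silent = TRUE)  # assumption: untar() did not close it for us

# Extract one member into a scratch directory, again through gzfile().
ex <- tempfile()
dir.create(ex)
con <- gzfile(tf, open = "rb")
untar(con, files = "pkg/DESCRIPTION", exdir = ex)
try(close(con), silent = TRUE)

extracted <- file.exists(file.path(ex, "pkg", "DESCRIPTION"))

# Clean up and restore the working directory.
unlink(c(tf, ex), recursive = TRUE)
setwd(old)
```

A fresh connection is opened for each untar() call because reading consumes the connection, which is also why the reprex reopens `con` every time.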
Re: [Rd] debugonce() functions are not considered as debugged
Gabor,

Others can speak to the origins of this more directly, but from what I recall this has been true at least since I was working in this space on the debugcall stuff a couple of years ago. I imagine the reasoning is what you would expect: a single bit of course can't tell R both that a function is debugged AND that it should undebug after the first call. I don't know of any R-facing way to check for debugonce status, though it's possible I missed it.

That said, it would be possible to alter how the two bits are used so that debugonce sets both of them, and debug (not once) only sets one, rather than them being treated as mutually exclusive. This would alter the behavior so that debugonce'ed functions that haven't been called yet are considered debugged, e.g., by isdebugged.

This would not, strictly speaking, be backwards compatible, but by the very nature of what debugging means, it would not break any existing script code. It could, and likely would, affect code implementing GUIs, however.

R-core - is this a patch that you are interested in and would consider incorporating? If so I can volunteer to work on it.

Best,
~G

On Sat, Apr 28, 2018 at 4:57 AM, Gábor Csárdi wrote:

> debugonce() sets a different flag (RSTEP), and this is not queried by
> isdebugged(), and it is also not unset by undebug().
>
> Is this expected? If yes, is there a way to query and unset the RSTEP flag
> from R code?
>
> ❯ f <- function() { }
> ❯ debugonce(f)
> ❯ isdebugged(f)
> [1] FALSE
>
> ❯ undebug(f)
> Warning message:
> In undebug(f) : argument is not being debugged
>
> ❯ f()
> debugging in: f()
> debug at #1: {
> }
> Browse[2]>

--
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research
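The asymmetry discussed here can be probed with a short sketch. It only queries the flags and never calls `f`, so no browser is entered. The variable names are illustrative, and the value seen after debugonce() is whatever the running R version reports; the thread reports FALSE for R 3.4.x and R-devel of that time.

```r
f <- function() NULL

# debug() sets the flag that isdebugged() queries, and undebug() clears it.
debug(f)
flag_after_debug <- isdebugged(f)
undebug(f)
flag_after_undebug <- isdebugged(f)

# debugonce() is reported in the thread to set a different flag,
# so isdebugged() may not see it; print rather than assume the value.
debugonce(f)
flag_after_debugonce <- isdebugged(f)
print(flag_after_debugonce)
```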
Re: [Rd] [EXTERNAL] Re: issue with model.frame()
Great catch. I'm very reluctant to use my own model.frame, since that locks me into tracking all the base R changes, potentially breaking survival in a bad way if I miss one. But, this shows me clearly what the issue is and will allow me to think about it.

Another solution for the user is to use multiple ridge() calls to break it up; since he/she was using a fixed tuning parameter the result is the same.

Terry T.

On 05/01/2018 11:43 AM, Berry, Charles wrote:

> Maybe add a `yourclass' class to mf and dispatch to a model.frame.yourclass
> method where the width cutoff arg here (around lines 57-58 of
> model.frame.default) is made larger:
>
> varnames <- sapply(vars, function(x) paste(deparse(x, width.cutoff = 500),
>     collapse = " "))[-1L]
>
> ??
>
> HTH,
>
> Chuck
Re: [Rd] issue with model.frame()
You run into the same problem when using 'non-syntactical' names:

> mfB <- model.frame(y ~ `Temp(C)` + `Pres(mb)`, data=data.frame(check.names=FALSE, y=1:10, `Temp(C)`=21:30, `Pres(mb)`=991:1000))
> match(attr(terms(mfB), "term.labels"), names(mfB)) # gives NA's
[1] NA NA
> attr(terms(mfB), "term.labels")
[1] "`Temp(C)`"  "`Pres(mb)`"
> names(mfB)
[1] "y"        "Temp(C)"  "Pres(mb)"

Note that names(mfB) does not give a hint as to whether they represent R expressions or not (in this case they do not). When they do represent R expressions then one could parse() them and compare them to as.list(attr(terms(mfB), "variables"))[-1].

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, May 1, 2018 at 6:11 AM, Therneau, Terry M., Ph.D. via R-devel <r-devel@r-project.org> wrote:

> A user sent me an example where coxph fails, and the root of the failure
> is a case where names(mf) is not equal to the term.labels attribute of the
> formula -- the latter has an extraneous newline. Here is an example that
> does not use the survival library.
> [...]
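The parse-and-compare idea for non-syntactic names can be sketched as follows. `label_to_column()` is a hypothetical helper written for this illustration, not part of base R, and it only handles labels that correspond to a single entry of the "variables" attribute (no interactions). It assumes the term labels carry backticks, as in the output shown in this message.

```r
d <- data.frame(check.names = FALSE, y = 1:10,
                `Temp(C)` = 21:30, `Pres(mb)` = 991:1000)
mfB <- model.frame(y ~ `Temp(C)` + `Pres(mb)`, data = d)

labels <- attr(terms(mfB), "term.labels")
# Drop the leading `list` symbol; the rest line up with the columns of mfB.
vars <- as.list(attr(terms(mfB), "variables"))[-1L]

# Hypothetical helper: parse a term label and find the matching variable.
label_to_column <- function(label) {
  expr <- parse(text = label, keep.source = FALSE)[[1L]]
  hit <- which(vapply(vars, identical, logical(1), expr))
  if (length(hit)) hit[1L] else NA_integer_
}

idx <- vapply(labels, label_to_column, integer(1))
matched <- names(mfB)[idx]
print(matched)
```

Comparing language objects avoids depending on how either side happens to be deparsed, which is where the backtick mismatch comes from.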
Re: [Rd] [EXTERNAL] Re: issue with model.frame()
Unfortunately, I spoke too soon. model.frame calls

formula <- terms(formula, data = data)

if formula does not inherit from class "terms", as in your case.

And that is where the bad term.labels attribute comes from.

So, the fix I suggested won't work.

But maybe you can just supply a terms object to model.frame that has correct term.labels.

Chuck

> On May 1, 2018, at 10:55 AM, Therneau, Terry M., Ph.D. via R-devel wrote:
>
> Great catch. I'm very reluctant to use my own model.frame, since that locks
> me into tracking all the base R changes, potentially breaking survival in a
> bad way if I miss one.
[Rd] source(echo = TRUE) with a iso-8859-1 encoded file gives an error
I have very little knowledge about file encodings and would like to learn more. I've read the following pages to learn more: https://urldefense.proofpoint.com/v2/url?u=http-3A__stat.ethz.ch_R-2Dmanual_R-2Ddevel_library_base_html_Encoding.html&d=DwIDAw&c=pZJPUDQ3SB9JplYbifm4nt2lEVG5pWx2KikqINpWlZM&r=1fpq0SJ48L-zRWX2t0llEVIDZAHfU8S-4oINHlOA0rk&m=Hx2R8haOcpOy7nHCyZ63_tEVrmVn5txQk-yjGkgjKjw&s=HegPJMcZ_5R6vYtdQLgIsh-M6ElOlewHPBZxe8IPSlI&e= https://urldefense.proofpoint.com/v2/url?u=https-3A__stackoverflow.com_questions_4806823_how-2Dto-2Ddetect-2Dthe-2Dright-2Dencoding-2Dfor-2Dread-2Dcsv&d=DwIDAw&c=pZJPUDQ3SB9JplYbifm4nt2lEVG5pWx2KikqINpWlZM&r=1fpq0SJ48L-zRWX2t0llEVIDZAHfU8S-4oINHlOA0rk&m=Hx2R8haOcpOy7nHCyZ63_tEVrmVn5txQk-yjGkgjKjw&s=KGDvHJrfkvqbwyKnIiY0V45HtN-W4Rpq4ZBXfIFaFMk&e= https://urldefense.proofpoint.com/v2/url?u=https-3A__developer.r-2Dproject.org_Encodings-5Fand-5FR.html&d=DwIDAw&c=pZJPUDQ3SB9JplYbifm4nt2lEVG5pWx2KikqINpWlZM&r=1fpq0SJ48L-zRWX2t0llEVIDZAHfU8S-4oINHlOA0rk&m=Hx2R8haOcpOy7nHCyZ63_tEVrmVn5txQk-yjGkgjKjw&s=Ka1kGiCw3w22tOLfA50AyrKsMT-La14TQdutJJkdE04&e= The last one, in particular, has been very helpful. I would be interested in any further references that you suggest. I attach a file that reproduces the issue I would like to learn more about. 
I do not know if the file encoding will be correctly preserved through email, so I also provide the file (temporarily) on Dropbox here:

https://urldefense.proofpoint.com/v2/url?u=https-3A__www.dropbox.com_s_3lbgebk7b5uaia7_encoding-5Fexport-5Fissue.R-3Fdl-3D0&d=DwIDAw&c=pZJPUDQ3SB9JplYbifm4nt2lEVG5pWx2KikqINpWlZM&r=1fpq0SJ48L-zRWX2t0llEVIDZAHfU8S-4oINHlOA0rk&m=Hx2R8haOcpOy7nHCyZ63_tEVrmVn5txQk-yjGkgjKjw&s=58a7qB9IHt3s2ZLDglGEHwWARuo8xvSlH_z8G5jDaUY&e=

The file gives an error when using "source()" with the argument echo = TRUE:

> source("encoding_export_issue.R", echo = TRUE)
Error in nchar(dep, "c") : invalid multibyte string, element 1
In addition: Warning message:
In grepl("^[[:blank:]]*$", dep[1L]) :
  input string 1 is invalid in this locale

The problem comes from the "á" character in the .R file. The file appears to be encoded as "iso-8859-1":

$ file --mime-encoding encoding_export_issue.R
encoding_export_issue.R: iso-8859-1

Note that for me:

> getOption("encoding")
[1] "native.enc"

so "native.enc" is used for the "encoding" argument of source().

The following two calls succeed:

> source("encoding_export_issue.R", echo = TRUE, encoding = "unknown")
> source("encoding_export_issue.R", echo = TRUE, encoding = "iso-8859-1")

Is this file a valid "iso-8859-1" encoded file? Why does source() fail in the case of encoding set to "native.enc"? Is it because of the settings to UTF-8 in my locale (see info on my system at the bottom of this email).

I'm guessing it would be a bad idea to put

options(encoding = "unknown")

in my .Rprofile, because it is difficult to always correctly guess the encoding of files? Is there a reason why setting it to "unknown" would lead to more problems than leaving it set to "native.enc"?

I've reproduced the above behavior on R-devel (r74677) and 3.4.3.
Below is my session info and locale info for my system with the 3.4.3 version:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3

> Sys.getlocale()
[1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"

Thanks for your time,

Scott

--
Scott Kostyshak
Assistant Professor of Economics
University of Florida
https://people.clas.ufl.edu/skostyshak/

# Chávez
quantile_type <- 4
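One way to check the observation that the attached bytes are ISO-8859-1 rather than UTF-8 is sketched below. It rebuilds the two-line file in memory with "á" stored as the single byte 0xE1 (an assumption about the attachment, based on the `file --mime-encoding` output above) and uses validUTF8() and iconv() from base R. It does not call source(), so it says nothing definitive about why the "native.enc" case fails; it only shows that the bytes are not valid UTF-8 until converted, which is consistent with the "invalid multibyte string" error in a UTF-8 locale.

```r
# The attachment's content with "á" as the single latin1 byte 0xE1 (assumed).
bytes <- c(charToRaw("# Ch"), as.raw(0xe1),
           charToRaw("vez\nquantile_type <- 4\n"))
txt <- rawToChar(bytes)

# A lone 0xE1 followed by "v" is not a valid UTF-8 sequence.
is_utf8_before <- validUTF8(txt)

# Declaring the source encoding and converting gives valid UTF-8.
converted <- iconv(txt, from = "latin1", to = "UTF-8")
is_utf8_after <- validUTF8(converted)

# "á" becomes the two-byte sequence C3 A1 after conversion.
a_acute <- charToRaw(converted)[5:6]
print(a_acute)
```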
Re: [Rd] source(echo = TRUE) with a iso-8859-1 encoded file gives an error
Hi Scott,

This question is appropriate for the r-help mailing list, but probably off-topic here on r-devel.

Best,
Ista

On Tue, May 1, 2018 at 2:57 PM, Scott Kostyshak wrote:

> I have very little knowledge about file encodings and would like to
> learn more.
> [...]
Re: [Rd] issue with model.frame()
Berry, Charles on Tue, 1 May 2018 16:43:18 + writes:

> Maybe add a `yourclass' class to mf and dispatch to a model.frame.yourclass
> method where the width cutoff arg here (around lines 57-58 of
> model.frame.default) is made larger:
>
> varnames <- sapply(vars, function(x) paste(deparse(x, width.cutoff = 500),
>     collapse = " "))[-1L]

What version of R is that?

In current versions it is

varnames <- vapply(vars, deparse2, " ")[-1L]

and deparse2() is a slightly enhanced version of the above function, again with 'width.cutoff = 500'.

*BUT* if you read help(deparse) you will learn that 500 is the upper bound allowed currently. (And yes, one could consider increasing that, as it has been unchanged in R since the very beginning; I have checked R version 0.49 from 1997.)

On the other hand, deparse2 (and your older code above) do paste all the parts together via collapse = " ", so I don't see quite yet ...

Martin
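The interaction between the 500 limit and collapse = " " can be seen with the long cbind() call from the original report. This is a small sketch using only deparse() and paste(); the names `pieces` and `one_line` are illustrative. It shows that deparse() splits the call into several strings even at the maximum cutoff, and that pasting with collapse = " " yields one string without a newline, which is why the model frame names themselves come out on one line.

```r
# Same long variable names as in the original report.
vname <- vapply(1:26,
                function(i) paste(rep(letters[1:i], 2), collapse = ""),
                character(1))

# The cbind(...) call as a language object (about 760 characters long).
expr <- parse(text = paste0("cbind(", paste(vname, collapse = ", "), ")"),
              keep.source = FALSE)[[1L]]

# Even at the maximum allowed cutoff, deparse() returns several pieces.
pieces <- deparse(expr, width.cutoff = 500L)
print(length(pieces))

# Pasting them with a space gives a single string with no newline.
one_line <- paste(pieces, collapse = " ")
print(nchar(one_line))
```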
Re: [Rd] issue with model.frame()
> On May 1, 2018, at 1:15 PM, Martin Maechler wrote:
>
> What version of R is that ?

Sorry. It was 3.4.2. But it doesn't matter, because my diagnosis was wrong even there.

I think (based on my reading of my outdated version) the problem is a bit upstream in terms(), as I noted in a follow-up to Terry.

> In current versions it is
>
>    varnames <- vapply(vars, deparse2, " ")[-1L]
>
> and deparse2() is a slightly enhanced version of the above
> function, again with 'width.cutoff = 500'
>
> *BUT* if you read help(deparse) you will learn that 500 is the
> upper bound allowed currently. (and yes, one could consider
> increasing that as it has been unchanged in R since the very
> beginning (I have checked R version 0.49 from 1997).
>
> On the other hand, deparse2 (and your older code above) do paste
> all the parts together via collapse = " " so I don't see
> quite yet ...

Again, due to my bad diagnosis, I guess.

Chuck
Re: [Rd] issue with model.frame()
I want to add that the priority for this is rather low, since we have a couple of workarounds for the user/data set in question. I have some ideas about changing the way in which ridge() works, which might make the problem moot.

The important short-term result was finding that it wasn't an error of mine in the survival package. :-) Add it to your "think about it" list.

Terry

On 05/01/2018 03:15 PM, Martin Maechler wrote:

> What version of R is that ?
>
> In current versions it is
>
>    varnames <- vapply(vars, deparse2, " ")[-1L]
>
> and deparse2() is a slightly enhanced version of the above
> function, again with 'width.cutoff = 500'
>
> *BUT* if you read help(deparse) you will learn that 500 is the
> upper bound allowed currently. (and yes, one could consider
> increasing that as it has been unchanged in R since the very
> beginning (I have checked R version 0.49 from 1997).
>
> On the other hand, deparse2 (and your older code above) do paste
> all the parts together via collapse = " " so I don't see
> quite yet ...
>
> Martin