Re: [Rd] [PATCH 1/2] readtable: add hook for type conversions per column
> Kurt Van Dijck on Tue, 26 Mar 2019 21:20:07 +0100 writes:

> On Tue, 26 Mar 2019 12:48:12 -0700, Michael Lawrence wrote:
>> Please file a bug on bugzilla so we can discuss this further.

> All fine. I didn't find a way to create an account on bugs.r-project.org.
> Did I just not see it? Or do I need administrator assistance?

> Kind regards,
> Kurt

--> https://www.r-project.org/bugs.html

Yes, there's some effort involved - for logistic reasons - but I now find it's
also a good thing that you have to read and understand, and then even e-talk
to a human, in the process.

Martin
Re: [Rd] Discrepancy between is.list() and is(x, "list")
I would recommend reading https://adv-r.hadley.nz/base-types.html and
https://adv-r.hadley.nz/s3.html. Understanding the distinction between base
types and S3 classes is very important to make this sort of question precise,
and in my experience, you'll find R easier to understand if you carefully
distinguish between them. (And hence you shouldn't expect is.x(),
inherits(, "x"), and is(, "x") to always return the same results.)

Also note that many of the is.*() functions are not testing for types or
classes, but instead often have more complex semantics. For example,
is.vector() tests for objects with an underlying base vector type that have
no attributes (apart from names). is.numeric() tests for objects with base
type integer or double that also have the same algebraic properties as
numbers.

Hadley

On Mon, Mar 25, 2019 at 10:28 PM Abs Spurdle wrote:
>
> > I have noticed a discrepancy between is.list() and is(x, "list")
>
> There's a similar problem with inherits().
>
> On R 3.5.3:
>
> > f = function () 1
> > class (f) = "f"
>
> > is.function (f)
> [1] TRUE
> > inherits (f, "function")
> [1] FALSE
>
> I didn't check what happens with:
> class (f) = c ("f", "function")
>
> However, they should have the same result, regardless.
>
> > Is this discrepancy intentional?
>
> I hope not.

--
http://hadley.nz
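To make the base-type/S3-class distinction concrete, here is a small,
self-contained illustration using a factor (expected output shown in
comments; any classed object would do as well):

  x <- factor(c("a", "b"))

  typeof(x)               # "integer" -- the underlying base type
  class(x)                # "factor"  -- the S3 class attribute
  is.integer(x)           # TRUE  -- tests the base type only
  is.numeric(x)           # FALSE -- more than a type test; factors are excluded
  inherits(x, "integer")  # FALSE -- consults only the class attribute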
Re: [Rd] SUGGESTION: Proposal to mitigate problem with stray processes left behind by parallel::makeCluster()
The problem causing the stray worker processes when the master fails to open a
server socket to listen for connections from workers is not related to the
timeout in socketConnection(), because socketConnection() will fail right
away. It is caused by a bug in checking the setup timeout (PR 17391).

Fixed in 76275.

Best
Tomas

On 3/18/19 2:23 AM, Henrik Bengtsson wrote:

(Bcc: CRAN)

This is a proposal to help CRAN and the like, as well as individual
developers, avoid stray R processes being left behind when an example or a
package test fails to set up a parallel::makeCluster().

ISSUE

If a package test sets up a PSOCK cluster and then the master process dies for
one reason or another, the PSOCK worker processes will remain running for 30
days ('timeout') until they time out and terminate that way. When this happens
on CRAN servers, where many packages are checked all the time, this results in
a lot of stray R processes.

Here is an example illustrating how R leaves behind stray R processes if it
fails to establish a connection to one or more background R processes launched
by 'parallel::makeCluster()'. First, let's make sure there are no other R
processes running:

  $ ps aux | grep -E "exec[/]R"

Then, let's create a PSOCK cluster for which the connection will fail (because
port 80 is reserved):

  $ Rscript -e 'parallel::makeCluster(1L, port=80)'
  Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
    cannot open the connection
  Calls: ... makePSOCKcluster -> newPSOCKnode -> socketConnection
  In addition: Warning message:
  In socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
    port 80 cannot be opened

The launched R worker is still running:

  $ ps aux | grep -E "exec[/]R"
  hb 20778 37.0 0.4 283092 70624 pts/0 S 17:50 0:00 /usr/lib/R/bin/exec/R
    --slave --no-restore -e parallel:::.slaveRSOCK() --args MASTER=localhost
    PORT=80 OUT=/dev/null SETUPTIMEOUT=120 TIMEOUT=2592000 XDR=TRUE

This process will keep running for 'TIMEOUT=2592000' seconds (= 30 days). The
reason is that it is currently in the state where it attempts to set up a
connection to the main R process:

  > parallel:::.slaveRSOCK
  function ()
  {
      makeSOCKmaster <- function(master, port, setup_timeout, timeout,
          useXDR) {
          ...
          repeat {
              con <- tryCatch({
                  socketConnection(master, port = port, blocking = TRUE,
                      open = "a+b", timeout = timeout)
              }, error = identity)
          ...
      }

In other words, it is stuck in 'socketConnection()' and it won't time out
until 'timeout' seconds.

SUGGESTION

To mitigate the problem with the above stray processes from running
'R CMD check', we could shorten the 'timeout', which is currently hardcoded to
30 days (src/library/parallel/R/snow.R). By making it possible to control the
defaults via environment variables, e.g.

  setup_timeout = as.numeric(Sys.getenv("R_PARALLEL_SETUP_TIMEOUT",
                                        60 * 2)),             # 2 minutes
  timeout = as.numeric(Sys.getenv("R_PARALLEL_SETUP_TIMEOUT",
                                  60 * 60 * 24 * 30)),        # 30 days

it would be straightforward to adjust 'R CMD check' to use, say,
R_PARALLEL_SETUP_TIMEOUT=60 by default. This would cause any stray processes
to time out after 60 seconds (instead of 30 days as now).

/Henrik
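Separately from the fix above, test and example authors can already opt into a
much shorter worker timeout themselves, since 'timeout' is one of the options
accepted by makeCluster()/makePSOCKcluster(). A minimal sketch (the 120-second
value is arbitrary):

  library(parallel)
  # Workers receive this value as TIMEOUT, so they give up after ~120 seconds
  # without communication from the master, instead of the 30-day default.
  cl <- makeCluster(2L, timeout = 120)
  parSapply(cl, 1:4, sqrt)
  stopCluster(cl)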
Re: [Rd] [RFC] readtable enhancement
Thank you for your answers.
I would rather not file a new bug, since what I coded isn't really a bug.

The problem I (my colleagues) have today is very mundane:
We read .csv files with a lot of columns, most of which contain date-time
stamps coded as DD/MM/ HH:MM.
This is not exotic, but the base library's read.table (and derivatives) only
accepts date-times in a limited number of possible formats (which I understand
very well).

We could specify a format for each column individually, but that syntax is
rather complicated and difficult to maintain.

My solution to this specific problem became a trivial, yet generic, extension
to read.table.
Rather than relying on the built-in type detection, I added a parameter: a
function that is called for each to-be-type-probed column, so I can overrule
the limited built-in default.
If the function returns nothing, the built-in default is still used.

This way, I could construct a type-probing function that is straightforward,
not hard to code, and that makes reading my .csv files acceptable in terms of
code (read.table parameters).

I'm sure I'm not the only one dealing with such needs - especially since
date-time formats exist in enormous variety - but I want to stress that my
approach is agnostic to my specific problem.

For those asking to 'show me the code', I refer to my 2nd patch, where the
tests have been extended with my specific problem.

What are your opinions about this?

Kind regards,
Kurt
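For comparison, the workaround available in base R today, without the proposed
hook, looks roughly like this. It is only a sketch: the column names, the
inline data, and the "%d/%m/%Y %H:%M" format string are assumed examples, not
the poster's actual files.

  # Read the stamp column as plain character first, then convert it explicitly.
  lines <- c("id,stamp",
             "1,26/03/2019 21:20",
             "2,27/03/2019 09:05")
  d <- read.table(text = lines, sep = ",", header = TRUE,
                  colClasses = "character")
  d$stamp <- as.POSIXct(d$stamp, format = "%d/%m/%Y %H:%M")
  str(d)

Repeating this for every timestamp column of every file is exactly the
maintenance burden the proposed hook is meant to remove.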
Re: [Rd] [RFC] readtable enhancement
This has some nice properties:

1) It self-documents the input expectations in a similar manner to colClasses.
2) The implementation could eventually "push down" the coercion, e.g., calling
   it on each chunk of an iterative read operation.

The implementation needs work though, and I'm not convinced that coercion
failures should fall back gracefully to the default.

Feature requests fall under a "bug" in bugzilla terminology, so please submit
this there. I think I've made you an account.

Thanks,
Michael

On Wed, Mar 27, 2019 at 1:19 PM Kurt Van Dijck
<dev.k...@vandijck-laurijssen.be> wrote:

> Thank you for your answers.
> I rather do not file a new bug, since what I coded isn't really a bug.
>
> The problem I (my colleagues) have today is very stupid:
> We read .csv files with a lot of columns, of which most contain date-time
> stamps, coded in DD/MM/ HH:MM.
> This is not exotic, but the base library's readtable (and derivatives) only
> accept date-times in a limited number of possible formats (which I
> understand very well).
>
> We could specify a format in a rather complicated format, for each column
> individually, but this syntax is rather difficult to maintain.
>
> My solution to this specific problem became trivial, yet generic extension
> to read.table.
> Rather than relying on the built-in type detection, I added a parameter to
> a function that will be called for each to-be-type-probed column so I can
> overrule the built-in limited default.
> If nothing returns from the function, the built-in default is still used.
>
> This way, I could construct a type-probing function that is
> straight-forward, not hard to code, and makes reading my .csv files
> acceptible in terms of code (read.table parameters).
>
> I'm sure I'm not the only one dealing with such needs, escpecially
> date-time formats exist in enormous amounts, but I want to stress here that
> my approach is agnostic to my specific problem.
>
> For those asking to 'show me the code', I redirect to my 2nd patch, where
> the tests have been extended with my specific problem.
>
> What are your opinions about this?
>
> Kind regards,
> Kurt
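As a reminder of the colClasses analogy in point 1): colClasses already lets
the call itself document the expected column types, and the proposed hook
extends that idea from type names to arbitrary conversion functions. A tiny,
self-contained example of the existing behaviour (the data are made up):

  d <- read.table(text = c("id,flag", "1,TRUE", "2,FALSE"),
                  sep = ",", header = TRUE,
                  colClasses = c("integer", "logical"))
  str(d)   # 'id' is integer, 'flag' is logical, as declared in the call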
Re: [Rd] [RFC] readtable enhancement
Just to clarify/amplify: on the bug-tracking system there's a drop-down menu
to specify severity, and "enhancement" is one of the choices, so you don't
have to worry that you're misrepresenting your patch as fixing a bug.

The fact that an R-core member (Michael Lawrence) thinks this is worth looking
at is very encouraging (and somewhat unusual for feature/enhancement
suggestions)!

Ben Bolker

On Wed, Mar 27, 2019 at 5:29 PM Michael Lawrence via R-devel wrote:
>
> This has some nice properties:
>
> 1) It self-documents the input expectations in a similar manner to
> colClasses.
> 2) The implementation could eventually "push down" the coercion, e.g.,
> calling it on each chunk of an iterative read operation.
>
> The implementation needs work though, and I'm not convinced that coercion
> failures should fallback gracefully to the default.
>
> Feature requests fall under a "bug" in bugzilla terminology, so please
> submit this there. I think I've made you an account.
>
> Thanks,
> Michael
>
> On Wed, Mar 27, 2019 at 1:19 PM Kurt Van Dijck <
> dev.k...@vandijck-laurijssen.be> wrote:
>
> > Thank you for your answers.
> > I rather do not file a new bug, since what I coded isn't really a bug.
> >
> > The problem I (my colleagues) have today is very stupid:
> > We read .csv files with a lot of columns, of which most contain
> > date-time stamps, coded in DD/MM/ HH:MM.
> > This is not exotic, but the base library's readtable (and derivatives)
> > only accept date-times in a limited number of possible formats (which I
> > understand very well).
> >
> > We could specify a format in a rather complicated format, for each
> > column individually, but this syntax is rather difficult to maintain.
> >
> > My solution to this specific problem became trivial, yet generic
> > extension to read.table.
> > Rather than relying on the built-in type detection, I added a parameter
> > to a function that will be called for each to-be-type-probed column so I
> > can overrule the built-in limited default.
> > If nothing returns from the function, the built-in default is still
> > used.
> >
> > This way, I could construct a type-probing function that is
> > straight-forward, not hard to code, and makes reading my .csv files
> > acceptible in terms of code (read.table parameters).
> >
> > I'm sure I'm not the only one dealing with such needs, escpecially
> > date-time formats exist in enormous amounts, but I want to stress here
> > that my approach is agnostic to my specific problem.
> >
> > For those asking to 'show me the code', I redirect to my 2nd patch,
> > where the tests have been extended with my specific problem.
> >
> > What are your opinions about this?
> >
> > Kind regards,
> > Kurt
Re: [Rd] Discrepancy between is.list() and is(x, "list")
> the prison made by ancient design choices

That prison of ancient design choices isn't so bad.

I have no further comments on object-oriented semantics. However, I plan to
follow this design pattern: if I set the class of an object, I will append the
new class to the existing class.

  #good
  class (object) = c ("something", class (object) )

  #bad
  class (object) = "something"

I encourage others to do the same.
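A small illustration of why the appended form behaves better, reusing the
function example from earlier in the thread (expected output in comments):

  f <- function() 1
  class(f) <- c("f", class(f))   # class is now c("f", "function")
  is.function(f)                 # TRUE -- the base type is unchanged
  inherits(f, "function")        # TRUE -- "function" stays in the class vector

With the overwriting form, class(f) <- "f", the second test returns FALSE, as
shown in the earlier message.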
[Rd] issue with latest release of R-devel
I'm getting ready to submit an update of survival, and as is my habit I run
the checks on all packages that depend/import/suggest survival. I am getting
some very odd behaviour wrt non-reproducibility. It came to a head when some
things failed on one machine and worked on another. I found that the
difference was that the failure was using the 3/27 release and the success was
still on a late-January release. When I updated R on the latter machine, it
now fails too.

An example is the test cases in genfrail.Rd, in the frailtySurv package. (The
package depends on survival, but I'm fairly sure that this function does not.)
It's a fairly simple function to generate test data sets, with a half dozen
calls in the test file. If you cut and paste the whole batch into an R
session, the last one of them fails. But if you run that call by itself, it
works. This yes/no behaviour is reproducible.

Another puzzler was the ranger package. In the tests/testthat directory,
source('test_maxstat') fails if it is preceded by source('test_jackknife'),
but not otherwise. Again, I don't think the survival package is implicated in
either of these tests.

Another package that succeeded under the older R-devel and now fails is
arsenal, but I haven't looked deeply at that.

Any insight would be appreciated.

Terry T.

Here is the sessionInfo() for one of the machines. The other is running
xubuntu 18 LTS. (It's at the office, and I can send that tomorrow when I get
in.)

  R Under development (unstable) (2019-03-28 r76277)
  Platform: x86_64-pc-linux-gnu (64-bit)
  Running under: Ubuntu 16.04.6 LTS

  Matrix products: default
  BLAS:   /usr/local/src/R-devel/lib/libRblas.so
  LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

  locale:
   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base

  loaded via a namespace (and not attached):
  [1] compiler_3.6.0
Re: [Rd] issue with latest release of R-devel
Could this be related to

  "SIGNIFICANT USER-VISIBLE CHANGES

   The default method for generating from a discrete uniform distribution
   (used in sample(), for instance) has been changed. This addresses the
   fact, pointed out by Ottoboni and Stark, that the previous method made
   sample() noticeably non-uniform on large populations. See PR#17494 for a
   discussion. The previous method can be requested using RNGkind() or
   RNGversion() if necessary for reproduction of old results. Thanks to
   Duncan Murdoch for contributing the patch and Gabe Becker for further
   assistance."

If so, testing with

  export _R_RNG_VERSION_=3.5.0

might remove/explain those errors.

Just a thought

Henrik

On Wed, Mar 27, 2019 at 8:16 PM Therneau, Terry M., Ph.D. via R-devel wrote:
>
> I'm getting ready to submit an update of survival, and is my habit I run
> the checks on all packages that depend/import/suggest survival. I am
> getting some very odd behaviour wrt non-reproducability. It came to a head
> when some things failed on one machine and worked on another. I found that
> the difference was that the failure was using the 3/27 release and the
> success was still on a late Jan release. When I updated R on the latter
> machine it now fails too.
>
> An example is the test cases in genfrail.Rd, in the frailtySurv package.
> (The package depends on survival, but I'm fairly sure that this function
> does not.) It's a fairly simple function to generate test data sets, with
> a half dozen calls in the test file. If you cut and paste the whole batch
> into an R session, the last one of them fails. But if you run that call by
> itself it works. This yes/no behavior is reproducable.
>
> Another puzzler was the ranger package. In the tests/testthat directory,
> source('test_maxstat') fails if it is preceeded by
> source('test_jackknife'), but not otherwise. Again, I don't think the
> survival package is implicated in either of these tests.
>
> Another package that succeeded under the older r-devel and now fails is
> arsenal, but I haven't looked deeply at that.
>
> Any insight would be be appreciated.
>
> Terry T.
>
> Here is the sessionInfo() for one of the machines. The other is running
> xubuntu 18 LTS. (It's at the office, and I can send that tomorrow when I
> get in.)
>
> R Under development (unstable) (2019-03-28 r76277)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.6 LTS
>
> Matrix products: default
> BLAS: /usr/local/src/R-devel/lib/libRblas.so
> LAPACK: /usr/local/src/R-devel/lib/libRlapack.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.6.0
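For anyone checking whether this RNG change is the culprit, a quick comparison
can be run in a fresh session. This is only a sketch; the seed and population
size are arbitrary, and RNGversion("3.5.0") warns that it re-enables the
non-uniform "Rounding" sampler:

  set.seed(1)
  new <- sample(1e6, 3)                    # R >= 3.6.0 default ("Rejection")

  suppressWarnings(RNGversion("3.5.0"))    # request the old sampling method
  set.seed(1)
  old <- sample(1e6, 3)

  identical(new, old)   # typically FALSE, which breaks stored test output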
Re: [Rd] SUGGESTION: Proposal to mitigate problem with stray processes left behind by parallel::makeCluster()
Thank you Tomas. For the record, I'm confirming that the stray background R
worker process now times out properly after 'setup_timeout' (= 120) seconds:

  {0s}$ Rscript -e 'parallel::makeCluster(1L, port=80)'
  Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
    cannot open the connection
  Calls: ... makePSOCKcluster -> newPSOCKnode -> socketConnection
  In addition: Warning message:
  In socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
    port 80 cannot be opened
  Execution halted

  {1s}$ ps aux | grep -E "exec[/]R"
  hb 17645 2.0 0.3 259104 55144 pts/5 S 20:58 0:00
    /home/hb/software/R-devel/trunk/lib/R/bin/exec/R --slave --no-restore -e
    parallel:::.slaveRSOCK() --args MASTER=localhost PORT=80 OUT=/dev/null
    SETUPTIMEOUT=120 TIMEOUT=2592000 XDR=TRUE

  {2s}$ sleep 120

  {122s}$ ps aux | grep -E "exec[/]R"
  {122s}$

Good spotting of the bug:

  - if (Sys.time() - t0 > setup_timeout) break
  + if (difftime(Sys.time(), t0, units="secs") > setup_timeout) break

For those who find this thread: I think what's going on here is that
'setup_timeout = 120' is a numeric that is compared to a 'difftime' that keeps
changing units as time goes by. When compared as
'Sys.time() - t0 > setup_timeout', the LHS is in units of seconds as long as
less than 60 seconds have passed:

  > Sys.time() - t0
  Time difference of 59 secs
  > as.numeric(Sys.time() - t0)
  [1] 59

However, as soon as more than 60 seconds have passed, the unit turns into
minutes and we're comparing minutes to seconds:

  > Sys.time() - t0
  Time difference of 1.016667 mins
  > as.numeric(Sys.time() - t0)
  [1] 1.016667

which is now compared to 'setup_timeout'. If the unit remained minutes, it
would time out after 120 [minutes]. However, after 120 minutes, the unit of
Sys.time() - t0 is hours, and we're comparing hours to seconds, and so on. It
would only time out if we used 'setup_timeout' < 60 seconds.

/Henrik

On Wed, Mar 27, 2019 at 12:52 PM Tomas Kalibera wrote:
>
> The problem causing the stray worker processes when the master fails to
> open a server socket to listen to connections from workers is not
> related to timeout in socketConnection(), because socketConnection()
> will fail right away. It is caused by a bug in checking the setup
> timeout (PR 17391).
>
> Fixed in 76275.
>
> Best
> Tomas
>
> On 3/18/19 2:23 AM, Henrik Bengtsson wrote:
> > (Bcc: CRAN)
> >
> > This is a proposal helping CRAN and alike as well as individual
> > developers to avoid stray R processes being left behind that might be
> > produced when an example or a package test fails to set up a
> > parallel::makeCluster().
> >
> > ISSUE
> >
> > If a package test sets up a PSOCK cluster and then the master process
> > dies for one reason or the other, the PSOCK worker processes will
> > remain running for 30 days ('timeout') until they timeout and
> > terminate that way. When this happens on CRAN servers, where many
> > packages are checked all the time, this will result in a lot of stray
> > R processes.
> >
> > Here is an example illustrating how R leaves behind stray R processes
> > if fails to establish a connection to one or more background R
> > processes launched by 'parallel::makeCluster()'.
> >
> > First, let's make sure there are no other R processes running:
> >
> >   $ ps aux | grep -E "exec[/]R"
> >
> > Then, lets create a PSOCK cluster for which connection will fail
> > (because port 80 is reserved):
> >
> >   $ Rscript -e 'parallel::makeCluster(1L, port=80)'
> >   Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
> >     cannot open the connection
> >   Calls: ... makePSOCKcluster -> newPSOCKnode -> socketConnection
> >   In addition: Warning message:
> >   In socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
> >     port 80 cannot be opened
> >
> > The launched R worker is still running:
> >
> >   $ ps aux | grep -E "exec[/]R"
> >   hb 20778 37.0 0.4 283092 70624 pts/0 S 17:50 0:00 /usr/lib/R/bin/exec/R
> >     --slave --no-restore -e parallel:::.slaveRSOCK() --args MASTER=localhost
> >     PORT=80 OUT=/dev/null SETUPTIMEOUT=120 TIMEOUT=2592000 XDR=TRUE
> >
> > This process will keep running for 'TIMEOUT=2592000' seconds (= 30
> > days). The reason for this is that it is currently in the state where
> > it attempts to set up a connection to the main R process:
> >
> >   > parallel:::.slaveRSOCK
> >   function ()
> >   {
> >       makeSOCKmaster <- function(master, port, setup_timeout, timeout,
> >           useXDR) {
> >           ...
> >           repeat {
> >               con <- tryCatch({
> >                   socketConnection(master, port = port, blocking = TRUE,
> >                       open = "a+b", timeout = timeout)
> >               }, error = identity)
> >           ...
> >       }
> >
> > In other words, it is stuck in 'socketConnection()' and it won't time
> > out until 'timeout' seconds.
>
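A compact way to reproduce the unit drift described above, without waiting for
a real timeout (the 130 seconds of "elapsed" time are simulated by adding to a
POSIXct value):

  t0 <- Sys.time()
  elapsed <- (t0 + 130) - t0                      # pretend 130 seconds passed
  elapsed                                         # Time difference of ~2.17 mins
  elapsed > 120                                   # FALSE: 2.17 (minutes!) vs 120
  difftime(t0 + 130, t0, units = "secs") > 120    # TRUE: the unit-safe check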
Re: [Rd] default for 'signif.stars'
I read through the editorial. It is one of the most mega-ultra-super-biased
articles I've ever read.

For example, the authors encourage Bayesian methods, and literally encourage
subjective approaches. However, there's only one reference to robust methods
and one reference to nonparametric methods, both of which are labelled as
purely exploratory methods, which I regard as extremely offensive. And there
don't appear to be any references to semiparametric methods, or to machine
learning.

Surprisingly, they encourage multiple testing, yet don't mention the multiple
comparison problem. Something I can't understand at all.

So, maybe we should replace signif.stars with emoji...?
Re: [Rd] [RFC] readtable enhancement
Hey,

In the meantime, I submitted a bug. Thanks for the assistance on that.

> and I'm not convinced that coercion failures should fallback gracefully to
> the default.

The graceful fallback:
- makes the code more complex
+ keeps colConvert implementations limited
+ requires the user to only implement what changed from the default
+ seemed to me the smallest overall effort

In my opinion, the graceful fallback makes the thing better, but without it
the colConvert parameter remains useful; it would still fill a gap.

> The implementation needs work though,

Other than to remove the graceful fallback?

Kind regards,
Kurt

On Wed, 27 Mar 2019 14:28:25 -0700, Michael Lawrence wrote:
> This has some nice properties:
> 1) It self-documents the input expectations in a similar manner to
> colClasses.
> 2) The implementation could eventually "push down" the coercion, e.g.,
> calling it on each chunk of an iterative read operation.
> The implementation needs work though, and I'm not convinced that
> coercion failures should fallback gracefully to the default.
> Feature requests fall under a "bug" in bugzilla terminology, so please
> submit this there. I think I've made you an account.
> Thanks,
> Michael
>
> On Wed, Mar 27, 2019 at 1:19 PM Kurt Van Dijck
> <[1]dev.k...@vandijck-laurijssen.be> wrote:
>
> > Thank you for your answers.
> > I rather do not file a new bug, since what I coded isn't really a bug.
> > The problem I (my colleagues) have today is very stupid:
> > We read .csv files with a lot of columns, of which most contain
> > date-time stamps, coded in DD/MM/ HH:MM.
> > This is not exotic, but the base library's readtable (and derivatives)
> > only accept date-times in a limited number of possible formats (which I
> > understand very well).
> > We could specify a format in a rather complicated format, for each
> > column individually, but this syntax is rather difficult to maintain.
> > My solution to this specific problem became trivial, yet generic
> > extension to read.table.
> > Rather than relying on the built-in type detection, I added a parameter
> > to a function that will be called for each to-be-type-probed column so I
> > can overrule the built-in limited default.
> > If nothing returns from the function, the built-in default is still
> > used.
> > This way, I could construct a type-probing function that is
> > straight-forward, not hard to code, and makes reading my .csv files
> > acceptible in terms of code (read.table parameters).
> > I'm sure I'm not the only one dealing with such needs, escpecially
> > date-time formats exist in enormous amounts, but I want to stress here
> > that my approach is agnostic to my specific problem.
> > For those asking to 'show me the code', I redirect to my 2nd patch,
> > where the tests have been extended with my specific problem.
> > What are your opinions about this?
> > Kind regards,
> > Kurt
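To make the fallback protocol concrete, a hypothetical converter could return
NULL to mean "no opinion", with the caller then falling back to the built-in
type detection. The function name and format string below are made up for
illustration; the actual patch's interface may differ.

  # Try one date-time interpretation; NULL signals "use the built-in default".
  maybe_datetime <- function(x, format = "%d/%m/%Y %H:%M") {
      out <- as.POSIXct(x, format = format)
      if (all(is.na(out))) NULL else out
  }

  maybe_datetime(c("26/03/2019 21:20", "27/03/2019 09:05"))  # POSIXct vector
  maybe_datetime(c("1.5", "2.5"))                            # NULL -> fall back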
Re: [Rd] [RFC] readtable enhancement
Kurt,

Cool idea, and it's great "seeing new faces" here proposing things and
engaging with R-core. Some comments on the issue of fallbacks below.

On Wed, Mar 27, 2019 at 10:33 PM Kurt Van Dijck
<dev.k...@vandijck-laurijssen.be> wrote:

> Hey,
>
> In the meantime, I submitted a bug. Thanks for the assistence on that.
>
> > and I'm not convinced that
> > coercion failures should fallback gracefully to the default.
>
> the gracefull fallback:
> - makes the code more complex
> + keeps colConvert implementations limited
> + requires the user to only implement what changed from the default
> + seemed to me to smallest overall effort
>
> In my opinion, gracefull fallback makes the thing better,
> but without it, the colConvert parameter remains usefull, it would still
> fill a gap.

Another way of viewing coercion failure, I think, is that either the
user-supplied converter has a bug in it or was mistakenly applied in a
situation where it shouldn't have been. If that's the case, the
fail-early-and-loud paradigm might ultimately be more helpful to users there.

Another thought in the same vein is that if fallback occurs, the returned
result will not be what the user asked for and is expecting. So either their
code, which assumes (e.g.) that a column has been correctly parsed as a date,
is going to break in mysterious (to them) ways, or they have to put a bunch of
their own checking logic after the call to see whether their converters
actually worked, in order to protect themselves from that. Neither really
seems ideal to me; I think an error would be better, myself. I'm more of a
software developer than a script writer/analyst though, so it's possible
others' opinions would differ (though I'd be a bit surprised by that in this
particular case, given the danger).

Best,
~G
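For contrast with the fallback sketch earlier in the thread, a converter
written in the fail-early-and-loud style described above might look like this
(again, the function name and format string are illustrative only, not part of
the proposed patch):

  # Stop loudly if any non-NA value fails to parse, rather than falling back.
  strict_datetime <- function(x, format = "%d/%m/%Y %H:%M") {
      out <- as.POSIXct(x, format = format)
      bad <- is.na(out) & !is.na(x)
      if (any(bad))
          stop("date-time conversion failed for ", sum(bad),
               " value(s), e.g. ", x[which(bad)[1L]])
      out
  }

  strict_datetime(c("26/03/2019 21:20", "27/03/2019 09:05"))   # works
  ## strict_datetime(c("26/03/2019 21:20", "not a date"))      # errors loudly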