Re: [Rd] [PATCH 1/2] readtable: add hook for type conversions per column

2019-03-27 Thread Martin Maechler
> Kurt Van Dijck 
> on Tue, 26 Mar 2019 21:20:07 +0100 writes:

> On Tue, 26 Mar 2019 12:48:12 -0700, Michael Lawrence wrote:
>> Please file a bug on bugzilla so we can discuss this
>> further.

> All fine.  I didn't find a way to create an account on
> bugs.r-project.org.  Did I just not see it? or do I need
> administrator assistance?

> Kind regards, Kurt

--> https://www.r-project.org/bugs.html

Yes, there's some effort involved - for logistic reasons,
but I now find it's also a good thing that you have to read and
understand and then even e-talk to a human in the process.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Discrepancy between is.list() and is(x, "list")

2019-03-27 Thread Hadley Wickham
I would recommend reading https://adv-r.hadley.nz/base-types.html and
https://adv-r.hadley.nz/s3.html. Understanding the distinction between
base types and S3 classes is very important to make this sort of
question precise, and in my experience, you'll find R easier to
understand if you carefully distinguish between them. (And hence you
shouldn't expect is.x(), inherits(, "x") and is(, "x") to always
return the same results.)

Also note that many of the is.*() functions are not testing for types or
classes, but instead often have more complex semantics. For example,
is.vector() tests for objects with an underlying base vector type that
have no attributes (apart from names). is.numeric() tests for objects
with base type integer or double, and that have the same algebraic
properties as numbers.
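A minimal sketch of these distinctions, using only base R (the object
names are made up for illustration):

```r
# A function whose class attribute no longer mentions "function".
f <- function() 1
class(f) <- "f"

is.function(f)            # TRUE: looks at the underlying base type
inherits(f, "function")   # FALSE: looks at the class attribute only
typeof(f)                 # "closure": the base type is unchanged

# is.vector() turns FALSE once there are attributes other than names.
x <- c(a = 1, b = 2)
is.vector(x)              # TRUE: names are allowed
attr(x, "unit") <- "cm"
is.vector(x)              # FALSE: an extra attribute is present

# is.numeric() is FALSE for factors although they are stored as integers.
fac <- factor(c("u", "v"))
typeof(fac)               # "integer"
is.numeric(fac)           # FALSE
```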

Hadley

On Mon, Mar 25, 2019 at 10:28 PM Abs Spurdle  wrote:
>
> > I have noticed a discrepancy between is.list() and is(x, "list")
>
> There's a similar problem with inherits().
>
> On R 3.5.3:
>
> > f = function () 1
> > class (f) = "f"
>
> > is.function (f)
> [1] TRUE
> > inherits (f, "function")
> [1] FALSE
>
> I didn't check what happens with:
> > class (f) = c ("f", "function")
>
> However, they should have the same result, regardless.
>
> > Is this discrepancy intentional?
>
> I hope not.



-- 
http://hadley.nz


Re: [Rd] SUGGESTION: Proposal to mitigate problem with stray processes left behind by parallel::makeCluster()

2019-03-27 Thread Tomas Kalibera



The problem causing the stray worker processes when the master fails to 
open a server socket to listen to connections from workers is not 
related to timeout in socketConnection(), because socketConnection() 
will fail right away. It is caused by a bug in checking the setup 
timeout (PR 17391).


Fixed in 76275.

Best
Tomas

On 3/18/19 2:23 AM, Henrik Bengtsson wrote:

(Bcc: CRAN)

This is a proposal to help CRAN and the like, as well as individual
developers, avoid stray R processes that are left behind when an
example or a package test fails to set up a
parallel::makeCluster().


ISSUE

If a package test sets up a PSOCK cluster and then the master process
dies for one reason or another, the PSOCK worker processes will
remain running for 30 days ('timeout') until they time out and
terminate that way.  When this happens on CRAN servers, where many
packages are checked all the time, this will result in a lot of stray
R processes.

Here is an example illustrating how R leaves behind stray R processes
if it fails to establish a connection to one or more background R
processes launched by 'parallel::makeCluster()'.  First, let's make
sure there are no other R processes running:

   $ ps aux | grep -E "exec[/]R"

Then, let's create a PSOCK cluster for which the connection will fail
(because port 80 is reserved):

   $ Rscript -e 'parallel::makeCluster(1L, port=80)'
   Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE,  :
     cannot open the connection
   Calls:  ... makePSOCKcluster -> newPSOCKnode -> socketConnection
   In addition: Warning message:
   In socketConnection("localhost", port = port, server = TRUE, blocking = TRUE,  :
     port 80 cannot be opened

The launched R worker is still running:

   $ ps aux | grep -E "exec[/]R"
   hb   20778 37.0  0.4 283092 70624 pts/0    S    17:50   0:00 /usr/lib/R/bin/exec/R --slave --no-restore -e parallel:::.slaveRSOCK() --args MASTER=localhost PORT=80 OUT=/dev/null SETUPTIMEOUT=120 TIMEOUT=2592000 XDR=TRUE

This process will keep running for 'TIMEOUT=2592000' seconds (= 30
days).  The reason for this is that it is currently in the state where
it attempts to set up a connection to the main R process:

   > parallel:::.slaveRSOCK
   function ()
   {
       makeSOCKmaster <- function(master, port, setup_timeout, timeout,
           useXDR) {
           ...
           repeat {
               con <- tryCatch({
                   socketConnection(master, port = port, blocking = TRUE,
                     open = "a+b", timeout = timeout)
               }, error = identity)
               ...
       }

In other words, it is stuck in 'socketConnection()' and it won't time
out until 'timeout' seconds have passed.


SUGGESTION

To mitigate the problem of stray processes left behind by 'R CMD
check', we could shorten the 'timeout', which is currently hardcoded to
30 days (src/library/parallel/R/snow.R).  By making it possible to
control the default via environment variables, e.g.

   setup_timeout = as.numeric(Sys.getenv("R_PARALLEL_SETUP_TIMEOUT", 60 * 2)),  # 2 minutes
   timeout = as.numeric(Sys.getenv("R_PARALLEL_SETUP_TIMEOUT", 60 * 60 * 24 * 30)), # 30 days

it would be straightforward to adjust `R CMD check` to use, say,

   R_PARALLEL_SETUP_TIMEOUT=60

by default.  This would cause any stray processes to time out after 60
seconds (instead of 30 days as now).
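A minimal sketch of how such an environment-controlled default could be
read defensively. env_timeout() is a hypothetical helper, not part of
the parallel package, and R_PARALLEL_SETUP_TIMEOUT is only the variable
name proposed above, not an existing R setting:

```r
# Hypothetical helper: return the numeric value of environment variable
# 'name', or 'default' when it is unset or not a number.
env_timeout <- function(name, default) {
  value <- suppressWarnings(as.numeric(Sys.getenv(name, unset = NA)))
  if (is.na(value)) default else value
}

# With the variable set, the environment wins over the default.
Sys.setenv(R_PARALLEL_SETUP_TIMEOUT = "60")
with_env <- env_timeout("R_PARALLEL_SETUP_TIMEOUT", 60 * 2)

# With the variable unset, the hardcoded default is used.
Sys.unsetenv("R_PARALLEL_SETUP_TIMEOUT")
without_env <- env_timeout("R_PARALLEL_SETUP_TIMEOUT", 60 * 2)
```

Falling back to the default on a non-numeric value avoids passing NA on
to the worker's command line.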

/Henrik


Re: [Rd] [RFC] readtable enhancement

2019-03-27 Thread Kurt Van Dijck
Thank you for your answers.
I would rather not file a new bug, since what I coded isn't really a bug.

The problem I (and my colleagues) have today is very stupid:
We read .csv files with a lot of columns, of which most contain
date-time stamps, coded in DD/MM/ HH:MM.
This is not exotic, but the base library's read.table (and derivatives)
only accepts date-times in a limited number of possible formats (which I
understand very well).

We could specify the format for each column individually, but that
syntax is rather complicated and difficult to maintain.

My solution to this specific problem became a trivial, yet generic,
extension to read.table.
Rather than relying on the built-in type detection, I added a parameter
that takes a function, which is called for each to-be-type-probed column,
so I can overrule the built-in limited default.
If the function returns nothing, the built-in default is still
used.

This way, I could construct a type-probing function that is
straightforward, not hard to code, and makes reading my .csv files
acceptable in terms of code (read.table parameters).
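For readers without the patch, the idea can be approximated outside
read.table. This is only a sketch: read_with_hook() and
probe_datetime() are hypothetical names, not the patch's API, and the
timestamp format "%d/%m/%Y %H:%M" is an assumption made for
illustration.

```r
# Hypothetical wrapper: read every column as character, let the
# user-supplied function try each column, and fall back to the built-in
# type detection (utils::type.convert) when it returns NULL.
read_with_hook <- function(file, col_convert, ...) {
  raw <- utils::read.table(file, colClasses = "character", ...)
  raw[] <- lapply(raw, function(x) {
    out <- col_convert(x)
    if (is.null(out)) utils::type.convert(x, as.is = TRUE) else out
  })
  raw
}

# Hypothetical probing function: claim a column only if every
# non-missing value parses with the assumed format.
probe_datetime <- function(x) {
  parsed <- as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")
  if (all(is.na(x)) || anyNA(parsed[!is.na(x)])) NULL else parsed
}

# Usage on a small temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("when,n", "27/03/2019 14:05,1", "28/03/2019 09:30,2"), tmp)
d <- read_with_hook(tmp, probe_datetime, header = TRUE, sep = ",")
str(d)
```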

I'm sure I'm not the only one dealing with such needs, especially since
date-time formats exist in enormous variety, but I want to stress here
that my approach is agnostic to my specific problem.

For those asking to 'show me the code', I refer you to my 2nd patch,
where the tests have been extended with my specific problem.

What are your opinions about this?

Kind regards,
Kurt


Re: [Rd] [RFC] readtable enhancement

2019-03-27 Thread Michael Lawrence via R-devel
This has some nice properties:

1) It self-documents the input expectations in a similar manner to
colClasses.
2) The implementation could eventually "push down" the coercion, e.g.,
calling it on each chunk of an iterative read operation.

The implementation needs work though, and I'm not convinced that coercion
failures should fall back gracefully to the default.

Feature requests fall under a "bug" in bugzilla terminology, so please
submit this there. I think I've made you an account.

Thanks,
Michael


Re: [Rd] [RFC] readtable enhancement

2019-03-27 Thread Ben Bolker
   Just to clarify/amplify: on the bug tracking system there's a
drop-down menu to specify severity, and "enhancement" is one of the
choices, so you don't have to worry that you're misrepresenting your
patch as fixing a bug.

  The fact that an R-core member (Michael Lawrence) thinks this is
worth looking at is very encouraging (and somewhat unusual for
feature/enhancement suggestions)!

  Ben Bolker


Re: [Rd] Discrepancy between is.list() and is(x, "list")

2019-03-27 Thread Abs Spurdle
> the prison made by ancient design choices

That prison of ancient design choices isn't so bad.

I have no further comments on object oriented semantics.
However, I'm planning to adopt the following design pattern.

If I set the class of an object, I will prepend the new class to the
existing class.

#good
class (object) = c ("something", class (object) )

#bad
class (object) = "something"

I encourage others to do the same.
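A small sketch of what the pattern buys, using a list and a made-up
class name "record":

```r
# Prepending keeps the implicit class, so inherits() still sees "list".
good <- list(a = 1)
class(good) <- c("record", class(good))
class(good)               # "record" "list"
inherits(good, "list")    # TRUE

# Overwriting drops it: inherits() no longer agrees with is.list().
bad <- list(a = 1)
class(bad) <- "record"
inherits(bad, "list")     # FALSE
is.list(bad)              # TRUE
```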


[Rd] issue with latest release of R-devel

2019-03-27 Thread Therneau, Terry M., Ph.D. via R-devel
I'm getting ready to submit an update of survival, and as is my habit I run the
checks on all packages that depend/import/suggest survival.  I am getting some
very odd behaviour wrt non-reproducibility.  It came to a head when some things
failed on one machine and worked on another.  I found that the difference was
that the failure was using the 3/27 release and the success was still on a late
Jan release.  When I updated R on the latter machine it now fails too.

An example is the test cases in genfrail.Rd, in the frailtySurv package.  (The
package depends on survival, but I'm fairly sure that this function does not.)
It's a fairly simple function to generate test data sets, with a half dozen
calls in the test file.  If you cut and paste the whole batch into an R
session, the last one of them fails.  But if you run that call by itself it
works.  This yes/no behavior is reproducible.

Another puzzler was the ranger package.  In the tests/testthat directory,
source('test_maxstat') fails if it is preceded by source('test_jackknife'),
but not otherwise.  Again, I don't think the survival package is implicated in
either of these tests.

Another package that succeeded under the older r-devel and now fails is
arsenal, but I haven't looked deeply at that.

Any insight would be appreciated.

Terry T.



Here is the sessionInfo() for one of the machines.  The other is running
xubuntu 18 LTS.  (It's at the office, and I can send that tomorrow when I get
in.)

R Under development (unstable) (2019-03-28 r76277)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

loaded via a namespace (and not attached):
[1] compiler_3.6.0


Re: [Rd] issue with latest release of R-devel

2019-03-27 Thread Henrik Bengtsson
Could this be related to

"SIGNIFICANT USER-VISIBLE CHANGES

The default method for generating from a discrete uniform distribution
(used in sample(), for instance) has been changed. This addresses the
fact, pointed out by Ottoboni and Stark, that the previous method made
sample() noticeably non-uniform on large populations. See PR#17494 for
a discussion. The previous method can be requested using RNGkind() or
RNGversion() if necessary for reproduction of old results. Thanks to
Duncan Murdoch for contributing the patch and Gabe Becker for further
assistance."

If so, testing with

   export _R_RNG_VERSION_=3.5.0

might remove/explain those errors.
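The same check can be made from inside an R session with the functions
named in the NEWS entry. This is a sketch assuming R >= 3.6.0 (where
RNGkind() accepts 'sample.kind'); the seed and sample sizes are
arbitrary:

```r
# Draw with the pre-3.6.0 sampler ("Rounding"); R warns that it is
# non-uniform, which is expected here.
suppressWarnings(RNGversion("3.5.0"))
set.seed(42)
old <- sample(1000, 5)

# Draw again with the new default sampler ("Rejection").
RNGkind(sample.kind = "Rejection")
set.seed(42)
new <- sample(1000, 5)

RNGkind()             # third element names the sample.kind in use
identical(old, new)   # differing draws would explain changed test results
```

If a failing test passes again after RNGversion("3.5.0"), the test
depends on specific random draws rather than on survival.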

Just a thought

Henrik


Re: [Rd] SUGGESTION: Proposal to mitigate problem with stray processes left behind by parallel::makeCluster()

2019-03-27 Thread Henrik Bengtsson
Thank you Tomas.

For the record, I'm confirming that the stray background R worker
process now times out properly after 'setup_timeout' (= 120) seconds:

{0s}$ Rscript -e 'parallel::makeCluster(1L, port=80)'
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE,  :
  cannot open the connection
Calls:  ... makePSOCKcluster -> newPSOCKnode -> socketConnection
In addition: Warning message:
In socketConnection("localhost", port = port, server = TRUE, blocking = TRUE,  :
  port 80 cannot be opened
Execution halted
{1s}$ ps aux | grep -E "exec[/]R"
hb   17645  2.0  0.3 259104 55144 pts/5    S    20:58   0:00 /home/hb/software/R-devel/trunk/lib/R/bin/exec/R --slave --no-restore -e parallel:::.slaveRSOCK() --args MASTER=localhost PORT=80 OUT=/dev/null SETUPTIMEOUT=120 TIMEOUT=2592000 XDR=TRUE
{2s}$ sleep 120
{122s}$ ps aux | grep -E "exec[/]R"
{122s}$

Good spotting of the bug:

-  if (Sys.time() - t0 > setup_timeout) break
+  if (difftime(Sys.time(), t0, units="secs") > setup_timeout) break

For those who find this thread, I think what's going on here is that
'setup_timeout = 120' is a numeric that is compared to a 'difftime' that
keeps changing units as time goes by.  When compared as 'Sys.time() -
t0 > setup_timeout' the LHS would be in units of seconds as long as
less than 60 seconds had passed:

> Sys.time() - t0
Time difference of 59 secs
> as.numeric(Sys.time() - t0)
[1] 59

However, as soon as more than 60 seconds have passed, the unit turns
into minutes and we're comparing minutes to seconds:

> Sys.time() - t0
Time difference of 1.016667 mins
> as.numeric(Sys.time() - t0)
[1] 1.016667

which is now compared to 'setup_timeout'.  If the unit remained
minutes it would time out after 120 [minutes]. However, once 60
minutes have passed, the unit of Sys.time() - t0 switches to hours, and
we're comparing hours to seconds, and so on.  It would only time out if
we used 'setup_timeout' < 60 seconds.
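The effect can be reproduced without waiting by using fixed time
stamps. This is an illustrative sketch, not the code in snow.R; 't0'
and the elapsed values are made up:

```r
t0 <- as.POSIXct("2019-03-27 12:00:00", tz = "UTC")
setup_timeout <- 120  # seconds

for (elapsed in c(59, 61, 121, 3700)) {
  now   <- t0 + elapsed
  auto  <- now - t0                           # units chosen automatically
  fixed <- difftime(now, t0, units = "secs")  # units pinned to seconds
  cat(sprintf("%5d s elapsed: auto = %.4f %s, old test = %s, new test = %s\n",
              elapsed, as.numeric(auto), units(auto),
              auto > setup_timeout, fixed > setup_timeout))
}
```

The old test stays FALSE for every row, because the numeric value of
the automatically scaled difference never exceeds 120, whereas the new
test turns TRUE from 121 seconds onwards.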

/Henrik


Re: [Rd] default for 'signif.stars'

2019-03-27 Thread Abs Spurdle
I read through the editorial.
This is one of the most mega-ultra-super-biased articles I've ever read.

e.g.
The authors encourage Bayesian methods, and literally encourage subjective
approaches.
However, there's only one reference to robust methods and one reference to
nonparametric methods, both of which are labelled as purely exploratory
methods, which I regard as extremely offensive.
And there don't appear to be any references to semiparametric methods, or
machine learning.

Surprisingly, they encourage multiple testing but don't mention the
multiple comparison problem.
Something I can't understand at all.

So, maybe we should replace signif.stars with emoji...?


Re: [Rd] [RFC] readtable enhancement

2019-03-27 Thread Kurt Van Dijck
Hey,

In the meantime, I submitted a bug. Thanks for the assistance on that.

>and I'm not convinced that
>coercion failures should fall back gracefully to the default.

the graceful fallback:
- makes the code more complex
+ keeps colConvert implementations limited
+ requires the user to only implement what changed from the default
+ seemed to me the smallest overall effort

In my opinion, graceful fallback makes the thing better,
but without it, the colConvert parameter remains useful; it would still
fill a gap.

>The implementation needs work though,

Other than removing the graceful fallback?

Kind regards,
Kurt


Re: [Rd] [RFC] readtable enhancement

2019-03-27 Thread Gabriel Becker
Kurt,

Cool idea, and it's great "seeing new faces" on here proposing things and
engaging with R-core.

Some comments on the issue of fallbacks below.


On Wed, Mar 27, 2019 at 10:33 PM Kurt Van Dijck <
dev.k...@vandijck-laurijssen.be> wrote:

> Hey,
>
> In the meantime, I submitted a bug. Thanks for the assistance on that.
>
> >and I'm not convinced that
> >coercion failures should fall back gracefully to the default.
>
> the graceful fallback:
> - makes the code more complex
> + keeps colConvert implementations limited
> + requires the user to only implement what changed from the default
> + seemed to me the smallest overall effort
>
> In my opinion, graceful fallback makes the thing better,
> but without it, the colConvert parameter remains useful; it would still
> fill a gap.
>

Another way of viewing coercion failure, I think, is that either the
user-supplied converter has a bug in it or was mistakenly applied in a
situation where it shouldn't have been. If that's the case, the
fail-early-and-loud paradigm might ultimately be more helpful to users there.

Another thought in the same vein is that if fallback occurs, the returned
result will not be what the user asked for and is expecting. So either
their code, which assumes (e.g.) that a column has been correctly parsed as
a date, is going to break in mysterious (to them) ways, or they have to put a
bunch of their own checking logic after the call to see if their converters
actually worked in order to protect themselves from that.  Neither really
seems ideal to me; I think an error would be better, myself. I'm more of a
software developer than a script writer/analyst though, so it's possible
others' opinions would differ (though I'd be a bit surprised by that in
this particular case given the danger).
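To make the fail-early alternative concrete, here is a minimal sketch
of a converter that raises an error instead of returning nothing.
strict_datetime() is a hypothetical name and the format string is an
assumption for illustration; neither comes from the patch.

```r
# Hypothetical strict converter: stop with an informative message when
# any non-missing value does not parse with the given format.
strict_datetime <- function(x, format = "%d/%m/%Y %H:%M") {
  parsed <- as.POSIXct(x, format = format, tz = "UTC")
  bad <- which(is.na(parsed) & !is.na(x))
  if (length(bad) > 0L) {
    stop(sprintf("%d value(s) do not match format '%s', first: '%s'",
                 length(bad), format, x[bad[1L]]), call. = FALSE)
  }
  parsed
}

ok  <- strict_datetime("27/03/2019 14:05")
err <- tryCatch(strict_datetime("2019-03-27"), error = function(e) e)
conditionMessage(err)
```

With this style the caller never has to re-check column types after the
read; a mismatch surfaces at the point where it happens.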

Best,
~G
