Regarding the anonymous-function-in-a-pipeline point, one can already
do the following. It does use braces, but even so it involves fewer
characters than the example shown.  Here { . * 2 } is essentially a
lambda whose argument is the dot. Would this be sufficient?

  library(magrittr)

  1.5 %>% { . * 2 }
  ## [1] 3

Regarding currying, note that with magrittr Ista's code could be written as:

  1:5 %>% lapply(foo, y = 3)

or at the expense of slightly more verbosity:

  1:5 %>% Map(f = . %>% foo(y = 3))


On Fri, May 5, 2017 at 1:00 PM, Antonin Klima <anton...@idi.ntnu.no> wrote:
> Dear Sir or Madam,
>
> I am in 2nd year of my PhD in bioinformatics, after taking my Master’s in 
> computer science, and have been using R heavily during my PhD. As such, I 
> have put together a list of certain features in R that, in my opinion, would 
> be beneficial to add, or could be improved. The first two are already 
> implemented in packages, but since they are implemented as user-defined 
> operators, their usefulness is greatly restricted. I hope you will find my 
> suggestions interesting. If you find time, I will welcome any feedback as to 
> whether you find the suggestions useful, or why you do not think they should 
> be implemented. I would also welcome it if you could enlighten me about any 
> features I might be unaware of that might solve the issues I have pointed 
> out below.
>
> 1) piping
> Currently available in the package magrittr, piping makes code more 
> readable by having the line start at its natural starting point and 
> following with the functions that are applied, in order. The readability of 
> several nested calls, each with a number of parameters, is almost zero; it 
> is almost easier to come up with the solution oneself than to read it. A 
> pipeline, in comparison, is very straightforward, especially together with 
> point (2).
>
> The package works rather well here; nevertheless, the shortcomings of piping 
> not being native are not quite as severe as in point (2). Even so, an 
> intuitive symbol such as | would be helpful, and it sometimes bothers me that 
> I have to parenthesize anonymous functions, which would probably not be 
> required with a native pipe operator, much as it is not required in e.g. 
> lapply. That is,
> 1:5 %>% function(x) x+2
> should be totally fine
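For reference, magrittr does accept an anonymous function today if it is
parenthesized, so only the extra parentheses are at issue (this assumes the
magrittr package is installed):

```r
library(magrittr)

# An anonymous function in a magrittr pipe currently has to be wrapped
# in parentheses (or braces) so it parses as a single right-hand side:
res <- 1:5 %>% (function(x) x + 2)
res
## [1] 3 4 5 6 7
```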
>
> 2) currying
> Currently available in the package Curry. The idea is that, having a function 
> such as foo = function(x, y) x+y, one would like to write, for example, 
> lapply(1:5, foo(3)), and have the interpreter figure out that foo(3) does not 
> produce a value result, but it can still give a function result: a function 
> of y. This would indeed be most useful for the various apply functions, 
> rather than writing function(y) foo(3, y).
>
> I suggest that currying would make the code easier to write and more 
> readable, especially when using apply functions. One might imagine that there 
> could be some confusion with such a feature, especially among people 
> unfamiliar with functional programming, although R already treats functions 
> as first-class arguments, so it could be just fine. But one could address it 
> with special syntax, such as $foo(3) [$foo(x=3)] for partial application.  
> The current currying package has very limited usefulness as, being limited 
> by the user-defined-operator framework, it can only rarely contribute to less 
> code/more readability. Compare yourself:
> $foo(x=3) vs foo %<% 3
> goo = function(a,b,c)
> $goo(b=3) vs goo %><% list(b=3)
>
> Moreover, one would often like currying to have the highest priority. For 
> example, when piping:
> data %>% foo %>% foo1 %<% 3
> if one wants to do data %>% foo %>% $foo1(x=3)
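In the meantime, partial application can be sketched in a few lines of base
R; `partial` below is a hypothetical helper name, not an existing base
function:

```r
# A minimal partial-application helper in base R (hypothetical name
# `partial`; not part of base R itself).
partial <- function(f, ...) {
  fixed <- list(...)                          # arguments fixed now
  function(...) do.call(f, c(fixed, list(...)))  # the rest supplied later
}

foo <- function(x, y) x + y

# Fix x = 3, leaving a function of y:
add3 <- partial(foo, x = 3)
add3(10)
## [1] 13

# Usable directly in apply-style calls:
unlist(lapply(1:5, partial(foo, y = 3)))
## [1] 4 5 6 7 8
```

Whether dedicated syntax such as $foo(x=3) is worth it then becomes mostly
a readability question.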
>
> 3) Code executable only when running the script itself
> Whereas the first two suggestions borrow somewhat from Haskell and the 
> like, this suggestion would borrow from Python. I'm building quite a 
> complicated pipeline using S4 classes. After defining a class and its 
> methods, I also define how to build the class to my liking, based on my 
> input data, using the just-defined methods. So I end up having a list of 
> command-line arguments to process, and the way to create the class instance 
> based on them. If I write this into the class file, however, I end up 
> running the code whenever the file is sourced from the next step in the 
> pipeline, which needs the previous class definitions.
>
> A feature such as Python's if __name__ == "__main__" guard would thus be 
> useful. As it is, I had to create run scripts as separate files, which is 
> actually not so terrible, given that the class and its methods often span a 
> few hundred lines, but still.
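One base-R idiom that approximates the Python guard: at the top level of a
file run with Rscript, sys.nframe() is 0, while code evaluated through
source() runs inside the frames that source() creates. A sketch (main is a
hypothetical name):

```r
# Approximation of Python's `if __name__ == "__main__"` guard.
# This is an idiom, not a language feature: sys.nframe() is 0 only at
# the top level of a script run directly (e.g. via Rscript), whereas
# source() evaluates the file inside its own call frames.
main <- function() {
  args <- commandArgs(trailingOnly = TRUE)
  cat("running as a script with", length(args), "argument(s)\n")
}

if (sys.nframe() == 0L) {
  main()  # only runs when the file is executed directly, not source()d
}
```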
>
> 4) non-exported global variables
> I also find it lacking, that I seem to be unable to create constants that 
> would not get passed to files that source the class definition. That is, if 
> class1 features global constant CONSTANT=3, then if class2 sources class1, it 
> will also include the constant. This 1) clutters the namespace when running 
> the code interactively, 2) potentially overwrites the constants in case of 
> nameclash. Some kind of export/nonexport variable syntax, or symbolic import, 
> or namespace would be useful. I know if I converted it to a package I would 
> get at least something like a namespace, but still.
>
> I understand that the variable cannot just not be imported, in general, as 
> the functions will generally rely on it (otherwise it wouldn’t have to be 
> there). But one could consider hiding it in an implicit namespace for the 
> file, for example.
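One partial workaround in plain scripts is to keep file-private constants
inside a local() block, so that sourcing the file only exposes what the
block returns. A sketch (names invented for illustration):

```r
# File-private constant via a closure: CONSTANT lives only in the
# environment created by local(), which the returned function closes
# over; it is never assigned in the global environment of whoever
# source()s this file.
scale_by_constant <- local({
  CONSTANT <- 3  # not visible outside this block
  function(x) x * CONSTANT
})

scale_by_constant(5)
## [1] 15

exists("CONSTANT")
## [1] FALSE
```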
>
> 5) S4 methods with same name, for different classes
> Say I have an S4 class called datasetSingle, and another S4 class called 
> datasetMulti, which gathers up a number of datasetSingle classes, and adds 
> some extra functionality on top. The datasetSingle class may have a method 
> replicates, that returns a named vector assigning replicate number to 
> experiment names of the dataset. But I would also like to have a function 
> with the same name for the datasetMulti class, that returns for data frame, 
> or list, covering replicate numbers for all the datasets included.
>
> I then need to call setGeneric for the method. But if I call setGeneric 
> before both implementations, the second call resets the generic, losing the 
> definition for “replicates” for datasetSingle. Skipping this in the code for 
> datasetMulti means that 1) I have to remember that I had the function defined 
> for datasetSingle, 2) if I remove the function or change its name in 
> datasetSingle, I now have to change the datasetMulti class file too. 
> Moreover, if I would like to have a different generic for the datasetMulti 
> version, I have to change it not in datasetMulti class file, but in the 
> datasetSingle file, where it might not make much sense. In this case, I 
> wanted to have another argument “datasets”, which would return the replicates 
> only for the datasets specified, rather than for all.
>
> I made a wrapper that could circumvent the first issue, but the second issue 
> is not easy to circumvent.
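The first issue can usually be handled by guarding the setGeneric() call so
that a later file does not clobber an existing generic; a sketch using the
class name from the message (the slot and method body are invented for
illustration):

```r
library(methods)

# Guarded setGeneric: only create the generic if it does not already
# exist, so sourcing the datasetMulti file after datasetSingle does not
# wipe out the earlier method registration.
if (!isGeneric("replicates")) {
  setGeneric("replicates", function(object, ...)
    standardGeneric("replicates"))
}

setClass("datasetSingle", representation(reps = "numeric"))

setMethod("replicates", "datasetSingle", function(object, ...) object@reps)

d <- new("datasetSingle", reps = c(a = 1, b = 2))
replicates(d)
## a b
## 1 2
```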
>
> 6) Many parameters freeze S4 method calls
> If I specify roughly six or more parameters for an S4 method, I often get a 
> "freeze" on the first method call. The process eats up a lot of memory 
> before entering the call, after which it executes the call as normal (if it 
> did not run out of memory and I did not run out of patience). Subsequent 
> calls of the method do not incur this overhead. The amount of memory 
> involved can be in the gigabytes, and the time in minutes. I suspect this 
> might be due to an entry being generated in the method dispatch table for 
> each accepted signature. It can be circumvented, but it sure isn't behaviour 
> one would expect.
>
> 7) Default values for S4 methods
> It would seem that it is not possible to set up default parameters for an S4 
> method in the usual way of definition = function(x, y=5). I resorted to 
> making class unions with "missing" in the call signatures, with the method 
> body starting with if(missing(param)) param=DEFAULT_VALUE, but that 
> certainly does not improve readability or ease of coding.
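For what it is worth, defaults written into the method definition do appear
to be honored when the defaulted argument is excluded from the dispatch
signature; a sketch (shift is a hypothetical generic, and this does not
cover the case where dispatch on the defaulted argument is needed):

```r
library(methods)

# Dispatch only on x (signature = "x"); y is an ordinary argument with
# a default, so no "missing" class union is needed.
setGeneric("shift", function(x, y = 5) standardGeneric("shift"),
           signature = "x")

setMethod("shift", "numeric", function(x, y = 5) x + y)

shift(1)
## [1] 6
shift(1, y = 10)
## [1] 11
```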
>
>
> Thank you for your time if you have finished reading thus far. :) Looking 
> forward to any answer.
>
> Yours Sincerely,
> Antonin Klima
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
