[Rd] A few suggestions and perspectives from a PhD student

2017-05-05 Thread Antonin Klima
er of datasetSingle classes, and adds some 
extra functionality on top. The datasetSingle class may have a method 
replicates, that returns a named vector assigning replicate number to 
experiment names of the dataset. But I would also like to have a function with 
the same name for the datasetMulti class, that returns a data frame, or list, 
covering replicate numbers for all the datasets included.

But then, I need to setGeneric for the method. But if I set generic before both 
implementations, I will reset the generic in the second call, losing the 
definition for “replicates” for datasetSingle. Skipping this in the code for 
datasetMulti means that 1) I have to remember that I had the function defined 
for datasetSingle, 2) if I remove the function or change its name in 
datasetSingle, I now have to change the datasetMulti class file too. Moreover, 
if I would like to have a different generic for the datasetMulti version, I 
have to change it not in datasetMulti class file, but in the datasetSingle 
file, where it might not make much sense. In this case, I wanted to have 
another argument “datasets”, which would return the replicates only for the 
datasets specified, rather than for all.

I made a wrapper that could circumvent the first issue, but the second issue is 
not easy to circumvent.
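
For illustration, a minimal sketch of one possible workaround, not necessarily the wrapper mentioned above. It assumes simplified, hypothetical class definitions (the slot names reps and datasets are invented for this example). Each file guards its setGeneric call with isGeneric, and the generic carries "..." so that the datasetMulti method can add its own "datasets" argument without the generic having to be changed in the datasetSingle file:

```r
library(methods)

# Hypothetical, simplified classes; slot names are assumptions for this sketch
setClass("datasetSingle", representation(reps = "numeric"))
setClass("datasetMulti", representation(datasets = "list"))

# --- as it might appear in the datasetSingle file ---
# Create the generic only if no earlier file has created it already
if (!isGeneric("replicates")) {
  setGeneric("replicates", function(object, ...) standardGeneric("replicates"))
}
setMethod("replicates", "datasetSingle", function(object, ...) object@reps)

# --- as it might appear in the datasetMulti file ---
# Same guard, so the existing generic and its methods are left untouched
if (!isGeneric("replicates")) {
  setGeneric("replicates", function(object, ...) standardGeneric("replicates"))
}
# The extra "datasets" argument is accepted because the generic has "..."
setMethod("replicates", "datasetMulti",
  function(object, datasets = names(object@datasets), ...) {
    lapply(object@datasets[datasets], replicates)
  })

a <- new("datasetSingle", reps = c(exp1 = 1, exp2 = 2))
b <- new("datasetSingle", reps = c(exp3 = 1))
m <- new("datasetMulti", datasets = list(a = a, b = b))

replicates(m)                  # replicates for all included datasets
replicates(m, datasets = "b")  # replicates for the selected dataset only
```

The guard still has to be repeated in every file, so this only softens the first issue rather than removing it.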

6) Many parameters freeze S4 method calls
If I specify more than about 6 parameters for an S4 method, I would often get a 
“freeze” on the method call. The process would eat up a lot of memory before 
going into the call, upon which it would execute the call as normal (if it 
didn’t run out of memory or I didn’t run out of patience). Subsequent calls of 
the method would not include this overhead. The amount of memory this could 
take could be in gigabytes, and the time in minutes. I suspect this might be 
due to generating an entry in call table for each accepted signature. It can be 
circumvented, but sure isn’t a behaviour one would expect.
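
As a sketch of one way this might be circumvented: setGeneric takes a signature argument that limits which formal arguments take part in method dispatch. The generic name fitModel and its argument names below are hypothetical, and whether this removes the delay described above has not been verified here.

```r
library(methods)

# Hypothetical generic with seven formal arguments; only "data" is used for
# dispatch, so methods are selected on a single class instead of seven
setGeneric(
  "fitModel",
  function(data, alpha, beta, gamma, delta, epsilon, zeta) {
    standardGeneric("fitModel")
  },
  signature = "data"
)

setMethod("fitModel", "numeric",
  function(data, alpha, beta, gamma, delta, epsilon, zeta) {
    # Only data and alpha are used in this toy method body
    sum(data) * alpha
  })

result <- fitModel(c(1, 2, 3), alpha = 2)
```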

7) Default values for S4 methods
It would seem that it is not possible to set up default parameters for an S4 
method in the usual way, with definition = function (x, y=5). I resorted to making 
class unions with “missing” for signatures on the call, with the call starting 
with if(missing(param)) param=DEFAULT_VALUE, but it certainly does not improve 
readability or ease of coding.
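
For comparison, a minimal sketch of a pattern that avoids class unions with "missing", assuming dispatch is only needed on the first argument. The generic scaleBy is hypothetical. The default is written in both the generic and the method, and y is kept out of the dispatch signature:

```r
library(methods)

# Hypothetical generic: y has a default and is excluded from dispatch
setGeneric("scaleBy", function(x, y = 5) standardGeneric("scaleBy"),
           signature = "x")

# The method repeats the same default for y
setMethod("scaleBy", "numeric", function(x, y = 5) x * y)

r1 <- scaleBy(2)         # y is not supplied, so the default is used
r2 <- scaleBy(2, y = 3)  # y is supplied explicitly
```

This does not help when dispatch on the defaulted argument itself is wanted.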


Thank you for your time if you have finished reading thus far. :) Looking 
forward to any answer.

Yours Sincerely,
Antonin Klima

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] A few suggestions and perspectives from a PhD student

2017-05-08 Thread Antonin Klima
Thanks for the answers,

I’m aware of the ‘.’ option, just wanted to give a very simple example.

But the lapply ‘…’ parameter use has eluded me and thanks for enlightening me.
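
For readers following along, a short sketch of that use in base R, with foo defined as in my original message: extra named arguments given to lapply are passed on to the function through ‘…’.

```r
# foo as defined in the original message
foo <- function(x, y) x + y

# y = 3 is forwarded to foo for every element of 1:5
res <- lapply(1:5, foo, y = 3)

# Equivalent to writing the anonymous function out by hand
res2 <- lapply(1:5, function(x) foo(x, y = 3))
```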

What do you mean by messing up the call stack? As far as I understand it, 
piping should translate into the same code as deep nesting. So then I only see a 
tiny downside for debugging here. No loss of time/space efficiency or anything. 
In your example, there is a chance of inadvertent error, coming from the fact 
that a variable is being reused and no one now checks for me whether it is being 
passed between the lines. One also has to specify the variable every single time. 
For me, that solution is clearly inferior.

Too bad you didn’t find my other comments interesting though.

>Why do you think being implemented in a contributed package restricts
>the usefulness of a feature?

I guess it depends on your philosophy. It may not restrict it per se, although 
it would make a lot of sense to me to reuse the bash-style ‘|’ and have a 
shorter, more readable version. One has an extra dependence on a package for an 
item that fits the language so well that it should be part of it. It is without 
doubt my most used operator at least. Going to some of my folders I found 101 
uses in 750 lines, and 132 uses in 3303 lines. I would compare it to having a 
computer game being really good with a fan-created mod, but lacking otherwise. 
:) 

So to me, it makes sense that if there is no doubt that a feature improves the 
language, and especially if people extensively use it through a package 
already, it should be part of the “standard”. Question is whether it is indeed 
very popular, and whether you share my view. But that’s now up to you, I just 
wanted to point it out I guess.

Best Regards,
Antonin

> On 05 May 2017, at 22:33, Gabor Grothendieck  wrote:
> 
> Regarding the anonymous-function-in-a-pipeline point one can already
> do this which does use brackets but even so it involves fewer
> characters than the example shown.  Here { . * 2 } is basically a
> lambda whose argument is dot. Would this be sufficient?
> 
>  library(magrittr)
> 
>  1.5 %>% { . * 2 }
>  ## [1] 3
> 
> Regarding currying note that with magrittr Ista's code could be written as:
> 
>  1:5 %>% lapply(foo, y = 3)
> 
> or at the expense of slightly more verbosity:
> 
>  1:5 %>% Map(f = . %>% foo(y = 3))
> 
> 
> On Fri, May 5, 2017 at 1:00 PM, Antonin Klima  wrote:
>> Dear Sir or Madam,
>> 
>> I am in the 2nd year of my PhD in bioinformatics, after taking my Master’s in 
>> computer science, and have been using R heavily during my PhD. As such, I 
>> have put together a list of certain features in R that, in my opinion, would 
>> be beneficial to add, or could be improved. The first two are already 
>> implemented in packages, but given that they are implemented as user-defined 
>> operators, this greatly restricts their usefulness. I hope you will find my 
>> suggestions interesting. If you find time, I will welcome any feedback as to 
>> whether you find the suggestions useful, or why you do not think they should 
>> be implemented. I will also welcome if you enlighten me with any features I 
>> might be unaware of, that might solve the issues I have pointed out below.
>> 
>> 1) piping
>> Currently available in package magrittr, piping makes the code more 
>> readable by having the line start at its natural starting point, 
>> followed by the functions that are applied - in order. The readability of 
>> several nested calls with a number of parameters each is almost zero; it’s 
>> almost as if one would need to come up with the solution oneself. A pipeline 
>> in comparison is very straightforward, especially together with point 
>> (2).
>> 
>> The package here works rather well, though; the shortcomings of piping 
>> not being native are not quite as severe as in point (2). Nevertheless, an 
>> intuitive symbol such as | would be helpful, and it sometimes bothers me 
>> that I have to parenthesize anonymous functions, which would probably not be 
>> required with a native pipe operator, much like it is not required in e.g. 
>> lapply. That is,
>> 1:5 %>% function(x) x+2
>> should be totally fine.
>> 
>> 2) currying
>> Currently available in package Curry. The idea is that, having a function 
>> such as foo = function(x, y) x+y, one would like to write for example 
>> lapply(foo(3), 1:5), and have the interpreter figure out ok, foo(3) does not 
>> make a value result, but it can still give a function result - a function of 
>> y. This would be indeed most useful for various apply functions, rather than 
>> writing function(x) foo(3,x).
>> 
>>