Re: [Rd] write.csv problems

2024-06-29 Thread Rui Barradas

At 17:02 on 28/06/2024, Spencer Graves wrote:

Hello, All:


   I'm getting strange errors with write.csv with some objects of 
class c('findFn', 'data.frame'). Consider the following:



df1 <- data.frame(x=1)
class(df1) <- c('findFn', 'data.frame')
write.csv(df1, 'df1.csv')
# Error in x$Package : $ operator is invalid for atomic vectors

df2 <- data.frame(a=letters[1:2],
   b=as.POSIXct('2024-06-28'))
class(df2) <- c('findFn', 'data.frame')
write.csv(df2, 'df1.csv')
# Error in tapply(rep(1, nrow(x)), xP, length) :
#  arguments must have same length


   "write.csv" works with some objects of class c('findFn', 
'data.frame') but not others. I have a 'findFn' object with 5264 rows that 
fails with the following error:



Error in `[<-.data.frame`(`*tmp*`, needconv, value = list(Count = 
c("83",  :

   replacement element 1 has 526 rows, need 5264


   I have NOT yet been able to reproduce this error with a smaller 
example. However, starting 'write.csv' with something like the following 
should fix all these problems:



if(is.data.frame(x)) class(x) <- 'data.frame'
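
A minimal sketch of that idea, applied in user code rather than inside base R. The wrapper name `write_csv_plain()` and the class name "myClass" (standing in for "findFn") are hypothetical; the only logic taken from the post is the `class(x) <- 'data.frame'` line.

```r
# Hypothetical wrapper, not part of base R: drop any extra S3 classes
# before handing the data frame to the ordinary write.csv().
write_csv_plain <- function(x, file = "", ...) {
  # Same idea as the proposed first line: keep only "data.frame", so that
  # subclass methods (e.g. a faulty "[" method) cannot interfere.
  if (is.data.frame(x)) class(x) <- "data.frame"
  write.csv(x, file = file, ...)
}

# Stand-in for a 'findFn' object: a data frame with an extra class.
df2 <- data.frame(a = letters[1:2], b = as.POSIXct("2024-06-28"))
class(df2) <- c("myClass", "data.frame")

out <- tempfile(fileext = ".csv")
write_csv_plain(df2, out)
back <- read.csv(out)
print(back)
```

Since "myClass" defines no methods here, plain write.csv() would also succeed on df2; the sketch only shows the mechanics. The caller's object keeps its classes, because the function modifies a local copy.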


   Comments?
   Thanks for all your work to help improve the quality of 
statistical software available to the world.



   Spencer Graves

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Hello,

I don't know if this answers the question.
I wasn't able to reproduce the errors, but I was able to reproduce warnings.

One way to avoid errors or warnings is to call write.csv at the end 
of a pipe such as the following.



df1 <- findFn("mean")
df1 |> as.data.frame() |> write.csv("df1.csv")


This solution is equivalent to the code proposed in the OP without the 
need for a change in base R.
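
Why the pipe sidesteps the problem, assuming no `as.data.frame()` method exists for the extra class: dispatch falls through to `as.data.frame.data.frame()`, which drops the classes listed before "data.frame" and leaves the columns alone. A small check, with the hypothetical class "myClass" standing in for "findFn":

```r
# Hypothetical stand-in for a 'findFn' object.
df1 <- data.frame(x = 1, when = as.POSIXct("2024-06-28"))
class(df1) <- c("myClass", "data.frame")

# With no as.data.frame.myClass() defined, dispatch reaches
# as.data.frame.data.frame(), which strips the leading "myClass".
plain <- df1 |> as.data.frame()
print(class(plain))
print(inherits(plain$when, "POSIXct"))  # column classes are untouched
```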


Hope this helps,

Rui Barradas




Re: [Rd] write.csv problems

2024-06-29 Thread Spencer Graves

Hi, Rui et al.:


On 6/29/24 14:24, Rui Barradas wrote:

Hello,

I don't know if this answers the question.
I wasn't able to reproduce the errors, but I was able to reproduce warnings.

One way to avoid errors or warnings is to call write.csv at the end 
of a pipe such as the following.



df1 <- findFn("mean")
df1 |> as.data.frame() |> write.csv("df1.csv")


This solution is equivalent to the code proposed in the OP without the 
need for a change in base R.



	  Thanks for this. Ivan Krylov informed me that this was NOT a problem 
with base R but with "[.findFn". I fixed that and got help from Ivan 
fixing another problem with "sos". Now it is officially "on its way to 
CRAN."




Hope this helps,



  Yes. I'm not yet facile with "|>", but I'm learning.


  Spencer Graves



Rui Barradas






[Rd] |>

2024-06-29 Thread Duncan Murdoch




  Yes. I'm not yet facile with "|>", but I'm learning.


  Spencer Graves


There's very little to know.  This:

 x |> f() |> g()

is just a different way of writing

g(f(x))

If f() or g() have extra arguments, just add them afterwards:

x |> f(a = 1) |> g(b = 2)

is just

g(f(x, a = 1), b = 2)

This isn't quite true of the magrittr pipe, but it is exactly true of 
the base pipe.
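
Because the base pipe is rewritten by the parser, the equivalence can be checked directly on quoted code. In this sketch, `x`, `f` and `g` are placeholders that never need to exist, since nothing is evaluated:

```r
# quote() only parses, it does not evaluate, so x, f and g need not exist.
piped  <- quote(x |> f(a = 1) |> g(b = 2))
nested <- quote(g(f(x, a = 1), b = 2))

print(piped)                     # g(f(x, a = 1), b = 2)
print(identical(piped, nested))  # TRUE
```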


Duncan Murdoch



Re: [Rd] |>

2024-06-29 Thread Spencer Graves

Hi, Duncan:


On 6/29/24 17:24, Duncan Murdoch wrote:



  Yes. I'm not yet facile with "|>", but I'm learning.


  Spencer Graves


There's very little to know.  This:

  x |> f() |> g()

is just a different way of writing

     g(f(x))

If f() or g() have extra arguments, just add them afterwards:

     x |> f(a = 1) |> g(b = 2)

is just

     g(f(x, a = 1), b = 2)



	  Agreed. If I understand correctly, the supporters of the former think 
it's easier to highlight and execute a subset of the earlier character 
string, e.g., "x |> f(a = 1)" than the corresponding subset of the 
latter, "f(x, a = 1)". I remain unconvinced.



  For debugging, I prefer the following:


  fx1 <- f(x, a = 1)
  g(fx1, b=2)


	  Yes, "fx1" occupies storage space that the other two do not. If you 
are writing code for an 8086, the difference is important. However, for 
my work, ease of debugging is important, which is why I prefer "fx1 <- 
f(x, a = 1); g(fx1, b=2)".



  Thanks, again, for the reply.
  Spencer Graves



This isn't quite true of the magrittr pipe, but it is exactly true of 
the base pipe.


Duncan Murdoch





Re: [Rd] |>

2024-06-29 Thread Duncan Murdoch
I agree with you (I think we may be similarly aged), but there is the 
`magrittr::debug_pipe()` function, which can be inserted anywhere into 
either kind of pipe.  It will call `browser()` at that point, and let you 
examine the current value, before passing it on to the next entry.


You can't single step through a pipe (as far as I know), but with that 
modification, you can see what you've got at any point.
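
A non-interactive variant of the same idea, for seeing intermediate values without entering the browser. `peek()` is a hypothetical helper written for this sketch (neither base R nor magrittr provides it); replacing its `str(x)` call with `browser()` would give something closer to the interactive behaviour described above.

```r
# Hypothetical helper: report the value flowing through a pipe, then
# return it unchanged so the pipeline continues as before.
peek <- function(x, label = "value") {
  cat(label, ":\n", sep = "")
  str(x)
  x
}

n4 <- mtcars |>
  subset(cyl == 4) |>
  peek("after subset") |>   # prints the structure of the 4-cylinder rows
  nrow()
print(n4)
```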


Duncan Murdoch


On 2024-06-29 6:57 p.m., Spencer Graves wrote:



Re: [Rd] |>

2024-06-29 Thread avi.e.gross
I suggest there is actually quite a lot to know about piping, although you can 
use it fine while knowing little.

For those who can happily write complex lines of code containing nested 
function calls and never have to explain them to anyone, feel free. I can do that, 
and sometimes, months later, it takes me ten minutes to figure out what I did, and then I 
check to see if I got it right!

But for people who are used to vaguely similar features in other languages, 
pipes are a great way to visualize the flow of data and processing, as they show 
the steps in sequence.

No, they are not at all the same as a UNIX pipe, but that is not a bad model: 
UNIX pipes let you write shell scripts that do one conceptual step at a time and pass 
the data along to the input of another program that processes it further and passes 
it along until you reach some goal.

Many languages, such as those with object-oriented features, have a sort 
of pipeline that can look like:

a.method_a(args).method_b(args)

And in some languages, that can be spread across multiple lines to look a bit 
more like a pipeline. This too is an inexact analogy, as what really happens is 
that calling a method on an object may return another object, on which you can 
then call a method, and so on. This can make it limited in some ways or quite powerful.

The many versions of an R pipe that have been created can be variations on many 
themes. As an example, you could take the multiple lines in a pipeline, 
rearrange them into the equivalent nested code, with function calls as arguments 
to other functions, and then evaluate it. It would, in effect, be a sort of 
syntactic sugar that makes things easier for SOME programmers.
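
That rearrangement is easy to sketch by hand. `nest_calls()` is a hypothetical illustration, not how any particular pipe package is implemented: it folds a starting expression through a list of calls, inserting the accumulated expression as each call's first argument, and then evaluates the result.

```r
# Hypothetical sketch: turn "start, then call1, then call2, ..." into the
# equivalent nested call, inserting each result as the first argument.
nest_calls <- function(start, calls) {
  Reduce(function(acc, cl) {
    # cl[[1]] is the function; as.list(cl)[-1] keeps its other arguments.
    as.call(c(list(cl[[1]], acc), as.list(cl)[-1]))
  }, calls, start)
}

steps <- list(quote(head(3)), quote(sum()))
e <- nest_calls(quote(1:10), steps)
print(e)        # sum(head(1:10, 3))
print(eval(e))  # 6
```

The expression built this way is the same language object the parser produces for `1:10 |> head(3) |> sum()`.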

But the topic now shifts to debugging, and indeed, the underlying implementation 
of a pipeline can affect how one debugs it.

The simplest case, with no visible pipes, is trivial to debug:

Temp1 <- f1(x, args)
Temp2 <- f2(Temp1,  args)
Result <- f3(Temp2, args)
rm(Temp1, Temp2)

So one form of piping does something like this under the hood.

For code like:
x PIPED f1(args) PIPED f2(args) PIPED f3(args) -> Result

It simply does something like this:

. <- x
. <- f1(., args)
.  <- f2(.,  args)
Result <- f3(., args)

The variable "." just gets reused repeatedly. But as this code swap is done 
outside normal view, can a debugger follow it? And "." keeps changing. As a 
nice feature, some implementations check whether you have placed "." as an 
argument past the first, as in f3(args, ., more_args), and if so pipe the data 
into that position instead of the first argument, which helps with the many 
functions that want the data second or third or ...
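
The sequential-reassignment scheme described above can be sketched in a few lines. `dot_pipe()` is hypothetical and is not magrittr's implementation; it evaluates each step with `.` bound to the previous result, so `.` can sit in any argument position.

```r
# Hypothetical sketch of the ". <- x; . <- f1(., args); ..." scheme.
dot_pipe <- function(x, ...) {
  steps <- as.list(substitute(list(...)))[-1]  # unevaluated steps
  env <- new.env(parent = parent.frame())      # caller's variables stay visible
  for (step in steps) {
    assign(".", x, envir = env)  # "." is rebound before every step
    x <- eval(step, env)
  }
  x
}

r1 <- dot_pipe(1:10, head(., 3), sum(.))
r2 <- dot_pipe(c(3, 1, 2), sort(.), paste("v", .))  # "." as 2nd argument
print(r1)  # 6
print(r2)  # "v 1" "v 2" "v 3"
```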

Other implementations can provide the syntactic sugar without necessarily 
running the code as shown. I am not sure how the native pipe that was 
added is implemented, but it seems quite a bit faster than many other 
implementations. It has some quirks, such as requiring every function call on 
the right-hand side to include parentheses, even if empty, as when piping to 
head(), and the way to do some things using anonymous functions is a tad annoying.
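
A few of those native-pipe quirks, shown concretely. This sketch assumes R >= 4.2.0, where the `_` placeholder (the native pipe's closest counterpart to the "." argument mentioned above) is available and must be given as a named argument; the vector `x` is made up for illustration.

```r
x <- c(5, 3, 8, 1)

# The right-hand side must be a call, so head() keeps its parentheses.
top2 <- x |> head(2)

# Without them the code is rejected when it is parsed.
bad <- tryCatch(parse(text = "x |> head"), error = function(e) e)
print(inherits(bad, "error"))  # TRUE

# An anonymous function must be wrapped in parentheses and then called.
big <- x |> (\(v) v[v > 2])()

# The "_" placeholder sends the left-hand side to a named argument
# instead of the first one.
fit <- mtcars |> subset(cyl == 4) |> lm(mpg ~ disp, data = _)
print(coef(fit))
```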

I think the focus for many people is the HUMAN who is programming and sees a 
logical way to describe what they want without much ambiguity. Of course, if 
you want to keep playing with your code, don't use pipes except perhaps when it 
is pretty much done.

An analogy to consider is another variant of piping used by ggplot, where "+" is 
overloaded, so that:

ggplot(args) +
  geom_point(args) +
  geom_line(args) +
  xlab(args) +
  theme_bw() +
  coord_flip() +
  ...

is a common way of writing a fairly complex set of operations. But what is 
being piped there is a growing object that each step modifies, and at the end, 
the object is rendered into a graph based on whatever complex contents it 
contains. And, yes, that can be painful to debug, and a simple option is:

P <- ggplot(args)
P <- P + geom_point(args)
P <- P + geom_line(args)
...
print(P)

Being able to declare incremental changes and layers to a graph this way is 
more intuitive to some. Not using a pipelined approach lets you comment 
out parts easily, such as the theme_bw() step when you sometimes do not want 
black and white, although you can just as easily comment out lines in the other version.
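
The "growing object" idea can be imitated without ggplot2 by overloading "+" for a small S3 class. `toyplot`, `new_plot()` and `layer()` are hypothetical names, and this is a toy, not how ggplot2 is implemented; it shows that the chained form and the incremental form build the same object.

```r
# Hypothetical toy classes: a "plot" that accumulates named layers.
new_plot <- function(title) {
  structure(list(title = title, layers = character()), class = "toyplot")
}
layer <- function(name) structure(list(name = name), class = "toylayer")

# Overloading "+" so each step returns a modified copy of the plot object.
`+.toyplot` <- function(e1, e2) {
  e1$layers <- c(e1$layers, e2$name)
  e1
}

print.toyplot <- function(x, ...) {
  cat("Plot:", x$title, "\n")
  cat("Layers:", paste(x$layers, collapse = " -> "), "\n")
  invisible(x)
}

# Chained form.
p <- new_plot("demo") + layer("points") + layer("lines")

# Incremental form: each intermediate value can be inspected.
q <- new_plot("demo")
q <- q + layer("points")
q <- q + layer("lines")

print(p)
```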

What some people need to understand is that adding pipes of any of the 
varieties has never taken away the ability to write code in other ways. Pipes are 
not in any way required. And for some people, they align better with how they 
reason. Yet, if you need lots of debugging in your programs, writing them 
differently may be a better idea, at least until they are debugged.

I have written code for my clients with quite elegant pipelines, as well as 
functions like dplyr's mutate() that let me do many things in one 
function call, and formatted it beautifully with varying levels of indentation 
so you can see at a glance where things line up. Parts of the code are nested 
function calls, and when it all leads to a ggplot structure like the one above, it can 
be a tad hard for many people to appreciate what it is doing. But