Greg Hirson wrote:
I have noticed an interesting behavior when comparing how the base plot() function deals with a data argument that downloads data from the internet vs. how xyplot() in lattice performs the same task.

The goal is to plot hourly temperature data. The data is downloaded and formatted for R using the function cimishourly() in the package cimis. There is a line within the function that outputs the name of the file being downloaded using cat().

When using plot() to plot the data, the following is written to the console:

library(cimis)
plot(air_temp ~ datetime, data = cimishourly("006"))
Downloading:  ftp://ftpcimis.water.ca.gov/pub/hourly/hourly006.csv
Downloading:  ftp://ftpcimis.water.ca.gov/pub/hourly/hourly006.csv

When using xyplot() to perform the same plot, the data is only downloaded once:

library(lattice)
xyplot(air_temp ~ datetime, data = cimishourly("006"))
Downloading:  ftp://ftpcimis.water.ca.gov/pub/hourly/hourly006.csv

Is this caused by a difference in how the two functions evaluate the data argument?


Looks like nobody answered so far:

Yes, the two functions evaluate the data argument differently.
In any case, I would not wrap a download inside another function call. Download the data once, before anything else, and then work with the stored result.

In plot.formula, the data argument is evaluated in two places:

    if (is.matrix(eval(m$data, parent.frame())))

    mf <- eval(m, parent.frame())
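
To illustrate why this repeats the download, here is a minimal sketch of the same pattern. The names noisy_data(), n_evals and f() are made up for the illustration, and f() is a simplification rather than the actual plot.formula code: it only shows that evaluating a captured, unevaluated argument twice runs its side effects twice.

```r
# Hypothetical stand-in for a function with a visible side effect.
n_evals <- 0
noisy_data <- function() {
  n_evals <<- n_evals + 1
  cat("Evaluating data argument\n")
  data.frame(a = 1:3)
}

# Simplified imitation of the pattern above (not the real plot.formula):
# match.call() keeps `data` as an unevaluated expression, so every eval()
# of that expression calls noisy_data() again.
f <- function(data) {
  m <- match.call()
  is_mat <- is.matrix(eval(m$data, parent.frame()))  # first evaluation
  mf <- eval(m$data, parent.frame())                 # second evaluation
  invisible(mf)
}

f(noisy_data())
n_evals  # 2: the message is printed twice
```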

Generally this is not a big issue, but for your function it incurs a noticeable performance penalty that can easily be avoided by downloading in advance.
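
A sketch of what downloading in advance could look like. To keep the example runnable without the cimis package or a network connection, fetch_hourly() is a made-up stand-in that returns fake data and counts its own calls; with the real package the first step would simply be temps <- cimishourly("006").

```r
# Hypothetical stand-in for cimishourly(): returns fake hourly data
# and counts how many times it is called.
n_downloads <- 0
fetch_hourly <- function(station) {
  n_downloads <<- n_downloads + 1
  cat("Downloading: station", station, "\n")
  data.frame(
    datetime = as.POSIXct("2010-01-01 00:00", tz = "UTC") + 3600 * (0:23),
    air_temp = 10 + 5 * sin((0:23) * pi / 12)
  )
}

# Download once and keep the result in a variable ...
temps <- fetch_hourly("006")

# ... then pass the stored data frame to plot(). However often plot()
# evaluates its data argument, it only looks up `temps` again.
pdf(NULL)  # null graphics device, so the sketch also runs non-interactively
plot(air_temp ~ datetime, data = temps, type = "l")
dev.off()
```

The same temps object can then be passed to xyplot() or any other function without triggering another download.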

Best,
Uwe Ligges



Even more interesting, when adding a type = "l" argument to plot, the data is downloaded 3 times.

Thank you for your time,

Greg


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
