I guess this has been discussed before, but I don't know the name of this problem, so I have to ask again.

Consider this scenario:

fun <- function(x) { print(x)}
for (i in Vectorize(fun, "x")(1:3)) print("OK")
[1] 1
[1] 2
[1] 3
[1] "OK"
[1] "OK"
[1] "OK"

The behaviour I would prefer is:

fun <- function(x) { print(x)}
for (i in Vectorize(fun, "x")(1:3)) print("OK")
[1] 1
[1] "OK"
[1] 2
[1] "OK"
[1] 3
[1] "OK"

That is, each call of the vectorized function should yield its result to the 'for' statement immediately, rather than having all results collected beforehand.
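To make the two orderings concrete, here is a minimal sketch that records events instead of printing them. The names `events` and `record` are hypothetical helpers introduced only for this illustration. The second loop gets the interleaved order in current R simply by calling the function inside the loop body, which is not the same as the lazy range asked for above.

```r
# Hypothetical helpers: collect events in order so they can be inspected.
events <- character(0)
record <- function(msg) events <<- c(events, msg)

fun <- function(x) { record(paste("generate", x)); x }

# Eager: the range expression is fully evaluated before the first iteration.
for (i in Vectorize(fun, "x")(1:3)) record("OK")
eager <- events

# Interleaved: generate each value inside the loop body instead.
events <- character(0)
for (x in 1:3) {
  i <- fun(x)
  record("OK")
}
interleaved <- events
```

Here `eager` holds the three "generate" events followed by three "OK" events, while `interleaved` alternates between them.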

The intention of such a pattern is to separate the data generation logic from the data processing logic.

The latter mechanism is, I think, more efficient because it does not cache all the data before processing. The interpreter could also know for certain that caching is unnecessary, since the result of the vectorized function is not assigned to anything but used only as the range of the loop.

The difference may look trivial, but this pseudocode shows a case where it matters:

readSample <- function(x) {
        ....
        sampling_time <- readBin(con, integer(), 1, size=4)
        sample_count <- readBin(con, integer(), 1, size=2)
        samples <- readBin(con, numeric(), sample_count, size=4)  # 4-byte floats; R has no float()
        ....
        matrix # return a big matrix representing a sample
}

for (sample in Vectorize(readSample, "x")(1:10000)) {
        # process sample
}

The data file is a few gigabytes, so caching all of it is costly. Not having to cache it would make a real difference.
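One possible way to keep generation and processing separate without holding everything in memory is a closure that reads one sample per call. The following is only a sketch under assumptions: `makeSampleReader` and `nextSample` are hypothetical names, the record layout (4-byte time, 2-byte count, 4-byte floats) is taken from the pseudocode above, a small temporary file stands in for the real data, and each sample is returned as a list rather than the big matrix.

```r
# Hypothetical generator: returns a function that reads one sample per call
# and returns NULL once the connection is exhausted.
makeSampleReader <- function(con) {
  function() {
    sampling_time <- readBin(con, integer(), 1, size = 4)
    if (length(sampling_time) == 0) return(NULL)  # end of file reached
    sample_count <- readBin(con, integer(), 1, size = 2)
    samples <- readBin(con, numeric(), sample_count, size = 4)
    list(time = sampling_time, samples = samples)
  }
}

# Write a tiny demo file in the assumed record layout (3 records, 2 values each).
path <- tempfile()
out <- file(path, "wb")
for (t in 1:3) {
  writeBin(as.integer(t), out, size = 4)
  writeBin(2L, out, size = 2)
  writeBin(c(t + 0.5, t + 1.5), out, size = 4)
}
close(out)

# Process the samples one at a time; only the current sample is held in memory.
con <- file(path, "rb")
nextSample <- makeSampleReader(con)
sums <- numeric(0)
while (!is.null(s <- nextSample())) {
  sums <- c(sums, sum(s$samples))
}
close(con)
unlink(path)
```

The processing loop only ever sees `nextSample()`, so the reading logic can change without touching it.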

This email asks three things: 1. whether this is a valid need for the language; 2. whether there is an alternative design pattern to work around it; 3. where the proper place to discuss this is.

Thanks and best...

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.