Hello,

I am trying to process a list of CSV files in parallel, some of which may be
empty and therefore fail in read.csv. I usually use clusterMap as my go-to
parallel function, but I have run into some surprising behavior: inside
clusterMap, try(read.csv(x)) does not catch the error raised when reading an
empty CSV file. I have not tested this with other functions (e.g. read.table,
mean, etc.). parLapply, on the other hand, does appear to catch the error
correctly. Any suggestions on how I should write the clusterMap call so that
try is guaranteed to catch the error?


I am working on Windows Server 2012.
I have the latest versions of R and the parallel package.
I am executing the code from within the RStudio IDE, version 0.99.896.

Here is a demonstration of the failure

R code used in demonstration:
library(parallel)

#prepare csv files - an empty file and a file with data
close(file("c:/temp/badcsv.csv",open="w"))
write.table(data.frame(x=2),"c:/temp/goodcsv.csv")

#prepare a parallel cluster
clus0=makeCluster(1, rscript_args = "--no-site-file")

#read good / bad files in parallel with parLapply - which succeeds: try does catch the error
x1=parLapply(clus0,c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),function(...)try(read.csv(...)))
print(x1)

#read good / bad files in parallel with clusterMap - which fails: try does not catch the error
x0=clusterMap(clus0,function(...)try(read.csv(...)),c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),SIMPLIFY=F)
print(x0)

R output:

> #prepare csv files - an empty file and a file with data
> close(file("c:/temp/badcsv.csv",open="w"))
> write.table(data.frame(x=2),"c:/temp/goodcsv.csv")
>
> #prepare a parallel cluster
> clus0=makeCluster(1, rscript_args = "--no-site-file")
>
> #read good / bad files in parallel with parLapply - which succeeds: try does catch the error
> x1=parLapply(clus0,c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),function(...)try(read.csv(...)))
> print(x1)
[[1]]
[1] "Error in read.table(file = file, header = header, sep = sep, quote = quote,  : \n  no lines available in input\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in read.table(file = file, header = header, sep = sep, quote = quote,     dec = dec, fill = fill, comment.char = comment.char, ...): no lines available in input>

[[2]]
    x
1 1 2

>
> #read good / bad files in parallel with clusterMap - which fails: try does not catch the error
> x0=clusterMap(clus0,function(...)try(read.csv(...)),c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),SIMPLIFY=F)
Error in checkForRemoteErrors(val) :
  one node produced an error: Error in read.table(file = file, header = header, sep = sep, quote = quote,  :
  no lines available in input
> print(x0)
Error in print(x0) : object 'x0' not found
>
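
A possible workaround, offered only as a minimal sketch. It rests on an
assumption that the post does not confirm: that clusterMap passes each returned
value to checkForRemoteErrors, which treats any object of class "try-error" as
a node failure, so the value that try() returns is itself reported as an error.
If that is the cause, returning the condition object via tryCatch instead
should avoid the problem. The helper name safe_read_csv is hypothetical, and
temporary files stand in for the c:/temp paths used in the demonstration.

```r
library(parallel)

# Hypothetical helper (not from the original post): read one CSV file and,
# on failure, return the error condition instead of a "try-error" object.
safe_read_csv <- function(path) {
  tryCatch(read.csv(path), error = function(e) e)
}

# Temporary files stand in for the c:/temp paths used in the demonstration.
bad  <- tempfile(fileext = ".csv")
good <- tempfile(fileext = ".csv")
file.create(bad)                                        # empty file
write.csv(data.frame(x = 2), good, row.names = FALSE)   # file with data

clus <- makeCluster(1)
res <- clusterMap(clus, safe_read_csv, c(bad, good), SIMPLIFY = FALSE)
stopCluster(clus)

# Failed reads come back as "error" condition objects,
# successful reads as data frames.
failed <- vapply(res, inherits, logical(1), what = "error")
print(failed)
```

The failed reads can then be dropped with res[!failed], or inspected with
conditionMessage() to see why each file could not be read.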


Thanks for any help,
Jacob



