Hello,

I am attempting to process a list of csv files in parallel. Some of the files may be empty, and read.csv fails on those. clusterMap is my go-to parallel function, but I have run into an interesting behavior: inside clusterMap, try(read.csv(x)) does not appear to catch the read error that an empty csv file produces. I have not tested this with other functions (e.g. read.table, mean, etc.). parLapply, on the other hand, does appear to catch the error correctly. Any suggestions on how I should code with clusterMap so that try is guaranteed to catch the error?

My setup:
- Windows Server 2012
- the latest version of R and parallel
- code executed from within the RStudio IDE, version 0.99.896

Here is a demonstration of the failure.

R code used in demonstration:

#load the parallel package (provides makeCluster, parLapply, clusterMap)
library(parallel)

#prepare csv files - an empty file and a file with data
close(file("c:/temp/badcsv.csv",open="w"))
write.table(data.frame(x=2),"c:/temp/goodcsv.csv")

#prepare a parallel cluster
clus0=makeCluster(1, rscript_args = "--no-site-file")

#read good / bad files in parallel with parLapply - which succeeds: try Does catch err
x1=parLapply(clus0,c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),function(...)try(read.csv(...)))
print(x1)

#read good / bad files in parallel with clusterMap - which fails: try does Not catch error
x0=clusterMap(clus0,function(...)try(read.csv(...)),c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),SIMPLIFY=F)
print(x0)

R output:

> #prepare csv files - an empty file and a file with data
> close(file("c:/temp/badcsv.csv",open="w"))
> write.table(data.frame(x=2),"c:/temp/goodcsv.csv")
>
> #prepare a parallel cluster
> clus0=makeCluster(1, rscript_args = "--no-site-file")
>
> #read good / bad files in parallel with parLapply - which succeeds: try Does catch err
> x1=parLapply(clus0,c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),function(...)try(read.csv(...)))
> print(x1)
[[1]]
[1] "Error in read.table(file = file, header = header, sep = sep, quote = quote, : \n no lines available in input\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in read.table(file = file, header = header, sep = sep, quote = quote, dec = dec, fill = fill, comment.char = comment.char, ...): no lines available in input>

[[2]]
    x
1 1 2

>
> #read good / bad files in parallel with clusterMap - which fails: try does Not catch error
> x0=clusterMap(clus0,function(...)try(read.csv(...)),c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),SIMPLIFY=F)
Error in checkForRemoteErrors(val) :
  one node produced an error: Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  no lines available in input
> print(x0)
Error in print(x0) : object 'x0' not found
>

Thanks for any help,
Jacob
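
A possible workaround, offered as a sketch rather than a confirmed fix. The error text above carries the "Error in read.table(...)" prefix that try() itself builds, which suggests that try did catch the error on the worker and that the stop came afterwards on the master, from checkForRemoteErrors, because the returned value has class "try-error". That reading is an assumption I have not checked against the parallel source for this R version. If it holds, returning something that is not a "try-error" object should avoid the stop. The helper name safe_read_csv and the temporary files are hypothetical, standing in for the c:/temp files used in the demonstration.

```r
library(parallel)

# Hypothetical helper (not from the original post): returns the data frame,
# or the caught condition object when read.csv signals an error.
# A condition object has class "simpleError"/"error"/"condition",
# not "try-error", which is the point of the workaround.
safe_read_csv <- function(path) {
  tryCatch(read.csv(path), error = function(e) e)
}

# Temporary files stand in for c:/temp/badcsv.csv and c:/temp/goodcsv.csv.
bad_csv <- tempfile(fileext = ".csv")
good_csv <- tempfile(fileext = ".csv")
file.create(bad_csv)                                   # empty file
write.csv(data.frame(x = 2), good_csv, row.names = FALSE)

clus0 <- makeCluster(1)
x0 <- clusterMap(clus0, safe_read_csv, c(bad_csv, good_csv), SIMPLIFY = FALSE)
stopCluster(clus0)

# Separate the failed reads from the successful ones on the master.
failed <- vapply(x0, inherits, logical(1), what = "error")
print(failed)
print(x0[!failed])
```

The failed entries keep the original message, available through conditionMessage(), so nothing is lost compared with the "try-error" string; returning NULL from the handler would also work if the message is not needed.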