Hi,
In the same program, I save a DataFrame "df" to one file and a subset of
this DataFrame to another file. The problem is that saving the subset is
much slower than saving the entire DataFrame: about 220 times slower.
That is far too slow, and I don't know what my mistake is.
Thank you for your advice!
In Julia 0.4.5:
Saving the entire DataFrame
Saving... results/Stat.csv
1.115944 seconds (13.78 M allocations: 319.534 MB, 2.59% gc time)
Saving the subset of the DataFrame
Saving... significant/Stat.csv
246.099835 seconds (41.79 M allocations: 376.189 GB, 4.77% gc time)
elapsed time: 251.581459853 seconds
In Julia 0.5:
Saving the entire DataFrame
Saving... results/Stat.csv
1.060365 seconds (7.08 M allocations: 116.025 MB, 0.73% gc time)
Saving the subset of the DataFrame
Saving... significant/Stat.csv
226.813587 seconds (37.40 M allocations: 376.268 GB, 2.42% gc time)
elapsed time: 232.95933586 seconds
################################################
# my function to save the results to a file
function write_results(x, name, dir, sep, h)
    outfile = "$dir/$name"
    println("Saving...\t", outfile)
    writetable(outfile, x, separator = sep, header = h)
end
# save my DataFrame df : very fast
@time write_results(df, name, "results", sep, h)
# subset DataFrame s
s = sub(df, (df[:rank_PV] .<= r_max))
# save my subset DataFrame s: incredibly slow!
@time write_results(s, name, "significant", sep, h)