[julia-users] Writing a subset DataFrame to file is 220 times slower than saving the whole DataFrame

Fred Thu, 27 Oct 2016 05:07:49 -0700

Hi,

In the same program, I save a DataFrame "df" to one file and a subset of 
this DataFrame to another file. The problem is that saving the subset is 
much slower than saving the entire DataFrame: about 220 times slower. 
It is too slow, and I don't know what my mistake is.


Thank you for your advice!

In Julia 0.4.5:

Saving the entire DataFrame
Saving... results/Stat.csv
1.115944 seconds (13.78 M allocations: 319.534 MB, 2.59% gc time)


Saving the subset of the DataFrame 
Saving... significant/Stat.csv
246.099835 seconds (41.79 M allocations: 376.189 GB, 4.77% gc time)
elapsed time: 251.581459853 seconds


In Julia 0.5:

Saving the entire DataFrame
Saving... results/Stat.csv
1.060365 seconds (7.08 M allocations: 116.025 MB, 0.73% gc time)

Saving the subset of the DataFrame 
Saving... significant/Stat.csv
226.813587 seconds (37.40 M allocations: 376.268 GB, 2.42% gc time)
elapsed time: 232.95933586 seconds

################################################
# my function to save the results to a file

function write_results(x, name, dir, sep, h)
  outfile = "$dir/$name"
  println("Saving...\t", outfile)
  writetable(outfile, x, separator = sep, header = h)
end


# save my DataFrame df : very fast
@time write_results(df, name, "results", sep, h)


# subset DataFrame s
s = sub(df, (df[:rank_PV] .<= r_max))

# save my subset DataFrame s: incredibly slow!

@time write_results(s, name, "significant", sep, h)
