ASSI writes:
>> I have a Cygwin malloc speedup patch that *might* help the m-t part.
>> I'll prepare and submit that to cygwin-patches shortly.
>
> Well, if you want to test it with the new ZStandard, give it a spin…
> I'll check how far I can strip that test down so you can use the Cygwin
> source tree for testing.
OK, it's actually pretty simple. Do this inside a checkout of newlib-cygwin:

$ find newlib winsup texinfo -type f > flist
$ zstd --train-cover --ultra -22 -T0 -vv --filelist=flist -o dict-cover

On Linux it reads in all the files in about two seconds, while that takes
quite a bit longer on Cygwin.  But the real bummer is that constructing the
partial suffix arrays (which is single-threaded) seemingly takes forever on
Cygwin, while it finishes much faster on Linux.  You can pare down the
number of files like this:

$ shuf -n 320 flist > slist

and then pass that shorter list (--filelist=slist) instead.  I get this for
various values of n (all times in seconds):

*** Linux E3-1225v3 4C/4T 3.2/3.6GHz
|------+----------+-------+----------+---------+--------+--------+------------|
|    n |     user |   sys |    total |    wall |   util | serial | pagefaults |
|------+----------+-------+----------+---------+--------+--------+------------|
|  100 |  116.092 | 0.187 |  116.279 |   30.82 | 377.3% |   2.0% |          0 |
|  200 |  145.481 | 0.135 |  145.616 |   38.65 | 376.8% |   2.1% |          0 |
|  400 |  288.341 | 0.414 |  288.755 |   77.84 | 371.0% |   2.6% |          0 |
|  800 |  517.288 | 0.623 |  517.911 |  138.93 | 372.8% |   2.4% |          0 |
| 1600 | 1229.348 | 1.752 | 1231.100 |  333.37 | 369.3% |   2.8% |          0 |
| 3200 | 2508.250 | 3.632 | 2511.882 |  678.96 | 370.0% |   2.7% |          0 |
| 6400 | 4380.693 | 5.352 | 4386.045 | 1176.43 | 372.8% |   2.4% |          0 |
|------+----------+-------+----------+---------+--------+--------+------------|

*** Cygwin E3-1276v3 4C/8T 3.6/4.0GHz
|------+----------+--------+----------+---------+--------+--------+------------|
|    n |     user |    sys |    total |    wall |   util | serial | pagefaults |
|------+----------+--------+----------+---------+--------+--------+------------|
|  100 |  141.906 |  0.796 |  142.702 |   20.53 | 695.1% |   2.2% |     327860 |
|  200 |  198.140 |  1.328 |  199.468 |   29.39 | 678.7% |   2.6% |     452870 |
|  400 |  425.749 |  2.328 |  428.077 |   66.03 | 648.3% |   3.3% |     752357 |
|  800 |  822.250 |  3.499 |  825.749 |  150.42 | 549.0% |   6.5% |    1277198 |
| 1600 | 1773.578 |  8.483 | 1782.061 |  383.42 | 464.8% |  10.3% |    3011298 |
| 3200 | 4322.281 | 15.890 | 4338.171 | 1292.92 | 335.5% |  19.8% |    5746903 |
| 6400 | 8499.750 | 29.437 | 8529.187 | 3275.66 | 260.4% |  29.6% |   10543919 |
|------+----------+--------+----------+---------+--------+--------+------------|

So even with a smaller number of files (where the serial portion of the code
doesn't dominate yet), you can see that the faster (Cygwin) machine already
expends more CPU time.  Looking at the differences, there is a strong
indication that those pagefaults account for the main portion of that extra
time.  The last column is the time per pagefault in µs, assuming that all of
the extra time was spent there.  That assumption obviously isn't quite
correct, since the number would then stay roughly constant, but it's close
enough to uphold the original hypothesis.

*** Linux vs. Cygwin
|------+-------------+--------------+--------------+--------------+------------+------|
|    n | Linux total | Linux scaled | Cygwin total | Cygwin-Linux | pagefaults | t/pf |
|------+-------------+--------------+--------------+--------------+------------+------|
|  100 |     116.279 |      104.651 |      142.702 |       38.051 |     327860 |  116 |
|  200 |     145.616 |      131.054 |      199.468 |       68.414 |     452870 |  151 |
|  400 |     288.755 |      259.880 |      428.077 |      168.197 |     752357 |  224 |
|  800 |     517.911 |      466.120 |      825.749 |      359.629 |    1277198 |  282 |
| 1600 |    1231.100 |     1107.990 |     1782.061 |      674.071 |    3011298 |  224 |
| 3200 |    2511.882 |     2260.694 |     4338.171 |     2077.477 |    5746903 |  361 |
| 6400 |    4386.045 |     3947.441 |     8529.187 |     4581.746 |   10543919 |  435 |
|------+-------------+--------------+--------------+--------------+------------+------|

Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf rackAttack V1.04R1:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada
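The util and serial columns in the Linux and Cygwin tables aren't defined in the text.  Working backwards from the numbers, util matches total CPU time divided by wall-clock time, and serial matches the serial fraction obtained by inverting Amdahl's law with P = 4 hardware threads on the Linux box and P = 8 on the Cygwin box.  Treat that reading as an assumption; the awk sketch below (the function names utilization and serial_fraction are hypothetical) merely reproduces a few table entries under it:

```shell
#!/bin/sh
# Hypothetical helpers.  Assumes util = total CPU time / wall time, and
# that "serial" inverts Amdahl's law for P hardware threads.

# utilization TOTAL WALL -> CPU utilization in percent
utilization() {
  awk -v t="$1" -v w="$2" 'BEGIN { printf "%.1f%%\n", 100 * t / w }'
}

# serial_fraction UTIL_PERCENT P -> estimated serial fraction in percent
# Amdahl: S = 1 / (f + (1 - f) / P)  =>  f = (1/S - 1/P) / (1 - 1/P)
serial_fraction() {
  awk -v u="$1" -v p="$2" 'BEGIN {
    s = u / 100                        # achieved speedup
    f = (1 / s - 1 / p) / (1 - 1 / p)  # solve Amdahl for the serial part
    printf "%.1f%%\n", 100 * f
  }'
}

utilization 116.279 30.82       # Linux, n = 100: prints 377.3%
serial_fraction 377.3 4         # prints 2.0%
utilization 8529.187 3275.66    # Cygwin, n = 6400: prints 260.4%
serial_fraction 260.4 8         # prints 29.6%
```

Under that reading, the Cygwin serial fraction climbing from 2.2% at n = 100 to 29.6% at n = 6400, against a roughly flat 2.0 to 2.8% on Linux, fits the observation that the single-threaded suffix array construction is what falls behind on Cygwin.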
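In the Linux vs. Cygwin table, "Linux scaled" equals the Linux total times 0.9, which happens to be the ratio of the two machines' maximum turbo clocks (3.6 GHz / 4.0 GHz); that this is the intended clock correction is an assumption.  "Cygwin-Linux" is then the Cygwin total minus the scaled Linux time, and t/pf divides that excess by the pagefault count.  A minimal awk sketch of that arithmetic (the function name pf_cost is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper.  Assumes "Linux scaled" = Linux total * 3.6/4.0
# and that all remaining Cygwin excess time is due to pagefaults.

# pf_cost LINUX_TOTAL CYGWIN_TOTAL PAGEFAULTS
# prints: scaled Linux time (s), Cygwin excess (s), cost per fault (µs)
pf_cost() {
  awk -v lt="$1" -v ct="$2" -v pf="$3" 'BEGIN {
    scaled = lt * 3.6 / 4.0    # Linux time adjusted to the faster clock
    extra  = ct - scaled       # seconds Cygwin spends beyond that
    printf "%.3f %.3f %.0f\n", scaled, extra, 1e6 * extra / pf
  }'
}

pf_cost 116.279 142.702 327860     # n = 100: prints 104.651 38.051 116
pf_cost 517.911 825.749 1277198    # n = 800: prints 466.120 359.629 282
```

If the per-fault cost were truly constant, the last field would stay flat across n; it rising from about 116 µs to about 435 µs is why the attribution to pagefaults is only approximate.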