Some statistics: After building each of the 17 packages 100 times, using machines with 1 CPU and then with 2 CPUs, I got:
42 failures on machines with 1 CPU and 95 failures on machines with 2 CPUs. With 1700 builds per configuration, the failure rates are 2.5% vs 5.6%. The rate more than doubles with the second CPU, which looks like too large a gap to be chance, but we don't yet know what causes it. Thanks.
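
For anyone who wants to check whether a gap like this could be noise, here is a rough sketch of a two-proportion z-test on the counts above. The function name is made up for illustration, and the test assumes every build is an independent trial, which may not hold if the failures cluster in a few packages.

```python
import math


def two_proportion_z_test(fail_a, n_a, fail_b, n_b):
    """Return (z, two-sided p-value) for H0: both failure rates are equal.

    Hypothetical helper; assumes each build is an independent trial.
    """
    p_a = fail_a / n_a
    p_b = fail_b / n_b
    # Pooled failure rate under the null hypothesis of equal rates.
    pooled = (fail_a + fail_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


builds = 17 * 100  # 17 packages, 100 builds each, per configuration
z, p_value = two_proportion_z_test(42, builds, 95, builds)
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```

With these counts z comes out at roughly 4.6, well past the usual 1.96 cutoff for a 5% significance level, so under the independence assumption the difference is unlikely to be random variation. It says nothing about why the 2-CPU builds fail more often.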