Control: retitle -1 libopenblas0-pthread: gives wrong results with AVX-512 kernel
Control: fixed -1 0.3.21+ds-4

Dear Enzo,

On Monday 5 December 2022 at 10:37 -0300, Enzo Alberto Dari wrote:
> Package: libopenblas0-pthread
> Version: 0.3.13+ds-3
> Severity: important
> Tags: upstream
> X-Debbugs-Cc: da...@ib.edu.ar
>
> While upgrading my debian OS from 10.x to 11.x (octave 4.4.5 to 6.2.0),
> one of my scripts started failing. I managed to create the following test
> that reproduces the problem:
> --------
> % non-singular matrix
> b=[7110.327, -2592.219, 631.419, -288.541, 169.250, -113.431, 82.646, -63.812, 51.448, -42.914;
> -1218.551, 1508.124, -720.486, 169.250, -74.433, 42.572, -28.131, 20.364, -15.701, 12.683;
> 169.250, -482.641, 674.499, -350.244, 82.646, -36.010, 20.364, -13.333, 9.592, -7.371;
> -49.544, 82.646, -268.958, 399.001, -215.550, 51.448, -22.463, 12.683, -8.285, 5.950;
> 20.364, -27.810, 51.448, -178.696, 275.664, -152.325, 36.804, -16.173, 9.164, -6.000;
> -10.260, 12.683, -18.791, 36.804, -132.946, 211.205, -118.438, 28.958, -12.831, 7.317;
> 5.950, -6.944, 9.164, -14.257, 28.958, -107.451, 174.780, -99.104, 24.511, -10.963;
> -3.839, 4.320, -5.318, 7.317, -11.762, 24.511, -92.803, 153.928, -88.144, 22.050;
> 2.702, -2.966, 3.488, -4.451, 6.304, -10.377, 22.050, -84.854, 143.052, -82.752;
> -2.050, 2.211, -2.519, 3.056, -4.000, 5.789, -9.704, 20.944, -81.727, 139.664];
> sizeb=n=size(b,1)
> rankb=rank(b)
> % Builds blocked matrix
> B=[eye(n) zeros(n); zeros(n) b];
> % Computes inverse
> inv1=inv(B);
> % Computes inverse by blocks: non-trivial block:
> invb=inv(b);
> % Build inverse by blocks
> inv2=[eye(n) zeros(n); zeros(n) invb];
> % Both inverse matrices should be equal
> diffinvs=norm(inv1-inv2)
> % All these condition numbers should be 1
> cond(inv1*B)
> cond(B*inv1)
> cond(inv2*B)
> cond(B*inv2)
> --------
> The computation of "inv1" gives wrong results in:
> -Intel Core i9-9900X
> -Intel Core i9-7900X
> -Intel Core i5-1035G1
> and correct results in:
> -Intel Core i5-750
> -Intel Core i7-4930K
> What pointed me in the direction of a threads problem was the fact that
> setting OMP_NUM_THREADS to 1 modifies the output of the computation (giving
> correct results in some cases).
> The last tests I performed were running octave preloading the pthreads
> and the openmp openblas libraries:
> LD_PRELOAD=/usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblas.so.0 octave-cli test.m
> gives incorrect results, while
> LD_PRELOAD=/usr/lib/x86_64-linux-gnu/openblas-openmp/libopenblas.so.0 octave-cli test.m
> works ok.
>
> (by "works ok" I mean the inverses computed by both methods differ only at
> the level of floating point precision, ~2e-17, and all the condition numbers
> are 1. In the case of "failure", the inverses differ by ~0.035 and the
> condition numbers of inv1*B and B*inv1 are about 4.7-4.9)

I can reproduce the problem. It appears when OpenBLAS uses its AVX-512
kernel (internally called “SKYLAKEX”; you can see which kernel is in use
by running “version -blas” from Octave). The three processors on which
you experience the problem all have AVX-512 support, while the two on
which the results are correct do not.

Also note that the problem is fixed in the version currently in Debian
testing/unstable.

The challenge now is to find the upstream commit that fixed the bug
(somewhere between versions 0.3.13 and 0.3.21), and then possibly to
backport it to Debian stable.

-- 
⢀⣴⠾⠻⢶⣦⠀  Sébastien Villemot
⣾⠁⢠⠒⠀⣿⡁  Debian Developer
⢿⡄⠘⠷⠚⠋⠀  https://sebastien.villemot.name
⠈⠳⣄⠀⠀⠀⠀  https://www.debian.org
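
The AVX-512 diagnosis in the reply above can be checked on a given machine with a short shell sketch. It is illustrative only: `has_avx512` and `run_with_core` are hypothetical helper names, and it assumes a Linux system and an OpenBLAS build with runtime kernel selection (DYNAMIC_ARCH), which reads the `OPENBLAS_CORETYPE` environment variable to override the detected kernel. Whether forcing another kernel actually avoids the wrong results has not been verified here.

```shell
#!/bin/sh
# Succeed if a cpuinfo-style file lists the AVX-512 Foundation flag.
# The file defaults to /proc/cpuinfo (Linux only); a missing file counts as "no".
has_avx512() {
    grep -qw avx512f "${1:-/proc/cpuinfo}" 2>/dev/null
}

# Hypothetical helper: run octave-cli with a forced OpenBLAS kernel.
# Assumption: the installed libopenblas honors OPENBLAS_CORETYPE.
run_with_core() {
    core=$1
    shift
    OPENBLAS_CORETYPE="$core" octave-cli "$@"
}

if has_avx512; then
    echo "avx512f present: the SKYLAKEX kernel is a candidate on this CPU"
else
    echo "avx512f absent: the SKYLAKEX kernel should not be selected"
fi
```

On an affected machine, `run_with_core Haswell test.m` can then be compared with a plain `octave-cli test.m`. If the diagnosis holds, the discrepancy between the two inverses should be expected to disappear with the non-AVX-512 kernel.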