On Wed, Dec 10, 2014 at 01:16:02AM +1100, Ray Callaghan wrote: > Perhaps if the SSE self check fails, rather than random, just revert to the > NO-SSE implementation (after self-testing the no-sse version) - so it still > returns valid data, exactly like it should.
... and what if it fails too? I think what you describe makes sense not so much to deal with "the Pentium 3 issue", but simply to support SSE2 in a binary that also works on pre-SSE2 CPUs. To do this, a source file (a new wrapper file or the -nosse source file) would need to be built without gcc's -msse2 (or the current compiler's equivalent). It would need to check CPUID (unfortunately, no "portable" intrinsic for it exists, so some #ifdef's and/or inline asm there). If SSE2 is detected, it would invoke the function from the -sse source file. Otherwise it'd proceed with the one from -nosse. > Sure, it doubles the program size, but on anything which might support SSE, > nobody cares about a few kb of code. However, we care about the associated complexity and risk of bugs. Having a piece of almost-dead code in there, which is omitted from most tests (thereby reducing their relative code coverage), is not great. Having a runtime self-test even after the -nosse code was invoked helps (and I think we should), but it means we do need to decide on how to handle possible failure. > And presumably, compilers should detect SSE issues, but if the > compiler is broken, we can only do it at runtime, with a performance > penalty; but we should probably still consider that a compiler issue. I disagree. A compiler is also not able to "detect SSE issues" other than at runtime. > I think there should definitely be a way for a caller to ask scrypt to do a > self-check; regardless of sse, or any other issues; but we can apply a > heuristic as to whether we automatically force the self-check, or whether > it's only for the caller to explicitly ask for. Possibly, the compiler has > a "yes" (like detecting 64 bit), so uses SSE without a forced self-check; a > "no", uses the NO-SSE; and a "maybe", like the SSE flags are on, but it's a > 32-bit - in which case, compile with SSE, but force a self-check on every > usage. (or compile with both, and have a runtime downgrade to no-sse) > > Thoughts? I think this would introduce too much complexity. I think if we do introduce a runtime self-test, it's better to invoke it all the time (and it would need to be lightweight enough for that). The way scrypt's SSE2 code runs on Pentium 3 era CPUs is just a special case (albeit a rather unique one: both the compiler and the CPU work as intended, yet the result is wrong). There may be other issues, both with -sse and -nosse. I've already mentioned C strict aliasing issues (OK, need to fix those in the source code) and possible compiler bugs. If we do introduce a self-test, it'd be a pity to limit it to just this one special case. Alexander
