You wrote:

> On 2023-05-26 23:40, Stefan Kanthak wrote:
>> Feel free to propose this alternative here (better elsewhere, where you'll
>> earn less laughter).
>> But don't forget that this 23-bit mantissa will be all zeroes for quite some
>> 64-bit (and even 32-bit) integers which are no power of 2, for example
>> 0x8000003fffffffff, and that both FILD and CVTSI2SS only work on SIGNED
>> integers.
>
> The precision loss can be detected by examining the PF bit (6th bit i.e.
> `0x20`) of the x87 status register.

How many instructions and conditional branches do you need then?
Is the COMPLETE code using x87 instructions shorter/faster than the pure
i386 code?

JFTR: I intentionally show the DELTA to the generated code; I don't create
      completely different code.

> It doesn't matter whether the number is interpreted as signed or unsigned:
> `-0x80000000'00000000` still only has one bit in its mantissa. Another option
> is to store the number in the 80-bit extended precision format, with a 64-bit
> mantissa which includes the otherwise hidden bit (so if the number is a power
> of two, the mantissa will be `0x80000000'00000000`).

Correct, but you proposed to use the 23-bit mantissa!

> But anyway, traditional x86 has very few GPRs and GCC doesn't optimize
> multi-word arithmetic very well. Performance may or may not vary depending
> on cache locality and number of μops; not to mention `movq` and `movd` which
> have relatively high latencies. I would like to see some benchmarking
> results first.

Ask the GCC developers why they generate SSE2 instructions in the first place
here, and why they ignore their shortcomings, instead of sticking with i386
code.

JFTR: adding "(argument != 0) &&" stops their nonsense!

Stefan
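
PS: the counterexample quoted at the top can be checked with a minimal C
sketch. The helper names are made up for illustration and are not taken from
the code under discussion. The sketch assumes that `float` is IEEE 754
binary32, that the default round-to-nearest mode is active, and that the
compiler (GCC, for instance) wraps when an out-of-range `uint64_t` is converted
to `int64_t`. It only shows that the 23-bit fraction is all zeroes for an
integer which is no power of 2; it does not touch the x87 status word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Low 23 bits of a binary32 value: the stored fraction (mantissa) field. */
static uint32_t float_fraction_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);     /* well-defined type pun */
    return bits & 0x007FFFFFu;
}

/* Hypothetical float-based test, argument converted as UNSIGNED integer. */
static int fraction_is_zero_unsigned(uint64_t x)
{
    return float_fraction_bits((float)x) == 0;
}

/* Same test, argument reinterpreted as SIGNED integer (assumed to wrap). */
static int fraction_is_zero_signed(uint64_t x)
{
    return float_fraction_bits((float)(int64_t)x) == 0;
}

/* Exact integer reference: exactly one bit set. */
static int is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* The quoted counterexample: fraction is zero, yet no power of 2. */
static void check_counterexample(void)
{
    const uint64_t x = 0x8000003fffffffffULL;

    assert(fraction_is_zero_unsigned(x));   /* rounds to  2**63 */
    assert(fraction_is_zero_signed(x));     /* rounds to -2**63 */
    assert(!is_power_of_two(x));
}
```

Under the stated assumptions, calling `check_counterexample()` from `main`
should pass silently: both conversions round to a value whose magnitude is
2**63, so the fraction field alone cannot tell this argument from a real power
of 2.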