On 03/05/12 11:27, Dinar Temirbulatov wrote: > Hi, > Here is updated version of patch. I added comments describing the algorithm. > >> Hi Dinar. I'm afraid it gives the wrong results for some dividends >> * 82625484914982912 / 2023346444509052928: gives 4096, should be zero >> * 18317463604061229328 / 2023346444509052928: gives 4109, should be 9 >> * 12097415865295708879 / 4545815675034402816: gives 130, should be 2 >> * 18195490362097456014 / 6999635335417036800: gives 10, should be 2 > Oh, I have used signed multiplication instead of unsigned and that was > the reason of errors above, fixed that typo. > Tested on arm-7l with no new regressions. > Ok for trunk? >
This clearly needs more work. Comments: Need to end with a period and two spaces. Binary Operators: Need to be surrounded with white space. As Andrew Haley has already said, some documentation of the algorithm is needed. Why is this restricted to BITS_PER_WORD == 32? Costs: This code clearly needs to understand the relative cost of doing division this way; there is a limit to the amount of inline expansion that we should tolerate, particularly at -O2 and it's not clear that this will be much better than a library call if we don't have a widening multiply operation (as is true for older ARM chips, and presumably some other CPUs). In essence, you need to work out the cost of a divide instruction (just as rtx costs for this) and the approximate cost of the expanded algorithm. Another issue that worries me is the number of live DImode values; on machines with few registers this could cause excessive spilling to start happening. I also wonder whether it would be beneficial to generate custom functions for division by specific constants (using this algorithm) and then call those functions rather than the standard lib-call. On ELF systems the functions can marked as hidden and put into common sections so that we only end up with one instance of each function in a program. + /* Checking for owerflow, please not that we couldn't user carry-flag + here before the reload pass . + cres = (tmp < u0) || (tmp < u1); */ Generic code can't assume the presence of a carry flag either before or after reload, so the latter part of the comment is somewhat meaningless. Also spelling error in comment. Finally, do you have a copyright assignment with the FSF? We can't use this code without one. R.