On Mon, Jun 6, 2016 at 2:34 AM, Richard Henderson <[email protected]> wrote:
> On 06/05/2016 02:47 PM, Michael Rolnik wrote: > >> Is there a reason this code isn't going into translate.c? >> You wouldn't need the declarations in translate-inst.h or translate.h. >> >> I see here two levels of logic >> a. instruction translation >> b. general flow of program translation. >> > > FWIW, static functions can be automatically inlined by the compiler, > whereas external function calls can't. In theory, the compiler could > auto-inline the entire translator into a single function. these functions are called through pointer so there is no way for these functions to be inlined. > > > Order these functions properly and you don't need forward declarations. >> >> is it a requirements? this way it look cleaner. >> > > Does it? In my experience it just means you've got to edit two places > when one changes things. It still one place because it's either instruction translation or a program translation. from my point of view translate.c will have almost no bugs in very close feature whereas translate-inst.c will hove bug fixes and modification. having it in two files will isolate translate.c from erroneous modifications. > > > While this is exactly the formula in the manual, it's also equal to >> >> ((Rd ^ Rr) ^ R) & 16 >> >> Please explain. I don't see it. >> >> >> http://www.wolframalpha.com/input/?i=A+and+B+or+A+and+not+C+or+B+and+not+C,+A+xor+B+xor+C >> > > I did explain: truth table shows that these computations are different. > > > where we examine the difference between the non-carry addition (Rd ^ >> Rr) >> and the carry addition (R) to find the carry out of bit 3. This >> reduces >> the operation count to 2 from 5. >> > > It's not a manipulation of the original expression, but a different way of > looking at the problem. > > You want to compute carry-out of bit 3. Given the *result* of the > addition, it's easier to examine bit 4, into which carry-in has happened, > rather than examine bit 3 and re-compute the carry-out. > > The AVR hardware probably computes it exactly as described in the manual, > because that can be done in parallel with the addition, and has a lower > total gate latency. This is fairly common in the industry, where the > documentation follows the implementation more closely than it perhaps > should. you can't look onto 4th bit because 4th bits in the input were not 0s. > > > Note that carry and borrow are related, and thus this is *also* >> computable >> via ((Rd ^ Rr) ^ R) on bit 4. >> >> please explain, I don't see it >> >> http://www.wolframalpha.com/input/?i=not+A+and+B+or+not+A+and+C+or++C+and+B,+A+xor+B+xor+C >> > > As above, given the *result* of the subtraction, examining bit 4 into > which borrow-in has happened. > > Once you accept that, you'll note that the same expression can be used to > re-create both carry-in and borrow-in. > > I'll also note that the piece-wise store is big-endian, so you could >> perform this in 1 store for 2_BYTE and 2 stores for 3_BYTE. >> >> I got an expression that the platform is little endian. >> > > Then you've got the order of the stores wrong. Your code pushes the LSB > before pushing the MSB, which results in the MSB at the lower address, > which means big-endian. this is right. However as far as I understand AVR is neither little nor big endian because there it's 8 bit architecture (see here http://www.avrfreaks.net/forum/endian-issue). for time being I defined the platform to be little endian with ret address exception > > > Wow. Um... Surely it would be better to store X and Y internally as >> whole >> 24-bit quantities, and Z as a 16-bit quantity (to be extended with >> rampz, >> rampd, or eind as needed). >> >> rampX/Y/Z are represented now as 0x00ff0000. >> X/Y/Z can be represented as 16 bits registers, however I do not know if >> and >> when r26-r31 are used as 8 bits, so if X/Y/Z are represented as 16 bits it >> would be hard to use r26-r31 in arithmetics >> > > You would use a setup like the following, and use these functions instead > of other direct accesses to the cpu registers. This setup requires similar > functions in cpu.h for use by e.g. gdbstub.c. > I will have to think about it. > > > TCGv cpu_rb[24]; > TCGv cpu_rw[4]; > > TCGv read_byte(unsigned rb) > { > TCGv byte = tcg_temp_new(); > if (rb < 24) { > tcg_gen_mov_tl(byte, cpu_rb[rb]); > } else { > unsigned rw = (rb - 24) / 2; > if (rb & 1) { > tcg_gen_shri_tl(byte, cpu_rw[rw]); > } else { > tcg_gen_ext8u_tl(byte, cpu_rw[rw]); > } > } > return byte; > } > > void write_byte(unsigned rb, TCGv val) > { > if (rb < 24) { > tcg_gen_mov_tl(cpu_rb[rb], val); > } else { > unsigned rw = (rb - 24) / 2; > tcg_gen_deposit_tl(cpu_rw[rw], cpu_rw[rw], val, (rb & 1) * 8, 8); > } > } > > /* Return RB+1:RB. */ > TCGv read_word(unsigned rb) > { > TCGv word = tcg_temp_new(); > if (rb < 24) { > tcg_gen_deposit_tl(word, cpu_rb[rb], cpu_rb[rb + 1], 8, 8); > } else { > unsigned rw = (rb - 24) / 2; > tcg_gen_mov_tl(word, cpu_rw[rw]); > } > return word; > } > > void write_word(unsigned rb, TCGv val) > { > if (rb < 24) { > tcg_gen_ext8u_tl(cpu_rb[rb], val); > tcg_gen_shri_tl(cpu_rb[rb + 1], val, 8); > } else { > unsigned rw = (rb - 24) / 2; > tcg_gen_mov_tl(cpu_rw[rw], val); > } > } > > +int avr_translate_DEC(CPUAVRState *env, DisasContext *ctx, >> uint32_t >> > ... > >> + tcg_gen_setcondi_tl(TCG_COND_EQ, cpu_Vf, Rd, 0x7f); /* >> cpu_Vf = >> Rd == 0x7f */ >> >> >> This is INC overflow. >> >> please explain, I don't see a problem here >> > > You have swapped the overflow conditions for INC and DEC. > > 127 + 1 -> -128 > -128 - 1 -> 127 this is how it's defined in the document. > > > > r~ > -- Best Regards, Michael Rolnik
