Hi all, follow-up question.
In ARMv8, the LDP instruction
LDP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]
loads a pair of 128-bit values (256 bits in total) from memory into two Q
registers (128-bit vector registers).
When I run gem5 with debug output enabled to see how this LDP instruction
operates (on AtomicSimpleCPU, for now), I see that it is broken down into 3
micro-ops: 2 loads + 1 register writeback (due to the post-index addressing
I'm using in the instruction).
However, I don't get why gem5 issues two memory loads if the 256 bits that
will feed the registers are contiguous in memory. Couldn't memory provide all
256 bits to feed both destination registers at once?
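To make the observed split concrete, here is a minimal Python sketch of the decomposition as I understand it from the debug output. It is only my own model, not gem5 code: the names MicroOp and split_ldp_q_post_index are hypothetical, and the register names and immediate are example values.

```python
from dataclasses import dataclass

Q_BYTES = 16  # one Q register holds 128 bits


@dataclass
class MicroOp:
    kind: str   # "load" or "writeback"
    reg: str    # register written by this micro-op
    value: int  # byte offset from the base (load) or increment (writeback)
    size: int   # bytes read from memory (0 for the writeback)


def split_ldp_q_post_index(qt1, qt2, base, imm):
    """Hypothetical model of the split described above: 2 loads + 1 writeback."""
    return [
        MicroOp("load", qt1, 0, Q_BYTES),        # first 128-bit load
        MicroOp("load", qt2, Q_BYTES, Q_BYTES),  # second load, 16 bytes further
        MicroOp("writeback", base, imm, 0),      # post-index update of the base
    ]


# Example values only: LDP q0, q1, [x0], #32
uops = split_ldp_q_post_index("q0", "q1", "x0", 32)
for uop in uops:
    print(uop)
```

The two load micro-ops together cover the same 32 contiguous bytes that a single 256-bit access would.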
Some possible reasons I thought of:
- The memory port only allows 128-bit loads.
Although this could be the case, reading a full cache line (64 B) would
seem more reasonable.
- We have only one register write port.
We need two load micro-ops because we can write only one destination register
at a time (and we have two destination registers).
But, in this case, why issue a new memory load in the second uop, if the
previous load had already brought in the data (assuming memory returns
64 B / 512 bits)? Why not keep the loaded data within the "macro-op context"
(if such a thing exists)? Is it simply relying on the cache?
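As a rough illustration of the cache-line reasoning above, this small Python sketch checks whether both 128-bit loads fall within the same 64 B line. The 64 B line size is the assumption I made above, and the helper names and example addresses are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumed line size (64 B), as in the discussion above
Q_BYTES = 16           # one 128-bit Q register


def line_index(addr):
    """Index of the cache line that contains the byte at addr."""
    return addr // CACHE_LINE_BYTES


def loads_share_line(base_addr):
    """True if the 32 bytes read by the two loads sit in a single cache line."""
    first_byte = base_addr
    last_byte = base_addr + 2 * Q_BYTES - 1
    return line_index(first_byte) == line_index(last_byte)


# Example addresses only.
print(loads_share_line(0x1000))  # pair starts at the beginning of a line
print(loads_share_line(0x1030))  # pair starts 48 bytes into a line
```

When the pair sits in one line, the second load would at least find its data in the line fetched by the first one; when it straddles two lines, a second access is needed anyway.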
Any clarification on why this operation works the way it does (or macro
memory operations in ARM as a whole) would be very welcome!
Thank you,
Pedro.