Thanks, Kumar!
I ended up writing a slightly different program before reading your comment.
I used the STALL register to count how many clock cycles an instruction
stalls, which means the instruction actually takes 1 + stall cycles. I came
up with these values on my BBB rev. A5A for the instructions that matter for
my application (all tests used 32-bit accesses only):
LBBO/LBCO = 3 clocks for DRAM and Shared RAM
SBBO/SBCO = 2 clocks for DRAM and Shared RAM
LBBO = 43+ clocks for DDR reading (43.3 on average over 10000 tries)

LBBO = 42 (or 43) clocks for ADC FIFO0DATA reading (41.0001 on average over
10000 tries)
The ADC clock made no difference here (I tried 3 MHz and 8 MHz by changing
the ADC_CLKDIV register).
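
For reference, the conversion I am applying to the raw STALL readings can be
sketched as below. This is only a small Python illustration of the
arithmetic, not the PRU code I ran. The function names are made up for the
example, and the 200 MHz clock (5 ns per cycle, as Lenny states further down)
is an assumption.

```python
# Assumption: the PRU core runs at 200 MHz, so one cycle lasts 5 ns.
PRU_CLOCK_HZ = 200_000_000


def cycles_from_stall(stall_count):
    """Total cycles for one instruction: 1 cycle plus the cycles it stalled."""
    return 1 + stall_count


def cycles_to_ns(cycles):
    """Convert a cycle count to nanoseconds at the assumed PRU clock."""
    return cycles * 1e9 / PRU_CLOCK_HZ


def average_cycles(stall_samples):
    """Average total cycles over a list of per-try STALL readings."""
    return sum(cycles_from_stall(s) for s in stall_samples) / len(stall_samples)
```

For example, a stall reading of 2 gives cycles_from_stall(2) == 3, which is
the LBBO/LBCO figure above, and cycles_to_ns(3) gives 15 ns.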

I believe I will have to adapt my programs to always use the difference
between two readings of the CYCLE register... It seems to be the only way to
be deterministic. Up to now I was manually counting instructions and
subtracting them from the number of delay loops.
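
The bookkeeping I have in mind can be sketched like this. It is a rough
Python model of the arithmetic only, not PRU code. The function name is
hypothetical, and the default of 2 cycles per delay-loop iteration (one SUB
plus one QBNE, at 1 cycle each) is an assumption.

```python
def delay_iterations(target_cycles, cycle_before, cycle_after, cycles_per_loop=2):
    """Return (iterations, leftover_cycles) needed to pad up to target_cycles.

    cycle_before and cycle_after are two readings of the CYCLE register taken
    around the work whose duration varies (for example a DDR or ADC read).
    """
    elapsed = cycle_after - cycle_before
    remaining = target_cycles - elapsed
    if remaining < 0:
        raise ValueError("work already took longer than the target period")
    # Whole delay-loop iterations, plus cycles that do not fill a full loop.
    return divmod(remaining, cycles_per_loop)
```

For example, with a target of 100 cycles and CYCLE readings of 10 and 53, the
work took 43 cycles, so 57 remain: 28 loop iterations plus 1 leftover cycle.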

On Tuesday, May 26, 2015 at 13:34:17 UTC-3, Kumar Abhishek wrote:
>
> You could also use the code snippet in this article to calculate clock 
> cycles for individual instructions:
>
>
> http://theembeddedkitchen.net/beaglelogic-building-a-logic-analyzer-with-the-prus-part-1/449
>
> On Tuesday, May 26, 2015 at 7:09:38 PM UTC+5:30, [email protected] 
> wrote:
>>
>> Sorry, just saw that you actually mentioned that the shared memory has 
>> the same performance as the DRAM.
>> Also, I found this: 
>> http://processors.wiki.ti.com/index.php/Programmable_Realtime_Unit#Load_.2F_Store_Instructions
>> where it is said that LBBO should take (1+word count) cycles. If that's 
>> right, an LBBO instruction up to 4 bytes should take 2 cycles for VBUS and 
>> 3 cycles for VBUSP. For now I need to study more to understand which one is 
>> the case, but VBUSP matches with your findings.
>>
>> On Friday, January 3, 2014 at 23:05:30 UTC-2, Lenny wrote:
>>>
>>> Hello, 
>>>
>>> I am using a BeagleBone Black. When I measured the number of PRU clock 
>>> cycles needed for the execution of various assembler instructions, I found 
>>> surprisingly large values for memory access. Here follows a list, in which 
>>> one cycle corresponds to a delay of 5ns as expected:
>>>
>>> Most operations, such as ADD,SUB,QBxx,MOV,JMP etc.: 1 cycle
>>>
>>> LBBO 1,2,4 Bytes from PRU DRAM: 3 cycles
>>> LBBO 8 Bytes from PRU DRAM: 4 cycles
>>> LBBO 12 Bytes from PRU DRAM: 5 cycles
>>> LBBO 16 Bytes from PRU DRAM: 6 cycles
>>>
>>> LBCO 4 Bytes from DDR: 43 cycles
>>> LBCO 8 Bytes from DDR: 44 cycles
>>> LBCO 12 Bytes from DDR: 45 cycles
>>> LBCO 16 Bytes from DDR: 46 cycles
>>>
>>> By PRU DRAM, I mean any address between 0x00000000 and 0x00004000 
>>> and the shared PRU RAM (12 kB starting from 0x00010000). Any other address 
>>> I tried had the delay stated for "DDR".
>>>
>>> Can anybody confirm the long DDR (and other delays if possible) readout 
>>> times that I have measured? Does anybody have an explanation for these 
>>> large delays?
>>>
>>> Thanks in advance! Lenny
>>>
>>
