On 22/05/2019 22:34, Jiri Gaisler wrote:
On 5/22/19 7:43 PM, Jiri Gaisler wrote:
On 5/22/19 9:49 AM, Sebastian Huber wrote:
On 22/05/2019 09:39, Jiri Gaisler wrote:
On 5/22/19 8:03 AM, Sebastian Huber wrote:
Hello,
in the libbsd there is a test for the Epoch Based Reclamation:
https://git.rtems.org/rtems-libbsd/tree/testsuite/epoch01/test_main.c
When I run this test using the leon3 BSP on real hardware (150 MHz NGMP FP), the test completes successfully.
If I run the test on the SIS, it gets stuck at some point (it works with "-m 1"):
sparc-rtems5-sis -leon3 -nouartrx -r -tlim 200 s -m 2
build/sparc-rtems5-leon3-everything/epoch01.exe
This test needs a shorter time-slice in the simulator to succeed (-d option). The more CPUs, the fewer clocks per slice are needed. Through trial and error, these values seem to work (full command lines below):
2 CPUs: -m 2 -d 25
3 CPUs: -m 3 -d 10
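Spelled out with the command line from the first message (same executable path), that would be for example:

sparc-rtems5-sis -leon3 -nouartrx -r -tlim 200 s -m 2 -d 25 build/sparc-rtems5-leon3-everything/epoch01.exe
sparc-rtems5-sis -leon3 -nouartrx -r -tlim 200 s -m 3 -d 10 build/sparc-rtems5-leon3-everything/epoch01.exe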
4 CPUs will not work, even if -d 1 is set. This is most likely a simulator problem; I will try to find time to look at it in more detail. A quick trace shows that all CPUs are stuck in a loop checking for a lock or something similar.
It seems CPUs 2 and 3 are in _SMP_barrier_Wait(). CPUs 0 and 1 still do some stuff in the EBR algorithm (ck_* functions). Maybe the algorithm only works if some random timing fluctuations occur.
Either that, or there is a hidden race condition in the test that does not show up on real hardware. I noticed that increasing the time slice actually makes the test succeed even on 4 CPUs..!
-m 2 -d 200 PASS
-m 3 -d 200 PASS
-m 4 -d 200 FAIL
-m 4 -d 400 PASS!
BUT
-m 3 -d 400 FAIL!
I will try to add random delays to the interrupt response time to see if that makes a difference. That is more in line with the real hardware ...
Adding a pseudo-random delay of 0-15 clocks to each trap/interrupt causes the test to pass on all CPU configurations with the default time slice (50)..! I am not sure what this means: it could be a hidden race condition, the algorithm might need some jitter to work, or it could still be a simulator issue.
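For illustration, a delay like that can be modelled with a tiny linear congruential generator added to the trap latency. The sketch below only shows the idea; the names (trap_jitter, TRAP_LATENCY, cpu->clk) are made up and it is not the actual sis code:

/* Illustrative sketch only -- not the actual sis implementation.
 * Adds a pseudo-random penalty of 0-15 clocks to each trap/interrupt
 * using a small linear congruential generator. */
static unsigned int jitter_state = 12345;

static unsigned int
trap_jitter(void)
{
    /* Numerical Recipes LCG constants; any small PRNG would do. */
    jitter_state = jitter_state * 1664525u + 1013904223u;
    return (jitter_state >> 16) & 0xf;    /* 0 .. 15 clocks */
}

/* Hypothetical use when dispatching a trap or interrupt:
 *
 *     cpu->clk += TRAP_LATENCY + trap_jitter();
 */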
Is there any chance that you could compile this test for sis-riscv? RISC-V has different atomic operations and trap handlers, so it would be interesting to see if the test behaves differently.
It locks up at the same spot:
riscv-rtems5-sis -m 4 build/riscv-rtems5-griscv-default/epoch01.exe
SIS - SPARC/RISCV instruction simulator 2.13, copyright Jiri Gaisler 2019
Bug-reports to j...@gaisler.se
RISCV emulation enabled, 4 cpus online, delta 50 clocks
cpu0> run
*** LIBBSD EPOCH 1 TEST ***
nexus0: <RTEMS Nexus device>
<TestEpoch01>
<EnterExit activeWorker="1">
<Counter worker="0">1059417</Counter>
</EnterExit>
<EnterExit activeWorker="2">
<Counter worker="0">1059303</Counter>
<Counter worker="1">1049390</Counter>
</EnterExit>
<EnterExit activeWorker="3">
<Counter worker="0">1058922</Counter>
<Counter worker="1">1049008</Counter>
<Counter worker="2">1061640</Counter>
</EnterExit>
<EnterExit activeWorker="4">
<Counter worker="0">1058540</Counter>
<Counter worker="1">1048679</Counter>
<Counter worker="2">1061258</Counter>
<Counter worker="3">1061258</Counter>
</EnterExit>
<EnterListOpExit activeWorker="1">
<Counter worker="0">925414</Counter>
<Removals worker="0">100</Removals>
</EnterListOpExit>
<EnterListOpExit activeWorker="2">
<Counter worker="0">704898</Counter>
<Counter worker="1">704835</Counter>
<Removals worker="0">46</Removals>
<Removals worker="1">45</Removals>
</EnterListOpExit>
<EnterListOpExit activeWorker="3">
<Counter worker="0">589977</Counter>
<Counter worker="1">585688</Counter>
<Counter worker="2">592200</Counter>
<Removals worker="0">23</Removals>
<Removals worker="1">23</Removals>
<Removals worker="2">23</Removals>
</EnterListOpExit>
<EnterListOpExit activeWorker="4">
<Counter worker="0">505834</Counter>
<Counter worker="1">501869</Counter>
<Counter worker="2">507615</Counter>
<Counter worker="3">507614</Counter>
<Removals worker="0">19</Removals>
<Removals worker="1">18</Removals>
<Removals worker="2">18</Removals>
<Removals worker="3">18</Removals>
</EnterListOpExit>
<EnterExitPreempt activeWorker="1">
<Counter worker="0">275348</Counter>
</EnterExitPreempt>
<EnterExitPreempt activeWorker="2">
<Counter worker="0">275971</Counter>
<Counter worker="1">280381</Counter>
</EnterExitPreempt>
<EnterExitPreempt activeWorker="3">
<Counter worker="0">275956</Counter>
<Counter worker="1">280283</Counter>
<Counter worker="2">280283</Counter>
</EnterExitPreempt>
<EnterExitPreempt activeWorker="4">
<Counter worker="0">275800</Counter>
<Counter worker="1">280185</Counter>
<Counter worker="2">280185</Counter>
<Counter worker="3">280185</Counter>
</EnterExitPreempt>
<EnterListOpExitPreempt activeWorker="1">
<Counter worker="0">266212</Counter>
<Removals worker="0">68</Removals>
</EnterListOpExitPreempt>
Interrupt!
Stopped at time 975738600 (19514.772 ms)
cpu0>
The EBR is a core synchronization primitive in libbsd. It makes me a bit nervous to have this dependency on random fluctuations to make progress. I don't know the algorithm well enough to say if this is the expected behaviour. A real machine with such exact relative instruction execution timing probably does not exist.
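For reference, a drastically simplified sketch of the idea behind EBR (not the libbsd/ck_epoch code, and leaving out the nesting counts, fences and deferred free lists a real implementation needs) could look like this; the writer-side spin is the part that depends on the readers making visible progress:

#include <stdatomic.h>

/* Drastically simplified epoch-based reclamation sketch -- NOT the
 * libbsd/ck_epoch implementation, just the general idea.  Readers
 * publish the epoch they entered under; the reclaimer may only free
 * objects retired in an old epoch once no reader is still inside a
 * section that started under that epoch. */

#define READER_COUNT 4

static _Atomic unsigned int global_epoch = 1;

struct reader {
    _Atomic unsigned int epoch;   /* 0 == not in a read-side section */
};

static struct reader readers[READER_COUNT];

static void
epoch_enter(struct reader *self)
{
    /* Publish the epoch this read-side section runs under. */
    atomic_store(&self->epoch, atomic_load(&global_epoch));
}

static void
epoch_exit(struct reader *self)
{
    atomic_store(&self->epoch, 0);
}

/* Reclaimer side: advance the global epoch and wait until every reader
 * has either left its section or re-entered under the new epoch. */
static void
epoch_synchronize(void)
{
    unsigned int next = atomic_fetch_add(&global_epoch, 1) + 1;

    for (int i = 0; i < READER_COUNT; ++i) {
        unsigned int seen;

        while ((seen = atomic_load(&readers[i].epoch)) != 0 && seen < next) {
            /* Busy-wait for reader i to observe the new epoch. */
        }
    }
}

A writer would unlink an object, call epoch_synchronize() and only then free it, which is presumably the kind of enter/exit plus removal pattern the EnterListOpExit phases above exercise.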
In general, you can lock up an SMP system quite easily if you perform the right LL/SC pair on two processors so that they endlessly steal each other's reservation.
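As a concrete illustration (plain C11 atomics, not code from the test): on LEON3 the loop below compiles to a CASA instruction, which always completes, while on an LL/SC architecture such as RISC-V it becomes an LR/SC retry loop whose reservation can be stolen:

#include <stdatomic.h>

static _Atomic unsigned long counter;

/* Hypothetical example, not from the test: a lock-free increment. */
void
increment(void)
{
    unsigned long old = atomic_load(&counter);

    /* On an LL/SC machine this loop becomes LR/SC.  If two processors
     * execute it in perfect lockstep on the same cache line, each LR
     * can keep taking the line (and thus the reservation) away from
     * the other, every SC fails and neither loop terminates.  Real
     * hardware breaks the symmetry through timing noise; a cycle-exact
     * simulator may not. */
    while (!atomic_compare_exchange_weak(&counter, &old, old + 1)) {
        /* retry with the value now stored in "old" */
    }
}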
--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.
This message is not a business communication within the meaning of the EHUG.