On Thu, Oct 13, 2016 at 1:37 PM, Joel Sherrill <j...@rtems.org> wrote:
> > > On Thu, Oct 13, 2016 at 11:21 AM, Jakob Viketoft < > jakob.viket...@aacmicrotec.com> wrote: > >> >> *From:* Joel Sherrill [j...@rtems.org] >> *Sent:* Thursday, October 13, 2016 17:38 >> *To:* Jakob Viketoft >> *Cc:* devel@rtems.org >> *Subject:* Re: Time spent in ticks... >> >> >I don't have an or1k handy so ran on a sparc/erc32 simulator/ >> >It is is a SPARC v7 at 15 Mhz. >> >> >These times are in microseconds and based on the tmtests. >> >Specifically tm08and tm27. >> >> >(1) rtems_clock_tick: only case - 52 >> >(2) rtems interrupt: entry overhead returns to interrupted task - 12 >> >(3) rtems interrupt: exit overhead returns to interrupted task - 4 >> >(4) rtems interrupt: entry overhead returns to nested interrupt - 11 >> >(5) rtems interrupt: exit overhead returns to nested interrupt - 3 >> >> > The above was from the master with SMP enabled. I repeated it with > SMP disabled and it had no impact. > > Since the timing change is post 4.11, I decided to try 4.11 with SMP > disabled: > > rtems_clock_tick: only case - 42 > rtems interrupt: entry overhead returns to interrupted task - 11 > rtems interrupt: exit overhead returns to interrupted task - 4 > rtems interrupt: entry overhead returns to nested interrupt - 11 > rtems interrupt: exit overhead returns to nested interrupt - 3 > > So 42 + 12 + 4 = 58 microseconds, 58 * 15 = 870 cycles > > So the overhead has gone up some but as Pavel says it is quite likely > some mathematical operation on 64 bit types is slow on your CPU. > > HINT: If you can write a benchmark for 64-bit operations, > it would be a good comparison between CPUs and might > highlight where the software implementation needs improvement. > I decided that another good point of reference was the powerpc/psim BSP. It reports the benchmarks in instructions: (1) rtems_clock_tick: only case - 229 (2) rtems interrupt: entry overhead returns to interrupted task - 102 (3) rtems interrupt: exit overhead returns to interrupted task - 95 (4) rtems interrupt: entry overhead returns to nested interrupt - 105 (5) rtems interrupt: exit overhead returns to nested interrupt - 85 229 + 102 + 96 = 427 instructions. That seems roughly inline with the erc32 which is 1 cycle for all instructions except loads which are 3 and stores which are 2. And the sparc has register windows so entering and exiting an ISR can potentially save and restore a lot of registers. So I am still leaning to Pavel's explanation that some primitive operation is really inefficient. > > >> >The clock tick test has 100 tasks but it looks like they are blocked on >> a semaphore >> >without timeout. >> >> >Your times look WAY too high. Maybe the interrupt is stuck on and >> >not being cleared. >> >> >On the erc32, a nominal "nothing to do clock tick" would be 1+2+3 from >> >above or 52+12+4 = 68 microseconds. 68 * 15 = 1020 machine cycles. >> >So at a higher clock rate, it should be even less time. >> >> >My gut feeling is that I think something is wrong with the ISR handler >> >and it is stuck. But the performance is definitely way too high. >> >> >--joel >> >> (Sorry if the format got somewhat I garbled, anything but top-posting >> have to be done manually...) >> >> I re-tested my case using an -O3 optimization (we have been using -O0 >> during development for debugging purposes) and I got a good performance >> boost, but I'm still nowhere near your numbers. I can vouch for that the >> interrupt (exception really) isn't stuck, but that the code unfortunately >> takes a long time to compute. I have a subsecond counter (1/16 of a second) >> which I'm sampling at various places in the code, storing its numbers to a >> buffer in memory so as to interfere with the program as little as possible. >> >> With -O3, a tick handling still takes ~320 us to perform, but the weight >> has now shifted. tc_windup takes ~214 us and the rest is obviously >> _Watchdog_Tick(). When fragmenting the tc_windup function to find the worst >> speed bumps the biggest contribution (~122 us) seem to be coming from scale >> factor recalculation. Since it's 64 bits, it's turned into a software >> function which can be quite time-consuming apparently. >> >> Even though _Watchdog_Tick() "only" takes ~100 us now, it still sound >> much higher than your total tick with a slower system (we're running at 50 >> MHz). >> >> Is there anything we can do to improve these numbers? Is Clock_isr >> intended to be run uninterrupted as it is now? Can't see that much of the >> BSP patch code has anything to do with the speed of what I'm looking at >> right now... >> >> /Jakob >> >> >> >> *Jakob Viketoft *Senior Engineer in RTL and embedded software >> >> ÅAC Microtec AB >> Dag Hammarskjölds väg 48 >> SE-751 83 Uppsala, Sweden >> >> T: +46 702 80 95 97 >> http://www.aacmicrotec.com >> > >
_______________________________________________ devel mailing list devel@rtems.org http://lists.rtems.org/mailman/listinfo/devel