On 15/12/2016 18:02, Sebastian Huber wrote:
On 14/12/16 22:15, Chris Johns wrote:
On 15/12/2016 00:39, Sebastian Huber wrote:
Use C11 mutexes instead of Classic semaphores as a performance
optimization and to simplify the application configuration.

The use of C11 mutexes has not been agreed to and we need to discuss
this in more detail before we allow use within RTEMS. I would like to
see positive agreement from all core maintainers before this and
similar patches can be merged.

A patch is a good thing to start such a discussion.


Great.


RTEMS has required the use of the Classic API because:

 1. Available on all architectures, BSPs and tool sets.
 2. Always present in a build.
 3. Was considered faster than POSIX.

3. is not the case. From an API point of view the POSIX operations could
be faster than the Classic API since the parameter evaluation is simpler.


Yes, things have moved on. Crusty old developers like me have a soft spot for the Classic API, and I suspect that these days it is a slightly distorted view. :)


The Classic API provides a base level of required functionality
because it is always available in supported tool sets and leads to the
smallest footprint because we do not need to link in more than one API.

Compared to self-contained objects (like the C11 mutexes for example)
the overhead of the Classic objects is huge in terms of run-time, memory
footprint, code size (object administration) and complexity (object
administration, use of a heap, unlimited objects, configuration).

I agree. The self-contained approach is very attractive and a really big feature.
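
To illustrate what "self-contained" means here, below is a minimal sketch, not code from the patch. The mutex is a plain member of a user-provided structure, so there is no object identifier, no object table and no configuration entry. The names device_context, device_init, device_record_event and device_destroy are hypothetical and exist only for this example; only the <threads.h> calls are standard C11.

```c
#include <assert.h>
#include <threads.h>

/* Hypothetical driver context (illustration only): the mutex is stored
 * inside the structure itself, so the user provides the storage. */
typedef struct {
  mtx_t lock;      /* self-contained C11 mutex */
  unsigned events; /* data protected by the lock */
} device_context;

/* Initialise the embedded mutex; returns thrd_success on success. */
int device_init(device_context *ctx)
{
  ctx->events = 0;
  return mtx_init(&ctx->lock, mtx_plain | mtx_recursive);
}

/* Record one event under the lock and return the new event count. */
unsigned device_record_event(device_context *ctx)
{
  unsigned count;
  int status;

  status = mtx_lock(&ctx->lock);
  assert(status == thrd_success);
  count = ++ctx->events;
  status = mtx_unlock(&ctx->lock);
  assert(status == thrd_success);
  (void) status; /* avoid an unused warning when NDEBUG is defined */

  return count;
}

/* Release the mutex; the storage itself still belongs to the caller. */
void device_destroy(device_context *ctx)
{
  mtx_destroy(&ctx->lock);
}
```

A device_context can be declared statically, on the stack, or inside another structure; in each case the storage for the mutex comes with it.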



I understand things change and move on so it is great to see this
change being proposed and our existing base line being challenged.

I see from your performance figures that C11 mutexes are better, and
the resources are allocated as and when they are needed, which is a
better model than the Classic API's configuration table. This is nice.

Do all architectures and BSPs have working C11 support?

Yes, all architectures and BSPs support the C11 <threads.h> mutexes,
condition variables, thread-specific storage (mapped to POSIX keys),
and once support (mapped to POSIX once) in all configurations. The C11
threads are mapped to POSIX threads (for simplicity, not a hard
requirement).
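
For readers who have not used <threads.h>, here is a small sketch of the services listed above working together: a mutex, once support and threads. It is portable C11 and is not taken from the RTEMS sources or its testsuite. The names WORKER_COUNT, INCREMENTS_PER_WORKER, init_counter, worker and run_workers are made up for this example.

```c
#include <assert.h>
#include <threads.h>

#define WORKER_COUNT 4
#define INCREMENTS_PER_WORKER 1000

static once_flag init_once = ONCE_FLAG_INIT;
static mtx_t counter_mutex;
static unsigned long counter;

/* Runs exactly once, no matter how many threads reach call_once(). */
static void init_counter(void)
{
  int status = mtx_init(&counter_mutex, mtx_plain);

  assert(status == thrd_success);
  (void) status; /* avoid an unused warning when NDEBUG is defined */
  counter = 0;
}

/* Thread body: increment the shared counter under the mutex. */
static int worker(void *arg)
{
  (void) arg;
  call_once(&init_once, init_counter);

  for (int i = 0; i < INCREMENTS_PER_WORKER; ++i) {
    mtx_lock(&counter_mutex);
    ++counter;
    mtx_unlock(&counter_mutex);
  }

  return 0;
}

/* Start the workers, join them and return the final counter value,
 * or 0 if a thread could not be created or joined. */
unsigned long run_workers(void)
{
  thrd_t threads[WORKER_COUNT];
  int started = 0;
  int failed = 0;

  while (started < WORKER_COUNT) {
    if (thrd_create(&threads[started], worker, NULL) != thrd_success) {
      failed = 1;
      break;
    }
    ++started;
  }

  for (int i = 0; i < started; ++i) {
    if (thrd_join(threads[i], NULL) != thrd_success) {
      failed = 1;
    }
  }

  return failed ? 0 : counter;
}
```

If all threads start and finish, run_workers() returns WORKER_COUNT * INCREMENTS_PER_WORKER, because every increment happens with the mutex held.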

Thank you and well done for all your efforts in this area. This is a really excellent place to be.



Are there tests in the RTEMS testsuite for C11 threading services?

https://git.rtems.org/rtems/tree/testsuites/sptests/spstdthreads01/init.c


Nice.


What target resources are used to support this API, i.e. code and RAM
usage?

On a 32-bit target:

(gdb) p sizeof(Semaphore_Control)
$1 = 72
(gdb) p sizeof(mtx_t)
$2 = 20

With Thumb-2 instruction set:

size ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-mutex.o \
     ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-condition.o \
     ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-*.o
    text    data     bss     dec     hex filename
     704       0       0     704     2c0 ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-mutex.o
     536       0       0     536     218 ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-condition.o
       4       0       0       4       4 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-call_once.o
     100       0       0     100      64 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-cnd.o
     104       0       0     104      68 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-mtx.o
     156       0       0     156      9c ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-thrd.o
      40       0       0      40      28 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-tss.o

size ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-sem*
    text    data     bss     dec     hex filename
     496       0       0     496     1f0 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semcreate.o
     152       0       0     152      98 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semdelete.o
      68       0       0      68      44 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semflush.o
      28       0       0      28      1c ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semident.o
      48       0       0      48      30 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-sem.o
     428       0       0     428     1ac ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semobtain.o
     464       0       0     464     1d0 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semrelease.o
     312       0       0     312     138 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semsetpriority.o
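
To make the comparison easier to read, the following sketch simply adds up the text column of the two listings and scales the two sizeof values from the gdb session. The numbers are copied from above; the function names are hypothetical. Note that the comparison is not like-for-like: the first listing covers mutexes, condition variables, once, threads and thread-specific storage, while the second covers Classic semaphores only and leaves out the object administration they depend on.

```c
#include <assert.h>
#include <stddef.h>

/* Text sizes in bytes, copied from the two size listings above. */
static const unsigned c11_text[] = { 704, 536, 4, 100, 104, 156, 40 };
static const unsigned classic_sem_text[] = {
  496, 152, 68, 28, 48, 428, 464, 312
};

/* Per-object RAM in bytes on the 32-bit target, from the gdb output. */
enum { SEMAPHORE_CONTROL_SIZE = 72, MTX_SIZE = 20 };

/* Sum an array of section sizes. */
unsigned sum_text(const unsigned *sizes, size_t count)
{
  unsigned total = 0;

  assert(sizes != NULL || count == 0);

  for (size_t i = 0; i < count; ++i) {
    total += sizes[i];
  }

  return total;
}

/* Total text of the score mutex/condition and libstdthreads objects. */
unsigned c11_text_total(void)
{
  return sum_text(c11_text, sizeof(c11_text) / sizeof(c11_text[0]));
}

/* Total text of the Classic semaphore objects. */
unsigned classic_sem_text_total(void)
{
  return sum_text(
    classic_sem_text,
    sizeof(classic_sem_text) / sizeof(classic_sem_text[0])
  );
}

/* RAM saved by using mtx_t instead of Semaphore_Control for n locks. */
unsigned ram_saved(unsigned n)
{
  return n * (SEMAPHORE_CONTROL_SIZE - MTX_SIZE);
}
```

With these figures the totals are 1644 bytes of text for the first listing and 1996 bytes for the second, and each lock saves 52 bytes of RAM.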


Nice.

The libscore_a-mutex.o object contains more than one function. For example, we
have (Cortex-M7 target):

7000c5f0 <_Mutex_recursive_Acquire>:
7000c5f0:       2380            movs    r3, #128        ; 0x80
7000c5f2:       f3ef 8111       mrs     r1, BASEPRI
7000c5f6:       f383 8812       msr     BASEPRI_MAX, r3
7000c5fa:       4a12            ldr     r2, [pc, #72]   ; (7000c644 <_Mutex_recursive_Acquire+0x54>)
7000c5fc:       68c3            ldr     r3, [r0, #12]
7000c5fe:       6912            ldr     r2, [r2, #16]
7000c600:       b91b            cbnz    r3, 7000c60a <_Mutex_recursive_Acquire+0x1a>
7000c602:       60c2            str     r2, [r0, #12]
7000c604:       f381 8811       msr     BASEPRI, r1
7000c608:       4770            bx      lr

Only the ten instructions above are executed if the mutex is available.
The part below is executed if the mutex already has an owner: it handles
a nested acquire by the owner and the case where the thread needs to
block.

7000c60a:       4293            cmp     r3, r2
7000c60c:       d014            beq.n   7000c638 <_Mutex_recursive_Acquire+0x48>
7000c60e:       3008            adds    r0, #8
7000c610:       b5f0            push    {r4, r5, r6, r7, lr}
7000c612:       b08d            sub     sp, #52 ; 0x34
7000c614:       2700            movs    r7, #0
7000c616:       f04f 7600       mov.w   r6, #33554432   ; 0x2000000
7000c61a:       4d0b            ldr     r5, [pc, #44]   ; (7000c648 <_Mutex_recursive_Acquire+0x58>)
7000c61c:       ab0c            add     r3, sp, #48     ; 0x30
7000c61e:       4c0b            ldr     r4, [pc, #44]   ; (7000c64c <_Mutex_recursive_Acquire+0x5c>)
7000c620:       f88d 700c       strb.w  r7, [sp, #12]
7000c624:       f843 1d30       str.w   r1, [r3, #-48]!
7000c628:       4909            ldr     r1, [pc, #36]   ; (7000c650 <_Mutex_recursive_Acquire+0x60>)
7000c62a:       9601            str     r6, [sp, #4]
7000c62c:       9502            str     r5, [sp, #8]
7000c62e:       940a            str     r4, [sp, #40]   ; 0x28
7000c630:       f7fd fb8e       bl      70009d50 <_Thread_queue_Enqueue>
7000c634:       b00d            add     sp, #52 ; 0x34
7000c636:       bdf0            pop     {r4, r5, r6, r7, pc}
7000c638:       6903            ldr     r3, [r0, #16]
7000c63a:       3301            adds    r3, #1
7000c63c:       6103            str     r3, [r0, #16]
7000c63e:       f381 8811       msr     BASEPRI, r1
7000c642:       4770            bx      lr
7000c644:       70016980        .word   0x70016980
7000c648:       70009d3d        .word   0x70009d3d
7000c64c:       70009d49        .word   0x70009d49
7000c650:       70013c24        .word   0x70013c24
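
Read as C, the control flow of the disassembly above looks roughly like the sketch below. This is an interpretation of the listing, not the RTEMS source: the types and names (thread, recursive_mutex, acquire_path, recursive_acquire) are hypothetical, the field at offset 12 is assumed to be the owner and the field at offset 16 the nest level, and the interrupt disable/enable done through BASEPRI is only indicated by comments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the internal types (illustration only). */
typedef struct thread {
  int id;
} thread;

typedef struct {
  thread *owner;       /* assumed to be the word at offset 12 */
  unsigned nest_level; /* assumed to be the word at offset 16 */
} recursive_mutex;

typedef enum {
  ACQUIRED_FREE,   /* fast path: mutex had no owner */
  ACQUIRED_NESTED, /* owner acquired it again */
  MUST_BLOCK       /* owned by another thread */
} acquire_path;

/* Decide which path an acquire takes. The real function masks
 * interrupts on entry (msr BASEPRI_MAX) and restores them on the two
 * non-blocking exits (msr BASEPRI). */
acquire_path recursive_acquire(recursive_mutex *m, thread *executing)
{
  assert(m != NULL && executing != NULL);

  if (m->owner == NULL) {
    /* cbnz not taken: store the executing thread as owner and return. */
    m->owner = executing;
    return ACQUIRED_FREE;
  }

  if (m->owner == executing) {
    /* beq.n taken: increment the nest level and return. */
    ++m->nest_level;
    return ACQUIRED_NESTED;
  }

  /* Here the real code builds a queue context on the stack and calls
   * _Thread_queue_Enqueue(); this sketch only reports the decision. */
  return MUST_BLOCK;
}
```

The first branch corresponds to the ten-instruction fast path; everything else is reached only when the mutex already has an owner.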


Nice.


Would the "tiny" footprint be smaller if all internal services
including compiler thread support are made C11? Could this actually be
done? Parts of POSIX have been creeping in over time so the position is
a little confused at the moment. I am not sure about a bits-and-pieces
approach; maybe a full switch should be made.

Yes, the footprint would be smaller. If we provide self-contained
threads, then the footprint would be much smaller, e.g. no object
administration, no heap.

Great. This is a powerful reason to look at moving in this direction and removing the remaining POSIX usage in libstdthreads.

A brief audit of rtems.git shows the change is possible: there are fewer than 30 Classic task creates and a similar number of semaphore creates, so a full change looks reachable, which is nice.

Should we look at moving all internal services to C11 and standardise on it? I think there is value in doing this. It can be a post 4.12 branch activity.



Does C11 work on LLVM (I hear support is close)?

Where is the C11 API implemented? Is the threading code outside the
RTEMS source tree and what effect does that have on those looking to
certify RTEMS?


The C11 support is not a compiler issue. The <threads.h> is a part of
the C standard library and for RTEMS this header file is provided by
Newlib:

https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/include/threads.h;h=9fb08b03d1eb20024c0d680a7924336ec7ea57bb;hb=HEAD


This header file is compatible with C89 (with the next Newlib release,
currently C99 due to the use of inline in <sys/lock.h>). I imported several
parts of the FreeBSD <sys/cdefs.h> for this purpose.

The C11 <threads.h> provided functions are implemented in RTEMS:

https://git.rtems.org/rtems/tree/cpukit/libstdthreads


Thanks.


Does a change like this require a coding standard update?

Currently

https://devel.rtems.org/wiki/Developer/Coding/Conventions

gives no advice to use specific API X or Y.


Yes, I knew the answer to this one. :)

Thank you for the detailed and excellent review and analysis of the C11 support. I have no problem with the change and C11 being used internally.

Chris