Hello Pavel,
thanks for your comments.
On 23/07/15 12:40, Pavel Pisa wrote:
Hello Sebastian,
first of all, a big thanks for the RTEMS architectural updates.
On Thursday 23 of July 2015 11:16:03 Sebastian Huber wrote:
The Classic RTEMS and POSIX APIs have at least three weaknesses.
* Dynamic memory (the workspace) is used to allocate object pools. This
requires a complex configuration with heavy use of the C pre-processor.
* Objects are created via function calls which return an object identifier.
The object operations use this identifier and map it to the internal
object representation.
* The object operations use a rich set of options and attributes. These
parameters must be evaluated and validated on every call to figure out
what to do.
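For illustration, the pattern looks roughly like this (standard Classic
API calls and configuration macros; error handling elided):

#include <rtems.h>

static rtems_id mutex_id;

rtems_task Init(rtems_task_argument arg)
{
  /* Creation returns an identifier; every later operation maps this
     identifier back to the internal object representation and
     re-evaluates the attribute set. */
  rtems_status_code sc = rtems_semaphore_create(
    rtems_build_name('M', 'T', 'X', ' '),
    1,                                 /* initial count */
    RTEMS_BINARY_SEMAPHORE | RTEMS_PRIORITY | RTEMS_INHERIT_PRIORITY,
    0,                                 /* priority ceiling, unused here */
    &mutex_id
  );
  (void) sc;                           /* error handling elided */
  (void) arg;

  rtems_semaphore_obtain(mutex_id, RTEMS_WAIT, RTEMS_NO_TIMEOUT);
  rtems_semaphore_release(mutex_id);

  rtems_task_delete(RTEMS_SELF);
}

/* The object pools behind the identifiers are sized at compile time
   by the pre-processor machinery in <rtems/confdefs.h>. */
#define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER
#define CONFIGURE_MAXIMUM_TASKS 1
#define CONFIGURE_MAXIMUM_SEMAPHORES 1
#define CONFIGURE_RTEMS_INIT_TASKS_TABLE
#define CONFIGURE_INIT
#include <rtems/confdefs.h>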
...
In the long run this could lead to a very small footprint system without
dependencies on dynamic memory and with purely static initialization.
What are your opinions?
I fully understand your motivation; for small footprint systems the
direct use of pointers is the most efficient option.
But in the area of smallest-footprint systems there are many
alternatives to RTEMS - MBED, NuttX, etc.
my goal is to make it smaller compared to what we have now. I don't aim
for the smallest system on the market; that would only be a side-effect.
The main purposes of the self-contained objects are performance and an
easier configuration. I think the conditional compilation in
<rtems/confdefs.h> has reached a problematic complexity.
RTEMS layering (Score, APIs, object identifiers, etc.) is quite
complex and has considerable overhead. On the other hand,
these layers add value: they give RTEMS the option to be
used in more complex scenarios. Some facts to consider:
* if all locking constructs use identifiers, then they
are well traceable. There is a problem with pthreads
in this respect: pthread_create() has no parameter
for a thread identifier/purpose.
* using identifiers and calling system operations with these
identifiers keeps the application API stable even for the case
where the operating system and the applications run in separate
domains/CPU privilege levels/rings. Passing pointers is really problematic
in such use cases. RTEMS does not use memory space separation,
and it is questionable whether the MMU context switch overhead is
appropriate for some systems. But on the other hand there can be
interesting uses where RTEMS is ported to a microkernel and multiple
RTEMS instances run in address-space-separated domains, even with
strict temporal separation (POK, PikeOS). These options
have not been used much in RTEMS yet. But there is one specific
and unique area for RTEMS, and that is the support for asymmetric,
heterogeneous multiprocessing. I am not sure how much this
feature is in actual use today. But RTEMS is unique in this respect,
and the use of task-optimised CPU cores with different architectures
will play an important role in future computing - see all of today's
GPU, APU and FPGA projects.
So my suggestion is to take all these use cases into consideration.
This should not be taken as a hard requirement if it really means
unacceptable overhead for common use cases, but it should be considered.
I don't want to change the existing APIs. The object identifier
infrastructure is fine, but it was designed for a specific purpose, e.g.
to enable platforms that support asymmetric multiprocessing (the RTEMS
MPCI support). With SMP we now see its limitations. A complex SMP
application like the FreeBSD network stack uses hundreds of locks, and
the protected areas of these locks are quite small. The lock/unlock
sequence of an uncontested mutex is absolutely performance critical.
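For contrast, a sketch of what a self-contained mutex could look like
(hypothetical names and layout, not the actual proposed API): the whole
object lives in application memory, can be initialized statically, and
the uncontended path is a single inlined atomic operation with no
identifier lookup:

#include <stdatomic.h>

/* Hypothetical self-contained mutex: no workspace pool, no confdefs
   entry, no identifier-to-object mapping on the hot path. */
typedef struct {
  atomic_uint state; /* 0 = unlocked, 1 = locked */
} sc_mutex;

#define SC_MUTEX_INITIALIZER { ATOMIC_VAR_INIT(0) }

/* Assumed slow path: enqueue the caller in FIFO order and block. */
void sc_mutex_lock_slow(sc_mutex *m);

static inline void sc_mutex_lock(sc_mutex *m)
{
  unsigned expected = 0;

  /* Uncontended path: one compare-and-swap, no status codes, options
     or attributes to evaluate. */
  if (!atomic_compare_exchange_strong_explicit(&m->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed))
    sc_mutex_lock_slow(m);
}

/* Purely static initialization, no create call, no identifier. */
static sc_mutex network_lock = SC_MUTEX_INITIALIZER;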
[...]
As for the implementation, I expect that maximal optimisation is required
for the lock path without contention. This path should be optimised to
be inline or a simple function call inside the application context.
It is a question whether mutexes usable in heterogeneous setups
should be considered as well. If such a type is needed, then there
would be a need to call into the "system" or another more complex function.
But I think that these uses can be left to the Classic semaphores.
Mutexes are usually considered a mechanism used inside a
threaded application/subsystem and not something that spreads beyond
a single address space.
Yes.
My feeling is that the locking case with contention/wait should be
implemented in a way that allows future privilege separation
of the scheduler/system core from the applications, as well as memory
context separation or the use of hypervisor calls for the wait.
So I suggest considering an architecture similar to the Linux futex,
http://www.akkadia.org/drepper/futex.pdf
and using it for the mutex implementation. Maybe even add a field
for an identifier/RTEMS object ID to the mutex structure.
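A minimal sketch of the futex-style mutex from that paper (the
three-state variant; futex_wait() and futex_wake() are assumed
kernel/system primitives here, not existing RTEMS calls):

#include <stdatomic.h>

/* Assumed primitives in the spirit of the Linux futex system call;
   these are not existing RTEMS interfaces. */
void futex_wait(atomic_int *addr, int val); /* sleep while *addr == val */
void futex_wake(atomic_int *addr, int n);   /* wake up to n waiters */

/* State: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
typedef struct {
  atomic_int state;
} futex_mutex;

static void futex_mutex_lock(futex_mutex *m)
{
  int c = 0;

  /* Fast path: the uncontended lock is a single compare-and-swap in
     the application context. */
  if (atomic_compare_exchange_strong(&m->state, &c, 1))
    return;

  /* Slow path: mark the mutex as contended and sleep until we manage
     to take it ourselves (keeping the conservative waiter mark). */
  if (c != 2)
    c = atomic_exchange(&m->state, 2);
  while (c != 0) {
    futex_wait(&m->state, 2);
    c = atomic_exchange(&m->state, 2);
  }
}

static void futex_mutex_unlock(futex_mutex *m)
{
  /* Fast path: no waiter was recorded, just release. */
  if (atomic_exchange(&m->state, 0) == 2)
    futex_wake(&m->state, 1); /* contended: wake one waiter */
}

Note that futex_wake() hands the lock to an arbitrary waiter; this is
the random fairness mentioned below.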
At least for debug builds it would be great if there were a
well-known TLS variable in the TCB (or elsewhere, in the case of
kernel/user separation) holding a pointer to the chain of mutexes
taken by the specific thread.
For the optimized OpenMP support I use the Linux futex barrier
implementation of libgomp and added two futex calls for RTEMS (see the
file attached to the first e-mail). The performance is really good. For
the mutex and semaphore objects, however, I don't use the futex approach
of libgomp. Futexes have excellent properties for average case systems,
e.g. they provide random fairness. RTEMS is supposed to be a real-time
operating system, so random fairness is not enough here; instead we need
FIFO fairness.
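For reference, the interface these two calls provide is small. A sketch
of their use in the style of the libgomp barrier (the prototypes follow
libgomp's internal config/linux/futex.h; the RTEMS-side implementation
from the attached file is assumed, not reproduced):

#include <limits.h>

/* The two operations libgomp expects. */
void futex_wait(int *addr, int val);   /* block while *addr == val */
void futex_wake(int *addr, int count); /* wake up to count waiters */

/* A one-shot release gate, as used inside a barrier. */
struct gate {
  int state; /* 0 = closed, 1 = open */
};

static void gate_wait(struct gate *g)
{
  while (__atomic_load_n(&g->state, __ATOMIC_ACQUIRE) == 0)
    futex_wait(&g->state, 0); /* returns on wake or spuriously */
}

static void gate_open(struct gate *g)
{
  __atomic_store_n(&g->state, 1, __ATOMIC_RELEASE);
  futex_wake(&g->state, INT_MAX); /* wake all waiters */
}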
But the management of this is not so easy
if mutexes can be released in a different order than they were locked.
We should not allow this. Such lock order reversals are bad.
So it is not a simple singly linked list. But a mapping of which
mutexes are held by a given thread is required even for priority
inheritance. This structure has to be kept in user-manipulated
data if we want to avoid the overhead of a syscall in the future.
But all of that is manageable and has been solved for the futex-based
OS APIs.
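A minimal sketch of such bookkeeping (all names hypothetical, none of
them existing RTEMS interfaces): each thread keeps a list of the mutexes
it currently holds, and the release path recomputes the priority from
the mutexes still held:

/* Hypothetical per-thread bookkeeping for priority inheritance. */
struct held_mutex {
  struct thread *owner;
  struct held_mutex *next_held; /* link in the owner's held list */
  int top_waiter_priority;      /* highest priority of any waiter */
};

struct thread {
  int base_priority;       /* priority without inherited boosts */
  int current_priority;    /* possibly boosted priority */
  struct held_mutex *held; /* head of the list of held mutexes */
};

/* On each release, recompute the owner's priority from the mutexes
   still held, instead of restoring it only when the resource count
   drops to zero (the problem in the JFFS2 scenario below). */
static void thread_update_priority(struct thread *t)
{
  int p = t->base_priority;

  for (struct held_mutex *m = t->held; m != NULL; m = m->next_held) {
    /* In RTEMS a lower number means a higher priority. */
    if (m->top_waiter_priority < p)
      p = m->top_waiter_priority;
  }
  t->current_priority = p;
}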
Yes, the current priority inheritance implementation needs to be
improved to better support resource nesting. One problem we had in
applications recently occurred in the file system. A file system
instance like JFFS2 uses a mutex to protect the instance. With this lock
held it uses malloc() and performs device operations which may also
acquire a mutex. If a high priority task uses malloc(), this can raise
the priority of a task accessing JFFS2 for a very long time, since after
the malloc() the priority is not immediately restored (the resource
count is not zero).
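To illustrate the scenario (schematic code; the file system, mutex and
driver interfaces below are placeholders, not the real RTEMS or JFFS2
ones):

#include <stdlib.h>
#include <string.h>

struct mutex;
struct fs_instance {
  struct mutex *lock;
  void *dev;
};
void mutex_lock(struct mutex *m);
void mutex_unlock(struct mutex *m);
void device_write(void *dev, const void *buf, size_t n);

void fs_write(struct fs_instance *fs, const void *buf, size_t n)
{
  mutex_lock(fs->lock);  /* resource count: 1 */

  void *tmp = malloc(n); /* takes and releases the heap mutex:
                            count 2, then back to 1 */
  memcpy(tmp, buf, n);

  /* If a high priority task blocks on the heap mutex during the
     malloc(), this task inherits its priority.  The boost is not
     dropped when the heap mutex is released, because the resource
     count is still non-zero (fs->lock is held), so the inherited
     priority survives the whole, potentially long, device
     operation. */
  device_write(fs->dev, tmp, n);

  free(tmp);
  mutex_unlock(fs->lock); /* count 0: priority finally restored */
}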
So generally, I would be very happy if RTEMS were faster,
but I hope a solution can be found that is viable in the
long term and supports a reasonably broad set of use case
scenarios. I do not have it all in my head, and I think
this calls for more iterations of the discussion.
For the network stack, OpenMP and SMP in general it's not a question of
faster. It's a question of far too slow versus good enough. We should
decide whether we want to use self-contained objects for the
Newlib-internal locks and the C11/C++11 thread support in GCC.
--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.
This message is not a business communication within the meaning of the EHUG.