Hello Pavel,

thanks for your comments.

On 23/07/15 12:40, Pavel Pisa wrote:
Hello Sebastian,

the first big thanks for RTEMS architectural updates.

On Thursday 23 of July 2015 11:16:03 Sebastian Huber wrote:
The Classic RTEMS and POSIX APIs have at least three weaknesses.

* Dynamic memory (the workspace) is used to allocate object pools. This
    requires a complex configuration with heavy use of the C pre-processor.

* Objects are created via function calls which return an object identifier.
    The object operations use this identifier and internally map it to an
    internal object representation.

* The object operations use a rich set of options and attributes. Each time,
    these parameters must be evaluated and validated to figure out what to do.
...
In the long run this could lead to a very small footprint system without
dependencies on dynamic memory and a purely static initialization.

What are your opinions?
I fully understand your motivation, and for a small footprint system
the direct use of pointers is the most efficient option.
But in the area of smallest-footprint systems there are many
alternatives to RTEMS - mbed, NuttX, etc.

My goal is to get it smaller compared to what we have now. I don't want the smallest system on the market. This would be only a side effect; the main purpose of the self-contained objects is performance and an easier configuration. I think the conditional compilation in <rtems/confdefs.h> has reached a problematic complexity.


The RTEMS layering (Score, APIs, object identifiers, etc.) is quite
complex and has considerable overhead. On the other hand,
these layers add value: RTEMS has the option to be
used in more complex scenarios. Facts to consider:

  * If all locking constructs use identifiers, then they
    are well traceable. There is a problem with pthreads
    here in that pthread_create() does not have a parameter
    for a thread identifier/purpose.

  * The use of identifiers, and calling system operations with these
    identifiers, makes it possible to keep the application API even when
    the operating system and the applications run in separate
    domains/CPU privilege levels/rings. Passing pointers is really
    problematic in such use cases. RTEMS does not use memory space
    separation, and it is questionable whether the MMU context switch
    overhead is appropriate for some system uses. But on the other hand,
    there can be interesting uses where RTEMS is ported to a microkernel
    and multiple RTEMS instances are run in address-space-separated
    domains, even with strict temporal separation (POK, PikeOS). These
    options have not been used much in RTEMS yet. But there is one
    specific and unique area for RTEMS, and that is support for
    asymmetric, heterogeneous multiprocessing. I am not sure how much
    this feature is in actual use today. But RTEMS is unique in this
    respect, and the use of task-optimised CPU cores with different
    architectures will play an important role in future computing - see
    all of today's GPU, APU and FPGA projects.

So my suggestion is to take all these use cases into consideration.
It should not be taken as a hard requirement if it really means
unacceptable overhead for common use cases, but it should be considered.

I don't want to change the existing APIs. The object identifier infrastructure is fine, but it was designed for a specific purpose, e.g. to enable a platform that supports asymmetric multiprocessing (the RTEMS MPCI support). With SMP we now see its limitations. A complex SMP application like the FreeBSD network stack uses hundreds of locks, and the protection areas of the locks are quite small. The lock/unlock sequence of an uncontested mutex is absolutely performance-critical.
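To illustrate the point (a sketch only, not the actual RTEMS implementation): a self-contained mutex keeps its entire state in the object itself, so the uncontested lock/unlock path is a single atomic operation with no identifier-to-object lookup and no option evaluation. The type and function names below are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical self-contained mutex: the whole lock state lives in the
 * object itself, so no identifier lookup is needed on the fast path. */
typedef struct {
  atomic_uintptr_t owner; /* 0 == unlocked, otherwise the owning thread */
} self_contained_mutex;

static inline int mutex_try_lock(self_contained_mutex *m, uintptr_t self)
{
  uintptr_t expected = 0;
  /* Uncontested case: one atomic compare-and-swap, no system call. */
  return atomic_compare_exchange_strong(&m->owner, &expected, self);
}

static inline void mutex_unlock(self_contained_mutex *m)
{
  /* Uncontested case: a single atomic store releases the lock.
   * A real implementation would check for waiters here. */
  atomic_store(&m->owner, 0);
}
```

Compare this with the Classic API path, which must map an rtems_id to the internal object representation and evaluate attributes on every operation.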


[...]

As for the implementation, I expect that maximal optimisation is required
for the lock path without contention. This path should be optimised to
be inline or a simple function call inside the application context.
It is a question whether mutexes which can be used in heterogeneous
setups should be considered as well. If such a type is needed, then
there would be a need to call the "system" or another more complex
function. But I think that these uses can be left to the classic
semaphores. Classic mutexes are usually considered a mechanism used
inside a threaded application/subsystem and not to spread beyond a
single address space.

Yes.


My feeling is that the locking case with contention/wait should be
implemented in a way that allows future privilege separation
of the scheduler/system core from the applications, as well as memory
context separation or the use of hypervisor calls for the wait.

So I suggest considering an architecture similar to the Linux futex:

http://www.akkadia.org/drepper/futex.pdf

and using this for the mutex implementation. Maybe even add a field
in the mutex structure for an identifier/RTEMS object ID.
At least for debug builds, it would be great if there were a
well-known TLS variable in the TCB (or elsewhere, in the case of
kernel/user separation) holding a pointer to the chain of mutexes
taken by the specific thread.
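For reference, the mutex from Drepper's paper can be sketched as a three-state lock (0 = unlocked, 1 = locked, 2 = locked with waiters) where only the contended path touches the kernel. Here futex_wait()/futex_wake() are stand-ins for the OS primitives and are stubbed out; only the uncontended path is exercised below.

```c
#include <stdatomic.h>

/* Stubs standing in for the OS futex primitives. */
static void futex_wait(atomic_int *addr, int val) { (void)addr; (void)val; }
static void futex_wake(atomic_int *addr) { (void)addr; }

/* Three-state futex-style mutex after Drepper's "Futexes Are Tricky":
 * 0 = unlocked, 1 = locked, 2 = locked with waiters. */
typedef struct { atomic_int state; } futex_mutex;

static void futex_mutex_lock(futex_mutex *m)
{
  int c = 0;
  /* Fast path: 0 -> 1 with a single compare-and-swap, no syscall. */
  if (atomic_compare_exchange_strong(&m->state, &c, 1))
    return;
  /* Slow path: mark the lock contended (2) and sleep until it is free. */
  if (c != 2)
    c = atomic_exchange(&m->state, 2);
  while (c != 0) {
    futex_wait(&m->state, 2);
    c = atomic_exchange(&m->state, 2);
  }
}

static void futex_mutex_unlock(futex_mutex *m)
{
  /* 1 -> 0 means nobody waited; if the old value was 2, wake a waiter. */
  if (atomic_fetch_sub(&m->state, 1) != 1) {
    atomic_store(&m->state, 0);
    futex_wake(&m->state);
  }
}
```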

For the optimized OpenMP support I use the Linux futex barrier implementation of libgomp and added two futex calls for RTEMS (see the attached file of the first e-mail). The performance is really good. For the mutex and semaphore objects, however, I don't use the futex approach of libgomp. Futexes have excellent properties for average-case systems, e.g. they provide random fairness. RTEMS is supposed to be a real-time operating system, so random fairness is not enough here; instead we need FIFO fairness.
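To make the fairness distinction concrete: a ticket lock is a classic way to get FIFO-fair acquisition (this is an illustration, not a claim about the RTEMS implementation). Each acquirer draws the next ticket, so the lock is granted strictly in arrival order; a blocking variant would park waiters on a FIFO queue instead of spinning.

```c
#include <stdatomic.h>

/* FIFO-fair ticket lock: lock grants follow strict arrival order. */
typedef struct {
  atomic_uint next_ticket; /* drawn by each arriving acquirer */
  atomic_uint now_serving; /* ticket currently allowed to hold the lock */
} ticket_lock;

static void ticket_lock_acquire(ticket_lock *l)
{
  unsigned t = atomic_fetch_add(&l->next_ticket, 1);
  while (atomic_load(&l->now_serving) != t)
    ; /* spin; a blocking variant would sleep on a FIFO wait queue */
}

static void ticket_lock_release(ticket_lock *l)
{
  /* Hand the lock to the next ticket holder in arrival order. */
  atomic_fetch_add(&l->now_serving, 1);
}
```

A futex-based wake, by contrast, may pick any sleeping waiter, which is the random fairness mentioned above.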

But the management of this is not so easy
if mutexes can be released in a different order than they were locked.

We should not allow this. Such lock order reversals are bad.

So it is not a simple singly-linked list. But a mapping of which
mutexes are held by a given thread is required even for priority
inheritance. This structure has to be kept in user-manipulated
data only if we do not want the overhead of a syscall in the future.
But all that is manageable and has been solved for futex-based
OS APIs.

Yes, the current priority inheritance implementation needs to be improved to better support resource nesting. One problem we had in applications recently occurred in the file system. A file system instance like JFFS2 uses a mutex to protect the instance. With this lock held it uses malloc() and performs device operations which may also acquire a mutex. In case a high priority task uses malloc(), this could raise the priority of a task accessing the JFFS2 instance for a very long time, since after the malloc() the priority is not immediately restored (the resource count is not zero).
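The nesting problem can be shown with a tiny model (hypothetical names, a deliberate simplification of the real scheduler): if the inherited priority is only restored when the resource count drops to zero, a boost received while holding an unrelated outer mutex persists until the outer mutex is released too.

```c
/* Simplified model of resource-count-based priority restoration.
 * Lower priority value = higher priority. */
typedef struct {
  int priority;       /* current (possibly boosted) priority */
  int real_priority;  /* original priority */
  int resource_count; /* number of mutexes currently held */
} thread_model;

static void model_lock(thread_model *t) { t->resource_count++; }

static void model_boost(thread_model *t, int prio)
{
  if (prio < t->priority)
    t->priority = prio; /* priority inheritance from a blocked waiter */
}

static void model_unlock(thread_model *t)
{
  /* Naive rule: restore the original priority only when no
   * resources are held at all. */
  if (--t->resource_count == 0)
    t->priority = t->real_priority;
}
```

In the JFFS2 scenario above, the outer file system mutex keeps the resource count above zero, so the boost acquired via the malloc() mutex outlives the malloc() call.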


So generally, I would be very happy if RTEMS were faster,
but I hope that a solution will be found that is viable
in the long term and supports a reasonably broad set of
use-case scenarios. I don't have everything in my head, and I think
this calls for more iterations of the discussion.

For the network stack, OpenMP and SMP in general it's not a question of faster. It's a question of far too slow versus good enough. We should decide if we want to use self-contained objects for the Newlib internal locks and the C11/C++11 thread support in GCC.

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP     : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.

_______________________________________________
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel