Hello Pavel,
thanks for your comments.
On 23/07/15 12:40, Pavel Pisa wrote:
Hello Sebastian,
first of all, a big thanks for the RTEMS architectural updates.
On Thursday 23 of July 2015 11:16:03 Sebastian Huber wrote:
The Classic RTEMS and POSIX APIs have at least three weaknesses.
* Dynamic memory (the workspace) is used to allocate object pools. This
requires a complex configuration with heavy use of the C pre-processor.
* Objects are created via function calls which return an object identifier.
The object operations use this identifier and map it to the internal
object representation.
* The object operations use a rich set of options and attributes. These
parameters must be evaluated and validated on every call to figure out
what to do.
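For illustration, the pattern looks roughly like this (standard Classic
API calls and configuration macros; error handling elided):

#include <rtems.h>

static rtems_id mutex_id;

rtems_task Init(rtems_task_argument arg)
{
  /* Creation returns an identifier; every later operation maps this
     identifier back to the internal object representation and
     re-evaluates the attribute set. */
  rtems_status_code sc = rtems_semaphore_create(
    rtems_build_name('M', 'T', 'X', ' '),
    1,                                 /* initial count */
    RTEMS_BINARY_SEMAPHORE | RTEMS_PRIORITY | RTEMS_INHERIT_PRIORITY,
    0,                                 /* priority ceiling, unused here */
    &mutex_id
  );
  (void) sc;                           /* error handling elided */
  (void) arg;

  rtems_semaphore_obtain(mutex_id, RTEMS_WAIT, RTEMS_NO_TIMEOUT);
  rtems_semaphore_release(mutex_id);

  rtems_task_delete(RTEMS_SELF);
}

/* The object pools behind the identifiers are sized at compile time
   by the pre-processor machinery in <rtems/confdefs.h>. */
#define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER
#define CONFIGURE_MAXIMUM_TASKS 1
#define CONFIGURE_MAXIMUM_SEMAPHORES 1
#define CONFIGURE_RTEMS_INIT_TASKS_TABLE
#define CONFIGURE_INIT
#include <rtems/confdefs.h>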
...
In the long run this could lead to a very small footprint system without
dependencies on dynamic memory and with purely static initialization.
What are your opinions?
I fully understand your motivation; for small footprint systems the
direct use of pointers is the most efficient option.
But in the area of smallest-footprint systems there are many
alternatives to RTEMS - MBED, NuttX, etc.
my goal is to make it smaller compared to what we have now. I don't aim
for the smallest system on the market; that would only be a side-effect.
The main purposes of the self-contained objects are performance and an
easier configuration. I think the conditional compilation in
<rtems/confdefs.h> has reached a problematic complexity.
RTEMS layering (Score, APIs, object identifiers, etc.) is quite
complex and has considerable overhead. On the other hand,
these layers add value: they give RTEMS the option to be
used in more complex scenarios. Some facts to consider:
* if all locking constructs use identifiers, then they
are well traceable. There is a problem with pthreads
in this respect: pthread_create() has no parameter
for a thread identifier/purpose.
* using identifiers and calling system operations with these
identifiers keeps the application API stable even for the case
where the operating system and the applications run in separate
domains/CPU privilege levels/rings. Passing pointers is really problematic
in such use cases. RTEMS does not use memory space separation,
and it is questionable whether the MMU context switch overhead is
appropriate for some systems. But on the other hand there can be
interesting uses where RTEMS is ported to a microkernel and multiple
RTEMS instances run in address-space-separated domains, even with
strict temporal separation (POK, PikeOS). These options
have not been used much in RTEMS yet. But there is one specific
and unique area for RTEMS, and that is the support for asymmetric,
heterogeneous multiprocessing. I am not sure how much this
feature is in actual use today. But RTEMS is unique in this respect,
and the use of task-optimised CPU cores with different architectures
will play an important role in future computing - see all of today's
GPU, APU and FPGA projects.
So my suggestion is to take all these use cases into consideration.
This should not be taken as a hard requirement if it really means
unacceptable overhead for common use cases, but it should be considered.
I don't want to change the existing APIs. The object identifier
infrastructure is fine, but it was designed for a specific purpose, e.g.
to enable platforms that support asymmetric multiprocessing (the RTEMS
MPCI support). With SMP we now see its limitations. A complex SMP
application like the FreeBSD network stack uses hundreds of locks, and
the protected areas of these locks are quite small. The lock/unlock
sequence of an uncontested mutex is absolutely performance critical.
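For contrast, a sketch of what a self-contained mutex could look like
(hypothetical names and layout, not the actual proposed API): the whole
object lives in application memory, can be initialized statically, and
the uncontended path is a single inlined atomic operation with no
identifier lookup:

#include <stdatomic.h>

/* Hypothetical self-contained mutex: no workspace pool, no confdefs
   entry, no identifier-to-object mapping on the hot path. */
typedef struct {
  atomic_uint state; /* 0 = unlocked, 1 = locked */
} sc_mutex;

#define SC_MUTEX_INITIALIZER { ATOMIC_VAR_INIT(0) }

/* Assumed slow path: enqueue the caller in FIFO order and block. */
void sc_mutex_lock_slow(sc_mutex *m);

static inline void sc_mutex_lock(sc_mutex *m)
{
  unsigned expected = 0;

  /* Uncontended path: one compare-and-swap, no status codes, options
     or attributes to evaluate. */
  if (!atomic_compare_exchange_strong_explicit(&m->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed))
    sc_mutex_lock_slow(m);
}

/* Purely static initialization, no create call, no identifier. */
static sc_mutex network_lock = SC_MUTEX_INITIALIZER;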
[...]
As for the implementation, I expect that maximal optimisation is required
for the lock path without contention. This path should be optimised to
be inline or a simple function call inside the application context.
It is a question whether mutexes usable in heterogeneous setups
should be considered as well. If such a type is needed, then there
would be a need to call into the "system" or another more complex function.
But I think that these uses can be left to the Classic semaphores.
Mutexes are usually considered a mechanism used inside a
threaded application/subsystem and not something that spreads beyond
a single address space.
Yes.
My feeling is that the locking case with contention/wait should be
implemented in a way that allows future privilege separation
of the scheduler/system core from the applications, as well as memory
context separation or the use of hypervisor calls for the wait.
So I suggest considering an architecture similar to the Linux futex,
http://www.akkadia.org/drepper/futex.pdf
and using it for the mutex implementation. Maybe even add a field
for an identifier/RTEMS object ID to the mutex structure.
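A minimal sketch of the futex-style mutex from that paper (the
three-state variant; futex_wait() and futex_wake() are assumed
kernel/system primitives here, not existing RTEMS calls):

#include <stdatomic.h>

/* Assumed primitives in the spirit of the Linux futex system call;
   these are not existing RTEMS interfaces. */
void futex_wait(atomic_int *addr, int val); /* sleep while *addr == val */
void futex_wake(atomic_int *addr, int n);   /* wake up to n waiters */

/* State: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
typedef struct {
  atomic_int state;
} futex_mutex;

static void futex_mutex_lock(futex_mutex *m)
{
  int c = 0;

  /* Fast path: the uncontended lock is a single compare-and-swap in
     the application context. */
  if (atomic_compare_exchange_strong(&m->state, &c, 1))
    return;

  /* Slow path: mark the mutex as contended and sleep until we manage
     to take it ourselves (keeping the conservative waiter mark). */
  if (c != 2)
    c = atomic_exchange(&m->state, 2);
  while (c != 0) {
    futex_wait(&m->state, 2);
    c = atomic_exchange(&m->state, 2);
  }
}

static void futex_mutex_unlock(futex_mutex *m)
{
  /* Fast path: no waiter was recorded, just release. */
  if (atomic_exchange(&m->state, 0) == 2)
    futex_wake(&m->state, 1); /* contended: wake one waiter */
}

Note that futex_wake() hands the lock to an arbitrary waiter; this is
the random fairness mentioned below.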
At least for debug builds it would be great if there were a
well-known TLS variable in the TCB (or elsewhere, in the case of
kernel/user separation) holding a pointer to the chain of mutexes
taken by the specific thread.
For the optimized OpenMP support I use the Linux futex barrier
implementation of libgomp and added two futex calls for RTEMS (see the
file attached to the first e-mail). The performance is really good. For
the mutex and semaphore objects, however, I don't use the futex approach
of libgomp. Futexes have excellent properties for average case systems,
e.g. they provide random fairness. RTEMS is supposed to be a real-time
operating system, so random fairness is not enough here; instead we need
FIFO fairness.
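For reference, the interface these two calls provide is small. A sketch
of their use in the style of the libgomp barrier (the prototypes follow
libgomp's internal config/linux/futex.h; the RTEMS-side implementation
from the attached file is assumed, not reproduced):

#include <limits.h>

/* The two operations libgomp expects. */
void futex_wait(int *addr, int val);   /* block while *addr == val */
void futex_wake(int *addr, int count); /* wake up to count waiters */

/* A one-shot release gate, as used inside a barrier. */
struct gate {
  int state; /* 0 = closed, 1 = open */
};

static void gate_wait(struct gate *g)
{
  while (__atomic_load_n(&g->state, __ATOMIC_ACQUIRE) == 0)
    futex_wait(&g->state, 0); /* returns on wake or spuriously */
}

static void gate_open(struct gate *g)
{
  __atomic_store_n(&g->state, 1, __ATOMIC_RELEASE);
  futex_wake(&g->state, INT_MAX); /* wake all waiters */
}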
But the management of this is not so easy
if mutexes can be released in a different order than they were locked.
We should not allow this. Such lock order reversals are bad.
So it is not a simple singly linked list. But a mapping of which
mutexes are held by a given thread is required even for priority
inheritance. This structure has to be kept in user-manipulated
data if we want to avoid the overhead of a syscall in the future.
But all of that is manageable and has been solved for the futex-based
OS APIs.
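A minimal sketch of such bookkeeping (all names hypothetical, none of
them existing RTEMS interfaces): each thread keeps a list of the mutexes
it currently holds, and the release path recomputes the priority from
the mutexes still held:

/* Hypothetical per-thread bookkeeping for priority inheritance. */
struct held_mutex {
  struct thread *owner;
  struct held_mutex *next_held; /* link in the owner's held list */
  int top_waiter_priority;      /* highest priority of any waiter */
};

struct thread {
  int base_priority;       /* priority without inherited boosts */
  int current_priority;    /* possibly boosted priority */
  struct held_mutex *held; /* head of the list of held mutexes */
};

/* On each release, recompute the owner's priority from the mutexes
   still held, instead of restoring it only when the resource count
   drops to zero (the problem in the JFFS2 scenario below). */
static void thread_update_priority(struct thread *t)
{
  int p = t->base_priority;

  for (struct held_mutex *m = t->held; m != NULL; m = m->next_held) {
    /* In RTEMS a lower number means a higher priority. */
    if (m->top_waiter_priority < p)
      p = m->top_waiter_priority;
  }
  t->current_priority = p;
}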
Yes, the current priority inheritance implementation needs to be
improved to better support resource nesting. One problem we had in
applications recently occurred in the file system. A file system
instance like JFFS2 uses a mutex to protect the instance. With this lock
held it uses malloc() and performs device operations which may also
acquire a mutex. If a high priority task uses malloc(), this can raise
the priority of a task accessing JFFS2 for a very long time, since after
the malloc() the priority is not immediately restored (the resource
count is not zero).
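To illustrate the scenario (schematic code; the file system, mutex and
driver interfaces below are placeholders, not the real RTEMS or JFFS2
ones):

#include <stdlib.h>
#include <string.h>

struct mutex;
struct fs_instance {
  struct mutex *lock;
  void *dev;
};
void mutex_lock(struct mutex *m);
void mutex_unlock(struct mutex *m);
void device_write(void *dev, const void *buf, size_t n);

void fs_write(struct fs_instance *fs, const void *buf, size_t n)
{
  mutex_lock(fs->lock);  /* resource count: 1 */

  void *tmp = malloc(n); /* takes and releases the heap mutex:
                            count 2, then back to 1 */
  memcpy(tmp, buf, n);

  /* If a high priority task blocks on the heap mutex during the
     malloc(), this task inherits its priority.  The boost is not
     dropped when the heap mutex is released, because the resource
     count is still non-zero (fs->lock is held), so the inherited
     priority survives the whole, potentially long, device
     operation. */
  device_write(fs->dev, tmp, n);

  free(tmp);
  mutex_unlock(fs->lock); /* count 0: priority finally restored */
}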
So generally, I would be very happy if RTEMS were faster,
but I hope a solution can be found that is viable in the
long term and supports a reasonably broad set of use case
scenarios. I do not have it all in my head, and I think
this calls for more iterations of the discussion.
For the network stack, OpenMP and SMP in general it's not a question of
faster. It's a question of far too slow versus good enough. We should
decide whether we want to use self-contained objects for the
Newlib-internal locks and the C11/C++11 thread support in GCC.
--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.
This message is not a business communication within the meaning of the EHUG.