TLS support on ARM

2009-12-03 Thread Thomas Klein

Hello

It looks to me as though support for Thread Local Storage exists on ARM
CPUs.
When needed, the compiler fetches the thread base pointer through an
internal __builtin_thread_pointer() call.
This becomes either a call to __aeabi_read_tp() or a coprocessor read
instruction.
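
As a minimal sketch of the kind of source that makes the compiler fetch
the thread base pointer: the variable and function names below are made
up for illustration, and __thread is the GCC keyword for thread-local
variables. On an ARM target each access to tls_counter typically needs
the thread pointer first, obtained in one of the two ways described
above.

```c
/* Hypothetical example: the names are illustrative, not taken from GCC. */
static __thread int tls_counter;   /* one instance per thread */

/* Increment this thread's counter and return the new value.
   To locate tls_counter the compiler needs the thread base pointer. */
int bump_tls_counter(void)
{
    tls_counter += 1;
    return tls_counter;
}
```

Compiling such a file for ARM with -S and looking for __aeabi_read_tp or
a coprocessor read in the output should show which variant was chosen.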


If I implement __aeabi_read_tp() as a standard C function, I get into
trouble, because registers r1 to r3 are not saved by the caller beforehand.

This behaviour is documented by a comment in the file arm.md:
"..
;; Doesn't clobber R1-R3.  Must use r0 for the first operand.
(define_insn "load_tp_soft"
..
"
Does anyone know the reason why they are not clobbered?
Is there a way to save r1-r3 at function entry (e.g. with something
like __attribute__((save_noreturn_args)))?


The next point is that the __builtin_thread_pointer() call isn't
ARM/Thumb interwork safe.
To use the "hard" coprocessor read instruction, the calling function
must run in ARM mode.
To use the "soft" implementation, the caller and __aeabi_read_tp() must
run in the same mode.


Is the implementation still incomplete?

regards
Thomas



RE: TLS support on ARM

2009-12-03 Thread Thomas Klein

Hello

> > Does anyone know the reason why they are not clobbered?
>
> So that they don't have to be saved.  This function is supposed to be
> very fast.  If you want to use a slow implementation, write an
> assembly wrapper which saves additional registers.

This might be the initial plan.
But is this true?

Without clobbering registers r1-r3,
the compiler generates something like this:
    ldr  r3, [pc, #48]
    bl   __aeabi_read_tp
    adds r7, r0, r3
    ..
In addition, a push and pop of r1-r3 inside __aeabi_read_tp() might
be required.


With clobbering I see:
    bl   __aeabi_read_tp
    ldr  r3, [pc, #48]
    adds r7, r0, r3
    ..
Here the clobbered version is faster.
Maybe there is another reason not to clobber.

> > The next point is that the __builtin_thread_pointer() call isn't
> > ARM/Thumb interwork safe.
> > To use the "hard" Coprocessor fetch instruction the calling function
> > must run in ARM mode.
>
> True (or Thumb-2, I think).
>
> > To use "soft" implementation caller and __aeabi_read_tp() must run in
> > the same mode.
>
> I don't believe that this is true.  In what way is it not safe?

A "bl __aeabi_read_tp" call does not switch the mode,
so the program simply crashes.
A "blx" instruction does switch the mode,
but this instruction only exists from architecture v5 onwards, so it
does not help on ARMv4T (e.g. ARM7TDMI).


Long calls also seem not to be handled here.
(There might be a reason not to handle them.)

That's why I'm asking.
Is the implementation still incomplete?

regards
Thomas



RE: TLS support on ARM

2009-12-03 Thread Thomas Klein

Hello

> > But is this true?

> It is true because a typical implementation of this function has no
> need to clobber registers.  For instance, glibc's calls a kernel
> helper this way:

Ah, now I understand: you require a virtual memory system (or
similar) that translates the call into a system call.

Without VM I can still use e.g.
    svc #17
    bx  lr
as an __aeabi_read_tp() implementation.
The real work then has to be done inside the exception handler.

You are right, in this case no clobbering is needed.

> The linker is responsible for converting bl to blx, or for inserting
> mode changing stubs.  It is also responsible for long calls.  Unless
> you're using a really old linker, I can't see why you would have any
> problems.
It has been a while since I had problems with long calls.
I did not realise that ld does that for me, great.

> Do you have a concrete problem?
I've had a problem with my __aeabi_read_tp() implementation.
This was "solved" by clobbering registers r1-r3.

Thank you for this information.
Now that I know that this is not a bug, I can decide what to do:
change the compiler or, better, change my implementation.

regards
Thomas



Add static size report when using -fstack-usage for ARM targets

2010-11-21 Thread Thomas Klein

Hello

With GCC 4.6 the new switch -fstack-usage has been added.
Some target architectures already support it.
ARM targets are missing only a few lines of code to support this
feature as well.

Is it possible to add this or something similar?

regards
  Thomas


2010-11-21  Thomas Klein 

* config/arm/arm.c (arm_expand_prologue): Report the static stack
size if -fstack-usage is used.
(thumb1_expand_prologue): Likewise.


Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c	(revision 167002)
+++ gcc/config/arm/arm.c	(working copy)
@@ -15722,6 +15722,13 @@ arm_expand_prologue (void)
}
 }

+  if (flag_stack_usage)
+{
+  HOST_WIDE_INT stack_size = saved_regs;
+  current_function_static_stack_size = stack_size;
+}
+
+
   if (offsets->outgoing_args != offsets->saved_args + saved_regs)
 {
   /* This add can produce multiple insns for a large constant, so we
@@ -15733,6 +15740,12 @@ arm_expand_prologue (void)

   insn = emit_insn (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx,
amount));
+  if (flag_stack_usage)
+{
+   HOST_WIDE_INT stack_size = offsets->outgoing_args - (offsets->saved_args + saved_regs);
+   current_function_static_stack_size += stack_size;
+}
+
   do
{
  last = last ? NEXT_INSN (last) : get_insns ();
@@ -20535,6 +20548,10 @@ thumb1_expand_prologue (void)
stack_pointer_rtx);

   amount = offsets->outgoing_args - offsets->saved_regs;
+  if (flag_stack_usage)
+{
+   current_function_static_stack_size = amount;
+}
   amount -= 4 * thumb1_extra_regs_pushed (offsets, true);
   if (amount)
 {
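
The arithmetic that the arm_expand_prologue hunks perform can be sketched
in plain C. This is only a model under assumptions: the struct and the
function name are hypothetical stand-ins, and only the two offsets the
patch reads are represented.

```c
/* Hypothetical stand-in for the offsets the patch reads; the struct and
   function names are illustrative and do not exist in GCC. */
struct frame_offsets {
    long saved_args;     /* value of offsets->saved_args in the patch */
    long outgoing_args;  /* value of offsets->outgoing_args in the patch */
};

/* Mirror of the patch logic: start with the bytes pushed for saved
   registers, then add the extra stack adjustment if one is emitted. */
long static_stack_size(const struct frame_offsets *offsets, long saved_regs)
{
    long size = saved_regs;

    if (offsets->outgoing_args != offsets->saved_args + saved_regs)
        size += offsets->outgoing_args - (offsets->saved_args + saved_regs);

    return size;
}
```

In this model the result always equals outgoing_args - saved_args. For
example, with saved_args = 0, saved_regs = 16 and outgoing_args = 48 the
reported size is 16 + 32 = 48 bytes.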



Request for clarification on how a contribution to gcc can be made

2010-12-13 Thread Thomas Klein

Hello

It looks to me as though what is described in the online document
is either not correct or is being misinterpreted, at least by me.
It is not clear to me at what point the FSF trusts an individual
(or organization or company), and why it mistrusts an individual by
default.

Is there a way to suggest a code change?
What kind of paperwork is required for small code changes, and what for
huge code changes?
If a potential change is reviewed and accepted by a maintainer, who has
to commit the change, and when is it committed?
(Assuming that the person who asks for a change usually does not
have svn write permission.)

A clarification on the GCC side would reduce frustration for people like me.

Regards
  Thomas


C-family stack check for threads

2011-01-13 Thread Thomas Klein

Hi

I would like to have a stack check for threads that each have only a
small stack space.
(I'm using an ARM Cortex-M3 microcontroller with a stack size of
1 KByte per thread.)

Each thread has its own limit address.
The thread scheduler can then calculate the limit and store this value
in a global variable.
The compiler may generate code to check the stack for overflow at
function entry.

In principle this can be done this way:
 - push registers as usual
 - figure out whether one or two work registers can be used directly,
   without an extra push
 - if not enough registers are found, push the required work registers
   to the stack
 - load the limit address into the first work register
 - load the value stored at the limit address (into the same register)
 - if the function is going to extend the stack (e.g. for local
   variables), load this size value too
   (here the second work register can be used)
 - compare for overflow
 - if an overflow occurs, "call" the stack_failure function
 - pop the work registers that were pushed before
 - continue the function prologue as usual, e.g. extend the stack pointer
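
The comparison step of this sequence can be modelled in C as follows.
This is only a sketch under assumptions: the function name is
hypothetical, a full-descending stack is assumed, and the real check
would of course be emitted as instructions in the prologue rather than
called as a function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the prologue check on a full-descending stack.
   Returns true when the new frame does NOT fit.
   sp         : stack pointer after the usual register pushes
   frame_size : bytes the prologue is about to subtract from sp
   limit      : lowest address the current thread may use */
bool stack_check_fails(uintptr_t sp, uintptr_t frame_size, uintptr_t limit)
{
    if (sp < limit)
        return true;                /* already below the limit */
    return sp - limit < frame_size; /* not enough room left for the frame */
}
```

Comparing the remaining room (sp - limit) against the frame size, rather
than computing sp - frame_size first, avoids an unsigned wrap-around
when the frame is larger than the stack pointer value.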

The ARM target has an option "-mapcs-stack-check", but this is more or
less not working (the implementation is missing).

There are also architecture-independent options that can be used, like
"-fstack-check=generic", "-fstack-limit-symbol=current_stack_limit" or
"-fstack-limit-register=r6".

The generic stack check performs a probe at the end of the function
prologue (e.g. by writing 12K beyond the current stack pointer position).
If this stack space is not available, the probe may generate a fault.
This requires the CPU to have an MPU or an MMU.
For machines with little memory, an additional mechanism should be
available.


The option "-fstack-check" can be extended by the switches "direct" and
"indirect" to emit compare code in the function prologue.
If switch "direct" is given, the address of "-fstack-limit-symbol"
is the limit itself.
If switch "indirect" is given, "-fstack-limit-symbol" is a kind of global
variable that needs to be read before the compare.
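
The difference between the two proposed switches can be sketched in C.
All names below are hypothetical and chosen for illustration only. With
"direct" the limit is the address of the symbol; with "indirect" the
symbol is a variable that the scheduler rewrites on every thread switch.

```c
#include <stdint.h>

/* Hypothetical symbols, for illustration only. */
static char stack_limit_sym[1];    /* "direct": its address is the limit */
static uintptr_t stack_limit_var;  /* "indirect": its value is the limit */

/* "direct": no memory read is needed, the symbol address is the limit. */
uintptr_t limit_direct(void)
{
    return (uintptr_t)stack_limit_sym;
}

/* "indirect": one extra load fetches the current thread's limit. */
uintptr_t limit_indirect(void)
{
    return stack_limit_var;
}

/* What a scheduler might do when it switches to another thread. */
void set_thread_stack_limit(uintptr_t limit)
{
    stack_limit_var = limit;
}
```

The "direct" form suits a single fixed stack, while the "indirect" form
is the one that supports a separate limit per thread.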


I have added a proposal to show how this behaviour can be integrated
for the ARM architecture.


Is there interest in having such a feature on the GCC side?
Is there someone with write permission who is willing to volunteer
for this task?
Is the code still small enough to be acceptable, or is additional
paperwork required first?

The generated code itself will be small,
e.g. when using "-fstack-check=indirect -fstack-limit-symbol=stack_limit_var":
->    push  {r0}
->    ldr   r0, =stack_limit_var
->    ldr   r0, [r0]
->    cmp   sp, r0
->    bhs   1f
->    push  {lr}
->    bl    __thumb_stack_failure  @ stack check
->    .align
->    .ltorg
->1:
->    pop   {r0}
The rest of the implementation overhead is only GCC specific.

Regards
 Thomas Klein

PS
Here are some more implementation hints.
Introduce the new parameters "direct" and "indirect" in gcc/opts.c and
gcc/flag-types.h.

gcc/explow.c, function allocate_dynamic_stack_space:
 - suppress stack probing if parameter "direct" or "indirect" is given,
   or if a stack limit is given
 - do an additional read of the limit value if parameter "indirect" and
   a stack-limit symbol are given

gcc/config/arm/arm.c
 - new function "stack_check_output_function" to write the stack check
   to the assembler file
 - new function "stack_check_work_registers" to find possible work
   registers (only used by the stack check)
 - integration for ARM and Thumb-2 in function arm_expand_prologue
 - integration for Thumb-1 in function thumb1_output_function_prologue

gcc/config/arm/arm.md
 - probe_stack: do not emit code when parameter "direct" or "indirect"
   is given; otherwise emit code as in gcc/explow.c
 - probe_stack_done: dummy to make sure probe_stack insns are not
   optimized away
 - check_stack: if a stack limit and parameter "generic" are given, use
   the limit the same way as in function allocate_dynamic_stack_space
 - stack_check: ARM/Thumb-2 insn whose output is written by function
   stack_check_output_function
 - trap: failure call used in function allocate_dynamic_stack_space



Index: gcc/opts.c
===
--- gcc/opts.c	(revision 168762)
+++ gcc/opts.c	(working copy)
@@ -1616,6 +1616,12 @@ common_handle_option (struct gcc_options *opts,
: STACK_CHECK_STATIC_BUILTIN
  ? STATIC_BUILTIN_STACK_CHECK
  : GENERIC_STACK_CHECK;
+  else if (!strcmp (arg, "indirect"))
+/* This is an other stack checking method.  */
+opts->x_flag_stack_check = INDIRECT_STACK_CHECK;
+  else if (!strcmp (arg, "direct"))
+/* This is an other stack checking method.  */
+opts->