Re: A bug in vrp_meet?

2019-03-20 Thread Richard Biener
On Tue, Mar 19, 2019 at 8:53 PM Jeff Law  wrote:
>
> On 3/6/19 3:05 AM, Richard Biener wrote:
> > On Tue, Mar 5, 2019 at 10:36 PM Jeff Law  wrote:
> >>
> >> On 3/5/19 7:44 AM, Richard Biener wrote:
> >>
> >>> So fixing it properly with also re-optimize_stmt those stmts so we'd CSE
> >>> the MAX_EXPR introduced by folding makes it somewhat ugly.
> >>>
> >>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >>>
> >>> Any ideas how to make it less so?  I can split out making optimize_stmt
> >>> take a gsi * btw, in case that's a more obvious change and it makes the
> >>> patch a little smaller.
> >>>
> >>> Richard.
> >>>
> >>> 2019-03-05  Richard Biener  
> >>>
> >>> PR tree-optimization/89595
> >>> * tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Take
> >>> stmt iterator as reference, take boolean output parameter to
> >>> indicate whether the stmt was removed and thus the iterator
> >>> already advanced.
> >>> (dom_opt_dom_walker::before_dom_children): Re-iterate over
> >>> stmts created by folding.
> >>>
> >>> * gcc.dg/torture/pr89595.c: New testcase.
> >>>
> >>
> >> Well, all the real logic changs are in the before_dom_children method.
> >> The bits in optimize_stmt are trivial enough to effectively ignore.
> >>
> >> I don't see a better way to discover and process statements that are
> >> created in the bowels of fold_stmt.
> >
> > I'm not entirely happy so I created the following alternative which
> > is a bit larger and slower due to the pre-pass clearing the visited flag
> > but is IMHO easier to follow.  I guess there's plenty of TLC opportunity
> > here but then I also hope to retire the VN parts of DOM in favor
> > of the non-iterating RPO-VN code...
> >
> > So - I'd lean to this variant even though it has the extra loop over stmts,
> > would you agree?
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > Richard.
> >
> > 2019-03-06  Richard Biener  
> >
> > PR tree-optimization/89595
> > * tree-ssa-dom.c (dom_opt_dom_walker::optimize_stmt): Take
> > stmt iterator as reference, take boolean output parameter to
> > indicate whether the stmt was removed and thus the iterator
> > already advanced.
> > (dom_opt_dom_walker::before_dom_children): Re-iterate over
> > stmts created by folding.
> >
> > * gcc.dg/torture/pr89595.c: New testcase.
> This one is easier to follow from a logic standpoint.  I don't think the
> gimple_set_visited bits are going to be terribly expensive in general.
>
> Is that flag in a known state for new statements?  I'm guessing it's
> cleared by some structure-sized memset as we create the raw statement?

Yes, it's of course not documented that way but IMHo the only reasonable
state.

> Might be worth clarifying that in the comments in gimple.h.
>
> jeff
>


Re: Indicating function exit points in debug data

2019-03-20 Thread Richard Biener
On Tue, Mar 19, 2019 at 9:38 PM Justin Paston-Cooper
 wrote:
>
> Hello,
>
> In my message https://sourceware.org/ml/gdb/2019-03/msg00042.html to
> the gdb mailing list, I asked whether it would be possible to
> implement a command which breaks at all exit points of the current
> stack frame. This would be very useful for evaluating a function's
> final state before it returns its result, independent of where in its
> definition it returns from.
>
> Tom Tromey suggested in that thread that this would be quite easy on
> gdb's side if gcc indicates exit locations in the DWARF data, for

Did he indicate _how_ to represent this in DWARF?  I suppose the breakpoint
should happen before the local frame is teared down.

> instance in the C case, it would indicate the locations of return
> statements. On a related note, he mentions that the "finish" command
> does not work for inlined functions because the compiler does not emit
> the required information.

While "finish" wants a location at the caller side, after the inlined frame is
teared down.  So they are somewhat distinct.

Can you open two enhancement requests in buzilla?

Thanks,
Richard.

> It would be nice if this new break on exit command worked both for
> inlined functions, and also the case of breaking after tail-recursions
> exit. With a single stone, the "finish" bird above is also killed.

> Would it be feasible to implement such a feature in gcc? If I'm not
> the first person to ask for this, are there any architectural or
> practical reasons as to why it might not be possible to implement? As
> it stands, I don't have the level of familiarity with gcc to come to
> you with a patch, but with guidance I would certainly be interested in
> working on the C/C++ case if that would be useful.
>
> Thanks,
>
> Justin


Re: Indicating function exit points in debug data

2019-03-20 Thread Justin Paston-Cooper
On Wed, 20 Mar 2019 at 08:31, Richard Biener  wrote:
>
> On Tue, Mar 19, 2019 at 9:38 PM Justin Paston-Cooper
>  wrote:
> >
> > Hello,
> >
> > In my message https://sourceware.org/ml/gdb/2019-03/msg00042.html to
> > the gdb mailing list, I asked whether it would be possible to
> > implement a command which breaks at all exit points of the current
> > stack frame. This would be very useful for evaluating a function's
> > final state before it returns its result, independent of where in its
> > definition it returns from.
> >
> > Tom Tromey suggested in that thread that this would be quite easy on
> > gdb's side if gcc indicates exit locations in the DWARF data, for
>
> Did he indicate _how_ to represent this in DWARF?  I suppose the breakpoint
> should happen before the local frame is teared down.

Tom didn't suggest any specific implementation. Simon Marchi in the
same thread said, "I suppose that the list of all function exit points
(regardless of whether it is a jump or real return) is an information
that the compiler could theoretically produce and encode in the DWARF
information?". I had a look at the DWARF5 specification. Section 6.2.2
outlines the registers of the state machine which deals with line
number information. One of those registers is named "epilogue_begin".
Its definition is:

-
A boolean indicating that the current address is one (of possibly
many) where execution should be suspended for a breakpoint just prior
to the exit of a function.
-

Section 6.2.5.2 outlines the line number information state machine's
opcodes. One of them is "DW_LNS_set_epilogue_begin". Its definition
is:

-
The DW_LNS_set_epilogue_begin opcode takes no operands. It sets the
epilogue_begin register to “true”. When a breakpoint is set on the
exit of a function or execution steps over the last executable
statement of a function, it is generally desirable to suspend
execution after completion of the last statement but prior to tearing
down the frame (so that local variables can still be examined).
Debuggers generally cannot properly determine where this point is.
This command allows a compiler to communicate the location(s) to use.
Note that the function to which the epilogue end applies cannot be
directly determined from the line number information alone; it must be
determined in combination with the subroutine information entries of
the compilation (including inlined subroutines). In the case of a
trivial function, both prologue end and epilogue begin may occur at
the same address.
-

Shouldn't this be sufficient for gdb to work backwards and determine
the _beginning_ of any exit statement of a function? I had a look into
gcc's involvement with "epilogue_begin". The pass called
"pass_thread_prologue_and_epilogue" transitively calls a function
defined in function.c called "make_epilogue_seq", which as far as I
understand emits an "epilogue_begin" note. I'm not sure how this pass
ties in with each frontend. Does a frontend need to support this pass
explicitly? If so, is it just not supported by a lot of frontends?
Given that no one on the gdb thread mentioned this pass, is there
another problem which I am not aware of?

>
> > instance in the C case, it would indicate the locations of return
> > statements. On a related note, he mentions that the "finish" command
> > does not work for inlined functions because the compiler does not emit
> > the required information.
>
> While "finish" wants a location at the caller side, after the inlined frame is
> teared down.  So they are somewhat distinct.

Obviously I'm not familiar with gcc/gdb internals, but superficially
my idea was that if 'finish' has access to the information on the exit
points of an inlined function, then it can keep track of which
register/location the "return" value ends up in.

>
> Can you open two enhancement requests in buzilla?

Will do.

>
> Thanks,
> Richard.
>
> > It would be nice if this new break on exit command worked both for
> > inlined functions, and also the case of breaking after tail-recursions
> > exit. With a single stone, the "finish" bird above is also killed.
>
> > Would it be feasible to implement such a feature in gcc? If I'm not
> > the first person to ask for this, are there any architectural or
> > practical reasons as to why it might not be possible to implement? As
> > it stands, I don't have the level of familiarity with gcc to come to
> > you with a patch, but with guidance I would certainly be interested in
> > working on the C/C++ case if that would be useful.
> >
> > Thanks,
> >
> > Justin


RISC-V sibcall optimization with save-restore

2019-03-20 Thread Paulo Matos
Hi,

I am working on trying to get RISC-V 32 emitting sibcalls even in the
present of `-msave-restore`, for a client concerned with generated code
size.

Take a look at what current gcc generates for:
  int __attribute__ ((noinline))
  bar ()
  {
return 3;
  }

  int  __attribute__ ((noinline))
  foo ()
  {
return bar ();
  }

  int
  main ()
  {
return foo ();
  }

$ install-riscv/bin/riscv32-unknown-elf-gcc -msave-restore -Os -o- -S test.c
.file   "test.c"
.option nopic
.attribute arch, "rv32i2p0_m2p0_c2p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
.align  2
.globl  bar
.type   bar, @function
bar:
li  a0,3
ret
.size   bar, .-bar
.align  2
.globl  foo
.type   foo, @function
foo:
callt0,__riscv_save_0
callbar
tail__riscv_restore_0
.size   foo, .-foo
.section.text.startup,"ax",@progbits
.align  2
.globl  main
.type   main, @function
main:
callt0,__riscv_save_0
callfoo
tail__riscv_restore_0
.size   main, .-main
.ident  "GCC: (GNU) 9.0.1 20190315 (experimental)"

This could clearly be:

bar:
li  a0,3
ret
.size   bar, .-bar
.align  2
.globl  foo
.type   foo, @function
foo:
tailbar
.size   foo, .-foo
.section.text.startup,"ax",@progbits
.align  2
.globl  main
.type   main, @function
main:
tailfoo
.size   main, .-main


which is what I obtain if I remove the lines 4615-4616 from riscv.c:
4610 static bool
4611 riscv_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
4612tree exp ATTRIBUTE_UNUSED)
4613 {
4614   /* Don't use sibcalls when use save-restore routine.  */
4615   if (TARGET_SAVE_RESTORE)
4616 return false;

This causes, of course, test save-restore-1.c to break because in the
case of the test, we need to save/restore s0 and therefore require the
prologue and epilogue to do its job properly.

AFAICT in cases where s0-s11 do not need to be saved, we could
potentially sibcalls in the presence of save-restore.

The problem I am facing while attempting to implement this is that the
decision to do a sibcall is done at expand time, and I only get register
liveness information much later which means I can't decide to sibcall or
not depending on which registers need to be saved.

I attempted then to always enable sibcall (by removing lines 4615-4616
above) and fixup the epilogue in `riscv_expand_epilogue' to enable a
libcall generation even if sibcall was requested.

The start of `riscv_expand_epilogue' would look like:

void
riscv_expand_epilogue (int style)
{
  struct riscv_frame_info *frame = &cfun->machine->frame;
  unsigned mask = frame->mask;
  HOST_WIDE_INT step1 = frame->total_size;
  HOST_WIDE_INT step2 = 0;
  bool use_restore_libcall = riscv_use_save_libcall (frame);
  rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
  rtx insn;

  if (use_restore_libcall)
style = NORMAL_RETURN;
...


So, use_restore_libcall does not depend on style but only if we should
use save_libcall and then if use_restore_libcall then we move the style
to NORMAL_RETURN.

I thought I was on the right path until I noticed that the CFG is messed
up because of assumptions related to emission of sibcall instead of a
libcall until the epilogue is expanded. During the pro_and_epilogue pass
I get an emergency dump and a segfault:
gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c:11:1: error: in
basic block 2:
gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c:11:1: error: flow
control insn inside a basic block
(jump_insn 24 23 6 2 (parallel [
(return)
(use (reg:SI 1 ra))
(const_int 0 [0])
]) "gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c":11:1 -1
 (nil))
during RTL pass: pro_and_epilogue
dump file: save-restore-1.c.285r.pro_and_epilogue
gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c:11:1: internal
compiler error: in rtl_verify_bb_insns, at cfgrtl.c:2705
0xf86661 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
gcc/gcc/rtl-error.c:108
0x9f0e7a rtl_verify_bb_insns
gcc/gcc/cfgrtl.c:2705
0x9f117e rtl_verify_flow_info_1
gcc/gcc/cfgrtl.c:2791
0x9f19b2 rtl_verify_flow_info
gcc/gcc/cfgrtl.c:3034
0x9d6e7c verify_flow_info()
gcc/gcc/cfghooks.c:263
0xed72e2 execute_function_todo
gcc/gcc/passes.c:1989
0xed626f do_per_function
gcc/gcc/passes.c:1638
0xed7467 execute_todo
gcc/gcc/passes.c:2031
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


Is there a way around this issue and allow the libcall to be emitted,
ev

Re: GCC turns &~ into | due to undefined bit-shift without warning

2019-03-20 Thread Moritz Strübe
Hey.

Am 11.03.2019 um 12:17 schrieb Jakub Jelinek:

On Mon, Mar 11, 2019 at 11:06:37AM +, Moritz Strübe wrote:


On 11.03.2019 at 10:14 Jakub Jelinek wrote:


You could build with -fsanitize=undefined, that would tell you at runtime you
have undefined behavior in your code (if the SingleDiff has bit ever 0x20
set).


Yes, that helps. Unfortunately I'm on an embedded system, thus the code
size increase is just too big.


You can -fsanitize-undefined-trap-on-error, which doesn't increase size too
much, it is less user-friendly, but still should catch the UB.



Ok, I played around a bit. Interestingly, if I set -fsanitize=udefined and 
-fsanitize-undefined-trap-on-error the compiler detects that it will always 
trap, and optimizes the code accordingly (the code after the trap is removed).* 
 Which kind of brings me to David's argument: Shouldn't the compiler warn if 
there is undefined behavior it certainly knows of?
I do assume though that fsanitize just injects the test-code everywhere and 
relies on the compiler to remove it at unnecessary places. Would be nice, 
though. :)

Cheers
Morty

*After fixing the code, it got too big to fit.


--
Redheads Ltd. Softwaredienstleistungen
Schillerstr. 14
90409 Nürnberg

Telefon: +49 (0)911 180778-50
E-Mail: moritz.stru...@redheads.de | Web: 
www.redheads.de

Geschäftsführer: Andreas Hanke
Sitz der Gesellschaft: Lauf
Amtsgericht Nürnberg HRB 22681
Ust-ID: DE 249436843



Re: GCC turns &~ into | due to undefined bit-shift without warning

2019-03-20 Thread Christophe Lyon

On 20/03/2019 15:08, Moritz Strübe wrote:

Hey.

Am 11.03.2019 um 12:17 schrieb Jakub Jelinek:

On Mon, Mar 11, 2019 at 11:06:37AM +, Moritz Strübe wrote:


On 11.03.2019 at 10:14 Jakub Jelinek wrote:


You could build with -fsanitize=undefined, that would tell you at runtime you
have undefined behavior in your code (if the SingleDiff has bit ever 0x20
set).


Yes, that helps. Unfortunately I'm on an embedded system, thus the code
size increase is just too big.


You can -fsanitize-undefined-trap-on-error, which doesn't increase size too
much, it is less user-friendly, but still should catch the UB.




Wouldn't this fail to link? I thought the sanitizers need some runtime 
libraries which are only available under linux/macos/android. What do you mean 
by embedded? Isn't it arm-eabi?



Ok, I played around a bit. Interestingly, if I set -fsanitize=udefined and 
-fsanitize-undefined-trap-on-error the compiler detects that it will always 
trap, and optimizes the code accordingly (the code after the trap is removed).* 
 Which kind of brings me to David's argument: Shouldn't the compiler warn if 
there is undefined behavior it certainly knows of?
I do assume though that fsanitize just injects the test-code everywhere and 
relies on the compiler to remove it at unnecessary places. Would be nice, 
though. :)



Could you confirm in which version of the ST libraries you noticed this bug?
I'm told it was fixed on 23-march-2018.

Thanks,

Christophe



Cheers
Morty

*After fixing the code, it got too big to fit.


--
Redheads Ltd. Softwaredienstleistungen
Schillerstr. 14
90409 Nürnberg

Telefon: +49 (0)911 180778-50
E-Mail: moritz.stru...@redheads.de | Web: 
www.redheads.de

Geschäftsführer: Andreas Hanke
Sitz der Gesellschaft: Lauf
Amtsgericht Nürnberg HRB 22681
Ust-ID: DE 249436843





Re: GCC turns &~ into | due to undefined bit-shift without warning

2019-03-20 Thread Moritz Strübe
Hey.

Am 20.03.2019 um 15:26 schrieb Christophe Lyon:

You can -fsanitize-undefined-trap-on-error, which doesn't increase size too
much, it is less user-friendly, but still should catch the UB.



Wouldn't this fail to link? I thought the sanitizers need some runtime 
libraries which are only available under linux/macos/android. What do you mean 
by embedded? Isn't it arm-eabi?


Nope. It inserts a trap, triggering a hard fault (as the manual says). Works 
just fine.

Moritz

--
Redheads Ltd. Softwaredienstleistungen
Schillerstr. 14
90409 Nürnberg

Telefon: +49 (0)911 180778-50
E-Mail: moritz.stru...@redheads.de | Web: 
www.redheads.de

Geschäftsführer: Andreas Hanke
Sitz der Gesellschaft: Lauf
Amtsgericht Nürnberg HRB 22681
Ust-ID: DE 249436843



Re: GCC turns &~ into | due to undefined bit-shift without warning

2019-03-20 Thread Jakub Jelinek
On Wed, Mar 20, 2019 at 02:08:09PM +, Moritz Strübe wrote:
> Ok, I played around a bit. Interestingly, if I set -fsanitize=udefined and 
> -fsanitize-undefined-trap-on-error the compiler detects that it will always 
> trap, and optimizes the code accordingly (the code after the trap is 
> removed).*  Which kind of brings me to David's argument: Shouldn't the 
> compiler warn if there is undefined behavior it certainly knows of?

What does it mean certainly knows of?
The sanitization inserts (conditional) traps for all the constructs
that it sanitizes, you certainly don't want warning for that.
Even if the compiler can simplify or optimize out some of the guarding
conditionals around the traps, that doesn't mean it isn't in dead code that
will never be executed.
The only safe warning might be if the compiler can prove that whenever main
is called, there will be a trap executed later on, but that is not the case
in most programs, as one can't prove for most functions they actually never
loop and always return to the caller instead of say exiting, aborting, etc.
(and even if main traps immediately, one could have work done in
constructors and exit from there).  Otherwise, would you like to warn if
there is unconditional trap in some function?  That function could not be
ever called, or it could make some function calls before the trap that would
never return (exit, abort, throw exception, infinite loop).

Jakub


Re: GCC turns &~ into | due to undefined bit-shift without warning

2019-03-20 Thread Andrew Haley
On 3/20/19 2:08 PM, Moritz Strübe wrote:
> 
> Ok, I played around a bit. Interestingly, if I set
> -fsanitize=udefined and -fsanitize-undefined-trap-on-error the
> compiler detects that it will always trap, and optimizes the code
> accordingly (the code after the trap is removed).* Which kind of
> brings me to David's argument: Shouldn't the compiler warn if there
> is undefined behavior it certainly knows of?

Maybe an example would help.

Consider this code:

for (int i = start; i < limit; i++) {
  foo(i * 5);
}

Should GCC be entitled to turn it into

int limit_tmp = i * 5;
for (int i = start * 5; i < limit_tmp; i += 5) {
  foo(i);
}

If you answered "Yes, GCC should be allowed to do this", would you
want a warning? And how many such warnings might there be in a typical
program?

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671


Re: Indicating function exit points in debug data

2019-03-20 Thread Segher Boessenkool
On Wed, Mar 20, 2019 at 10:13:23AM +, Justin Paston-Cooper wrote:
> Section 6.2.5.2 outlines the line number information state machine's
> opcodes. One of them is "DW_LNS_set_epilogue_begin". Its definition
> is:
> 
> -
> The DW_LNS_set_epilogue_begin opcode takes no operands. It sets the
> epilogue_begin register to “true”. When a breakpoint is set on the
> exit of a function or execution steps over the last executable
> statement of a function, it is generally desirable to suspend
> execution after completion of the last statement but prior to tearing
> down the frame (so that local variables can still be examined).
> Debuggers generally cannot properly determine where this point is.
> This command allows a compiler to communicate the location(s) to use.
> Note that the function to which the epilogue end applies cannot be
> directly determined from the line number information alone; it must be
> determined in combination with the subroutine information entries of
> the compilation (including inlined subroutines). In the case of a
> trivial function, both prologue end and epilogue begin may occur at
> the same address.
> -

How should this work with shrink-wrapping?  The whole point of that is
you do not tear down the frame after all other code, etc.  I don't see
how we can do better than putting this DW_LNS_set_epilogue_begin right
before the actual return -- and that is after all the tear down etc.

> I had a look into
> gcc's involvement with "epilogue_begin". The pass called
> "pass_thread_prologue_and_epilogue" transitively calls a function
> defined in function.c called "make_epilogue_seq", which as far as I
> understand emits an "epilogue_begin" note. I'm not sure how this pass
> ties in with each frontend. Does a frontend need to support this pass
> explicitly?

Nope.  This pass is very late in the backend, after register allocation
and everything.  A frontend will just make functions "return", and that's
all it has to do.


Segher


Re: Indicating function exit points in debug data

2019-03-20 Thread Tom Tromey
> "Segher" == Segher Boessenkool  writes:

>> Section 6.2.5.2 outlines the line number information state machine's
>> opcodes. One of them is "DW_LNS_set_epilogue_begin". Its definition
>> is:

Segher> How should this work with shrink-wrapping?  The whole point of that is
Segher> you do not tear down the frame after all other code, etc.  I don't see
Segher> how we can do better than putting this DW_LNS_set_epilogue_begin right
Segher> before the actual return -- and that is after all the tear down etc.

I think it's fine if the epilogue marker is inexact or missing from
optimized code, because (1) that's the current state, and (2) it doesn't
really make sense to talk about an epilogue in some cases.

Similarly, IMO it is fine not to worry about non-local exits.  You can
already catch exceptions and examine them in gdb -- the epilogue marker
feature is mostly to address the unmet need of wanting to set a
breakpoint at the end of a function.

Ideally, in -O0 / -Og code, the marker would be reliable where it
appears.

It would be great if there was a way to indicate the location of the
value-to-be-returned in the DWARF.  That way gdb could extract it at the
epilogue point.  AFAIK this would require a DWARF extension.

thanks,
Tom


Re: RISC-V sibcall optimization with save-restore

2019-03-20 Thread Jim Wilson

On 3/20/19 5:25 AM, Paulo Matos wrote:

I am working on trying to get RISC-V 32 emitting sibcalls even in the
present of `-msave-restore`, for a client concerned with generated code
size.


This won't work unless you define a new set of restore functions.  The 
current ones restore the return address from the stack and return, which 
is wrong if you want to do a sibcall.  This is why we tail call (jump 
to) the restore functions, because the actual function return is in the 
restore functions.  You will need a new set of restore functions that 
restore regs without restoring the ra.  You then probably also then need 
other cascading changes to make this work.


The new set of restore functions will then increase code size a bit 
offsetting the gain you get from using them.  You would have to have 
enough sibling calls that can use -msave-restore to make this 
worthwhile.  It isn't clear if this would be a win or not.



I thought I was on the right path until I noticed that the CFG is messed
up because of assumptions related to emission of sibcall instead of a
libcall until the epilogue is expanded. During the pro_and_epilogue pass
I get an emergency dump and a segfault:
gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c:11:1: error: in
basic block 2:
gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c:11:1: error: flow
control insn inside a basic block
(jump_insn 24 23 6 2 (parallel [
 (return)
 (use (reg:SI 1 ra))
 (const_int 0 [0])
 ]) "gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c":11:1 -1
  (nil))


If you look at the epilogue code, you will see that it emits a regular 
instruction which hides the call to the restore routine, and then it 
emits a special fake return insn that doesn't do anything.  You can just 
stop emitting the special fake return insn in this case.  This of course 
assumes that you have a new set of restore functions that actually 
return the caller, instead of the caller's parent.


One of the issues with -msave-restore is that the limited offset ranges 
of calls and branches means that if you don't have a tiny program then 
each save/restore call/jump is probably an auipc/lui plus the call/tail, 
which limits the code size reduction you get from using it.  If you can 
control where the -msave-restore routines are placed in memory, then 
putting them near address 0, or near the global pointer address, will 
allow linker relaxation to optimize these calls/jumps to a single 
instruction.  This will probably help more than trying to get it to work 
with sibling calls.


If you can modify the hardware, you might try adding load/store multiple 
instructions and using that instead of the -msave-restore option.  I 
don't know if anyone has tried this yet, but it would be an interesting 
experiment that might result in smaller code size.


Jim