On 14/01/2019 23:33, Chris Johns wrote:
On 14/1/19 8:22 pm, Sebastian Huber wrote:
while testing the event recording with the libbsd I noticed a GNU ld --wrap
limitation:
https://www.sourceware.org/ml/binutils/2018-12/msg00210.html
I have been watching the thread. There is a limit to what binutils or any method
can do as compiler technology improves.
For example, we currently build RTEMS with a single C file on the command line.
I wonder what RTEMS's score would look like if all C files were passed to the
compiler at once and it could optimise over all of them as if they were included
in a single source file. A number of externals we currently have would not be
visible and traceable using this method.
If I use -flto in my simple test case, then the wrapping via LD doesn't
work at all.
It turned out that the wrapping doesn't work for references internal to a
translation unit.
The reach of this issue is changing with the push to better optimise the
generated code. If the compiler can remove an external call or optimise it into
an internal reference, it will.
This is not a compiler optimization issue. The wrapping doesn't work
with -O0 for all references internal to the translation unit. For example:
cat f.c
#include "f.h"
#include <stdio.h>
void h(void)
{
    puts(__PRETTY_FUNCTION__);
}
func f(void)
{
    h();
    puts(__PRETTY_FUNCTION__);
    return g;
}
cat f.s
        .file "f.c"
        .text
        .globl h
        .type h, @function
h:
.LFB0:
        .cfi_startproc
        pushq %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq %rsp, %rbp
        .cfi_def_cfa_register 6
        movl $__PRETTY_FUNCTION__.2272, %edi
        call puts
        nop
        popq %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size h, .-h
        .globl f
        .type f, @function
f:
.LFB1:
        .cfi_startproc
        pushq %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq %rsp, %rbp
        .cfi_def_cfa_register 6
        call h
        movl $__PRETTY_FUNCTION__.2276, %edi
        call puts
        movl $g, %eax
        popq %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE1:
        .size f, .-f
        .section .rodata
        .type __PRETTY_FUNCTION__.2272, @object
        .size __PRETTY_FUNCTION__.2272, 2
__PRETTY_FUNCTION__.2272:
        .string "h"
        .type __PRETTY_FUNCTION__.2276, @object
        .size __PRETTY_FUNCTION__.2276, 2
__PRETTY_FUNCTION__.2276:
        .string "f"
        .ident "GCC: (SUSE Linux) 7.4.0"
        .section .note.GNU-stack,"",@progbits
You see "call h" and "call puts". The h() function is defined in the
translation unit. This call is not wrapped.
My hope was that the RTEMS Trace Linker doesn't have this
limitation, but the documentation says (user manual):
"The trace linker’s major role is to wrap functions in the existing executable
with trace code. The directions on how to wrap application functions is provided
by the generator configuration. The wrapping function uses a GNU linker option
called --wrap=symbol."
https://devel.rtems.org/wiki/Developer/Tracing/Trace_Linker#Limitation
... highlights the need for an external reference.
It says
"Functions must have external linkage to allow the linker to wrap the
symbol."
this is not the same as
"highlights the need for an external reference"
You need an undefined reference to a symbol. References inside a
translation unit are apparently not undefined references.
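
The limitation can be reproduced in a single file. The following is a minimal
sketch, assuming GCC and GNU ld. The counters and the helper
check_internal_call_not_wrapped() are made up for illustration and are not part
of the original test case. The wrapper deliberately does not forward to
__real_h(), so the file links with or without -Wl,--wrap=h.

```c
#include <assert.h>
#include <stdio.h>

static int h_calls;    /* how often the real h() ran (illustrative counter) */
static int wrap_calls; /* how often the wrapper ran (illustrative counter) */

void h(void)
{
    ++h_calls;
    puts(__func__);
}

/*
 * With "ld --wrap=h", undefined references to h are resolved to this symbol.
 * A complete wrapper would forward to __real_h(); that is left out here so
 * that the file also links without the option.
 */
void __wrap_h(void)
{
    ++wrap_calls;
    puts(__func__);
}

void f(void)
{
    h(); /* h is defined in this translation unit: not an undefined reference */
    puts(__func__);
}

/* Hypothetical helper: calls f() once and checks the wrapper was bypassed. */
void check_internal_call_not_wrapped(void)
{
    int before = wrap_calls;

    f();
    assert(wrap_calls == before);
}
```

Calling check_internal_call_not_wrapped() from main() should pass both in a
plain build and in one linked with -Wl,--wrap=h, since the call in f() refers to
a symbol defined in the same object file. Moving h() into its own source file
would turn that call into an undefined reference, which the linker can then
redirect.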
In the libbsd a lot of things are done through function pointer assignments,
e.g.
static struct netisr_handler ip_nh = {
    .nh_name = "ip",
    .nh_handler = ip_input,
    .nh_proto = NETISR_IP,
#ifdef RSS
    .nh_m2cpuid = rss_soft_m2cpuid_v4,
    .nh_policy = NETISR_POLICY_CPU,
    .nh_dispatch = NETISR_DISPATCH_HYBRID,
#else
    .nh_policy = NETISR_POLICY_FLOW,
#endif
};
or
/*
 * Perform common duties while attaching to interface list
 */
void
ether_ifattach(struct ifnet *ifp, const u_int8_t *lla)
{
    int i;
    struct ifaddr *ifa;
    struct sockaddr_dl *sdl;

    ifp->if_addrlen = ETHER_ADDR_LEN;
    ifp->if_hdrlen = ETHER_HDR_LEN;
    if_attach(ifp);
    ifp->if_mtu = ETHERMTU;
    ifp->if_output = ether_output;
    ifp->if_input = ether_input;
This makes the tracing quite ineffective in this area.
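
Since these handlers are reached through pointers, one alternative to linker
wrapping would be to swap the pointer at run time. This is a different technique
from --wrap and only a rough sketch: struct ifnet_sketch, ether_input_sketch(),
ether_ifattach_sketch(), traced_if_input() and trace_hook_if_input() are all
hypothetical names with simplified signatures, not libbsd APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ifnet; only what the sketch needs. */
struct ifnet_sketch {
    void (*if_input)(struct ifnet_sketch *ifp, int len);
    void (*orig_if_input)(struct ifnet_sketch *ifp, int len);
    unsigned long input_events; /* number of traced calls */
    unsigned long input_bytes;  /* bytes seen by the real handler */
};

/* Plays the role of ether_input(); the signature is simplified. */
static void ether_input_sketch(struct ifnet_sketch *ifp, int len)
{
    ifp->input_bytes += (unsigned long)len;
}

/*
 * Plays the role of ether_ifattach(): the handler address is taken inside
 * the translation unit that defines it, so no undefined reference exists
 * for the linker to wrap.
 */
void ether_ifattach_sketch(struct ifnet_sketch *ifp)
{
    ifp->if_input = ether_input_sketch;
}

/* Trace shim: record the event, then forward to the saved handler. */
static void traced_if_input(struct ifnet_sketch *ifp, int len)
{
    ifp->input_events++;
    ifp->orig_if_input(ifp, len);
}

/* Install the shim once; a second call leaves the hook unchanged. */
void trace_hook_if_input(struct ifnet_sketch *ifp)
{
    assert(ifp != NULL && ifp->if_input != NULL);

    if (ifp->if_input == traced_if_input) {
        return;
    }

    ifp->orig_if_input = ifp->if_input;
    ifp->if_input = traced_if_input;
}
```

The hook has to run after the attach step and needs access to each interface
structure, so it costs more effort than a single linker option and does not
cover direct calls.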
I suspect the compiler is using a local offset to the code in the file. There
are other cases, for example C++.
I have recently been considering the role libdl can play in hooking trace code
and the effect of deferring the ability to wrap to the target. I suspect it
would not resolve the problem you face because there are no reloc records to the
internal offsets being used, but I have not checked. If the DWARF info holds call
or block data, maybe hot patching the code might be possible. This would need
host processing to extract the hot patch data.
I consider the wrap method of tracing to be a low cost portable API tracer that
is useful for things like malloc/free. It is not like a hardware trace device
that can see everything, so I consider there exists a cost/functionality curve.
It took me a while to figure out why the wrapping of ether_input() and
ether_output() didn't work. I tried to improve the LD documentation a
bit as a result.
--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.
This message is not a business communication within the meaning of the EHUG.