[Bug gcov-profile/119719] New: Suitability of gcov for very resource-constrained systems

dmalcolm at gcc dot gnu.org via Gcc-bugs Thu, 10 Apr 2025 16:06:27 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119719


            Bug ID: 119719
           Summary: Suitability of gcov for very resource-constrained
                    systems
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: gcov-profile
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmalcolm at gcc dot gnu.org
  Target Milestone: ---

Quoting Reddit user duane11583 about coverage testing on embedded systems:
  https://www.reddit.com/r/C_Programming/comments/1jvxvfk/comment/mmeigtn/
"""
i have 32k bytes of ram i have no file system

there are far better ways to do this in an embedded system that are easier

for example (getting technical) at every test / decision point call a simple
function

that function SHOULD NOT FOLLOW THE NORMAL CALLING CONVENTION!

in the default case it should be just a return instruction

in the normal / active operational case it is a platform-defined function
probably written in asm and custom to the board being used. it must preserve
and restore all registers and flags - it effectively acts like an interrupt
or a trap instruction or breakpoint

in my case my asm code would read the program counter off the stack and write
it to a 20mhz or 50mhz spi device and return. this would be very hard coded in
asm for just that one board. i have code and ram space to do that (about 128
bytes of code and ram total on an embedded device) plus 4-8 bytes of code at
the call site very small!

gcov is utterly and monstrously huge by comparison.

the point is every embedded environment is horribly resource constrained

and i need gcov inside a driver during an interrupt with a realtime system! i
cannot run this under a mocked simulation.

on a cortexM type chip i might use the serial wire viewer on a larger A type i
might use the STM module if it is present or i use some other high speed thing
say can bus or hdlc if i have that on the board

that code is sort of like the old Call a function at the start of every
function to do stack check

in contrast: gcov requires 20-64 bytes of ram per call site plus a larger code
footprint i do not have that! ie: the function inserts itself as an abi call
so when you have gcov active it changes the generated code - i need it to not
affect the generated opcodes like that.

externally i would have some device that captures 32bit packets and saves it to
some big ram buffer

examples could be say a raspberry pi with a 50mhz slave spi and a dma to
transfer the data to a huge ddr ram buffer. better: a pynq board or zed board
with a little fpga helper module that captures the high speed 32bit bursts to a
ddr buffer

the point is there are often 4-5 unused pins on an embedded device that can be
reconfigured.

the next step is externally i can create call counts, i can convert PC to
address and source line, and start to draw a score board back to the source
code

another problem is code space so i might need to have one embedded app to test
the http module - so that is one capture - then another app to test the data
processing module each with their own set of data captures.

externally i could combine this data maybe create a web server that gives me
percent coverage and when i visit the web page covered lines are color coded

i can also scan the resulting elf and find all references to that CALL and
track which ones were and were not called by looking for that program counter
value in the data stream

if i change the capture module (using a pynq board with fpga) i could capture a
time stamp and create profiling time line too.

but right now i cannot do that with the embedded devices

and my customer requirement is 100% coverage for all things period there is no
exception. and partial coverage is not going to expose hardware in the loop
conditions.
"""

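The host-side steps described in the comment above (find every call site of
the hook in the ELF, then look for each site's program counter in the captured
stream) could be sketched roughly as follows. This is only an illustration
under assumptions that are not in the original comment: the `struct site`
layout and the `tally_coverage` name are hypothetical, and the capture is
assumed to have already been decoded into an array of 32-bit PC values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One instrumented call site, as recovered from the ELF (hypothetical layout). */
struct site {
    uint32_t pc;    /* program counter value the hook would report */
    uint32_t hits;  /* how many times it appears in the capture */
};

/* Order sites by PC so they can be sorted and binary-searched. */
static int cmp_site(const void *a, const void *b)
{
    uint32_t x = ((const struct site *)a)->pc;
    uint32_t y = ((const struct site *)b)->pc;
    return (x > y) - (x < y);
}

/* Tally a captured stream of PC values against the known call sites.
   Sorts `sites` by PC as a side effect, ignores PCs that match no site,
   and returns the number of sites hit at least once. */
size_t tally_coverage(struct site *sites, size_t nsites,
                      const uint32_t *trace, size_t ntrace)
{
    if (nsites == 0)
        return 0;
    assert(sites != NULL);

    qsort(sites, nsites, sizeof *sites, cmp_site);

    for (size_t i = 0; i < ntrace; i++) {
        struct site key = { trace[i], 0 };
        struct site *s = bsearch(&key, sites, nsites, sizeof *sites, cmp_site);
        if (s != NULL)
            s->hits++;
    }

    size_t covered = 0;
    for (size_t i = 0; i < nsites; i++)
        covered += (sites[i].hits > 0);
    return covered;
}
```

Dividing the returned count by `nsites` gives a percent-coverage figure, and
the per-site `hits` values are the call counts; mapping each `pc` back to a
source line is left to an external tool.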

A second comment in the same thread:
  https://www.reddit.com/r/C_Programming/comments/1jvxvfk/comment/mmer6eq/
"""
another example:

in a linux kernel module you want to speed up. or perhaps some section of the
kernel you want to improve

by default in libgcc (or similar) you have a weak definition of _profile_true
and _profile_false in fact they can be the same return instruction because it
does nothing

when you want to profile / measure coverage of a module (or section of the
kernel) you compile it with a special flag and link with a platform library
with alternate definitions (ie board/chip specific library coded in asm)

that library on initialization would have allocated a few large buffers
(pages), pre-allocated and ready to go.

on the call it captures the program counter (and maybe the value of a high
precision performance counter) and saves both to the buffer.

when full it schedules a "save page/buffer operation" and switches to the next
pre allocated buffer. the point is that is an extra fast process - the code is
tight self contained and because it is small most of it would live in a few
cache lines.

the win: a high speed time accurate execution trace. this can be used to wiggle
out performance issues with drivers at speed with minimal impact.

in some ways what i am describing is a timer based sampling profile but far
more accurate
"""

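The buffer-switching scheme in the comment above could look roughly like the
following in portable C. This is a sketch under stated assumptions, not
anything gcov or libgcc provides: the names `trace_state`, `trace_init`,
`trace_record` and `trace_saved` are hypothetical, the buffers are tiny
rather than page-sized, and the code is neither interrupt-safe nor SMP-safe,
whereas the comment envisages board-specific asm behind the _profile_true /
_profile_false hooks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_ENTRIES 4  /* tiny for illustration; the comment suggests pages */

/* One captured event: where we were and when (hypothetical layout). */
struct trace_rec {
    uintptr_t pc;
    uint64_t  stamp;
};

struct trace_state {
    struct trace_rec buf[2][TRACE_ENTRIES];
    size_t   fill;     /* records written to the active buffer */
    int      active;   /* index of the buffer currently being written */
    int      pending;  /* index of a full buffer awaiting save, or -1 */
    unsigned dropped;  /* records lost because both buffers were full */
};

void trace_init(struct trace_state *t)
{
    assert(t != NULL);
    t->fill = 0;
    t->active = 0;
    t->pending = -1;
    t->dropped = 0;
}

/* Hand the full active buffer to the saver and start on the other one. */
static void trace_switch(struct trace_state *t)
{
    t->pending = t->active;
    t->active ^= 1;
    t->fill = 0;
}

/* Record one event. When the active buffer fills, it is marked pending
   (standing in for "schedule a save operation") and writing moves to the
   other buffer. If that one is still pending, the record is dropped. */
void trace_record(struct trace_state *t, uintptr_t pc, uint64_t stamp)
{
    assert(t != NULL);
    if (t->fill == TRACE_ENTRIES) {
        if (t->pending >= 0) {
            t->dropped++;
            return;
        }
        trace_switch(t);
    }
    t->buf[t->active][t->fill].pc = pc;
    t->buf[t->active][t->fill].stamp = stamp;
    t->fill++;
    if (t->fill == TRACE_ENTRIES && t->pending < 0)
        trace_switch(t);
}

/* Called by the saver once the pending buffer has been written out. */
void trace_saved(struct trace_state *t)
{
    assert(t != NULL);
    t->pending = -1;
}
```

The hot path is a bounds check, two stores and an increment, which is the
property the comment is after; how the program counter and timestamp are
actually obtained is platform-specific and deliberately left out.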

Does gcov allow anything like this at the moment?
