On 2016-04-13 15:29, Ian Jackson wrote:
Adam Borowski writes ("Re: Packaging of static libraries"):
On Tue, Apr 12, 2016 at 02:52:33PM +0100, Ian Jackson wrote:
I'm afraid that LTO is probably too dangerous to be used as a
substitute for static linking. See my comments in the recent LTO
thread here, where I referred to the problem of undefined behaviour,
and pointed at John Regehr's blog.
LTO is no different from just concatenating all source files and making
functions static. If your code blows up after this, it is your fault, not
LTO's. LTO just allows interprocedural optimizations to work between
functions that were originally in different source files.
This narrative of `fault' has two very serious problems.
Firstly, it is hopelessly impractical. As I have already observed
here:
Recently we have seen spectacular advances in compiler optimisation.
Spectacular in that large swathes of existing previously-working code
have been discovered, by diligent compilers, to be contrary to the
published C standard, and `optimised' into non-working machine code.
In fact, it turns out that there is practically no existing C code
which is correct according to said standards (including C compilers
themselves).
There is practically no existing code in any language which is correct
even if you exclude problems with standards. Not sure we can draw many
useful conclusions from such general statements.
To get something more specific, the paper [1] claims that their tool
STACK detected UB in 40% of wheezy packages with C/C++ code.
[1] https://pdos.csail.mit.edu/papers/stack:sosp13.pdf
[2] https://css.csail.mit.edu/stack/
Real existing code does not conform to the rules now being enforced by
compilers. Indeed often it can be very hard to write new code which
does conform to the rules, even if you know what the rules are and
take great care.
I have the impression that many complaints about problems with UB stem
from attempts to write some tricky code. Sometimes tricky (or outright
non-conforming) code is required, e.g., to work around the limits of a
legacy API. But in many cases it's just clever code trying to get a bit
more speed or to save a bit of memory. Clever enough to get into the area
where some advanced rules apply but not clever enough to obey those rules.
Arguing for safety over speed is somewhat strange then. Why write the
tricky code in the first place?
Two examples showing how C has been turned into a puzzle language:
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_event.c;h=02b39e6da8c65c033c99a22db4784de8d7aeeb7a;hb=HEAD#l458
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxl/libxl_internal.h;h=005fe538c6b5529447185797cc23d898c219e897;hb=HEAD#l294
Why not separate the free list from active watch_slots? Why not have an
array of flags indicating which slot is which?
If those approaches are deemed unattractive, explicitly stating an
assumption of flat memory by casting to uintptr_t before the comparison
doesn't seem very laborious.
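A minimal sketch of that last variant (ptr_in_array is a hypothetical name, not libxl code): relational comparison of pointers into different objects is undefined, whereas comparing their uintptr_t images is merely implementation-defined, so the flat-memory assumption is stated instead of hidden:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: does p point into arr[0..n-1]?  Written as
 * p >= arr && p < arr + n this is UB whenever p points outside the
 * array; done on uintptr_t values it is implementation-defined and
 * spells out the assumption of a flat address space. */
static int ptr_in_array(const void *p, const void *arr,
                        size_t n, size_t elem_size)
{
    uintptr_t ip = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)arr;
    return ip >= lo && ip < lo + n * elem_size;
}
```

On a segmented target the comparison could still misbehave, but at least the assumption is visible and greppable.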
http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg03340.html
http://lists.xenproject.org/archives/html/xen-devel/2015-11/threads.html#00112
Yeah, there is a bunch of misconceptions there.
1. Type-punning via unions is a time-honored tradition described in all
versions of the C standard. The referenced email even links to DR 283,
so it's not clear to me where the confusion comes from.
2. The compiler is not free to assume that padding will not be read. It
could be read as chars (even if you ignore type-punning). You mentioned
it yourself in other emails. Not that it gives you much.
3. While writing to / reading from dst->p0 you have to consider not only
the type of p0 but the type of dst too. This is a very practical
concern. For example, see
https://twitter.com/johnregehr/status/706868554222723073 .
4. uint8_t is not guaranteed to be one of the character types and,
hence, is not free to alias everything. See, e.g.,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110#c13 . Not an
immediate concern but something to keep in mind if you strive for strict
standard conformance.
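On point 1, for reference, the classic well-defined form of the union pun (the IEEE 754 float layout and sizeof(float) == sizeof(uint32_t) are assumptions the standard itself does not make):

```c
#include <assert.h>
#include <stdint.h>

/* Type-punning via a union: reading a member other than the one last
 * stored reinterprets the object representation (C99 TC3 onwards,
 * cf. DR 283).  Assumes sizeof(float) == sizeof(uint32_t). */
static uint32_t float_bits(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}
```

With IEEE 754 single precision, float_bits(1.0f) is 0x3f800000.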
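And on points 2 and 4 together: code that needs the may-alias-anything guarantee, e.g. to read an object's representation padding and all, is strictly safer spelling unsigned char than uint8_t. A minimal sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise copy through unsigned char, which the standard does
 * guarantee may access any object's representation (padding bytes
 * included).  uint8_t is almost always the same type in practice,
 * but that equivalence is not promised by the standard. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}
```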
I'm not familiar with Xen but why overlay data for 32- and 64-bit cases
instead of having different structs for them? Why use macros instead of
functions?
The second problem is that it is based on the idea that the C
specification is by definition right and proper.
Whether the C standard is right and proper or not, it's the only
(somewhat) widely accepted middle ground for now.
There are two ways to evaluate the C specification's rightness and
properness.
The first is to ask what the nominal remit of the C standards
bodies is. Well, it is and was to standardise existing practice.
Existing practice was to use C as a kind of portable assembler; the
programmer was traditionally entitled to do the kind of things which
are nowadays forbidden. So the C committee has failed at its
task. [1]
The task of the committee was to balance several principles. Why many
(especially in the Free Software world) consider serving as a high-level
assembler a much more important principle than the others is not clear
to me.
The second is to ask what is most useful. And there again the C
committee have clearly failed.
Apparently others disagree.
We in Debian are in a good position to defend our users from the
fallout from this problem. We could change our default compiler
options to favour safety, and provide more traditional semantics.
Debian (and other distros) have somewhat unusual stakes in the UB debate
due to their porting needs. A lone developer can choose to support only
one platform and is then free to complain that C doesn't provide the full
freedom of assembler for that platform. But Debian often takes such
programs and builds them for many other architectures.
As an example consider shifts by a value greater than or equal to the
width of the left operand. They are UB in C and work differently on
different CPUs. Will it benefit Debian to declare them
implementation-defined in C? Probably not. Another example is unaligned
accesses.
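To make the shift example concrete (safe_shl is a hypothetical helper): 1u << 32 is UB; x86 masks the shift count, so it often behaves like 1u << 0, while ARM tends to produce 0. Portable code defines the out-of-range case itself rather than asking the standard to bless one CPU's behaviour:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper giving left shifts a total contract: for
 * n >= 32 the result is defined to be 0 instead of whatever the
 * target CPU's shifter happens to do. */
static uint32_t safe_shl(uint32_t x, unsigned n)
{
    return n < 32 ? x << n : 0;
}
```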
It looks like Debian (and Free Software community in general) should
strongly favor portability of standard C over the ability to serve
as a high-level assembler.
We would have influence upstream (for example to further advance the
set of available safety options) if we cared to use it. But sadly it
seems that the notion that our most basic and widely-used programming
language should be one that's fit for programming in is not yet fully
accepted.
At the very least we should fiercely resist any further broadening of
the scope of the C UB problem.
Then the first thing to do is to stop upgrading gcc. Doesn't seem like a
very practical approach.
The next thing is to add options like -fwrapv or -fno-strict-overflow,
-fno-delete-null-pointer-checks, -fno-strict-aliasing but is there a
chance for consensus about it? Doubtful, but who knows...
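For what these flags buy: the typical casualty is an after-the-fact overflow check like if (a + b < a), which a compiler may delete because signed overflow is UB; -fwrapv keeps it working. A check valid under either regime (add_would_overflow is my naming) avoids the overflowing addition entirely:

```c
#include <assert.h>
#include <limits.h>

/* Detects whether a + b would overflow int without performing the
 * overflowing addition, so it is correct both with and without
 * -fwrapv / -fno-strict-overflow. */
static int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}
```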
Perhaps less controversial is fixing UB (and other bugs) in the existing
code. Several years ago this was hopeless, but recently some tools have
emerged that make it possible to tackle the problem. First of all,
sanitizers -- ASan, UBSan, MSan, TSan, ... While running everything in
valgrind is not very convenient, building everything with ASan seems
quite feasible.
Recent activity related to Debian:
http://balintreczey.hu/blog/progress-report-on-hardened1-linux-amd64-a-potential-debian-port-with-pie-asan-ubsan-and-more/
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=812782
https://github.com/Mohit7/Debian-ASan
The last project contains a list of several hundred packages that fail
to build or run with ASan. Unlike UBSan problems, which may or may not
lead to a bug in an executable now or in the future, ASan problems are
quite real right now.
After the problems found with ASan and UBSan are dealt with, other tools
could be used to find further problems:
- STACK (mentioned above);
- tis-interpreter -- https://github.com/TrustInSoft/tis-interpreter -- a
recently released "interpreter for finding subtle bugs in programs
written in standard C";
- libcrunch -- https://github.com/stephenrkell/libcrunch -- a tool "for
fast dynamic type checking".
The tools are there; is there the will to fix things?..
Perhaps some mixed approach is possible. E.g., disable some
optimizations by default and re-enable them when tests with ASan etc.
pass. Or vice versa -- disable some optimizations when tests fail to
pass with ASan enabled.
--
Alexander Cherepanov