[Bug ld/32067] New: ld crash in _bfd_elf_link_keep_memory when building statifier-1.7.4

2024-08-09 Thread sam at gentoo dot org
https://sourceware.org/bugzilla/show_bug.cgi?id=32067

            Bug ID: 32067
           Summary: ld crash in _bfd_elf_link_keep_memory when building
                    statifier-1.7.4
           Product: binutils
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: ld
          Assignee: unassigned at sourceware dot org
          Reporter: sam at gentoo dot org
  Target Milestone: ---

ld crashes when building statifier-1.7.4, as reported downstream at
https://bugs.gentoo.org/937627:
```
Thread 3.1 "ld" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x774f8740 (LWP 3092614)]
0x7793aec0 in _bfd_elf_link_keep_memory (info=0x55842e80) at /usr/src/debug/sys-devel/binutils-/binutils/bfd/elflink.c:67
67	  if (bed->use_mmap)
(gdb) bt
#0  0x7793aec0 in _bfd_elf_link_keep_memory (info=0x55842e80) at /usr/src/debug/sys-devel/binutils-/binutils/bfd/elflink.c:67
#1  elf_link_add_object_symbols (abfd=, info=) at /usr/src/debug/sys-devel/binutils-/binutils/bfd/elflink.c:5727
#2  bfd_elf_link_add_symbols (abfd=, info=) at /usr/src/debug/sys-devel/binutils-/binutils/bfd/elflink.c:6363
#3  0x55692ad5 in load_symbols (entry=entry@entry=0x55848740, place=place@entry=0x7fffdc30) at /usr/src/debug/sys-devel/binutils-/binutils/ld/ldlang.c:3130
#4  0x556847fd in open_input_bfds (s=0x55848740, os=0x55849cc0, mode=OPEN_BFD_NORMAL) at /usr/src/debug/sys-devel/binutils-/binutils/ld/ldlang.c:3622
#5  0x55682209 in lang_process () at /usr/src/debug/sys-devel/binutils-/binutils/ld/ldlang.c:8194
#6  0x5568c234 in main (argc=, argv=) at /usr/src/debug/sys-devel/binutils-/binutils/ld/ldmain.c:529

(gdb) p bed
$1 = (const struct elf_backend_data *) 0x0
```

toralf hit this downstream with 2.43; I've hit it on a recent (few days old) trunk:
```
$ ld --version
GNU ld (Gentoo  p1) 2.43.50.20240806
Copyright (C) 2024 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
```

Testcase coming.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[Bug ld/32067] ld crash in _bfd_elf_link_keep_memory when building statifier-1.7.4

2024-08-09 Thread sam at gentoo dot org
https://sourceware.org/bugzilla/show_bug.cgi?id=32067

--- Comment #1 from Sam James  ---
Created attachment 15665
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15665&action=edit
processor.S

```
$ gcc -nostdinc -m32 processor.S -c -o dl-var.o && gcc -m32 -o dl-var dl-var.o -Wl,--oformat,binary,--entry=0x0
collect2: fatal error: ld terminated with signal 11 [Segmentation fault], core dumped
compilation terminated.
```



[Bug ld/32067] ld crash in _bfd_elf_link_keep_memory when building statifier-1.7.4

2024-08-09 Thread sam at gentoo dot org
https://sourceware.org/bugzilla/show_bug.cgi?id=32067

--- Comment #2 from Sam James  ---
(gdb) p info->output_bfd
$1 = (bfd *) 0x556bca80
(gdb) p *info->output_bfd
$2 = {filename = 0x556bcbd0 "dl-var", xvec = 0x77fb1c60 ,
iostream = 0x5569cee0, iovec = 0x77fa9080 , lru_prev =
0x556c8ac0,
  lru_next = 0x556c8ac0, where = 0, mtime = 0, id = 0, flags = 384, format
= bfd_object, direction = write_direction, last_io = bfd_io_seek, cacheable =
1, target_defaulted = 0,
  opened_once = 1, mtime_set = 0, no_export = 0, output_has_begun = 0,
has_armap = 0, is_thin_archive = 0, no_element_cache = 0, selective_search = 0,
is_linker_output = 1,
  is_linker_input = 0, plugin_format = bfd_plugin_unknown, lto_output = 0,
read_only = 0, lto_type = lto_non_object, in_format_matches = 0,
plugin_dummy_bfd = 0x0, origin = 0,
  proxy_origin = 0, section_htab = {table = 0x556bdbc0, newfunc =
0x77ecaafe , memory = 0x556bca20, size = 13,
count = 0, entsize = 304,
frozen = 0}, sections = 0x0, section_last = 0x0, section_count = 0,
archive_plugin_fd = -1, archive_plugin_fd_open_count = 0, archive_pass = 0,
alloc_size = 7, start_address = 0,
  outsymbols = 0x0, symcount = 0, dynsymcount = 0, arch_info = 0x77fbb8c0
, size = 0, arelt_data = 0x0, my_archive = 0x0, archive_next =
0x0, archive_head = 0x0,
  nested_archives = 0x0, link = {next = 0x556beba0, hash = 0x556beba0},
tdata = {aout_data = 0x0, aout_ar_data = 0x0, coff_obj_data = 0x0, pe_obj_data
= 0x0,
xcoff_obj_data = 0x0, ecoff_obj_data = 0x0, srec_data = 0x0, verilog_data =
0x0, ihex_data = 0x0, tekhex_data = 0x0, elf_obj_data = 0x0, mmo_data = 0x0,
trad_core_data = 0x0,
som_data = 0x0, hpux_core_data = 0x0, hppabsd_core_data = 0x0,
sgi_core_data = 0x0, lynx_core_data = 0x0, osf_core_data = 0x0, cisco_core_data
= 0x0, netbsd_core_data = 0x0,
mach_o_data = 0x0, mach_o_fat_data = 0x0, plugin_data = 0x0, pef_data =
0x0, pef_xlib_data = 0x0, sym_data = 0x0, any = 0x0}, usrdata = 0x0, memory =
0x556bca00,
  build_id = 0x0, mmapped = 0x0}
(gdb)

and then

(gdb) p *info->output_bfd.xvec
$10 = {name = 0x77f7349c "binary", flavour = bfd_target_unknown_flavour,
byteorder = BFD_ENDIAN_UNKNOWN, header_byteorder = BFD_ENDIAN_UNKNOWN,
object_flags = 2,
  section_flags = 379, symbol_leading_char = 0 '\000', ar_pad_char = 32 ' ',
ar_max_namelen = 16 '\020', match_priority = 255 '\377',
keep_unused_section_symbols = true,
[...]

so we call get_elf_backend_data on the output BFD even though its format is
'binary', which is why bed is NULL. Oops.



[Bug ld/32067] ld crash in _bfd_elf_link_keep_memory when building statifier-1.7.4

2024-08-09 Thread sam at gentoo dot org
https://sourceware.org/bugzilla/show_bug.cgi?id=32067

Sam James  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com



[Bug ld/32067] ld crash in _bfd_elf_link_keep_memory when building statifier-1.7.4 with -Wl,--oformat,binary

2024-08-09 Thread sam at gentoo dot org
https://sourceware.org/bugzilla/show_bug.cgi?id=32067

Sam James  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|ld crash in                 |ld crash in
                   |_bfd_elf_link_keep_memory   |_bfd_elf_link_keep_memory
                   |when building               |when building
                   |statifier-1.7.4             |statifier-1.7.4 with
                   |                            |-Wl,--oformat,binary



Questions Regarding Initial Input Seeds Used for Fuzzing Binutils via OSS-Fuzz

2024-08-09 Thread Federico Cernera
Dear Binutils Team,


I hope this message finds you well.

My name is Federico Cernera, and I am a PhD student at Sapienza University
of Rome, currently collaborating with Vrije Universiteit Amsterdam.

I have been exploring the Binutils library and its integration with
OSS-Fuzz for fuzzing purposes.

I have questions regarding the initial input seeds used to fuzz the library
and would greatly appreciate your assistance.



   1. Are the current initial input seeds used to fuzz the library via
   OSS-Fuzz human-made, or were they generated by earlier fuzzing iterations?
   2. Were the initial input seeds used for the first fuzzing campaign with
   OSS-Fuzz human-made?
   3. Can I access the initial input seeds used during the first fuzzing
   campaign with OSS-Fuzz?


Thank you for your time and consideration. I look forward to your response.


Best regards,

Federico Cernera


[Bug ld/32003] Specifying --package-metadata might not be possible and is too fragile

2024-08-09 Thread bdrung at posteo dot de
https://sourceware.org/bugzilla/show_bug.cgi?id=32003

--- Comment #23 from Benjamin Drung  ---
(In reply to H.J. Lu from comment #14)
> (In reply to Benjamin Drung from comment #13)
> 
> > > Will "%[string]" escape work?
> > 
> > Like this?
> > 
> > -Wl,--encoded-package-metadata={%[quot]type%[quot]:
> > %[quot]deb%[quot]%[comma]%[quot]os%[quot]:
> > %[quot]ubuntu%[quot]%[comma]%[quot]name%[quot]:
> > %[quot]dpkg%[quot]%[comma]%[quot]version%[quot]:%[quot]1.22.
> > 6ubuntu15%[quot]%[comma]%[quot]architecture%[quot]:%[quot]amd64%[quot]}
> 
> It should be %[quote]".

You suggested to borrow from HTML's Named Character References and
https://dev.w3.org/html5/spec-LC/named-character-references.html says that
U+00022 has the name "quot" (not "quote").

> Will adding support for "%[string]" to existing
> --package-metadata option break anything?

It might theoretically break existing use cases.

--package-metadata='{"version":"1.0%2"}'

The only safe option that I could come up with is to use a marker that would be
invalid JSON. For example: If the string starts with a percent character,
decode it:

--package-metadata='%{"foo":"bar"%[comma]"baz":42}'

would be invalid JSON and decode to:

--package-metadata='{"foo":"bar","baz":42}'
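
The marker idea above can be sketched in Python. This is only an illustration
under stated assumptions: decode_package_metadata and the NAMED table are
hypothetical names, not ld code, and only the %[comma] and %[quot] names
mentioned in this thread are handled.

```python
import re

# Hypothetical name table; only names mentioned in this thread.
NAMED = {"comma": ",", "quot": '"'}


def decode_package_metadata(value: str) -> str:
    """Decode value only if it starts with the '%' marker (assumed rule)."""
    if not value.startswith("%"):
        # Valid JSON never starts with '%', so legacy values pass through.
        return value

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in NAMED:
            raise ValueError(f"unknown escape %[{name}]")
        return NAMED[name]

    # Drop the leading marker, then expand every %[name] escape.
    return re.sub(r"%\[([a-z]+)\]", replace, value[1:])


print(decode_package_metadata('%{"foo":"bar"%[comma]"baz":42}'))
print(decode_package_metadata('{"version":"1.0%2"}'))
```

With this rule the second call returns its input unchanged, since a value
without the leading marker is never decoded.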



[Bug ld/32003] Specifying --package-metadata might not be possible and is too fragile

2024-08-09 Thread bluca at debian dot org
https://sourceware.org/bugzilla/show_bug.cgi?id=32003

--- Comment #24 from Luca Boccassi  ---
(In reply to Benjamin Drung from comment #23)
> (In reply to H.J. Lu from comment #14)
> > (In reply to Benjamin Drung from comment #13)
> > 
> > > > Will "%[string]" escape work?
> > > 
> > > Like this?
> > > 
> > > -Wl,--encoded-package-metadata={%[quot]type%[quot]:
> > > %[quot]deb%[quot]%[comma]%[quot]os%[quot]:
> > > %[quot]ubuntu%[quot]%[comma]%[quot]name%[quot]:
> > > %[quot]dpkg%[quot]%[comma]%[quot]version%[quot]:%[quot]1.22.
> > > 6ubuntu15%[quot]%[comma]%[quot]architecture%[quot]:%[quot]amd64%[quot]}
> > 
> > It should be %[quote]".
> 
> You suggested to borrow from HTML's Named Character References and
> https://dev.w3.org/html5/spec-LC/named-character-references.html says that
> U+00022 has the name "quot" (not "quote").
> 
> > Will adding support for "%[string]" to existing
> > --package-metadata option break anything?
> 
> It might theoretically break existing use cases.
> 
> --package-metadata='{"version":"1.0%2"}'

Are there distros where '%' is an allowed character in a version string or a
package name? I care about backward compatibility, but we can be sensible about
it, and if in practice it's not a problem, then it's fine to make such a change.


[Bug ld/32003] Specifying --package-metadata might not be possible and is too fragile

2024-08-09 Thread bdrung at posteo dot de
https://sourceware.org/bugzilla/show_bug.cgi?id=32003

--- Comment #25 from Benjamin Drung  ---
(In reply to Luca Boccassi from comment #24)
> (In reply to Benjamin Drung from comment #23)
> > (In reply to H.J. Lu from comment #14)
> > > (In reply to Benjamin Drung from comment #13)
> > > 
> > > Will adding support for "%[string]" to existing
> > > --package-metadata option break anything?
> > 
> > It might theoretically break existing use cases.
> > 
> > --package-metadata='{"version":"1.0%2"}'
> 
> Are there distros where '%' is an allowed character in a version string or a
> package name? I care about backward compatibility, but we can be sensible
> about it, and if in practice it's not a problem, then it's fine to do such a
> change

For all Debian-based distros: % is neither allowed in the package name nor in
the package version. Who wants to check the other 400 distributions?



[Bug ld/32003] Specifying --package-metadata might not be possible and is too fragile

2024-08-09 Thread bluca at debian dot org
https://sourceware.org/bugzilla/show_bug.cgi?id=32003

--- Comment #26 from Luca Boccassi  ---
(In reply to Benjamin Drung from comment #25)
> (In reply to Luca Boccassi from comment #24)
> > (In reply to Benjamin Drung from comment #23)
> > > (In reply to H.J. Lu from comment #14)
> > > > (In reply to Benjamin Drung from comment #13)
> > > > 
> > > > Will adding support for "%[string]" to existing
> > > > --package-metadata option break anything?
> > > 
> > > It might theoretically break existing use cases.
> > > 
> > > --package-metadata='{"version":"1.0%2"}'
> > 
> > Are there distros where '%' is an allowed character in a version string or a
> > package name? I care about backward compatibility, but we can be sensible
> > about it, and if in practice it's not a problem, then it's fine to do such a
> > change
> 
> For all Debian-based distros: % is neither allowed in the package name nor
> in the package version. Who wants to check the other 400 distributions?

Fortunately, we don't need to check all of them individually; the package
format is enough. deb is out, and I don't think it's allowed in rpm? So if Arch
doesn't allow it either, I'd say we are good? I'm the most invested in
retaining backward compat, but in a pragmatic way.



[Bug ld/32003] Specifying --package-metadata might not be possible and is too fragile

2024-08-09 Thread bdrung at posteo dot de
https://sourceware.org/bugzilla/show_bug.cgi?id=32003

--- Comment #27 from Benjamin Drung  ---
Taking all comments into account, here are my implementation proposals:

Encoding scheme
===============

Option 1: Support percent-encoding of the JSON data. Percent-encoding is widely
used and supported (for example, Python provides urllib.parse.unquote for
decoding and urllib.parse.quote for encoding). Example encoded JSON:
'{%22foo%22:%22bar%22}' or '%7B%22foo%22%3A%22bar%22%7D'

This option has the benefit of being easy to implement. Either the encoded
string can be read directly (I can spot package names and version in there) or
decoded using Python's urllib.parse.unquote or online tools.
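
As a rough illustration of Option 1, the round trip can be tried with the
Python functions named above. The metadata dictionary is made up for the
example; nothing here reflects how ld itself would decode the string.

```python
import json
from urllib.parse import quote, unquote

metadata = {"foo": "bar"}

# Compact JSON (no spaces), then escape every reserved character.
raw = json.dumps(metadata, separators=(",", ":"))
fully_encoded = quote(raw, safe="")

# Escaping only the quotes keeps the string easier to read.
lightly_encoded = raw.replace('"', "%22")

print(fully_encoded)
print(lightly_encoded)

# Both forms decode back to the same JSON document.
assert unquote(fully_encoded) == raw
assert unquote(lightly_encoded) == raw
assert json.loads(unquote(fully_encoded)) == metadata
```

The two printed strings correspond to the two example encodings given for this
option.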

Option 2: Support both percent-encoding and %[string] escapes (where string
refers to the name in HTML's Named Character References) for the JSON data.
Example encoded JSON:
'{%[quot]foo%[quot]:%[quot]bar%[quot]}' or '{%22foo%22:%22bar%22}'

This option allows people to use %[string] encoding in case they dislike
percent-encoding. The drawback is that it is more work to implement since there
must be a list of names. To keep the code simple, the list of allowed names
might be restricted to, e.g., quot, comma, lbrace, and rbrace, and maybe space.
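
A minimal Python sketch of an Option 2 decoder, assuming the restricted name
list floated above. NAMES, ESCAPE and decode are hypothetical names chosen for
the example, and rejecting unknown names is an assumption rather than a decided
behaviour.

```python
import re

# Assumed restricted name list, following the names floated in this comment.
NAMES = {"quot": '"', "comma": ",", "lbrace": "{", "rbrace": "}", "space": " "}

# Matches either %[name] or a two-digit hex escape such as %22.
ESCAPE = re.compile(r"%\[([a-z]+)\]|%([0-9A-Fa-f]{2})")


def decode(value: str) -> str:
    """Expand %[name] and %XX escapes (ASCII only); reject unknown names."""

    def replace(match: re.Match) -> str:
        name, hex_digits = match.groups()
        if name is not None:
            if name not in NAMES:
                raise ValueError(f"unknown escape %[{name}]")
            return NAMES[name]
        # Single-byte escape; multi-byte UTF-8 sequences are out of scope here.
        return chr(int(hex_digits, 16))

    return ESCAPE.sub(replace, value)


print(decode("{%[quot]foo%[quot]:%[quot]bar%[quot]}"))
print(decode("{%22foo%22:%22bar%22}"))
```

Both calls yield the same JSON text, which is the point of accepting the two
spellings side by side.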

These are the two options that I would be happy to implement. Supporting
only %[string] would not satisfy me. Using base64 encoding would make the
string shorter, but would not be human readable. Quoted-printable would be an
alternative, but the problematic characters like comma and quotes would not be
encoded by quoted-printable.

Parameter usage
===============

Option A: Introduce a new --encoded-package-metadata parameter that takes the
encoded string.

Option B: Extend --package-metadata to always decode the given string. As
previously discussed, package names and versions should not contain percent
characters, so this change should not break backward compatibility.
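
To get a feel for the compatibility risk of Option B, Python's
urllib.parse.unquote can serve as a stand-in decoder. This is an assumption for
illustration only: a decoder written for ld may treat malformed escapes
differently, and the second version string is invented.

```python
from urllib.parse import unquote

# A stray '%' not followed by two hex digits is left untouched by unquote.
legacy = '{"version":"1.0%2"}'
print(unquote(legacy) == legacy)

# A '%' followed by two hex digits would be rewritten.
risky = '{"version":"1.0%25"}'
print(unquote(risky))
```

Only values that happen to contain a well-formed %XX sequence would change
under an always-decode policy like this one.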

Which of those proposals do you want to see implemented? My initial
implementation is option 1A.



[Bug ld/32067] ld crash in _bfd_elf_link_keep_memory when building statifier-1.7.4 with -Wl,--oformat,binary

2024-08-09 Thread amodra at gmail dot com
https://sourceware.org/bugzilla/show_bug.cgi?id=32067

Alan Modra  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at sourceware dot org   |amodra at gmail dot com
             Status|NEW                         |ASSIGNED



[Bug ld/32067] ld crash in _bfd_elf_link_keep_memory when building statifier-1.7.4 with -Wl,--oformat,binary

2024-08-09 Thread cvs-commit at gcc dot gnu.org
https://sourceware.org/bugzilla/show_bug.cgi?id=32067

--- Comment #3 from Sourceware Commits  ---
The master branch has been updated by Alan Modra :

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=ec8f5671b4e70806fe3053636426a8d179dfef55

commit ec8f5671b4e70806fe3053636426a8d179dfef55
Author: Alan Modra 
Date:   Sat Aug 10 08:41:16 2024 +0930

    PR32067, ld -Wl,--oformat,binary crash in _bfd_elf_link_keep_memory

    The direct fix for this segfault is to test for a non-NULL bed in
    _bfd_elf_link_keep_memory, but also there isn't much point in running
    code for LTO if the output is binary.

            PR 32067
            * elflink.c (_bfd_elf_link_keep_memory): Test for non-NULL bed.
            (elf_link_add_object_symbols): Don't run the loop setting
            non_ir_ref_regular if the output hash table is not ELF.
