Re: Inquiry about Development Features and Integration of Source Code Version Control Systems
Hi - > [...] Specifically, I am interested in understanding how the > Debuginfod Server can facilitate the direct download of source code > from Git repositories. [...] Can you explain under what situations you think this would be helpful? The main challenge is that compilers don't generally have such information (e.g. git repo + file commit/hash) available to them when they build the object files, so it doesn't show up in DWARF, and therefore it doesn't show up in packaged binaries. Thus debuginfod can't pass the info to clients or a version control system. - FChE
Re: Inquiry about Development Features and Integration of Source Code Version Control Systems
Hi Tobias, I wanted to provide an alternative solution that may work for you (with or without support from debuginfod). Support for reading source code directly from a code hosting service (such as GitHub) has been supported for years by Microsoft via SourceLink[1]. In 2018, a proposal[2] was submitted to the DWARF standard to include sourcelink information in the DWARF format. In 2022, a modification to the proposal[2] was accepted and is in the current working draft for DWARF6[3]. The proposal is to place the source URL for any file with an available URL in the DWARF information. There is still a lot of work to do. The DWARF6 specification needs to be published, compilers (or external tools) need to support adding the URLs into the DWARF info, and debuggers need to support downloading the source files. But it could provide a path for you (or debuginfod) to support reading directly from the upstream source repository. > Can you explain under what situations you think this would be helpful? I can only think about situations that involve closed source software. In my experience, companies shipping closed source software have restrictions about where source code can be placed. -Matt [1]: https://learn.microsoft.com/en-us/cpp/build/reference/sourcelink?view=msvc-170 [2]: https://dwarfstd.org/issues/181223.1.html [3]: https://snapshots.sourceware.org/dwarfstd/dwarf-spec/latest/dwarf6-20240227-2154.pdf
Re: Inquiry about Development Features and Integration of Source Code Version Control Systems
Hi - > Support for reading source code directly from a code hosting service > (such as GitHub) has been supported for years by Microsoft via > SourceLink[1]. [...] Thanks for that reminder. Yeah, eventually that could be an alternate way. > [...] > > Can you explain under what situations you think this would be helpful? > > I can only think about situations that involve closed source software. > In my experience, companies shipping closed source software have > restrictions about where source code can be placed. i.e., a situation where even an internal debuginfod instance cannot be trusted with the source code, but some version control system can be? - FChE
Re: [PATCH] libdw: dwarf_getsrcfiles should not imply dwarf_getsrclines
Hi Aaron, On Tue, 2024-04-09 at 23:45 -0400, Aaron Merey wrote: > dwarf_getsrcfiles causes line data to be read in addition to file data. > This is wasteful for programs which only need file or directory names. > Debuginfod server is one such example. > > Fix this by moving the srcfile reading in read_srclines into a separate > function read_srcfiles. This change improves debuginfod server's max > resident set size by up to 75% during rpm indexing. > > * libdw/dwarf_getsrcfiles.c (dwarf_getsrcfiles): Replace > dwarf_getsrclines and __libdw_getsrclines with > __libdw_getsrcfiles. > * libdw/dwarf_getsrclines.c (read_line_header): New function. > (read_srcfiles): New function. > (read_srclines): Move srcfile reading into read_srcfiles. > Add parameter to use cached srcfiles if available. > Also merge srcfiles with any files from DW_LNE_define_file. > (__libdw_getsrclines): Changed to call get_lines_or_files. > (__libdw_getsrcfiles): New function. Calls get_lines_or_files. > (get_lines_or_files): New function based on the old > __libdw_getsrclines. Call read_srcfiles if linesp is NULL, > otherwise call read_srclines. Pass previously read srcfiles > to read_srclines if available. > * libdw/dwarf_macro_getsrcfiles.c (dwarf_macro_getsrcfiles): > Replace __libdw_getsrclines with __libdw_getsrcfiles. > * libdw/libdwP.h (__libdw_getsrcfiles): New declaration. > * tests/.gitignore: Add new test binary. > * tests/get-files.c: Verify that dwarf_getsrcfiles does > not cause srclines to be read. > * tests/get-lines.c: Verify that srclines can be read > after srcfiles have been read. > * tests/Makefile.am: Add new testfiles. > * tests/get-files-define-file.c: Print file names before > and after reading DW_LNE_define_file. > * tests/run-get-files.sh: Add get-files-define-file test. > * tests/testfile-define-file.bz2: New testfile. Copy of > testfile36.debug but with a line program consisting of two > DW_LNE_define_file opcodes. > > https://sourceware.org/bugzilla/show_bug.cgi?id=27405 > > Signed-off-by: Aaron Merey > --- > > v2 changes: > Restored support for DW_LNE_define_file. Great. And sorry I first suggested to just drop it and then said I would like it back. This was more work than I though. > Added DW_LNE_define_file testcase and testfile. Nice. How did you edit this file? > Added new function get_lines_or_files which is based on the old > __libdw_getsrclines. __libdw_getsrclines and __libdw_getsrcfiles now > call get_lines_or_files. It is just a simple rename, but so much nicer to my eyes. Thanks. > Mentioned max resident set size improvement in the commit message. I > did not mention a 5% speed up to rpm indexing that I have previously > talked about since I could not reliably reproduce it with v2 of the patch. > I suspect that speed up may have been due to the smaller max RSS reducing > swapping during that round of testing. Makes sense. And a reduction of 75% of the max resident size it pretty huge in itself. > +case DW_LNE_define_file: > + { > +char *fname = (char *) linep; > +uint8_t *endp = memchr (linep, '\0', lineendp - linep); > +if (endp == NULL) > + goto invalid_data; > +size_t fnamelen = endp - linep; > +linep = endp + 1; > + > +unsigned int diridx; > +if (unlikely (linep >= lineendp)) > + goto invalid_data; > +get_uleb128 (diridx, linep, lineendp); > + > + size_t ndirs = (*filesp)->ndirs; This one line uses tabs for indentation, all others spaces. > +if (unlikely (diridx >= ndirs)) > + { > +__libdw_seterrno (DWARF_E_INVALID_DIR_IDX); > +goto invalid_data; > + } > +Dwarf_Word mtime; > +if (unlikely (linep >= lineendp)) > + goto invalid_data; > +get_uleb128 (mtime, linep, lineendp); > +Dwarf_Word filelength; > +if (unlikely (linep >= lineendp)) > + goto invalid_data; > +get_uleb128 (filelength, linep, lineendp); > + > + /* Add new_file to filelist that will be merged with filesp. */ > +struct filelist *new_file = malloc (sizeof (struct > filelist)); > + if (unlikely (new_file == NULL)) > { > - __libdw_seterrno (DWARF_E_INVALID_DIR_IDX); > - goto invalid_data; > + __libdw_seterrno (DWARF_E_NOMEM); > + goto out; > } And this last block uses tabs again. > - Dwarf_Word mtime; > - if (unlikely (linep >= lineendp)) > - goto invali
Re: [PATCH] libdw: dwarf_getsrcfiles should not imply dwarf_getsrclines
Hi Mark, On Wed, Apr 10, 2024 at 3:48 PM Mark Wielaard wrote: > > v2 changes: > > Restored support for DW_LNE_define_file. > > Great. And sorry I first suggested to just drop it and then said I > would like it back. This was more work than I though. No problem, better if we support this just in case. > This one line uses tabs for indentation, all others spaces. > [...] > And this last block uses tabs again. > [...] > Here tabs again, then the rest spaces. > [...] > So inconsistent indentation, but the code looks good. Fixed the indentation. > > - /* Pass the file data structure to the caller. */ > > - if (filesp != NULL) > > -*filesp = files; > > + const char **newdirs = (void *) &newfiles->info[nnewfiles]; > > + const char **prevdirs = (void *) &prevfiles->info[nprevfiles]; > > + > > + /* Copy prevdirs to newdirs. */ > > + for (size_t n = 0; n < ndirs; n++) > > + newdirs[n] = prevdirs[n]; > > Again slightly off indentation. > But I also don't fully follow the prevdirs/newdirs copying. > Why is this? No newdirs are defined here, are there? > Maybe I don't understand the data-structure used here. The directories are the same but we still need to copy them so that dwarf_getsrcdirs can find newfiles' dir names. Dwarf_Files is an unusual structure since it doesn't contain a member specifically for the array of dirnames. Instead they're stored at the end of the Dwarf_Fileinfo array member. > So testfile-define-file is actually testfile36.debug but with a new > line program? How did you edit/insert that one? I used xxd to create a hexdump of testfile36.debug and modified the line program by hand with a text editor. I converted the modified hexdump back to a binary with xxd -r. I chose testfile36.debug because its .debug_line includes multiple directory and file names. It also contains a single short line program that was easy to replace with two DW_LNE_define_file opcodes without corrupting things. Aaron
Re: [rfc] [patch] PR28204: debuginfod ima signature verification
Hi, Mark - > > - to drop "permissive" mode > > We discussed a bit on irc about "wording". But I think it isn't really > how it is worded, but that there is just different features. What is > called "enforcing" is an authenticity scheme. While "permissive" is > more like an (optional) error-detecting mode. IMHO it makes sense to > simply separate those. For the record, on IRC, I presented these two additional arguments: in the case of federated debuginfod servers, such as debuginfod.elfutils.org, a user is stuck with "ignore" mode operation, because not all upstream servers will have ima signatures to pass. this sacrifices all possible assurance that could come from the federated server relaying ima signatures from those servers that have them even in non-federated cases such as fedoraproject.org, the ideal case for "enforcing" mode as a default, it will fail the moment the user happens to try to debug some pre-fedora-38 binary that may be sitting on the system --- because those rpms just don't have signatures available at all for both these scenarios, the original "permissive" mode would be suitable IOW, without a "permissive" mode being available at all, we could not ask users to enable this new code at all for our own federated servers, nor even for fedora. That's because no server can guarantee the availability of signatures for all content they can serve. > That way you don't have a authentication scheme that is easily > defeated (when put in "permissive" mode). This "easily defeated" theory needs a threat model. It sounds like you are thinking of - an evil debuginfod instance that - you trust enough to list in DEBUGINFOD_URLS - but not enough to label it with "ima:enforcing" - which may strip the X-DEBUGINFOD-FILE-SIGNATURE header" en route then yeah, but sounds far-fetched rather than easy. > And you can implement a simpler error-detection mode that can work > in more cases (by using the executable .gnu_debuglink CRC) No, we already know that this is incapable of checking e.g. source files. And there is no separate CRC for executables vs. debuginfo files. And note that this provides zero protection in the same threat model above (as the CRC field could be MITM'd). > [...] > One "big picture" question is whether this should be a per server URL > policy or something that is enabled/disabled for all server URLs? > That makes it less "flexible" but should simplify things a bit for the > user (and the server urls parsing). Blame some guy named Mark for getting Ryan to build that out last year. :-) https://sourceware.org/pipermail/elfutils-devel/2023q3/006299.html > Also is this all rpm/koji specific? Or do other distros also use ima > signatures, but encode/store them differently? As far as I know, Fedora/RHEL are the first. Yeah it's all package/style specific. > [...] > We now also have a fish profile. I'm afraid I don't speak fish. :-) > > +debuginfod_ima_verification_enabled="no" > > +if test "$enable_ima_verification" = "xrpmimaevmcrypto"; then > > + debuginfod_ima_verification_enabled="yes" > > + default_ima_cert_path=`eval echo > > "/etc/keys/ima:/etc/pki/rpm-ima:$sysconfdir/debuginfod/ima-certs"` # expand > > $prefix too > > + AC_DEFINE([ENABLE_IMA_VERIFICATION], [1], [Define if the required ima > > verification libraries are available]) > > + AC_DEFINE_UNQUOTED(DEBUGINFOD_IMA_CERT_PATH_DEFAULT, > > "$default_ima_cert_path", [Default IMA certificate path]) > > +fi > > +AM_CONDITIONAL([ENABLE_IMA_VERIFICATION],[test "$enable_ima_verification" > > = "xrpmimaevmcrypto"]) > > + > > Not all these are needed anymore now are they? > [...] The elaborate default_ima_cert_path? Yeah probably not, just force Fedora etc. to use the configure parameter. The AM_CONDITIONAL "xrpmimaevmcrypto" part? Yeah we need those. (The iamevm part is just for the headers.) At some point, I'd like to reformat the debuginfod code base, it's become a bit of a mishmash. Separate commit later, I guess? - FChE
Re: Inquiry about Development Features and Integration of Source Code Version Control Systems
> i.e., a situation where even an internal debuginfod instance cannot be > trusted with the source code, but some version control system can be? That was my thought exactly. On Wed, Apr 10, 2024 at 9:07 AM Frank Ch. Eigler wrote: > > Hi - > > > Support for reading source code directly from a code hosting service > > (such as GitHub) has been supported for years by Microsoft via > > SourceLink[1]. [...] > > Thanks for that reminder. Yeah, eventually that could be an alternate > way. > > > [...] > > > Can you explain under what situations you think this would be helpful? > > > > I can only think about situations that involve closed source software. > > In my experience, companies shipping closed source software have > > restrictions about where source code can be placed. > > i.e., a situation where even an internal debuginfod instance cannot be > trusted with the source code, but some version control system can be? > > > - FChE >