[PATCH v3 0/7] debuginfod: speed up extraction from kernel debuginfo packages by 200x
From: Omar Sandoval

This is v3 of my patch series optimizing debuginfod for kernel debuginfo. v1 is here [7], v2 is here [8]. This version fixes a couple of minor bugs and adds test cases.

Changes from v2 to v3:
- Added a test case with seekable rpm and deb files.
- Added a couple of independent fixes uncovered while adding tests.
- Added a few more prometheus metrics.
- Fixed passive mode.

Patches 1 and 2 fix existing bugs that were uncovered by adding new test package files. Patch 3 is a preparatory refactor. Patch 4 makes the schema changes. Patch 5 implements the seekable xz extraction. Patch 6 populates the table of seekable entries at scan time and adds a test. Patch 7 does it for pre-existing files at request time.

Here is the background, copied and pasted from v1:

drgn [1] currently uses debuginfod with great success for debugging userspace processes. However, for debugging the Linux kernel (drgn's main use case), we have had some performance issues with debuginfod, so we intentionally avoid using it. Specifically, it sometimes takes over a minute for debuginfod to respond to queries for vmlinux and kernel modules (not including the actual download time).

The reason for the slowness is that Linux kernel debuginfo packages are very large and contain lots of files. To respond to a query for a Linux kernel debuginfo file, debuginfod has to decompress and iterate through the whole package until it finds that file. If the file is towards the end of the package, this can take a very long time. This was previously reported for vdso files [2][3], which debuginfod was able to mitigate with improved caching and prefetching. However, kernel modules are far greater in number, vary drastically by hardware and workload, and can be spread all over the package, so in practice I've still been seeing long delays. This was also discussed on the drgn issue tracker [4].

The fundamental limitation is that Linux packages, which are essentially compressed archives with extra metadata headers, don't support random access to specific files. However, the multi-threaded xz compression format does actually support random access. And, luckily, the kernel debuginfo packages on Fedora, Debian, and Ubuntu all happen to use multi-threaded xz compression!

debuginfod can take advantage of this: when it scans a package, if it is a seekable xz archive, it can save the uncompressed offset and size of each file. Then, when it needs a file, it can seek to that offset and extract it from there. This requires some understanding of the xz format and low-level liblzma code, but the speedup is massive: where the worst case was previously about 50 seconds just to find a file in a kernel debuginfo package, with this change the worst case is 0.25 seconds, a ~200x improvement! This works for both .rpm and .deb files. I tested this by requesting and verifying the digest of every file from a few kernel debuginfo rpms and debs [5].

P.S. The biggest downside of this change is that it depends on a very specific compression format that is only used by kernel packages incidentally. I think this is something we should formalize with Linux distributions: large debuginfo packages should use a seekable format. Currently, xz in multi-threaded mode is the only option, but Zstandard also has an experimental seekable format that is worth looking into [6].
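To make the xz random-access idea concrete, here is a minimal sketch (not part of this series; the helper name is made up for illustration, and error handling is elided) of how liblzma's public index API maps an uncompressed offset to the block containing it, once the Index has been decoded:

    #include <lzma.h>

    // Given a decoded Index, find the block containing the uncompressed
    // offset `target`.  On success, fills in the compressed and
    // uncompressed file offsets of that block.
    static bool
    locate_block (const lzma_index *index, uint64_t target,
                  uint64_t *comp_off, uint64_t *uncomp_off)
    {
      lzma_index_iter iter;
      lzma_index_iter_init (&iter, index);
      // lzma_index_iter_locate returns true on failure (offset out of range).
      if (lzma_index_iter_locate (&iter, target))
        return false;
      *comp_off = iter.block.compressed_file_offset;
      *uncomp_off = iter.block.uncompressed_file_offset;
      return true;
    }

Decompression then starts at that block's compressed offset and discards target - *uncomp_off bytes before reaching the requested data.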
Thanks,
Omar

1: https://github.com/osandov/drgn
2: https://sourceware.org/bugzilla/show_bug.cgi?id=29478
3: https://bugzilla.redhat.com/show_bug.cgi?id=1970578
4: https://github.com/osandov/drgn/pull/380
5: https://gist.github.com/osandov/89d521fdc6c9a07aa8bb0ebf91974346
6: https://github.com/facebook/zstd/tree/dev/contrib/seekable_format
7: https://sourceware.org/pipermail/elfutils-devel/2024q3/007191.html
8: https://sourceware.org/pipermail/elfutils-devel/2024q3/007208.html

Omar Sandoval (7):
  debuginfod: fix skipping source file
  tests/run-debuginfod-fd-prefetch-caches.sh: disable fdcache limit check
  debuginfod: factor out common code for responding from an archive
  debuginfod: add new table and views for seekable archives
  debuginfod: optimize extraction from seekable xz archives
  debuginfod: populate _r_seekable on scan
  debuginfod: populate _r_seekable on request

 configure.ac                                  |   5 +
 debuginfod/Makefile.am                        |   2 +-
 debuginfod/debuginfod.cxx                     | 928 +++---
 tests/Makefile.am                             |   4 +-
 ...pressme-seekable-xz-dbgsym_1.0-1_amd64.deb | Bin 0 -> 6288 bytes
 ...compressme-seekable-xz_1.0-1.debian.tar.xz | Bin 0 -> 1440 bytes
 .../compressme-seekable-xz_1.0-1.dsc          |  19 +
 .../compressme-seekable-xz_1.0-1_amd64.deb    | Bin 0 -> 6208 bytes
 .../compressme-seekable-xz_1.0.orig.tar.xz    | Bin 0 -> 7160 bytes
 .../compressme-seekable-xz-1.0-1.src.rpm      | Bin 0 -> 15880 bytes
 .../compressme-seekable-xz-1.0-1.x86_64.rpm   | Bin 0 -> 31873 byte
[PATCH v3 4/7] debuginfod: add new table and views for seekable archives
From: Omar Sandoval

In order to extract a file from a seekable archive, we need to know where in the uncompressed archive the file data starts and its size. Additionally, in order to populate the response headers, we need the file modification time (since we won't be able to get it from the archive metadata).

Add a new table, _r_seekable, keyed on the archive file id and entry file id and containing the size, offset, and mtime. It also contains the compression type just in case new seekable formats are supported in the future.

In order to search this table when we get a request, we need the file ids available. Add the ids to the _query_d and _query_e views, and rename them to _query_d2 and _query_e2. This schema change is backward compatible and doesn't require reindexing. _query_d2 and _query_e2 can be renamed back the next time BUILDIDS needs to be bumped.

Before this change, the database for a single kernel debuginfo RPM (kernel-debuginfo-6.9.6-200.fc40.x86_64.rpm) was about 15MB. This change increases that by about 70kB, only a 0.5% increase.

Signed-off-by: Omar Sandoval
---
 debuginfod/debuginfod.cxx | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 24702c23..b3d80090 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -265,25 +265,39 @@ static const char DEBUGINFOD_SQLITE_DDL[] =
   "foreign key (content) references " BUILDIDS "_files(id) on update cascade on delete cascade,\n"
   "primary key (content, file, mtime)\n"
   ") " WITHOUT_ROWID ";\n"
+  "create table if not exists " BUILDIDS "_r_seekable (\n" // seekable rpm contents
+  "file integer not null,\n"
+  "content integer not null,\n"
+  "type text not null,\n"
+  "size integer not null,\n"
+  "offset integer not null,\n"
+  "mtime integer not null,\n"
+  "foreign key (file) references " BUILDIDS "_files(id) on update cascade on delete cascade,\n"
+  "foreign key (content) references " BUILDIDS "_files(id) on update cascade on delete cascade,\n"
+  "primary key (file, content)\n"
+  ") " WITHOUT_ROWID ";\n"
   // create views to glue together some of the above tables, for webapi D queries
-  "create view if not exists " BUILDIDS "_query_d as \n"
+  // NB: _query_d2 and _query_e2 were added to replace _query_d and _query_e
+  // without updating BUILDIDS.  They can be renamed back the next time BUILDIDS
+  // is updated.
+  "create view if not exists " BUILDIDS "_query_d2 as \n"
   "select\n"
-  "b.hex as buildid, n.mtime, 'F' as sourcetype, f0.name as source0, n.mtime as mtime, null as source1\n"
+  "b.hex as buildid, 'F' as sourcetype, n.file as id0, f0.name as source0, n.mtime as mtime, null as id1, null as source1\n"
   "from " BUILDIDS "_buildids b, " BUILDIDS "_files_v f0, " BUILDIDS "_f_de n\n"
   "where b.id = n.buildid and f0.id = n.file and n.debuginfo_p = 1\n"
   "union all select\n"
-  "b.hex as buildid, n.mtime, 'R' as sourcetype, f0.name as source0, n.mtime as mtime, f1.name as source1\n"
+  "b.hex as buildid, 'R' as sourcetype, n.file as id0, f0.name as source0, n.mtime as mtime, n.content as id1, f1.name as source1\n"
   "from " BUILDIDS "_buildids b, " BUILDIDS "_files_v f0, " BUILDIDS "_files_v f1, " BUILDIDS "_r_de n\n"
   "where b.id = n.buildid and f0.id = n.file and f1.id = n.content and n.debuginfo_p = 1\n"
   ";"
  // ... and for E queries
-  "create view if not exists " BUILDIDS "_query_e as \n"
+  "create view if not exists " BUILDIDS "_query_e2 as \n"
   "select\n"
-  "b.hex as buildid, n.mtime, 'F' as sourcetype, f0.name as source0, n.mtime as mtime, null as source1\n"
+  "b.hex as buildid, 'F' as sourcetype, n.file as id0, f0.name as source0, n.mtime as mtime, null as id1, null as source1\n"
   "from " BUILDIDS "_buildids b, " BUILDIDS "_files_v f0, " BUILDIDS "_f_de n\n"
   "where b.id = n.buildid and f0.id = n.file and n.executable_p = 1\n"
   "union all select\n"
-  "b.hex as buildid, n.mtime, 'R' as sourcetype, f0.name as source0, n.mtime as mtime, f1.name as source1\n"
+  "b.hex as buildid, 'R' as sourcetype, n.file as id0, f0.name as source0, n.mtime as mtime, n.content as id1, f1.name as source1\n"
   "from " BUILDIDS "_buildids b, " BUILDIDS "_files_v f0, " BUILDIDS "_files_v f1, " BUILDIDS "_r_de n\n"
   "where b.id = n.buildid and f0.id = n.file and f1.id = n.content and n.executable_p = 1\n"
   ";"
@@ -2557,7 +2571,7 @@ handle_buildid (MHD_Connection* conn,
   if (atype_code == "D")
     {
       pp = new sqlite_ps (thisdb, "mhd-query-d",
-                          "select mtime, sourcetype, source0, source1 from " BUILDIDS "_query_d where buildid = ? "
+                          "select mtime, sourcetype, sour
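For a sense of how the new table is meant to be consumed at request time, here is a sketch using the raw sqlite3 C API (illustration only: debuginfod itself goes through its sqlite_ps wrapper, and "buildidsN" below stands in for the schema prefix that the BUILDIDS macro pastes in):

    #include <sqlite3.h>
    #include <cstdint>

    // Look up the seekable-extraction parameters for one (archive, entry)
    // pair.  Returns true and fills the outputs if the row exists.
    static bool
    lookup_seekable (sqlite3 *db, int64_t file_id, int64_t content_id,
                     int64_t *size, int64_t *offset, int64_t *mtime)
    {
      sqlite3_stmt *stmt;
      const char *sql =
        "select size, offset, mtime from buildidsN_r_seekable "
        "where file = ? and content = ? and type = 'xz'";
      if (sqlite3_prepare_v2 (db, sql, -1, &stmt, nullptr) != SQLITE_OK)
        return false;
      sqlite3_bind_int64 (stmt, 1, file_id);
      sqlite3_bind_int64 (stmt, 2, content_id);
      bool found = sqlite3_step (stmt) == SQLITE_ROW;
      if (found)
        {
          *size = sqlite3_column_int64 (stmt, 0);
          *offset = sqlite3_column_int64 (stmt, 1);
          *mtime = sqlite3_column_int64 (stmt, 2);
        }
      sqlite3_finalize (stmt);
      return found;
    }

The id0/id1 columns added to the views are what supply file_id and content_id here.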
[PATCH v3 5/7] debuginfod: optimize extraction from seekable xz archives
From: Omar Sandoval

The kernel debuginfo packages on Fedora, Debian, and Ubuntu, and many of their downstreams, are all compressed with xz in multi-threaded mode, which allows random access. We can use this to bypass the full archive extraction and dramatically speed up kernel debuginfo requests (from ~50 seconds in the worst case to < 0.25 seconds).

This works because multi-threaded xz compression splits up the stream into many independently compressed blocks. The stream ends with an index of blocks. So, to seek to an offset, we find the block containing that offset in the index and then decompress and throw away data until we reach the offset within the block. We can then decompress the desired amount of data, possibly from subsequent blocks. There's no high-level API in liblzma to do this, but we can do it by stitching together a few low-level APIs.

We need to pass down the file ids, then look up the size, uncompressed offset, and mtime in the _r_seekable table. Note that this table is not yet populated, so this commit has no functional change on its own.

Signed-off-by: Omar Sandoval
---
 configure.ac              |   5 +
 debuginfod/Makefile.am    |   2 +-
 debuginfod/debuginfod.cxx | 456 +-
 3 files changed, 457 insertions(+), 6 deletions(-)

diff --git a/configure.ac b/configure.ac
index 24e68d94..9c5f7e51 100644
--- a/configure.ac
+++ b/configure.ac
@@ -441,8 +441,13 @@ eu_ZIPLIB(bzlib,BZLIB,bz2,BZ2_bzdopen,bzip2)
 # We need this since bzip2 doesn't have a pkgconfig file.
 BZ2_LIB="$LIBS"
 AC_SUBST([BZ2_LIB])
+save_LIBS="$LIBS"
+LIBS=
 eu_ZIPLIB(lzma,LZMA,lzma,lzma_auto_decoder,[LZMA (xz)])
+lzma_LIBS="$LIBS"
+LIBS="$lzma_LIBS $save_LIBS"
 AS_IF([test "x$with_lzma" = xyes], [LIBLZMA="liblzma"], [LIBLZMA=""])
+AC_SUBST([lzma_LIBS])
 AC_SUBST([LIBLZMA])
 eu_ZIPLIB(zstd,ZSTD,zstd,ZSTD_decompress,[ZSTD (zst)])
 AS_IF([test "x$with_zstd" = xyes], [LIBZSTD="libzstd"], [LIBLZSTD=""])
diff --git a/debuginfod/Makefile.am b/debuginfod/Makefile.am
index b74e3673..e199dc0c 100644
--- a/debuginfod/Makefile.am
+++ b/debuginfod/Makefile.am
@@ -70,7 +70,7 @@ bin_PROGRAMS += debuginfod-find
 endif
 debuginfod_SOURCES = debuginfod.cxx
-debuginfod_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS) $(rpm_LIBS) $(jsonc_LIBS) $(libcurl_LIBS) -lpthread -ldl
+debuginfod_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS) $(rpm_LIBS) $(jsonc_LIBS) $(libcurl_LIBS) $(lzma_LIBS) -lpthread -ldl
 debuginfod_find_SOURCES = debuginfod-find.c
 debuginfod_find_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(jsonc_LIBS)
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index b3d80090..cf7f48ab 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -63,6 +63,10 @@ extern "C" {
 #undef __attribute__ /* glibc bug - rhbz 1763325 */
 #endif
+#ifdef USE_LZMA
+#include <lzma.h>
+#endif
+
 #include
 #include
 #include
@@ -1961,6 +1965,385 @@ handle_buildid_f_match (bool internal_req_t,
   return r;
 }
+
+#ifdef USE_LZMA
+struct lzma_exception: public reportable_exception
+{
+  lzma_exception(int rc, const string& msg):
+    // liblzma doesn't have a lzma_ret -> string conversion function, so just
+    // report the value.
+    reportable_exception(string ("lzma error: ") + msg + ": error " + to_string(rc)) {
+      inc_metric("error_count","lzma",to_string(rc));
+  }
+};
+
+// Neither RPM nor deb files support seeking to a specific file in the package.
+// Instead, to extract a specific file, we normally need to read the archive
+// sequentially until we find the file.  This is very slow for files at the end
+// of a large package with lots of files, like kernel debuginfo.
+//
+// However, if the compression format used in the archive supports seeking, we
+// can accelerate this.  As of July 2024, xz is the only widely-used format that
+// supports seeking, and usually only in multi-threaded mode.  Luckily, the
+// kernel-debuginfo package in Fedora and its downstreams, and the
+// linux-image-*-dbg package in Debian and its downstreams, all happen to use
+// this.
+//
+// The xz format [1] ends with an index of independently compressed blocks in
+// the stream.  In RPM and deb files, the xz stream is the last thing in the
+// file, so we assume that the xz Stream Footer is at the end of the package
+// file and do everything relative to that.  For each file in the archive, we
+// remember the size and offset of the file data in the uncompressed xz stream,
+// then we use the index to seek to that offset when we need that file.
+//
+// 1: https://xz.tukaani.org/format/xz-file-format.txt
+
+// Read the Index at the end of an xz file.
+static lzma_index*
+read_xz_index (int fd)
+{
+  off_t footer_pos = -LZMA_STREAM_HEADER_SIZE;
+  if (lseek (fd, footer
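The overall shape of reading the Index from the end of an xz file with liblzma is roughly the following standalone sketch (an illustration under the same assumptions as the comment above, not this patch's exact code; it slurps the whole Index into memory and keeps error handling minimal):

    #include <lzma.h>
    #include <unistd.h>
    #include <cstdint>
    #include <vector>

    // Sketch: decode the Stream Footer at the end of the file, then decode
    // the Index that sits immediately before it.  Returns NULL on failure.
    static lzma_index *
    read_xz_index_sketch (int fd)
    {
      off_t file_size = lseek (fd, 0, SEEK_END);
      uint8_t footer[LZMA_STREAM_HEADER_SIZE];
      if (pread (fd, footer, sizeof footer, file_size - LZMA_STREAM_HEADER_SIZE)
          != (ssize_t) sizeof footer)
        return NULL;

      lzma_stream_flags flags;
      if (lzma_stream_footer_decode (&flags, footer) != LZMA_OK)
        return NULL;

      // The footer records how far back the Index starts (backward_size).
      off_t index_pos = file_size - LZMA_STREAM_HEADER_SIZE - flags.backward_size;
      std::vector<uint8_t> buf (flags.backward_size);
      if (pread (fd, buf.data (), buf.size (), index_pos)
          != (ssize_t) buf.size ())
        return NULL;

      lzma_index *index = NULL;
      uint64_t memlimit = UINT64_MAX;
      size_t in_pos = 0;
      if (lzma_index_buffer_decode (&index, &memlimit, NULL,
                                    buf.data (), &in_pos, buf.size ())
          != LZMA_OK)
        return NULL;
      return index;  // caller frees with lzma_index_end (index, NULL)
    }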
[PATCH v3 7/7] debuginfod: populate _r_seekable on request
From: Omar Sandoval

Since the schema change adding _r_seekable was done in a backward compatible way, seekable archives that were previously scanned will not be in _r_seekable. Whenever an archive is going to be extracted to satisfy a request, check if it is seekable. If so, populate _r_seekable while extracting it so that future requests use the optimized path.

The next time that BUILDIDS is bumped, all archives will be checked at scan time. At that point, checking again will be unnecessary and this commit (including the test case modification) can be reverted.

Signed-off-by: Omar Sandoval
---
 debuginfod/debuginfod.cxx        | 76 +---
 tests/run-debuginfod-seekable.sh | 45 +++
 2 files changed, 115 insertions(+), 6 deletions(-)

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 677eca30..d8a02fb5 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -2740,6 +2740,7 @@ handle_buildid_r_match (bool internal_req_p,
     }
   // no match ... look for a seekable entry
+  bool populate_seekable = ! passive_p;
   unique_ptr<sqlite_ps> pp (new sqlite_ps (internal_req_p ? db : dbq,
                                            "rpm-seekable-query",
                                            "select type, size, offset, mtime from " BUILDIDS "_r_seekable "
@@ -2749,6 +2750,9 @@
     {
       if (rc != SQLITE_ROW)
         throw sqlite_exception(rc, "step");
+      // if we found a match in _r_seekable but we fail to extract it, don't
+      // bother populating it again
+      populate_seekable = false;
       const char* seekable_type = (const char*) sqlite3_column_text (*pp, 0);
       if (seekable_type != NULL && strcmp (seekable_type, "xz") == 0)
         {
@@ -2840,16 +2844,39 @@
       throw archive_exception(a, "cannot open archive from pipe");
     }
-  // archive traversal is in three stages, no, four stages:
-  // 1) skip entries whose names do not match the requested one
-  // 2) extract the matching entry name (set r = result)
-  // 3) extract some number of prefetched entries (just into fdcache)
-  // 4) abort any further processing
+  // If the archive was scanned in a version without _r_seekable, then we may
+  // need to populate _r_seekable now.  This can be removed the next time
+  // BUILDIDS is updated.
+  if (populate_seekable)
+    {
+      populate_seekable = is_seekable_archive (b_source0, a);
+      if (populate_seekable)
+        {
+          // NB: the names are already interned
+          pp.reset(new sqlite_ps (db, "rpm-seekable-insert2",
+                                  "insert or ignore into " BUILDIDS "_r_seekable (file, content, type, size, offset, mtime) "
+                                  "values (?, "
+                                  "(select id from " BUILDIDS "_files "
+                                  "where dirname = (select id from " BUILDIDS "_fileparts where name = ?) "
+                                  "and basename = (select id from " BUILDIDS "_fileparts where name = ?) "
+                                  "), 'xz', ?, ?, ?)"));
+        }
+    }
+
+  // archive traversal is in five stages:
+  // 1) before we find a matching entry, insert it into _r_seekable if needed or
+  //    skip it otherwise
+  // 2) extract the matching entry (set r = result).  Also insert it into
+  //    _r_seekable if needed
+  // 3) extract some number of prefetched entries (just into fdcache).  Also
+  //    insert them into _r_seekable if needed
+  // 4) if needed, insert all of the remaining entries into _r_seekable
+  // 5) abort any further processing
   struct MHD_Response* r = 0;            // will set in stage 2
   unsigned prefetch_count = internal_req_p ?
     0 : fdcache_prefetch;    // will decrement in stage 3
-  while(r == 0 || prefetch_count > 0) // stage 1, 2, or 3
+  while(r == 0 || prefetch_count > 0 || populate_seekable) // stage 1-4
     {
       if (interrupted)
         break;
@@ -2863,6 +2890,43 @@ handle_buildid_r_match (bool internal_req_p,
         continue;
       string fn = canonicalized_archive_entry_pathname (e);
+
+      if (populate_seekable)
+        {
+          string dn, bn;
+          size_t slash = fn.rfind('/');
+          if (slash == std::string::npos) {
+            dn = "";
+            bn = fn;
+          } else {
+            dn = fn.substr(0, slash);
+            bn = fn.substr(slash + 1);
+          }
+
+          int64_t seekable_size = archive_entry_size (e);
+          int64_t seekable_offset = archive_filter_bytes (a, 0);
+          time_t seekable_mtime = archive_entry_mtime (e);
+
+          pp->reset();
+          pp->bind(1, b_id0);
+          pp->bind(2, dn);
+          pp->bind(3, bn);
+          pp->bind(4, seekable_size);
+          pp->bind(5, seekable_offset);
+          pp->bind(6, seekable_mtime);
+          rc = pp->step();
+
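The values inserted above come straight from libarchive: after archive_read_next_header() returns, archive_filter_bytes(a, 0) reports how many uncompressed bytes the format reader has consumed, which is where the entry's data begins. A standalone illustration of that (a sketch, not this patch's code):

    #include <archive.h>
    #include <archive_entry.h>
    #include <cstdint>
    #include <cstdio>

    // Sketch: walk an archive and print each entry's size and its offset in
    // the *uncompressed* stream, as reported by filter 0.  These are exactly
    // the kinds of values recorded in _r_seekable.
    static void
    print_entry_offsets (struct archive *a)
    {
      struct archive_entry *e;
      while (archive_read_next_header (a, &e) == ARCHIVE_OK)
        {
          // After the header is read, the byte count of filter 0 (the
          // decompressed data fed to the format reader) is the offset where
          // this entry's data begins.
          printf ("%s size=%jd offset=%jd\n",
                  archive_entry_pathname (e),
                  (intmax_t) archive_entry_size (e),
                  (intmax_t) archive_filter_bytes (a, 0));
          archive_read_data_skip (a);
        }
    }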
[PATCH v3 6/7] debuginfod: populate _r_seekable on scan
From: Omar Sandoval

Whenever a new archive is scanned, check if it is seekable with a little liblzma magic, and populate _r_seekable if so. With this, newly scanned seekable archives will use the optimized extraction path added in the previous commit.

Also add a test case using some artificial packages.

Signed-off-by: Omar Sandoval
---
 debuginfod/debuginfod.cxx                     | 150 +-
 tests/Makefile.am                             |   4 +-
 ...pressme-seekable-xz-dbgsym_1.0-1_amd64.deb | Bin 0 -> 6288 bytes
 ...compressme-seekable-xz_1.0-1.debian.tar.xz | Bin 0 -> 1440 bytes
 .../compressme-seekable-xz_1.0-1.dsc          |  19 +++
 .../compressme-seekable-xz_1.0-1_amd64.deb    | Bin 0 -> 6208 bytes
 .../compressme-seekable-xz_1.0.orig.tar.xz    | Bin 0 -> 7160 bytes
 .../compressme-seekable-xz-1.0-1.src.rpm      | Bin 0 -> 15880 bytes
 .../compressme-seekable-xz-1.0-1.x86_64.rpm   | Bin 0 -> 31873 bytes
 ...sme-seekable-xz-debuginfo-1.0-1.x86_64.rpm | Bin 0 -> 21917 bytes
 ...e-seekable-xz-debugsource-1.0-1.x86_64.rpm | Bin 0 -> 7961 bytes
 tests/run-debuginfod-seekable.sh              | 141
 12 files changed, 309 insertions(+), 5 deletions(-)
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz-dbgsym_1.0-1_amd64.deb
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0-1.debian.tar.xz
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0-1.dsc
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0-1_amd64.deb
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0.orig.tar.xz
 create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-1.0-1.src.rpm
 create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-1.0-1.x86_64.rpm
 create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-debuginfo-1.0-1.x86_64.rpm
 create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-debugsource-1.0-1.x86_64.rpm
 create mode 100755 tests/run-debuginfod-seekable.sh

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index cf7f48ab..677eca30 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -1998,6 +1998,109 @@ struct lzma_exception: public reportable_exception
 //
 // 1: https://xz.tukaani.org/format/xz-file-format.txt
+// Return whether an archive supports seeking.
+static bool
+is_seekable_archive (const string& rps, struct archive* a)
+{
+  // Only xz supports seeking.
+  if (archive_filter_code (a, 0) != ARCHIVE_FILTER_XZ)
+    return false;
+
+  int fd = open (rps.c_str(), O_RDONLY);
+  if (fd < 0)
+    return false;
+  defer_dtor<int,int> fd_closer (fd, close);
+
+  // Seek to the xz Stream Footer.  We assume that it's the last thing in the
+  // file, which is true for RPM and deb files.
+  off_t footer_pos = -LZMA_STREAM_HEADER_SIZE;
+  if (lseek (fd, footer_pos, SEEK_END) == -1)
+    return false;
+
+  // Decode the Stream Footer.
+  uint8_t footer[LZMA_STREAM_HEADER_SIZE];
+  size_t footer_read = 0;
+  while (footer_read < sizeof (footer))
+    {
+      ssize_t bytes_read = read (fd, footer + footer_read,
+                                 sizeof (footer) - footer_read);
+      if (bytes_read < 0)
+        {
+          if (errno == EINTR)
+            continue;
+          return false;
+        }
+      if (bytes_read == 0)
+        return false;
+      footer_read += bytes_read;
+    }
+
+  lzma_stream_flags stream_flags;
+  lzma_ret ret = lzma_stream_footer_decode (&stream_flags, footer);
+  if (ret != LZMA_OK)
+    return false;
+
+  // Seek to the xz Index.
+  if (lseek (fd, footer_pos - stream_flags.backward_size, SEEK_END) == -1)
+    return false;
+
+  // Decode the Number of Records in the Index.
+  // liblzma doesn't have an API for this if you don't want to decode the
+  // whole Index, so we have to do it ourselves.
+  //
+  // We need 1 byte for the Index Indicator plus 1-9 bytes for the
+  // variable-length integer Number of Records.
+  uint8_t index[10];
+  size_t index_read = 0;
+  while (index_read == 0) {
+    ssize_t bytes_read = read (fd, index, sizeof (index));
+    if (bytes_read < 0)
+      {
+        if (errno == EINTR)
+          continue;
+        return false;
+      }
+    if (bytes_read == 0)
+      return false;
+    index_read += bytes_read;
+  }
+  // The Index Indicator must be 0.
+  if (index[0] != 0)
+    return false;
+
+  lzma_vli num_records;
+  size_t pos = 0;
+  size_t in_pos = 1;
+  while (true)
+    {
+      if (in_pos >= index_read)
+        {
+          ssize_t bytes_read = read (fd, index, sizeof (index));
+          if (bytes_read < 0)
+            {
+              if (errno == EINTR)
+                continue;
+              return false;
+            }
+          if (bytes_read == 0)
+            return false;
+          index_read = bytes_read;
+          in_pos = 0;
+
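For reference, the variable-length integer encoding being decoded here is simple: each byte carries 7 bits of the value, least-significant group first, and the 0x80 bit marks that more bytes follow (at most 9 bytes total). A minimal standalone decoder of that encoding (a sketch of the format from reference [1] above, not this patch's code):

    #include <cstddef>
    #include <cstdint>

    // Decode an xz-style VLI from buf[0..len).  Returns the number of bytes
    // consumed, or 0 on error (truncated or over-long encoding).
    static size_t
    decode_xz_vli (const uint8_t *buf, size_t len, uint64_t *value)
    {
      *value = 0;
      for (size_t i = 0; i < len && i < 9; i++)
        {
          *value |= (uint64_t) (buf[i] & 0x7f) << (7 * i);
          if ((buf[i] & 0x80) == 0)
            return i + 1;
        }
      return 0;
    }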
[PATCH v3 1/7] debuginfod: fix skipping source file
From: Omar Sandoval

dwarf_extract_source_paths explicitly skips source files that equal "<built-in>", but dwarf_filesrc may return a path like "dir/<built-in>". Check for and skip that case, too. In particular, the test debuginfod RPMs have paths like this.

However, the test cases didn't catch this because they have a bug, too: they follow symlinks, which results in double-counting every file. Fix that, too.

Signed-off-by: Omar Sandoval
---
 debuginfod/debuginfod.cxx             | 3 ++-
 tests/run-debuginfod-archive-groom.sh | 2 +-
 tests/run-debuginfod-extraction.sh    | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 305edde8..92022f3d 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -3446,7 +3446,8 @@ dwarf_extract_source_paths (Elf *elf, set<string>& debug_sourcefiles)
       if (hat == NULL)
         continue;
-      if (string(hat) == "<built-in>") // gcc intrinsics, don't bother record
+      if (string(hat) == "<built-in>"
+          || string_endswith(hat, "<built-in>")) // gcc intrinsics, don't bother record
         continue;
       string waldo;
diff --git a/tests/run-debuginfod-archive-groom.sh b/tests/run-debuginfod-archive-groom.sh
index e2c394ef..0131158f 100755
--- a/tests/run-debuginfod-archive-groom.sh
+++ b/tests/run-debuginfod-archive-groom.sh
@@ -109,7 +109,7 @@ for i in $newrpms; do
     rpm2cpio ../$i | cpio -ivd;
     cd ..;
 done
-sourcefiles=$(find -name \*\\.debug \
+sourcefiles=$(find -name \*\\.debug -type f \
              | env LD_LIBRARY_PATH=$ldpath xargs \
                ${abs_top_builddir}/src/readelf --debug-dump=decodedline \
              | grep mtime: | wc --lines)
diff --git a/tests/run-debuginfod-extraction.sh b/tests/run-debuginfod-extraction.sh
index da6b25cf..f49dc6f6 100755
--- a/tests/run-debuginfod-extraction.sh
+++ b/tests/run-debuginfod-extraction.sh
@@ -94,7 +94,7 @@ for i in $newrpms; do
     rpm2cpio ../$i | cpio -ivd;
     cd ..;
 done
-sourcefiles=$(find -name \*\\.debug \
+sourcefiles=$(find -name \*\\.debug -type f \
              | env LD_LIBRARY_PATH=$ldpath xargs \
                ${abs_top_builddir}/src/readelf --debug-dump=decodedline \
              | grep mtime: | wc --lines)
--
2.45.2
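For context on where such paths surface, here is a sketch of listing a CU's line-table source files with libdw (illustration only; a real caller also needs a Dwarf handle and CU iteration to obtain the CU DIE):

    #include <elfutils/libdw.h>
    #include <cstdio>
    #include <cstring>

    // Sketch: list the line-table source files of one CU and flag the
    // "<built-in>" pseudo-files, which may come back with a directory
    // prefix, e.g. "dir/<built-in>".
    static void
    list_sources (Dwarf_Die *cudie)
    {
      Dwarf_Files *files;
      size_t nfiles;
      if (dwarf_getsrcfiles (cudie, &files, &nfiles) != 0)
        return;
      for (size_t i = 0; i < nfiles; i++)
        {
          const char *name = dwarf_filesrc (files, i, NULL, NULL);
          if (name == NULL)
            continue;
          size_t len = strlen (name);
          bool builtin = strcmp (name, "<built-in>") == 0
            || (len >= 11 && strcmp (name + len - 11, "/<built-in>") == 0);
          printf ("%s%s\n", name, builtin ? " (skipped)" : "");
        }
    }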
[PATCH v3 3/7] debuginfod: factor out common code for responding from an archive
From: Omar Sandoval

handle_buildid_r_match has two very similar branches where it optionally extracts a section and then creates a microhttpd response. In preparation for adding a third one, factor it out into a function.

Signed-off-by: Omar Sandoval
---
 debuginfod/debuginfod.cxx | 213 +-
 1 file changed, 96 insertions(+), 117 deletions(-)

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 92022f3d..24702c23 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -1965,6 +1965,81 @@ string canonicalized_archive_entry_pathname(struct archive_entry *e)
 }
+
+// NB: takes ownership of, and may reassign, fd.
+static struct MHD_Response*
+create_buildid_r_response (int64_t b_mtime0,
+                           const string& b_source0,
+                           const string& b_source1,
+                           const string& section,
+                           const string& ima_sig,
+                           const char* tmppath,
+                           int& fd,
+                           off_t size,
+                           time_t mtime,
+                           const string& metric,
+                           const struct timespec& extract_begin)
+{
+  if (tmppath != NULL)
+    {
+      struct timespec extract_end;
+      clock_gettime (CLOCK_MONOTONIC, &extract_end);
+      double extract_time = (extract_end.tv_sec - extract_begin.tv_sec)
+        + (extract_end.tv_nsec - extract_begin.tv_nsec)/1.e9;
+      fdcache.intern(b_source0, b_source1, tmppath, size, true, extract_time);
+    }
+
+  if (!section.empty ())
+    {
+      int scn_fd = extract_section (fd, b_mtime0,
+                                    b_source0 + ":" + b_source1,
+                                    section, extract_begin);
+      close (fd);
+      if (scn_fd >= 0)
+        fd = scn_fd;
+      else
+        {
+          if (verbose)
+            obatched (clog) << "cannot find section " << section
+                            << " for archive " << b_source0
+                            << " file " << b_source1 << endl;
+          return 0;
+        }
+
+      struct stat fs;
+      if (fstat (fd, &fs) < 0)
+        {
+          close (fd);
+          throw libc_exception (errno,
+            string ("fstat ") + b_source0 + string (" ") + section);
+        }
+      size = fs.st_size;
+    }
+
+  struct MHD_Response* r = MHD_create_response_from_fd (size, fd);
+  if (r == 0)
+    {
+      if (verbose)
+        obatched(clog) << "cannot create fd-response for " << b_source0 << endl;
+      close(fd);
+    }
+  else
+    {
+      inc_metric ("http_responses_total","result",metric);
+      add_mhd_response_header (r, "Content-Type", "application/octet-stream");
+      add_mhd_response_header (r, "X-DEBUGINFOD-SIZE", to_string(size).c_str());
+      add_mhd_response_header (r, "X-DEBUGINFOD-ARCHIVE", b_source0.c_str());
+      add_mhd_response_header (r, "X-DEBUGINFOD-FILE", b_source1.c_str());
+      if(!ima_sig.empty()) add_mhd_response_header(r, "X-DEBUGINFOD-IMASIGNATURE", ima_sig.c_str());
+      add_mhd_last_modified (r, mtime);
+      if (verbose > 1)
+        obatched(clog) << "serving " << metric << " " << b_source0
+                       << " file " << b_source1
+                       << " section=" << section
+                       << " IMA signature=" << ima_sig << endl;
+      /* libmicrohttpd will close fd.
+         */
+    }
+  return r;
+}

 static struct MHD_Response*
 handle_buildid_r_match (bool internal_req_p,
@@ -2142,57 +2217,15 @@ handle_buildid_r_match (bool internal_req_p,
           break; // branch out of if "loop", to try new libarchive fetch attempt
         }
-      if (!section.empty ())
-        {
-          int scn_fd = extract_section (fd, fs.st_mtime,
-                                        b_source0 + ":" + b_source1,
-                                        section, extract_begin);
-          close (fd);
-          if (scn_fd >= 0)
-            fd = scn_fd;
-          else
-            {
-              if (verbose)
-                obatched (clog) << "cannot find section " << section
-                                << " for archive " << b_source0
-                                << " file " << b_source1 << endl;
-              return 0;
-            }
-
-          rc = fstat(fd, &fs);
-          if (rc < 0)
-            {
-              close (fd);
-              throw libc_exception (errno,
-                string ("fstat archive ") + b_source0 + string (" file ") + b_source1
-                + string (" section ") + section);
-            }
-        }
-
-      struct MHD_Response* r = MHD_create_response_from_fd (fs.st_size, fd);
+      struct MHD_Response* r = create_buildid_r_response (b_mtime, b_source0,
+                                                          b_source1, section,
+                                                          ima_sig, NULL, fd,
+                                                          f
[PATCH v3 2/7] tests/run-debuginfod-fd-prefetch-caches.sh: disable fdcache limit check
From: Omar Sandoval

Since commit acd9525e93d7 ("PR31265 - rework debuginfod archive-extract fdcache"), the fdcache limit is only applied when a new file is interned and it has been at least 10 seconds since the limit was last applied. This means that the fdcache can go over the limit temporarily.

run-debuginfod-fd-prefetch-caches.sh happens to avoid tripping over this because of lucky sizes of the files used in the test. However, adding new files for an upcoming test exposed this failure. Disable this part of the test for now.

Signed-off-by: Omar Sandoval
---
 tests/run-debuginfod-fd-prefetch-caches.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/run-debuginfod-fd-prefetch-caches.sh b/tests/run-debuginfod-fd-prefetch-caches.sh
index 3db78ade..90730555 100755
--- a/tests/run-debuginfod-fd-prefetch-caches.sh
+++ b/tests/run-debuginfod-fd-prefetch-caches.sh
@@ -99,6 +99,9 @@ kill $PID1
 wait $PID1
 PID1=0
+# Since we now only limit the fd cache every 10 seconds, it can temporarily go
+# over the limit. That makes this part of the test unreliable.
+if false; then
 #
 # Test mb limit on fd cache
 #
@@ -148,3 +151,4 @@ kill $PID1
 wait $PID1
 PID1=0
 exit 0
+fi
--
2.45.2
Re: [PATCH 7/9 v2] libdw: Make libdw_findcu thread-safe
Hi,

On Wed, 2024-07-17 at 18:34 -0400, Aaron Merey wrote:
> From: Heather McIntyre
>
> * libdw/libdw_findcu.c (__libdw_findcu): Use eu_tfind
> and dwarf_lock
> (__libdw_intern_next_unit): Use per-Dwarf_CU locks.
>
> Signed-off-by: Heather S. McIntyre
> Signed-off-by: Aaron Merey
> Signed-off-by: Mark Wielaard
>
> ---
> v2 changes:
> Use per-Dwarf_CU lock instead of a global lock.

Are you sure this description and the ChangeLog entry are correct? This
patch doesn't contain a change to use eu_tfind (there is already one).
And it seems to use a per-Dwarf lock, not a per-Dwarf_CU lock.

> libdw/libdw_findcu.c | 47 ++++++++++++++++++++++++++++++-----------------
> 1 file changed, 30 insertions(+), 17 deletions(-)
>
> diff --git a/libdw/libdw_findcu.c b/libdw/libdw_findcu.c
> index 72cf261c..8acff448 100644
> --- a/libdw/libdw_findcu.c
> +++ b/libdw/libdw_findcu.c
> @@ -177,6 +177,8 @@ __libdw_intern_next_unit (Dwarf *dbg, bool debug_types)
>    newp->startp = data->d_buf + newp->start;
>    newp->endp = data->d_buf + newp->end;
>    eu_search_tree_init (&newp->locs_tree);
> +  rwlock_init (newp->abbrev_lock);
> +  rwlock_init (newp->split_lock);
>
>    /* v4 debug type units have version == 4 and unit_type == DW_UT_type. */
>    if (debug_types)

Neither of these locks are used in the rest of the patch.

> @@ -243,27 +245,38 @@ __libdw_findcu (Dwarf *dbg, Dwarf_Off start, bool v4_debug_types)
>    /* Maybe we already know that CU. */
>    struct Dwarf_CU fake = { .start = start, .end = 0 };
>    struct Dwarf_CU **found = eu_tfind (&fake, tree, findcu_cb);
> +  struct Dwarf_CU *result = NULL;
>    if (found != NULL)
>      return *found;
>
> -  if (start < *next_offset)
> -    {
> -      __libdw_seterrno (DWARF_E_INVALID_DWARF);
> -      return NULL;
> -    }
> +  rwlock_wrlock (dbg->dwarf_lock);
>
> -  /* No. Then read more CUs. */
> -  while (1)
> -    {
> -      struct Dwarf_CU *newp = __libdw_intern_next_unit (dbg, v4_debug_types);
> -      if (newp == NULL)
> -        return NULL;
> -
> -      /* Is this the one we are looking for? */
> -      if (start < *next_offset || start == newp->start)
> -        return newp;
> -    }
> -  /* NOTREACHED */
> +  if (start < *next_offset)
> +    __libdw_seterrno (DWARF_E_INVALID_DWARF);
> +  else
> +    {
> +      /* No. Then read more CUs. */
> +      while (1)
> +        {
> +          struct Dwarf_CU *newp = __libdw_intern_next_unit (dbg,
> +                                                            v4_debug_types);
> +          if (newp == NULL)
> +            {
> +              result = NULL;
> +              break;
> +            }
> +
> +          /* Is this the one we are looking for? */
> +          if (start < *next_offset || start == newp->start)
> +            {
> +              result = newp;
> +              break;
> +            }
> +        }
> +    }
> +
> +  rwlock_unlock (dbg->dwarf_lock);
> +  return result;
> }
>
> struct Dwarf_CU *

This uses the global Dwarf structure lock, not a per Dwarf_CU lock.

Cheers,

Mark
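For readers following the locking discussion, the pattern under review is an optimistic lookup followed by interning under a single per-Dwarf write lock; roughly (a condensed illustration with stand-in names, not the actual libdw code):

    #include <pthread.h>

    struct cu;
    // Stand-ins for eu_tfind on the CU search tree and for
    // __libdw_intern_next_unit; not the real libdw interfaces.
    extern struct cu *lookup_cu (unsigned long start);
    // Reads and interns CUs until one covering `start` appears, or
    // returns NULL on real failure (end of data, invalid DWARF).
    extern struct cu *intern_cus_until (unsigned long start);

    static pthread_rwlock_t dwarf_lock = PTHREAD_RWLOCK_INITIALIZER;

    struct cu *
    find_cu (unsigned long start)
    {
      struct cu *result = lookup_cu (start);  // fast path, already interned
      if (result != NULL)
        return result;
      pthread_rwlock_wrlock (&dwarf_lock);
      // Re-checking lookup_cu here catches CUs interned by a racing thread
      // between the lookup above and acquiring the lock.
      result = lookup_cu (start);
      if (result == NULL)
        result = intern_cus_until (start);
      pthread_rwlock_unlock (&dwarf_lock);
      return result;
    }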
Re: [PATCH 3/9 v2] lib: Add eu_tsearch, eu_tfind, eu_tdelete and eu_tdestroy
Hi,

On Wed, 2024-07-17 at 18:34 -0400, Aaron Merey wrote:
> From: Heather McIntyre
>
> Add new struct search_tree to hold tree root and lock. Add new eu_t*
> functions for ensuring synchronized tree access.
>
> Replace tsearch, tfind, etc with eu_t* equivalents.
>
> Move the rwlock_* macros out of eu-config.h and into a new header file
> locks.h. This was done so that the rwlock_* macros can be included
> in libdwP.h without having to also include the rest of eu-config.h.
>
> Signed-off-by: Heather S. McIntyre
> Signed-off-by: Aaron Merey
> Signed-off-by: Mark Wielaard
>
> v2 changes:
>
> This patch replaces v1 03/16 and 14/16.

In this case I missed a ChangeLog entry which would have helped knowing
which changes were deliberate.

In general this looks good. Mostly the same "style" comment that local
includes should use #include "locks.h", not <locks.h> (we aren't totally
consistent here and the <> variant does work).

Besides the new search_tree (which includes a lock object) it does seem
to introduce various locks that aren't used in the rest of code. It
would be better to introduce them in a later patch where those are
actually used.

> ---
> lib/Makefile.am               |  5 ++-
> lib/eu-config.h               | 30 +
> lib/eu-search.c               | 85 +++
> lib/eu-search.h               | 64 ++
> lib/locks.h                   | 62 +
> libdw/cfi.h                   |  6 +--
> libdw/cie.c                   | 10 +++--
> libdw/dwarf_begin_elf.c       |  7 +--
> libdw/dwarf_end.c             | 17 +++
> libdw/dwarf_getcfi.c          |  5 ++-
> libdw/dwarf_getlocation.c     | 24 +-
> libdw/dwarf_getmacros.c       |  6 +--
> libdw/dwarf_getsrclines.c     |  8 ++--
> libdw/fde.c                   |  6 +--
> libdw/frame-cache.c           |  8 ++--
> libdw/libdwP.h                | 26 ---
> libdw/libdw_find_split_unit.c | 10 ++---
> libdw/libdw_findcu.c          | 18
> libdwfl/cu.c                  |  8 ++--
> libdwfl/dwfl_module.c         |  4 +-
> libdwfl/libdwflP.h            |  3 +-
> libelf/elf_begin.c            |  2 +
> libelf/elf_end.c              | 13 +++--- 
> libelf/elf_getdata_rawchunk.c | 12 ++---
> libelf/libelfP.h              | 10 +++--
> 25 files changed, 331 insertions(+), 118 deletions(-)
> create mode 100644 lib/eu-search.c
> create mode 100644 lib/eu-search.h
> create mode 100644 lib/locks.h
>
> diff --git a/lib/Makefile.am b/lib/Makefile.am
> index b3bb929f..e324c18d 100644
> --- a/lib/Makefile.am
> +++ b/lib/Makefile.am
> @@ -34,10 +34,11 @@ AM_CPPFLAGS += -I$(srcdir)/../libelf
> noinst_LIBRARIES = libeu.a
>
> libeu_a_SOURCES = xasprintf.c xstrdup.c xstrndup.c xmalloc.c next_prime.c \
> -                  crc32.c crc32_file.c \
> +                  crc32.c crc32_file.c eu-search.c \
>                    color.c error.c printversion.c
>
> noinst_HEADERS = fixedsizehash.h libeu.h system.h dynamicsizehash.h list.h \
>                  eu-config.h color.h printversion.h bpf.h \
> -                atomics.h stdatomic-fbsd.h dynamicsizehash_concurrent.h
> +                atomics.h stdatomic-fbsd.h dynamicsizehash_concurrent.h \
> +                eu-search.h locks.h
> EXTRA_DIST = dynamicsizehash.c dynamicsizehash_concurrent.c

OK.
> diff --git a/lib/eu-config.h b/lib/eu-config.h
> index feb079db..a38d75da 100644
> --- a/lib/eu-config.h
> +++ b/lib/eu-config.h
> @@ -29,35 +29,7 @@
> #ifndef EU_CONFIG_H
> #define EU_CONFIG_H 1
>
> -#ifdef USE_LOCKS
> -# include <pthread.h>
> -# include <assert.h>
> -# define rwlock_define(class,name) class pthread_rwlock_t name
> -# define once_define(class,name) class pthread_once_t name = PTHREAD_ONCE_INIT
> -# define RWLOCK_CALL(call) \
> -  ({ int _err = pthread_rwlock_ ## call; assert_perror (_err); })
> -# define ONCE_CALL(call) \
> -  ({ int _err = pthread_ ## call; assert_perror (_err); })
> -# define rwlock_init(lock) RWLOCK_CALL (init (&lock, NULL))
> -# define rwlock_fini(lock) RWLOCK_CALL (destroy (&lock))
> -# define rwlock_rdlock(lock) RWLOCK_CALL (rdlock (&lock))
> -# define rwlock_wrlock(lock) RWLOCK_CALL (wrlock (&lock))
> -# define rwlock_unlock(lock) RWLOCK_CALL (unlock (&lock))
> -# define once(once_control, init_routine) \
> -  ONCE_CALL (once (&once_control, init_routine))
> -#else
> -/* Eventually we will allow multi-threaded applications to use the
> -   libraries.  Therefore we will add the necessary locking although
> -   the macros used expand to nothing for now. */
> -# define rwlock_define(class,name) class int name
> -# define rwlock_init(lock) ((void) (lock))
> -# define rwlock_fini(lock) ((void) (lock))
> -# define rwlock_rdlock(lock) ((void) (lock))
> -# define rwlock_wrlock(lock) ((void) (lock))
> -# define rwlock_unlock(lock) ((void) (lock))
> -# define once_define(class,name)
> -# define once(once_control, init_routine) init_routine(
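The search_tree idea described in the commit message amounts to pairing a tsearch() root with an rwlock; roughly (field and wrapper names here are guesses for illustration, not the actual lib/eu-search.h definitions):

    #include <search.h>
    #include <pthread.h>

    // A tree root plus the lock that guards it.
    typedef struct
    {
      void *root;                 // tsearch/tfind root
      pthread_rwlock_t lock;      // guards root
    } search_tree;

    // Read-side lookup: multiple readers may search concurrently.
    static void *
    eu_tfind_sketch (const void *key, search_tree *tree,
                     int (*compare) (const void *, const void *))
    {
      pthread_rwlock_rdlock (&tree->lock);
      void *ret = tfind (key, &tree->root, compare);
      pthread_rwlock_unlock (&tree->lock);
      return ret;
    }

    // Write-side insert: exclusive access while the tree is modified.
    static void *
    eu_tsearch_sketch (const void *key, search_tree *tree,
                       int (*compare) (const void *, const void *))
    {
      pthread_rwlock_wrlock (&tree->lock);
      void *ret = tsearch (key, &tree->root, compare);
      pthread_rwlock_unlock (&tree->lock);
      return ret;
    }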
Re: [PATCH v3 0/7] debuginfod: speed up extraction from kernel debuginfo packages by 200x
Hi -

> This is v3 of my patch series optimizing debuginfod for kernel
> debuginfo. v1 is here [7], v2 is here [8]. This version fixes a couple
> of minor bugs and adds test cases. [...]

Thanks, LGTM, running through try-buildbots to make sure.

- FChE
Re: [PATCH v3 0/7] debuginfod: speed up extraction from kernel debuginfo packages by 200x
On Fri, Jul 19, 2024 at 01:34:48PM -0400, Frank Ch. Eigler wrote:
> Hi -
>
> > This is v3 of my patch series optimizing debuginfod for kernel
> > debuginfo. v1 is here [7], v2 is here [8]. This version fixes a couple
> > of minor bugs and adds test cases. [...]
>
> Thanks, LGTM, running through try-buildbots to make sure.

Sorry about the distcheck failures; it looks like I forgot to add the new
test files to EXTRA_DIST. I'll be sure to run distcheck next time. I'll
send v4 shortly.

Thanks,
Omar
[PATCH v4 0/7] debuginfod: speed up extraction from kernel debuginfo packages by 200x
From: Omar Sandoval

This is v4 of my patch series optimizing debuginfod for kernel debuginfo. v1 is here [1], v2 is here [2], v3 is here [3]. The only changes from v3 in this version are fixing a bogus maybe-uninitialized error on the Debian build and adding the new test files to EXTRA_DIST so that make distcheck passes.

Thanks,
Omar

1: https://sourceware.org/pipermail/elfutils-devel/2024q3/007191.html
2: https://sourceware.org/pipermail/elfutils-devel/2024q3/007208.html
3: https://sourceware.org/pipermail/elfutils-devel/2024q3/007243.html

Omar Sandoval (7):
  debuginfod: fix skipping source file
  tests/run-debuginfod-fd-prefetch-caches.sh: disable fdcache limit check
  debuginfod: factor out common code for responding from an archive
  debuginfod: add new table and views for seekable archives
  debuginfod: optimize extraction from seekable xz archives
  debuginfod: populate _r_seekable on scan
  debuginfod: populate _r_seekable on request

 configure.ac                                  |   5 +
 debuginfod/Makefile.am                        |   2 +-
 debuginfod/debuginfod.cxx                     | 923 +++---
 tests/Makefile.am                             |  13 +-
 ...pressme-seekable-xz-dbgsym_1.0-1_amd64.deb | Bin 0 -> 6288 bytes
 ...compressme-seekable-xz_1.0-1.debian.tar.xz | Bin 0 -> 1440 bytes
 .../compressme-seekable-xz_1.0-1.dsc          |  19 +
 .../compressme-seekable-xz_1.0-1_amd64.deb    | Bin 0 -> 6208 bytes
 .../compressme-seekable-xz_1.0.orig.tar.xz    | Bin 0 -> 7160 bytes
 .../compressme-seekable-xz-1.0-1.src.rpm      | Bin 0 -> 15880 bytes
 .../compressme-seekable-xz-1.0-1.x86_64.rpm   | Bin 0 -> 31873 bytes
 ...sme-seekable-xz-debuginfo-1.0-1.x86_64.rpm | Bin 0 -> 21917 bytes
 ...e-seekable-xz-debugsource-1.0-1.x86_64.rpm | Bin 0 -> 7961 bytes
 tests/run-debuginfod-archive-groom.sh         |   2 +-
 tests/run-debuginfod-extraction.sh            |   2 +-
 tests/run-debuginfod-fd-prefetch-caches.sh    |   4 +
 tests/run-debuginfod-seekable.sh              | 186
 17 files changed, 1011 insertions(+), 145 deletions(-)
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz-dbgsym_1.0-1_amd64.deb
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0-1.debian.tar.xz
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0-1.dsc
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0-1_amd64.deb
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0.orig.tar.xz
 create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-1.0-1.src.rpm
 create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-1.0-1.x86_64.rpm
 create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-debuginfo-1.0-1.x86_64.rpm
 create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-debugsource-1.0-1.x86_64.rpm
 create mode 100755 tests/run-debuginfod-seekable.sh
--
2.45.2
[PATCH v4 1/7] debuginfod: fix skipping source file
From: Omar Sandoval

dwarf_extract_source_paths explicitly skips source files that equal "<built-in>", but dwarf_filesrc may return a path like "dir/<built-in>". Check for and skip that case, too. In particular, the test debuginfod RPMs have paths like this.

However, the test cases didn't catch this because they have a bug, too: they follow symlinks, which results in double-counting every file. Fix that, too.

Signed-off-by: Omar Sandoval
---
 debuginfod/debuginfod.cxx             | 3 ++-
 tests/run-debuginfod-archive-groom.sh | 2 +-
 tests/run-debuginfod-extraction.sh    | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 305edde8..92022f3d 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -3446,7 +3446,8 @@ dwarf_extract_source_paths (Elf *elf, set<string>& debug_sourcefiles)
       if (hat == NULL)
         continue;
-      if (string(hat) == "<built-in>") // gcc intrinsics, don't bother record
+      if (string(hat) == "<built-in>"
+          || string_endswith(hat, "<built-in>")) // gcc intrinsics, don't bother record
         continue;
       string waldo;
diff --git a/tests/run-debuginfod-archive-groom.sh b/tests/run-debuginfod-archive-groom.sh
index e2c394ef..0131158f 100755
--- a/tests/run-debuginfod-archive-groom.sh
+++ b/tests/run-debuginfod-archive-groom.sh
@@ -109,7 +109,7 @@ for i in $newrpms; do
     rpm2cpio ../$i | cpio -ivd;
     cd ..;
 done
-sourcefiles=$(find -name \*\\.debug \
+sourcefiles=$(find -name \*\\.debug -type f \
             | env LD_LIBRARY_PATH=$ldpath xargs \
               ${abs_top_builddir}/src/readelf --debug-dump=decodedline \
             | grep mtime: | wc --lines)
diff --git a/tests/run-debuginfod-extraction.sh b/tests/run-debuginfod-extraction.sh
index da6b25cf..f49dc6f6 100755
--- a/tests/run-debuginfod-extraction.sh
+++ b/tests/run-debuginfod-extraction.sh
@@ -94,7 +94,7 @@ for i in $newrpms; do
     rpm2cpio ../$i | cpio -ivd;
     cd ..;
 done
-sourcefiles=$(find -name \*\\.debug \
+sourcefiles=$(find -name \*\\.debug -type f \
             | env LD_LIBRARY_PATH=$ldpath xargs \
               ${abs_top_builddir}/src/readelf --debug-dump=decodedline \
             | grep mtime: | wc --lines)
--
2.45.2
[PATCH v4 2/7] tests/run-debuginfod-fd-prefetch-caches.sh: disable fdcache limit check
From: Omar Sandoval

Since commit acd9525e93d7 ("PR31265 - rework debuginfod archive-extract fdcache"), the fdcache limit is only applied when a new file is interned and it has been at least 10 seconds since the limit was last applied. This means that the fdcache can go over the limit temporarily.

run-debuginfod-fd-prefetch-caches.sh happens to avoid tripping over this because of lucky sizes of the files used in the test. However, adding new files for an upcoming test exposed this failure. Disable this part of the test for now.

Signed-off-by: Omar Sandoval
---
 tests/run-debuginfod-fd-prefetch-caches.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/run-debuginfod-fd-prefetch-caches.sh b/tests/run-debuginfod-fd-prefetch-caches.sh
index 3db78ade..90730555 100755
--- a/tests/run-debuginfod-fd-prefetch-caches.sh
+++ b/tests/run-debuginfod-fd-prefetch-caches.sh
@@ -99,6 +99,9 @@ kill $PID1
 wait $PID1
 PID1=0
+# Since we now only limit the fd cache every 10 seconds, it can temporarily go
+# over the limit. That makes this part of the test unreliable.
+if false; then
 #
 # Test mb limit on fd cache
 #
@@ -148,3 +151,4 @@ kill $PID1
 wait $PID1
 PID1=0
 exit 0
+fi
--
2.45.2
[PATCH v4 4/7] debuginfod: add new table and views for seekable archives
From: Omar Sandoval

In order to extract a file from a seekable archive, we need to know where in the uncompressed archive the file data starts and its size. Additionally, in order to populate the response headers, we need the file modification time (since we won't be able to get it from the archive metadata).

Add a new table, _r_seekable, keyed on the archive file id and entry file id and containing the size, offset, and mtime. It also contains the compression type just in case new seekable formats are supported in the future.

In order to search this table when we get a request, we need the file ids available. Add the ids to the _query_d and _query_e views, and rename them to _query_d2 and _query_e2. This schema change is backward compatible and doesn't require reindexing. _query_d2 and _query_e2 can be renamed back the next time BUILDIDS needs to be bumped.

Before this change, the database for a single kernel debuginfo RPM (kernel-debuginfo-6.9.6-200.fc40.x86_64.rpm) was about 15MB. This change increases that by about 70kB, only a 0.5% increase.

Signed-off-by: Omar Sandoval
---
 debuginfod/debuginfod.cxx | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 24702c23..b3d80090 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -265,25 +265,39 @@ static const char DEBUGINFOD_SQLITE_DDL[] =
   "foreign key (content) references " BUILDIDS "_files(id) on update cascade on delete cascade,\n"
   "primary key (content, file, mtime)\n"
   ") " WITHOUT_ROWID ";\n"
+  "create table if not exists " BUILDIDS "_r_seekable (\n" // seekable rpm contents
+  "file integer not null,\n"
+  "content integer not null,\n"
+  "type text not null,\n"
+  "size integer not null,\n"
+  "offset integer not null,\n"
+  "mtime integer not null,\n"
+  "foreign key (file) references " BUILDIDS "_files(id) on update cascade on delete cascade,\n"
+  "foreign key (content) references " BUILDIDS "_files(id) on update cascade on delete cascade,\n"
+  "primary key (file, content)\n"
+  ") " WITHOUT_ROWID ";\n"
   // create views to glue together some of the above tables, for webapi D queries
-  "create view if not exists " BUILDIDS "_query_d as \n"
+  // NB: _query_d2 and _query_e2 were added to replace _query_d and _query_e
+  // without updating BUILDIDS.  They can be renamed back the next time BUILDIDS
+  // is updated.
+  "create view if not exists " BUILDIDS "_query_d2 as \n"
   "select\n"
-  "b.hex as buildid, n.mtime, 'F' as sourcetype, f0.name as source0, n.mtime as mtime, null as source1\n"
+  "b.hex as buildid, 'F' as sourcetype, n.file as id0, f0.name as source0, n.mtime as mtime, null as id1, null as source1\n"
   "from " BUILDIDS "_buildids b, " BUILDIDS "_files_v f0, " BUILDIDS "_f_de n\n"
   "where b.id = n.buildid and f0.id = n.file and n.debuginfo_p = 1\n"
   "union all select\n"
-  "b.hex as buildid, n.mtime, 'R' as sourcetype, f0.name as source0, n.mtime as mtime, f1.name as source1\n"
+  "b.hex as buildid, 'R' as sourcetype, n.file as id0, f0.name as source0, n.mtime as mtime, n.content as id1, f1.name as source1\n"
   "from " BUILDIDS "_buildids b, " BUILDIDS "_files_v f0, " BUILDIDS "_files_v f1, " BUILDIDS "_r_de n\n"
   "where b.id = n.buildid and f0.id = n.file and f1.id = n.content and n.debuginfo_p = 1\n"
   ";"
  // ... and for E queries
-  "create view if not exists " BUILDIDS "_query_e as \n"
+  "create view if not exists " BUILDIDS "_query_e2 as \n"
   "select\n"
-  "b.hex as buildid, n.mtime, 'F' as sourcetype, f0.name as source0, n.mtime as mtime, null as source1\n"
+  "b.hex as buildid, 'F' as sourcetype, n.file as id0, f0.name as source0, n.mtime as mtime, null as id1, null as source1\n"
   "from " BUILDIDS "_buildids b, " BUILDIDS "_files_v f0, " BUILDIDS "_f_de n\n"
   "where b.id = n.buildid and f0.id = n.file and n.executable_p = 1\n"
   "union all select\n"
-  "b.hex as buildid, n.mtime, 'R' as sourcetype, f0.name as source0, n.mtime as mtime, f1.name as source1\n"
+  "b.hex as buildid, 'R' as sourcetype, n.file as id0, f0.name as source0, n.mtime as mtime, n.content as id1, f1.name as source1\n"
   "from " BUILDIDS "_buildids b, " BUILDIDS "_files_v f0, " BUILDIDS "_files_v f1, " BUILDIDS "_r_de n\n"
   "where b.id = n.buildid and f0.id = n.file and f1.id = n.content and n.executable_p = 1\n"
   ";"
@@ -2557,7 +2571,7 @@ handle_buildid (MHD_Connection* conn,
   if (atype_code == "D")
     {
       pp = new sqlite_ps (thisdb, "mhd-query-d",
-                          "select mtime, sourcetype, source0, source1 from " BUILDIDS "_query_d where buildid = ? "
+                          "select mtime, sourcetype, sour
[PATCH v4 5/7] debuginfod: optimize extraction from seekable xz archives
From: Omar Sandoval

The kernel debuginfo packages on Fedora, Debian, and Ubuntu, and many of their downstreams, are all compressed with xz in multi-threaded mode, which allows random access. We can use this to bypass the full archive extraction and dramatically speed up kernel debuginfo requests (from ~50 seconds in the worst case to < 0.25 seconds).

This works because multi-threaded xz compression splits up the stream into many independently compressed blocks. The stream ends with an index of blocks. So, to seek to an offset, we find the block containing that offset in the index and then decompress and throw away data until we reach the offset within the block. We can then decompress the desired amount of data, possibly from subsequent blocks. There's no high-level API in liblzma to do this, but we can do it by stitching together a few low-level APIs.

We need to pass down the file ids, then look up the size, uncompressed offset, and mtime in the _r_seekable table. Note that this table is not yet populated, so this commit has no functional change on its own.

Signed-off-by: Omar Sandoval
---
 configure.ac              |   5 +
 debuginfod/Makefile.am    |   2 +-
 debuginfod/debuginfod.cxx | 456 +-
 3 files changed, 457 insertions(+), 6 deletions(-)

diff --git a/configure.ac b/configure.ac
index 24e68d94..9c5f7e51 100644
--- a/configure.ac
+++ b/configure.ac
@@ -441,8 +441,13 @@ eu_ZIPLIB(bzlib,BZLIB,bz2,BZ2_bzdopen,bzip2)
 # We need this since bzip2 doesn't have a pkgconfig file.
 BZ2_LIB="$LIBS"
 AC_SUBST([BZ2_LIB])
+save_LIBS="$LIBS"
+LIBS=
 eu_ZIPLIB(lzma,LZMA,lzma,lzma_auto_decoder,[LZMA (xz)])
+lzma_LIBS="$LIBS"
+LIBS="$lzma_LIBS $save_LIBS"
 AS_IF([test "x$with_lzma" = xyes], [LIBLZMA="liblzma"], [LIBLZMA=""])
+AC_SUBST([lzma_LIBS])
 AC_SUBST([LIBLZMA])
 eu_ZIPLIB(zstd,ZSTD,zstd,ZSTD_decompress,[ZSTD (zst)])
 AS_IF([test "x$with_zstd" = xyes], [LIBZSTD="libzstd"], [LIBLZSTD=""])
diff --git a/debuginfod/Makefile.am b/debuginfod/Makefile.am
index b74e3673..e199dc0c 100644
--- a/debuginfod/Makefile.am
+++ b/debuginfod/Makefile.am
@@ -70,7 +70,7 @@ bin_PROGRAMS += debuginfod-find
 endif
 debuginfod_SOURCES = debuginfod.cxx
-debuginfod_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS) $(rpm_LIBS) $(jsonc_LIBS) $(libcurl_LIBS) -lpthread -ldl
+debuginfod_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS) $(rpm_LIBS) $(jsonc_LIBS) $(libcurl_LIBS) $(lzma_LIBS) -lpthread -ldl
 debuginfod_find_SOURCES = debuginfod-find.c
 debuginfod_find_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(jsonc_LIBS)
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index b3d80090..cf7f48ab 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -63,6 +63,10 @@ extern "C" {
 #undef __attribute__ /* glibc bug - rhbz 1763325 */
 #endif
+#ifdef USE_LZMA
+#include <lzma.h>
+#endif
+
 #include
 #include
 #include
@@ -1961,6 +1965,385 @@ handle_buildid_f_match (bool internal_req_t,
   return r;
 }
+
+#ifdef USE_LZMA
+struct lzma_exception: public reportable_exception
+{
+  lzma_exception(int rc, const string& msg):
+    // liblzma doesn't have a lzma_ret -> string conversion function, so just
+    // report the value.
+    reportable_exception(string ("lzma error: ") + msg + ": error " + to_string(rc)) {
+      inc_metric("error_count","lzma",to_string(rc));
+  }
+};
+
+// Neither RPM nor deb files support seeking to a specific file in the package.
+// Instead, to extract a specific file, we normally need to read the archive
+// sequentially until we find the file.  This is very slow for files at the end
+// of a large package with lots of files, like kernel debuginfo.
+//
+// However, if the compression format used in the archive supports seeking, we
+// can accelerate this.  As of July 2024, xz is the only widely-used format that
+// supports seeking, and usually only in multi-threaded mode.  Luckily, the
+// kernel-debuginfo package in Fedora and its downstreams, and the
+// linux-image-*-dbg package in Debian and its downstreams, all happen to use
+// this.
+//
+// The xz format [1] ends with an index of independently compressed blocks in
+// the stream.  In RPM and deb files, the xz stream is the last thing in the
+// file, so we assume that the xz Stream Footer is at the end of the package
+// file and do everything relative to that.  For each file in the archive, we
+// remember the size and offset of the file data in the uncompressed xz stream,
+// then we use the index to seek to that offset when we need that file.
+//
+// 1: https://xz.tukaani.org/format/xz-file-format.txt
+
+// Read the Index at the end of an xz file.
+static lzma_index*
+read_xz_index (int fd)
+{
+  off_t footer_pos = -LZMA_STREAM_HEADER_SIZE;
+  if (lseek (fd, footer
[PATCH v4 3/7] debuginfod: factor out common code for responding from an archive
From: Omar Sandoval

handle_buildid_r_match has two very similar branches where it optionally extracts a section and then creates a microhttpd response. In preparation for adding a third one, factor it out into a function.

Signed-off-by: Omar Sandoval
---
 debuginfod/debuginfod.cxx | 213 +-
 1 file changed, 96 insertions(+), 117 deletions(-)

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 92022f3d..24702c23 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -1965,6 +1965,81 @@ string canonicalized_archive_entry_pathname(struct archive_entry *e)
 }
+
+// NB: takes ownership of, and may reassign, fd.
+static struct MHD_Response*
+create_buildid_r_response (int64_t b_mtime0,
+                           const string& b_source0,
+                           const string& b_source1,
+                           const string& section,
+                           const string& ima_sig,
+                           const char* tmppath,
+                           int& fd,
+                           off_t size,
+                           time_t mtime,
+                           const string& metric,
+                           const struct timespec& extract_begin)
+{
+  if (tmppath != NULL)
+    {
+      struct timespec extract_end;
+      clock_gettime (CLOCK_MONOTONIC, &extract_end);
+      double extract_time = (extract_end.tv_sec - extract_begin.tv_sec)
+        + (extract_end.tv_nsec - extract_begin.tv_nsec)/1.e9;
+      fdcache.intern(b_source0, b_source1, tmppath, size, true, extract_time);
+    }
+
+  if (!section.empty ())
+    {
+      int scn_fd = extract_section (fd, b_mtime0,
+                                    b_source0 + ":" + b_source1,
+                                    section, extract_begin);
+      close (fd);
+      if (scn_fd >= 0)
+        fd = scn_fd;
+      else
+        {
+          if (verbose)
+            obatched (clog) << "cannot find section " << section
+                            << " for archive " << b_source0
+                            << " file " << b_source1 << endl;
+          return 0;
+        }
+
+      struct stat fs;
+      if (fstat (fd, &fs) < 0)
+        {
+          close (fd);
+          throw libc_exception (errno,
+            string ("fstat ") + b_source0 + string (" ") + section);
+        }
+      size = fs.st_size;
+    }
+
+  struct MHD_Response* r = MHD_create_response_from_fd (size, fd);
+  if (r == 0)
+    {
+      if (verbose)
+        obatched(clog) << "cannot create fd-response for " << b_source0 << endl;
+      close(fd);
+    }
+  else
+    {
+      inc_metric ("http_responses_total","result",metric);
+      add_mhd_response_header (r, "Content-Type", "application/octet-stream");
+      add_mhd_response_header (r, "X-DEBUGINFOD-SIZE", to_string(size).c_str());
+      add_mhd_response_header (r, "X-DEBUGINFOD-ARCHIVE", b_source0.c_str());
+      add_mhd_response_header (r, "X-DEBUGINFOD-FILE", b_source1.c_str());
+      if(!ima_sig.empty()) add_mhd_response_header(r, "X-DEBUGINFOD-IMASIGNATURE", ima_sig.c_str());
+      add_mhd_last_modified (r, mtime);
+      if (verbose > 1)
+        obatched(clog) << "serving " << metric << " " << b_source0
+                       << " file " << b_source1
+                       << " section=" << section
+                       << " IMA signature=" << ima_sig << endl;
+      /* libmicrohttpd will close fd.
+    }
+  return r;
+}
 
 static struct MHD_Response*
 handle_buildid_r_match (bool internal_req_p,
@@ -2142,57 +2217,15 @@ handle_buildid_r_match (bool internal_req_p,
           break; // branch out of if "loop", to try new libarchive fetch attempt
         }
 
-      if (!section.empty ())
-        {
-          int scn_fd = extract_section (fd, fs.st_mtime,
-                                        b_source0 + ":" + b_source1,
-                                        section, extract_begin);
-          close (fd);
-          if (scn_fd >= 0)
-            fd = scn_fd;
-          else
-            {
-              if (verbose)
-                obatched (clog) << "cannot find section " << section
-                                << " for archive " << b_source0
-                                << " file " << b_source1 << endl;
-              return 0;
-            }
-
-          rc = fstat(fd, &fs);
-          if (rc < 0)
-            {
-              close (fd);
-              throw libc_exception (errno,
-                                    string ("fstat archive ") + b_source0 + string (" file ") + b_source1
-                                    + string (" section ") + section);
-            }
-        }
-
-      struct MHD_Response* r = MHD_create_response_from_fd (fs.st_size, fd);
+      struct MHD_Response* r = create_buildid_r_response (b_mtime, b_source0,
+                                                          b_source1, section,
+                                                          ima_sig, NULL, fd,
+                                                          f
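The essential contract of the new helper is fd ownership: the caller hands fd off completely. A hedged sketch of a call site, with variables as in handle_buildid_r_match; the metric label here is an example, not necessarily the exact string used at every call site:

  // After this call the caller must never close fd itself: on failure the
  // helper has already closed it; on success libmicrohttpd owns it and
  // closes it when the response is destroyed.  Note fd is passed by
  // reference and may be replaced by a section fd.
  struct MHD_Response* r
    = create_buildid_r_response (b_mtime, b_source0, b_source1, section,
                                 ima_sig,
                                 NULL,              // tmppath: already cached
                                 fd, fs.st_size, fs.st_mtime,
                                 "archive fdcache", // example metric label
                                 extract_begin);
  if (r == 0)
    return 0;          // fd is gone; nothing to clean up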
[PATCH v4 7/7] debuginfod: populate _r_seekable on request
From: Omar Sandoval

Since the schema change adding _r_seekable was done in a backward
compatible way, seekable archives that were previously scanned will not
be in _r_seekable. Whenever an archive is going to be extracted to
satisfy a request, check whether it is seekable. If so, populate
_r_seekable while extracting it so that future requests use the
optimized path.

The next time that BUILDIDS is bumped, all archives will be checked at
scan time. At that point, checking again will be unnecessary and this
commit (including the test case modification) can be reverted.

Signed-off-by: Omar Sandoval
---
 debuginfod/debuginfod.cxx        | 76 +---
 tests/run-debuginfod-seekable.sh | 45 +++
 2 files changed, 115 insertions(+), 6 deletions(-)

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 5fe2db0c..fb7873ae 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -2740,6 +2740,7 @@ handle_buildid_r_match (bool internal_req_p,
     }
 
   // no match ... look for a seekable entry
+  bool populate_seekable = ! passive_p;
   unique_ptr<sqlite_ps> pp (new sqlite_ps (internal_req_p ? db : dbq, "rpm-seekable-query",
                                            "select type, size, offset, mtime from " BUILDIDS "_r_seekable "
@@ -2749,6 +2750,9 @@
     {
       if (rc != SQLITE_ROW)
        throw sqlite_exception(rc, "step");
+      // if we found a match in _r_seekable but we fail to extract it, don't
+      // bother populating it again
+      populate_seekable = false;
       const char* seekable_type = (const char*) sqlite3_column_text (*pp, 0);
       if (seekable_type != NULL && strcmp (seekable_type, "xz") == 0)
         {
@@ -2840,16 +2844,39 @@ handle_buildid_r_match (bool internal_req_p,
       throw archive_exception(a, "cannot open archive from pipe");
     }
 
-  // archive traversal is in three stages, no, four stages:
-  // 1) skip entries whose names do not match the requested one
-  // 2) extract the matching entry name (set r = result)
-  // 3) extract some number of prefetched entries (just into fdcache)
-  // 4) abort any further processing
+  // If the archive was scanned in a version without _r_seekable, then we may
+  // need to populate _r_seekable now. This can be removed the next time
+  // BUILDIDS is updated.
+  if (populate_seekable)
+    {
+      populate_seekable = is_seekable_archive (b_source0, a);
+      if (populate_seekable)
+        {
+          // NB: the names are already interned
+          pp.reset(new sqlite_ps (db, "rpm-seekable-insert2",
+                                  "insert or ignore into " BUILDIDS "_r_seekable (file, content, type, size, offset, mtime) "
+                                  "values (?, "
+                                  "(select id from " BUILDIDS "_files "
+                                  "where dirname = (select id from " BUILDIDS "_fileparts where name = ?) "
+                                  "and basename = (select id from " BUILDIDS "_fileparts where name = ?) "
+                                  "), 'xz', ?, ?, ?)"));
+        }
+    }
+
+  // archive traversal is in five stages:
+  // 1) before we find a matching entry, insert it into _r_seekable if needed or
+  //    skip it otherwise
+  // 2) extract the matching entry (set r = result). Also insert it into
+  //    _r_seekable if needed
+  // 3) extract some number of prefetched entries (just into fdcache). Also
+  //    insert them into _r_seekable if needed
+  // 4) if needed, insert all of the remaining entries into _r_seekable
+  // 5) abort any further processing
   struct MHD_Response* r = 0;                 // will set in stage 2
   unsigned prefetch_count =
     internal_req_p ? 0 : fdcache_prefetch;   // will decrement in stage 3
-  while(r == 0 || prefetch_count > 0) // stage 1, 2, or 3
+  while(r == 0 || prefetch_count > 0 || populate_seekable) // stage 1-4
     {
       if (interrupted)
         break;
@@ -2863,6 +2890,43 @@ handle_buildid_r_match (bool internal_req_p,
         continue;
 
       string fn = canonicalized_archive_entry_pathname (e);
+
+      if (populate_seekable)
+        {
+          string dn, bn;
+          size_t slash = fn.rfind('/');
+          if (slash == std::string::npos) {
+            dn = "";
+            bn = fn;
+          } else {
+            dn = fn.substr(0, slash);
+            bn = fn.substr(slash + 1);
+          }
+
+          int64_t seekable_size = archive_entry_size (e);
+          int64_t seekable_offset = archive_filter_bytes (a, 0);
+          time_t seekable_mtime = archive_entry_mtime (e);
+
+          pp->reset();
+          pp->bind(1, b_id0);
+          pp->bind(2, dn);
+          pp->bind(3, bn);
+          pp->bind(4, seekable_size);
+          pp->bind(5, seekable_offset);
+          pp->bind(6, seekable_mtime);
+          rc = pp->step();
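The subtle part is where the recorded offset comes from. A condensed sketch of just that ingredient, where drain_archive and record_seekable_entry are invented names standing in for the inline logic above, and only the libarchive calls are real API:

#include <archive.h>
#include <archive_entry.h>

// Hypothetical stand-in for the inline INSERT into _r_seekable above.
static void record_seekable_entry (struct archive_entry *e, int64_t offset);

// Capture each entry's offset in the *uncompressed* stream while iterating.
static void
drain_archive (struct archive *a, bool populate_seekable)
{
  struct archive_entry *e;
  while (archive_read_next_header (a, &e) == ARCHIVE_OK)
    {
      if (populate_seekable)
        // Right after reading a header, archive_filter_bytes (a, 0) is the
        // number of uncompressed bytes consumed so far, i.e. the offset
        // where this entry's data begins -- the value the seekable-xz
        // extraction path later seeks to.
        record_seekable_entry (e, archive_filter_bytes (a, 0));
      // ... stages 1-3: skip non-matches, extract the match, prefetch ...
    }
}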
[PATCH v4 6/7] debuginfod: populate _r_seekable on scan
From: Omar Sandoval

Whenever a new archive is scanned, check whether it is seekable with a
little liblzma magic, and populate _r_seekable if so. With this, newly
scanned seekable archives will use the optimized extraction path added
in the previous commit.

Also add a test case using some artificial packages.

Signed-off-by: Omar Sandoval
---
 debuginfod/debuginfod.cxx                     | 145 +-
 tests/Makefile.am                             |  13 +-
 ...pressme-seekable-xz-dbgsym_1.0-1_amd64.deb | Bin 0 -> 6288 bytes
 ...compressme-seekable-xz_1.0-1.debian.tar.xz | Bin 0 -> 1440 bytes
 .../compressme-seekable-xz_1.0-1.dsc          |  19 +++
 .../compressme-seekable-xz_1.0-1_amd64.deb    | Bin 0 -> 6208 bytes
 .../compressme-seekable-xz_1.0.orig.tar.xz    | Bin 0 -> 7160 bytes
 .../compressme-seekable-xz-1.0-1.src.rpm      | Bin 0 -> 15880 bytes
 .../compressme-seekable-xz-1.0-1.x86_64.rpm   | Bin 0 -> 31873 bytes
 ...sme-seekable-xz-debuginfo-1.0-1.x86_64.rpm | Bin 0 -> 21917 bytes
 ...e-seekable-xz-debugsource-1.0-1.x86_64.rpm | Bin 0 -> 7961 bytes
 tests/run-debuginfod-seekable.sh              | 141 +
 12 files changed, 313 insertions(+), 5 deletions(-)
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz-dbgsym_1.0-1_amd64.deb
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0-1.debian.tar.xz
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0-1.dsc
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0-1_amd64.deb
 create mode 100644 tests/debuginfod-debs/seekable-xz/compressme-seekable-xz_1.0.orig.tar.xz
 create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-1.0-1.src.rpm
 create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-1.0-1.x86_64.rpm
 create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-debuginfo-1.0-1.x86_64.rpm
 create mode 100644 tests/debuginfod-rpms/seekable-xz/compressme-seekable-xz-debugsource-1.0-1.x86_64.rpm
 create mode 100755 tests/run-debuginfod-seekable.sh

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index cf7f48ab..5fe2db0c 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -1998,6 +1998,109 @@ struct lzma_exception: public reportable_exception
 //
 // 1: https://xz.tukaani.org/format/xz-file-format.txt
 
+// Return whether an archive supports seeking.
+static bool
+is_seekable_archive (const string& rps, struct archive* a)
+{
+  // Only xz supports seeking.
+  if (archive_filter_code (a, 0) != ARCHIVE_FILTER_XZ)
+    return false;
+
+  int fd = open (rps.c_str(), O_RDONLY);
+  if (fd < 0)
+    return false;
+  defer_dtor<int,int> fd_closer (fd, close);
+
+  // Seek to the xz Stream Footer. We assume that it's the last thing in the
+  // file, which is true for RPM and deb files.
+  off_t footer_pos = -LZMA_STREAM_HEADER_SIZE;
+  if (lseek (fd, footer_pos, SEEK_END) == -1)
+    return false;
+
+  // Decode the Stream Footer.
+  uint8_t footer[LZMA_STREAM_HEADER_SIZE];
+  size_t footer_read = 0;
+  while (footer_read < sizeof (footer))
+    {
+      ssize_t bytes_read = read (fd, footer + footer_read,
+                                 sizeof (footer) - footer_read);
+      if (bytes_read < 0)
+        {
+          if (errno == EINTR)
+            continue;
+          return false;
+        }
+      if (bytes_read == 0)
+        return false;
+      footer_read += bytes_read;
+    }
+
+  lzma_stream_flags stream_flags;
+  lzma_ret ret = lzma_stream_footer_decode (&stream_flags, footer);
+  if (ret != LZMA_OK)
+    return false;
+
+  // Seek to the xz Index.
+  if (lseek (fd, footer_pos - stream_flags.backward_size, SEEK_END) == -1)
+    return false;
+
+  // Decode the Number of Records in the Index. liblzma doesn't have an API for
+  // this if you don't want to decode the whole Index, so we have to do it
+  // ourselves.
+  //
+  // We need 1 byte for the Index Indicator plus 1-9 bytes for the
+  // variable-length integer Number of Records.
+  uint8_t index[10];
+  size_t index_read = 0;
+  while (index_read == 0) {
+    ssize_t bytes_read = read (fd, index, sizeof (index));
+    if (bytes_read < 0)
+      {
+        if (errno == EINTR)
+          continue;
+        return false;
+      }
+    if (bytes_read == 0)
+      return false;
+    index_read += bytes_read;
+  }
+  // The Index Indicator must be 0.
+  if (index[0] != 0)
+    return false;
+
+  lzma_vli num_records;
+  size_t pos = 0;
+  size_t in_pos = 1;
+  while (true)
+    {
+      if (in_pos >= index_read)
+        {
+          ssize_t bytes_read = read (fd, index, sizeof (index));
+          if (bytes_read < 0)
+            {
+              if (errno == EINTR)
+                continue;
+              return false;
+            }
+          if (bytes_read == 0)
+            return false;
+          index_read = bytes_read;
+          in_pos = 0;
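For reference, xz's variable-length integers, including the Number of Records being read by the loop above, are little-endian base-128: each byte contributes 7 bits, the 0x80 bit means another byte follows, and at most 9 bytes (63 bits) are allowed. A self-contained sketch of that decoding, where decode_vli is an illustrative name; liblzma's lzma_vli_decode does the same thing incrementally:

#include <lzma.h>
#include <stddef.h>

// Decode one xz variable-length integer from buf; returns false on
// truncated input or if the encoding exceeds the 9-byte maximum.
static bool
decode_vli (const uint8_t *buf, size_t len, lzma_vli *out)
{
  *out = 0;
  for (size_t i = 0; i < len && i < 9; i++)
    {
      *out |= (lzma_vli) (buf[i] & 0x7f) << (7 * i);
      if ((buf[i] & 0x80) == 0)
        return true;  // high bit clear: this was the last byte
    }
  return false;
}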