Re: PR29472: debuginfod metadata query
Hi Frank, On Wed, 2023-04-12 at 16:31 -0400, Frank Ch. Eigler via Elfutils-devel wrote: > At long last, for your review, an updated & merged patch for $SUBJECT. > "git diff -s" against git/master follows, code also on > users/fche/try-pr29472e. Apologies for the delay in review. This is mainly because I am a json noob, and don't fully understand what interface guarantees we are/can make using it. > Author: Frank Ch. Eigler > Date: Tue Apr 11 23:35:25 2023 -0400 > > PR29472: debuginfod: add metadata query webapi, C api, client > > This patch extends the debuginfod API with a "metadata query" > operation. It allows clients to request an enumeration of file names > known to debuginfod servers, returning a JSON response including the > matching buildids. This lets clients later download debuginfo for a > range of versions of the same named binaries, in case they need to to > prospective work (like systemtap-based live-patching). It also lets > server operators implement prefetch triggering operations for popular > but slow debuginfo slivers like kernel vdso.debug files on fedora. > > Implementation requires a modern enough json-c library. The configure check looks for json_object_from_fd. So that is json-c- 0.13 or higher (which has been out for 5 year). They seem to use symbol versioning (but also bump SO_NAME from time to time?). > % debuginfod-find metadata file /bin/ls > % debuginfod-find metadata glob "/usr/local/bin/c*" Would be nice to have some example output here. If only to explain what the used can expect. It gives the following for me: /home/mark/.cache/debuginfod_client/metadata/key=file&value=%2Fbin%2Fls /home/mark/.cache/debuginfod_client/metadata/key=glob&value=%2Fusr%2Flo cal%2Fbin%2Fc%2A Probably because I don't have a DEBUGINFOD_URLS configured that understands the new requests, but it is a little confusing imho. > Documentation and testing are included. > > Signed-off-by: Ryan Goldberg > Signed-off-by: Frank Ch. Eigler > > diff --git a/ChangeLog b/ChangeLog > index 10c23002279e..c35a19dd51c4 100644 > --- a/ChangeLog > +++ b/ChangeLog > @@ -1,3 +1,9 @@ > +2023-04-12 Ryan Golberg , Frank Ch. Eigler > > + > + PR29472: debuginfod metadata query > + * NEWS: Mention this. > + * configure.ac: Look for json-c library. > + > 2023-03-03 Mark Wielaard > > * NEWS: Add ELFCOMPRESS_ZSTD support for libelf and elfcompress. > diff --git a/NEWS b/NEWS > index 16e37eca5731..1bf38a69f64c 100644 > --- a/NEWS > +++ b/NEWS > @@ -1,3 +1,7 @@ > +Version 0.190 > + > +debuginfod: New API for metadata queries: file name -> buildid. Clearly this calls for the new "code name" to be "meta" :) > Version 0.189 "Don't deflate!" > > configure: eu-nm, eu-addr2line and eu-stack can provide demangled symbols > diff --git a/configure.ac b/configure.ac > index 4b67c84425fa..b319a0119180 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -665,6 +665,12 @@ case "$ac_cv_search__obstack_free" in > esac > AC_SUBST([obstack_LIBS]) > > +dnl formerly checked for json_object_array_add, but debuginfod needs a newer > function > +AC_CHECK_LIB(json-c, json_object_from_fd, [ > + AC_DEFINE([HAVE_JSON_C], [1], [Define if json-c is on the machine]) > + AC_SUBST(jsonc_LIBS, '-ljson-c') > +]) > + > dnl The directories with content. So this is an optional feature in all configurations? Maybe also include it in the summary at the end under OTHER FEATURES? > dnl Documentation. > diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog > index f13c28d5c6f7..ac3fd6aa862f 100644 > --- a/debuginfod/ChangeLog > +++ b/debuginfod/ChangeLog > @@ -1,3 +1,26 @@ > +2023-04-12 Ryan Golberg , Frank Ch. Eigler > > + > + PR29472: debuginfod metadata query > + * Makefile.am: Add json-c usage. > + * debuginfod-client.c (debuginfod_find_metadata): New function. > + (handle_data): New fields to hold metadata being received. > + (debuginfod_clean_cache): Clean metadata too. > + (header_callback): Simplify to realloc only. > + (metadata_callback): New function. > + (init_server_urls, init_handle, perform_queries, make_cache_path): > + New refactored functions. > + (debuginfod_query_server_by_buildid): Renamed, refactored. Update > + callers. Still going through these refactors. If you have a high level description why/what was refactored that would be helpful. > + * debuginfod-find.c (main): Handle metadata queries. > + * debuginfod.cxx (DEBUGINFOD_SQLITE_DDL): Add an index or two. > + (metadata_maxtime_s, parse_opt): New parameter for load control. > + (add_client_federation_headers): New refactored function. > + (handle_metadata): New function. > + (handler_cb): Call it for /metadata URL. Trace it. > + (groom): Tweak sqlite_ps object lifetimes. > + * debuginfod.h.in (debuginfod_find_metadata): New decl. > + * libdebuginfod.map: Export it under
Re: PR29472: debuginfod metadata query
Hi Frank, On Wed, Apr 12, 2023 at 04:31:49PM -0400, Frank Ch. Eigler via Elfutils-devel wrote: > index f13c28d5c6f7..ac3fd6aa862f 100644 > --- a/debuginfod/ChangeLog > +++ b/debuginfod/ChangeLog > @@ -1,3 +1,26 @@ > +2023-04-12 Ryan Golberg , Frank Ch. Eigler > > + > + PR29472: debuginfod metadata query > + * Makefile.am: Add json-c usage. > + * debuginfod-client.c (debuginfod_find_metadata): New function. > + (handle_data): New fields to hold metadata being received. > + (debuginfod_clean_cache): Clean metadata too. > + (header_callback): Simplify to realloc only. > + (metadata_callback): New function. > + (init_server_urls, init_handle, perform_queries, make_cache_path): > + New refactored functions. > + (debuginfod_query_server_by_buildid): Renamed, refactored. Update > + callers. > + * debuginfod-find.c (main): Handle metadata queries. > + * debuginfod.cxx (DEBUGINFOD_SQLITE_DDL): Add an index or two. > + (metadata_maxtime_s, parse_opt): New parameter for load control. > + (add_client_federation_headers): New refactored function. > + (handle_metadata): New function. > + (handler_cb): Call it for /metadata URL. Trace it. > + (groom): Tweak sqlite_ps object lifetimes. > + * debuginfod.h.in (debuginfod_find_metadata): New decl. > + * libdebuginfod.map: Export it under ELFUTILS_0.190. > + > [...] > +/* > + * This function initializes a CURL handle. It takes optional callbacks for > the write > + * function and the header function, which if defined will use userdata of > type struct handle_data*. > + * Specifically the data[i] within an array of struct handle_data's. > + * Returns 0 on success and -Posix error on faliure. > + */ > +int > +init_handle(debuginfod_client *client, > + size_t (*w_callback)(char *buffer,size_t size,size_t nitems,void > *userdata), > + size_t (*h_callback)(char *buffer,size_t size,size_t nitems,void > *userdata), > + struct handle_data *data, int i, long timeout, > + int vfd) > +{ > + data->handle = curl_easy_init(); > + if (data->handle == NULL) > + { > +return -ENETUNREACH; > + } > + > + if (vfd >= 0) > +dprintf (vfd, "url %d %s\n", i, data->url); > + > + /* Only allow http:// + https:// + file:// so we aren't being > +redirected to some unsupported protocol. */ > +#if CURL_AT_LEAST_VERSION(7, 85, 0) > + curl_easy_setopt_ck(data->handle, CURLOPT_PROTOCOLS_STR, > "http,https,file"); > +#else > + curl_easy_setopt_ck(data->handle, CURLOPT_PROTOCOLS, > +(CURLPROTO_HTTP | CURLPROTO_HTTPS | CURLPROTO_FILE)); > +#endif > + curl_easy_setopt_ck(data->handle, CURLOPT_URL, data->url); > + if (vfd >= 0) > +curl_easy_setopt_ck(data->handle, CURLOPT_ERRORBUFFER, > + data->errbuf); > + if(w_callback) { > +curl_easy_setopt_ck(data->handle, > + CURLOPT_WRITEFUNCTION, w_callback); > +curl_easy_setopt_ck(data->handle, CURLOPT_WRITEDATA, data); > + } > + if (timeout > 0) > + { > +/* Make sure there is at least some progress, > + try to get at least 100K per timeout seconds. */ > +curl_easy_setopt_ck (data->handle, CURLOPT_LOW_SPEED_TIME, > +timeout); > +curl_easy_setopt_ck (data->handle, CURLOPT_LOW_SPEED_LIMIT, > +100 * 1024L); > + } > + curl_easy_setopt_ck(data->handle, CURLOPT_FILETIME, (long) 1); > + curl_easy_setopt_ck(data->handle, CURLOPT_FOLLOWLOCATION, (long) 1); > + curl_easy_setopt_ck(data->handle, CURLOPT_FAILONERROR, (long) 1); > + curl_easy_setopt_ck(data->handle, CURLOPT_NOSIGNAL, (long) 1); > + if(h_callback){ > +curl_easy_setopt_ck(data->handle, > + CURLOPT_HEADERFUNCTION, h_callback); > +curl_easy_setopt_ck(data->handle, CURLOPT_HEADERDATA, data); > + } > + #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */ > + curl_easy_setopt_ck(data->handle, CURLOPT_PATH_AS_IS, (long) 1); > + #else > + /* On old curl; no big deal, canonicalization here is almost the > + same, except perhaps for ? # type decorations at the tail. */ > + #endif > + curl_easy_setopt_ck(data->handle, CURLOPT_AUTOREFERER, (long) 1); > + curl_easy_setopt_ck(data->handle, CURLOPT_ACCEPT_ENCODING, ""); > + curl_easy_setopt_ck(data->handle, CURLOPT_HTTPHEADER, client->headers); > + > + return 0; > +} OK, extracted from debuginfod_query_server with parameterized header, writer callback functions. Also explains why curl_easy_setopt_ck can just return now, cleanup is done in the caller on failure. > +/* > + * This function busy-waits on one or more curl queries to complete. This can > + * be controled via only_one, which, if true, will find the first winner and > exit > + * once found. If positive maxtime and maxsize dictate the maximum allowed > wait times > + * and download sizes respectivly. Returns 0 on success and -Posix error on > faliure. > + */ > +int > +perform_queries(CURLM *curlm, CURL **target_handle, struct handle_data > *data, debuginfod_client *c, > + int num_urls, long maxtime, lon