Re: PR29472: debuginfod metadata query

2023-07-04 Thread Mark Wielaard
Hi Frank,

On Wed, 2023-04-12 at 16:31 -0400, Frank Ch. Eigler via Elfutils-devel
wrote:
> At long last, for your review, an updated & merged patch for $SUBJECT.
> "git diff -s" against git/master follows, code also on
> users/fche/try-pr29472e.

Apologies for the delay in review. This is mainly because I am a json
noob, and don't fully understand what interface guarantees we are/can
make using it.

> Author: Frank Ch. Eigler 
> Date:   Tue Apr 11 23:35:25 2023 -0400
> 
> PR29472: debuginfod: add metadata query webapi, C api, client
> 
> This patch extends the debuginfod API with a "metadata query"
> operation.  It allows clients to request an enumeration of file names
> known to debuginfod servers, returning a JSON response including the
> matching buildids.  This lets clients later download debuginfo for a
> range of versions of the same named binaries, in case they need to to
> prospective work (like systemtap-based live-patching).  It also lets
> server operators implement prefetch triggering operations for popular
> but slow debuginfo slivers like kernel vdso.debug files on fedora.
> 
> Implementation requires a modern enough json-c library.

The configure check looks for json_object_from_fd. So that is json-c-
0.13 or higher (which has been out for 5 year). They seem to use symbol
versioning (but also bump SO_NAME from time to time?).

> % debuginfod-find metadata file /bin/ls
> % debuginfod-find metadata glob "/usr/local/bin/c*"

Would be nice to have some example output here. If only to explain what
the used can expect. It gives the following for me:

/home/mark/.cache/debuginfod_client/metadata/key=file&value=%2Fbin%2Fls
/home/mark/.cache/debuginfod_client/metadata/key=glob&value=%2Fusr%2Flo
cal%2Fbin%2Fc%2A

Probably because I don't have a DEBUGINFOD_URLS configured that
understands the new requests, but it is a little confusing imho.

> Documentation and testing are included.
> 
> Signed-off-by: Ryan Goldberg 
> Signed-off-by: Frank Ch. Eigler 
> 
> diff --git a/ChangeLog b/ChangeLog
> index 10c23002279e..c35a19dd51c4 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,9 @@
> +2023-04-12  Ryan Golberg , Frank Ch. Eigler 
> 
> +
> + PR29472: debuginfod metadata query
> + * NEWS: Mention this.
> + * configure.ac: Look for json-c library.
> +
>  2023-03-03  Mark Wielaard  
>  
>   * NEWS: Add ELFCOMPRESS_ZSTD support for libelf and elfcompress.
> diff --git a/NEWS b/NEWS
> index 16e37eca5731..1bf38a69f64c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,3 +1,7 @@
> +Version 0.190
> +
> +debuginfod: New API for metadata queries: file name -> buildid.

Clearly this calls for the new "code name" to be "meta" :)

>  Version 0.189 "Don't deflate!"
>  
>  configure: eu-nm, eu-addr2line and eu-stack can provide demangled symbols
> diff --git a/configure.ac b/configure.ac
> index 4b67c84425fa..b319a0119180 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -665,6 +665,12 @@ case "$ac_cv_search__obstack_free" in
>  esac
>  AC_SUBST([obstack_LIBS])
>  
> +dnl formerly checked for json_object_array_add, but debuginfod needs a newer 
> function
> +AC_CHECK_LIB(json-c, json_object_from_fd, [
> +  AC_DEFINE([HAVE_JSON_C], [1], [Define if json-c is on the machine])
> +  AC_SUBST(jsonc_LIBS, '-ljson-c')
> +])
> +
>  dnl The directories with content.

So this is an optional feature in all configurations?
Maybe also include it in the summary at the end under OTHER FEATURES?

>  dnl Documentation.
> diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
> index f13c28d5c6f7..ac3fd6aa862f 100644
> --- a/debuginfod/ChangeLog
> +++ b/debuginfod/ChangeLog
> @@ -1,3 +1,26 @@
> +2023-04-12  Ryan Golberg , Frank Ch. Eigler 
> 
> +
> + PR29472: debuginfod metadata query
> + * Makefile.am: Add json-c usage.
> + * debuginfod-client.c (debuginfod_find_metadata): New function.
> + (handle_data): New fields to hold metadata being received.
> + (debuginfod_clean_cache): Clean metadata too.
> + (header_callback): Simplify to realloc only.
> + (metadata_callback): New function.
> + (init_server_urls, init_handle, perform_queries, make_cache_path):
> + New refactored functions.
> + (debuginfod_query_server_by_buildid): Renamed, refactored.  Update
> + callers.

Still going through these refactors. If you have a high level
description why/what was refactored that would be helpful.

> + * debuginfod-find.c (main): Handle metadata queries.
> + * debuginfod.cxx (DEBUGINFOD_SQLITE_DDL): Add an index or two.
> + (metadata_maxtime_s, parse_opt): New parameter for load control.
> + (add_client_federation_headers): New refactored function.
> + (handle_metadata): New function.
> + (handler_cb): Call it for /metadata URL.  Trace it.
> + (groom): Tweak sqlite_ps object lifetimes.
> + * debuginfod.h.in (debuginfod_find_metadata): New decl.
> + * libdebuginfod.map: Export it under 

Re: PR29472: debuginfod metadata query

2023-07-04 Thread Mark Wielaard
Hi Frank,

On Wed, Apr 12, 2023 at 04:31:49PM -0400, Frank Ch. Eigler via Elfutils-devel 
wrote:

> index f13c28d5c6f7..ac3fd6aa862f 100644
> --- a/debuginfod/ChangeLog
> +++ b/debuginfod/ChangeLog
> @@ -1,3 +1,26 @@
> +2023-04-12  Ryan Golberg , Frank Ch. Eigler 
> 
> +
> + PR29472: debuginfod metadata query
> + * Makefile.am: Add json-c usage.
> + * debuginfod-client.c (debuginfod_find_metadata): New function.
> + (handle_data): New fields to hold metadata being received.
> + (debuginfod_clean_cache): Clean metadata too.
> + (header_callback): Simplify to realloc only.
> + (metadata_callback): New function.
> + (init_server_urls, init_handle, perform_queries, make_cache_path):
> + New refactored functions.
> + (debuginfod_query_server_by_buildid): Renamed, refactored.  Update
> + callers.
> + * debuginfod-find.c (main): Handle metadata queries.
> + * debuginfod.cxx (DEBUGINFOD_SQLITE_DDL): Add an index or two.
> + (metadata_maxtime_s, parse_opt): New parameter for load control.
> + (add_client_federation_headers): New refactored function.
> + (handle_metadata): New function.
> + (handler_cb): Call it for /metadata URL.  Trace it.
> + (groom): Tweak sqlite_ps object lifetimes.
> + * debuginfod.h.in (debuginfod_find_metadata): New decl.
> + * libdebuginfod.map: Export it under ELFUTILS_0.190.
> +
> [...]
> +/*
> + * This function initializes a CURL handle. It takes optional callbacks for 
> the write
> + * function and the header function, which if defined will use userdata of 
> type struct handle_data*.
> + * Specifically the data[i] within an array of struct handle_data's.
> + * Returns 0 on success and -Posix error on faliure.
> + */
> +int
> +init_handle(debuginfod_client *client,
> +  size_t (*w_callback)(char *buffer,size_t size,size_t nitems,void 
> *userdata),
> +  size_t (*h_callback)(char *buffer,size_t size,size_t nitems,void 
> *userdata),
> +  struct handle_data *data, int i, long timeout,
> +  int vfd)
> +{
> +  data->handle = curl_easy_init();
> +  if (data->handle == NULL)
> +  {
> +return -ENETUNREACH;
> +  }
> +
> +  if (vfd >= 0)
> +dprintf (vfd, "url %d %s\n", i, data->url);
> +
> +  /* Only allow http:// + https:// + file:// so we aren't being
> +redirected to some unsupported protocol.  */
> +#if CURL_AT_LEAST_VERSION(7, 85, 0)
> +  curl_easy_setopt_ck(data->handle, CURLOPT_PROTOCOLS_STR, 
> "http,https,file");
> +#else
> +  curl_easy_setopt_ck(data->handle, CURLOPT_PROTOCOLS,
> +(CURLPROTO_HTTP | CURLPROTO_HTTPS | CURLPROTO_FILE));
> +#endif
> +  curl_easy_setopt_ck(data->handle, CURLOPT_URL, data->url);
> +  if (vfd >= 0)
> +curl_easy_setopt_ck(data->handle, CURLOPT_ERRORBUFFER,
> +  data->errbuf);
> +  if(w_callback) {
> +curl_easy_setopt_ck(data->handle,
> +  CURLOPT_WRITEFUNCTION, w_callback);
> +curl_easy_setopt_ck(data->handle, CURLOPT_WRITEDATA, data);
> +  }
> +  if (timeout > 0)
> +  {
> +/* Make sure there is at least some progress,
> +  try to get at least 100K per timeout seconds.  */
> +curl_easy_setopt_ck (data->handle, CURLOPT_LOW_SPEED_TIME,
> +timeout);
> +curl_easy_setopt_ck (data->handle, CURLOPT_LOW_SPEED_LIMIT,
> +100 * 1024L);
> +  }
> +  curl_easy_setopt_ck(data->handle, CURLOPT_FILETIME, (long) 1);
> +  curl_easy_setopt_ck(data->handle, CURLOPT_FOLLOWLOCATION, (long) 1);
> +  curl_easy_setopt_ck(data->handle, CURLOPT_FAILONERROR, (long) 1);
> +  curl_easy_setopt_ck(data->handle, CURLOPT_NOSIGNAL, (long) 1);
> +  if(h_callback){
> +curl_easy_setopt_ck(data->handle,
> +  CURLOPT_HEADERFUNCTION, h_callback);
> +curl_easy_setopt_ck(data->handle, CURLOPT_HEADERDATA, data);
> +  }
> +  #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
> +  curl_easy_setopt_ck(data->handle, CURLOPT_PATH_AS_IS, (long) 1);
> +  #else
> +  /* On old curl; no big deal, canonicalization here is almost the
> +  same, except perhaps for ? # type decorations at the tail. */
> +  #endif
> +  curl_easy_setopt_ck(data->handle, CURLOPT_AUTOREFERER, (long) 1);
> +  curl_easy_setopt_ck(data->handle, CURLOPT_ACCEPT_ENCODING, "");
> +  curl_easy_setopt_ck(data->handle, CURLOPT_HTTPHEADER, client->headers);
> +
> +  return 0;
> +}

OK, extracted from debuginfod_query_server with parameterized header,
writer callback functions. Also explains why curl_easy_setopt_ck can
just return now, cleanup is done in the caller on failure.

> +/*
> + * This function busy-waits on one or more curl queries to complete. This can
> + * be controled via only_one, which, if true, will find the first winner and 
> exit
> + * once found. If positive maxtime and maxsize dictate the maximum allowed 
> wait times
> + * and download sizes respectivly. Returns 0 on success and -Posix error on 
> faliure.
> + */
> +int
> +perform_queries(CURLM *curlm, CURL **target_handle, struct handle_data 
> *data, debuginfod_client *c,
> +  int num_urls, long maxtime, lon