Re: oss-fuzz

2019-12-26 Thread Mark Wielaard
Hi Berkeley,

On Mon, Dec 23, 2019 at 08:06:54AM +0200, Berkeley Churchill wrote:
> Great, thanks for the feedback!
> 
> One of my first tasks will be to support llvm/clang builds.  I've seen some
> prior discussion on what's needed for that, but if you have any extra tips
> I'll take them.  I'll be sure to create a build target for the fuzzers so
> they can be run standalone.

clang is slightly inconvenient because it doesn't implement various
GNU C extensions. We even have a configure check for them now so it is
clear what we require from a C/gnu99 compiler:
https://sourceware.org/git/?p=elfutils.git;a=blob;f=configure.ac;hb=HEAD#l98

In theory, once clang supports those, everything should just compile.

There have been some attempts to rewrite some source code to get clang
to accept it:
https://sourceware.org/git/?p=elfutils.git&a=search&h=HEAD&st=commit&s=clang

But there is just too much code clang simply doesn't parse.

I don't know how much work there is left to get clang to accept
everything. But Matthias (CCed) said he got somewhat further on IRC
once. Maybe he can share his patches.

A simpler approach would be to see if oss-fuzz really needs clang at
all. As far as I know the only thing needed is a compiler supporting
inserting tracing calls into every basic block and/or comparison
operations and linking to some (C++) library that intercepts those. gcc
can do that with -fsanitize-coverage=trace-pc and/or
-fsanitize-coverage=trace-cmp (which I believe is command line
compatible with what clang uses).
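
To make the idea concrete, here is a minimal sketch of a fuzz target that can also be run standalone. The entry point name LLVMFuzzerTestOneInput is the one libFuzzer-style drivers look for; looks_like_elf, fuzz_one_file and the STANDALONE_FUZZ_MAIN macro are hypothetical names invented for this illustration and are not part of elfutils. The stand-in check only looks at the ELF magic; a real target would pass the buffer to the parser under test.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <vector>

// Hypothetical stand-in for the library code under test; a real target
// would hand the buffer to the parser being fuzzed instead.
static bool looks_like_elf (const uint8_t *data, size_t size)
{
  static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };
  if (size < sizeof magic)
    return false;
  for (size_t i = 0; i < sizeof magic; i++)
    if (data[i] != magic[i])
      return false;
  return true;
}

// Entry point with the signature libFuzzer-style drivers call.
extern "C" int LLVMFuzzerTestOneInput (const uint8_t *data, size_t size)
{
  (void) looks_like_elf (data, size);
  return 0;
}

// Feed the contents of one file to the target.
// Returns 0 on success, 1 if the file could not be opened.
static int fuzz_one_file (const char *path)
{
  FILE *f = fopen (path, "rb");
  if (f == NULL)
    return 1;
  std::vector<uint8_t> buf;
  uint8_t chunk[4096];
  size_t n;
  while ((n = fread (chunk, 1, sizeof chunk, f)) > 0)
    buf.insert (buf.end (), chunk, chunk + n);
  fclose (f);
  return LLVMFuzzerTestOneInput (buf.data (), buf.size ());
}

#ifdef STANDALONE_FUZZ_MAIN
// Standalone driver (assumption: enabled by defining the macro above).
// Each command-line argument is treated as one input file.
int main (int argc, char **argv)
{
  int rc = 0;
  for (int i = 1; i < argc; i++)
    rc |= fuzz_one_file (argv[i]);
  return rc;
}
#endif
```

Built with the standalone main, the same target could be pointed at a directory of test files or driven by a file-based fuzzer such as afl-fuzz; built without it, it can be linked against a fuzzing driver that supplies its own main.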

Cheers,

Mark

> On Mon, Dec 23, 2019 at 3:12 AM Mark Wielaard wrote:
> 
> > Hi Berkeley,
> >
> > On Fri, 2019-12-20 at 17:21 +0200, Berkeley Churchill wrote:
> > > Any interest in integrating with oss-fuzz?  It's a Google project
> > > that supports open source projects by fuzzing. It allows Google to
> > > find and report bugs, especially security bugs, to the project.
> > > I'm willing to work on writing fuzzers and performing the integration,
> > > if this would be welcomed by the maintainers.  Thoughts?
> >
> > Certainly interested. I have been running afl-fuzz on various utilities
> > and test cases. That has found lots of issues. But it isn't very
> > structured. And it often needs to go through a completely valid ELF
> > file before fuzzing the more interesting data structures inside it.
> >
> > The only request I would have is that if the fuzzer targets are added
> > to elfutils itself then they should also be made to work locally. So
> > someone could also use them with e.g. afl-fuzz or some other fuzzing
> > framework, or simply as an extra testcase.
> >
> > Please also see:
> > https://sourceware.org/git/?p=elfutils.git;f=CONTRIBUTING;hb=HEAD
> >
> > Cheers,
> >
> > Mark
> >


PATCH: debuginfod: extracted-file cache

2019-12-26 Thread Frank Ch. Eigler
Hi -

A debuginfod optimization, including docs & tests.
Also on fche/debuginfod-fd-cache branch in git.

debuginfod: extracted-from-archive file cache

Add a facility to service webapi and dwz/altdebug requests that
resolve to archives via a $TMPDIR file cache.  This permits
instantaneous dwz resolution during -debuginfo rpm scanning, and also
instantaneous duplicate webapi requests.  The cache is limited both in
number of entries and in storage space.  Heuristics provide
serviceable defaults.
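
As an illustration of the "limited both in number of entries and in storage space" policy, here is a minimal sketch of a least-recently-used cache with two independent limits. It is not the code from this patch: BoundedLru, CacheEntry and all member names are hypothetical, and the sketch tracks only keys and sizes, where a real cache would presumably also remove the backing temporary file on eviction.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Hypothetical entry: a key plus the size of the cached file in bytes.
struct CacheEntry
{
  std::string key;
  std::size_t size_bytes;
};

// LRU cache bounded by entry count and by total bytes.
// The front of the list is the most recently used entry.
class BoundedLru
{
public:
  BoundedLru (std::size_t max_entries, std::size_t max_bytes)
    : max_entries_ (max_entries), max_bytes_ (max_bytes), total_bytes_ (0) {}

  // Insert or replace an entry, then evict until both limits hold.
  // An entry larger than max_bytes is evicted again immediately.
  void insert (const std::string& key, std::size_t size_bytes)
  {
    erase (key);
    entries_.push_front (CacheEntry { key, size_bytes });
    total_bytes_ += size_bytes;
    trim ();
  }

  // On a hit, move the entry to the front and return true.
  bool touch (const std::string& key)
  {
    for (auto it = entries_.begin (); it != entries_.end (); ++it)
      if (it->key == key)
        {
          entries_.splice (entries_.begin (), entries_, it);
          return true;
        }
    return false;
  }

  std::size_t count () const { return entries_.size (); }
  std::size_t bytes () const { return total_bytes_; }

private:
  // Remove an entry by key, if present.
  void erase (const std::string& key)
  {
    for (auto it = entries_.begin (); it != entries_.end (); ++it)
      if (it->key == key)
        {
          total_bytes_ -= it->size_bytes;
          entries_.erase (it);
          return;
        }
  }

  // Drop least recently used entries while either limit is exceeded.
  void trim ()
  {
    while (!entries_.empty ()
           && (entries_.size () > max_entries_ || total_bytes_ > max_bytes_))
      {
        total_bytes_ -= entries_.back ().size_bytes;
        entries_.pop_back ();
      }
  }

  std::size_t max_entries_;
  std::size_t max_bytes_;
  std::size_t total_bytes_;
  std::list<CacheEntry> entries_;
};
```

The linear lookup keeps the sketch short; with a small entry limit that is likely acceptable, and a map from key to list iterator could be added if lookups became a bottleneck.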

diff --git a/config/ChangeLog b/config/ChangeLog
index cc4187bf0325..b56c2c158ae3 100644
--- a/config/ChangeLog
+++ b/config/ChangeLog
@@ -1,3 +1,7 @@
+2019-12-26  Frank Ch. Eigler  
+
+   * debuginfod.service: Set PrivateTmp=yes.
+
 2019-12-22  Frank Ch. Eigler  
 
* elfutils.spec.in (debuginfod): Add BuildRequire dpkg
diff --git a/config/debuginfod.service b/config/debuginfod.service
index d8ef072be9ef..8fca343fb70e 100644
--- a/config/debuginfod.service
+++ b/config/debuginfod.service
@@ -10,6 +10,7 @@ Group=debuginfod
 #CacheDirectory=debuginfod
 ExecStart=/usr/bin/debuginfod -d /var/cache/debuginfod/debuginfod.sqlite -p $DEBUGINFOD_PORT $DEBUGINFOD_VERBOSE $DEBUGINFOD_PRAGMAS $DEBUGINFOD_PATHS
 TimeoutStopSec=10
+PrivateTmp=yes
 
 [Install]
 WantedBy=multi-user.target
diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
index 1582eba5bc0e..61e9a7b9ba68 100644
--- a/debuginfod/ChangeLog
+++ b/debuginfod/ChangeLog
@@ -1,3 +1,15 @@
+2019-12-26  Frank Ch. Eigler  
+
+   * debuginfod.cxx (libarchive_fdcache): New class/facility to own a
+   cache of temporary files that were previously extracted from an
+   archive.  If only it could store just unlinked fd's instead of
+   filenames.
+   (handle_buildid_r_match): Use it to answer dwz/altdebug and webapi
+   requests.
+   (groom): Clean it.
+   (main): Initialize the cache control parameters from heuristics.
+   Use a consistent tmpdir for these and tmp files elsewhere.
+
 2019-12-22  Frank Ch. Eigler  
 
* debuginfod.cxx (*_rpm_*): Rename to *_archive_* throughout.
diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 70cb95fecd65..f308703e14ab 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -52,6 +52,7 @@ extern "C" {
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -76,6 +77,7 @@ extern "C" {
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -333,8 +335,8 @@ static const struct argp_option options[] =
{ NULL, 0, NULL, 0, "Scanners:", 1 },
{ "scan-file-dir", 'F', NULL, 0, "Enable ELF/DWARF file scanning threads.", 0 },
{ "scan-rpm-dir", 'R', NULL, 0, "Enable RPM scanning threads.", 0 },
-   { "scan-deb-dir", 'U', NULL, 0, "Enable DEB scanning threads.", 0 },   
-   // "source-oci-imageregistry"  ... 
+   { "scan-deb-dir", 'U', NULL, 0, "Enable DEB scanning threads.", 0 },
+   // "source-oci-imageregistry"  ...
 
{ NULL, 0, NULL, 0, "Options:", 2 },
{ "logical", 'L', NULL, 0, "Follow symlinks, default=ignore.", 0 },
@@ -348,7 +350,10 @@ static const struct argp_option options[] =
{ "database", 'd', "FILE", 0, "Path to sqlite database.", 0 },
{ "ddl", 'D', "SQL", 0, "Apply extra sqlite ddl/pragma to connection.", 0 },
{ "verbose", 'v', NULL, 0, "Increase verbosity.", 0 },
-
+#define ARGP_KEY_FDCACHE_FDS 0x1001
+   { "fdcache-fds", ARGP_KEY_FDCACHE_FDS, "NUM", 0, "Maximum number of archive files to keep in fdcache.", 0 },
+#define ARGP_KEY_FDCACHE_MBS 0x1002
+   { "fdcache-mbs", ARGP_KEY_FDCACHE_MBS, "MB", 0, "Maximum total size of archive file fdcache.", 0 },
{ NULL, 0, NULL, 0, NULL, 0 }
   };
 
@@ -377,7 +382,7 @@ static volatile sig_atomic_t sigusr2 = 0;
 static unsigned http_port = 8002;
 static unsigned rescan_s = 300;
 static unsigned groom_s = 86400;
-static unsigned maxigroom = false;
+static bool maxigroom = false;
 static unsigned concurrency = std::thread::hardware_concurrency() ?: 1;
 static set source_paths;
 static bool scan_files = false;
@@ -386,6 +391,9 @@ static vector extra_ddl;
 static regex_t file_include_regex;
 static regex_t file_exclude_regex;
 static bool traverse_logical;
+static long fdcache_fds;
+static long fdcache_mbs;
+static string tmpdir;
 
 static void set_metric(const string& key, int64_t value);
 // static void inc_metric(const string& key);
@@ -449,6 +457,12 @@ parse_opt (int key, char *arg,
   if (rc != 0)
 argp_failure(state, 1, EINVAL, "regular expession");
   break;
+case ARGP_KEY_FDCACHE_FDS:
+  fdcache_fds = atol (arg);
+  break;
+case ARGP_KEY_FDCACHE_MBS:
+  fdcache_mbs = atol (arg);
+  break;
 case ARGP_KEY_ARG:
   source_paths.insert(string(arg));
   break;
@@ -723,8 +737,6 @@ struct defer_dtor
 
 
 
-
-
 static string
 conninfo (struct MHD_Connection * conn)
 {
@@ -849,6 +861,148 @@ shell_escape(const string& str)
 }
 
 
+// A map-lik