Re: [PATCH] elfclassify tool

2019-07-19 Thread Mark Wielaard
Hi,

Sorry, this took way too long. But I really like this code.

On Thu, 2019-04-18 at 13:17 +0200, Florian Weimer wrote:
> * Florian Weimer:
> 
> > > BTW. Florian, the extra options are certainly not required for you to
> > > implement to get eu-elfclassify accepted. They are just suggestions,
> > > which we might decide not to do/add. Or they can be added by others if
> > > they think they are useful.
> > 
> > Understood.  I would rather fix the command line syntax as a priority,
> > implement --unstripped, and add a test suite.
> 
> The patch below, also available here:
> 
>   
> 
> reworks the command line parser, implements filtering of file lists, and
> adds the --unstripped option.

That looks really good. I went ahead an fixed a couple of nits and
added some of my suggestions. I'll respond to your other email
explaining some of my reasoning. The changes I made are:

  elfclassify: Fix bre -> be typo in "unstripped" option help text.
  elfclassify: When reading stdin make sure paths don't include newline.
  elfclassify: Allow inspecting compressed or (kernel) image files with -z.
  elfclassify: Always clean up ELF file and descriptor if one is still open.
  elfclassify: Don't treat errors in elf_open or run_classify as fatal.
  elfclassify: Add --quiet/-q to suppress error messages.
  elfclassify: Add \n to fputs debug output.
  elfclassify: Add --file/-f for testing just regular files.
  elfclassify: Require --elf by default. Add more classifications.
  elfclassify: Add elf_kind and elf_type strings for verbose output.
  elfclassify: Require PT_LOAD for loadable classification.
  elfclassify: Add --program classification.
  elfclassify: Don't use ARGP_NO_EXIT and document exit code expectation.
  elfclassify: Add --linux-kernel-module classification.
  elfclassify: Add --debug-only classification.

Attached is the new version. The individual commits can be found here:
https://code.wildebeest.org/git/user/mjw/elfutils/log/?h=elfclassify

Please let me know if any of this looks bad or unusual.

I'll write some testcases.

Thanks,

Mark
diff --git a/src/Makefile.am b/src/Makefile.am
index 2b1c0dcb..69ac4dbe 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -26,7 +26,8 @@ AM_CPPFLAGS += -I$(srcdir)/../libelf -I$(srcdir)/../libebl \
 AM_LDFLAGS = -Wl,-rpath-link,../libelf:../libdw
 
 bin_PROGRAMS = readelf nm size strip elflint findtextrel addr2line \
-	   elfcmp objdump ranlib strings ar unstrip stack elfcompress
+	   elfcmp objdump ranlib strings ar unstrip stack elfcompress \
+	   elfclassify
 
 noinst_LIBRARIES = libar.a
 
@@ -83,6 +84,7 @@ ar_LDADD = libar.a $(libelf) $(libeu) $(argp_LDADD)
 unstrip_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD) -ldl
 stack_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD) -ldl $(demanglelib)
 elfcompress_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD)
+elfclassify_LDADD = $(libelf) $(libdw) $(libeu) $(argp_LDADD)
 
 installcheck-binPROGRAMS: $(bin_PROGRAMS)
 	bad=0; pid=; list="$(bin_PROGRAMS)"; for p in $$list; do \
diff --git a/src/elfclassify.c b/src/elfclassify.c
new file mode 100644
index ..83a97d47
--- /dev/null
+++ b/src/elfclassify.c
@@ -0,0 +1,989 @@
+/* Classification of ELF files.
+   Copyright (C) 2019 Red Hat, Inc.
+   This file is part of elfutils.
+
+   This file is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   elfutils is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see .  */
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include ELFUTILS_HEADER(elf)
+#include ELFUTILS_HEADER(dwelf)
+#include "printversion.h"
+
+/* Name and version of program.  */
+ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
+
+/* Bug report address.  */
+ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
+
+/* Set by parse_opt.  */
+static int verbose;
+
+/* Set by the main function.  */
+static const char *current_path;
+
+/* Set by open_file.  */
+static int file_fd = -1;
+
+/* Set by issue or elf_issue.  */
+static bool issue_found;
+
+/* Non-fatal issue occured while processing the current_path.  */
+static void
+issue (int e, const char *msg)
+{
+  if (verbose >= 0)
+{
+  if (current_path == NULL)
+	error (0, e, "%s", msg);
+  else
+	error (0, e, "%s '%s'", msg, current_path);
+}
+  issue_found = true;
+}
+
+/* Non-fatal issue occured whi

Re: [PATCH] elfclassify tool

2019-07-19 Thread Mark Wielaard
Hi,

Some answers to this older discussion to explain some of my recent
commits suggested for elfclassify.

On Tue, 2019-04-16 at 13:38 +0200, Florian Weimer wrote:
> * Mark Wielaard:
> > --elf PATH return 0 whenever the file can be opened and a minimal ELF
> > header can be read (it might not be a completely valid ELF file). Do we
> > want or need to do any more verification (e.g. try to get the full ELF
> > header, walk through all phdrs and shdrs)?
> If we ever need that, I think we should add it as separate options,
> detecting both separate debuginfo and regular ELF files.

You are probably right that a real verification isn't the task of
elfclassify. We already have elflint. But I did make --elf required by
default (with as check simply that libelf could open the file without
marking it as ELF_K_NONE). The user can disable it again with --not-
elf. It felt slightly odd to not have any filter active by default.

I did add two "sub" classifications --elf-file and --elf-archive
because I believe people often make a distinction between "normal" ELF
files and ELF archives (containers of ELF objects).

I also added detection of --debug-only ELF file using the heuristic
that Frank also came up with for the dbgserver (contains .debug_*
sections, but no allocated SHT_PROGBIT sections).

> > --unstripped (not yet implemented) would be a classification that
> > indicates whether the ELF file can be stripped (further), that is has a
> > .symtab (symbol table), .debug_* sections (and possibly any non-
> > loadable sections -- "file" only detects the first two).
> 
> Some non-allocated sections are expected in stripped binaries:
> .gnu_debuglink, .shstrtab, .gnu.build.attributes look relevant in this
> context.  I'm not sure if we should flag any other non-allocated section
> in this way.

I don't think those sections need special flagging.

> > I am not sure --file=PATH is a useful option.
> 
> It's useful for determining if the file exists and can be mapped.
> 
> > But maybe we need some way to indicate whether a file is a real file or
> > a symlink? But the current implementation returns 0 even for symlinks.
> > As do every other option (if the file is a symlink to an ELF file of
> > the requested classification). Is this what we want? I would suggest
> > that we return 1 for anything that is not a regular file. But that
> > would mean that for example eu-elfclassify --executable=/proc/$$/exe
> > would also return 1 (currently it returns 0, which might be helpful in
> > some cases).
> 
> I don't know what RPM needs in this context.  I expect that it can
> easily filter out non-regular files.  My problem with symbolic link
> detection is that it is so inconsistent—it generally applies to the
> final pathname component, and that does not look useful to me.

Since --file was available again I reused it to indicate that the final
pathname component should be a regular file (not a symlink or special
device). I think that is useful in general. But you are right there are
other tools to filter out symlinks/special device files. It was useful
for quick and dirty testing though.

> > --loadable basically checks whether the given ELF file is not an object
> > (ET_REL) file, so it will return 0 for either an executable, a shared
> > object or core file, but not check whether any other attribute (like
> > whether it has program headers and/or loadable segments). Personally I
> > would like it if this at least included a check for a PT_LOAD segment.
> 
> Is a PT_LOAD segment required to make the PT_DYNAMIC segment visible?
> It is possible to have mostly empty objects, after all.

I did add this extra check, because I am a little paranoid about having
to deal with totally broken ELF files. My reasoning was basically that
without a PT_LOAD there is nothing in memory to load/execute.

> > This does not classify kernel modules as loadable objects.
> > rpm does contain a check for that, it might make sense to include that
> > as a separate classification in elfclassify --kernel-module.
> > 
> > Kernel modules are also special because they can be compressed ELF
> > files. Do we want to support that? (It is easy with gelf_elf_begin).
> > That could for example be an flag/option like --compressed which can be
> > combined with any other classification option?
> 
> How relevant are kernel modules to eu-elfclassify?
> 
> Is path-based detection feasible for kernel modules?

Sadly kernel modules often need special treatment that you wouldn't
need for "normal" ET_REL files. path-based detection is only partially
possible (rpm used to use the extension name, but that was fragile). So
I added an option --linux-kernel-module that detects them. I also added
a -z option to try to detect ELF files inside compressed images because
it was easy and because these days kernel modules are often compressed.

> > I think another useful classification would be --debugfile which
> > succeeds if the primary function of the given ELF file is being a
> 

Re: [PATCH] elfclassify tool

2019-07-19 Thread Dmitry V. Levin
On Fri, Jul 19, 2019 at 02:47:09PM +0200, Mark Wielaard wrote:
[...]
> +static bool
> +is_shared (void)
> +{
> +  if (!is_loadable ())
> +return false;
> +
> +  /* The ELF type is very clear: this is an executable.  */
> +  if (elf_type == ET_EXEC)
> +return false;
> +
> +  /* If the object is marked as PIE, it is definitely an executable,
> + and not a loadlable shared object.  */
> +  if (has_pie_flag)
> +return false;
> +
> +  /* Treat a DT_SONAME tag as a strong indicator that this is a shared
> + object.  */
> +  if (has_soname)
> +return true;

I'm not sure DT_SONAME is a reliable indicator.

I've seen many cases of DT_SONAME being erroneously applied to 
non-libraries, e.g. lib.so was used as soname in openjdk executables.


-- 
ldv


signature.asc
Description: PGP signature


Re: [PATCH] elfclassify tool

2019-07-19 Thread Mark
On Fri, 2019-07-19 at 16:43 +0300, Dmitry V. Levin wrote:
> On Fri, Jul 19, 2019 at 02:47:09PM +0200, Mark Wielaard wrote:
> [...]
> > +static bool
> > +is_shared (void)
> > +{
> > +  if (!is_loadable ())
> > +return false;
> > +
> > +  /* The ELF type is very clear: this is an executable.  */
> > +  if (elf_type == ET_EXEC)
> > +return false;
> > +
> > +  /* If the object is marked as PIE, it is definitely an
> > executable,
> > + and not a loadlable shared object.  */
> > +  if (has_pie_flag)
> > +return false;
> > +
> > +  /* Treat a DT_SONAME tag as a strong indicator that this is a
> > shared
> > + object.  */
> > +  if (has_soname)
> > +return true;
> 
> I'm not sure DT_SONAME is a reliable indicator.
> 
> I've seen many cases of DT_SONAME being erroneously applied to 
> non-libraries, e.g. lib.so was used as soname in openjdk executables.

I didn't know. Is this really common?
I did find one java binary on my system that indeed has this problem.
$ eu-readelf -d /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-
0.el7_6.x86_64/jre/bin/policytool

Dynamic segment contains 39 entries:
 Addr: 0x00600d88  Offset: 0x000d88  Link to section: [ 7]
'.dynstr'
  Type  Value
  NEEDEDShared library: [libpthread.so.0]
  NEEDEDShared library: [libz.so.1]
  NEEDEDShared library: [libX11.so.6]
  NEEDEDShared library: [libjli.so]
  NEEDEDShared library: [libdl.so.2]
  NEEDEDShared library: [libc.so.6]
  SONAMELibrary soname: [lib.so]
  RPATH Library rpath:
[$ORIGIN/../lib/amd64/jli:$ORIGIN/../lib/amd64]
[...]

But even so eu-elfclassify still doesn't treat it as a shared library,
because:
$ eu-elfclassify -v --shared policytool; echo $?
info: policytool: ELF kind: ELF_K_ELF (0x3)
info: policytool: ELF type: ET_EXEC (0x2)
info: policytool: PT_LOAD found
info: policytool: allocated PROGBITS section found
info: policytool: program interpreter found
info: policytool: dynamic segment found
info: policytool: soname found
info: policytool: DT_DEBUG found
1

So other characteristics like it being ET_EXEC mark it as an
executable. And I assume if it was PIE (ET_DYN) the PIE DT_FLAGS would
have caught it.

So, I don't think the code is wrong. We might want to tweak the comment
a bit though, to make it less definitive?

Cheers,

Mark


Re: [PATCH] elfclassify tool

2019-07-19 Thread Dmitry V. Levin
On Fri, Jul 19, 2019 at 04:21:53PM +0200, Mark wrote:
> On Fri, 2019-07-19 at 16:43 +0300, Dmitry V. Levin wrote:
> > On Fri, Jul 19, 2019 at 02:47:09PM +0200, Mark Wielaard wrote:
> > [...]
> > > +static bool
> > > +is_shared (void)
> > > +{
> > > +  if (!is_loadable ())
> > > +return false;
> > > +
> > > +  /* The ELF type is very clear: this is an executable.  */
> > > +  if (elf_type == ET_EXEC)
> > > +return false;
> > > +
> > > +  /* If the object is marked as PIE, it is definitely an
> > > executable,
> > > + and not a loadlable shared object.  */
> > > +  if (has_pie_flag)
> > > +return false;
> > > +
> > > +  /* Treat a DT_SONAME tag as a strong indicator that this is a
> > > shared
> > > + object.  */
> > > +  if (has_soname)
> > > +return true;
> > 
> > I'm not sure DT_SONAME is a reliable indicator.
> > 
> > I've seen many cases of DT_SONAME being erroneously applied to 
> > non-libraries, e.g. lib.so was used as soname in openjdk executables.
> 
> I didn't know. Is this really common?

I don't think it is very common, but the mistake is very easy to make
(-Wl,-soname,lib.so) and it doesn't really break anything.  Apparently,
some projects apply the same linker flags that add DT_SONAME to all
generated files.

> I did find one java binary on my system that indeed has this problem.
> $ eu-readelf -d /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-
> 0.el7_6.x86_64/jre/bin/policytool
> 
> Dynamic segment contains 39 entries:
>  Addr: 0x00600d88  Offset: 0x000d88  Link to section: [ 7]
> '.dynstr'
>   Type  Value
>   NEEDEDShared library: [libpthread.so.0]
>   NEEDEDShared library: [libz.so.1]
>   NEEDEDShared library: [libX11.so.6]
>   NEEDEDShared library: [libjli.so]
>   NEEDEDShared library: [libdl.so.2]
>   NEEDEDShared library: [libc.so.6]
>   SONAMELibrary soname: [lib.so]
>   RPATH Library rpath:
> [$ORIGIN/../lib/amd64/jli:$ORIGIN/../lib/amd64]
> [...]
> 
> But even so eu-elfclassify still doesn't treat it as a shared library,
> because:
> $ eu-elfclassify -v --shared policytool; echo $?
> info: policytool: ELF kind: ELF_K_ELF (0x3)
> info: policytool: ELF type: ET_EXEC (0x2)
> info: policytool: PT_LOAD found
> info: policytool: allocated PROGBITS section found
> info: policytool: program interpreter found
> info: policytool: dynamic segment found
> info: policytool: soname found
> info: policytool: DT_DEBUG found
> 1
> 
> So other characteristics like it being ET_EXEC mark it as an
> executable. And I assume if it was PIE (ET_DYN) the PIE DT_FLAGS would
> have caught it.

Yes, the checks above has_soname are much more definitive.

> So, I don't think the code is wrong. We might want to tweak the comment
> a bit though, to make it less definitive?

What I'm saying is that has_soname is just a hint which is probably even
less reliable than has_program_interpreter.


-- 
ldv


signature.asc
Description: PGP signature


Re: [PATCH] elfclassify tool

2019-07-19 Thread Florian Weimer
* Dmitry V. Levin:

>> So, I don't think the code is wrong. We might want to tweak the comment
>> a bit though, to make it less definitive?
>
> What I'm saying is that has_soname is just a hint which is probably even
> less reliable than has_program_interpreter.

If I recall correctly, I added the soname check to classify
/lib64/libc.so.6 as a library, not an executable.  So it didn't come
completely out of nowhere.

Thanks,
Florian


Re: [PATCH] elfclassify tool

2019-07-19 Thread Dmitry V. Levin
On Fri, Jul 19, 2019 at 11:00:49PM +0200, Florian Weimer wrote:
> * Dmitry V. Levin:
> 
> >> So, I don't think the code is wrong. We might want to tweak the comment
> >> a bit though, to make it less definitive?
> >
> > What I'm saying is that has_soname is just a hint which is probably even
> > less reliable than has_program_interpreter.
> 
> If I recall correctly, I added the soname check to classify
> /lib64/libc.so.6 as a library, not an executable.  So it didn't come
> completely out of nowhere.

Well, /lib64/libc.so.6 is not just a library, it's also a valid executable.

If the ELF type is ET_DYN and the object is not marked as DF_1_PIE,
could we come up with a more reliable heuristics than DT_SONAME and PT_INTERP?


-- 
ldv


signature.asc
Description: PGP signature


Re: [PATCH] elfclassify tool

2019-07-19 Thread Mark Wielaard
On Sat, Jul 20, 2019 at 12:23:08AM +0300, Dmitry V. Levin wrote:
> On Fri, Jul 19, 2019 at 11:00:49PM +0200, Florian Weimer wrote:
> > * Dmitry V. Levin:
> > 
> > >> So, I don't think the code is wrong. We might want to tweak the comment
> > >> a bit though, to make it less definitive?
> > >
> > > What I'm saying is that has_soname is just a hint which is probably even
> > > less reliable than has_program_interpreter.
> > 
> > If I recall correctly, I added the soname check to classify
> > /lib64/libc.so.6 as a library, not an executable.  So it didn't come
> > completely out of nowhere.
> 
> Well, /lib64/libc.so.6 is not just a library, it's also a valid executable.
> 
> If the ELF type is ET_DYN and the object is not marked as DF_1_PIE,
> could we come up with a more reliable heuristics than DT_SONAME and PT_INTERP?

Why do you feel it is unreliable? Do you have any examples of files
misidentified? I tested a bit and --shared seems to correctly
indentify all shared libraries. I did add --program as a counterpart
to --executable if you really want to identify such "libraries" as
programs. But in general it looks like --shared and --executable come
up with the correct classification.

The only two examples I could find were the glibc and Qt binaries
which have "dual use" library/executables. And I believe --shared
corrrectly identifies them as primarily shared libraries.

Cheers,

Mark


Re: [PATCH] elfclassify tool

2019-07-19 Thread Dmitry V. Levin
On Fri, Jul 19, 2019 at 11:36:53PM +0200, Mark Wielaard wrote:
> On Sat, Jul 20, 2019 at 12:23:08AM +0300, Dmitry V. Levin wrote:
> > On Fri, Jul 19, 2019 at 11:00:49PM +0200, Florian Weimer wrote:
> > > * Dmitry V. Levin:
> > > 
> > > >> So, I don't think the code is wrong. We might want to tweak the comment
> > > >> a bit though, to make it less definitive?
> > > >
> > > > What I'm saying is that has_soname is just a hint which is probably even
> > > > less reliable than has_program_interpreter.
> > > 
> > > If I recall correctly, I added the soname check to classify
> > > /lib64/libc.so.6 as a library, not an executable.  So it didn't come
> > > completely out of nowhere.
> > 
> > Well, /lib64/libc.so.6 is not just a library, it's also a valid executable.
> > 
> > If the ELF type is ET_DYN and the object is not marked as DF_1_PIE,
> > could we come up with a more reliable heuristics than DT_SONAME and 
> > PT_INTERP?
> 
> Why do you feel it is unreliable? Do you have any examples of files
> misidentified?

No, I don't have such examples because most (if not all) ET_DYN
non-DF_1_PIE objects we have nowadays are actually libraries regardless
of their DT_SONAME or PT_INTERP.

What actually disqualifies these objects from being libraries, besides
missing PT_DYNAMIC?

The only reason why I feel uncomfortable to rely on this has_soname check
is that DT_SONAME is so easily added unnoticed by mistake.

btw, I think it would be appropriate to move the has_dynamic check before
the first check in is_shared that returns true.


-- 
ldv


signature.asc
Description: PGP signature