On Fri, 2022-06-24 at 00:00 +0530, Mir Immad wrote: Thanks for the updated patch.
This is close to being ready. One big issue is the target hook idea for isolating the target's definition of the O_* flags as per Joseph's suggestion here: https://gcc.gnu.org/pipermail/gcc/2022-June/238961.html Given Joseph's comment here: https://gcc.gnu.org/pipermail/gcc/2022-June/238954.html I'm inclined to think that the target hook stuff could be done as a followup. Indentation has gone a bit wrong (presumably due to issue with the mailer); the main thing to watch out for is to use tabs for 8 space indentations within the file. Enabling "visual whitespace" (or whatever it's called) in your editor can help with this. Every patch that adds any options (e.g. to gcc/analyzer/analyzer.opt) needs to document those options, in gcc/doc/invoke.texi. Grep for one of the existing analyzer options in invoke.texi to see what you need to add for the new options. You'll need a ChangeLog entry before this can be pushed. For better or worse, the project requires ChangeLog entries. You can use the script ./contrib/mklog.py to help generate this. Some more information is here: https://www.gnu.org/prep/standards/html_node/Change-Logs.html In GCC our ChangeLog entries are part of the git commit message, and we have a serverside script to update the ChangeLog files in the source tree (to avoid constantly having merge conflicts). The Subject line for the commit should have an "analyzer: " prefix and reference the bugzilla problem report (PR 106003) Before you can push the patch, you'll have to test it with a full bootstrap and regression test run. Let me know if you need help with that. Various other comments inline below... [...snip...] > diff --git gcc/analyzer/analyzer.opt gcc/analyzer/analyzer.opt > index 23dfc797cea..e2a629bb42c 100644 > --- gcc/analyzer/analyzer.opt > +++ gcc/analyzer/analyzer.opt [...snip...] > +Wanalyzer-fd-use-without-check > +Common Var(warn_analyzer_fd_use_without_check) Init(1) Warning > +warn about code paths in which a possibly invalid file descriptor is > passed to a must-be-a-valid file descriptor function argument. These should be capitalized as English sentences, and the "must-be-a- valid" wording doesn't feel quite right to me. Reword this to something like: "Warn about code paths in which a file descriptor is used without being checked for validity." or somesuch (all on one line). [...snip...] > diff --git gcc/analyzer/sm-fd.cc gcc/analyzer/sm-fd.cc > new file mode 100644 > index 00000000000..048014d7a26 > --- /dev/null > +++ gcc/analyzer/sm-fd.cc > @@ -0,0 +1,770 @@ > +/* A state machine for detecting misuses of POSIX file descriptor APIs. > + Copyright (C) 2019-2022 Free Software Foundation, Inc. > + Contributed by Immad Mir <mirimma...@gmail.com>. > + > +This file is part of GCC. > + > +GCC is free software; you can redistribute it and/or modify it > +under the terms of the GNU General Public License as published by > +the Free Software Foundation; either version 3, or (at your option) > +any later version. > + > +GCC is distributed in the hope that it will be useful, but > +WITHOUT ANY WARRANTY; without even the implied warranty of > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +General Public License for more details. > + > +You should have received a copy of the GNU General Public License > +along with GCC; see the file COPYING3. If not see > +<http://www.gnu.org/licenses/>. */ > + > +#include "config.h" > +#include "system.h" > +#include "coretypes.h" > +#include "tree.h" > +#include "function.h" > +#include "basic-block.h" > +#include "gimple.h" > +#include "options.h" > +#include "diagnostic-path.h" > +#include "diagnostic-metadata.h" > +#include "function.h" > +#include "json.h" > +#include "analyzer/analyzer.h" > +#include "diagnostic-event-id.h" > +#include "analyzer/analyzer-logging.h" > +#include "analyzer/sm.h" > +#include "analyzer/pending-diagnostic.h" > +#include "analyzer/function-set.h" > +#include "analyzer/analyzer-selftests.h" > +#include "tristate.h" > +#include "selftest.h" > +#include "analyzer/call-string.h" > +#include "analyzer/program-point.h" > +#include "analyzer/store.h" > +#include "analyzer/region-model.h" > + > +#include <unistd.h> We'll want to drop the #include <unistd.h> once we have a target hook. [...snip...] > + > + state_machine::state_t > + get_default_state (const svalue *sval) const final override > + { > + if (tree cst = sval->maybe_get_constant ()) > + { > + int val = TREE_INT_CST_LOW (cst); > + if (val < 0) > + { > + return m_invalid; > + } > + } > + return m_start; > + } Please add coverage to the testsuite for the case of accessing negative and non-negative constant integer values: what happens when we hit the above cases? In particular, IIRC 0, 1, and 2 are stdin, stdout, and stderr respectively, and are already open at the start of "main" (though not necessarily at the start of any given *analysis path*, so we can't assume that) - so there may be code out there that has hardcoded accesses using these constant integer values. [...snip...] > + > + /* State for reperesenting a file descriptor that is known to be valid (>= > 0), > + for three different access */ Add " modes." to the end of the comment. [...snip...] > +/* Base diagnostic class relative to fd_state_machine. */ > +class fd_diagnostic : public pending_diagnostic > +{ > +public: > + fd_diagnostic (const fd_state_machine &sm, tree arg) : m_sm (sm), m_arg > (arg) > + { > + } > + > + bool > + subclass_equal_p (const pending_diagnostic &base_other) const override > + { > + return same_tree_p (m_arg, ((const fd_diagnostic &)base_other).m_arg); > + } > + > + label_text > + describe_state_change (const evdesc::state_change &change) override > + { > + if (change.m_old_state == m_sm.get_start_state () > + && m_sm.is_unchecked (change.m_new_state)) > + { > + if (change.m_new_state == m_sm.m_unchecked_read_write) > + return change.formatted_print ("opened here as %<read-write%>"); > + > + if (change.m_new_state == m_sm.m_unchecked_read_only) > + return change.formatted_print ("opened here as %<read-only%>"); > + > + if (change.m_new_state == m_sm.m_unchecked_write_only) > + return change.formatted_print ("opened here as %<write-only%>"); "read-write" etc in these messages are part of the text of the messages, rather than quoting a source-language construct, so drop the %< and %> here. > + } > + > + if (change.m_new_state == m_sm.m_closed) > + return change.formatted_print ("closed here"); > + > + if (m_sm.is_unchecked (change.m_old_state) > + && m_sm.is_valid (change.m_new_state)) > + if (change.m_expr) > + return change.formatted_print ( > + "assuming %qE is a valid file descriptor (>= 0)", > change.m_expr); > + else > + return change.formatted_print ("assuming a valid file descriptor"); > + > + if (m_sm.is_unchecked (change.m_old_state) > + && change.m_new_state == m_sm.m_invalid) > + if (change.m_expr) > + return change.formatted_print ( > + "assuming %qE is an invalid file descriptor (< 0)", > change.m_expr); > + else > + return change.formatted_print ("assuming an invalid file > descriptor"); > + > + return label_text (); > + } > + > +protected: > + const fd_state_machine &m_sm; > + tree m_arg; > +}; > + > +class fd_access_mode_mismatch : public fd_diagnostic > +{ > +public: > + /* mode specfies whether read on write was attempted or vice versa. > + */ > + fd_access_mode_mismatch (const fd_state_machine &sm, tree arg, > + enum access_mode mode, const tree callee_fndecl) > + : m_mode (mode), m_callee_fndecl (callee_fndecl), fd_diagnostic (sm, > arg) > + { > + } Every subclass of fd_diagnostic that adds fields ought to have its own override of: subclass_equal_p (const pending_diagnostic &base_other) const override which chains up to the parent class implementation, and also compares the new fields. > + > + const char * > + get_kind () const final override > + { > + return "fd_access_mode_mismatch"; > + } > + > + int > + get_controlling_option () const final override > + { > + return OPT_Wanalyzer_fd_access_mode_mismatch; > + } > + > + bool > + emit (rich_location *rich_loc) final override > + { > + return warning_at (rich_loc, get_controlling_option (), > + "%qE on %qs file descriptor %qE", m_callee_fndecl, > + m_sm.get_string_for_access_mode (m_mode), m_arg); It's bad for localization to be constructing strings from fragments like this. It's a pain, but please eliminate get_string_for_access_mode in favor of switch statements wherever you'd use it, like here. > + } > + > + label_text > + describe_state_change (const evdesc::state_change &change) override > + { > + return fd_diagnostic::describe_state_change (change); > + } > + > + label_text > + describe_final_event (const evdesc::final_event &ev) final override > + { > + return ev.formatted_print ("%qE on %qs file descriptor %qE", > + m_callee_fndecl, > + m_sm.get_string_for_access_mode (m_mode), > m_arg); As above, please eliminate get_string_for_access_mode and convert to a switch. > + } > + > +private: > + enum access_mode m_mode; > + const tree m_callee_fndecl; > +}; > + [...snip...] > + > +class fd_use_after_close : public fd_diagnostic > +{ > +public: > + fd_use_after_close (const fd_state_machine &sm, tree arg, > + const tree callee_fndecl) > + : m_callee_fndecl (callee_fndecl), fd_diagnostic (sm, arg) > + { > + } This subclass has its own fields, and so needs to implement: subclass_equal_p (const pending_diagnostic &base_other) const override [...snip...] > +class unchecked_use_of_fd : public fd_diagnostic > +{ > +public: > + unchecked_use_of_fd (const fd_state_machine &sm, tree arg, > + const tree callee_fndecl) > + : m_callee_fndecl (callee_fndecl), fd_diagnostic (sm, arg) > + { > + } > + > + const char * > + get_kind () const final override > + { > + return "unchecked_use_of_fd"; > + } > + > + int > + get_controlling_option () const final override > + { > + return OPT_Wanalyzer_fd_use_without_check; > + } > + > + bool > + emit (rich_location *rich_loc) final override > + { > + > + return warning_at (rich_loc, get_controlling_option (), > + "%qE on possibly invalid file descriptor %qE", > + m_callee_fndecl, m_arg); How about: "%qE on file descriptor %qE without checking %<open%> succeeded" > + } > + > + label_text > + describe_state_change (const evdesc::state_change &change) override > + { > + if (m_sm.is_unchecked (change.m_new_state)) > + { > + m_first_open_event = change.m_event_id; > + return label_text::borrow ("opened here"); > + } > + > + return fd_diagnostic::describe_state_change (change); > + } > + > + label_text > + describe_final_event (const evdesc::final_event &ev) final override > + { > + return ev.formatted_print ("%qE could be invalid: unchecked value from > %@", > + m_arg, &m_first_open_event); > + } > + > +private: > + diagnostic_event_id_t m_first_open_event; > + const tree m_callee_fndecl; > +}; > + [...snip...] > +bool > +fd_state_machine::is_unchecked (state_t s) const > +{ > + return s == m_unchecked_read_write > + || s == m_unchecked_read_only > + || s == m_unchecked_write_only; > +} Thanks for splitting these onto multiple lines, but please wrap them with parentheses, so they look like: return (s == m_unchecked_read_write || s == m_unchecked_read_only || s == m_unchecked_write_only); > + > +bool > +fd_state_machine::is_valid (state_t s) const > +{ > + return s == m_valid_read_write > + || s == m_valid_read_only > + || s == m_valid_write_only; > +} Likewise here. > + > +const char * > +fd_state_machine::get_string_for_access_mode (enum access_mode mode) const > +{ > + if (mode == READ_ONLY) > + return "read-only"; > + else if (mode == WRITE_ONLY) > + return "write-only"; > + else > + return "read-write"; > +} As noted above it's bad for localization to be constructing strings from fragments like this, so please eliminate the above function, in favor of switch statements wherever you'd use it. > + > +enum access_mode > +fd_state_machine::get_access_mode_from_flag (int flag) const > +{ > + /* FIXME: this code assumes the access modes on the host and > + target are the same, which in practice might not be the case. */ Thanks for moving this into a subroutine. Joseph says we should introduce a target hook for this: https://gcc.gnu.org/pipermail/gcc/2022-June/238961.html You can see an example of adding a target hook in commit 8cdcea51c0fd753e6a652c9b236e91b3a6e0911c. As noted above, I think we can leave adding the target hook until a followup patch, if this is only going to be an issue with cross- compilation between Hurd and non-Hurd systems. > + if (flag == O_RDONLY) > + { > + return READ_ONLY; > + } > + else if (flag == O_WRONLY) > + { > + return WRITE_ONLY; > + } > + return READ_WRITE; > +} > + > +enum access_mode > +fd_state_machine::get_access_mode_from_state (state_t state) const > +{ > + if (state == m_unchecked_read_only || state == m_valid_read_only) > + return READ_ONLY; > + else if (state == m_unchecked_write_only || state == m_valid_write_only) > + return WRITE_ONLY; > + else if (state == m_unchecked_read_write || state == m_valid_write_only) I think there's a copy-and-paste error in the above line: ^^^^^^^^^^^^^^^^^^ > + return READ_WRITE; > +} [...snip...] > +void > +fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node, > + const gimple *stmt, const gcall *call) const > +{ > + tree lhs = gimple_call_lhs (call); > + if (lhs) > + { > + tree arg = gimple_call_arg (call, 1); > + if (TREE_CODE (arg) == INTEGER_CST) > + { > + int flag = TREE_INT_CST_LOW (arg); > + enum access_mode mode = get_access_mode_from_flag (flag); > + if (mode == READ_ONLY) > + sm_ctxt->on_transition (node, stmt, lhs, m_start, > + m_unchecked_read_only); > + else if (mode == WRITE_ONLY) > + sm_ctxt->on_transition (node, stmt, lhs, m_start, > + m_unchecked_write_only); > + else > + sm_ctxt->on_transition (node, stmt, lhs, m_start, > + m_unchecked_read_write); I think this can be a switch (mode). > + } > + } > + else > + { > + /* FIXME: add leak reporting */ > + } > +} [...snip...] > +void > +fd_state_machine::on_read (sm_context *sm_ctxt, const supernode *node, > + const gimple *stmt, const gcall *call, > + const tree callee_fndecl) const > +{ > + tree arg = gimple_call_arg (call, 0); > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + state_t state = sm_ctxt->get_state (stmt, arg); > + enum access_mode mode = get_access_mode_from_state (state); > + > + if (check_for_closed_fd (state)) > + { > + sm_ctxt->warn (node, stmt, arg, > + new fd_use_after_close (*this, diag_arg, > callee_fndecl)); > + } > + > + if (!check_for_closed_fd (state)) > + { Why not just an "else" here? > + > + if (!is_valid (sm_ctxt->get_state (stmt, arg))) > + { > + sm_ctxt->warn ( > + node, stmt, arg, > + new unchecked_use_of_fd (*this, diag_arg, callee_fndecl)); > + } > + > + if (mode == WRITE_ONLY) > + { > + sm_ctxt->warn (node, stmt, arg, > + new fd_access_mode_mismatch (*this, diag_arg, mode, > + callee_fndecl)); > + } > + } > +} > +void > +fd_state_machine::on_write (sm_context *sm_ctxt, const supernode *node, > + const gimple *stmt, const gcall *call, > + const tree callee_fndecl) const > +{ > + tree arg = gimple_call_arg (call, 0); > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + state_t state = sm_ctxt->get_state (stmt, arg); > + enum access_mode mode = get_access_mode_from_state (state); > + > + if (check_for_closed_fd (state)) > + { > + sm_ctxt->warn (node, stmt, arg, > + new fd_use_after_close (*this, diag_arg, > callee_fndecl)); > + } > + > + if (!check_for_closed_fd (state)) > + { Again, I think this can be another "else" here. > + > + if (!is_valid (sm_ctxt->get_state (stmt, arg))) > + { > + sm_ctxt->warn ( > + node, stmt, arg, > + new unchecked_use_of_fd (*this, diag_arg, callee_fndecl)); > + } ...but I think it's slightly cleaner to move all of this repeated logic into a "check_for_open_fd" subroutine, that checks both for fd_use_after_close and for unchecked_use_of_fd. > + > + if (mode == READ_ONLY) > + { > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + sm_ctxt->warn (node, stmt, arg, > + new fd_access_mode_mismatch (*this, diag_arg, > mode, > + callee_fndecl)); > + } > + } > +} > + [...snip...] > diff --git gcc/testsuite/gcc.dg/analyzer/fd-1.c > gcc/testsuite/gcc.dg/analyzer/fd-1.c > new file mode 100644 > index 00000000000..985e9ac75de > --- /dev/null > +++ gcc/testsuite/gcc.dg/analyzer/fd-1.c > @@ -0,0 +1,26 @@ > +int open(const char *, int mode); > +#define O_RDONLY 0 > +#define O_WRONLY 1 > +#define O_RDWR 2 If we're going with a target hook, then we should use the target's <unistd.h> to get at the O_ macros. But as noted above, that can be deferred to a follow-up patch. [...snip...] Hope this makes sense and is constructive; thanks again for the updated patch Dave