Re: Revisiting Error handling (errexit)

Yair Lenga Fri, 08 Jul 2022 02:53:40 -0700

Greetings,

First, I wanted to thank all the people that took the time to provide
comments on the proposed improvements to the error handling. I believe that
the final proposal is ready for evaluation (both behavior and
implementation).


Summary:
* Proposing adding new option 'errfail' (sh -o errfail). The new option
makes it significantly easier to build production quality scripts, where
every error condition must be handled, and errors will not silently be
ignored by default. With the new option, it's possible to improve tje error
handling of existing scripts, as it follows common patterns. In general,
the implementation works along the lines of the 'try ... catch'
functionality supported by most modern scripting languages (Python,
Javascript, Perl, Groovy, ... to name a few). I believe the solution
implements what the 'errexit' that we really need - abort sequences of
commands whenever unhandled error occurs. The solution avoids (hopefully)
all the glitches that exist with the 'errexit' option - which can not be
fixed due to backward compatibility issues.

To activate: sh -o errfail
To disable: sh +o errfail

Short summary: when 'errfail' is on (sh -o errfail), each command must
succeed, otherwise, the command will fail.
More details:
* Functions that run under 'errfail' will return immediately on the first
statement that returns a non-zero error code.
* Sequence of commands separated by ';' or new line will stop on the first
statement that returns non-zero error code.
* The while and until statements will 'break' if a command in the "body"
returns a non-zero error code.
* The for each, and arithmetic for statement will "break" if a command in
the body returns a non-zero error code.
* The top level read-eval-loop will exit if a top level command returns a
non-zero error code.
* Behavior inherited by default into sub-shells
* Possible to explicitly turn on/off in a function scope, using the 'local
-' construct.

For users that are familiar with the 'errexit' behavior and limitations:
The limitations of the 'errexit' are covered extensively in stack overflow,
current mailing list, and numerous other locations - no need to rehash.

What is "unhandled error condition'' ? and how to handle them ? How to
ignore non-critical errors ?
* In bash, error conditions are signaled with non-zero return code from a
command, function or external process.
* Two common way to specify error handling is with 'if-then-else'
statement, or with the '||' connector (see below)
* Common method for ignoring non-critical error is to add '|| true'
(assuming the command itself will produce a meaningful error message, if
needed).
---
# Using if-then-else
if command-that-can-fail ; then
   echo "ALL GOOD"
else
   try-to-recover              # Try to handle the error
   if FATAL ; then exit 1    # If not possible to recover
fi
----
# Using the '||' connector
command-than-can-fail || try-to-recover || exit 1
----

# Handling non-critical errors
if non-critical-command ;
non-critical-command-that-can-fail || true      # silent error handling
non-critical-command-that-can-fail || :           # harder to notice the
':', but does the job.
non-critical-command-than-can-fail || echo "warning doing X - continuing"
>&2   # With some message


Practical Example - real life. A job has to copy 3 critical data files. It
then sends notification via email (non-critical).

#! /bin/bash
set -o errfail
function copy-files {
     # ALL files are critical, script will not cont
     cp /new/file1 /production/path/
     cp /new/file2 /production/path/
     # Use the files for something - e.g., count the lines.
     important-job /production/path/file1 /production/path/file2

     ls -l /production/path | mail -s "all-good" not...@company.com ||
true    # Not critical
}

if copy-files ; then
     more-critical-jobs
      echo "ALL GOOD"
else
      mail -s "PROBLEM" nor...@company.com < /dev/null
fi

What is the difference ? consider the case where /new/file1 does not
exists, which is critical error.
* Without errfail, an error message will be sent to script stderr, but the
script will continue to copy the 2nd file, and to perform the
important-job, even though the data is not ready.
* With errexit, we hit one of the pitfalls, an error message will be sent
to the script stderr, but the script will continue, same as without
'errexit'. errexit will only get triggered for 'more-critical-jobs'.
* With errfail, copy-files will stop after the first failure in the 'cp',
it will then continue to the 'else' section to send the alert.

Thanks for taking the time to review. Patch on bash-devel attached. For
those interested: 50 lines of code, most of them are comments. 8 hours of
development, including automated test script.

Looking for advice on how to "officially" submit.

For developers that want more stylish coding:
alias try=''
alias catch='||'

try { copy-files ; }
catch { mail -s "PROBLEM" ... ; }





On Mon, Jul 4, 2022 at 3:20 PM Yair Lenga <yair.le...@gmail.com> wrote:

> Hi,
>
> In my projects, I'm using bash to manage large scale jobs. Works very
> well, especially, when access to servers is limited to ssh. One annoying
> issue is the error handling - the limits/shortcomings of the 'errexit',
> which has been documented and discussed to the Nth degree in multiple
> forums.
>
> Needless to say, trying to extend bash to support try/catch clauses, (like
> other scripting solutions: Python, Groovy, Perl, ...), will be a major
> effort, which may never happen. Instead, I've tried to look into a minimal
> solution that will address the most common pitfall of errexit, where many
> sequences (e.g., series of commands in a function) will not properly
> "break" with 'errexit'. For example:
>
> function foo {
>     cat /missing/file   # e.g.: cat non-existing file.
>     action2   # Executed even if action 1 fail.
>     action3
> }
>
> set -oerrexit   # want to catch errors in 'foo'
> if ! foo ; then
>     # Error handling for foo failure
> fi
>
>
>
On Mon, Jul 4, 2022 at 3:20 PM Yair Lenga <yair.le...@gmail.com> wrote:

> Hi,
>
> In my projects, I'm using bash to manage large scale jobs. Works very
> well, especially, when access to servers is limited to ssh. One annoying
> issue is the error handling - the limits/shortcomings of the 'errexit',
> which has been documented and discussed to the Nth degree in multiple
> forums.
>
> Needless to say, trying to extend bash to support try/catch clauses, (like
> other scripting solutions: Python, Groovy, Perl, ...), will be a major
> effort, which may never happen. Instead, I've tried to look into a minimal
> solution that will address the most common pitfall of errexit, where many
> sequences (e.g., series of commands in a function) will not properly
> "break" with 'errexit'. For example:
>
> function foo {
>     cat /missing/file   # e.g.: cat non-existing file.
>     action2   # Executed even if action 1 fail.
>     action3
> }
>
> set -oerrexit   # want to catch errors in 'foo'
> if ! foo ; then
>     # Error handling for foo failure
> fi
>
> I was able to change Bash source and build a version that supports the new
> option 'errfail' (following the 'pipefail' naming), which will do the
> "right" thing in many cases - including the above - 'foo' will return 1,
> and will NOT proceed to action2. The implementation changes the processing
> of command-list ( '{ action1 ; action2 ; ... }') to break of the list, if
> any command returns a non-zero code, that is
>
> set -oerrfail
> { echo BEFORE ; false ; echo AFTER ; }
>
> Will print 'BEFORE', and return 1 (false), when executed under 'errfail'
>
> I'm looking for feedback on this implementation. Will be happy to share
> the code, if there is a chance that this will be accepted into the bash
> core code - I believe it will make it easier to strengthen many production
> systems that use Bash.
>
> To emphasize, this is a minimal proposal, with no intention of expanding
> it into full support for exceptions handling, finally blocks, or any of the
> other features implemented in other (scripting) languages.
>
> Looking for any feedback.
>
> Yair
>

diff -ur orig/bash-devel/builtins/set.def new/bash-devel/builtins/set.def
--- orig/bash-devel/builtins/set.def    2022-06-28 23:15:24.000000000 +0300
+++ new/bash-devel/builtins/set.def     2022-07-07 15:37:24.929011000 +0300
@@ -76,6 +76,8 @@
           emacs        use an emacs-style line editing interface
 #endif /* READLINE */
           errexit      same as -e
+          errfail      execution of command lists will stop whenever
+                       a single command return non-zero status
           errtrace     same as -E
           functrace    same as -T
           hashall      same as -h
@@ -196,6 +198,7 @@
   { "emacs",     '\0', (int *)NULL, set_edit_mode, get_edit_mode },
 #endif
   { "errexit",   'e', (int *)NULL, (setopt_set_func_t *)NULL, 
(setopt_get_func_t *)NULL  },
+  { "errfail",   '\0', &errfail_opt, (setopt_set_func_t *)NULL, 
(setopt_get_func_t *)NULL  },
   { "errtrace",          'E', (int *)NULL, (setopt_set_func_t *)NULL, 
(setopt_get_func_t *)NULL  },
   { "functrace",  'T', (int *)NULL, (setopt_set_func_t *)NULL, 
(setopt_get_func_t *)NULL  },
   { "hashall",    'h', (int *)NULL, (setopt_set_func_t *)NULL, 
(setopt_get_func_t *)NULL  },
diff -ur orig/bash-devel/eval.c new/bash-devel/eval.c
--- orig/bash-devel/eval.c      2022-06-28 23:15:24.000000000 +0300
+++ new/bash-devel/eval.c       2022-07-08 11:00:01.670123000 +0300
@@ -58,6 +58,7 @@
 {
   int our_indirection_level;
   COMMAND * volatile current_command;
+  int cmd_result ;
 
   USE_VAR(current_command);
 
@@ -168,7 +169,12 @@
              executing = 1;
              stdin_redir = 0;
 
-             execute_command (current_command);
+             cmd_result = execute_command (current_command);
+          /* with errfail, failure in commands (non-zero status)
+             will terminate execution. */
+          if ( errfail_opt && cmd_result != EXIT_SUCCESS ) {
+              EOF_Reached = EOF ;
+             } ;
 
            exec_done:
              QUIT;
diff -ur orig/bash-devel/execute_cmd.c new/bash-devel/execute_cmd.c
--- orig/bash-devel/execute_cmd.c       2022-06-28 23:15:24.000000000 +0300
+++ new/bash-devel/execute_cmd.c        2022-07-08 11:46:05.731181500 +0300
@@ -2749,7 +2749,7 @@
       QUIT;
 
 #if 1
-      execute_command (command->value.Connection->first);
+      exec_result = execute_command (command->value.Connection->first);
 #else
       execute_command_internal (command->value.Connection->first,
                                  asynchronous, pipe_in, pipe_out,
@@ -2757,10 +2757,15 @@
 #endif
 
       QUIT;
-      optimize_connection_fork (command);                      /* XXX */
-      exec_result = execute_command_internal 
(command->value.Connection->second,
+
+      /* With errfail, the ';' is similar to '&&' */
+      /* Execute the second part, only if first part was OK */
+      if ( !errfail_opt || exec_result == EXECUTION_SUCCESS ) {
+          optimize_connection_fork (command);                  /* XXX */
+          exec_result = execute_command_internal 
(command->value.Connection->second,
                                      asynchronous, pipe_in, pipe_out,
                                      fds_to_close);
+      } ;
       executing_list--;
       break;
 
@@ -2992,6 +2997,11 @@
          if (continuing)
            break;
        }
+
+      // with errfail - execute a break if the body failed
+      if ( errfail_opt && retval != EXECUTION_SUCCESS ) {
+          break ;
+      } ;
     }
 
   loop_level--;
@@ -3164,16 +3174,21 @@
            break;
        }
 
+      // with errfail: assume a break if the body failed
+      if ( errfail_opt && body_status != EXECUTION_SUCCESS ) {
+          break ;
+      } ;
+
       /* Evaluate the step expression. */
       line_number = arith_lineno;
       expresult = eval_arith_for_expr (arith_for_command->step, &expok);
       line_number = save_lineno;
 
       if (expok == 0)
-       {
-         body_status = EXECUTION_FAILURE;
-         break;
-       }
+           {
+               body_status = EXECUTION_FAILURE;
+               break;
+           }
     }
 
   loop_level--;
@@ -3734,6 +3749,12 @@
          if (continuing)
            break;
        }
+
+      // with errfail: assume a break if the body failed
+      if ( errfail_opt && body_status != EXECUTION_SUCCESS ) {
+          break ;
+      } ;
+
     }
   loop_level--;
 
diff -ur orig/bash-devel/flags.c new/bash-devel/flags.c
--- orig/bash-devel/flags.c     2022-06-28 23:15:24.000000000 +0300
+++ new/bash-devel/flags.c      2022-07-07 15:51:14.899947500 +0300
@@ -156,6 +156,15 @@
    with a 0 status, the status of the pipeline is 0. */
 int pipefail_opt = 0;
 
+/* Non-zero means that when executing connected commands (';' or new lines)
+   the sequence will be stopped when any individual commands return a non-zero
+   status. Similar to '&&'. Also, failure on the loop body will result
+   in breaking the loop (while, until, for, arith-for).
+
+   Used for improved error handling */
+
+int errfail_opt = 0;
+
 /* **************************************************************** */
 /*                                                                 */
 /*                     The Flags ALIST.                            */
diff -ur orig/bash-devel/flags.h new/bash-devel/flags.h
--- orig/bash-devel/flags.h     2022-06-28 23:15:24.000000000 +0300
+++ new/bash-devel/flags.h      2022-07-07 15:51:38.165941100 +0300
@@ -48,7 +48,7 @@
   echo_command_at_execute, noclobber,
   hashing_enabled, forced_interactive, privileged_mode, jobs_m_flag,
   asynchronous_notification, interactive_comments, no_symbolic_links,
-  function_trace_mode, error_trace_mode, pipefail_opt;
+  function_trace_mode, error_trace_mode, pipefail_opt, errfail_opt ;
 
 /* -c, -s invocation options -- not really flags, but they show up in $- */
 extern int want_pending_command, read_from_stdin;

Re: Revisiting Error handling (errexit)

Reply via email to