Greetings,

First, I want to thank everyone who took the time to comment on the
proposed improvements to error handling. I believe the final proposal
is ready for evaluation (both behavior and implementation).

Summary:

* Proposing a new option, 'errfail' (sh -o errfail). The new option
  makes it significantly easier to build production-quality scripts,
  where every error condition must be handled and errors are not
  silently ignored by default. With the new option, it is possible to
  improve the error handling of existing scripts, as it follows common
  patterns. In general, the implementation works along the lines of the
  'try ... catch' functionality supported by most modern scripting
  languages (Python, JavaScript, Perl, Groovy, ... to name a few).

I believe the solution implements what we really need from 'errexit':
abort a sequence of commands whenever an unhandled error occurs. The
solution avoids (hopefully) all the glitches that exist with the
'errexit' option, which cannot be fixed due to backward compatibility
issues.

To activate: sh -o errfail
To disable: sh +o errfail

Short summary: when 'errfail' is on (sh -o errfail), each command must
succeed; otherwise the enclosing command (list, loop, or function)
fails.

More details:
* Functions that run under 'errfail' return immediately on the first
  statement that returns a non-zero status.
* A sequence of commands separated by ';' or newline stops on the first
  statement that returns a non-zero status.
* The while and until statements 'break' if a command in the body
  returns a non-zero status.
* The 'for ... in' and arithmetic 'for' statements 'break' if a command
  in the body returns a non-zero status.
* The top-level read-eval loop exits if a top-level command returns a
  non-zero status.
* The behavior is inherited by sub-shells by default.
* It can be turned on or off explicitly in a function scope, using the
  'local -' construct.

For users that are familiar with the 'errexit' behavior and
limitations: the limitations of 'errexit' are covered extensively on
Stack Overflow, on this mailing list, and in numerous other places; no
need to rehash.

What is an "unhandled error condition", and how to handle it?
How to ignore non-critical errors?

* In bash, error conditions are signaled with a non-zero return code
  from a command, function or external process.
* Two common ways to specify error handling are the 'if-then-else'
  statement and the '||' connector (see below).
* A common method for ignoring a non-critical error is to add '|| true'
  (assuming the command itself will produce a meaningful error message,
  if needed).

----
# Using if-then-else
if command-that-can-fail ; then
    echo "ALL GOOD"
else
    try-to-recover                 # Try to handle the error
    if FATAL ; then exit 1 ; fi    # If not possible to recover
fi
----
# Using the '||' connector
command-that-can-fail || try-to-recover || exit 1
----
# Handling non-critical errors
non-critical-command-that-can-fail || true    # silent error handling
non-critical-command-that-can-fail || :       # harder to notice the ':', but does the job.
non-critical-command-that-can-fail || echo "warning doing X - continuing" >&2    # With some message
----

Practical example - real life. A job has to copy 3 critical data
files. It then sends a notification via email (non-critical).

#! /bin/bash
set -o errfail
function copy-files {
    # ALL files are critical, script will not continue
    cp /new/file1 /production/path/
    cp /new/file2 /production/path/
    # Use the files for something - e.g., count the lines.
    important-job /production/path/file1 /production/path/file2
    ls -l /production/path | mail -s "all-good" not...@company.com || true    # Not critical
}

if copy-files ; then
    more-critical-jobs
    echo "ALL GOOD"
else
    mail -s "PROBLEM" nor...@company.com < /dev/null
fi

What is the difference? Consider the case where /new/file1 does not
exist, which is a critical error.

* Without errfail, an error message is sent to the script's stderr, but
  the script continues to copy the 2nd file and to perform the
  important-job, even though the data is not ready.
* With errexit, we hit one of the pitfalls: an error message is sent to
  the script's stderr, but the script continues, same as without
  'errexit'. errexit is only triggered for 'more-critical-jobs'.
* With errfail, copy-files stops after the first failure in 'cp'; the
  script then continues to the 'else' section to send the alert.

Thanks for taking the time to review. Patch on bash-devel attached.
For those interested: 50 lines of code, most of them comments. 8 hours
of development, including an automated test script. Looking for advice
on how to "officially" submit.

For developers that want more stylish coding:

alias try=''
alias catch='||'
try { copy-files ; } catch { mail -s "PROBLEM" ... ; }

On Mon, Jul 4, 2022 at 3:20 PM Yair Lenga <yair.le...@gmail.com> wrote:

> Hi,
>
> In my projects, I'm using bash to manage large scale jobs. Works very
> well, especially, when access to servers is limited to ssh. One annoying
> issue is the error handling - the limits/shortcomings of the 'errexit',
> which has been documented and discussed to the Nth degree in multiple
> forums.
>
> Needless to say, trying to extend bash to support try/catch clauses, (like
> other scripting solutions: Python, Groovy, Perl, ...), will be a major
> effort, which may never happen. Instead, I've tried to look into a minimal
> solution that will address the most common pitfall of errexit, where many
> sequences (e.g., series of commands in a function) will not properly
> "break" with 'errexit'. For example:
>
> function foo {
>     cat /missing/file   # e.g.: cat non-existing file.
>     action2   # Executed even if action 1 fail.
>     action3
> }
>
> set -oerrexit   # want to catch errors in 'foo'
> if ! foo ; then
>     # Error handling for foo failure
> fi
>
> I was able to change Bash source and build a version that supports the new
> option 'errfail' (following the 'pipefail' naming), which will do the
> "right" thing in many cases - including the above - 'foo' will return 1,
> and will NOT proceed to action2. The implementation changes the processing
> of command-list ( '{ action1 ; action2 ; ... }') to break of the list, if
> any command returns a non-zero code, that is
>
> set -oerrfail
> { echo BEFORE ; false ; echo AFTER ; }
>
> Will print 'BEFORE', and return 1 (false), when executed under 'errfail'
>
> I'm looking for feedback on this implementation. Will be happy to share
> the code, if there is a chance that this will be accepted into the bash
> core code - I believe it will make it easier to strengthen many production
> systems that use Bash.
>
> To emphasize, this is a minimal proposal, with no intention of expanding
> it into full support for exceptions handling, finally blocks, or any of the
> other features implemented in other (scripting) languages.
>
> Looking for any feedback.
>
> Yair
>
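
For readers who want to reproduce the pitfall from the quoted message
on an unpatched shell, here is a minimal sketch. It does not use
'errfail' (which stock bash does not have). Instead it compares a plain
function body with one whose commands are joined by '&&', which is
roughly what the patch comment means by "the ';' is similar to '&&'".
The helper 'step', the two function names and the missing path are
hypothetical and exist only for this illustration.

```shell
# Record which steps actually ran (hypothetical helper).
log=""
step() { log="$log $1"; }

# Same shape as 'foo' in the quoted message: plain command sequence.
foo_plain() {
    cat /nonexistent-errfail-demo/file 2>/dev/null   # fails: no such file
    step action2    # still runs: errexit is ignored inside an 'if' test
    step action3
}

# Hand-written approximation of the proposed behavior: stop at first failure.
foo_chained() {
    cat /nonexistent-errfail-demo/file 2>/dev/null &&
    step action2 &&
    step action3
}

set -o errexit

log=""
if ! foo_plain; then plain_status=failed; else plain_status=ok; fi
plain_log=$log

log=""
if ! foo_chained; then chained_status=failed; else chained_status=ok; fi
chained_log=$log

set +o errexit

echo "plain:   status=$plain_status ran:$plain_log"
echo "chained: status=$chained_status ran:$chained_log"
```

In the plain version the failure of 'cat' goes unnoticed and the
function reports success, because its status is that of the last
command. The chained version stops at the first failure and returns a
non-zero status, which is what the 'if' test needs in order to run the
error handling branch.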
diff -ur orig/bash-devel/builtins/set.def new/bash-devel/builtins/set.def
--- orig/bash-devel/builtins/set.def  2022-06-28 23:15:24.000000000 +0300
+++ new/bash-devel/builtins/set.def  2022-07-07 15:37:24.929011000 +0300
@@ -76,6 +76,8 @@
             emacs        use an emacs-style line editing interface
 #endif /* READLINE */
             errexit      same as -e
+            errfail      execution of command lists will stop whenever
+                         a single command return non-zero status
             errtrace     same as -E
             functrace    same as -T
             hashall      same as -h
@@ -196,6 +198,7 @@
   { "emacs", '\0', (int *)NULL, set_edit_mode, get_edit_mode },
 #endif
   { "errexit", 'e', (int *)NULL, (setopt_set_func_t *)NULL, (setopt_get_func_t *)NULL },
+  { "errfail", '\0', &errfail_opt, (setopt_set_func_t *)NULL, (setopt_get_func_t *)NULL },
   { "errtrace", 'E', (int *)NULL, (setopt_set_func_t *)NULL, (setopt_get_func_t *)NULL },
   { "functrace", 'T', (int *)NULL, (setopt_set_func_t *)NULL, (setopt_get_func_t *)NULL },
   { "hashall", 'h', (int *)NULL, (setopt_set_func_t *)NULL, (setopt_get_func_t *)NULL },
diff -ur orig/bash-devel/eval.c new/bash-devel/eval.c
--- orig/bash-devel/eval.c  2022-06-28 23:15:24.000000000 +0300
+++ new/bash-devel/eval.c  2022-07-08 11:00:01.670123000 +0300
@@ -58,6 +58,7 @@
 {
   int our_indirection_level;
   COMMAND * volatile current_command;
+  int cmd_result ;
 
   USE_VAR(current_command);
 
@@ -168,7 +169,12 @@
               executing = 1;
               stdin_redir = 0;
 
-              execute_command (current_command);
+              cmd_result = execute_command (current_command);
+              /* with errfail, failure in commands (non-zero status)
+                 will terminate execution. */
+              if ( errfail_opt && cmd_result != EXIT_SUCCESS ) {
+                  EOF_Reached = EOF ;
+              } ;
 
             exec_done:
               QUIT;
diff -ur orig/bash-devel/execute_cmd.c new/bash-devel/execute_cmd.c
--- orig/bash-devel/execute_cmd.c  2022-06-28 23:15:24.000000000 +0300
+++ new/bash-devel/execute_cmd.c  2022-07-08 11:46:05.731181500 +0300
@@ -2749,7 +2749,7 @@
       QUIT;
 
 #if 1
-      execute_command (command->value.Connection->first);
+      exec_result = execute_command (command->value.Connection->first);
 #else
       execute_command_internal (command->value.Connection->first,
                                   asynchronous, pipe_in, pipe_out,
@@ -2757,10 +2757,15 @@
 #endif
 
       QUIT;
-      optimize_connection_fork (command);                      /* XXX */
-      exec_result = execute_command_internal (command->value.Connection->second,
+
+      /* With errfail, the ';' is similar to '&&' */
+      /* Execute the second part, only if first part was OK */
+      if ( !errfail_opt || exec_result == EXECUTION_SUCCESS ) {
+          optimize_connection_fork (command);                  /* XXX */
+          exec_result = execute_command_internal (command->value.Connection->second,
                                       asynchronous, pipe_in, pipe_out,
                                       fds_to_close);
+      } ;
       executing_list--;
       break;
 
@@ -2992,6 +2997,11 @@
           if (continuing)
             break;
         }
+
+      // with errfail - execute a break if the body failed
+      if ( errfail_opt && retval != EXECUTION_SUCCESS ) {
+          break ;
+      } ;
     }
 
   loop_level--;
@@ -3164,16 +3174,21 @@
           break;
         }
 
+      // with errfail: assume a break if the body failed
+      if ( errfail_opt && body_status != EXECUTION_SUCCESS ) {
+          break ;
+      } ;
+
       /* Evaluate the step expression. */
       line_number = arith_lineno;
       expresult = eval_arith_for_expr (arith_for_command->step, &expok);
       line_number = save_lineno;
 
       if (expok == 0)
-	{
-	  body_status = EXECUTION_FAILURE;
-	  break;
-	}
+        {
+          body_status = EXECUTION_FAILURE;
+          break;
+        }
     }
 
   loop_level--;
@@ -3734,6 +3749,12 @@
           if (continuing)
             break;
         }
+
+      // with errfail: assume a break if the body failed
+      if ( errfail_opt && body_status != EXECUTION_SUCCESS ) {
+          break ;
+      } ;
+
     }
 
   loop_level--;
diff -ur orig/bash-devel/flags.c new/bash-devel/flags.c
--- orig/bash-devel/flags.c  2022-06-28 23:15:24.000000000 +0300
+++ new/bash-devel/flags.c  2022-07-07 15:51:14.899947500 +0300
@@ -156,6 +156,15 @@
    with a 0 status, the status of the pipeline is 0. */
 int pipefail_opt = 0;
 
+/* Non-zero means that when executing connected commands (';' or new lines)
+   the sequence will be stopped when any individual commands return a non-zero
+   status. Similar to '&&'. Also, failure on the loop body will result
+   in breaking the loop (while, until, for, arith-for).
+
+   Used for improved error handling */
+
+int errfail_opt = 0;
+
 /* **************************************************************** */
 /*                                                                  */
 /*                          The Flags ALIST.                        */
diff -ur orig/bash-devel/flags.h new/bash-devel/flags.h
--- orig/bash-devel/flags.h  2022-06-28 23:15:24.000000000 +0300
+++ new/bash-devel/flags.h  2022-07-07 15:51:38.165941100 +0300
@@ -48,7 +48,7 @@
   echo_command_at_execute, noclobber,
   hashing_enabled, forced_interactive, privileged_mode, jobs_m_flag,
   asynchronous_notification, interactive_comments, no_symbolic_links,
-  function_trace_mode, error_trace_mode, pipefail_opt;
+  function_trace_mode, error_trace_mode, pipefail_opt, errfail_opt ;
 
 /* -c, -s invocation options -- not really flags, but they show up in $- */
 extern int want_pending_command, read_from_stdin;
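
The loop hunks in the patch make a failing loop body act like an
implicit 'break'. As a rough illustration of that behavior on an
unpatched shell, the sketch below spells the break out by hand. It
assumes that the loop should also report the failing status, which is
how I read the hunks but is not stated explicitly. 'process' and
'run_items' are hypothetical names used only here.

```shell
# Hypothetical worker: succeeds for every item except "bad".
process() { [ "$1" != "bad" ]; }

run_items() {
    rc=0
    processed=""
    for item in "$@"; do
        # Explicit stand-in for the implicit break on a failing body.
        # 'break' itself returns 0, so the failing status is saved first.
        process "$item" || { rc=$?; break; }
        processed="$processed $item"
    done
    return "$rc"
}

if run_items a b bad c; then loop_status=ok; else loop_status=failed; fi
echo "status=$loop_status processed:$processed"
```

The loop stops at the first failing item, the items after it are never
touched, and the caller sees a non-zero status it can handle with
'if' or '||'.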