Re: test '-v' - associative vs. normal array discrepancy - a bug ?

Greg Wooledge Fri, 21 Nov 2014 13:44:54 -0800

On Fri, Nov 21, 2014 at 06:21:30AM +0800, konsolebox wrote:
> On Thu, Nov 20, 2014 at 4:47 AM, Greg Wooledge <wool...@eeg.ccf.org> wrote:
> > Passing the name of a variable has NEVER worked in ANY function for ANY
> > purpose in bash.
> 
> I hope you're actually not generalizing but you seriously have to reconsider
> that.

Well, I did overstate the problem slightly.  There are some scenarios
where you can successfully pass the name of a variable as an argument
to a function, but they are limited, and you must handle everything
precisely right.

None of the obvious or elegant approaches will work without extra safety
checks and byzantine constructions.

> This is just one:
> https://sourceforge.net/p/playshell/code/ci/master/tree/source/filter.sh

OK, right off the bat, you are using an undocumented HACK:

function filter {
        local REF="$1[@]" X 

I don't remember whether we ever got official word from Chet on whether
this syntax would continue to be supported in the future.  It seems to
me more of an accidental misfeature that people have latched onto because
there is no other way to get things done.  (Besides eval.)

The fact that one must use this undocumented hack (which may or may not
magically vanish the next time Chet tightens up the grammar) in order
to retrieve values from an array whose name is passed as an argument is
already one strike against it.

> "eval" is not synonymous to security holes.  There are many ways to sanitize 
> it
> (if necessarily needed).  Namerefs (declare -n in 4.3) and indirect variable
> references for example can avoid referencing invalid variable names simply by
> checking them with a pattern.

In my previous message I demonstrated why declare -n is just as insecure
as eval.  In fact it's worse, because:

 1) There's always one name that you cannot pass as an argument (the same
    name that the ref is planning to use).

 2) It gives the ILLUSION that it works like ksh's nameref, and thus the
    ILLUSION of safety, which lulls the user into thinking he won't have
    to worry about security.

> > This is precisely why I tell people not to try to write "libraries" of
> > reusable code in bash.  You just can't do it.  It's pointless to try.
> 
> Sorry but it's not.  Or at least most of the time it's not always.

There are some simple things you can write reusably, without doing
backflips.  For example, I have a function that "returns" a random number:

# Pick unbiased random number from 0 to N-1 ($1 = N)
# Returns value in global variable r.
rand() {
  local max=$((32768 / $1 * $1))
  while (( (r=$RANDOM) >= max )); do :; done
  r=$(( r % $1 ))
}

This is right around the complexity limit that I would consider reasonable
for a bash function.  Anything beyond this level, and you're really getting
into hot water.

> On Thu, Nov 20, 2014 at 5:14 AM, Greg Wooledge <wool...@eeg.ccf.org> wrote:
> > This is because you DO NOT WRITE LIBRARIES IN BASH.  You can't.  This
> > issue is just one of so very many pitfalls.  It's an intractable problem.
> 
> Granting that it's always the library writer that should care about how the
> functions should be called.  The scripter also has the responsibility of 
> making
> sure that the type of argument the script is passing to the function is
> correct.

There's another concern in shell scripts, though, which is absent in many
other languages: arbitrary code execution.  It's not just a matter of
whether the library function gives the correct results, or a reasonable
error message if you pass the "wrong thing" to it.  If you try to
implement a complex function in bash (or almost any other shell), the
odds are the user can BREAK THINGS (cause arbitrary code execution) by
passing the "wrong thing" as an argument.

There is a whole extra class of issues that one must worry about, and
it's REALLY difficult to write a function that not only performs the
desired task correctly, but also does so without introducing new attack
vectors against the operating system.

I gave this example in my previous message, but it bears repeating:

blackbox() {
  declare -n arg=$1
  printf '%s\n' "$arg"
}

This looks completely innocuous, but it's NOT.  declare -n is a trap
for the poor unwary programmer.

imadev:~$ blackbox 'a[b=$(date)]'
bash: b=Fri Nov 21 15:55:58 EST 2014: syntax error in expression (error token 
is "Nov 21 15:55:58 EST 2014")

If you want to use declare -n, you have to write several lines of safety
checking every time:

blackbox() {
  if [[ $1 = arg ]]; then
    echo "blackbox: 'arg' is not permitted, because reasons" >&2
    return 1
  elif [[ $1 != [[:alpha:]_]*([[:alnum:]_]) ]]; then
    echo "blackbox: argument must be a non-indexed identifier" >&2
    return 1
  fi
  declare -n arg=$1
  ...
}

There's simply no other way to be secure.  Every feature that bash has
which even comes close to this has the same issues.

imadev:~$ var='a[b=$(date)]'
imadev:~$ printf -v "$var" %s foo
bash: b=Fri Nov 21 16:00:51 EST 2014: syntax error in expression (error token 
is "Nov 21 16:00:51 EST 2014")

So, printf -v isn't safe either.  You have to sanitize the argument of -v
for exactly the same reason you have to sanitize the variable pointed to
by declare -n, or any variable that you expand with eval.

If you don't, you introduce an opportunity for arbitrary code execution.

So, if you want to write a function that populates a (scalar!) variable
of the caller's choosing, here's one way to do it:

# Add a bunch of integers and store the result
# varname may not be __i or __sum
addemup() {
  (($#)) || { echo "usage: addemup varname [int ...]" >&2; return 1; }
  # optionally check for $1 = __i or __sum
  if [[ $1 != [[:alpha:]_]*([[:alnum:]_]) ]]; then
    echo "addemup: varname must be a non-indexed identifier" >&2; return 1
  fi
  local __sum=0 __i
  for __i in "${@:2}"; do ((__sum+=__i)); done
  eval "$1=\$__sum"
}

And sure enough, you mustn't pass __i or __sum to it:

imadev:~$ addemup __i 1 3
imadev:~$ echo $__i

The problem is bash has no way to refer to a variable in the caller's
scope.  As soon as you have any local variable, it shadows the caller's
variable of the same name.

The only way around this limitation that I have EVER heard of is
described by Freddy Vulto at
http://www.fvue.nl/wiki/Bash:_Passing_variables_by_reference

I have read that page a few times, and I STILL don't understand how it
works.  Nevertheless, here is a solution using Freddy's helper function:

# Assign variable one scope above the caller
# Usage: local "$1" && upvar $1 "value(s)"
# Param: $1  Variable name to assign value to
# Param: $*  Value(s) to assign.  If multiple values, an array is
#            assigned, otherwise a single value is assigned.
# NOTE: For assigning multiple variables, use 'upvars'.  Do NOT
#       use multiple 'upvar' calls, since one 'upvar' call might
#       reassign a variable to be used by another 'upvar' call.
# Example: 
#
#    f() { local b; g b; echo $b; }
#    g() { local "$1" && upvar $1 bar; }
#    f  # Ok: b=bar
#
upvar() {
    if unset -v "$1"; then           # Unset & validate varname
        if (( $# == 2 )); then
            eval $1=\"\$2\"          # Return single value
        else
            eval $1=\(\"\${@:2}\"\)  # Return array
        fi
    fi
}

addemup() {
    (($#)) || { echo "usage: addemup varname [int ...]" >&2; return 1; }
    if [[ $1 != [[:alpha:]_]*([[:alnum:]_]) ]]; then
        echo "addemup: varname must be a non-indexed identifier" >&2; return 1
    fi
    local sum=0 i
    for i in "${@:2}"; do ((sum+=i)); done
    local "$1" && upvar "$1" "$sum"
}

It's pure voodoo!  You need a dozen lines of code, including a function
that no normal person can understand that comes with its own warning
label, just to put a value into a variable safely!

Why would you do this to yourself?  Just switch to a real programming
language.

Re: test '-v' - associative vs. normal array discrepancy - a bug ?

Reply via email to