Re: Obsolete powerpc*-*-*spe*

2017-02-17 Thread Richard Biener
On Fri, Feb 17, 2017 at 1:10 AM, David Edelsohn wrote:
> On Thu, Feb 16, 2017 at 3:53 PM, Sandra Loosemore wrote:
>> On 02/16/2017 03:19 PM, Segher Boessenkool wrote:
>>>
>>> On Thu, Feb 16, 2017 at 02:49:47PM -0700, Sandra Loosemore wrote:
>>>>>
>>>>> I propose to mark powerpc*-*-*spe* as obsolete in GCC 7.  This includes
>>>>> the spe.h installed header file, all the __builtin_spe* intrinsics, the
>>>>> -mfloat-gprs= command-line option, and the support for the SPE ABIs.
>>>>>
>>>>> No one has properly tested these targets in a long time (the latest
>>>>> testresults I could find are from July 2015, >1000 failures), and the
>>>>> SPE support makes a lot of code much more complex.
>>>>>
>>>>> Any objections to this obsoletion?  GCC 7 will then be the last release
>>>>> with support for SPE (it will need --enable-obsolete to build these
>>>>> targets), and we will delete the SPE support during GCC 8 development.
>>>>
>>>>
>>>> Can I ask that we hold off a bit before making a decision on this?
>>>
>>>
>>> Of course, that is what we're doing in any case.
>>>
>>> Note that obsoleting it in GCC 7 means GCC 7 will still work, and that
>>> we *can* remove it in GCC 8; we do not have to.  You have plenty of time
>>> to find some way to keep SPE support in GCC.  The obsoletion notice _is_
>>> the advance warning you're asking for.
>>>
>>> The gcc-7/changes.html text I'll propose later says:
>>>
>>>
>>>Support for a number of older systems and recently
>>>unmaintained or untested target ports of GCC has been declared
>>>obsolete in GCC 7.  Unless there is activity to revive them, the
>>>next release of GCC will have their sources permanently
>>>removed.
>>>
>>>The following ports for individual systems on
>>>particular architectures have been obsoleted:
>>>
>>>
>>>  PowerPC SPE (powerpc*-*-*spe*) as announced here:
>>>  https://gcc.gnu.org/ml/gcc/2017-02/msg00041.html
>>>
>>>
>>
>>
>> I understand that you're not going to remove the SPE support tomorrow. But
>> that notice is going to scare users who depend on it, and I think it's not a
>> good idea to scare users unnecessarily.  AFAIK GCC 7 is not going to be
>> released tomorrow, either, so why not give folks a little more time to look
>> into alternatives to announcing the support is being obsoleted?  IMO that
>> should only be done when new maintainers have been solicited and nobody has
>> come forward.
>
> Sandra,
>
> This is not a new issue.  The maintainer did not suddenly resign last
> week.  There have been numerous efforts to reach out to the SPE
> community for over a *decade*, cajoling them to step up with
> maintenance for the port.  I am glad that this notice of obsolescence
> has focused more attention on the long-standing problem.

+1

I'd like us to be more aggressive in deprecating and removing unmaintained
parts of GCC.  That covers not only target/host support but also things like
unmaintained language extensions (or frontends) and optimization passes.

Richard.

> Thanks, David


Re: Obsolete powerpc*-*-*spe*

2017-02-17 Thread Janne Blomqvist
On Fri, Feb 17, 2017 at 11:19 AM, Richard Biener wrote:
> I'd like us to be more aggressive in deprecating and removing unmaintained
> parts of GCC.  That covers not only target/host support but also things like
> unmaintained language extensions (or frontends) and optimization passes.

So... what about dropping i386 support? Steven Bosscher suggested it 4
years ago following the Linux kernel dropping i386, but at that time
the discussion petered out without any firm conclusions either way.
See the thread starting at

https://gcc.gnu.org/ml/gcc/2012-12/msg00079.html

Or while we're at it, why not drop i486 too at the same time, unless
there are, well, users? That would additionally guarantee availability
of x87 (should we be happy or cry?), cpuid, cmpxchg8b, rdtsc.

-- 
Janne Blomqvist


Improving code generation in the nvptx back end

2017-02-17 Thread Thomas Schwinge
Hi!

I'm not all too familiar with the nvptx back end, and I keep forgetting
(and then later re-learning) a lot of PTX details, so please bear with
me...  I'd like to discuss/gather some ideas about how to improve
(whatever that may mean exactly) code generation in the nvptx back end.


We're currently looking into updating OpenACC "privatization"/"state
propagation" (between OpenACC gang, worker, and vector parallel regions)
according to how that got clarified in the OpenACC 2.5 standard.  So I'm
not planning to otherwise touch this machinery until that task is
resolved.


Obviously, we can generally update the back end to generate code for
newer PTX/CC versions, adding new instructions, and all that.


On /
we're arguing that "as these would be difficult to implement due to the
constraints set by PTX itself, the GCC nvptx back end doesn't support
setjmp/longjmp, exceptions (?), alloca, computed goto, non-local goto,
for example".  We could improve on that, but that's probably not too
useful, given the desired use case for nvptx code generation, which is
OpenACC/OpenMP offloaded regions, which don't make use of such
functionality, typically.


The PTX code we generate will later be "JIT"-compiled by the CUDA driver,
so we're expecting that one to "clean up" a lot of stuff for us.

For example, PTX itself doesn't bound the number of registers, so we're
not currently doing any register allocation (and instead just emit all
"virtual" registers), and the PTX "JIT" compiler will then do the
register allocation, according to the actual target hardware
capabilities.  Of course, it remains a valid question, if GCC could do
better register allocation itself (because it has better knowledge of the
code structure, and doesn't have to reconstruct that), or if that would
in fact produce worse code/worse performance, because the PTX "JIT"
compiler then might not understand that code anymore.  I had the idea to
actually try this out, using some benchmarking code, without and with
(manual) register allocation (that is, basically, re-using existing
"dead" registers instead of allocating new ones).

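As a rough illustration of the experiment described above (this is not
code from GCC or from the PTX "JIT" compiler), the following C sketch
assigns virtual registers to reusable slots: a slot is handed to a new
register once the last use of its previous occupant precedes the new
definition.  The names struct live_range, MAX_REGS, the type tag, and
reuse_dead_registers are all hypothetical, and the sketch assumes
straight-line code with live ranges already known and sorted by
definition point.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGS 64   /* hypothetical upper bound for this sketch */

/* Live range of one virtual register, in instruction indices.  */
struct live_range
{
  int def;       /* index of the defining instruction */
  int last_use;  /* index of the final use (>= def) */
  int type;      /* hypothetical type tag, e.g. 0 = .u32, 1 = .f32 */
};

/* Assign each virtual register a slot, reusing a slot of the same
   type once its previous occupant is dead.  The ranges must be sorted
   by DEF.  Returns the number of distinct slots used, or -1 if N
   exceeds MAX_REGS.  */
int
reuse_dead_registers (const struct live_range *r, int n, int *slot)
{
  int free_after[MAX_REGS];  /* last_use of each slot's occupant */
  int slot_type[MAX_REGS];   /* type tag each slot was created with */
  int nslots = 0;

  if (n > MAX_REGS)
    return -1;

  for (int i = 0; i < n; i++)
    {
      assert (r[i].last_use >= r[i].def);
      assert (i == 0 || r[i - 1].def <= r[i].def);

      /* Look for a same-typed slot whose occupant died before this
         register is defined.  */
      int chosen = -1;
      for (int s = 0; s < nslots; s++)
        if (slot_type[s] == r[i].type && free_after[s] < r[i].def)
          {
            chosen = s;
            break;
          }

      /* Nothing reusable: allocate a fresh slot.  */
      if (chosen < 0)
        {
          chosen = nslots++;
          slot_type[chosen] = r[i].type;
        }

      free_after[chosen] = r[i].last_use;
      slot[i] = chosen;
    }

  return nslots;
}
```

For three same-typed ranges [0,2], [1,3], [3,5] plus one range [4,6] of
another type, this uses three slots instead of four, because the third
register takes over the slot of the first.  Control flow, which any real
comparison would have to handle, is ignored here.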

Looking at some actual code.  Given:

$ cat < s.c
struct S { double d; int y; };

float f(int, struct S) __attribute__((noinline));
float f(int x, struct S s)
{
  if (x == s.y)
    s.d = 0.;
  return s.d;
}

int main()
{
  struct S s;
  s.d = 1.;
  s.y = 2;
  if (f(2, s) != 0.)
    __builtin_trap();
  if (f(1, s) != 1.)
    __builtin_trap();

  return 0;
}

..., we currently produce the following "-O2" code:

$ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ --sysroot=install/nvptx-none -Wall -Wextra s.c -O2 -mmainkernel
$ install/bin/nvptx-none-run a.out # launches, and completes normally
$ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ --sysroot=install/nvptx-none -Wall -Wextra s.c -O2 -S
$ cat -n < s.s
 1  // BEGIN PREAMBLE
 2  .version 3.1
 3  .target sm_30
 4  .address_size 64
 5  // END PREAMBLE
 6  
 7  
 8  // BEGIN GLOBAL FUNCTION DECL: f
 9  .visible .func (.param.f32 %value_out) f (.param.u32 %in_ar0, .param.u64 %in_ar1);
10  
11  // BEGIN GLOBAL FUNCTION DEF: f
12  .visible .func (.param.f32 %value_out) f (.param.u32 %in_ar0, .param.u64 %in_ar1)
13  {
14  .reg.f32 %value;
15  .reg.u32 %ar0;
16  ld.param.u32 %ar0, [%in_ar0];
17  .reg.u64 %ar1;
18  ld.param.u64 %ar1, [%in_ar1];
19  .reg.f64 %r23;
20  .reg.f32 %r24;
21  .reg.u32 %r25;
22  .reg.u64 %r26;
23  .reg.u32 %r27;
24  .reg.pred %r28;
25  mov.u32 %r25, %ar0;
26  mov.u64 %r26, %ar1;
27  ld.f64  %r23, [%r26];
28  ld.u32  %r27, [%r26+8];
29  setp.eq.u32 %r28, %r27, %r25;
30  @%r28   bra $L3;
31  cvt.rn.f32.f64  %r24, %r23;
32  bra $L1;
33  $L3:
34  mov.f32 %r24, 0f00000000;
35  $L1:
36  mov.f32 %value, %r24;
37  st.param.f32 [%value_out], %value;
38  ret;
39  }
40  
41  // BEGIN GLOBAL FUNCTION DECL: main
42  .visible .func (.param.u32 %value_out) main (.param.u32 %in_ar0, .param.u64 %in_ar1);
43  
44  // BEGIN GLOBAL FUNCTION DEF: main
45  .visible .func (.param.u32 %value_out) main (.param.u32 %in_ar0, .param.u64 %in_ar1)
46  {
47  .reg.u32 %value;
48  .local .align 8 .b8 %frame_ar[32];
49  .reg.u6

Re: Improving code generation in the nvptx back end

2017-02-17 Thread Thomas Schwinge
Hi!

On Fri, 17 Feb 2017 14:00:09 +0100, I wrote:
> [...] for "normal" functions there is no reason to use the
> ".param" space for passing arguments in and out of functions.  We can
> then get rid of the boilerplate code to move ".param %in_ar*" into ".reg
> %ar*", and the other way round for "%value_out"/"%value".  This will then
> also simplify the call sites, where all that code "evaporates".  That's
> actually something I started to look into, many months ago, and I now
> just dug out those changes, and will post them later.
> 
> (Very likely, the PTX "JIT" compiler will do the very same thing without
> difficulty, but why not directly generate code that is less verbose to
> read?)

Using my WIP patch, the generated PTX code changes/is simplified as
follows:

 // BEGIN GLOBAL FUNCTION DECL: f
-.visible .func (.param.f32 %value_out) f (.param.u32 %in_ar0, .param.u64 %in_ar1);
+.visible .func (.reg.f32 %value_out) f (.reg.u32 %ar0, .reg.u64 %ar1);

 // BEGIN GLOBAL FUNCTION DEF: f
-.visible .func (.param.f32 %value_out) f (.param.u32 %in_ar0, .param.u64 %in_ar1)
+.visible .func (.reg.f32 %value_out) f (.reg.u32 %ar0, .reg.u64 %ar1)
 {
    .reg.f32 %value;
-   .reg.u32 %ar0;
-   ld.param.u32 %ar0, [%in_ar0];
-   .reg.u64 %ar1;
-   ld.param.u64 %ar1, [%in_ar1];
    .reg.f64 %r23;
    .reg.f32 %r24;
    .reg.u32 %r25;
@@ -34,15 +30,15 @@ $L3:
    mov.f32 %r24, 0f00000000;
 $L1:
    mov.f32 %value, %r24;
-   st.param.f32 [%value_out], %value;
+   mov.f32 %value_out, %value;
    ret;
 }

 // BEGIN GLOBAL FUNCTION DECL: main
-.visible .func (.param.u32 %value_out) main (.param.u32 %in_ar0, .param.u64 %in_ar1);
+.visible .func (.reg.u32 %value_out) main (.reg.u32 %ar0, .reg.u64 %ar1);

 // BEGIN GLOBAL FUNCTION DEF: main
-.visible .func (.param.u32 %value_out) main (.param.u32 %in_ar0, .param.u64 %in_ar1)
+.visible .func (.reg.u32 %value_out) main (.reg.u32 %ar0, .reg.u64 %ar1)
 {
    .reg.u32 %value;
    .local .align 8 .b8 %frame_ar[32];
@@ -70,13 +66,9 @@ $L1:
    st.u64 [%frame+24], %r29;
    add.u64 %r31, %frame, 16;
    {
-   .param.f32 %value_in;
-   .param.u32 %out_arg1;
-   st.param.u32 [%out_arg1], %r26;
-   .param.u64 %out_arg2;
-   st.param.u64 [%out_arg2], %r31;
-   call (%value_in), f, (%out_arg1, %out_arg2);
-   ld.param.f32 %r32, [%value_in];
+   .reg.f32 %value_in;
+   call (%value_in), f, (%r26, %r31);
+   mov.f32 %r32, %value_in;
    }
    setp.eq.f32 %r33, %r32, 0f00000000;
    @%r33 bra $L5;
@@ -89,17 +81,13 @@ $L5:
    st.u64 [%frame+24], %r36;
    mov.u32 %r34, 1;
    {
-   .param.f32 %value_in;
-   .param.u32 %out_arg1;
-   st.param.u32 [%out_arg1], %r34;
-   .param.u64 %out_arg2;
-   st.param.u64 [%out_arg2], %r31;
-   call (%value_in), f, (%out_arg1, %out_arg2);
-   ld.param.f32 %r39, [%value_in];
+   .reg.f32 %value_in;
+   call (%value_in), f, (%r34, %r31);
+   mov.f32 %r39, %value_in;
    }
    setp.neu.f32 %r40, %r39, 0f3f800000;
    @%r40 bra $L6;
    mov.u32 %value, 0;
-   st.param.u32 [%value_out], %value;
+   mov.u32 %value_out, %value;
    ret;
 }

(Not yet directly using "%value_out" instead of the intermediate "%value"
register.)

Is such a patch something to pursue to completion?

--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -603,19 +603,32 @@ nvptx_promote_function_mode (const_tree type, machine_mode mode,
    to an argument register and it is greater than zero if we're
    copying to a specific hard register.  */
 
+static bool write_as_kernel (tree attrs);
 static int
 write_arg_mode (std::stringstream &s, int for_reg, int argno,
-   machine_mode mode)
+   machine_mode mode, const_tree decl)
 {
+  bool kernel = (decl != NULL_TREE) && write_as_kernel (DECL_ATTRIBUTES (decl));
   const char *ptx_type = nvptx_ptx_type_from_mode (mode, false);
 
   if (for_reg < 0)
 {
   /* Writing PTX prototype.  */
   s << (argno ? ", " : " (");
-  s << ".param" << ptx_type << " %in_ar" << argno;
+  if (kernel)
+   s << ".param" << ptx_type << " %in_ar" << argno;
+  else
+#if 0
+   s << ".reg" << ptx_type << " %in_ar" << argno;
+#else
+   s << ".reg" << ptx_type << " %ar" << argno;
+#endif
 }
+#if 0
   else
+#else
+  else if (kernel || for_reg)
+#endif
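
The patch is cut off at this point.  To make the intent of the
prototype-writing hunk concrete, here is a minimal standalone C sketch
of the same decision: kernels keep ".param" arguments named %in_arN,
while ordinary functions get ".reg" arguments named %arN directly.  It
is not the nvptx.c code; format_proto_arg and its parameters are
hypothetical, and it assumes, as the patch suggests, that the PTX type
string already carries its leading dot (".u32", ".u64", ...).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format argument number ARGNO of a PTX prototype into BUF.
   IS_KERNEL selects the ".param" form; otherwise the argument is
   declared as a ".reg" and named %arN directly.  PTX_TYPE includes
   its leading dot, e.g. ".u32".  Returns what snprintf returns.  */
int
format_proto_arg (char *buf, size_t size, int is_kernel,
                  const char *ptx_type, int argno)
{
  assert (buf != NULL && ptx_type != NULL && argno >= 0);

  /* The first argument opens the parameter list; later ones are
     separated by commas, as in the patch.  */
  const char *sep = argno ? ", " : " (";

  if (is_kernel)
    return snprintf (buf, size, "%s.param%s %%in_ar%d",
                     sep, ptx_type, argno);

  return snprintf (buf, size, "%s.reg%s %%ar%d", sep, ptx_type, argno);
}
```

With is_kernel set to 0, the first argument of type ".u32" is written as
" (.reg.u32 %ar0", which matches the simplified prototypes in the PTX
diff above.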
 

Re: Obsolete powerpc*-*-*spe*

2017-02-17 Thread Nathan Sidwell

On 02/17/2017 04:19 AM, Richard Biener wrote:


I'd like us to be more aggressive in deprecating and removing unmaintained
parts of GCC.  That covers not only target/host support but also things like
unmaintained language extensions (or frontends) and optimization passes.


If you want a compiler that supports what a previous version supported, 
you know where to find it :)


nathan

--
Nathan Sidwell


Re: Re: Improving code generation in the nvptx back end

2017-02-17 Thread Cesar Philippidis
On 02/17/2017 05:09 AM, Thomas Schwinge wrote:

> On Fri, 17 Feb 2017 14:00:09 +0100, I wrote:
>> [...] for "normal" functions there is no reason to use the
>> ".param" space for passing arguments in and out of functions.  We can
>> then get rid of the boilerplate code to move ".param %in_ar*" into ".reg
>> %ar*", and the other way round for "%value_out"/"%value".  This will then
>> also simplify the call sites, where all that code "evaporates".  That's
>> actually something I started to look into, many months ago, and I now
>> just dug out those changes, and will post them later.
>>
>> (Very likely, the PTX "JIT" compiler will do the very same thing without
>> difficulty, but why not directly generate code that is less verbose to
>> read?)
> 
> Using my WIP patch, the generated PTX code changes/is simplified as
> follows:
> 
>  // BEGIN GLOBAL FUNCTION DECL: f
> -.visible .func (.param.f32 %value_out) f (.param.u32 %in_ar0, .param.u64 %in_ar1);
> +.visible .func (.reg.f32 %value_out) f (.reg.u32 %ar0, .reg.u64 %ar1);
>
>  // BEGIN GLOBAL FUNCTION DEF: f
> -.visible .func (.param.f32 %value_out) f (.param.u32 %in_ar0, .param.u64 %in_ar1)
> +.visible .func (.reg.f32 %value_out) f (.reg.u32 %ar0, .reg.u64 %ar1)
>  {
>     .reg.f32 %value;
> -   .reg.u32 %ar0;
> -   ld.param.u32 %ar0, [%in_ar0];
> -   .reg.u64 %ar1;
> -   ld.param.u64 %ar1, [%in_ar1];
>     .reg.f64 %r23;
>     .reg.f32 %r24;
>     .reg.u32 %r25;
> @@ -34,15 +30,15 @@ $L3:
>     mov.f32 %r24, 0f00000000;
>  $L1:
>     mov.f32 %value, %r24;
> -   st.param.f32 [%value_out], %value;
> +   mov.f32 %value_out, %value;
>     ret;
>  }
>
>  // BEGIN GLOBAL FUNCTION DECL: main
> -.visible .func (.param.u32 %value_out) main (.param.u32 %in_ar0, .param.u64 %in_ar1);
> +.visible .func (.reg.u32 %value_out) main (.reg.u32 %ar0, .reg.u64 %ar1);
>
>  // BEGIN GLOBAL FUNCTION DEF: main
> -.visible .func (.param.u32 %value_out) main (.param.u32 %in_ar0, .param.u64 %in_ar1)
> +.visible .func (.reg.u32 %value_out) main (.reg.u32 %ar0, .reg.u64 %ar1)
>  {
>     .reg.u32 %value;
>     .local .align 8 .b8 %frame_ar[32];
> @@ -70,13 +66,9 @@ $L1:
>     st.u64 [%frame+24], %r29;
>     add.u64 %r31, %frame, 16;
>     {
> -   .param.f32 %value_in;
> -   .param.u32 %out_arg1;
> -   st.param.u32 [%out_arg1], %r26;
> -   .param.u64 %out_arg2;
> -   st.param.u64 [%out_arg2], %r31;
> -   call (%value_in), f, (%out_arg1, %out_arg2);
> -   ld.param.f32 %r32, [%value_in];
> +   .reg.f32 %value_in;
> +   call (%value_in), f, (%r26, %r31);
> +   mov.f32 %r32, %value_in;
>     }
>     setp.eq.f32 %r33, %r32, 0f00000000;
>     @%r33 bra $L5;
> @@ -89,17 +81,13 @@ $L5:
>     st.u64 [%frame+24], %r36;
>     mov.u32 %r34, 1;
>     {
> -   .param.f32 %value_in;
> -   .param.u32 %out_arg1;
> -   st.param.u32 [%out_arg1], %r34;
> -   .param.u64 %out_arg2;
> -   st.param.u64 [%out_arg2], %r31;
> -   call (%value_in), f, (%out_arg1, %out_arg2);
> -   ld.param.f32 %r39, [%value_in];
> +   .reg.f32 %value_in;
> +   call (%value_in), f, (%r34, %r31);
> +   mov.f32 %r39, %value_in;
>     }
>     setp.neu.f32 %r40, %r39, 0f3f800000;
>     @%r40 bra $L6;
>     mov.u32 %value, 0;
> -   st.param.u32 [%value_out], %value;
> +   mov.u32 %value_out, %value;
>     ret;
>  }
> 
> (Not yet directly using "%value_out" instead of the intermediate "%value"
> register.)
> 
> Is such a patch something to pursue to completion?

Are you trying to optimize acc routines in general? I'm not sure how
frequently they are used at the moment.

Also, while .param values may be overkill for routines, they are
addressable. Looking at section 5.1.6.1 in the PTX reference manual, you
can have something like this:

.entry foo ( .param .b32 N, .param .align 8 .b8 buffer[64] )
{
  .reg .u32 %n;
  .reg .f64 %d;
  ld.param.u32 %n, [N];
  ld.param.f64
  ...

Granted, this is an entry function to be called from the host, but the
same usage is applicable inside routines.

This gives me an idea. While working on the firstprivate changes, I
noticed that GCC packs all of the offloaded function arguments into a
structure, which the nvptx run time plugin uploads to a special data
mapping prior to calling cuLaunchKernel. That's inefficient in
application that launch a

Fwd: [patch, contrib] Add support to install libcaf-mpi for multi-image coarray Fortran

2017-02-17 Thread Jerry DeLisle
I forgot to copy the gcc list on this request for comments and approval to commit.

Regards,

Jerry


-------- Forwarded Message --------
Subject: [patch, contrib] Add support to install libcaf-mpi for multi-image
coarray Fortran
Date: Wed, 15 Feb 2017 22:20:49 -0800
From: Jerry DeLisle 
To: GCC Patches , fort...@gcc.gnu.org 


The attached patch adds a new subdirectory called mk-libcaf-multi under contrib
which contains scripts which will download OpenCoarrays, build libcaf-mpi.a, and
install it in the user provided --install-prefix.

As given, the script is run only manually, by a user who chooses to do so.

Eventually we would like to fully integrate the build of libcaf-mpi into gcc to
provide full multi-image support for gfortran.  These scripts provide an
intermediate means of doing so, bringing gfortran pretty close to full Fortran
2008 and 2015 standards.

Providing this will greatly expand user testing and development of gfortran
based CoArray Fortran (CAF) and make it simpler for users to enable this modern
feature.

Tested on linux-x86-64 (Fedora 25) and Mac Darwin.  Requires the user to
have previously installed mpich for the MPI library.  The build uses cmake and
bash 3 scripts to enable a lot of useful argument checking and diagnostics.

For those who are not familiar with coarrays in Fortran and want or need to
explore these advanced features, using this script is a very helpful way to get
started.

Others may chime in with comments or questions.

OK for trunk?

Regards,

Jerry


diff --git a/contrib/mk-libcaf-multi/mk-libcaf-multi.sh b/contrib/mk-libcaf-multi/mk-libcaf-multi.sh
new file mode 100755
index 0000000..753b035
--- /dev/null
+++ b/contrib/mk-libcaf-multi/mk-libcaf-multi.sh
@@ -0,0 +1,269 @@
+#!/usr/bin/env bash
+
+#  Copyright (C) 2017 Free Software Foundation, Inc.
+#  Contributed by Jerry DeLisle in collaboration with Damian Rouson.
+#
+# This file is part of the GNU Fortran runtime library (libgfortran).
+#
+# Libgfortran is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+
+# Libgfortran is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# .
+
+# mk-multi-image.sh
+#
+# --- This script downloads and installs OpenCoarrays to directly support multi-image
+# execution in libgfortran. Execute this script as the last step of the libgfortran
+# make install.
+
+# Portions of this script derive from or call sub-scripts of BASH3 Boilerplate. See
+# the B3B_USE_CASE subdirectory for the substantial portions of the Software and the
+# required permission notices of the MIT License (MIT).
+
+
+export LIBGFORTRAN_SRC_DIR="${LIBGFORTRAN_SRC_DIR:-${PWD%/}}"
+if [[ ! -d "${LIBGFORTRAN_SRC_DIR}/caf" ]]; then
+  echo "File not found: ${LIBGFORTRAN_SRC_DIR}/caf"
+  echo "Please run this script inside the libgfortran source directory or "
+  echo "set LIBGFORTRAN_SRC_DIR to the libgfortran source directory path."
+  exit 1
+fi
+export B3B_USE_CASE="${B3B_USE_CASE:-${LIBGFORTRAN_SRC_DIR}/../contrib/mk-libcaf-multi/utils}"
+if [[ ! -f "${B3B_USE_CASE:-}/bootstrap.sh" ]]; then
+  echo "Please set B3B_USE_CASE to the bash3boilerplate utils directory path."
+  exit 2
+else
+  source "${B3B_USE_CASE}/bootstrap.sh" "$@"
+fi
+
+# Set expected value of present flags that take no arguments
+export __flag_present=1
+
+if [[ "${arg_D}" == "${__flag_present}" || "${arg_L}" == "${__flag_present}" || "${arg_U}" == "${__flag_present}" || "${arg_V}" == "${__flag_present}" ]]; then
+   print_debug_only=7
+   if [ "$(( LOG_LEVEL < print_debug_only ))" -ne 0 ]; then
+ debug "Suppressing info and debug messages: -v present."
+ suppress_info_debug_messages
+#export LOG_LEVEL=5
+   fi
+fi
+
+# If one of the --print-* arguments is present (or its single-letter equivalent), we
+# print its value and exit normally.  Here we print with echo instead of a B3B function
+# because the output might be used in an assignment to a variable in another script, e.g.,
+# version=`mk-multi-image.sh -V`
+
+if [[ "${arg_L}" == "${__flag_present}" ]]; then
+  echo "mpich"
+  exit 0
+fi
+
+# Set the variable 'fetch' to invoke an available downloader utility.
+source "${B3B_USE_CASE}/set_or_print_downloader.sh"
+set_or_print_downloader
+if [[ "${arg_D}" == "${_