Re: Parallelization of shell scripts for 'configure' etc.

2022-06-13 Thread Alex Ameen
Yeah honestly splitting most of the `configure` checks into multiple
threads is definitely possible.

Caching between projects is even a straightforward extension with systems
like `Nix`.

The "gotcha" in both cases is that the existing scripts living in
source tarballs are not feasible to "regenerate" in the general case. This
could ship with future projects, though, if project authors updated to new
versions of Autoconf.


If you have a particularly slow package, you can optimize it in a few
hours. Largely this means identifying which tests 100% match the standard
implementation of a check, in which case you can fill in a cached value.
But what I think y'all are asking about is "can I safely use a cache from
one project in another project?" and the answer there is "no not really -
and please don't because it will be a nightmare to debug".

The nasty part about trying to naively share caches is that it will
probably work fine ~90% of the time. The problem is that the 10% that
misbehave are high risk for undefined behavior. My concern is the 0.5% that
appear to work fine, but "whoops, we didn't know project X extended a macro
without changing the name, and now an ABI conflict in `gpg` appears on
the third Sunday of every October, causing it to skip encryption silently" or
some absurd edge case.
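For anyone unfamiliar with the mechanics behind this: an Autoconf cache file is just a list of shell assignments, which is exactly why a foreign cache silently skips the real test instead of failing loudly. A minimal sketch of that mechanism (the `ac_cv_*` naming follows Autoconf convention, but the "check" below is a stand-in, not real Autoconf output):

```shell
#!/bin/sh
# Sketch of the Autoconf cache mechanism; the check is a stand-in.
cache_file=./demo.cache

# Load any previously recorded answers.
test -f "$cache_file" && . "$cache_file"

# Run the (expensive) check only when no cached answer exists.
if test -z "${ac_cv_header_stdlib_h+set}"; then
  ac_cv_header_stdlib_h=yes   # stand-in for a real compile test
fi
echo "stdlib.h present: $ac_cv_header_stdlib_h"

# Record the answer; this is what --cache-file reuses on the next run,
# and what makes sharing caches across projects dangerous.
echo "ac_cv_header_stdlib_h=$ac_cv_header_stdlib_h" > "$cache_file"
```

Because the cached assignment answers whatever question happened to carry that variable name, a project that redefined the underlying macro gets the wrong answer with no diagnostic at all.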


I think optimizing "freshly generated" scripts is totally doable, though.

On Mon, Jun 13, 2022, 5:40 PM Paul Eggert  wrote:

> In many Gnu projects, the 'configure' script is the biggest barrier to
> building because it takes so long to run. Is there some way that we
> could improve its performance without completely reengineering it, by
> improving Bash so that it can parallelize 'configure' scripts?
>
> For ideas about this, please see PaSh-JIT:
>
> Kallas K, Mustafa T, Bielak J, Karnikis D, Dang THY, Greenberg M,
> Vasilakis N. Practically correct, just-in-time shell script
> parallelization. Proc OSDI 22. July 2022.
> https://nikos.vasilak.is/p/pash:osdi:2022.pdf
>
> I've wanted something like this for *years* (I assigned a simpler
> version to my undergraduates but of course it was too much to expect
> them to implement it) and I hope some sort of parallelization like this
> can get into production with Bash at some point (or some other shell if
> Bash can't use this idea).
>
>


Re: Parallelization of shell scripts for 'configure' etc.

2022-06-13 Thread Alex Ameen
You can try to use the `requires` toposort routine to identify "Strongly
Connected Sub-Components", which is where I imagine you'll get the
best results. What you'll need to watch out for is undeclared ordering
requirements that parallelism would break.
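To make the toposort idea concrete, here is a sketch using the standard tsort(1) utility. The macro names are real Autoconf macros, but the dependency edges are invented for illustration:

```shell
# Each line declares "left must run before right". Checks with no
# ordering constraint between them could, in principle, run in
# parallel; an undeclared edge is exactly the hazard mentioned above.
cat <<'EOF' | tsort
AC_PROG_CC AC_CHECK_HEADERS
AC_PROG_CC AC_CHECK_FUNCS
AC_CHECK_HEADERS AC_CHECK_FUNCS
EOF
```

tsort emits a valid sequential order; a parallel scheduler would instead group nodes whose prerequisites are all satisfied and run each group concurrently.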

The `m4sh` and `m4sugar` source code is documented in a lot of detail. The
manuals exclude that type of documentation because it's internal; but you
could keep yourself occupied for at least a month or two before you ran out
of topics to explore.

On Mon, Jun 13, 2022, 8:45 PM Dale R. Worley  wrote:

> Paul Eggert  writes:
> > In many Gnu projects, the 'configure' script is the biggest barrier to
> > building because it takes so long to run. Is there some way that we
> > could improve its performance without completely reengineering it, by
> > improving Bash so that it can parallelize 'configure' scripts?
>
> It seems to me that bash provides the needed tools -- "( ... ) &" lets
> you run things in parallel.  Similarly, if you've got a lot of small
> tasks with a complex web of dependencies, you can encode that in a
> "makefile".
>
> It seems to me that the heavy work is rebuilding how "configure" scripts
> are constructed based on which items can be run in parallel.  I've never
> seen any "metadocumentation" that laid out how all that worked.
>
> Dale
>
>
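Dale's make-based suggestion could look something like this sketch; the check names and commands are hypothetical stand-ins, not real configure tests:

```shell
# Encode check dependencies in a makefile and let make -j schedule
# independent checks in parallel (headers and funcs both need cc
# first, but can then run concurrently).
cat > checks.mk <<'EOF'
all: headers.ok funcs.ok
cc.ok: ; @echo yes > $@
headers.ok: cc.ok ; @echo yes > $@
funcs.ok: cc.ok ; @echo yes > $@
EOF
make -s -j2 -f checks.mk
```

The hard part, as Dale says, is not the mechanism but deriving a correct dependency graph from how the configure script was constructed.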


Re: Parallelization of shell scripts for 'configure' etc.

2022-07-08 Thread Alex Ameen
I've been telling folks about the config site file every time this thread
comes up. Good on you for actually trying it haha.

It can make a huge difference. You can short circuit a lot of checks this
way.

Now, the disclaimer: you still shouldn't share a cache file between
projects, and if you use a `config.site` don't stash internal values. Be
sure you keep an eye on your `config.site` values as your system is updated
over time, and if you use containers or build in chroots keep in mind how
that can affect the validity of your cache and `config.site` settings.
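As a sketch of the safe side of that advice, a `config.site` might preset only user-facing defaults like these (the values are examples, not recommendations):

```shell
# Example config.site fragment: preset user-facing defaults that
# short-circuit the corresponding checks, but only when the user
# hasn't already chosen a value.
test -z "$CC" && CC=gcc
test -z "$CFLAGS" && CFLAGS="-O2 -g"
# Avoid presetting internal ac_cv_* values here, per the caveats above.
```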



On Fri, Jul 8, 2022, 3:05 PM Simon Josefsson via Discussion list for the
autoconf build system  wrote:

> Tim Rühsen  writes:
>
> > a) The maintainer/contributor/hacker setup
> > This is when you re-run configure relatively often for the same
> project(s).
> > I do this normally and came up with
> >
> https://gitlab.com/gnuwget/wget2/-/wikis/Developer-hints:-Increasing-speed-of-GNU-toolchain.
>
> > It may be a bit outdated, but may help one or the other here.
> > Btw, I am down to 2.5s for a ./configure run from 25s originally.
>
> Wow, I think more developers should know about your final suggestion:
>
>
> https://gitlab.com/gnuwget/wget2/-/wikis/Developer-hints:-Increasing-speed-of-GNU-toolchain#cccflags-dependent-usage-of-configure-caching
>
> That is, put this in ~/.bash_aliases:
>
> export CONFIG_SITE=~/src/config.site
>
> and this in ~/src/config.site:
>
> if test "$cache_file" = /dev/null; then
>   hash=`echo $CFLAGS $LDFLAGS $host_alias $build_alias|md5sum|cut -d' ' -f1`
>   cache_file=~/src/config.cache.$CC.$hash
> fi
>
> The top of config.log says which cache file was used, so you can remove
> it when you hack on autoconf/M4 macros.
>
> This appears to save me tons of build time, and I'll run with this now
> since it is non-obtrusive and doesn't require changes in each project...
> maybe the CWD should be put into the cache_file string to avoid cache
> poisoning between projects, but that is minor.
>
> /Simon
>
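Simon's closing idea, folding the working directory into the hash so two projects can never collide on a cache file, could be sketched as a one-line change to his snippet (hypothetical, not tested against real projects):

```shell
# Same hash as Simon's config.site, with $PWD mixed in so the cache
# file is per-directory and cross-project reuse cannot happen.
hash=`echo $PWD $CFLAGS $LDFLAGS $host_alias $build_alias | md5sum | cut -d' ' -f1`
cache_file=~/src/config.cache.$CC.$hash
echo "$cache_file"
```

The trade-off is more cache files on disk and no sharing between checkouts of the same project, which for most developers is exactly the behavior they want.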