Re: Parallelization of shell scripts for 'configure' etc.
Yeah honestly splitting most of the `configure` checks into multiple threads is definitely possible. Caching between projects is even a straightforward extension with systems like `Nix`. The "gotcha" here in both cases is that existing scripts living in source tarballs are not feasible to "regenerate" in the general case. You could have this ship with future projects, though, if project authors updated to new versions of Autoconf.

If you have a particularly slow package, you can optimize it in a few hours. Largely this means identifying which tests 100% match the standard implementation of a check, in which case you can fill in a cached value. But what I think y'all are asking about is "can I safely use a cache from one project in another project?" and the answer there is "no, not really - and please don't, because it will be a nightmare to debug".

The nasty part about trying to naively share caches is that it will probably work fine ~90% of the time. The problem is that the 10% that misbehave are high risk for undefined behavior. My concern is the 0.5% that appear to work fine, but "whoops, we didn't know project X extended a macro without changing the name - and now an ABI conflict in `gpg` appears on the third Sunday of every October, causing it to skip encryption silently" or some absurd edge case.

I think optimizing "freshly generated" scripts is totally doable though.

On Mon, Jun 13, 2022, 5:40 PM Paul Eggert wrote:

> In many GNU projects, the 'configure' script is the biggest barrier to
> building because it takes so long to run. Is there some way that we
> could improve its performance without completely reengineering it, by
> improving Bash so that it can parallelize 'configure' scripts?
>
> For ideas about this, please see PaSh-JIT:
>
> Kallas K, Mustafa T, Bielak J, Karnikis D, Dang THY, Greenberg M,
> Vasilakis N. Practically correct, just-in-time shell script
> parallelization. Proc OSDI 22. July 2022.
> https://nikos.vasilak.is/p/pash:osdi:2022.pdf
>
> I've wanted something like this for *years* (I assigned a simpler
> version to my undergraduates but of course it was too much to expect
> them to implement it) and I hope some sort of parallelization like this
> can get into production with Bash at some point (or some other shell if
> Bash can't use this idea).
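[Editor's note: the "fill in a cached value" approach from the first reply can be sketched concretely. Autoconf reads pre-seeded `ac_cv_*` variables from a cache file and skips the corresponding checks. The two variable names below are standard Autoconf cache variables; the `yes` values are assumptions for a typical glibc/Linux system, so verify them once locally before trusting them.]

```shell
# Seed a per-project cache with results known to match the standard
# checks on this machine, then point configure at it.
# The ${var=value} form is the format configure itself writes:
# it assigns only if the variable is not already set.
cat > config.cache <<'EOF'
ac_cv_header_stdlib_h=${ac_cv_header_stdlib_h=yes}
ac_cv_func_malloc_0_nonnull=${ac_cv_func_malloc_0_nonnull=yes}
EOF

# ./configure --cache-file=config.cache   # the seeded checks are now skipped
```

This is exactly why sharing such a file across projects is dangerous: the values are only valid for the compiler, flags, and macro definitions they were recorded under.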
Re: Parallelization of shell scripts for 'configure' etc.
You can try to use the `requires` toposort routine to identify "Strongly Connected Sub-Components", which is where I imagine you'll get the best results. What you'll need to watch out for is undeclared ordering requirements that parallelism would break.

The `m4sh` and `m4sugar` source code is documented in a lot of detail. The manuals exclude that type of documentation because it's internal, but you could keep yourself occupied for at least a month or two before you ran out of topics to explore.

On Mon, Jun 13, 2022, 8:45 PM Dale R. Worley wrote:

> Paul Eggert writes:
> > In many GNU projects, the 'configure' script is the biggest barrier to
> > building because it takes so long to run. Is there some way that we
> > could improve its performance without completely reengineering it, by
> > improving Bash so that it can parallelize 'configure' scripts?
>
> It seems to me that bash provides the needed tools -- "( ... ) &" lets
> you run things in parallel. Similarly, if you've got a lot of small
> tasks with a complex web of dependencies, you can encode that in a
> "makefile".
>
> It seems to me that the heavy work is rebuilding how "configure" scripts
> are constructed based on which items can be run in parallel. I've never
> seen any "metadocumentation" that laid out how all that worked.
>
> Dale
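[Editor's note: Dale's "( ... ) &" point can be sketched in miniature. Here two independent program-availability checks (the kind `AC_CHECK_PROG` performs) run as background subshells and are joined with `wait`. Results go to files because a variable set in a subshell never propagates back to the parent - one of the coordination problems a parallelized configure would have to solve. The `check_cmd` helper and the result-file naming are hypothetical, purely for illustration.]

```shell
# Run independent checks in parallel subshells, then join with `wait`.
# Each check writes its result to a file, since a subshell cannot
# set variables in the parent shell.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo yes
  else
    echo no
  fi > "result_$1"
}

( check_cmd sh ) &   # these two checks have no ordering dependency...
( check_cmd ls ) &   # ...so they can safely overlap
wait                 # join all background checks before reading results

echo "sh: $(cat result_sh)"
echo "ls: $(cat result_ls)"
```

The hard part, as Dale says, is not the mechanism but knowing which checks are truly independent - an undeclared ordering requirement between two checks silently breaks under this scheme.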
Re: Parallelization of shell scripts for 'configure' etc.
I've been telling folks about the config site file every time this thread comes up. Good on you for actually trying it haha. It can make a huge difference. You can short-circuit a lot of checks this way.

Now, the disclaimer: you still shouldn't share a cache file between projects, and if you use a `config.site`, don't stash internal values. Keep an eye on your `config.site` values as your system is updated over time, and if you use containers or build in chroots, keep in mind how that can affect the validity of your cache and `config.site` settings.

On Fri, Jul 8, 2022, 3:05 PM Simon Josefsson via Discussion list for the autoconf build system wrote:

> Tim Rühsen writes:
>
> > a) The maintainer/contributor/hacker setup
> > This is when you re-run configure relatively often for the same
> > project(s).
> > I do this normally and came up with
> > https://gitlab.com/gnuwget/wget2/-/wikis/Developer-hints:-Increasing-speed-of-GNU-toolchain.
> > It may be a bit outdated, but may help one or the other here.
> > Btw, I am down to 2.5s for a ./configure run from 25s originally.
>
> Wow, I think more developers should know about your final suggestion:
>
> https://gitlab.com/gnuwget/wget2/-/wikis/Developer-hints:-Increasing-speed-of-GNU-toolchain#cccflags-dependent-usage-of-configure-caching
>
> That is, put this in ~/.bash_aliases:
>
> export CONFIG_SITE=~/src/config.site
>
> and this in ~/src/config.site:
>
> if test "$cache_file" = /dev/null; then
>   hash=`echo $CFLAGS $LDFLAGS $host_alias $build_alias|md5sum|cut -d' ' -f1`
>   cache_file=~/src/config.cache.$CC.$hash
> fi
>
> The top of config.log says which cache file was used, so you can remove
> it when you hack on autoconf/M4 macros.
>
> This appears to save me tons of build time, and I'll run with this now
> since it is non-obtrusive and doesn't require changes in each project...
> maybe the CWD should be put into the cache_file string to avoid cache
> poisoning between projects, but that is minor.
>
> /Simon
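[Editor's note: Simon's quoted snippet is worth keeping in a tidied form. This is a config fragment you would save as `~/src/config.site` (his paths; adjust to taste), not a script to run directly; the only change is quoting the hash input, which guards against word-splitting when flags contain spaces.]

```shell
# ~/src/config.site -- adapted from Simon's message above.
# configure sources this file automatically when CONFIG_SITE points at it.
# Keying the cache file on $CC plus a hash of the flags means a compiler
# or flag change gets a fresh cache instead of poisoning the old one --
# per-machine and per-toolchain, but still never shared across machines.
if test "$cache_file" = /dev/null; then
  hash=`echo "$CFLAGS $LDFLAGS $host_alias $build_alias" | md5sum | cut -d' ' -f1`
  cache_file=~/src/config.cache.$CC.$hash
fi
```

As Simon notes, adding the working directory to the `cache_file` name would make the cache per-project as well, addressing the cross-project concerns raised earlier in the thread.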