Hi Arsen,

On Fri, Sep 05, 2025 at 01:15:07PM +0200, Arsen Arsenović wrote:
> This script generates docs to match https://gcc.gnu.org/onlinedocs/ in a
> way that's easy to invoke by anyone (i.e. in a way that doesn't depend
> on the specifics of the server hosting gcc.gnu.org).  It is based on
> generate_libstdcxx_web_docs and update_web_docs_* and intends to replace
> them eventually, as a single source of truth for building all docs.  For
> now, it can be used for the snapshots server.
> 
> To use, run the script, passing it a source directory and a directory
> where to store the results.
> 
> maintainer-scripts/ChangeLog:
> 
>       * gen_gcc_docs.sh: New file.
> ---
> The intention is to initially use this script to generate snapshots, and
> later all documentations, for releases and for the nightly updates, as a
> "single source of truth" that doesn't require any special setup on the
> machine running it.
> 
> Does such a change to the release workflow sound OK?

I think it is a good thing to make it easier for users to regenerate
the documentation locally for offline usage. And it would be helpful
to check that documentation generation works by using it as a
snapshot builder and/or an Action that could be run on a merge
request in the forge.

We don't have to change the workflow to generate the online docs; it
could still be done through a cron job. But if we can use the same
script to also generate them locally, through a snapshot builder and
maybe a merge-request Action on the forge, that would be great. Then,
once that works, we can decide whether to change the actual
mechanism.

I used the script to create a gcc docs snapshot builder:
https://snapshots.sourceware.org/gcc/docs/
https://builder.sourceware.org/buildbot/#/builders/gcc-snapshots

I had to add the following packages to the fedora-latest container:

  mandoc docbook5-style-xsl doxygen graphviz dblatex libxml2 libxslt
  texlive-latex texlive-makeindex texinfo texinfo-tex python3-sphinx
  groff-base groff-perl texlive-hanging texlive-adjustbox
  texlive-stackengine texlive-tocloft texlive-newunicodechar

Might be good to document that somewhere. Also, not everything is
checked for, so when some packages are missing things might just
break half-way through.
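
For instance, a comment block near the top of the script could list
them (these are the Fedora package names I used; other distros will
differ):

  # Build prerequisites (Fedora package names, adjust for your distro):
  #   texinfo texinfo-tex texlive-latex texlive-makeindex
  #   texlive-hanging texlive-adjustbox texlive-stackengine
  #   texlive-tocloft texlive-newunicodechar
  #   python3-sphinx doxygen graphviz dblatex libxml2 libxslt
  #   mandoc groff-base groff-perl docbook5-style-xsl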

I am not sure what to do about the CSS. It would be way nicer if that
was also embedded in the source instead of relying on an external URL
or repository.
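
makeinfo can embed a stylesheet with --css-include instead of only
referencing one with --css-ref, so something like this could work
(just a sketch; gcc-manuals.css is a made-up name for a stylesheet
shipped next to the script):

  # Sketch: embed a stylesheet shipped next to the script unless CSS
  # was given explicitly.
  script_dir="$(cd "$(dirname "$0")" && pwd)"
  if [[ -z ${CSS} && -f ${script_dir}/gcc-manuals.css ]]; then
    css_args=( --css-include "${script_dir}/gcc-manuals.css" )
  else
    css_args=( --css-ref "${CSS:-/texinfo-manuals.css}" )
  fi
  # ... and pass "${css_args[@]}" to ${MAKEINFO} instead of --css-ref.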

Also it would be nice if there was a little top-level index.html.
Maybe a snippet like at the end of
https://gcc.gnu.org/onlinedocs/index.html (Current development)?
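
Even something minimal generated at the end of the script would
already help (a sketch; the list of directories to link is just a
guess):

  # Sketch: emit a minimal top-level index.html linking the generated
  # per-manual directories.
  {
    echo '<html><head><title>GCC documentation</title></head><body>'
    echo '<h1>GCC documentation</h1><ul>'
    for manual in "${MANUALS[@]}" jit libstdc++ libgdiagnostics gcobol; do
      [[ -d ${outdir}/${manual} ]] || continue
      echo "<li><a href=\"${manual}/\">${manual}</a></li>"
    done
    echo '</ul></body></html>'
  } > "${outdir}"/index.html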

Some comments on the actual script below.

>  maintainer-scripts/gen_gcc_docs.sh | 391 +++++++++++++++++++++++++++++
>  1 file changed, 391 insertions(+)
>  create mode 100755 maintainer-scripts/gen_gcc_docs.sh
> 
> diff --git a/maintainer-scripts/gen_gcc_docs.sh b/maintainer-scripts/gen_gcc_docs.sh
> new file mode 100755
> index 000000000000..c10733d21da2
> --- /dev/null
> +++ b/maintainer-scripts/gen_gcc_docs.sh
> @@ -0,0 +1,391 @@
> +#!/usr/bin/bash
> +#
> +# Copyright (C) 2025 Free Software Foundation, Inc.
> +#
> +# This script is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3, or (at your option)
> +# any later version.
> +
> +# Usage: gen_gcc_docs.sh [srcdir] [outdir]
> +#
> +# Generates and outputs GCC documentation to [outdir].
> +#
> +# Impacted by a few environment variables:
> +# - BUGURL :: The bug URL to insert into the manuals.
> +# - CSS :: URL to pass as the CSS reference in HTML manuals.
> +# - BRANCH :: Documentation branch to build.  Defaults to git default.
> +# - TEXI2DVI, TEXI2PDF, MAKEINFO, SPHINXBUILD :: Names of the respective tools.
> +
> +# Based on update_web_docs_git and generate_libstdcxx_web_docs.
> +
> +MANUALS=(
> +  cpp
> +  cppinternals
> +  fastjar

fastjar brings back memories, but I believe we haven't shipped it in
15 years.

> +  gcc
> +  gccgo
> +  gccint
> +  gcj

Likewise for gcj

> +  gdc
> +  gfortran
> +  gfc-internals
> +  gm2
> +  gnat_ugn
> +  gnat-style
> +  gnat_rm
> +  libgomp
> +  libitm
> +  libquadmath
> +  libiberty
> +  porting

Isn't porting part of libstdc++ now?

> +)

So jit, libstdc++ and gcobol are their own thing?

Why is libffi not included?
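
With the obsolete entries dropped the list would look something like
this (just a sketch; whether to add any missing libraries is a
separate question):

  MANUALS=(
    cpp
    cppinternals
    gcc
    gccgo
    gccint
    gdc
    gfortran
    gfc-internals
    gm2
    gnat_ugn
    gnat-style
    gnat_rm
    libgomp
    libitm
    libquadmath
    libiberty
  )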

> +die() {
> +  echo "fatal error ($?)${*+: }$*" >&2
> +  exit 1
> +}
> +
> +v() {
> +  echo "+ $*" >&2
> +  "$@"
> +}
> +export -f v die
> +
> +# Check arguments.
> +[[ $1 ]] \
> +  || die "Please specify the source directory as the first argument"
> +srcdir="$1"
> +if ! [[ $srcdir = /* ]]; then
> +  srcdir="$(pwd)/${srcdir}"
> +fi
> +
> +[[ $2 ]] \
> +  || die "Please specify the output directory as the directory argument"
> +outdir="$2"
> +if ! [[ $outdir = /* ]]; then
> +  outdir="$(pwd)/${outdir}"
> +fi

OK, makes them required and absolute paths.

> +## Find build tools.
> +# The gccadmin home directory contains a special build of Texinfo that has
> +# support for copyable anchors.  Find it.
> +makeinfo_git=/home/gccadmin/texinfo/install-git/bin/
> +if [ -x "${makeinfo_git}"/makeinfo ]; then
> +  : "${MAKEINFO:=${makeinfo_git}/makeinfo}"
> +  : "${TEXI2DVI:=${makeinfo_git}/texi2dvi}"
> +  : "${TEXI2PDF:=${makeinfo_git}/texi2pdf}"
> +else
> +  : "${MAKEINFO:=makeinfo}"
> +  : "${TEXI2DVI:=texi2dvi}"
> +  : "${TEXI2PDF:=texi2pdf}"
> +fi
> +
> +py_venv_bin=/home/gccadmin/venv/bin
> +# Similarly, it also has a virtualenv that contains a more up-to-date Sphinx.
> +if [ -x "${py_venv_bin}"/sphinx-build ]; then
> +  : "${SPHINXBUILD:=${py_venv_bin}/sphinx-build}"
> +else
> +  : "${SPHINXBUILD:=sphinx-build}"
> +fi
> +export MAKEINFO TEXI2DVI TEXI2PDF SPHINXBUILD

Do we really need that special case hardcoded /home/gccadmin/...?
Can't we just require that those bin dirs are prepended to PATH
before invoking the script, or that the special-case TOOL environment
variables are set?
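
I.e. something like this, without the gccadmin-specific paths (a
sketch; the caller would be responsible for PATH or the variables):

  # Sketch: callers prepend their texinfo install and sphinx virtualenv
  # bin dirs to PATH, or set the variables explicitly.
  : "${MAKEINFO:=makeinfo}"
  : "${TEXI2DVI:=texi2dvi}"
  : "${TEXI2PDF:=texi2pdf}"
  : "${SPHINXBUILD:=sphinx-build}"
  export MAKEINFO TEXI2DVI TEXI2PDF SPHINXBUILD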

> +# Check for the programs.
> +for i in \
> +  doxygen dot dblatex pdflatex makeindex "${MAKEINFO}" "${TEXI2DVI}" \
> +          "${TEXI2PDF}" "${SPHINXBUILD}"; do
> +  echo >&2 -n "Checking for ${i##*/}... "
> +  type >&2 -P "$i" && continue
> +  echo >&2 "not found"
> +  exit 1
> +done

Maybe at least add mandoc? xsltproc? groff? Check that groff can
generate PDF? That all required LaTeX packages are installed?
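
For example (a sketch, not tested; the kpsewhich loop is just a guess
at how to check for the texlive-* style files):

  # Sketch: also check the tools used later in the script.
  for i in \
    doxygen dot dblatex pdflatex makeindex mandoc groff xsltproc gzip \
    "${MAKEINFO}" "${TEXI2DVI}" "${TEXI2PDF}" "${SPHINXBUILD}"; do
    type -P "$i" >/dev/null || die "required tool $i not found"
  done
  # Check that groff can produce PDF output.
  echo | groff -T pdf >/dev/null 2>&1 || die "groff cannot generate PDF"
  # Check for LaTeX style files pulled in later.
  for sty in hanging.sty adjustbox.sty stackengine.sty tocloft.sty \
             newunicodechar.sty; do
    kpsewhich "$sty" >/dev/null || die "missing LaTeX package providing $sty"
  done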

> +# Set sane defaults.
> +: "${BUGURL:=https://gcc.gnu.org/bugs/}";
> +: "${CSS:=/texinfo-manuals.css}" # https://gcc.gnu.org/texinfo-manuals.css
> +export CSS BUGURL

Maybe include that css in the sources so it is standalone by default?

> +v mkdir -p "${outdir}" || die "Failed to create the output directory"
> +
> +workdir="$(mktemp -d)" \
> +  || die "Failed to get new work directory"
> +readonly workdir
> +trap 'cd /; rm -rf "$workdir"' EXIT
> +cd "$workdir" || die "Failed to enter $workdir"
> +
> +if [[ -z ${BRANCH} ]]; then
> +  git clone -q "$srcdir" gccsrc
> +else
> +  git clone -b "${BRANCH}" -q "$srcdir" gccsrc
> +fi || die "Clone failed"

Not a fan of the cd /; rm -rf ... but let's pretend that works out ok.

So the current script depends on srcdir being a full gcc git repo
from which it can check out a BRANCH and then build the docs for that
branch. I think it might make sense to have the script on each
branch, for that branch, so you would just build the docs for the
source/branch you have, since different branches might have different
sets of manuals.

> +######## BUILD libstdc++ DOCS
> +# Before we wipe out everything but JIT and Texinfo documentation, we need to
> +# generate the libstdc++ manual.
> +mkdir gccbld \
> +  || die "Couldn't make build directory"
> +(
> +  set -e
> +  cd gccbld
> +
> +  disabled_libs=()
> +  for dir in ../gccsrc/lib*; do
> +    dir="${dir##*/}"
> +    [[ -d $dir ]] || continue
> +    [[ $dir == libstdc++-v3 ]] && continue
> +    disabled_libs+=( --disable-"${dir}" )
> +  done
> +
> +  v ../gccsrc/configure \
> +    --enable-languages=c,c++ \
> +    --disable-gcc \
> +    --disable-multilib \
> +    "${disabled_libs[@]}" \
> +    --docdir=/docs \
> +    || die "Failed to configure GCC for libstdc++"
> +  v make configure-target-libstdc++-v3 || die "Failed to configure libstdc++"
> +
> +  # Pick out the target directory.
> +  target=  # Suppress warnings from shellcheck.
> +  eval "$(grep '^target=' config.log)"
> +  v make -C "${target}"/libstdc++-v3 \
> +    doc-install-{html,xml,pdf} \
> +    DESTDIR="$(pwd)"/_dest \
> +    || die "Failed to compile libstdc++ docs"
> +  set +x

Doesn't that make things very verbose?

> +  cd _dest/docs
> +  v mkdir libstdc++
> +  for which in api manual; do
> +    echo "Prepping libstdc++-${which}..."
> +    if [[ -f libstdc++-"${which}"-single.xml ]]; then
> +      # Only needed for GCC 4.7.x
> +      v mv libstdc++-"${which}"{-single.xml,} || die
> +    fi

Do we really want to support 4.7.x in this (modern) script?
See also the BRANCH comment above.

> +    v gzip --best libstdc++-"${which}".xml || die
> +    v gzip --best libstdc++-"${which}".pdf || die
> +
> +    v mv libstdc++-"${which}"{.html,-html} || die
> +    v tar czf libstdc++-"${which}"-html.tar.gz libstdc++-"${which}"-html \
> +      || die
> +    mv libstdc++-"${which}"-html libstdc++/"${which}"
> +
> +    # Install the results.
> +    v cp libstdc++-"${which}".xml.gz "${outdir}" || die
> +    v cp libstdc++-"${which}".pdf.gz "${outdir}" || die
> +    v cp libstdc++-"${which}"-html.tar.gz "${outdir}"
> +  done
> +
> +  v cp -Ta libstdc++ "${outdir}"/libstdc++ || die
> +) || die "Failed to generate libstdc++ docs"
> +
> +v rm -rf gccbld || die
> +
> +######## PREPARE SOURCES
> +
> +# Remove all unwanted files.  This is needed to avoid packaging all the
> +# sources instead of only documentation sources.
> +# Note that we have to preserve gcc/jit/docs since the jit docs are
> +# not .texi files (Makefile, .rst and .png), and the jit docs use
> +# include directives to pull in content from jit/jit-common.h and
> +# jit/notes.txt, and parts of the jit.db testsuite, so we have to preserve
> +# those also.
> +find gccsrc -type f \( -name '*.texi' \
> +     -o -path gccsrc/gcc/doc/install.texi2html \
> +     -o -path gccsrc/gcc/doc/include/texinfo.tex \
> +     -o -path gccsrc/gcc/BASE-VER \
> +     -o -path gccsrc/gcc/DEV-PHASE \
> +     -o -path "gccsrc/gcc/cobol/gcobol.[13]" \
> +     -o -path "gccsrc/gcc/ada/doc/gnat_ugn/*.png" \
> +     -o -path "gccsrc/gcc/jit/docs/*" \
> +     -o -path "gccsrc/gcc/jit/jit-common.h" \
> +     -o -path "gccsrc/gcc/jit/notes.txt" \
> +     -o -path "gccsrc/gcc/doc/libgdiagnostics/*" \
> +     -o -path "gccsrc/gcc/testsuite/jit.dg/*" \
> +     -o -print0 \) | xargs -0 rm -f \
> +  || die "Failed to clean up source tree"
> +
> +# The directory to pass to -I; this is the one with texinfo.tex
> +# and fdl.texi.
> +export includedir=gccsrc/gcc/doc/include

Does this need to be an exported variable?

> +# Generate gcc-vers.texi.
> +(
> +  set -e
> +  echo "@set version-GCC $(cat gccsrc/gcc/BASE-VER)"
> +  if [ "$(cat gccsrc/gcc/DEV-PHASE)" = "experimental" ]; then
> +    echo "@set DEVELOPMENT"
> +  else
> +    echo "@clear DEVELOPMENT"
> +  fi
> +  echo "@set srcdir $workdir/gccsrc/gcc"
> +  echo "@set VERSION_PACKAGE (GCC)"
> +  echo "@set BUGURL @uref{$BUGURL}"
> +) > "$includedir"/gcc-vers.texi \
> +  || die "Failed to generate gcc-vers.texi"
> +
> +# Generate libquadmath-vers.texi.
> +echo "@set BUGURL @uref{$BUGURL}" \
> +     > "$includedir"/libquadmath-vers.texi \
> +  || die "Failed to generate libquadmath-vers.texi"
> +
> +# Build a tarball of the sources.
> +tar cf docs-sources.tar --xform 's/^gccsrc/gcc/' gccsrc \
> +  || die "Failed to build sources"

Why not create a tar.gz? See also below.
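
I.e. just (a sketch):

  # Sketch: create the compressed sources tarball directly.
  tar czf docs-sources.tar.gz --xform 's/^gccsrc/gcc/' gccsrc \
    || die "Failed to build sources"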

> +######## BUILD DOCS
> +docs_build_single() {
> +  [[ $1 ]] || die "bad docs_build_single invoc"
> +  local manual="$1" filename miargs
> +  filename="$(find . -name "${manual}.texi")" \
> +    || die "Failed to find ${manual}.texi"
> +
> +  # Silently ignore if no such manual exists is missing.
> +  [[ $filename ]] || return 0

Maybe don't be silent about it?
If a manual suddenly disappears, shouldn't this script just be adapted?
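
E.g. (a sketch):

  # Sketch: at least warn when a listed manual has no .texi source.
  if [[ -z $filename ]]; then
    echo "warning: no ${manual}.texi found, skipping" >&2
    return 0
  fi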

> +  miargs=(
> +    -I "${includedir}"
> +    -I "$(dirname "${filename}")"
> +  )
> +
> +  # Manual specific arguments.
> +  case "$manual" in
> +    gm2)
> +      miargs+=(
> +        -I gccsrc/gcc/m2/target-independent
> +        -I gccsrc/gcc/m2/target-independent/m2
> +      )
> +      ;;
> +    gnat_ugn)
> +      miargs+=(
> +        -I gccsrc/gcc/ada
> +        -I gccsrc/gcc/ada/doc/gnat_ugn
> +      )
> +      ;;
> +    *) ;;
> +  esac
> +
> +  v "${MAKEINFO}" --html \
> +    "${miargs[@]}" \
> +    -c CONTENTS_OUTPUT_LOCATION=inline \
> +    --css-ref "${CSS}" \
> +    -o "${manual}" \
> +    "${filename}" \
> +    || die "Failed to generate HTML for ${manual}"
> +  tar cf "${manual}-html.tar" "${manual}"/*.html \
> +    || die "Failed to pack up ${manual}-html.tar"

Maybe generate a tar.gz directly?

> +  v "${TEXI2DVI}" "${miargs[@]}" \
> +    -o "${manual}.dvi" \
> +    "${filename}" \
> +    </dev/null >/dev/null \
> +    || die "Failed to generate ${manual}.dvi"
> +  v dvips -q -o "${manual}".{ps,dvi} \
> +    </dev/null >/dev/null \
> +    || die "Failed to generate ${manual}.ps"

Do we really still want to produce a dvi and ps file if we already
produce a pdf below?

> +  v "${TEXI2PDF}" "${miargs[@]}" \
> +    -o "${manual}.pdf" \
> +    "${filename}" \
> +    </dev/null >/dev/null \
> +    || die "Failed to generate ${manual}.pdf"
> +
> +  while read -d $'\0' -r f; do
> +    # Do this for the contents of each file.
> +    sed -i -e 's/_002d/-/g' "$f" \
> +      || die "Failed to hack $f"
> +    # And rename files if necessary.
> +    ff="${f//_002d/-}"
> +    if [ "$f" != "$ff" ]; then
> +      printf "Renaming %s to %s\n" "$f" "$ff"

Maybe make this silent, the log already is fairly big?

> +      mv "$f" "$ff" || die "Failed to rename $f"
> +    fi
> +  done < <(find "${manual}" -name '*.html' -print0)
> +}
> +export -f docs_build_single
> +
> +# Now convert the relevant files from texi to HTML, PDF and PostScript.
> +if type -P parallel >&/dev/null; then
> +  parallel docs_build_single '{}' ::: "${MANUALS[@]}"
> +else
> +  for man in "${MANUALS[@]}"; do
> +    docs_build_single "${man}"
> +  done
> +fi

Interesting use of parallel (note, it is not currently installed on
the server or in the container). Does it work with the nagware thing?
Otherwise it might be useful to explicitly do
  mkdir -p ~/.parallel; touch ~/.parallel/will-cite
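
Or, assuming the installed parallel is new enough to understand
--will-cite, pass that explicitly (a sketch):

  if type -P parallel >&/dev/null; then
    # --will-cite suppresses the citation notice non-interactively.
    parallel --will-cite docs_build_single '{}' ::: "${MANUALS[@]}"
  else
    for man in "${MANUALS[@]}"; do
      docs_build_single "${man}"
    done
  fi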

> +v make -C gccsrc/gcc/jit/docs html SPHINXBUILD="${SPHINXBUILD}" \
> +  || die "Failed to generate libgccjit docs"
> +
> +v cp -a gccsrc/gcc/jit/docs/_build/html jit || die "failed to cp jit"
> +
> +
> +if [[ -d gccsrc/gcc/doc/libgdiagnostics/ ]]; then
> +  v make -C gccsrc/gcc/doc/libgdiagnostics/ html SPHINXBUILD="${SPHINXBUILD}" \
> +    || die "Failed to generate libgdiagnostics docs"
> +
> +  v cp -a gccsrc/gcc/doc/libgdiagnostics/_build/html libgdiagnostics \
> +    || die "failed to cp libgdiagnostics"
> +fi

This is why I think it might make sense to have this script be
specific to each branch.

> +######## BUILD gcobol DOCS
> +# The COBOL FE maintains man pages.  Convert them to HTML and PDF.
> +cobol_mdoc2pdf_html() {
> +  mkdir -p gcobol
> +  input="$1"
> +  d="${input%/*}"
> +  pdf="$2"
> +  html="gcobol/$3"
> +  groff -mdoc -T pdf "$input" > "${pdf}" || die
> +  mandoc -T html "$filename" > "${html}" || die
> +}
> +find . -name gcobol.[13] |
> +  while read filename
> +  do
> +    case ${filename##*.} in
> +      1)
> +        cobol_mdoc2pdf_html "$filename" gcobol.pdf gcobol.html
> +        ;;
> +      3)
> +        cobol_mdoc2pdf_html "$filename" gcobol_io.pdf gcobol_io.html
> +        ;;
> +    esac
> +  done
> +
> +# Then build a gzipped copy of each of the resulting .html, .ps and .tar files
> +(
> +  shopt -s nullglob
> +  for file in */*.html *.ps *.pdf *.tar; do
> +    # Tell gzip to produce reproducible zips.
> +    SOURCE_DATE_EPOCH=1 gzip --best > "$file".gz <"$file"
> +  done
> +)

Here you also create .gz files but leave the .tar archives as-is.
Since the .tar archives are already really big I would remove them
here, or simply create tar.gz files directly above.
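
E.g. (a sketch, also assuming the .ps files go away as suggested
above):

  (
    shopt -s nullglob
    for file in */*.html *.pdf *.tar; do
      # Tell gzip to produce reproducible output.
      SOURCE_DATE_EPOCH=1 gzip --best > "$file".gz < "$file"
    done
    # The uncompressed tarballs are big; keep only the .tar.gz copies.
    rm -f ./*.tar
  )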

> +# And copy the resulting files to the web server.
> +while read -d $'\0' -r file; do
> +  outfile="${outdir}/${file}"
> +  mkdir -p "$(dirname "${outfile}")" \
> +    || die "Failed to generate output directory"
> +  cp "${file}" "${outfile}" \
> +    || die "Failed to copy ${file}"
> +done < <(find . \
> +              -not -path "./gccsrc/*" \
> +              \( -name "*.html" \
> +              -o -name "*.png" \
> +              -o -name "*.css" \
> +              -o -name "*.js" \
> +              -o -name "*.txt" \
> +              -o -name '*.html.gz' \
> +              -o -name '*.ps' \
> +              -o -name '*.ps.gz' \
> +              -o -name '*.pdf' \
> +              -o -name '*.pdf.gz' \
> +              -o -name '*.tar' \
> +              -o -name '*.tar.gz' \
> +              \) -print0)

So I might suggest skipping *.ps, *.ps.gz and *.tar here.
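
I.e. something like (a sketch):

  done < <(find . \
                -not -path "./gccsrc/*" \
                \( -name "*.html" \
                -o -name "*.png" \
                -o -name "*.css" \
                -o -name "*.js" \
                -o -name "*.txt" \
                -o -name '*.html.gz' \
                -o -name '*.pdf' \
                -o -name '*.pdf.gz' \
                -o -name '*.tar.gz' \
                \) -print0)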

> +echo "Done \o/  Enjoy reading the docs."

Cheers,

Mark
