[RT] Super Simple Site Generation Tool

Leo Simons Tue, 27 Dec 2005 06:03:01 -0800

Hi gang,

It pains me to say this (Forrest is a cool project and I
consider at least some of its active developers and community
members my friends) but we've muddled around long enough.


I think that, for the incubator website, Apache Forrest

  * is too unstable as a codebase
  * is way too complex
  * has too many features we don't need and solves
    too many problems we don't have
  * has a learning curve that is too big
  * does not work well with the SVN-based publishing
    process we want to use
  * is not well-understood by enough of the current
    Incubator volunteers
  * has caused frustration for too many of the current
    Incubator volunteers

I think this needs solving. I think this needs solving in
general, not just for the Incubator (I have the exact same
problem over at Gump, which also has a horrendously outdated
website which also uses Apache Forrest).

== Use case ==

In particular, the use case we are looking at can be codified
in the following workflow:

  # 1) figure out how to edit docs...
  curl http://svn.../site/README | less

  # 2) get stuff from SVN
  svn co http://.../site
  cd site

  # 3) edit docs
  $EDITOR x/y/z/foo.xml

  # 4) validate edit, generate results
  ./build # must work on windows, linux, os x, solaris, ...

  # 5) check results locally
  svn diff
  mozilla-firefox build/x/y/z/foo.html

  # 6) commit
  svn commit -m "foo.xml: fix typo. It's 'spelling', not 'speling'."

  # 7) validate commit
  mozilla-firefox http://staging.incubator.apache.org/x/y/z/foo.html

  # 8) x hours later...
  mozilla-firefox http://incubator.apache.org/x/y/z/foo.html

== Requirements ==

Now, steps 1-7 should be possible in about 5 minutes, including reading the
documentation and the like (provided you're on a broadband network and a
600 MHz/128 MB/5400 rpm machine), and in about 10 seconds for someone who
already knows how the tool works and has the checkout on his machine.

Step 4 should be so quick, natural and trustworthy that steps 5, 7 and 8
are not really necessary once you've used the tool a few times.

The produced HTML should be simple, clean, valid XHTML 1.1 strict, with
navigational stuff similar to that available on other parts of
www.apache.org. This output should be customizable in a manner that's
about as simple as the above use case (e.g. adding banner logos or
updating a project logo), though it's allowed to be a little harder (e.g.
having to learn some template language is just fine).

The source format should be simple, clean, and easy to grok (within the
basic 5 minute period mentioned above) for anyone who knows how to write
basic HTML (eg <b>, <em>, <h1>). The translation from the source to the
produced HTML should be obvious and without surprises.

There should be a simple file (probably XML) for specifying navigational
elements, where again the transformation from source to HTML is so obvious
that anyone who has ever edited XML doesn't need to RTFM.
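
To make "obvious transformation" a bit more concrete, here is a minimal
sketch of what such a navigation file and its rendering could look like.
The <nav>/<item> format, the "nav" class name and the render_nav function
are all made up for illustration; nothing here is an agreed format.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr

# Hypothetical navigation source; the element and attribute names are
# invented for this sketch and are not a format from this proposal.
NAV_SOURCE = """\
<nav>
  <item href="index.html">Home</item>
  <item href="projects/index.html">Projects</item>
</nav>
"""


def render_nav(nav_xml):
    """Turn each <item href="..."> into one <li><a> line, in source order."""
    doc = minidom.parseString(nav_xml)
    lines = ['<ul class="nav">']
    for item in doc.getElementsByTagName("item"):
        href = item.getAttribute("href")
        # Collect the plain text directly inside the <item> element.
        label = "".join(
            node.data for node in item.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        lines.append(
            "  <li><a href=%s>%s</a></li>" % (quoteattr(href), escape(label.strip()))
        )
    lines.append("</ul>")
    return "\n".join(lines)


print(render_nav(NAV_SOURCE))
```

Each line of input becomes exactly one line of output, in the same order,
which is roughly the level of predictability being asked for.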

== Available tools ==

None of the other generally available tools satisfy all the above
requirements at the moment.

 * forrest
   * requires installing forrest, which takes way more than 5 mins in and
     of itself
   * basically it fails many of the use case basics (too heavyweight,
     takes more than 3 minutes to learn, input->output transforms too
     complex), which is probably because I wrote the use case to contrast
     with the current forrest process
   * the transformation is not predictable enough

 * maven
   * requires installing maven, which takes 5 mins in and of itself
     * maven 1 not easy enough to install into svn self-contained
       * maven 1? will essentially be "dead" as everyone migrates to
         maven 2
     * don't know about maven 2 yet, but it doesn't have a very widely
       installed base yet, since it was just released
   * the source xdoc format is not simple or clean enough
   * the validation step is not complete enough
   * the transformation is not predictable enough
   * changing the stylesheet is not simple enough
     * if maven is upgraded your entire look and feel may change as
       well as other things (eg the xdoc plugin is not stable enough)

 * anakia
   * the source xdoc format is not simple or clean enough
   * the validation step is not complete enough
   * the transformation is not predictable enough
   * managing navigational elements is not simple or clean enough

 * others
   * no doubt there are many that could be used

== Possible steps forward? ==

Given the above, fixing forrest seems like a lot of work. I think it's a
fundamentally bad fit: being built on top of many, many layers of java code
and several frameworks makes it too heavy by definition. However, since
it is very easy to customize forrest to output the source format for
another tool, and we have ready access to forrest experts, migrating away
from forrest is probably not very painful (I think it involves writing some
XSLT). As long as the source format stays XML, moving back to forrest later
is also not painful. Standards-based. Lack of lock-in. Good.

Moving back to anakia is possible, but satisfying the validation requirement
is always going to be hard because of its use of velocity rather than "real"
XML parsing.

Moving forward to maven 2 is probably possible but I think the same argument
against anakia still applies to its document generation process. One big
advantage with maven is that it's easy to also automate many of the other bits
of workflow, e.g. doing things like the 'svn commit' too.

Steps 7 and 8 depend on what the site-dev at apache.org people come up with;
right now this is more like having to do an "svn up" remotely, setting up
an HTTP proxy, etc., or skipping 7 and just doing 8, with x == 2. But that'll
be fixed. Lots of plans made, apparently. Haven't been able to figure out what
the status is right now (seems there's no code for steps 1-6 at the moment),
but it's definitely going to be compatible with any tool satisfying the above
use case. See:

  http://people.apache.org/~rgardler/site-dev/Site-Build.html

The site-dev people have been working on some other kinds of tools, including
one that uses Perl+XSL, which shows just how simple "step 4" at its core really
is:

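  #!/usr/local/bin/perl

  use strict;

  my $xdocdir = "xdoc";               # source documents
  my $htmldir = "html";               # generated pages
  my $xslfn = "xsl/xdoc2html.xsl";    # stylesheet applied to every document

  opendir(XDOC, $xdocdir) || die("Couldn't open directory $xdocdir");
  while (my $r = readdir(XDOC)) {
      # only handle files that start with a letter and end in .xml
      next unless $r =~ /^[a-zA-Z].*\.xml$/;
      my $infn = "$xdocdir/$r";
      print "Processing $infn\n";
      # foo.xml in the source directory becomes foo.html in the output
      my $outfn = "$htmldir/$r";
      $outfn =~ s/\.xml$/\.html/;
      `xsltproc $xslfn $infn > $outfn`;
  }
  closedir(XDOC);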

Some things to notice...

 * needs xsltproc to be installed
 * error reporting not so intuitive (just the output from xsltproc)
 * no clear abort on error
 * doesn't walk directories (Perl's readdir only lists the entries of the
   directory it was opened on; it does not recurse)
 * specification of navigational elements not easy enough (eg no
   maven-style site.xml)
 * (also note some of the use case being addressed depends on the XSLT
   file)
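
As a rough illustration of how the directory walking and the abort-on-error
points could be handled, here is a minimal Python sketch. The function names
(plan_outputs, build, xsltproc_transform) are made up for this example, and
the last one assumes xsltproc is installed and on the PATH, so it carries
the same dependency as the script above.

```python
import os
import subprocess


def plan_outputs(src_root, out_root):
    """Yield (source, output) path pairs for every .xml file under src_root."""
    for dirpath, dirnames, filenames in os.walk(src_root):
        dirnames.sort()  # make the traversal order deterministic
        rel = os.path.relpath(dirpath, src_root)
        for name in sorted(filenames):
            if not name.endswith(".xml"):
                continue
            out_name = name[:-len(".xml")] + ".html"
            yield (
                os.path.join(dirpath, name),
                os.path.normpath(os.path.join(out_root, rel, out_name)),
            )


def build(src_root, out_root, transform):
    """Call transform(source, output) per file; abort at the first failure."""
    count = 0
    for src, out in plan_outputs(src_root, out_root):
        os.makedirs(os.path.dirname(out), exist_ok=True)
        try:
            transform(src, out)
        except Exception as exc:
            # Name the offending file instead of carrying on silently.
            raise SystemExit("build failed on %s: %s" % (src, exc))
        count += 1
    return count


def xsltproc_transform(stylesheet):
    """Return a transform that shells out to xsltproc (assumed installed)."""
    def transform(src, out):
        # check=True raises if xsltproc exits with a non-zero status.
        subprocess.run(["xsltproc", "-o", out, stylesheet, src], check=True)
    return transform
```

Something like build("xdoc", "html", xsltproc_transform("xsl/xdoc2html.xsl"))
would then mirror the script above, but for nested directories and with a
hard stop on the first file that fails.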

But these kinds of things aren't exactly hard to address. So I think
I'm going to spend a day or maybe two writing a dedicated script that
does the above, and a little more, and then once done I'll spec up
what forrest should be spitting out, and ask for a volunteer to write
the necessary stylesheets and do the conversion.

The code for this is probably going to end up somewhere inside

  https://svn.apache.org/repos/asf/infrastructure/site-tools/trunk

unless there's lots of discussion resulting from the above on site-dev
(there seems to be a lot of history to this topic), in which case it'll
end up somewhere else.

It's probably going to be written in Python and will probably use minidom
and the Kid template language. It's probably going to have a source XML
format that is a subset of XHTML 1.1 and specified as a DTD file. I think
I'll do away with a "site.xml" or anything like that and just specify
the menu as an XHTML snippet, since I've never ever seen site.xml files
used for anything but generating websites. I'll try and write the critter
so that it's easily thrown away and replaced by something written in Perl
or Java.
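
To give an idea of what the validation part of "step 4" might look like
with minidom, here is a minimal sketch. Note that minidom is a
non-validating parser, so it won't check a document against a DTD by
itself; this example stands in for that with a hard-coded set of allowed
element names. The ALLOWED set and the check_source function are
assumptions made up for illustration, not the actual subset.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical whitelist; the real subset would be defined by the DTD.
ALLOWED = {
    "html", "head", "title", "body",
    "h1", "h2", "p", "a", "b", "em", "ul", "li",
}


def check_source(text):
    """Return a list of problems; an empty list means the page passed."""
    try:
        doc = minidom.parseString(text)
    except ExpatError as exc:
        # expat's message includes the line and column of the error.
        return ["not well-formed: %s" % exc]
    problems = []
    for element in doc.getElementsByTagName("*"):
        if element.tagName not in ALLOWED:
            problems.append(
                "element <%s> is not in the allowed subset" % element.tagName
            )
    return problems


print(check_source("<html><body><blink>hi</blink></body></html>"))
```

A build script could refuse to generate anything while check_source returns
a non-empty list, which is one way to make the edit-validate loop
trustworthy.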

I think I'll call it xdok.

I'll send another email once I have something working and ready for a
demo.

From there, we'll see where it goes.

- LSD

