upayavira 2003/08/07 03:39:42
Modified: src/documentation/xdocs/userdocs book.xml
Added: src/documentation/xdocs/userdocs/offline ant.xml bean.xml
book.xml cli.xml index.xml
Log:
Documentation for the CLI and Bean
Revision Changes Path
1.5 +3 -0 cocoon-2.1/src/documentation/xdocs/userdocs/book.xml
Index: book.xml
===================================================================
RCS file: /home/cvs/cocoon-2.1/src/documentation/xdocs/userdocs/book.xml,v
retrieving revision 1.4
retrieving revision 1.5
diff -u -r1.4 -r1.5
--- book.xml 1 Aug 2003 09:41:23 -0000 1.4
+++ book.xml 7 Aug 2003 10:39:42 -0000 1.5
@@ -31,6 +31,9 @@
<menu-item label="XSP" href="xsp/index.html"/>
</menu>
+ <menu label="Offline Generation">
+ <menu-item label="Offline Generation" href="offline/index.html"/>
+ </menu>
</book>
1.1 cocoon-2.1/src/documentation/xdocs/userdocs/offline/ant.xml
Index: ant.xml
===================================================================
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN"
"../../dtd/document-v10.dtd">
<document>
<header>
<title>Offline Page Generation with Apache Ant</title>
<version>0.9</version>
<type>Technical document</type>
<authors><person name="Upayavira" email="[EMAIL PROTECTED]"/>
</authors>
<abstract>This document explains how to use Cocoon to generate offline pages
and sites with Apache Ant.</abstract>
</header>
<body>
<s1 title="Overview">
<p>Apache Ant can be used to start Cocoon in its offline mode. Whilst a
specific Cocoon Ant task is planned, at present offline generation is invoked
by starting the command line interface from Ant's standard
<code>java</code> task.
</p>
</s1>
<s1 title="Sample Ant Task">
<p>A sample Ant task would be as follows:</p>
<source>
<![CDATA[
<java classname="org.apache.cocoon.Main" fork="true"
dir="${build.context}"
failonerror="true" maxmemory="128m">
<arg value="-xcli.xconf"/>
<arg value="index.html"/>
<classpath>
<path refid="classpath"/>
<fileset dir="${build.dir}">
<include name="*.jar"/>
</fileset>
<pathelement location="${tools.jar}"/>
<pathelement location="${build.context}/WEB-INF/classes"/>
</classpath>
</java>
]]> </source>
<p>This makes use of the Cocoon Command Line Interface's xconf configuration
file. See the <link href="cli.html">command line</link> page for details of
how to use this file.</p>
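<p>As a rough sketch, the task could be wrapped in a target of its own so that
generation can be started with <code>ant docs</code>. The target name
<code>docs</code> is an assumption made for illustration, and the classpath is
abbreviated to a single reference that is assumed to be defined elsewhere in
the build file and to include the Cocoon jars:</p>
<source>
<![CDATA[
<!-- Hypothetical target; only the <java> task itself comes from the sample above -->
<target name="docs" description="Generate the site offline">
  <java classname="org.apache.cocoon.Main" fork="true"
        dir="${build.context}"
        failonerror="true" maxmemory="128m">
    <!-- Read the CLI settings from the cli.xconf file -->
    <arg value="-xcli.xconf"/>
    <!-- Starting URI to generate -->
    <arg value="index.html"/>
    <classpath>
      <path refid="classpath"/>
    </classpath>
  </java>
</target>
]]>
</source>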
</s1>
</body>
</document>
1.1 cocoon-2.1/src/documentation/xdocs/userdocs/offline/bean.xml
Index: bean.xml
===================================================================
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN"
"../../dtd/document-v10.dtd">
<document>
<header>
<title>The Cocoon Bean</title>
<version>0.9</version>
<type>Technical document</type>
<authors><person name="Upayavira" email="[EMAIL PROTECTED]"/>
</authors>
<abstract>This document details the basics of using the Cocoon
bean.</abstract>
</header>
<body>
<s1 title="Overview">
<p>The Cocoon Bean provides a Java programmatic interface for offline
page and site generation
with Apache Cocoon.
</p>
</s1>
<s1 title="Details">
<p>The Cocoon Bean forms the core of, and is used by, the Cocoon Command Line
Interface (CLI).</p>
<p>To find out more about using the bean, look at the code for the CLI, which
can be found in the Cocoon codebase in <code>src/java</code>, in the class
<code>org.apache.cocoon.Main</code>.</p>
<note>Whilst the Cocoon Bean works, it is still under development, and
therefore its API
must be considered unstable. Return to this page in future versions to
see what has
changed.</note>
</s1>
</body>
</document>
1.1 cocoon-2.1/src/documentation/xdocs/userdocs/offline/book.xml
Index: book.xml
===================================================================
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//APACHE//DTD Cocoon Documentation Book V1.0//EN"
"../../dtd/book-cocoon-v10.dtd">
<book software="Apache Cocoon"
title="Apache Cocoon User Documentation - Concepts"
copyright="@year@ The Apache Software Foundation">
<menu label="Navigation">
<menu-item label="Main" href="../../index.html"/>
<menu-item label="User Documentation" href="../index.html"/>
</menu>
<menu label="Offline">
<menu-item label="Overview" href="index.html"/>
<menu-item label="Command Line" href="cli.html"/>
<menu-item label="Ant" href="ant.html"/>
<menu-item label="Cocoon Bean" href="bean.html"/>
</menu>
</book>
1.1 cocoon-2.1/src/documentation/xdocs/userdocs/offline/cli.xml
Index: cli.xml
===================================================================
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN"
"../../dtd/document-v10.dtd">
<document>
<header>
<title>Offline Page Generation with the Command Line Interface</title>
<version>0.9</version>
<type>Technical document</type>
<authors><person name="Upayavira" email="[EMAIL PROTECTED]"/>
</authors>
<abstract>This document explains how to use the Cocoon Command Line Interface
for offline page and site generation.</abstract>
</header>
<body>
<s1 title="Overview">
<p>The Command Line Interface provides access to Cocoon's offline generation
capabilities.</p>
<p>This page gives details of how to configure and use the CLI. The concepts
behind offline page generation are explained on the offline generation
<link href="index.html">overview</link> page.</p>
</s1>
<s1 title="Invoking the CLI">
<p>The CLI can be invoked from the command line. Change to the root directory
of your Cocoon distribution, and then, on Unix, use
<code>./cocoon.sh cli &lt;parameters&gt;</code>
and on Windows use <code>cocoon.bat cli &lt;parameters&gt;</code>.</p>
<p>The relevant parameters are detailed in the following sections.</p>
</s1>
<s1 title="Configuring the CLI">
<p>The CLI can be configured in two ways: with an <code>xconf</code> file, or
with command line parameters.</p>
<p>The <code>xconf</code> method is the newer of the two and gives access to a
wider range of features, so it is explained first.</p>
<note>Whilst the xconf method provides access to more features, the command
line
parameter method is more stable, as there are currently plans to improve
the xconf format to allow greater flexibility. If you require a stable
and
consistent method for accessing the CLI, it is recommended that you use
the
command line parameter method.</note>
<s2 title="Using an Xconf file">
<p>To start the CLI using an xconf file, on Unix use
<code>./cocoon.sh cli -x &lt;xconf file&gt;</code>
or on Windows <code>cocoon cli -x &lt;xconf file&gt;</code>.</p>
<p>A sample xconf file is included below.</p>
<source>
<![CDATA[
<?xml version="1.0"?>
<!--+
| This is the Apache Cocoon command line configuration file.
| Here you give the command line interface details of where
| to find various aspects of your Cocoon installation.
|
| If you wish, you can also use this file to specify the URIs
| that you wish to generate.
|
| The current configuration information in this file is for
| building the Cocoon documentation. Therefore, all links here
| are relative to the build context dir, which, in the build.xml
| file, is set to ${build.context}
|
| Options:
| verbose: increase amount of information presented
| to standard output (default: false)
| follow-links: whether linked pages should also be
| generated (default: true)
| precompile-only: precompile sitemaps and XSP pages, but
| do not generate any pages (default: false)
| confirm-extensions: check the mime type for the generated page
| and adjust filename and links extensions
| to match the mime type
| (e.g. text/html->.html)
+-->
<cocoon verbose="true"
follow-links="true"
precompile-only="false"
confirm-extensions="false">
<!--+
| Broken link reporting options:
| Report into a text file, one link per line:
| <broken-links type="text" report="filename"/>
| Report into an XML file:
| <broken-links type="xml" report="filename"/>
| Ignore broken links (default):
| <broken-links type="none"/>
| When a page includes an error, should a page be generated?
|
| Two attributes to this node specify whether a page should
| be generated when an error occurred. 'generate' specifies
| whether a page should be generated (default: true) and
| extension specifies an extension that should be appended
| to the generated page's filename (default: none)
| <broken-links generate="true" extension=".error.txt"/>
|
+-->
<broken-links type="xml"
file="brokenlinks.xml"
generate="false"
extension=".error"/>
<!--+
| Load classes at startup. This is necessary for generating
| from sites that use SQL databases and JDBC.
| The <load-class> element can be repeated if multiple classes
| are needed.
+-->
<!--
<load-class>org.firebirdsql.jdbc.Driver</load-class>
-->
<!--+
|
+-->
<logging log-kit="WEB-INF/logkit.xconf" logger="cli" level="ERROR" />
<!--+
| The context directory is usually the webapp directory
| containing the sitemap.xmap file.
|
| The config file is the cocoon.xconf file.
|
| The work directory is used by Cocoon to store temporary
| files and cache files.
|
| The destination directory is where generated pages will
| be written (assuming the 'simple' mapper is used)
+-->
<context-dir>.</context-dir>
<config-file>WEB-INF/cocoon.xconf</config-file>
<work-dir>work/docs</work-dir>
<dest-dir>dest</dest-dir>
<!--+
| Specifies the filename to be appended to URIs that
| refer to a directory (i.e. end with a forward slash).
+-->
<default-filename>index.html</default-filename>
<!--+
| Specifies a user agent string to the sitemap when
| generating the site.
+-->
<!--
<user-agent>xxx</user-agent>
-->
<!--+
| Specifies an accept string to the sitemap when generating
| the site.
+-->
<accept>*/*</accept>
<!--+
| Specifies the URIs that should be generated (using <uri>
| elements), and (if necessary) what should be done with the
| generated pages.
|
| The old behaviour - appends uri to the specified destination
| directory (as specified in <dest-dir>):
|
| <uri>documents/index.html</uri>
|
| Append: append the generated page's URI to the end of the
| destination URI:
|
| <uri type="append" src-prefix="documents/" src="index.html"
| dest="build/dest/"/>
|
| Replace: Completely ignore the generated page's URI - just
| use the destination URI:
|
| <uri type="replace" src-prefix="documents/" src="index.html"
| dest="build/dest/docs.html"/>
|
| Insert: Insert generated page's URI into the destination
| URI at the point marked with a * (example uses fictional
| zip protocol)
|
| <uri type="insert" src-prefix="documents/" src="index.html"
| dest="zip://*.zip/page.html"/>
|
+-->
<uri>favicon.ico</uri>
<uri type="append" src-prefix="documents/" src="index.html" dest="docs/"/>
<!--+
| File containing URIs (plain text, one per
| line).
+-->
<!--
<uri-file></uri-file>
-->
</cocoon>
]]>
</source>
<s3 title="Broken Link Handling">
<p>The xconf method allows for more sophisticated broken link handling. The
user can choose to have broken links reported to a file, which can be either
plain text or XML.</p>
<p>When this file is plain text, it will have one link URI per line.</p>
<p>When this file is in XML, it will give a message explaining the reason
for each broken link, as well as the URI of the link.</p>
<p>It is also possible to specify whether an error page should be generated
in the place of the broken page (based upon the configured
<code>&lt;map:handle-errors&gt;</code> code in the sitemap). If required,
an extension can be appended to the original file's URI to signify that
it is an error page (e.g. <code>.error</code>).</p>
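<p>As an illustration, the fragment below sketches an xconf file that writes a
plain text report and still generates an error page for each broken link. It
follows the attribute names used on the <code>broken-links</code> element in
the sample file above; the report filename and the URI are placeholders, and
all other settings are omitted.</p>
<source>
<![CDATA[
<cocoon follow-links="true">
  <!-- One broken link URI per line is written to brokenlinks.txt.
       Error pages are generated, with ".error" appended to the
       filename of the page that could not be built. -->
  <broken-links type="text"
                file="brokenlinks.txt"
                generate="true"
                extension=".error"/>
  <uri>index.html</uri>
</cocoon>
]]>
</source>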
</s3>
</s2>
<s2 title="Command Line Parameters">
<p>You can get a listing of the available parameters on Unix with
<code>./cocoon.sh cli -h</code> or on Windows with
<code>cocoon cli -h</code>.
This should give a listing something like:</p>
<source>
-------------------- Executing -----------------
Main Class: org.apache.cocoon.Main
usage: cocoon cli [options] [targets]
------------------------------------------------------------------------
cocoon 2.1
Copyright (c) 1999-2003 Apache Software Foundation. All rights reserved.
------------------------------------------------------------------------
-a,--userAgent use given string for user-agent header
-e,--confirmExtensions confirm that file extensions match mime-type of
pages and amend filename accordingly (default is true)
-C,--configFile specify alternate location of the configuration
file (default is ${contextDir}/cocoon.xconf)
-D,--defaultFilename specify a filename to be appended to a URI when
the URI refers to a directory
-L,--loadClass specify a class to be loaded at startup
(specifically for use with JDBC). Can be used multiple
times
-P,--precompileOnly generate java code for xsp and xmap files
-V,--verbose enable verbose messages to System.out
-b,--brokenLinkFile send a list of broken links to a file (one URI
per line)
-c,--contextDir use given dir as context
-d,--destDir use given dir as destination
-f,--uriFile use a text file with uris to process (one URI
per line)
-h,--help print this message and exit
-k,--logKitconfig use given file for LogKit Management
configuration
-l,--Logger use given logger category as default logger for
the Cocoon engine
-p,--accept use given string for accept header
-r,--followLinks process pages linked from starting page or not
(boolean argument is expected, default is true)
-u,--logLevel choose the minimum log level for logging
(DEBUG, INFO, WARN, ERROR, FATAL_ERROR) for startup
logging
-v,--version print the version information and exit
-w,--workDir use given dir as working directory
-x,--xconf specify a file containing XML configuration
details for the command line interface
Note: the context directory defaults to './webapp'
</source>
<p>For details of the meaning of each specific parameter, see the <link
href="index.html">overview</link>
page.</p>
<s3 title="Specifying Targets">
<p>The command line parameter method does not have access to all of
Cocoon's URI handling features. However, it is possible to specify multiple
URIs to be crawled, all of which will be written to the same destination.
That destination (specified by the <code>-d</code> or
<code>--destDir</code> option) may be a file URI or a URI in any other
protocol for which a ModifiableSource exists (e.g. FTP).</p>
</s3>
<s3 title="URI Files">
<p>A URI file offers a simple way to specify multiple URIs. The file is
treated as one URI per line.</p>
</s3>
<s3 title="Broken Link Handling">
<p>If a broken link file is specified, all broken links will be written to
this file, in text format,
one URI per line.</p>
</s3>
</s2>
</s1>
</body>
</document>
1.1 cocoon-2.1/src/documentation/xdocs/userdocs/offline/index.xml
Index: index.xml
===================================================================
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN"
"../../dtd/document-v10.dtd">
<document>
<header>
<title>Offline Page Generation</title>
<version>0.9</version>
<type>Technical document</type>
<authors><person name="Upayavira" email="[EMAIL PROTECTED]"/>
</authors>
<abstract>This document explains the basic concepts of offline page generation
with Apache Cocoon.</abstract>
</header>
<body>
<s1 title="Overview">
<p>As well as serving sites dynamically, Cocoon can generate static,
'offline' versions of web pages or web sites. This document covers the
concepts involved in offline page and site generation.
</p>
</s1>
<s1 title="Offline Page Generation">
<p>Cocoon allows static versions of Cocoon web sites to be created.</p>
<p>At present, this can be done in three ways:</p>
<ul>
<li><link href="cli.html">Command Line Interface</link></li>
<li><link href="ant.html">Using Ant</link></li>
<li><link href="bean.html">Cocoon Bean</link></li>
</ul>
<p>This document explains the general concepts that are shared by all of these
approaches.
The specific details for each method are explained on a separate page.</p>
<p>Cocoon, when generating pages offline, can follow links in a page (whether
that page is HTML, PDF or anything else), and can rewrite URIs to create
filenames by checking the mime type of the generated page. All links to pages
whose URIs change are changed too.
</p>
</s1>
<s1 title="Configuration">
<p>To use Cocoon in its offline mode, a servlet container (e.g. Tomcat or
Jetty) is not
needed. Cocoon can generate an offline site directly using the information
available
in the Cocoon <code>webapp</code> folder.</p>
<p>Having said this, many choose to have a servlet container available locally
for use
whilst debugging, as this can speed up the development process
significantly.</p>
<s2 title="Directories and Files">
<p>As all the information Cocoon needs to generate a site is stored in the
Cocoon
webapp directory, we need to tell it where to find it, and where to find
various
other files and directories. These are:</p>
<ul>
<li>Context directory (the Cocoon Webapp directory)</li>
<li>Configuration File (usually
<code>${COCOON_WEBAPP}/WEB-INF/cocoon.xconf</code>)</li>
<li>Work Directory (used by Cocoon to store temporary files, this can be
anywhere of your choosing)</li>
</ul>
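<p>In the CLI's xconf file, for instance, these locations might be given as in
the sketch below. The element names and values mirror the sample file on the
<link href="cli.html">command line</link> page; the paths themselves are only
illustrative.</p>
<source>
<![CDATA[
<cocoon>
  <!-- Context directory: the Cocoon webapp directory -->
  <context-dir>.</context-dir>
  <!-- Configuration file -->
  <config-file>WEB-INF/cocoon.xconf</config-file>
  <!-- Work directory for temporary and cache files -->
  <work-dir>work/docs</work-dir>
</cocoon>
]]>
</source>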
</s2>
<s2 title="Logging">
<p>There are three options that need to be specified in relation to logging.
These are:</p>
<ul>
<li>Log Kit (the logging configuration file, usually
<code>${COCOON_WEBAPP}/WEB-INF/logkit.xconf</code>)</li>
<li>Logger (a category used for logging, as configured in the
configuration file)</li>
<li>Log Level (a logging level, either DEBUG, INFO, WARN, ERROR or
FATAL_ERROR. Relates specifically to logging
at startup, after which log kit configuration takes over)</li>
</ul>
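<p>In the CLI's xconf file, these three options correspond to the attributes
of a single <code>logging</code> element. The fragment below is a sketch that
reuses the values from the sample file on the
<link href="cli.html">command line</link> page; other settings are
omitted.</p>
<source>
<![CDATA[
<cocoon>
  <!-- log-kit: the logging configuration file
       logger:  the logging category to use
       level:   the log level used during startup -->
  <logging log-kit="WEB-INF/logkit.xconf" logger="cli" level="ERROR"/>
</cocoon>
]]>
</source>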
</s2>
<s2 title="Other Configuration Options">
<p>In online mode, a user agent string tells Cocoon what browser is being
used to access a page. The user agent can be configured manually for offline
generation.</p>
<p>In online mode, an accept string is provided by a browser, telling the
server what types of content the browser is capable of accepting. This will
be a comma separated list of mime types. In offline mode, an accept string
can also be specified.</p>
<p>As Cocoon based sites can change the content they generate based upon the
user agent string and the accept string, it can be necessary to specify them
in order to have the correct content generated.</p>
<p>In order to generate sites that make use of databases and database
connections, it is necessary to load
JDBC classes at startup. Cocoon allows for this.</p>
<p>When, in offline mode, Cocoon generates a page whose URI ends in a
<code>/</code>, the resultant file cannot be written to a filesystem, as its
name would refer to a directory. Therefore, the user can specify a default
filename which will be appended to the page's URI before saving to disk.</p>
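<p>A sketch of how these options might appear together in the CLI's xconf
file is given below. The element names are those used in the sample file on
the <link href="cli.html">command line</link> page; the user agent and accept
values are placeholders chosen for illustration.</p>
<source>
<![CDATA[
<cocoon>
  <!-- Load a JDBC driver at startup (repeat the element for more classes) -->
  <load-class>org.firebirdsql.jdbc.Driver</load-class>
  <!-- Appended to URIs that end with a forward slash -->
  <default-filename>index.html</default-filename>
  <!-- Placeholder user agent string presented to the sitemap -->
  <user-agent>Mozilla/5.0</user-agent>
  <!-- Comma separated list of acceptable mime types -->
  <accept>text/html, */*</accept>
</cocoon>
]]>
</source>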
</s2>
</s1>
<s1 title="URIs and Targets">
<s2 title="Source URIs">
<p>A source URI (which may also have a source prefix prepended) is the part
of the URI that is given
to Cocoon for processing. So, for example, if you access a page with:
<code>http://localhost:8080/cocoon/site/page.html</code> then the source
URI would be
<code>site/page.html</code></p>
</s2>
<s2 title="Destinations and Modifiable Sources">
<p>Most of the time, when generating pages, the generated pages will be
simply written to disk.</p>
<p>However, this is not the only option. Generated pages can be written
anywhere for which a
<code>ModifiableSource</code> exists. So, for example, it is possible to
generate a site and
have the pages written directly to a web server using FTP, by making use
of the Avalon
<code>FTPSource</code>.</p>
</s2>
<s2 title="Target Types">
<p>When generating a page, Cocoon needs to know how to decide upon the URI
of the generated page.
This process could be described as 'URI arithmetic'.</p>
<p>Source and destination URIs are made up of the following elements:</p>
<ul>
<li>Source Prefix: Part of a source URI used to request a page but
excluded from the destination
URI</li>
<li>Source URI: Part of a source URI that is used when calculating the
destination URI</li>
<li>Destination URI: The base URI for a destination</li>
<li>Type: The method used for merging the above elements (can be append,
replace or insert)</li>
</ul>
<note>When combining elements to make a URI, it is the user's responsibility
to include directory
separators. For example, <code>foo</code> with <code>bar</code>
appended will be
<code>foobar</code>, whereas <code>foo/</code> with <code>bar</code>
appended will be
<code>foo/bar</code>.
</note>
<s3 title="Appending">
<p>Here, when calculating the destination URI, the source prefix is
ignored, and the destination
URI is calculated by appending the source URI to the end of the
destination URI. For example,
with the following values:</p>
<p>Source prefix: <code>site/</code>, source URI: <code>page.html</code>,
destination URI:
<code>pages/</code></p>
<p>A request will be made to Cocoon for a page at:
<code>site/page.html</code>. This will be
saved as <code>pages/page.html</code>.</p>
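<p>Expressed with the <code>uri</code> element of the CLI's xconf file, this
example could be written as in the following sketch (other settings are
omitted):</p>
<source>
<![CDATA[
<cocoon>
  <!-- Requests site/page.html and saves it as pages/page.html -->
  <uri type="append" src-prefix="site/" src="page.html" dest="pages/"/>
</cocoon>
]]>
</source>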
</s3>
<s3 title="Replacing">
<p>Here, when calculating the destination URI, the source prefix and the
source URI are
ignored, and the destination URI is used as is. This is useful when you
wish to save the
generated page with a filename that bears no relationship to the source
URI. For example,
with the following values:</p>
<p>Source prefix: <code>site/</code>, source URI: <code>page.html</code>,
destination URI:
<code>pages/simple.html</code></p>
<p>A request will be made to Cocoon for a page at:
<code>site/page.html</code>. This will be
saved as <code>pages/simple.html</code>.</p>
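<p>Expressed with the <code>uri</code> element of the CLI's xconf file, this
example could be written as in the following sketch (other settings are
omitted):</p>
<source>
<![CDATA[
<cocoon>
  <!-- Requests site/page.html and saves it as pages/simple.html -->
  <uri type="replace" src-prefix="site/" src="page.html"
       dest="pages/simple.html"/>
</cocoon>
]]>
</source>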
<note>Given the nature of this target type, it inherently cannot be used
when following links
(otherwise all pages will be written on top of each other).</note>
</s3>
<s3 title="Inserting">
<p>Here, when calculating the destination URI, the source prefix is
ignored, and the source URI
is inserted into the destination URI at the point marked by an asterisk
(*). This is intended
for use with complex protocols where the source URI does not appear at
the end of the
destination URI.</p>
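<p>As a sketch, take the source prefix <code>site/</code>, the source URI
<code>page.html</code> and the destination URI
<code>zip://*.zip/content.html</code>. The <code>zip</code> protocol here is
fictional, as in the sample xconf file for the CLI. Following the rule
described above, a request would be made for <code>site/page.html</code>, and
the page would be saved as <code>zip://page.html.zip/content.html</code>:</p>
<source>
<![CDATA[
<cocoon>
  <!-- The source URI (page.html) replaces the * in the destination URI -->
  <uri type="insert" src-prefix="site/" src="page.html"
       dest="zip://*.zip/content.html"/>
</cocoon>
]]>
</source>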
</s3>
</s2>
<s2 title="Mime Type Checking">
<p>Cocoon can optionally test the mime type for a page, and, if the mime
type doesn't match the page's
extension, amend the destination URI to include the correct extension.
This will ensure that pages
will load correctly when served by a static web server.</p>
<p>When Cocoon amends a destination URI, it also amends URIs for links in
those pages, so that links
will still work when a site has been crawled.</p>
<note>This feature substantially slows down page generation, as each page
must be generated three times (once to find links, once to find its
mime-type and once to collect the actual content). This can be avoided by
ensuring that all URIs in the site are correct and do not need amending, in
which case it is only necessary to generate a page once.</note>
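<p>In the CLI's xconf file, this check is switched on with the
<code>confirm-extensions</code> attribute of the root element. The fragment
below is a sketch; the extensionless source URI <code>page</code> is a
hypothetical example of a page that would need amending if it were served as
<code>text/html</code>.</p>
<source>
<![CDATA[
<cocoon follow-links="true" confirm-extensions="true">
  <!-- If site/page is generated as text/html, the destination
       would be amended from pages/page to pages/page.html -->
  <uri type="append" src-prefix="site/" src="page" dest="pages/"/>
</cocoon>
]]>
</source>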
</s2>
</s1>
<s1 title="Following Links and Site Crawling">
<p>Cocoon can be configured to either follow, or ignore, links in pages that
it generates. It has two methods
of gathering links, 'link view' and 'link gathering'.</p>
<s2 title="Link View Crawling">
<p>With link view crawling, Cocoon gets the links by generating the 'link
view' for a page. Using link view
gives a significant degree of configurability in terms of which links are
gathered, as it is possible to
insert a transformer into the view to select out links that should not be
followed.</p>
<p>The disadvantage with link view crawling is that each page must be
generated twice, which doubles page
generation time.</p>
<p>Link view is usually configured in the root sitemap with:</p>
<source>
<![CDATA[
<map:views>
<map:view from-position="last" name="links">
<map:serialize type="links"/>
</map:view>
</map:views>
]]>
</source>
<p>If you have this in your root sitemap, you do not need it in your
sub-sitemaps. However, you may choose to override it with one that carries
out further processing - for example, with an XSLT transformer that
removes links that should not be crawled.</p>
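<p>An overriding view in a sub-sitemap might look like the sketch below. The
stylesheet path <code>stylesheets/filter-links.xsl</code> is hypothetical; it
stands for an XSLT stylesheet of your own that strips the links you do not
want followed before they reach the links serializer.</p>
<source>
<![CDATA[
<map:views>
  <map:view from-position="last" name="links">
    <!-- Hypothetical stylesheet that removes unwanted links -->
    <map:transform src="stylesheets/filter-links.xsl"/>
    <map:serialize type="links"/>
  </map:view>
</map:views>
]]>
</source>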
<p>See <link href="../concepts/views.html">views</link> for more on views.
</p>
<p>You can see the link view yourself by appending
<code>?cocoon-view=links</code> to the page's URI.</p>
</s2>
<s2 title="Link Gathering Crawling">
<p>With link gathering crawling, links are gathered from the SAX stream
right before the serializer. All
<code>src</code>, <code>href</code> and <code>xlink:href</code>
attributes are taken to be links, and are
therefore followed.</p>
<p>The benefit of link gathering crawling is that pages do not need to be
generated twice. However, one loses the ability to configure which links
should be followed that exists with link view crawling.</p>
</s2>
</s1>
<s1 title="Broken Links">
<p>When a page cannot be found at a URI that has either been specified, or has
been found as a link in another
page, it is considered 'broken'.</p>
<p>Exactly what is done when a broken link is found depends upon the method
used to invoke Cocoon. See the related pages for specific details.</p>
</s1>
<s1 title="Precompiling XSPs">
<p>When used offline, Cocoon can precompile XSP pages. If no URIs are
specified, it will scan all directories
within the context directory looking for XSP files, each of which will be
compiled. If URIs are specified,
all links will be followed looking for pages that make use of XSP,
compiling those XSP pages as they are
found.</p>
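<p>With the CLI's xconf file, precompilation is requested through the
<code>precompile-only</code> attribute. The fragment below is a sketch that
reuses the directory settings from the sample file on the
<link href="cli.html">command line</link> page and specifies no URIs, so
that the context directory is scanned as described above.</p>
<source>
<![CDATA[
<cocoon precompile-only="true">
  <context-dir>.</context-dir>
  <config-file>WEB-INF/cocoon.xconf</config-file>
  <work-dir>work/docs</work-dir>
  <!-- No uri elements are given, so every XSP file found under
       the context directory is compiled and no pages are generated -->
</cocoon>
]]>
</source>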
</s1>
</body>
</document>