Lately, I've been thinking a lot about form handling in Cocoon. The reason is that I will very soon start a project which is basically a large set of forms (about 40 different screens used to fill an XML document containing collections of up to 1000 or 2000 items). As part of our proposal for the project, I did some prototyping with XMLForm (+flowscript) and liked its lightweight markup and the strong separation it enforces between form definition and form layout. But I disliked its poor syntactic validation facilities. On the other hand, we have Woody, which is very good at validating data but which I find heavy to use and which defines its own schema language. So this RT is my attempt to make a synthesis of the good and bad points of both frameworks, augmented with my own ideas, so that we can move towards a single unified form handling package in Cocoon.
Disclaimer: I don't want to start a war between Woody and XMLForm, but just try to analyze what we have today and set out what I (hence it's subjective) consider good. Discussion is of course welcome. Also, I may have missed some features of one framework or the other. In that case, please don't shoot me, but be kind enough to explain what I missed!
Also, I'll speak about XMLForm, even if it's somewhat dead and replaced by JXForms (essentially a cleaner rewrite of the XMLFormTransformer and an update of the markup to the latest XForms draft), because all the criticisms of XMLForm below are based on the original XMLForm and not on the JXForms work.
---oOo---
General overview
----------------
Both Woody and XMLForm use the same basic principles:
1/ Content production: a form template is "instantiated", i.e. it is filled with values coming from a data model, and the instantiated form is transformed to the target language (e.g. HTML) using generic and/or custom stylesheets that know how to render the various widgets.
2/ Form validation: upon form submission, values are validated and stored into a data model, and violations are produced if a validation error occurs (validations involving several fields are also possible). In case of error, the form can be redisplayed with the violations.
But, as we will see below, the notions of form template, data model and validation are very different in Woody and in XMLForm.
---oOo---
Form definition
---------------
Woody separates form definition, form template and form instance (3 different namespaces). The form definition is a kind of schema language that defines every widget in the form with its label, datatype and validation constraints. The template contains references to form fields mixed with foreign markup (such as HTML). It is instantiated using the WoodyTransformer: every field present in the template is replaced by the corresponding instance according to the form definition.
Woody has no notion of application model, as it stores field values in its own data structure, which must be read from and written to the application model. Work is underway in this area with a JXPath-based binding.
XMLForm has only one markup, inspired by the W3C's XForms specification. This markup is more or less equivalent to the Woody template (it accepts foreign markup), and is instantiated ("augmented" would be a better word) with either the XMLFormTransformer/JXFormsTransformer or the JXFormsGenerator. Form fields contain XPath references to the data model, which can therefore be arbitrarily complex.
<my-opinion>
XMLForm is way easier to set up to produce forms: a single file, a data model containing any mixture of objects handled by JXPath (JavaBeans, DOM elements, etc), XPath expressions everywhere, and you're done. But as soon as there's a need for data whose formatting is more than toString(), such as dates and float values, and even more so in an I18Nized environment, XMLForm shows strong limitations, mainly related to the lack of proper formatting functions in XPath.
As JXPath supports extension functions, building a library of formatting functions can be a solution to circumvent XPath's reduced function set. But we'll see below that there's still a problem with parsing submitted form data.
Woody, on the other hand, is more complicated to set up, as two files are needed (form definition and form template), with many cross-references (field IDs). But Woody shines for complicated formatting (see <convertor> directives) and I18N.
IMO, Woody's separation of concerns between form definition and template is not that good. Woody would be easier to use if the definition file was only a schema defining datatypes and if fields were defined only in the template. Although there is a great probability that datatypes can be reused for different fields and even different forms, I'm not sure using the same fields within different templates really makes sense. For example, HTML and WML browsers have such different screen sizes and interaction constraints that a single form definition can hardly be used for both.
Reusing datatypes for different fields would also increase the overall application consistency: as of today, if two fields have the same datatype and constraints, these must be duplicated. This could also open the door to other schema languages (WXS, RNG, etc).
</my-opinion>
---oOo---
Population and validation
-------------------------
"Population" is the term used to designate the action of "filling" the data model with form-submitted data. "Validation" is the action of checking that submitted data is valid, i.e. that it satisfies some syntactic and semantic constraints.
Upon form submission, XMLForm traverses all request parameters and tries to set their value on the data model using JXPath. A filtering feature makes it possible to exclude request parameters that are not part of the data model. If the data model was filled correctly, a validation is performed using Schematron. This allows finer-grained or inter-field checks, again using XPath expressions. Each of these two phases can produce violations, which are recorded in the Form object.
Upon form submission, Woody traverses the form's widget tree, and each widget is responsible for parsing the corresponding request parameter and validating its value. Non-visual widgets are also provided to perform inter-field checks.
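To make the widget-driven approach concrete, here is a minimal sketch in plain Java. It is not Woody's actual API: the class names (WidgetFormSketch, Field, Result) and the violation messages are invented for illustration. It only shows the idea that the form walks its own declared widgets, each widget parses its own request parameter, and failures are collected as violations.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of widget-driven population; none of these names come from Woody. */
class WidgetFormSketch {

    /** A field widget: knows its id and how to turn the raw string into a typed value. */
    static final class Field<T> {
        final String id;
        final Function<String, T> parser; // expected to throw IllegalArgumentException on bad input

        Field(String id, Function<String, T> parser) {
            this.id = id;
            this.parser = parser;
        }
    }

    /** Outcome of one submission: typed values plus violations, both keyed by field id. */
    static final class Result {
        final Map<String, Object> values = new LinkedHashMap<>();
        final Map<String, String> violations = new LinkedHashMap<>();

        boolean isValid() {
            return violations.isEmpty();
        }
    }

    private final List<Field<?>> fields = new ArrayList<>();

    WidgetFormSketch add(Field<?> field) {
        fields.add(field);
        return this;
    }

    /** Walks the declared widgets only; request parameters without a widget are never read. */
    Result process(Map<String, String> request) {
        Result result = new Result();
        for (Field<?> field : fields) {
            String raw = request.get(field.id);
            if (raw == null) {
                result.violations.put(field.id, "missing value");
                continue;
            }
            try {
                result.values.put(field.id, field.parser.apply(raw.trim()));
            } catch (IllegalArgumentException e) {
                // NumberFormatException is a subclass, so Integer::parseInt failures land here too
                result.violations.put(field.id, "invalid value: " + raw);
            }
        }
        return result;
    }
}
```

A form built with `new WidgetFormSketch().add(new WidgetFormSketch.Field<Integer>("age", Integer::parseInt))` would store a typed Integer for "age" and would simply ignore any extra parameter present in the request.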
<my-opinion>
Here again, XMLForm is very easy to use but shows some strong limitations: because it's designed after XForms, XMLForm has no feature to specify how to parse form parameters (strings) into strongly typed data. So even basic parsing of e.g. dates is not possible, and locale-dependent parsing is clearly not possible.
Schematron validation has fewer restrictions since it operates on the populated data model, and thus on strongly typed data, provided the values could be parsed in the population phase.
XMLForm also has what I consider a strong security weakness: the default request parameter filter rejects only special parameters such as "cocoon-action-*", which means that a request can be forged to modify a part of the data model that wasn't available as a form field. Considering that programmers are lazy (as I am), the form model will often be the actual business object. The consequences of providing a form to a user to update her location information can be catastrophic if the User class contains "address", "phoneNumber", but also "accessRights"...
W3C XForms, which inspired XMLForm, is a client-side specification targeted at producing XML documents validated by a WXS (W3C XML Schema). But XMLForm is server-side, and doesn't enforce any particular schema language. This means that very few features of XForms are actually used except the form markup, and that everything has to be invented to produce a full-featured server-side form framework, particularly in this population & validation phase.
Woody, by traversing the widget tree that was used to produce the form, doesn't have the security weakness of XMLForm since only parameters present in the produced form are considered. Also, its strong parsing and I18N features make custom formatting really easy.
But, being limited to the form's data model, complex validations involving form data and application data can be difficult to do with Woody and will need custom Java code.
Finally, Woody uses its own expression language, which IMO is not a good choice if we consider that "standard" expression languages such as Jexl exist and are already used in other Cocoon blocks.
</my-opinion>
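The parameter-filtering weakness discussed in this section can be illustrated with an explicit allow-list. The sketch below is plain Java and is not XMLForm's filter API: the class and method names are hypothetical, and it assumes request parameter names are the same XPath strings that the form fields reference. Only paths that were actually rendered as form fields are let through to population.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical allow-list filter; not part of XMLForm. */
class ParameterAllowListSketch {

    /**
     * Returns only the request parameters whose name is one of the paths the form exposed.
     * Matching is by exact string: equivalent XPath spellings are NOT normalized here.
     */
    static Map<String, String> retainAllowed(Map<String, String> requestParams,
                                             Set<String> allowedPaths) {
        Map<String, String> accepted = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : requestParams.entrySet()) {
            if (allowedPaths.contains(entry.getKey())) {
                accepted.put(entry.getKey(), entry.getValue());
            }
        }
        return accepted;
    }
}
```

With an allow-list of "/address" and "/phoneNumber", a forged "/accessRights" parameter is dropped before it can reach the data model. This is the opposite default of a filter that only rejects a few known special parameters.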
---oOo---
Mapping to the application data model
-------------------------------------
A form is useless if its content cannot be mapped in some way to the application data model.
XMLForm has no special provision for mapping form data to application data, but using JXPath makes it easy to fill any JavaBean or any DOM structure. Post-validation application behaviour can be added either in a subclass of AbstractXMLFormAction or in a flowscript.
Woody currently does not provide anything to map form data to application data and all this must be coded either in a subclass of AbstractWoodyAction or in a flowscript. But there's work underway to add binding features to Woody, the first incarnation being based on JXPath.
<my-opinion>
XMLForm makes it easy (as pointed out above) for the lazy programmer to set the application data as the form model: mapping is then immediate and totally transparent. But along with the security problem mentioned above, this also means that when form population & validation fails, it is very likely that some fields have already been modified, potentially leaving the data model in an inconsistent state.
So the secure and clean solution is to use a form-specific data model (a JavaBean, DynaBean or XML DOM), but this then requires custom code to copy form data to the application data model, thus losing the simplicity provided by JXPath.
The ongoing work on Woody binding potentially allows a great range of target data models: the current JXPath binding will make it easy to map form data to an arbitrary data structure, without XMLForm's limitations, since parsed and strongly typed data will be stored in the application model. But we can also imagine other declarative bindings targeted at e.g. relational databases (no intermediate bean), EJBs, etc.
</my-opinion>
---oOo---
I18N
----
I18N features should be separated into two main areas:
- I18Nization of form labels and item values (i.e. combobox labels)
- I18Nization of textbox inputs, such as floating point numbers, dates, etc.
For the first item, both XMLForm and Woody accept any foreign markup in widget labels, including <i18n:*> tags for use with the I18NTransformer. Woody lacks the equivalent to <xf:help> but this was recently discussed and should be added soon. XMLForm also allows labels and similar items to have their content fetched from the form model using a "ref" attribute. In that case, however, only characters are produced, and not mixed content.
For the second item (i18nization of inputs), XMLForm has no support, as it hardly supports custom formats, as explained previously. Woody, on the other hand, has strong support for i18nization of inputs through its <convertor> tag that supports locale-specific patterns for formatting and parsing.
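As an illustration of what locale-specific formatting and parsing of inputs involves, here is a small sketch using only the standard java.text and java.time classes. It is not Woody's <convertor> implementation: the class and method names are made up for this example, and the strictness rule (reject input that is not consumed entirely) is an assumption about what a form framework would want.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical locale-aware conversion helpers; not Woody code. */
class LocaleConvertorSketch {

    /** Parses a decimal number using the locale's separators, rejecting trailing garbage. */
    static double parseDecimal(String text, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number number = format.parse(text, position);
        // NumberFormat stops silently at the first unparseable character, so check the index
        if (number == null || position.getIndex() != text.length()) {
            throw new IllegalArgumentException("not a valid number for " + locale + ": " + text);
        }
        return number.doubleValue();
    }

    /** Formats a decimal number using the locale's separators. */
    static String formatDecimal(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    /** Parses a date with an explicit pattern; throws DateTimeParseException on bad input. */
    static LocalDate parseDate(String text, String pattern, Locale locale) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern, locale));
    }
}
```

With Locale.FRANCE, the submitted string "3,14" parses to 3.14, while "3.14" is rejected because the dot is not consumed. This is the kind of round trip between typed values and localized strings that plain XPath expressions cannot express.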
<my-opinion>
XMLForm's strong limitations for value formatting also apply to the i18n domain, whereas Woody not only provides strong support for value formatting, but also strong support for locale-dependent formatting.
XMLForm's "ref" attribute on form labels allows messages to be part of the form model, and thus be dynamic, but I'm not sure this is of real use. And if it is, Woody may be able to provide an equivalent through nested tags in the <wd:label> element.
</my-opinion>
---oOo---
Conclusion
----------
XMLForm has had a lot of success because it filled a giant need in Cocoon applications to handle forms. Moreover, it fits nicely with flowscript, and this combination builds an easy to use solution for form handling. But using it in more and more complex use cases shows some strong limitations that are largely related to its desire to mimic XForms. And I'm not sure these limitations can be removed without diverging largely from the XForms approach.
These limitations were obviously taken into account early in Woody's design, which makes it stronger at handling data formatting and enforcing semantic constraints. But Woody, by over-separating concerns, is heavier to use.
Considering all the pros and cons, I think Woody, which is still in its infancy, is more promising in the long term and should be promoted, once it has enough features, as the preferred form handling package in Cocoon.
---oOo---
Proposals
---------
We've seen that Woody requires separating the form definition from the form template. I think (Bruno, correct me if I'm wrong) this constraint comes from the fact that the form _is_ the model, and thus must be filled with data _before_ being processed by the form template.
The ongoing work on form binding considers binding as a process surrounding form population and validation: the application->form binding fills an existing form, and the form->application binding transfers form data to the application model once the form is correctly validated.
Now we can imagine having a "live" application->form binding occurring at form definition time, which would allow building the form definition and populating form data from the binding simultaneously. This feature could remove the need for a separate form definition and could be implemented by a WoodyTemplateGenerator taking as input a template file containing field definitions. A kind of "definition by example" (like the QBE that exists in Excel and various database systems).
This "defining-template" would only define fields and not datatypes. These datatypes could be either inferred from the application model through the binding or fetched from a separate schema file (the current form definition, with only datatype definitions).
On the other hand, form->application binding cannot be live, since we must ensure that all submitted values are valid before modifying the application data.
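The two binding directions can be sketched as follows in plain Java. This is not the Woody binding under development: UserRecord, the field names and the two validation rules are all invented for illustration. The point is only the ordering: load copies application data into a scratch form model, and save touches the application object only after every value has passed validation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-way binding sketch; not the Woody binding framework. */
class DeferredBindingSketch {

    /** Stand-in for an application object. */
    static final class UserRecord {
        String address = "";
        String phoneNumber = "";
    }

    /** application->form: copy current values into a scratch form model. */
    static Map<String, String> load(UserRecord user) {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("address", user.address);
        form.put("phoneNumber", user.phoneNumber);
        return form;
    }

    /** Returns violations keyed by field name; an empty map means the form is valid. */
    static Map<String, String> validate(Map<String, String> form) {
        Map<String, String> violations = new LinkedHashMap<>();
        String address = form.get("address");
        if (address == null || address.trim().isEmpty()) {
            violations.put("address", "address is required");
        }
        String phone = form.get("phoneNumber");
        // Illustrative rule only: digits, '+' and spaces
        if (phone == null || !phone.matches("[0-9+ ]+")) {
            violations.put("phoneNumber", "phone number has an invalid format");
        }
        return violations;
    }

    /** form->application: all-or-nothing, the application object is untouched on failure. */
    static boolean save(Map<String, String> form, UserRecord user) {
        if (!validate(form).isEmpty()) {
            return false;
        }
        user.address = form.get("address");
        user.phoneNumber = form.get("phoneNumber");
        return true;
    }
}
```

If the submitted phone number is invalid, save returns false and the UserRecord keeps its previous address as well, so a half-populated application object cannot result from a failed submission.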
---oOo---
Thanks for reading so far. As I expect this post to generate lots of discussion, I suggest creating separate threads for particular subjects (particularly the final "proposals" chapter) in order to keep the discussion focused.
Sylvain
--
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com
