Upcoming work on the ODF Toolkit

Svante Schubert Fri, 15 Sep 2017 07:23:41 -0700

Dear developers,

I will be able to spend more time on the ODF Toolkit in the upcoming
months, especially on a feature branch for collaboration, as I was able to
obtain sponsorship from the German Government (Prototypefund).


[FYI: PrototypeFund - anyone living in Germany can apply for six months of
funded work on open source projects, so check it out! You may apply alone
or as a group, and the maximum possible funding will be increased next year
by 40% up to 50.000€. For those curious, here is the site with our
successful project
<https://prototypefund.de/project/documents-for-democracy/> - Google
Translate might help non-German speakers  ;) ]

Regarding my work, a few things in the basic design of the ODF Toolkit
need to change before the high-level collaboration feature can be added.
I will rework the W3C XML DOM, but plan to keep it, to provide overall
simplicity and interoperability with XML technology.
The collaboration feature transforms an ODF text document, by means of
ODFDOM, into a sequence of user changes. The purpose is to be able to send
only the user changes (like git diffs in the software domain) among ODF
editors instead of always sending the full document, as in the past. In
addition, it should be possible to apply any new changes to the ODFDOM,
where they will be merged into the existing document.
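
The idea of describing a document as a sequence of changes, and of merging
later changes into it, can be sketched roughly as follows. This is only an
illustration under loud assumptions: the class ChangeSketch, the operation
names "addParagraph" and "insertText" and their fields are hypothetical and
not ODFDOM API, and a plain list of strings stands in for the ODF model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChangeSketch {
    /** One user change. Operation names and fields are hypothetical. */
    static final class Change {
        final String type;   // "addParagraph" or "insertText"
        final int paragraph; // index of the paragraph the change targets
        final int offset;    // character offset, used by insertText only
        final String text;   // inserted text, used by insertText only

        Change(String type, int paragraph, int offset, String text) {
            this.type = type;
            this.paragraph = paragraph;
            this.offset = offset;
            this.text = text;
        }
    }

    /** Describes an existing document as the changes that would recreate it. */
    static List<Change> toChanges(List<String> paragraphs) {
        List<Change> changes = new ArrayList<>();
        for (int i = 0; i < paragraphs.size(); i++) {
            changes.add(new Change("addParagraph", i, 0, ""));
            if (!paragraphs.get(i).isEmpty()) {
                changes.add(new Change("insertText", i, 0, paragraphs.get(i)));
            }
        }
        return changes;
    }

    /** Merges a single change into the document model in place. */
    static void apply(List<String> paragraphs, Change change) {
        switch (change.type) {
            case "addParagraph":
                paragraphs.add(change.paragraph, "");
                break;
            case "insertText": {
                String old = paragraphs.get(change.paragraph);
                paragraphs.set(change.paragraph, old.substring(0, change.offset)
                        + change.text + old.substring(change.offset));
                break;
            }
            default:
                throw new IllegalArgumentException("Unknown change: " + change.type);
        }
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("Hello", "World");
        List<String> copy = new ArrayList<>();
        for (Change change : toChanges(original)) {
            apply(copy, change); // replaying the changes rebuilds the document
        }
        // A later edit travels as one small change, not as a whole document.
        apply(copy, new Change("insertText", 1, 5, "!"));
        System.out.println(copy);
    }
}
```

In the sketch a receiving editor only needs the last Change to catch up,
which is the point of sending changes instead of the full document.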

In the past two weeks, I started to investigate some basic
infrastructure problems, such as making the package layer (pkg)
independent. My intention was to make it possible to exchange the XML
model in a potential future project.
As you might know, the package layer deals only with the ZIP: it takes
care of its content table (manifest) and file signature/encryption -
precisely everything that is specified in the ODF 1.2 specification part 3
<http://docs.oasis-open.org/office/v1.2/OpenDocument-v1.2-part3.html>.
From the package perspective, an OdfPackageDocument is just a certain
directory within the package (it might even be the root). This is how
common ODF nested subdocuments are implemented.
Basically, any other format might use this ZIP handling. For instance,
ePub initially used the ODF 1.0 ZIP, but unfortunately forked it with its
own implementation, without ever contacting the OASIS standardization group.
Or, as mentioned before, the XML layer might later be exchanged for some
binary model to enhance efficiency - especially for spreadsheet cells,
where people sometimes create paintings by using a background-coloured
cell as a pixel.
The rule of thumb is to focus on generating the run-time XML model from the
RelaxNG ODF grammar but to keep it clearly distinct from the underlying pkg.
As it is currently possible to create an OdfPackage from an ODF text
document and have the correct high-level document created as well, I will
provide the package with this dependency using Google Guice. I am not yet
certain which binding might work best
<https://github.com/google/guice/wiki/Bindings>.
Although there might be multiple JARs, I would keep it simple for now and
still provide only one ODFDOM JAR.

The next step is the improved generation of the ODF XML model.
We are currently using the model from the MultiSchemaValidator (MSV) in our
generator / schema2template project. But it gives us little knowledge
of the choice/sequence of elements in the grammar, which leads to much
unnecessary hand-written code. For instance, if a parent element has three
optional elements A, B and C in a sequence and B is being added, code
has to be written to check whether A and C already exist before inserting
the element in the appropriate place. Strangely enough, this often resulted
in errors in the past. The manifest and the styles (list, automatic,
styles) are examples.
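
The A, B, C case can be sketched with the plain W3C DOM as follows. It is a
minimal illustration, assuming the generator could hand over the allowed
child order as a simple list of element names; the class OrderedInsert and
its methods are hypothetical and not part of ODFDOM or MSV.

```java
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class OrderedInsert {
    /**
     * Inserts newChild under parent so that the children stay in the order
     * given by sequence (element names as they might be derived from the grammar).
     */
    static void insertInSequence(Element parent, Element newChild, List<String> sequence) {
        int newRank = sequence.indexOf(newChild.getTagName());
        if (newRank < 0) {
            throw new IllegalArgumentException("Not allowed here: " + newChild.getTagName());
        }
        // Find the first existing child that has to come after the new one.
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && sequence.indexOf(((Element) n).getTagName()) > newRank) {
                parent.insertBefore(newChild, n);
                return;
            }
        }
        parent.appendChild(newChild); // nothing has to follow it, so it goes last
    }

    /** Returns the child node names separated by spaces, for inspection. */
    static String childNames(Element parent) {
        StringBuilder names = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (names.length() > 0) {
                names.append(' ');
            }
            names.append(n.getNodeName());
        }
        return names.toString();
    }

    /** Adds C, then A, then B, on purpose out of order. */
    static Element demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element parent = doc.createElement("parent");
            doc.appendChild(parent);
            List<String> sequence = Arrays.asList("A", "B", "C");
            insertInSequence(parent, doc.createElement("C"), sequence);
            insertInSequence(parent, doc.createElement("A"), sequence);
            insertInSequence(parent, doc.createElement("B"), sequence);
            return parent;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childNames(demo()));
    }
}
```

With the sequence known to the generated code, the caller no longer has to
check for A and C by hand. Choices and repeated elements are not covered by
this sketch.
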
Another generation feature should be the lazy creation of Java maps for
parent DOM elements that have multiple element children with an ID
attribute, so that those children can be retrieved by ID from the parent.
The perfect use case is style handling. Therefore, in addition to the
explicit IDs, there will be a config file to name attributes that work as
an ID but are not declared as such in the grammar, like the concatenation
of style name and style family (both strings) as ID (see JIRA-182
<https://issues.apache.org/jira/browse/ODFTOOLKIT-182>).
With this in place, it will be easier to refactor the existing ODFDOM style
handling.
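
A lazily created map keyed by style family plus style name could look
roughly like this. It is a sketch only: the class StyleIndex and its
methods are hypothetical, the key format (family, a colon, then name) is an
assumption, and the map is built once on the first lookup and not kept in
sync with later DOM changes.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class StyleIndex {
    static final String STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0";
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    private final Element parent;
    private Map<String, Element> byId; // stays null until the first lookup

    StyleIndex(Element parent) {
        this.parent = parent;
    }

    /** Composite ID; the family comes first as it is a fixed token without a colon. */
    private static String key(String name, String family) {
        return family + ":" + name;
    }

    /** Returns the child style with the given name and family, or null. */
    Element get(String name, String family) {
        if (byId == null) {
            byId = new HashMap<>();
            for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element) {
                    Element e = (Element) n;
                    byId.put(key(e.getAttributeNS(STYLE_NS, "name"),
                            e.getAttributeNS(STYLE_NS, "family")), e);
                }
            }
        }
        return byId.get(key(name, family));
    }

    /** Builds two styles that share a name but differ in family. */
    static StyleIndex demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element styles = doc.createElementNS(OFFICE_NS, "office:styles");
            doc.appendChild(styles);
            for (String family : new String[] {"paragraph", "text"}) {
                Element style = doc.createElementNS(STYLE_NS, "style:style");
                style.setAttributeNS(STYLE_NS, "style:name", "Standard");
                style.setAttributeNS(STYLE_NS, "style:family", family);
                styles.appendChild(style);
            }
            return new StyleIndex(styles);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element style = demo().get("Standard", "text");
        System.out.println(style.getAttributeNS(STYLE_NS, "family"));
    }
}
```

The style name alone would not be unique here, which is why the family is
part of the key.
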
I am uncertain whether the work on the MultiSchemaValidator (MSV) model
will work out, as access seems to be possible by the visitor pattern only.
I haven't figured out how to find choice/sequence from the MSV API. As an
alternative, I have already loaded the graph memory dump from MSV into an
Apache TinkerPop graph
database <http://tinkerpop.apache.org/docs/current/reference/> and
experimented with Gremlin and domain-specific script languages, as some
projects have done before
<http://joern.readthedocs.io/en/latest/querying.html>.

On top of this XML DOM model, which is mostly generated (unfortunately not
completely by the end of the 6 months), will be a user API. For instance,
the user might change the page orientation on any table or paragraph
element but will have no idea that something within the styles.xml is
involved. The XML will be an implementation detail. From the user
perspective, there will be a tree of objects known to the user (e.g.
paragraphs, tables, images, etc.).
For each user object it should be possible to retrieve its XML and its
change representation.

To allow a more modular design on the XML layer, each user object
will be separated. For instance, there will be a separate SAX handler for
each object.
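
One way to picture a separate SAX handler per object is a small dispatcher
that forwards events to whichever handler owns the current element. This is
a sketch under assumptions: the class DispatchingHandler, its register
method and the element names p, span, table and cell are made up for the
illustration and are neither ODF names nor existing ODFDOM classes.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class DispatchingHandler extends DefaultHandler {
    private static final DefaultHandler IGNORE = new DefaultHandler();
    private final Map<String, DefaultHandler> handlers = new HashMap<>();
    private final Deque<DefaultHandler> active = new ArrayDeque<>();

    /** Registers the handler responsible for one kind of user object. */
    void register(String elementName, DefaultHandler handler) {
        handlers.put(elementName, handler);
    }

    private DefaultHandler current() {
        return active.isEmpty() ? IGNORE : active.peek();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Unregistered elements stay with the handler of their nearest ancestor.
        DefaultHandler handler = handlers.getOrDefault(qName, current());
        active.push(handler);
        handler.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        current().characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        active.pop().endElement(uri, localName, qName);
    }

    /** Parses a tiny document with one paragraph handler and one table handler. */
    static String demo() {
        final StringBuilder text = new StringBuilder();
        final int[] cells = {0};
        DispatchingHandler dispatcher = new DispatchingHandler();
        dispatcher.register("p", new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // collects paragraph text
            }
        });
        dispatcher.register("table", new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("cell".equals(qName)) {
                    cells[0]++; // counts table cells
                }
            }
        });
        String xml = "<doc><p>Hi <span>there</span></p><table><cell/><cell/></table></doc>";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), dispatcher);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text + ":" + cells[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each handler in the sketch knows only its own object, so a new user object
would mean registering one more handler rather than growing a single large
one.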

Perhaps we might want to reconsider whether to merge the Simple API back
into the upcoming ODFDOM user layer, but let's see how it evolves.

Any feedback is most welcome. I will likely not be able to answer before
Monday.

Have a nice weekend,
Svante