Dear developers,

I will be able to spend more time on the ODF Toolkit in the upcoming months, especially on a feature branch for collaboration, as I have obtained sponsorship from the German government (Prototype Fund).
[FYI: Prototype Fund - anyone living in Germany can apply for six months of funded work on open source projects, so check it out! You may apply alone or as a group, and the maximum funding will increase next year by 40%, to up to 50.000€. For those curious, here is the site of our successful project <https://prototypefund.de/project/documents-for-democracy/> - Google Translate might help non-German speakers ;) ]

Regarding my work, a few things in the basic design of the ODF Toolkit need to change before the high-level collaboration feature can be added. I will rework the W3C XML DOM but plan to keep it, to provide overall simplicity and interoperability with XML technology.

The collaboration feature transforms an ODF text document, via ODFDOM, into a sequence of user changes. The purpose is to send only the user changes among ODF editors (like git diffs in the software domain) instead of always sending the full document, as in the past. In addition, it should be possible to apply any new changes to the ODFDOM so that they are merged into the existing document.

In the past two weeks, I started to investigate some basic infrastructure problems, such as making the package layer (pkg) independent. My intention was to be able to exchange the XML model in a potential future project. As you might know, the package layer deals only with the ZIP: it takes care of its table of contents (manifest) and of file signatures and encryption, which is precisely everything specified in part 3 of the ODF 1.2 specification <http://docs.oasis-open.org/office/v1.2/OpenDocument-v1.2-part3.html>. From the package perspective, an OdfPackageDocument is just a certain directory within the package (possibly even the root). This is how ODF nested subdocuments are commonly implemented. Basically, any other format could use this ZIP handling.
For instance, ePub initially used the ODF 1.0 ZIP packaging, but unfortunately forked it with its own implementation without ever contacting the OASIS standardization group. Or, as mentioned before, the XML layer might later be exchanged for some binary model to improve efficiency, especially for spreadsheet cells, where people sometimes create paintings by using background-coloured cells as pixels. The rule of thumb is to focus on generating the run-time XML model from the RelaxNG ODF grammar, but to keep it clearly distinct from the underlying pkg.

Currently, creating an OdfPackage from an ODF text document also creates the correct high-level document, so the package depends on the layer above it. I will provide that dependency to the package through dependency injection with Google Guice. I am not yet sure which binding will work best <https://github.com/google/guice/wiki/Bindings>. Although this could result in multiple JARs, I would keep it simple for now and still provide only one ODFDOM JAR.

The next step is the improved generation of the ODF XML model. We are currently using the model from the MultiSchemaValidator (MSV) in our generator / schema2template project. But the generator has little knowledge of the choice/sequence structure of elements in the grammar, which leads to much unnecessary hand-written code. For instance, if a parent element has three optional elements A, B and C in a sequence and B is being added, code has to be written to check whether A and C already exist before inserting B in the appropriate place. Strangely enough, this has often resulted in errors in the past. The manifest and the styles (list, automatic, styles) are examples.

Another generation feature should be the lazy creation of Java maps for parent DOM elements that have multiple element children with an ID attribute, so that those children can be retrieved by ID from the parent. The perfect use case is style handling.
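
The lazy ID map described above could look roughly like the following. This is only a minimal sketch on the plain W3C DOM, not existing ODFDOM code: the class name LazyChildIndex, its methods, and the element and attribute names ("styles", "style", "name") are all hypothetical and chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of ODFDOM: indexes the element children of a
// parent by the value of one attribute and builds the map on first lookup.
public class LazyChildIndex {
    private final Element parent;
    private final String idAttribute;
    private Map<String, Element> index; // stays null until first needed

    public LazyChildIndex(Element parent, String idAttribute) {
        this.parent = parent;
        this.idAttribute = idAttribute;
    }

    // Returns the child whose ID attribute equals the given value, or null.
    public Element getChildById(String id) {
        if (index == null) {
            index = buildIndex();
        }
        return index.get(id);
    }

    public boolean isBuilt() {
        return index != null;
    }

    // Walks the direct children once and records those carrying the attribute.
    private Map<String, Element> buildIndex() {
        Map<String, Element> map = new HashMap<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                Element child = (Element) n;
                if (child.hasAttribute(idAttribute)) {
                    map.put(child.getAttribute(idAttribute), child);
                }
            }
        }
        return map;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element styles = doc.createElement("styles"); // illustrative names only
        doc.appendChild(styles);
        for (String name : new String[] {"Standard", "Heading"}) {
            Element style = doc.createElement("style");
            style.setAttribute("name", name);
            styles.appendChild(style);
        }
        LazyChildIndex byName = new LazyChildIndex(styles, "name");
        Element heading = byName.getChildById("Heading"); // map is built here
        System.out.println(heading.getAttribute("name"));
    }
}
```

A real implementation would also have to update or discard the map whenever children are added, removed or renamed; the sketch leaves that out.
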
For these ID maps, in addition to the explicit ID attributes, there will be a config file naming attributes that act as IDs but are not declared as such in the grammar, such as the concatenation of style name and style family (both strings) used as an ID (see JIRA-182 <https://issues.apache.org/jira/browse/ODFTOOLKIT-182>). With this in place, it will be easier to refactor the existing ODFDOM style handling.

I am uncertain whether the work on the MultiSchemaValidator (MSV) model will work out, as access seems to be possible through the visitor pattern only. I haven't figured out how to find choice/sequence from the MSV API. As a fallback, I have already loaded the graph memory dump from MSV into an Apache TinkerPop graph database <http://tinkerpop.apache.org/docs/current/reference/> and experimented with Gremlin and domain-specific script languages, as some projects have done before <http://joern.readthedocs.io/en/latest/querying.html>.

On top of this XML DOM model, which will be mostly generated (unfortunately not completely by the end of the six months), there will be a user API. For instance, the user might change the page orientation on any table or paragraph element without having any idea that something within styles.xml is involved. The XML will be an implementation detail. From the user's perspective, there will be a tree of objects known to the user (e.g. paragraphs, tables, images). For each user object it should be possible to obtain its XML and its change representation. To allow a more modular design on the XML layer, each user object will be handled separately. For instance, there will be a separate SAX handler for each object.

Perhaps we should reconsider whether to merge the Simple API back into the upcoming ODFDOM user layer, but let's see how it evolves.

Any feedback is most welcome. I will likely not be able to answer before Monday.

Have a nice weekend,
Svante
