https://bugs.documentfoundation.org/show_bug.cgi?id=167170

            Bug ID: 167170
           Summary: Turn off RSIDs by default and warn about their privacy
                    implications
           Product: LibreOffice
           Version: Inherited From OOo
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: LibreOffice
          Assignee: [email protected]
          Reporter: [email protected]
            Blocks: 116885

tl;dr: RSIDs leak private information, directly and contextually, without
author awareness (and are also not fully reliable), so we should reconsider
having them enabled by default.

Now for the long version.


Introduction - about RSIDs
---------------------------

The officeooo:rsid attribute is a LibreOffice (OpenOffice?) extension of the
ODF standard, based on a similar feature in Microsoft's Office Open XML; see

https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.wordprocessing.rsid?view=openxml-3.0.1

https://wiki.documentfoundation.org/Development/ODF_Implementer_Notes/List_of_LibreOffice_ODF_Extensions

It is a mostly-random value which distinguishes editing sessions of a document;
if we open a document in LO with RSID generation enabled, and make edits to
styles - these styles will be marked with a newly generated RSID, which must be
higher than previous RSIDs in the document.

It is added to different ODF, and Flat ODF, entities; particularly, character
and paragraph styles, including autostyles. It is intended to assist in
document comparison, helping to identify originally-identical entities in the
documents.

Here is an example of how they look, when generated:

 <office:automatic-styles>
  <style:style style:name="T1" style:family="text">
   <style:text-properties officeooo:rsid="001e7e69"/>
  </style:style>
 </office:automatic-styles>
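
To make the above concrete, here is a minimal Python sketch which reads the
RSIDs out of such a fragment. It is an illustration, not anything LibreOffice
ships: the function name style_rsids is made up, the namespace declarations
are added so that the fragment parses on its own, and the value is assumed to
be a hexadecimal number (which its appearance suggests).

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in ODF files written by LibreOffice.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
OOO_NS = "http://openoffice.org/2009/office"

# The fragment from this report, with namespace declarations added so that
# it can be parsed stand-alone.
SAMPLE = """<office:automatic-styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:officeooo="http://openoffice.org/2009/office">
  <style:style style:name="T1" style:family="text">
   <style:text-properties officeooo:rsid="001e7e69"/>
  </style:style>
</office:automatic-styles>"""


def style_rsids(xml_text):
    """Map each style name to its RSID (hypothetical helper).

    Assumption: the attribute value is hexadecimal.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        props = style.find(f"{{{STYLE_NS}}}text-properties")
        if props is None:
            continue
        rsid = props.get(f"{{{OOO_NS}}}rsid")
        if rsid is not None:
            result[style.get(f"{{{STYLE_NS}}}name")] = int(rsid, 16)
    return result
```

Calling style_rsids(SAMPLE) yields a dictionary with a single entry, for the
automatic style T1. Nothing beyond a stock XML parser is needed to get at
these values.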

RSID generation is turned on by default in LO, and controlled via Tools >
Options > LibreOffice Writer > Comparison.


RSIDs divulge private information I
-----------------------------------

Let us begin with the 'dry' technicality of this claim, then add usage
scenarios as context.

RSIDs induce an ordered breakdown of the creation (or modification) of styles
in a document. While the exact dates and times are not listed - if the styles
are applied, one can tell which styled parts of a document were styled in the
same session.

A particular kind of style, and the one which users are least cognizant of, is
the automatic style. Automatic styles are applied not only when a user
directly-formats content, but sometimes completely automatically. In fact, when
adding text to a paragraph from a previous editing session, it is likely
(perhaps certain?) that an automatic text style would be generated and applied
to the new edit, with a different RSID.

Thus, through an induction from styles, mainly automatic styles, to the styled
text, RSIDs allow a partial time-ordered decomposition of the text into editing
sessions.

This means, among other things:

* The ability to identify relations between disparate pieces of the text, which
happen to be edited in the same session, separately from their surrounding text
which had been edited earlier.
* The ability to partition/section the text along different lines than its
formatting or explicit sectioning indicates.
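
The induction from styles to styled text can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the Flat ODF
fragment below is made up, the function name spans_by_session is
hypothetical, RSID values are assumed to be hexadecimal, and a larger value is
taken to mean a later session (per the "must be higher" rule described in the
introduction).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OOO_NS = "http://openoffice.org/2009/office"

# Made-up Flat ODF fragment, for illustration only.
SAMPLE = """<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:officeooo="http://openoffice.org/2009/office">
 <office:automatic-styles>
  <style:style style:name="T1" style:family="text">
   <style:text-properties officeooo:rsid="001e7e69"/>
  </style:style>
  <style:style style:name="T2" style:family="text">
   <style:text-properties officeooo:rsid="0020a1b2"/>
  </style:style>
 </office:automatic-styles>
 <office:body><office:text>
  <text:p>Article 3. <text:span text:style-name="T2">New clause A.</text:span></text:p>
  <text:p>Article 7. <text:span text:style-name="T1">Older edit.</text:span></text:p>
  <text:p>Article 12. <text:span text:style-name="T2">New clause B.</text:span></text:p>
 </office:text></office:body>
</office:document>"""


def spans_by_session(xml_text):
    """Group span texts by the RSID of their style, lowest RSID first."""
    root = ET.fromstring(xml_text)

    # Step 1: map each style name to the RSID on its text properties.
    rsid_of_style = {}
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        props = style.find(f"{{{STYLE_NS}}}text-properties")
        if props is None:
            continue
        rsid = props.get(f"{{{OOO_NS}}}rsid")
        if rsid is not None:
            # Assumption: the value is hexadecimal.
            rsid_of_style[style.get(f"{{{STYLE_NS}}}name")] = int(rsid, 16)

    # Step 2: collect span texts, in document order, under their style's RSID.
    sessions = defaultdict(list)
    for span in root.iter(f"{{{TEXT_NS}}}span"):
        rsid = rsid_of_style.get(span.get(f"{{{TEXT_NS}}}style-name"))
        if rsid is not None:
            sessions[rsid].append("".join(span.itertext()))

    # Assumption: a higher RSID means a later editing session.
    return sorted(sessions.items())
```

On the made-up fragment, the two "New clause" spans end up grouped under the
same RSID even though they sit in articles 3 and 12, while the span in article
7 falls under a lower RSID, i.e. an earlier session.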


RSIDs divulge private information II
------------------------------------

Alice and Bob are negotiating a contract with many articles and terms. Alice
sends over a draft, and Bob makes all sorts of changes, in different
editing sessions. In particular, Bob inserts two separate clauses, in different
parts of the contract; they each appear to belong where they have been added.
It so happens, that one clause creates conditions in which the other is likely
to apply. Also, Bob visits Charlie and consults him about the contract; Charlie
is unhappy, and Bob makes some changes based on Charlie's comments; some of
them can be clearly understood to relate to Charlie, some cannot. Finally, Bob
sends a new draft to Alice.

Normally, Alice would simply be faced with the new contents of the draft,
possibly in track-changes form, and/or be able to diff the previous and current
drafts. However, given that the above-mentioned changes have likely resulted in
new automatic styles created for added paragraphs in different editing
sessions, she can reconstruct - possibly with near-perfection - a sequence of
editing sessions in which the different changes were made.

This would let her discern, among other things:

1. The two seemingly-unrelated clauses are actually related, and thus their
combined effect must be scrutinized.
2. One of the editing sessions is a "Charlie-related session", having changes
which obviously relate to his interests; the rest of the changes in that
session are likely to also be influenced or inspired by his comments and
interest.
3. What Bob started working on immediately (and was foremost on his mind),
and what he changed only later, after some thought. The ordering of work is
particularly useful when it does not correspond to the order of the paragraphs
in the contract.
4. Which clauses and terms Bob has written in "one sitting", and which he wrote
gradually.
5. Which clauses and terms Bob struggled with, making edits at different times
rather than changing them once.

Note that even if items (1.) and (2.) seem a little contrived, or niche, to
you - items (3.) through (5.) are quite general.


Consequences
-------------

Much of what we put in a document divulges information about the user or
author; however, when this happens:

* without conscious user intent
* without the user being reasonably notified of the information being stored
* with it being non-trivial to realize how more sensitive information can be
inferred from seemingly "dry" and uninteresting information

this is problematic. One might go so far as to say that adding this
information to documents, under the above conditions, is somewhat unfaithful to
the user - especially considering how LibreOffice prides itself on a commitment
to user privacy.

At this point one may object, citing the benefits of RSIDs for document
comparison. While it is difficult for me to estimate how significant these are,
I believe that the balance of considerations should lean in favor of better
privacy at the expense of efficiency, by default.

Moreover, in the Tools > Options tree branch for controlling whether RSIDs are
stored (LibreOffice > LibreOffice Writer > Comparison), we should include a
warning about the potential leak of private information through this mechanism.
We do not need to go into specific details (although the documentation could go
into them - both the benefits and the detriments).


Referenced Bugs:

https://bugs.documentfoundation.org/show_bug.cgi?id=116885
[Bug 116885] [META] Privacy and data security issues