Document: draft-ietf-oauth-selective-disclosure-jwt
Title: Selective Disclosure for JWTs (SD-JWT)
Reviewer: Henry Thompson
Review result: Ready with Issues
Document: draft-ietf-oauth-selective-disclosure-jwt-18
Reviewer: Henry S. Thompson
Review Date: 2025-05-02
*Summary*
The substance of this, as far as I can tell as a non-specialist, is
in good shape. There are a few nits and editorial points at the end
of this review, but as will be evident by its length, there is one
essentially presentational issue, classified as Minor because a
specialist in this area will shrug and say "yes, but I see what
they're getting at". I hope nonetheless the authors will find it
useful and address the points I raise, because I do think as things
stand there's a genuine risk of misunderstanding exactly what's
required of an implementation.
*Minor points*
4.2.1
This bullet
"JSON-encode the array, producing [a] UTF-8 string"
looks simple, but ended up taking me several days to sort out.
For the rest of things to work, you must mean "Serialize the array
to the corresponding UTF-8 byte sequence", but that's not
exactly trivial in the JSON-native context you've adopted in this
document.
In the end I think you should include one extra step in the
Disclosure construction example, namely what that byte
sequence looks like as (what [RFC8259] calls) "UTF-8 encoded JSON
text", immediately after the array creation display:
["_26bc4LT-ac6q2KI6cBW5es", "family_name", "Möbius"] [1]
["_26bc4LT-ac6q2KI6cBW5es", "family_name", "M%xC3%xB6bius"] [2]
It would also be good at this point to clarify notation and
terminology, following [RFC8259]. That is, to emphasise the
distinction that [1] is a "JSON text" per the RFC, whose final
value is a six-character Unicode string, while [2] is a UTF-8
byte sequence, the result of what you call "JSON-encoding".
It's true that they are both valid JSON texts, per RFC8259, but
you have to apply a JSON parser to them to get to
indistinguishable JSON objects.
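To make the distinction concrete, here is a minimal Python sketch. Python and the variable names are chosen purely for illustration and are not taken from the draft. It treats [1] as a string of Unicode characters and [2] as the UTF-8 encoding of that string, and checks that the two only coincide once parsed.

```python
import json

# [1]: a JSON text, i.e. a sequence of Unicode characters.
json_text = '["_26bc4LT-ac6q2KI6cBW5es", "family_name", "Möbius"]'

# [2]: the UTF-8 byte sequence that encodes that JSON text.
utf8_bytes = json_text.encode("utf-8")

# The o-umlaut is one character in the text but two bytes (C3 B6) in UTF-8,
# so the byte sequence is exactly one unit longer than the text.
assert len(utf8_bytes) == len(json_text) + 1
assert b"M\xc3\xb6bius" in utf8_bytes

# Parsing either form yields the same three-element array.
from_text = json.loads(json_text)
from_bytes = json.loads(utf8_bytes)  # json.loads also accepts UTF-8 bytes
assert from_text == from_bytes
assert len(from_text[2]) == 6                   # six characters
assert len(from_text[2].encode("utf-8")) == 7   # seven bytes
```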
To be more specific, since you use "JSON-encode" a number of
times in later sections, I would _strongly recommend_ that you
a) Add the following to section 1.2, immediately after the
definition of *base64url*:
*JSON-encode* denotes the conversion of a JSON object to
"JSON text" and encoding that text in UTF-8, as defined in
[RFC8259]. That is, mapping a JSON object to a UTF-8 byte
sequence which when decoded and parsed will reconstruct an
object indistinguishable from the original.
b) Replace the first two bullets in the algorithm description with
* JSON-encode the array, producing a UTF-8 byte sequence.
* base64url-encode the resulting byte sequence. The resulting
string is the Disclosure.
c) Be careful never to use "string" when "(UTF-8) byte sequence"
is meant, starting in 4.2.2 with
The Disclosure string is created by JSON-encoding this array
and base64url-encoding the resulting byte sequence as
described in Section 4.2.1
d) In the second media type registration in 12.2
"represented as a JSON Object" ->
'represented as UTF-8 encoded "JSON text" as defined in [RFC8259]'
e) Include [RFC8259] in 13.1
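The two rewritten bullets in (b) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard json and base64 modules, the helper names json_encode and make_disclosure are hypothetical, and it assumes the draft's base64url definition omits "=" padding. Note that json.dumps escapes non-ASCII characters by default, so the first call produces the \u00f6 variant of the byte sequence.

```python
import base64
import json


def json_encode(value, ensure_ascii=True) -> bytes:
    """JSON-encode: JSON object -> JSON text -> UTF-8 byte sequence."""
    json_text = json.dumps(value, ensure_ascii=ensure_ascii)  # a str
    return json_text.encode("utf-8")                          # bytes


def make_disclosure(array, ensure_ascii=True) -> str:
    """base64url-encode the JSON-encoded array (hypothetical helper)."""
    encoded = base64.urlsafe_b64encode(json_encode(array, ensure_ascii))
    return encoded.rstrip(b"=").decode("ascii")  # assumption: no padding


array = ["_26bc4LT-ac6q2KI6cBW5es", "family_name", "Möbius"]

escaped = make_disclosure(array)                  # "M\u00f6bius" in the bytes
raw = make_disclosure(array, ensure_ascii=False)  # C3 B6 in the bytes

print(escaped)
print(raw)
```

The two Disclosures differ, yet both decode and parse to the same array, which is why the definition has to pin down the byte sequence rather than the object. The first value printed should match the base64url string quoted later in this review for the \u00f6 variant.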
Appendices A and B.
The above problem resurfaces here, with confusion between three
possible interpretations, in the terms of [RFC8259], of what is
displayed at various points:
* a JSON object, that is, structured data composed of
instances of the six primitive types which JSON can
represent. It is _not_ to be understood as a string, byte
sequence or file contents;
* a possible JSON text for some JSON object.
* a UTF-8 encoding of some JSON text, aka a "JSON encoding".
I'll use the first example figure in Appendix B to go through
this in detail, expanding on the discussion above about 4.2.1.
The first figure is labelled as a JSON object, which is OK.
But it is indistinguishable from one of the possible JSON texts
corresponding to that object, and that should be explicitly
acknowledged.
The next figure purports to present two alternative "JSON
encodings", the second of which is problematic.
Its first line appears indistinguishable from that shown for the
JSON object in the preceding figure, but is in fact different.
In the first figure, construed as "JSON text", the o-umlaut glyph
denotes a single Unicode character in a six-character
representation of a six-character object member string value.
However in the second figure, second alternative, the o-umlaut
corresponds to a _two_-byte UTF-8 sub-part of the JSON encoding of
that value as a seven-byte UTF-8 byte sequence, either in some
internal representation or an external stream or file.
What to do? First, add something similar to
https://www.ietf.org/archive/id/draft-bray-unichars-14.html#name-notation
Then, whenever presenting JSON, always indicate whether what is
being shown is JSON text or JSON-encoded text (that is UTF-8
byte sequences). In JSON text, always include a version using the
U+xxxx notation whenever the underlying string contains non-ASCII
characters. In JSON-encoded text, _always_ use the %xnn notation
for non-ASCII characters.
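As a rough illustration of the two notations, the following Python sketch glosses a JSON text with U+xxxx code points and renders a UTF-8 byte sequence with %xNN for every non-ASCII byte. The function names gloss_jt and show_jubs are hypothetical, and a literal "%" in the data is left as-is in this sketch.

```python
def show_jubs(data: bytes) -> str:
    """Render a UTF-8 byte sequence, writing non-ASCII bytes as %xNN."""
    return "".join(chr(b) if b < 0x80 else f"%x{b:02X}" for b in data)


def gloss_jt(text: str) -> list:
    """List the U+xxxx code points of the non-ASCII characters in a JSON text."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 0x7F]


jt = '"Möbius"'
print(gloss_jt(jt))                    # ['U+00F6']
print(show_jubs(jt.encode("utf-8")))   # "M%xC3%xB6bius"
```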
Some examples of a possible way of indicating JSON text (*JT*)
and JSON-encoded text (*JUBS*), from section 4.2.1:
Replace the first figure with these two:
 ______________________________________________________
|*JT*                                                  |
|                                                      |
| ["_26bc4LT-ac6q2KI6cBW5es", "family_name", "Möbius"] |
|                                              ^       |
|                                              |       |
|                                            U+00F6    |
|                                                      |
|______________________________________________________|
 ___________________________________________________________
|*JUBS*                                                     |
|                                                           |
| ["_26bc4LT-ac6q2KI6cBW5es", "family_name", "M%xC3%xB6bius"] |
|                                                           |
|___________________________________________________________|
and the first bullet of the three alternatives which follow with
* A different way to encode the unicode o-umlaut:
 ___________________________________________________________
|*JT*                                                       |
|                                                           |
| ["_26bc4LT-ac6q2KI6cBW5es", "family_name", "M\u00f6bius"] |
|                                                           |
|___________________________________________________________|
 ___________________________________________________________
|*JUBS*                                                     |
|                                                           |
| ["_26bc4LT-ac6q2KI6cBW5es", "family_name", "M\u00f6bius"] |
|                                                           |
|___________________________________________________________|
The corresponding Disclosure is then
WyJfMjZiYzRMVC1hYzZxMktJNmNCVzVlcyIsICJmYW1pbHlfbmFtZSIsICJNX
HUwMGY2Yml1cyJd
And throughout the examples in Appendices A and B, label the initial
figure with *JT* and the 'Content' boxes with *JUBS*. You don't
need to gloss every Chinese/German string with its U+xxxx version,
but do say something at the top of A to the effect that where
non-ASCII characters appear in any of the initial examples, the
actual Unicode character is what is meant.
The Appendix B example then looks like this, along with some small
changes to the text:
Usually, JSON-based formats transport claim values as simple
properties of a JSON object such as this:
 ________________________________________
|*JT*                                    |
|                                        |
| ...                                    |
|   "family_name": "Möbius",             |  ö is the single character
|   "address": {                         |  LATIN SMALL LETTER O
|     "street_address": "Schulstr. 12",  |  WITH DIAERESIS
|     "locality": "Schulpforta"          |
|   }                                    |
| ...                                    |
|________________________________________|
[In first para, change "byte string" to "byte sequence"
twice, and three more times further down]
JSON, however, does not prescribe a unique representation for
data, allowing for variations in how it is presented. The JSON
text above is only one possibility. Other possible
representations include
 ________________________________________
|*JT* and *JUBS*                         |
|                                        |
| ...                                    |
|   "family_name": "M\u00f6bius",        |
|   "address": {                         |
|     "street_address": "Schulstr. 12",  |
|     "locality": "Schulpforta"          |
|   }                                    |
| ...                                    |
|________________________________________|
and
 __________________________________________________________________________
|*JT*                                                                      |
|                                                                          |
| ...                                                                      |
|   "family_name": "Möbius",                                               |
|   "address": {"locality":"Schulpforta", "street_address":"Schulstr. 12"} |
| ...                                                                      |
|__________________________________________________________________________|
ö is the single character LATIN SMALL LETTER O WITH DIAERESIS
 __________________________________________________________________________
|*JUBS*                                                                    |
|                                                                          |
| ...                                                                      |
|   "family_name": "M%xC3%xB6bius",                                        |
|   "address": {"locality":"Schulpforta", "street_address":"Schulstr. 12"} |
| ...                                                                      |
|__________________________________________________________________________|
The two representations of the value in family_name are very
different on the byte-level, but when decoded from UTF-8 byte
sequences to JSON texts, those texts would be parsed into
indistinguishable JSON objects. The same goes for ...
The variations in white space, ordering of object properties, and
representation of Unicode characters are all explicitly allowed
in [RFC8259]. There are further variations, e.g. for floating
point values ([RFC8785]) and Unicode combining characters
([UNICODE]).
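The point of the suggested Appendix B text can be checked with a short Python sketch. The enclosing braces are added here only to make each fragment a complete JSON text (the figures elide the surrounding members with "..."), and the variable names are illustrative.

```python
import json

# Three JSON texts for the same claims: the original layout, an escaped
# o-umlaut, and reordered members with different white space.
variants = [
    '{"family_name": "Möbius", "address": '
    '{"street_address": "Schulstr. 12", "locality": "Schulpforta"}}',
    '{"family_name": "M\\u00f6bius", "address": '
    '{"street_address": "Schulstr. 12", "locality": "Schulpforta"}}',
    '{"family_name":"Möbius","address":'
    '{"locality":"Schulpforta", "street_address":"Schulstr. 12"}}',
]

byte_sequences = [text.encode("utf-8") for text in variants]
parsed = [json.loads(data) for data in byte_sequences]

# All three byte sequences differ ...
assert len(set(byte_sequences)) == 3
# ... but they parse to indistinguishable objects.
assert parsed[0] == parsed[1] == parsed[2]
```

Any digest computed over these byte sequences would therefore differ from one variant to the next, even though the parsed data is the same.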
*Nits*
4. "(for those who celebrate)" will be anywhere from obscure to
confusing for many readers from many cultures -- best to remove it.
4.2.1. "an UTF-8" -> "a UTF-8" [overtaken above]
"However, the digest is calculated over the respective
base64url-encoded value itself, which effectively signs"
->
"Because the digest is calculated over the respective
base64url-encoded value itself, this effectively signs"
4.3.1. I'd recommend
"The bytes of the digest MUST" -> "The bytes of the sd_hash value MUST"
6. I have decoded a few of the Disclosures and they're fine, but you
might want to ask a friendly 3rd party to double-check all the
Disclosures and digests, at least here and in Appendix A.
9. "Security considerations in this section help achieve the
following properties:"
This confused me for a while. I think what you mean to say here
is something like
This spec aims to provide two security guarantees:
*Selective Disclosure*: ...
*Integrity*: ...
The following sub-sections show how the various aspects of the
design presented here combine to achieve this.
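For the double-check suggested in the nit on section 6, a small script along the following lines could decode each Disclosure and recompute its digest. This is a sketch under assumptions not stated in this review: SHA-256 as the hash algorithm, unpadded base64url for the digest, and hypothetical helper names. It follows the 4.2.1 wording quoted above in hashing the base64url-encoded string itself rather than the decoded JSON.

```python
import base64
import hashlib
import json


def b64url_decode(s: str) -> bytes:
    """Decode unpadded base64url by restoring the '=' padding first."""
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def b64url_encode(data: bytes) -> str:
    """Encode as base64url without padding (assumption)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def check_disclosure(disclosure: str):
    """Return (decoded array, digest) for one Disclosure string."""
    array = json.loads(b64url_decode(disclosure))
    # The digest is taken over the base64url string itself, not the JSON.
    digest = hashlib.sha256(disclosure.encode("ascii")).digest()  # assumed hash
    return array, b64url_encode(digest)


disclosure = (
    "WyJfMjZiYzRMVC1hYzZxMktJNmNCVzVlcyIsICJmYW1pbHlfbmFtZSIsICJNX"
    "HUwMGY2Yml1cyJd"
)
array, digest = check_disclosure(disclosure)
print(array)
print(digest)
```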