Suggested approach.

Gosh, there are a lot of decisions to make!

Ian.


## Fields giving names and email addresses (entity and email fields)

The format of the content of all fields naming people, including
Maintainer and Uploaders, is based on IETF RFC 5322 recipient fields
(e.g., "To:" fields in emails), with some modifications.

We call these "entity and email fields".

Informally:

 * The field is a comma-separated list of `name <email>` where `name`
   can be quoted `"name"` (and may then contain Unicode), or be
   unquoted but then has a restricted character set which excludes
   Unicode and excludes `,`.

 * There is no `\`-escaping: names simply cannot contain `\` or `"`.

 * The `<email>` part contains no whitespace and has a very
   restrictive character set.

 * The value can be folded to break the lines, except within `"..."`.

Formally:

 * These fields are "multiline" as per [internal xref].
 * The content is an RFC5322 `address-list`,
   with exceptions:
 * Each RFC5322 `mailbox` must be `name-addr`
   with a non-absent `display-name`.
 * Whitespace is not permitted within RFC5322 `angle-addr`.
 * Each RFC5322 `local-part` must be `dot-atom`.
 * Each RFC5322 `phrase` must be either a single `quoted-string`,
   or one or more space-separated `atom`s.
 * The RFC5322 `domain` must be in lowercase.
 * The following RFC5322 constructs are forbidden:
   `obs-*`, `group`, `comment`, `quoted-pair`, `domain-literal`.
 * UTF-8 representing Unicode characters with Graphic basic type
   may occur as part of `qtext` within `qcontent`.
 * Outside RFC5322 `quoted-string`, `FWS` means a single ASCII space,
   or a newline followed by one or more ASCII spaces.
 * Inside RFC5322 `quoted-string`, `FWS` means a single ASCII space.
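
As a rough illustration of the rules above, here is a minimal Python
sketch that checks a single, already unfolded entry against a regular
expression.  The names (`ENTRY`, `is_valid_entry`) are made up for this
example, and the "Graphic" Unicode test is only approximated by
rejecting ASCII control characters, so treat it as a sketch rather
than a reference validator.

```python
import re

# RFC 5322 atext: ASCII alphanumerics plus this punctuation.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
ATOM = ATEXT + r"+"
DOT_ATOM = ATOM + r"(?:\." + ATOM + r")*"

# The domain is a dot-atom restricted to lowercase letters.
DOMAIN_ATEXT = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]"
DOMAIN = DOMAIN_ATEXT + r"+(?:\." + DOMAIN_ATEXT + r"+)*"

# Quoted name: no '"', no '\', no ASCII control characters.
# (Assumption: this only approximates the "Graphic" Unicode rule.)
QUOTED = r'"[^"\\\x00-\x1f\x7f]*"'

# A phrase is one quoted-string, or space-separated atoms.
PHRASE = r"(?:" + QUOTED + r"|" + ATOM + r"(?: " + ATOM + r")*)"

# One entry: name, optional single space, then <local-part@domain>.
ENTRY = re.compile(PHRASE + r" ?<" + DOT_ATOM + r"@" + DOMAIN + r">")


def is_valid_entry(entry: str) -> bool:
    """Return True if one unfolded entry matches the sketch grammar."""
    return ENTRY.fullmatch(entry) is not None
```

For instance, `is_valid_entry('Jane Doe <jane@example.org>')` is
accepted, while an unquoted name containing a comma, or a domain with
uppercase letters, is rejected.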

Currently, Maintainer may only contain one entry, but this is a
semantic, not syntactic restriction, and may be relaxed in the future.

Email addresses that don't fit the `dot-atom@domain` form are
theoretically legal in RFC 5322, but cannot be represented.  However,
these are almost unusable on the modern Internet.

### Historical notes

The Maintainer and Uploaders fields have historically had a more
relaxed, but also inconsistent and confusing syntax.

When existing data is processed:

 * ASCII punctuation characters not permitted in RFC5322 atext might
   be found unquoted in the `phrase` part.

 * This includes commas in Maintainer, but not in Uploaders.

### Processing strategy

A system which doesn't need to understand the field can safely display
it as-is in its entirety.

A system which needs to understand an entity and email field could
proceed as follows:

 * Unfold as if this were a "folded" field, collapsing each whitespace
   sequence into a single space, so we have a single line.

 * Match `"` quotes to identify quoted text.  These quotes always
   appear in pairs.  Check that quoted text contains no `\`.

 * Split the whole field on unquoted `,`.

   If the field is a Maintainer field and this would result in any
   fragments that do not end in `>`, skip this step.  In the future,
   this rule will be abolished, and will only be relevant for old data.

 * Strip whitespace from the ends.

 * Now each entry will end in `<....>`.  That is the email address
   part.

   It has a restricted syntax: the allowable character set is ASCII
   alphanumerics plus any of the following punctuation
   (along with `.` and `@` as separators):
      ! # $ % & ' * + - / = ? ^ _ ` { | } ~

   The email address is in a canonical representation, so can be
   directly compared for equality.

 * The remainder of the entry (with white space normalised to single
   spaces) is the name part.  Strip any `"`.

   The name part may be used for human display and possibly ordering.
   It should not be involved in equality comparisons, lookups, etc.
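
The steps above could be sketched in Python roughly as follows.  The
function names (`split_unquoted_commas`, `parse_entity_field`) and the
`(name, email)` tuple result are assumptions made for this example, not
part of the proposal, and error handling is reduced to `ValueError`.

```python
import re

# Allowed characters in the email part.  Assumption: '.' and '@' are
# included because the dot-atom@domain form needs them.
EMAIL_CHARS = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~.@-]+")


def split_unquoted_commas(text):
    """Split text on commas that are not inside "..." quotes."""
    parts, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_entity_field(value, field_name):
    """Return a list of (name, email) tuples for one field value."""
    # Unfold: collapse each ASCII whitespace run into a single space.
    line = re.sub(r"[ \t\r\n]+", " ", value).strip()
    # Quotes must pair up, and quoted text must not contain '\'.
    if line.count('"') % 2:
        raise ValueError("unbalanced quotes")
    if any("\\" in quoted for quoted in line.split('"')[1::2]):
        raise ValueError("backslash inside quoted text")
    # Split on unquoted commas and strip whitespace from the ends.
    parts = [p.strip() for p in split_unquoted_commas(line)]
    # Legacy Maintainer rule: if any fragment lacks a trailing '>',
    # treat the whole field as a single entry.
    if field_name == "Maintainer" and not all(p.endswith(">") for p in parts):
        parts = [line]
    entries = []
    for part in parts:
        match = re.fullmatch(r"(.*)<([^<>]*)>", part)
        if match is None:
            raise ValueError(f"entry does not end in <email>: {part!r}")
        name, email = match.group(1), match.group(2)
        if email.count("@") != 1 or not EMAIL_CHARS.fullmatch(email):
            raise ValueError(f"bad email address: {email!r}")
        # Name part: drop quotes, normalise spaces.
        name = " ".join(name.replace('"', "").split(" ")).strip()
        entries.append((name, email))
    return entries
```

With this sketch, a legacy value such as
`Doe, Jane <jane@example.org>` in a Maintainer field comes back as one
entry whose name is `Doe, Jane`, whereas the same value in an Uploaders
field raises `ValueError`.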

### Sending emails

To send email to those named in an entity and email field:

Replace any `"..."` quoted names that contain non-ASCII characters
with `encoded-word`s as per IETF RFC 2047 (the successor to RFC 1342).

Or, split the address as above and use an email header generation library.
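
As an illustration of the library route, here is a minimal sketch
using Python's standard `email` package.  It assumes the field has
already been split into `(name, email)` pairs; the helper name
`build_message` is invented for this example.  The library takes care
of encoding non-ASCII names as `encoded-word`s when the message is
serialised.

```python
from email.headerregistry import Address
from email.message import EmailMessage


def build_message(entries):
    """Build a message whose To: header lists the given (name, email) pairs."""
    msg = EmailMessage()
    # Each Address carries the display name and the addr-spec separately,
    # so quoting and encoding are left to the library.
    msg["To"] = [
        Address(display_name=name, addr_spec=email)
        for name, email in entries
    ]
    return msg
```

Calling `build_message([("Zoë Example", "zoe@example.org")]).as_string()`
yields a `To:` header in which the name is carried as an encoded word
rather than as raw UTF-8.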



-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.
