On Sat, Aug 28, 2010 at 01:34:35PM -0700, Russ Allbery wrote:
> Charles Plessy <ple...@debian.org> writes:
> > On Fri, Aug 27, 2010 at 10:24:57AM -0700, Russ Allbery wrote:
>
> >> In fields where the value may not span multiple lines, the amount
> >> of whitespace in the field body is not significant. Any amount of
> >> whitespace is equivalent to a single space. Whitespace must not
> >> appear inside names (of packages, architectures, files, or anything
> >> else) or version numbers, or between the characters of
> >> multi-character version relationships.
>
> > I still have difficulty understanding the meaning of this paragraph,
> > and which fields it applies to. Is it just specifying that the parser,
> > in the case of fields that allow continuation lines, can be either
> > instructed to replace all whitespace and newlines by single spaces, or
> > to leave the value as it is, including the newlines?
>
> No, it's really trying to say that the amount of whitespace isn't
> significant. I'm not sure how else to explain it. That's one of those
> precise terms of art for which there isn't really an acceptable synonym,
> at least not that I can think of. Replacing all whitespace with a single
> space is one of the things that you can do *because* the amount of
> whitespace is not significant, but it's not an equivalent statement.

Dear Russ and everybody,

Since I was still confused, I inspected the wording throughout chapter 5
to better understand what is meant by ‘span multiple lines’.

First, as a side note, no field specifies that it may not span multiple
lines. I therefore agree with you that this is an implicit default case,
and I propose to make it explicit in §5.1 (see below).

I then looked at which field descriptions specify that the field ‘may
span multiple lines’. These are the relationship fields (Depends etc.,
§7.1), the Binary field, and the Uploaders field, but only in source
package control files. The Files and Checksums-* fields, on the other
hand, are described as ‘multiline fields’. Lastly, nothing is specified
for the Description and Changes fields, perhaps because it is so obvious.

It looks like ‘may span multiple lines’ means that continuation lines are
allowed but newlines are not significant, and ‘multiline fields’ means
that continuation lines are allowed and newlines are significant. At this
point, I am not sure what is expected from parsers: should they deliver
the value of the fields that ‘may span multiple lines’ with or without
newlines?

In any case, I find this similarity of terminology very confusing. I
therefore propose to replace ‘may span multiple lines’ with ‘continuation
lines are allowed’, to add it where it was implicit, and to add
‘continuation lines are allowed and newlines are significant’ where
needed as well.

I noted another confusing sentence on the subject, in §5.2: ‘Many fields
are permitted to span multiple lines in <file>debian/control</file> but
not in any other control file’. Actually, I found this to be true only
for the Uploaders field.
I propose to replace it with ‘Continuation lines can be permitted for
some fields in <file>debian/control</file> but not in any other control
file.’

Together with the separation that you suggested in your answer, the
rewording that I propose makes it possible to dramatically slim down
(and, in my opinion, clarify) the paragraph that discusses multiple-line
spanning and whitespace:

  <p>
    Continuation lines need to be allowed for each field separately.
    When continuation lines are allowed, whitespace including newlines
    is not significant in the field values, unless specified otherwise.
  </p>

  <p>
    Whitespace must not appear inside names (of packages, architectures,
    files or anything else) or version numbers, or between the
    characters of multi-character version relationships.
  </p>

This would need to be changed if the parser is expected to deliver the
field's value with the newlines already stripped, except for fields where
whitespace is specified to be significant. Note that in all cases the
trailing whitespace of a continuation line is not included in field
values, since it is removed as part of the parsing of the continuation
lines.

I attached a patch that tries to summarise the above and the points
discussed in other messages of this thread. It also includes a minor
correction for the Essential field, to bring it in line with the
nomenclature used in this chapter. I also tried to replace occurrences of
‘wrap’ and ‘span’ with less ambiguous words.

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan

From 8e2afc25d07bee246e546801a9a0b81bb8526454 Mon Sep 17 00:00:00 2001
From: Charles Plessy <ple...@debian.org>
Date: Fri, 3 Sep 2010 10:30:55 +0900
Subject: [PATCH] Clarifications on the syntax of control fields.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

See ‘http://bugs.debian.org/593909’.
---
 policy.sgml | 96 +++++++++++++++++++++++++++++++++--------------------------
 1 files changed, 54 insertions(+), 42 deletions(-)

diff --git a/policy.sgml b/policy.sgml
index 9037de8..666590e 100644
--- a/policy.sgml
+++ b/policy.sgml
@@ -2450,19 +2450,22 @@ endif
  fields<footnote>
  The paragraphs are also sometimes referred to as stanzas.
  </footnote>.
- The paragraphs are separated by blank lines. Some control
+ The paragraphs are separated by empty lines. As a special exception
+ for backwards compatibility, parsers may accept lines consisting
+ solely of spaces and tabs as paragraph separators. Some control
  files allow only one paragraph; others allow several, in
  which case each paragraph usually refers to a different
  package. (For example, in source packages, the first
  paragraph refers to the source package, and later paragraphs
- refer to binary packages generated from the source.)
+ refer to binary packages generated from the source.). The
+ ordering of the paragraphs in control files is significant.
  </p>
 
  <p>
  Each paragraph consists of a series of data fields; each
  field consists of the field name, followed by a colon and
  then the data/value associated with that field. It ends at
- the end of the (logical) line. Horizontal whitespace
+ the end of a logical line (see below). Horizontal whitespace
  (spaces and tabs) may occur immediately before or after the
  value and is ignored there; it is conventional to put a
  single space after the colon.
  For example, a field might
@@ -2480,22 +2483,31 @@ Package: libc6
  </p>
 
  <p>
- Many fields' values may span several lines; in this case
- each continuation line must start with a space or a tab.
- Any trailing spaces or tabs at the end of individual
- lines of a field value are ignored.
+ Fields values may be contained in a logical line that spans
+ several lines; these lines are called continuation lines and
+ must start with a space or a tab. Any trailing spaces or tabs
+ at the end of individual lines of a field value are ignored.
  </p>
 
  <p>
- In fields where it is specified that lines may not wrap,
- only a single line of data is allowed and whitespace is not
- significant in a field body. Whitespace must not appear
+ Continuation lines need to be allowed for each field separately.
+ When continuation lines are allowed, whitespace including newlines
+ is not significant in the field values, unless specified otherwise.
+ </p>
+
+ <p>
+ Whitespace must not appear
  inside names (of packages, architectures, files or anything
  else) or version numbers, or between the characters of
  multi-character version relationships.
  </p>
 
  <p>
+ The presence and purpose of a field, and the syntax of its
+ value may differ between types of control files.
+ </p>
+
+ <p>
  Field names are not case-sensitive, but it is usual to
  capitalize the field names using mixed case as shown below.
  Field values are case-sensitive unless the description of the
@@ -2503,9 +2515,17 @@ Package: libc6
  </p>
 
  <p>
- Blank lines, or lines consisting only of spaces and tabs,
- are not allowed within field values or between fields - that
- would mean a new paragraph.
+ Paragraph separators (empty lines) and lines consisting only of
+ spaces and tabs are not allowed within field values or between
+ fields. Empty lines in field values are usually escaped by
+ representing them by a space followed by a dot.
+ </p>
+
+ <p>
+ Lines starting with # without any preceding whitespace are comments
+ lines that are only permitted in source package control files
+ (<file>debian/control</file>). These comment lines are ignored, even
+ in the middle of continuation lines. They do not end logical lines.
  </p>
 
  <p>
@@ -2570,7 +2590,7 @@ Package: libc6
  <file>.changes</file> file to accompany the upload, and by
  <prgn>dpkg-source</prgn> when it creates the
  <file>.dsc</file> source control file as part of a source
- archive. Many fields are permitted to span multiple lines in
+ archive. Continuation lines can be permitted for some fields in
  <file>debian/control</file> but not in any other control
  file. These tools are responsible for removing the line
  breaks from such fields when using fields from
@@ -2584,16 +2604,6 @@ Package: libc6
  when they generate output control files. See <ref
  id="substvars"> for details.
  </p>
-
- <p>
- In addition to the control file syntax described <qref
- id="controlsyntax">above</qref>, this file may also contain
- comment lines starting with <tt>#</tt> without any preceding
- whitespace. All such lines are ignored, even in the middle of
- continuation lines for a multiline field, and do not end a
- multiline field.
- </p>
-
  </sect>
 
  <sect id="binarycontrolfiles">
@@ -2791,11 +2801,10 @@ Package: libc6
  </p>
 
  <p>
- Any parser that interprets the Uploaders field in
- <file>debian/control</file> must permit it to span multiple
- lines. Line breaks in an Uploaders field that spans multiple
- lines are not significant and the semantics of the field are
- the same as if the line breaks had not been present.
+ Continuation lines are allowed in the Uploaders field in
+ <file>debian/control</file>. Line breaks are not significant and
+ the semantics of the field are the same as if the line breaks had
+ not been present.
  </p>
  </sect1>
 
@@ -2975,7 +2984,7 @@ Package: libc6
  <p>
  This is a boolean field which may occur only in the
  control file of a binary package or in a per-package fields
- paragraph of a main source control data file.
+ paragraph of a source package control file.
  </p>
 
  <p>
@@ -3211,7 +3220,8 @@ Package: libc6
  In a source or binary control file, the <tt>Description</tt>
  field contains a description of the binary package, consisting of
  two parts, the synopsis or the short description, and the
- long description. The field's format is as follows:
+ long description. Continuation lines are allowed and whitespace
+ is significant. The field's format is as follows:
  </p>
 
  <p>
@@ -3417,6 +3427,7 @@ Package: libc6
  </p>
 
  <p>
+ Continuation lines are allowed and whitespace is significant.
  The first line of the field value (the part on the same line
  as <tt>Changes:</tt>) is always empty. The content of the
  field is expressed as continuation lines, with each line
@@ -3460,7 +3471,7 @@ Package: libc6
  packages which a source package can produce, separated by
  commas<footnote>
  A space after each comma is conventional.
- </footnote>. It may span multiple lines. The source package
+ </footnote>. Continuation lines are allowed. The source package
  does not necessarily produce all of these binary packages for
  every architecture. The source control file doesn't contain
  details of which architectures are appropriate for which of
@@ -3470,7 +3481,7 @@ Package: libc6
  <p>
  When it appears in a <file>.changes</file> file, it lists the
  names of the binary packages being uploaded, separated by
- whitespace (not commas). It may span multiple lines.
+ whitespace (not commas). Continuation lines are allowed.
  </p>
  </sect1>
 
@@ -3502,7 +3513,8 @@ Package: libc6
  </p>
 
  <p>
- In all cases, Files is a multiline field. The first line of
+ In all cases, continuation lines are allowed and
+ whitespace is significant. The first line of
  the field value (the part on the same line as
  <tt>Files:</tt>) is always empty.
  The content of the field is expressed as continuation lines,
  one line per file. Each line must be
@@ -3602,8 +3614,8 @@ Files:
  </p>
 
  <p>
- <tt>Checksums-Sha1</tt> and <tt>Checksums-Sha256</tt> are
- multiline fields. The first line of the field value (the part
+ Continuation lines are allowed and whitespace is significant.
+ The first line of the field value (the part
  on the same line as <tt>Checksums-Sha1:</tt> or
  <tt>Checksums-Sha256:</tt>) is always empty. The content of
  the field is expressed as continuation lines, one line per
@@ -4426,16 +4438,16 @@ Checksums-Sha256:
  Whitespace may appear at any point in the version
  specification subject to the rules in <ref
  id="controlsyntax">, and must appear where it's necessary to
- disambiguate; it is not otherwise significant. All of the
- relationship fields may span multiple lines. For
- consistency and in case of future changes to
+ disambiguate; it is not otherwise significant.
+ Continuation lines are allowed in all of the relationship fields.
+ For consistency and in case of future changes to
  <prgn>dpkg</prgn> it is recommended that a single space be
  used after a version relationship and before a version
  number; it is also conventional to put a single space after
  each comma, on either side of each vertical bar, and before
- each open parenthesis. When wrapping a relationship field, it
- is conventional to do so after a comma and before the space
- following that comma.
+ each open parenthesis. When opening a continuation line in a
+ relationship field, it is conventional to do so after a comma
+ and before the space following that comma.
  </p>
 
  <p>
-- 
1.7.1
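
To make the distinction discussed in this message concrete, here is a
minimal sketch in Python of a parser that follows the wording proposed
in the patch. It is an illustration only, not code from dpkg or any
other existing tool. The names parse_paragraph and NEWLINE_SIGNIFICANT
are hypothetical, and the sketch assumes one possible answer to the open
question of what parsers should deliver: for fields where newlines are
not significant it returns the value folded to single spaces, and for
the other fields it keeps one chunk per physical line.

```python
# Illustrative sketch only; not taken from dpkg or any existing parser.
# Assumption: these are the fields for which the patch says "continuation
# lines are allowed and whitespace is significant".  Names are compared in
# lower case because field names are not case-sensitive.
NEWLINE_SIGNIFICANT = {
    "description", "changes", "files", "checksums-sha1", "checksums-sha256",
}


def parse_paragraph(lines, allow_comments=False):
    """Parse the lines of one control paragraph into {field name: value}.

    allow_comments should only be set for debian/control, the one file in
    which the patch permits '#' comment lines.
    """
    chunks = {}     # lower-cased field name -> list of physical-line chunks
    current = None  # field that a continuation line would extend
    for raw in lines:
        if allow_comments and raw.startswith("#"):
            continue  # ignored; a comment does not end the logical line
        line = raw.rstrip(" \t\r\n")  # trailing whitespace is never kept
        if not line:
            # Empty or whitespace-only lines are not allowed inside a
            # paragraph, so this sketch rejects them.
            raise ValueError("empty line inside a paragraph")
        if line[0] in " \t":
            if current is None:
                raise ValueError("continuation line before any field")
            chunks[current].append(line[1:])  # drop the continuation marker
        else:
            name, colon, value = line.partition(":")
            if not colon:
                raise ValueError("not a field: %r" % line)
            current = name.strip().lower()
            chunks[current] = [value.strip(" \t")]

    fields = {}
    for name, parts in chunks.items():
        if name in NEWLINE_SIGNIFICANT:
            # Keep the line structure; inner whitespace is left untouched.
            fields[name] = "\n".join(parts)
        else:
            # Newlines and runs of whitespace fold to a single space.
            fields[name] = " ".join(" ".join(parts).split())
    return fields
```

With this sketch, an Uploaders value written over two lines in
debian/control comes back as a single line, while a Description keeps
its line breaks, including the " ." lines that stand for empty lines. If
parsers are instead expected to deliver folded fields with their
newlines intact, only the final loop would need to change.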