On 12/23/2016 02:25 PM, Martin Sebor wrote:
Bug 78703 points out that the decimal point character in floating
directives can be longer than just one byte (in locales where the
decimal point is a multibyte character).  The decimal point can
result in anywhere between 1 and MB_LEN_MAX bytes.  This is unlikely
but locales with two-byte decimal point are known to exist, and
the gimple-ssa-sprintf pass must handle them correctly.

In a comment on the bug Jakub suggests that while printf return
value optimization must correctly deal with the worst case (i.e.,
MB_LEN_MAX of 6 for UTF-8), reflecting the worst case in the text
of warnings could be confusing to users most of whom expect
a single byte decimal point.

Finally, a limitation of the gimple-ssa-sprintf pass has been that
it only understands constant width and precision and treats others
as essentially unlimited even if they are constrained to a limited
range of values.  This results in false positives and negatives
that can be avoided.

The attached patch enhances the pass to overcome both of these
limitations.  It does that by first replacing the exact byte counter
with two other counters: 1) a likely counter that tracks the number
of bytes a directive is likely to result in, and 2) an "unlikely"
byte for lack of a better name, that tracks the unlikely maximum
byte count in cases like multibyte decimal point, and second by
adding range handling for width and precision specified by the
asterisk (such as in sprintf("%*.*i", w, p, i)).

The patch resulted in more extensive changes than I initially
intended but the result is a simplified implementation.  A good
amount of the changes is factoring code out into more general
functions that can be shared throughout the pass.

With these enhancements, although the support for ranges in the
pass is complete, it's not as robust as it could be.  I think
having the pass run later could improve things.

The pass does produce a fair number of warnings for calls to
snprintf in the linux kernel.  Some of these I suspect will be
considered false positives.  I think it might be worth splitting
up the snprintf warning from -Wformat-length and adding a separate
option to control it.

Martin

gcc-78703.diff


PR middle-end/78703 -  -fprintf-return-value floating point handling incorrect 
in locales with a mulltibyte decimal point

gcc/ChangeLog:

        PR middle-end/78703
        * gimple-ssa-sprintf.c (get_int_range): New function.
        (struct result_range): Add members.
        (struct format_result): Replace number_chars, number_chars_min, and
        number_chars_max, with struct result_ramge.  Remove constant.
s/result_ramge/result_range/

        (format_result::operator+=): Update and define out of class.
        (struct fmtresult): Add constructors.  Remove constant and bounded
        members.
        (format_result::type_max_digits): New function.
        (format_result::adjust_for_width_and_precision): New function.
        (struct conversion_spec): Rename...
        (struct directive): ...to this.
        (struct directive): Add new data members.
        (directive::set_width, directive::set_precison): New functions.
        (bytes_remaining, get_int_range, format_character, format_plain): Same.
        (should_warn_p, maybe_warn, parse_directive): Same.
        (min_bytes_remaining, add_bytes): Remove.
        (format_percent, get_string_length): Simplify.
        (format_integer): Handle width and precision ranges.
        (format_floating): Same.
        (get_mpfr_format_length): Work around MPFR bugs and simplify.
        (format_string): Factor single character handling into
        format_character.  Handle width and precision ranges.
        (format_directive): Factor out most warning code into maybe_warn.
        (pass_sprintf_length::compute_format_length): Factor out parsing
        into parse_directive.
        (try_substitute_return_value): Handle unlikely maximum byte counter.
        Simplify for better clarity.

gcc/testsuite/ChangeLog:

        PR middle-end/78703
        * gcc.dg/tree-ssa/builtin-sprintf-2.c: Adjust.
        * gcc.dg/tree-ssa/builtin-sprintf-5.c: Same.
        * gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same.
        * gcc.dg/tree-ssa/builtin-sprintf-warn-2.c: Same.
        * gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Same.
        * gcc.dg/tree-ssa/builtin-sprintf-warn-4.c: Same.
        * gcc.dg/tree-ssa/builtin-sprintf-warn-7.c: Same.
        * gcc.dg/tree-ssa/builtin-sprintf-warn-9.c: New test.
        * gcc.dg/tree-ssa/builtin-sprintf.c: Adjust.
        * gcc.dg/format/pr78569.c: Same.
You weren't kidding when you mentioned this patch was going to be painful to review. Glad this is the last mega-patch in this space and that we'll be breaking them down into more manageable stuff from here out.

As I mentioned on the phone, you know the space better than I, so I'm largely going to assume the low level technical bits are correct and focus more on the higher level concepts, places where this code interacts with other parts of GCC and nit cleanup.

I must admit a certain amount of concern WRT the introduction of likely/unlikely counters. If I were working on a codebase where locales were important, I'd *really* like to know if I didn't account for the multi-byte decimal points and such.

I'd like to give interested folks a chance to chime in on this issue.




diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index a91dcb8..904e4f1 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -145,50 +149,53 @@ pass_sprintf_length::gate (function *)
          && (optimize > 0) == fold_return_value);
 }

+/* The minimum, maximum, likely, and unlikely maximum number of bytes
+   of output either a formatting function or an individual directive
+   can result in.  */
+
+struct result_range
+{
+  /* The absolute minimum number of bytes.  The result of a successful
+     conversion is guaranteed to be no less than this.  (An erroneous
+     conversion can be indicated by MIN > HOST_WIDE_INT_MAX.)  */
+  unsigned HOST_WIDE_INT min;
+  /* The likely maximum result that is used in diagnostics.  In most
+     cases MAX is the same as the worst case UNLIKELY result.  */
+  unsigned HOST_WIDE_INT max;
+  /* The likely result used to trigger diagnostics.  For conversions
+     that result in a range of bytes [MIN, MAX], LIKELY is somewhere
+     in that range.  */
+  unsigned HOST_WIDE_INT likely;
+  /* In rare cases (e.g., for nultibyte characters) UNLIKELY gives
s/nultibyte/multibyte/

@@ -456,37 +492,129 @@ struct fmtresult
      heuristics that depend on warning levels.  */
   bool knownrange;

-  /* True when the range is the result of an argument determined
-     to be bounded to a subrange of its type or value (such as by
-     value range propagation or the width of the formt directive),
s/formt/format/

-/* Description of a conversion specification.  */
+/* Adjust result upward to reflect the range ADJUST of values the
+   specified width or precision is known to be in.  When non-null,
+   TYPE denotes the type of the directive whose result is being
+   adjusted, BASE gives the base of the directive (octal, decimal,
+   or hex), and ADJ denotes the additional adjustment to the LIKELY
+   counter that may need to be added when ADJUST is a range.  */
+
+fmtresult&
+fmtresult::adjust_for_width_or_precision (const HOST_WIDE_INT adjust[2],
+                                         tree type /* = NULL_TREE */,
+                                         unsigned base /* = 0 */,
+                                         unsigned adj /* = 0 */)
+{
+  bool minadjusted = false;
+
+  /* Adjust the minimum and likely counters.  */
+  if (0 <= adjust[0])
Swap the order of operands to match canonical form.

+    {
+      if (range.min < (unsigned HOST_WIDE_INT)adjust[0])
+       {
+         range.min = adjust[0];
+         minadjusted = true;
+       }
+
+      /* Adjust the likely counter.  */
+      if (range.likely < range.min)
+       range.likely = range.min;
+    }
+  else if (adjust[0] == target_int_min ()
+          && (unsigned HOST_WIDE_INT)adjust[1] == target_int_max ())
+    {
+      knownrange = false;
+    }
+
+  /* Adjust the maximum counter.  */
+  if (0 < adjust[1])
Likewise.


@@ -494,18 +622,13 @@ struct conversion_spec
   /* Format specifier character.  */
   char specifier;

-  /* Numeric width was given.  */
-  unsigned have_width: 1;
-  /* Numeric precision was given.  */
-  unsigned have_precision: 1;
-  /* Non-zero when certain flags should be interpreted even for a directive
-     that normally doesn't accept them (used when "%p" with flags such as
-     space or plus is interepreted as a "%x".  */
s/interepreted/interpreted/


@@ -767,57 +869,94 @@ build_intmax_type_nodes (tree *pintmax, tree *puintmax)
     }
 }

-/* Set *PWIDTH and *PPREC according to the width and precision specified
-   in SPEC.  Each is set to HOST_WIDE_INT_MIN when the corresponding
-   field is specified but unknown, to zero for width and -1 for precision,
-   respectively when it's not specified, or to a non-negative value
-   corresponding to the known value.  */
+/* Determine the range [*PMIN, *PMAX] that the expression ARG of TYPE
+   is in.  Return true when the range is a subrange of that of TYPE.
+   Whn ARG is null it is as if it had the full range of TYPE.
s/Whn/When/

More to follow, got to run

Jeff

Reply via email to