Fix up one mishandled plural in diagnostics in gimple-ssa-sprintf.c

Message ID 20171219172049.GT2353@tucnak
State New
Headers show
Series
  • Fix up one mishandled plural in diagnostics in gimple-ssa-sprintf.c
Related show

Commit Message

Jakub Jelinek Dec. 19, 2017, 5:20 p.m.
Hi!

This inform is correct in english, but wrong in many other languages.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-12-19  Jakub Jelinek  <jakub@redhat.com>

	* gimple-ssa-sprintf.c (format_directive): Use inform_n instead of
	inform with hardcoded english plural handling.


	Jakub

Comments

Martin Sebor Dec. 19, 2017, 5:49 p.m. | #1
On 12/19/2017 10:20 AM, Jakub Jelinek wrote:
> Hi!

>

> This inform is correct in english, but wrong in many other languages.


I think someone already mentioned this in another review but
the computation below is far from intuitive:

+	      ? (fmtres.range.likely % 1000000) + 1000000
+	      : fmtres.range.likely,

It may not be a big deal in this one isolated instance but
there are many more examples of this problem in this code
and elsewhere that should be fixed (the sprintf warnings
aren't as easy to deal with due to bug 77810).

Other examples of the same problem are going to be in some
of the other warnings I added (-Walloc-size-larger-than,
-Wstringop-overflow/truncation, -Wrestrict).

Since all these should be adjusted and ideally handled
consistently, duplicating the same computation as above
would be tedious and error-prone.  I'd much rather have
a general solution that obviated having to deal with it
at every call site.

Can the math be moved into inform_n (and warning_n) itself?
If that doesn't make sense, then providing a helper for
warning_n and inform_n callers to call might be an alternative.

I CC David for his thoughts on how best to deal with it.

As a separate enhancement, it would also be nice to put in
place a checker (such a post-processing script) to help detect
this problem before it's introduced into the code base.

Martin

>

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

>

> 2017-12-19  Jakub Jelinek  <jakub@redhat.com>

>

> 	* gimple-ssa-sprintf.c (format_directive): Use inform_n instead of

> 	inform with hardcoded english plural handling.

>

> --- gcc/gimple-ssa-sprintf.c.jj	2017-12-14 12:00:32.000000000 +0100

> +++ gcc/gimple-ssa-sprintf.c	2017-12-19 12:19:10.975042433 +0100

> @@ -2933,13 +2933,15 @@ format_directive (const sprintf_dom_walk

>

>    if (warned && fmtres.range.min < fmtres.range.likely

>        && fmtres.range.likely < fmtres.range.max)

> -    {

> -      inform (info.fmtloc,

> -	      (1 == fmtres.range.likely

> -	       ? G_("assuming directive output of %wu byte")

> -	       : G_("assuming directive output of %wu bytes")),

> +    /* Some languages have special plural rules even for large values,

> +       but it is periodic with period of 10, 100, 1000 etc.  */

> +    inform_n (info.fmtloc,

> +	      fmtres.range.likely > INT_MAX

> +	      ? (fmtres.range.likely % 1000000) + 1000000

> +	      : fmtres.range.likely,

> +	      "assuming directive output of %wu byte",

> +	      "assuming directive output of %wu bytes",

>  	      fmtres.range.likely);

> -    }

>

>    if (warned && fmtres.argmin)

>      {

>

> 	Jakub

>
Jakub Jelinek Dec. 19, 2017, 5:58 p.m. | #2
On Tue, Dec 19, 2017 at 10:49:07AM -0700, Martin Sebor wrote:
> Can the math be moved into inform_n (and warning_n) itself?


No.  I'm against having dozens of inform_n and warning_n etc.
overloads, it is already bad enough we have so many diagnostics
entry points now that putting breakpoints on them is so hard.
Those functions just have int, and we can have
perhaps some other function (plural_count or some better name)
that is inline and overloaded on the various types and can be used
if the count isn't int.

As for gimple-ssa-sprintf.c, there are several other cases that
want a plural form, but I haven't changed them because
format_warning_at_substring doesn't have _n suffixed variant.

	Jakub
Richard Biener Dec. 19, 2017, 6:02 p.m. | #3
On December 19, 2017 6:58:15 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Tue, Dec 19, 2017 at 10:49:07AM -0700, Martin Sebor wrote:

>> Can the math be moved into inform_n (and warning_n) itself?

>

>No.  I'm against having dozens of inform_n and warning_n etc.

>overloads, it is already bad enough we have so many diagnostics

>entry points now that putting breakpoints on them is so hard.

>Those functions just have int, and we can have

>perhaps some other function (plural_count or some better name)

>that is inline and overloaded on the various types and can be used

>if the count isn't int.

>

>As for gimple-ssa-sprintf.c, there are several other cases that

>want a plural form, but I haven't changed them because

>format_warning_at_substring doesn't have _n suffixed variant.


Can't we simply use bytes (plural) and not care for the detail of special values like one? Other languages (Czech?) even have special cases for two and three... 

Richard. 

>	Jakub
Jakub Jelinek Dec. 19, 2017, 6:09 p.m. | #4
On Tue, Dec 19, 2017 at 07:02:38PM +0100, Richard Biener wrote:
> On December 19, 2017 6:58:15 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:

> >On Tue, Dec 19, 2017 at 10:49:07AM -0700, Martin Sebor wrote:

> >> Can the math be moved into inform_n (and warning_n) itself?

> >

> >No.  I'm against having dozens of inform_n and warning_n etc.

> >overloads, it is already bad enough we have so many diagnostics

> >entry points now that putting breakpoints on them is so hard.

> >Those functions just have int, and we can have

> >perhaps some other function (plural_count or some better name)

> >that is inline and overloaded on the various types and can be used

> >if the count isn't int.

> >

> >As for gimple-ssa-sprintf.c, there are several other cases that

> >want a plural form, but I haven't changed them because

> >format_warning_at_substring doesn't have _n suffixed variant.

> 

> Can't we simply use bytes (plural) and not care for the detail of special

> values like one?  Other languages (Czech?) even have special cases for two

> and three...


If it is supposed to be gramattically correct, it should use the _n suffixed
APIs.  There are languages with just 1 or 2, 3, 4 and 6 forms, see
https://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/Plural-forms.html
for details.

	Jakub
Martin Sebor Dec. 19, 2017, 6:19 p.m. | #5
On 12/19/2017 10:58 AM, Jakub Jelinek wrote:
> On Tue, Dec 19, 2017 at 10:49:07AM -0700, Martin Sebor wrote:

>> Can the math be moved into inform_n (and warning_n) itself?

>

> No.  I'm against having dozens of inform_n and warning_n etc.


The question/suggestion is to change the existing inform_n and
warning_n functions to do the math instead of having each caller
do it.  (I.e., not add any more overloads.)

> overloads, it is already bad enough we have so many diagnostics

> entry points now that putting breakpoints on them is so hard.

> Those functions just have int, and we can have

> perhaps some other function (plural_count or some better name)

> that is inline and overloaded on the various types and can be used

> if the count isn't int.


If the change I suggested above doesn't work for some reason
then let's consider introducing such a helper function.

> As for gimple-ssa-sprintf.c, there are several other cases that

> want a plural form, but I haven't changed them because

> format_warning_at_substring doesn't have _n suffixed variant.


Yes, I think I mentioned that.  It involves adding even more
"overloads."  I'm not a fan of a proliferation of APIs with
only subtle differences either.  They are too easy to misuse.
I'd prefer fewer "smart" APIs that git it right on their own.
It's probably too late to add something like that for GCC 8
but I'm hoping we can come up with an idea and implement it
in GCC 9 to avoid these pitfalls while (hopefully) also
making it easier to write messages involving more than one
plural form.

Martin
Jakub Jelinek Dec. 19, 2017, 6:28 p.m. | #6
On Tue, Dec 19, 2017 at 11:19:44AM -0700, Martin Sebor wrote:
> On 12/19/2017 10:58 AM, Jakub Jelinek wrote:

> > On Tue, Dec 19, 2017 at 10:49:07AM -0700, Martin Sebor wrote:

> > > Can the math be moved into inform_n (and warning_n) itself?

> > 

> > No.  I'm against having dozens of inform_n and warning_n etc.

> 

> The question/suggestion is to change the existing inform_n and

> warning_n functions to do the math instead of having each caller

> do it.  (I.e., not add any more overloads.)


The "math" is a mapping from some type (in this case UHWI) to
some other type (int, used by our *_n APIs), so that it gives the
same plural forms for known translations.
Are you suggesting we change the type of the *_n APIs to UHWI
instead?  What if the caller needs something different?  E.g.
wide_int, widest_int, offset_int, __int128, whatever else?

	Jakub
Martin Sebor Dec. 19, 2017, 6:59 p.m. | #7
On 12/19/2017 11:28 AM, Jakub Jelinek wrote:
> On Tue, Dec 19, 2017 at 11:19:44AM -0700, Martin Sebor wrote:

>> On 12/19/2017 10:58 AM, Jakub Jelinek wrote:

>>> On Tue, Dec 19, 2017 at 10:49:07AM -0700, Martin Sebor wrote:

>>>> Can the math be moved into inform_n (and warning_n) itself?

>>>

>>> No.  I'm against having dozens of inform_n and warning_n etc.

>>

>> The question/suggestion is to change the existing inform_n and

>> warning_n functions to do the math instead of having each caller

>> do it.  (I.e., not add any more overloads.)

>

> The "math" is a mapping from some type (in this case UHWI) to

> some other type (int, used by our *_n APIs), so that it gives the

> same plural forms for known translations.


I understand that.  Presumably this mapping should be the same
for all messages (whether we're dealing with "bytes" or "elements"
or whatever other quantity).

> Are you suggesting we change the type of the *_n APIs to UHWI

> instead?  What if the caller needs something different?  E.g.

> wide_int, widest_int, offset_int, __int128, whatever else?


I'd expect HWI (either signed or unsigned) to work for existing
callers.  There is (unfortunately) no directive to format wide
ints so callers that also format the number (as in "%wu bytes")
in addition to using it to choose a plural vs singular form have
to convert it to an integer as it is(*).

If/when in the future a directive to format an offset_int and
wide_int (and the various flavors of poly_int) is added we can
revise the API to take whatever type is the most appropriate,
or (more likely) add true overloads (i.e., APIs with the same
name but different type) to handle each form.

My "vision" is to have a single warning_n() name (if not
a single function) handle all there nuances so we don't have
to remember to use waning_uhwi_n() and warning_wide_int_n()
and warning_poly_int_n(), etc., or even call
get_plural_form (n), but still get values of all types
formatted optimally and consistently.

Martin

[*] Extending the dollar flag (or coming up with some other
     intuitive notation) to refer to the N argument might be
     a useful enhancement to make the association between it
     and the form of the message explicit:

     inform_n (loc, nbytes,
               "%0$wu byte",
               "%0$wu bytes");

     and avoid the mistake of passing one value as N and another
     as the argument to the numeric directive.
Jeff Law Dec. 19, 2017, 8:32 p.m. | #8
On 12/19/2017 10:20 AM, Jakub Jelinek wrote:
> Hi!

> 

> This inform is correct in english, but wrong in many other languages.

> 

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

> 

> 2017-12-19  Jakub Jelinek  <jakub@redhat.com>

> 

> 	* gimple-ssa-sprintf.c (format_directive): Use inform_n instead of

> 	inform with hardcoded english plural handling.

OK.
jeff
David Malcolm Dec. 20, 2017, 4:31 p.m. | #9
On Tue, 2017-12-19 at 18:58 +0100, Jakub Jelinek wrote:
> On Tue, Dec 19, 2017 at 10:49:07AM -0700, Martin Sebor wrote:

> > Can the math be moved into inform_n (and warning_n) itself?

> 

> No.  I'm against having dozens of inform_n and warning_n etc.

> overloads, it is already bad enough we have so many diagnostics

> entry points now that putting breakpoints on them is so hard.


FWIW, I added a "break-on-diagnostic" command to our .gdbinit script,
in case that helps:

  https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01382.html

> Those functions just have int, and we can have

> perhaps some other function (plural_count or some better name)

> that is inline and overloaded on the various types and can be used

> if the count isn't int.

> 

> As for gimple-ssa-sprintf.c, there are several other cases that

> want a plural form, but I haven't changed them because

> format_warning_at_substring doesn't have _n suffixed variant.

> 

> 	Jakub

Patch

--- gcc/gimple-ssa-sprintf.c.jj	2017-12-14 12:00:32.000000000 +0100
+++ gcc/gimple-ssa-sprintf.c	2017-12-19 12:19:10.975042433 +0100
@@ -2933,13 +2933,15 @@  format_directive (const sprintf_dom_walk
 
   if (warned && fmtres.range.min < fmtres.range.likely
       && fmtres.range.likely < fmtres.range.max)
-    {
-      inform (info.fmtloc,
-	      (1 == fmtres.range.likely
-	       ? G_("assuming directive output of %wu byte")
-	       : G_("assuming directive output of %wu bytes")),
+    /* Some languages have special plural rules even for large values,
+       but it is periodic with period of 10, 100, 1000 etc.  */
+    inform_n (info.fmtloc,
+	      fmtres.range.likely > INT_MAX
+	      ? (fmtres.range.likely % 1000000) + 1000000
+	      : fmtres.range.likely,
+	      "assuming directive output of %wu byte",
+	      "assuming directive output of %wu bytes",
 	      fmtres.range.likely);
-    }
 
   if (warned && fmtres.argmin)
     {