Fix multibyte-related issues in pretty-print.c (PR 91843)

Message ID 20191010202746.GA53480@ldh.local
State New
Headers show
Series
  • Fix multibyte-related issues in pretty-print.c (PR 91843)
Related show

Commit Message

Lewis Hyatt Oct. 10, 2019, 8:27 p.m.
Hello-

This short patch addresses https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91843
by adding the needed multibyte awareness to pretty-print.c.
Together with my other patch awaiting review
(https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01627.html), this fixes all
issues that I am aware of regarding printing diagnostics with multibyte
characters in UTF-8 locales. Would you please have a look and see if it's OK?
Thanks very much.

bootstrapped and tested on x86-64 Linux, all test results were identical before
and after:
34 XPASS
109 FAIL
1490 XFAIL
9470 UNSUPPORTED
332971 PASS

-Lewis
gcc/ChangeLog:

2019-10-10  Lewis Hyatt  <lhyatt@gmail.com>

	PR 91853
	* pretty-print.c (pp_quoted_string): Avoid hex-escaping valid
	multibyte input. Fix off-by-one-bug printing the last byte before a
	hex-escaped output.
	(pp_character): Don't apply line wrapping in the middle of multibyte
	characters.
	(test_utf8): New test.
	(pretty_print_c_tests): Call the new test.

Comments

Lewis Hyatt Nov. 11, 2019, 11:50 p.m. | #1
Hello-

Would it be appropriate to ping this patch at this point? It would be
great if someone can review it please, it's relatively short and it
fixes one of the two noticeable issues with extended identifier
diagnostics. Thanks very much!

https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00766.html

-Lewis

On Thu, Oct 10, 2019 at 4:27 PM Lewis Hyatt <lhyatt@gmail.com> wrote:
>

> Hello-

>

> This short patch addresses https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91843

> by adding the needed multibyte awareness to pretty-print.c.

> Together with my other patch awaiting review

> (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01627.html), this fixes all

> issues that I am aware of regarding printing diagnostics with multibyte

> characters in UTF-8 locales. Would you please have a look and see if it's OK?

> Thanks very much.

>

> bootstrapped and tested on x86-64 Linux, all test results were identical before

> and after:

> 34 XPASS

> 109 FAIL

> 1490 XFAIL

> 9470 UNSUPPORTED

> 332971 PASS

>

> -Lewis
David Malcolm Dec. 9, 2019, 9:58 p.m. | #2
On Thu, 2019-10-10 at 16:27 -0400, Lewis Hyatt wrote:
> Hello-

> 

> This short patch addresses 

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91843

> by adding the needed multibyte awareness to pretty-print.c.

> Together with my other patch awaiting review

> (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01627.html), this

> fixes all

> issues that I am aware of regarding printing diagnostics with

> multibyte

> characters in UTF-8 locales. Would you please have a look and see if

> it's OK?

> Thanks very much.

> 

> bootstrapped and tested on x86-64 Linux, all test results were

> identical before

> and after:

> 34 XPASS

> 109 FAIL

> 1490 XFAIL

> 9470 UNSUPPORTED

> 332971 PASS

> 

> -Lewis


Patch looks good to me.

Do you want SVN commit access, as per:
  https://www.gnu.org/software/gcc/svnwrite.html
?

I'm willing to sponsor you.

Dave
Lewis Hyatt Dec. 9, 2019, 11:02 p.m. | #3
On Mon, Dec 9, 2019 at 4:58 PM David Malcolm <dmalcolm@redhat.com> wrote:
>

> On Thu, 2019-10-10 at 16:27 -0400, Lewis Hyatt wrote:

> > Hello-

> >

> > This short patch addresses

> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91843

> > by adding the needed multibyte awareness to pretty-print.c.

> > Together with my other patch awaiting review

> > (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01627.html), this

> > fixes all

> > issues that I am aware of regarding printing diagnostics with

> > multibyte

> > characters in UTF-8 locales. Would you please have a look and see if

> > it's OK?

> > Thanks very much.

> >

> > bootstrapped and tested on x86-64 Linux, all test results were

> > identical before

> > and after:

> > 34 XPASS

> > 109 FAIL

> > 1490 XFAIL

> > 9470 UNSUPPORTED

> > 332971 PASS

> >

> > -Lewis

>

> Patch looks good to me.

>

> Do you want SVN commit access, as per:

>   https://www.gnu.org/software/gcc/svnwrite.html

> ?

>

> I'm willing to sponsor you.

>

> Dave

>


Thanks, that sounds great. I will submit the form then.

-Lewis
Lewis Hyatt Dec. 10, 2019, 2:36 p.m. | #4
On Mon, Dec 9, 2019 at 4:58 PM David Malcolm <dmalcolm@redhat.com> wrote:
>

> On Thu, 2019-10-10 at 16:27 -0400, Lewis Hyatt wrote:

> > Hello-

> >

> > This short patch addresses

> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91843

> > by adding the needed multibyte awareness to pretty-print.c.

> > Together with my other patch awaiting review

> > (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01627.html), this

> > fixes all

> > issues that I am aware of regarding printing diagnostics with

> > multibyte

> > characters in UTF-8 locales. Would you please have a look and see if

> > it's OK?

> > Thanks very much.

> >

> > bootstrapped and tested on x86-64 Linux, all test results were

> > identical before

> > and after:

> > 34 XPASS

> > 109 FAIL

> > 1490 XFAIL

> > 9470 UNSUPPORTED

> > 332971 PASS

> >

> > -Lewis

>

> Patch looks good to me.

>

> Do you want SVN commit access, as per:

>   https://www.gnu.org/software/gcc/svnwrite.html

> ?

>

> I'm willing to sponsor you.

>

> Dave

>


All set with the commit access, thank you. I just wanted to double
check that this patch is OK to go in.

-Lewis
David Malcolm Dec. 11, 2019, 2:29 a.m. | #5
On Tue, 2019-12-10 at 09:36 -0500, Lewis Hyatt wrote:
> On Mon, Dec 9, 2019 at 4:58 PM David Malcolm <dmalcolm@redhat.com>

> wrote:

> > On Thu, 2019-10-10 at 16:27 -0400, Lewis Hyatt wrote:

> > > Hello-

> > > 

> > > This short patch addresses

> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91843

> > > by adding the needed multibyte awareness to pretty-print.c.

> > > Together with my other patch awaiting review

> > > (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01627.html), this

> > > fixes all

> > > issues that I am aware of regarding printing diagnostics with

> > > multibyte

> > > characters in UTF-8 locales. Would you please have a look and see

> > > if

> > > it's OK?

> > > Thanks very much.

> > > 

> > > bootstrapped and tested on x86-64 Linux, all test results were

> > > identical before

> > > and after:

> > > 34 XPASS

> > > 109 FAIL

> > > 1490 XFAIL

> > > 9470 UNSUPPORTED

> > > 332971 PASS

> > > 

> > > -Lewis

> > 

> > Patch looks good to me.

> > 

> > Do you want SVN commit access, as per:

> >   https://www.gnu.org/software/gcc/svnwrite.html

> > ?

> > 

> > I'm willing to sponsor you.

> > 

> > Dave

> > 

> 

> All set with the commit access, thank you. 


Excellent.

> I just wanted to double

> check that this patch is OK to go in.


Looks good to me.

Dave

Patch

diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index c57a3dbd887..742f3d23725 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -699,6 +699,8 @@  mingw_ansi_fputs (const char *str, FILE *fp)
 
 #endif /* __MINGW32__ */
 
+static int
+decode_utf8_char (const unsigned char *, size_t len, unsigned int *);
 static void pp_quoted_string (pretty_printer *, const char *, size_t = -1);
 
 /* Overwrite the given location/range within this text_info's rich_location.
@@ -1689,6 +1691,8 @@  void
 pp_character (pretty_printer *pp, int c)
 {
   if (pp_is_wrapping_line (pp)
+      /* If printing UTF-8, don't wrap in the middle of a sequence.  */
+      && (((unsigned int) c) & 0xC0) != 0x80
       && pp_remaining_character_count_for_line (pp) <= 0)
     {
       pp_newline (pp);
@@ -1729,8 +1733,22 @@  pp_quoted_string (pretty_printer *pp, const char *str, size_t n /* = -1 */)
       if (ISPRINT (*ps))
 	  continue;
 
+      /* Don't escape a valid UTF-8 extended char.  */
+      const unsigned char *ups = (const unsigned char *) ps;
+      if (*ups & 0x80)
+	{
+	  unsigned int extended_char;
+	  const int valid_utf8_len = decode_utf8_char (ups, n, &extended_char);
+	  if (valid_utf8_len > 0)
+	    {
+	      ps += valid_utf8_len - 1;
+	      n -= valid_utf8_len - 1;
+	      continue;
+	    }
+	}
+
       if (last < ps)
-	pp_maybe_wrap_text (pp, last, ps - 1);
+	pp_maybe_wrap_text (pp, last, ps);
 
       /* Append the hexadecimal value of the character.  Allocate a buffer
 	 that's large enough for a 32-bit char plus the hex prefix.  */
@@ -2374,6 +2392,46 @@  test_urls ()
   }
 }
 
+/* Test multibyte awareness.  */
+static void test_utf8 ()
+{
+
+  /* Check that pp_quoted_string leaves valid UTF-8 alone.  */
+  {
+    pretty_printer pp;
+    const char *s = "\xf0\x9f\x98\x82";
+    pp_quoted_string (&pp, s);
+    ASSERT_STREQ (pp_formatted_text (&pp), s);
+  }
+
+  /* Check that pp_quoted_string escapes non-UTF-8 nonprintable bytes.  */
+  {
+    pretty_printer pp;
+    pp_quoted_string (&pp, "\xf0!\x9f\x98\x82");
+    ASSERT_STREQ (pp_formatted_text (&pp),
+		  "\\xf0!\\x9f\\x98\\x82");
+  }
+
+  /* Check that pp_character will line-wrap at the beginning of a UTF-8
+     sequence, but not in the middle.  */
+  {
+      pretty_printer pp (3);
+      const char s[] = "---\xf0\x9f\x98\x82";
+      for (int i = 0; i != sizeof (s) - 1; ++i)
+	pp_character (&pp, s[i]);
+      pp_newline (&pp);
+      for (int i = 1; i != sizeof (s) - 1; ++i)
+	pp_character (&pp, s[i]);
+      pp_character (&pp, '-');
+      ASSERT_STREQ (pp_formatted_text (&pp),
+		    "---\n"
+		    "\xf0\x9f\x98\x82\n"
+		    "--\xf0\x9f\x98\x82\n"
+		    "-");
+  }
+
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -2383,6 +2441,7 @@  pretty_print_c_tests ()
   test_pp_format ();
   test_prefixes_and_wrapping ();
   test_urls ();
+  test_utf8 ();
 }
 
 } // namespace selftest