RFA: Sanitize deprecation messages (PR 84195)

Message ID 87h8qv9xhb.fsf@redhat.com
State New
Headers show
Series
  • RFA: Sanitize deprecation messages (PR 84195)
Related show

Commit Message

Nick Clifton Feb. 5, 2018, 5:07 p.m.
Hi Martin, Hi Dodji, Hi David,

  PR 84195 makes the point that deprecation messages do not follow the
  formatting guidelines set out by the -fmessage-length option, and that
  they could even contain unwanted control characters:
  
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84195

  Below is my attempt at a patch to fix this.  It is not very clever,
  just replacing the control characters with spaces, and doing it each
  time that a deprecation warning is displayed.  But I figured that
  such warnings are going to be rare and do not need to be efficiently
  handled.  So I choose to keep it simple. :-)

  No regressions with an x86_64-pc-linux-gnu toolchain.

  OK to apply ?

Cheers
  Nick

gcc/ChangeLog
2018-02-05  Nick Clifton  <nickc@redhat.com>

	PR 84195
	* tree.c (warn_deprecated_use): Sanitize deprecation messages.

Comments

Martin Sebor Feb. 5, 2018, 6:59 p.m. | #1
On 02/05/2018 10:07 AM, Nick Clifton wrote:
> Hi Martin, Hi Dodji, Hi David,

>

>   PR 84195 makes the point that deprecation messages do not follow the

>   formatting guidelines set out by the -fmessage-length option, and that

>   they could even contain unwanted control characters:

>

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84195

>

>   Below is my attempt at a patch to fix this.  It is not very clever,

>   just replacing the control characters with spaces, and doing it each

>   time that a deprecation warning is displayed.  But I figured that

>   such warnings are going to be rare and do not need to be efficiently

>   handled.  So I choose to keep it simple. :-)


Thanks for working on this!  I agree your patch is an improvement.

My only suggestions are to consider how control characters and
excessively messages are handled in other contexts and adopt
the same approach here.  In the tests I did, control characters
were mostly escaped (e.g., as \n or as \x0a) rather than replaced
with spaces.  I haven't thought too hard about how excessively
long messages should be handled, or about the interactions with
-fmessage-length= (i.e., if messages longer than the limit should
be broken up.)

Martin

>

>   No regressions with an x86_64-pc-linux-gnu toolchain.

>

>   OK to apply ?

>

> Cheers

>   Nick

>

> gcc/ChangeLog

> 2018-02-05  Nick Clifton  <nickc@redhat.com>

>

> 	PR 84195

> 	* tree.c (warn_deprecated_use): Sanitize deprecation messages.

>

> Index: gcc/tree.c

> ===================================================================

> --- gcc/tree.c	(revision 257389)

> +++ gcc/tree.c	(working copy)

> @@ -12436,6 +12436,39 @@

>    else

>      msg = NULL;

>

> +  char * new_msg = NULL;

> +  if (msg)

> +    {

> +      unsigned int i; /* FIXME: Do we need to check to see if msg is longer

> +			 than 2^32 ?  */

> +

> +      /* PR 84195: Sanitize the message:

> +	

> +	 If -fmessage-length is set to 0 then replace newline characters with

> +	 spaces.  Replace other non-printing ASCII characters with spaces too.

> +

> +	 We do this check here, rather than where the deprecated attribute is

> +	 recorded because the setting of -fmessage-length can be changed

> +	 between those two locations.  */

> +      for (i = 0; i < strlen (msg); i++)

> +	{

> +	  char c = msg[i];

> +	

> +	  if (! ISCNTRL (c))

> +	    continue;

> +

> +	  if (c != '\n' || ! pp_is_wrapping_line (global_dc->printer))

> +	    {

> +	      if (new_msg == NULL)

> +		new_msg = xstrdup (msg);

> +	      new_msg [i] = ' ';

> +	    }

> +	}

> +

> +      if (new_msg)

> +	msg = new_msg;

> +    }

> +

>    bool w;

>    if (DECL_P (node))

>      {

> @@ -12505,6 +12538,8 @@

>  	    }

>  	}

>      }

> +

> +  free (new_msg);

>  }

>

>  /* Return true if REF has a COMPONENT_REF with a bit-field field declaration

>
Nick Clifton Feb. 6, 2018, 12:39 p.m. | #2
Hi Martin,

> My only suggestions are to consider how control characters and

> excessively messages are handled in other contexts and adopt

> the same approach here.


Are there other places where user-supplied messages are displayed by gcc ?


> In the tests I did, control characters

> were mostly escaped (e.g., as \n or as \x0a) rather than replaced

> with spaces. 


Ooo - what tests were these ?

I agree that it there is a standard way to handle these characters 
then the deprecated attribute ought to do the same.  (Plus hopefully
there is already a function in gcc to do this kind of processing,
so the deprecation code can just call it).


> I haven't thought too hard about how excessively

> long messages should be handled, or about the interactions with

> -fmessage-length= (i.e., if messages longer than the limit should

> be broken up.)


I believe that this will happen automatically.  The diagnostic display
routine will automatically insert newlines into the output if the 
message length is non-zero, and the message extends past this limit.
(Well provided that the message contains spaces.  The newlines are
only inserted in place of space characters so a very long, single
word message will not be split...)

Cheers
  Nick
David Malcolm Feb. 6, 2018, 2 p.m. | #3
On Tue, 2018-02-06 at 12:39 +0000, Nick Clifton wrote:
> Hi Martin,

> 

> > My only suggestions are to consider how control characters and

> > excessively messages are handled in other contexts and adopt

> > the same approach here.

> 

> Are there other places where user-supplied messages are displayed by

> gcc ?



#pragma GCC warning and error (as opposed to #warning and #error) also
print control chars without escaping:

$cat /tmp/prag.c 
#error "foo\nbar"
#warning "foo\nbar"
#pragma GCC error "foo\nbar"
#pragma GCC warning "foo\nbar"

$ gcc -c /tmp/prag.c
/tmp/prag.c:1:2: error: #error "foo\nbar"
 #error "foo\nbar"
  ^~~~~
/tmp/prag.c:2:2: warning: #warning "foo\nbar" [-Wcpp]
 #warning "foo\nbar"
  ^~~~~~~
/tmp/prag.c:3:19: error: foo
bar
 #pragma GCC error "foo\nbar"
                   ^~~~~~~~~~
/tmp/prag.c:4:21: warning: foo
bar
 #pragma GCC warning "foo\nbar"
                     ^~~~~~~~~~


Note the newlines in the above output.

There may be other places.

My preference is either the status quo, or to escape the control
characters.

The patch is missing a test case.

> 

> > In the tests I did, control characters

> > were mostly escaped (e.g., as \n or as \x0a) rather than replaced

> > with spaces. 

> 

> Ooo - what tests were these ?

> 

> I agree that it there is a standard way to handle these characters 

> then the deprecated attribute ought to do the same.  (Plus hopefully

> there is already a function in gcc to do this kind of processing,

> so the deprecation code can just call it).


There ought to be; I don't know it off the top of my head.

> > I haven't thought too hard about how excessively

> > long messages should be handled, or about the interactions with

> > -fmessage-length= (i.e., if messages longer than the limit should

> > be broken up.)

> 

> I believe that this will happen automatically.  The diagnostic

> display

> routine will automatically insert newlines into the output if the 

> message length is non-zero, and the message extends past this limit.

> (Well provided that the message contains spaces.  The newlines are

> only inserted in place of space characters so a very long, single

> word message will not be split...)

> 

> Cheers

>   Nick
Martin Sebor Feb. 6, 2018, 5:13 p.m. | #4
On 02/06/2018 05:39 AM, Nick Clifton wrote:
> Hi Martin,

>

>> My only suggestions are to consider how control characters and

>> excessively messages are handled in other contexts and adopt

>> the same approach here.

>

> Are there other places where user-supplied messages are displayed by gcc ?


There are a few that I know of.  I added an incomplete list to
the bug but I'm pretty sure I missed some.

>

>

>> In the tests I did, control characters

>> were mostly escaped (e.g., as \n or as \x0a) rather than replaced

>> with spaces.

>

> Ooo - what tests were these ?


Some of those David already mentioned (from my recent comment
on the bug):

1) #error and #warning where the string is printed as it appears
    in the source (i.e., with escape sequences),
2) -Wformat where control characters are escaped,
3) -Wformat-overflow where control characters are not escaped
    unless -fexec-charset= is specified.  I consider it a bug
    and plan to fix it.

> I agree that it there is a standard way to handle these characters

> then the deprecated attribute ought to do the same.  (Plus hopefully

> there is already a function in gcc to do this kind of processing,

> so the deprecation code can just call it).


My preference is the same as David's: escape control characters
(using standard C escape sequences, i.e., \a, \n, etc, or \xff
for those that don't have the shorthand notation).

>

>

>> I haven't thought too hard about how excessively

>> long messages should be handled, or about the interactions with

>> -fmessage-length= (i.e., if messages longer than the limit should

>> be broken up.)

>

> I believe that this will happen automatically.  The diagnostic display

> routine will automatically insert newlines into the output if the

> message length is non-zero, and the message extends past this limit.

> (Well provided that the message contains spaces.  The newlines are

> only inserted in place of space characters so a very long, single

> word message will not be split...)


You're right, it does seem to happen automatically for strings
in #warning and attribute deprecated.

Martin
Nick Clifton Feb. 7, 2018, 5:26 p.m. | #5
Hi Martin, Hi David,

OK - attached is a new patch that:

  * Replaces control characters with their escape equivalents.
  * Includes a testcase.

I was not sure what to do about the inconsistencies between the
behaviour of #warning and #pragma GCC warning, (and the error
equivalents), so I decided to leave it for now.  Fixing them
will require mucking about in the C preprocessor library, which
I did not fancy doing at the moment.

So, is this enhanced patch OK now ?

Cheers
  Nick

gcc/ChangeLog
2018-02-07  Nick Clifton  <nickc@redhat.com>

	PR 84195
	* tree.c (warn_deprecated_use): Replace control characters in
	deprecation messages with escape sequences.

gcc/testsuite/ChangeLog
2018-02-07  Nick Clifton  <nickc@redhat.com>

	* gcc.c-torture/compile/pr84195.c: New test.
David Malcolm Feb. 7, 2018, 7:17 p.m. | #6
On Wed, 2018-02-07 at 17:26 +0000, Nick Clifton wrote:
> Hi Martin, Hi David,

> 

> OK - attached is a new patch that:

> 

>   * Replaces control characters with their escape equivalents.

>   * Includes a testcase.

> 

> I was not sure what to do about the inconsistencies between the

> behaviour of #warning and #pragma GCC warning, (and the error

> equivalents), so I decided to leave it for now.  Fixing them

> will require mucking about in the C preprocessor library, which

> I did not fancy doing at the moment.

> 

> So, is this enhanced patch OK now ?

> 

> Cheers

>   Nick

> 

> gcc/ChangeLog

> 2018-02-07  Nick Clifton  <nickc@redhat.com>

> 

> 	PR 84195

> 	* tree.c (warn_deprecated_use): Replace control characters in

> 	deprecation messages with escape sequences.

> 

> gcc/testsuite/ChangeLog

> 2018-02-07  Nick Clifton  <nickc@redhat.com>

> 

> 	* gcc.c-torture/compile/pr84195.c: New test.


> Index: gcc/tree.c

> ===================================================================

> --- gcc/tree.c	(revision 257451)

> +++ gcc/tree.c	(working copy)

> @@ -12431,8 +12431,66 @@

>    if (attr)

>      attr = lookup_attribute ("deprecated", attr);

>  

> +  char * new_msg = NULL;

> +  unsigned int new_i = 0;

> +

>    if (attr)

> -    msg = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr)));

> +    {

> +      msg = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr)));

> +

> +      if (msg)

> +	{

> +	  /* PR 84195: Replace control characters in the message with their

> +	     escaped equivalents.  Allow newlines if -fmessage-length has

> +	     been set to a non-zero value.  


I'm not quite sure why we allow newlines in this case, sorry.

> This is done here, rather than

> +	     where the attribute is recorded as the message length can

> +	     change between these two locations.  */

> +

> +	  /* FIXME: Do we need to check to see if msg is longer than 2^32 ?  */

> +	  unsigned int i;


Please can you split this out into a subroutine, and please add selftests
for it, so that we can unit-test it directly.

Try grepping for e.g. tree_c_tests and other users of selftest.h to see
the idea.

This would be in addition to the DejaGnu-based test (I prefer a "belt and
braces" approach here).

Suggested unittest coverage:

- the empty string
- a non-empty string in which nothing is escaped
- a non-empty string in which everything is escaped
- all of the cases in the switch
- strings with and without newlines, since you're special-casing that

Ownership of the string seems fiddly, so given that gcc is implemented
in C++98, maybe instead of a subroutine, make it a class, with
responsibility for a string that we may or may not own; something like:

  class escaped_string
  {
  public:
     escaped_string (const char *unescaped);
     ~escaped_string () { if (m_owned) free (m_str); }
     operator const char *() const { return m_str; }
  private:
     char *m_str;
     bool m_owned;
  };

so you can (I hope) simply do:

  ASSERT_STREQ ("foo\\nbar", escaped_string ("foo\nbar"));

and similar for the various cases, without needing an explicit "free"
that can get lost if the function ever gains an early return.

(Or, better, have an escape_string function return an instance of a
class that has responsibility for the "if (ownership) free" dtor, but I
can't think of a good name for such a class right now).

Note that "make selftest-valgrind" will run the selftests under
valgrind (sadly, there are a few pre-existing leaks, even if you do
configure gcc with --enable-valgrind-annotations).

> +

> +	  for (i = 0; i < strlen (msg); i++)


Just get strlen once, rather than each time through the loop?
"size_t unescaped_len", maybe?

> +	    {

> +	      char c = msg[i];

> +

> +	      if (! ISCNTRL (c))

> +		{

> +		  if (new_msg)

> +		    new_msg[new_i++] = c;

> +		  continue;

> +		}

> +

> +	      if (c != '\n' || ! pp_is_wrapping_line (global_dc->printer))


Could a "bool needs_escaping_p (char)" subroutine clarify the logic
here?

> +		{


Maybe a comment here noting this is the first char needing escaping?

> +		  if (new_msg == NULL)

> +		    {

> +		      new_msg = (char *) xmalloc (strlen (msg) * 2);


Off by one, I think - shouldn't this be
  (strlen (msg) * 2) + 1
to allow for every char potentially being escaped, *plus* the NIL-
terminator?

Hence worth having a unittest for the "non-empty and everything is
escaped" case.

(and reuse the one-time strlen from above).

> +		      strncpy (new_msg, msg, i);

> +		      new_i = i;

> +		    }

> +		  new_msg [new_i++] = '\\';

> +		  switch (c)

> +		    {

> +		    case  7: new_msg[new_i++] = 'a'; break;

> +		    case  8: new_msg[new_i++] = 'b'; break;

> +		    case 27: new_msg[new_i++] = 'e'; break;

> +		    case 12: new_msg[new_i++] = 'f'; break;

> +		    case 10: new_msg[new_i++] = 'n'; break;

> +		    case 13: new_msg[new_i++] = 'r'; break;

> +		    case  9: new_msg[new_i++] = 't'; break;

> +		    case 11: new_msg[new_i++] = 'v'; break;


Can all these case labels be char constants, so e.g.:

    case '\n': new_msg[new_i++] = 'n'; break;

(does that work for every build/host combo?)

> +		    default: new_msg[new_i++] = '?'; break;

> +		    }

> +		}

> +	    }

> +

> +	  if (new_msg)

> +	    {

> +	      new_msg[new_i] = 0;

> +	      msg = new_msg;

> +	    }

> +	}

> +    }



>    else

>      msg = NULL;

>  

> @@ -12505,6 +12563,8 @@

>  	    }

>  	}

>      }

> +

> +  free (new_msg);

>  }

>  

>  /* Return true if REF has a COMPONENT_REF with a bit-field field declaration

> --- /dev/null	2018-02-07 09:26:20.608663241 +0000

> +++ gcc/testsuite/gcc.c-torture/compile/pr84195.c	2018-02-07 17:20:13.494587052 +0000

> @@ -0,0 +1,17 @@

> +/* { dg-options "-Wdeprecated-declarations" } */

> +

> +/* Check that MSG is printed without the escape characters being interpreted.

> +   Especially the newlines.

> +

> +   Note - gcc's behaviour is inconsistent in this regard as #error and

> +   #warning will also display control characters as escape sequences,

> +   whereas #pragma GCC error and #pragma GCC warning will perform the

> +   control operations of the control characters.  */

> +   

> +#define MSG "foo\n\t\rbar"

> +

> +int f (int i __attribute__ ((deprecated (MSG))))

> +{

> +  return 0 ? i : 0; /* { dg-warning "'i' is deprecated: foo.n.t.rbar" } */

> +}


We want the output to be:

  file:line:col: warning: 'i' is deprecated: foo\n\t\rbar [-Wblah-blah]

I think you can use "\\" to signifiy "\" in a DG test, so I *think*
this dg directive can
read:

  return 0 ? i : 0; /* { dg-warning "'i' is deprecated: foo\\n\\t\\rbar" } */


Hope this is constructive
Dave
Nick Clifton Feb. 8, 2018, 11:04 a.m. | #7
Hi David,

>> +	  /* PR 84195: Replace control characters in the message with their

>> +	     escaped equivalents.  Allow newlines if -fmessage-length has

>> +	     been set to a non-zero value.  

> 

> I'm not quite sure why we allow newlines in this case, sorry.


Because the documentation for -fmessage-length says:

  Try to format error messages so that they fit on lines 
  of about N characters.  If N is zero, then no 
  line-wrapping is done; each error message appears on a 
  single line.  This is the default for all front ends.

So with a non-zero message length, multi-line messages are allowed.

At least that was my understanding of the option.

Thanks for the patch review.  I will get onto fixing the points you
raised today.

Cheers
  Nick
Martin Sebor Feb. 8, 2018, 4:31 p.m. | #8
On 02/08/2018 04:04 AM, Nick Clifton wrote:
> Hi David,

>

>>> +	  /* PR 84195: Replace control characters in the message with their

>>> +	     escaped equivalents.  Allow newlines if -fmessage-length has

>>> +	     been set to a non-zero value.

>>

>> I'm not quite sure why we allow newlines in this case, sorry.

>

> Because the documentation for -fmessage-length says:

>

>   Try to format error messages so that they fit on lines

>   of about N characters.  If N is zero, then no

>   line-wrapping is done; each error message appears on a

>   single line.  This is the default for all front ends.

>

> So with a non-zero message length, multi-line messages are allowed.

>

> At least that was my understanding of the option.


It would be helpful to mention this somehow in the documentation
of each of the #-directives (i.e., that control characters are
escaped including newlines, subject to -fmessage-length).
Unless you want to handle that as part of the patch I'll see
about submitting a docs only change for the affected bits once
these bits have been committed.

Martin

>

> Thanks for the patch review.  I will get onto fixing the points you

> raised today.

>

> Cheers

>   Nick

>
Nick Clifton Feb. 9, 2018, 1:01 p.m. | #9
Hi David, Hi Martin,

  OK, take 3 of the patch is attached.  In this version:

  * The escaping is handled by a new class.
  * Self-tests have been added (and they helped find a bug - yay!)
  * The documentation has been extended to mention -fmessage-length's
    effect on the #-directives, and to add documentation for #pragma
    GCC error and #pragma GCC warning.  (Which appeared to be missing).

  I did try to add backslash characters to the regexp in the new testcase
  but I just could not find a way to persuade dejagnu to accept them.

  OK to apply ?

  (Apologies for the poor C++ coding - it is not my strong point.  I am
  an assembler level programmer at heart).

Cheers
  Nick

gcc/ChangeLog
2018-02-09  Nick Clifton  <nickc@redhat.com>

	PR 84195
	* tree.c (escaped_string): New class.  Converts an unescaped
	string into its escaped equivalent.
	(warn_deprecated_use): Use the new class to convert the
	deprecation message, if present.
	(test_escaped_strings): New self test.
	(test_c_tests): Add test_escaped_strings.

gcc/testsuite/ChangeLog
2018-02-07  Nick Clifton  <nickc@redhat.com>

 	* gcc.c-torture/compile/pr84195.c: New test.
David Malcolm Feb. 15, 2018, 9:04 p.m. | #10
On Fri, 2018-02-09 at 13:01 +0000, Nick Clifton wrote:
> Hi David, Hi Martin,

> 

>   OK, take 3 of the patch is attached.  In this version:

> 

>   * The escaping is handled by a new class.

>   * Self-tests have been added (and they helped find a bug - yay!)

>   * The documentation has been extended to mention -fmessage-length's

>     effect on the #-directives, and to add documentation for #pragma

>     GCC error and #pragma GCC warning.  (Which appeared to be

> missing).

> 

>   I did try to add backslash characters to the regexp in the new

> testcase

>   but I just could not find a way to persuade dejagnu to accept them.

> 

>   OK to apply ?

> 

>   (Apologies for the poor C++ coding - it is not my strong point.  I

> am

>   an assembler level programmer at heart).

> 

> Cheers

>   Nick


Thanks Nick, sorry for inflicting C++ on you.

This looks much better.  Out of interest, what bug did the selftests
help find?

Various nitpicks below

> gcc/ChangeLog

> 2018-02-09  Nick Clifton  <nickc@redhat.com>

> 

> 	PR 84195

> 	* tree.c (escaped_string): New class.  Converts an unescaped

> 	string into its escaped equivalent.

> 	(warn_deprecated_use): Use the new class to convert the

> 	deprecation message, if present.

> 	(test_escaped_strings): New self test.

> 	(test_c_tests): Add test_escaped_strings.


This doesn't mention the changes to the docs.

> gcc/testsuite/ChangeLog

> 2018-02-07  Nick Clifton  <nickc@redhat.com>

> 

>  	* gcc.c-torture/compile/pr84195.c: New test.


> Index: gcc/tree.c

> ===================================================================

> --- gcc/tree.c	(revision 257519)

> +++ gcc/tree.c	(working copy)

> @@ -12416,11 +12416,99 @@

>    return is_typedef_decl (TYPE_NAME (type));

>  }

>  

> +// A class to handle converting a string that might contain

> +// control characters, (eg newline, form-feed, etc), into one

> +// in which contains escape sequences instead.


We're still mostly using C-style comments for blocks, though I don't
think we have an actual rule about this.

> +class escaped_string

> +{

> + public:

> +  escaped_string () { m_owned = false; m_str = NULL; };

> +  ~escaped_string () { if (m_owned) free (m_str); }

> +  operator const char *() const { return (const char *) m_str; }

> +  void escape (const char *);

> + private:

> +  char * m_str;

> +  bool   m_owned;

> +};


I'd hoped that instead of this we could have an escape_string function
return an instance of a class that has responsibility for the "if
(ownership) free" dtor, but I don't think C++89 supports that (I had a
go at implementing it, but I think we'd need C++11's move semantics,
rather than just the NRVO).

So the approach above is OK by me (which, given that I suggested it,
may seem redundant :) )

> +void

> +escaped_string::escape (const char * unescaped)

> +{

> +  /* PR 84195: Replace control characters in "unescaped" with their

> +     escaped equivalents.  Allow newlines if -fmessage-length has

> +     been set to a non-zero value.  This is done here, rather than

> +     where the attribute is recorded as the message length can

> +     change between these two locations.  */


Move this comment to outside the function, as a descriptive comment for
the function as a whole.

> +  char * escaped;

> +  size_t i, new_i, len;

> +

> +  if (m_owned)

> +    free (m_str);

> +

> +  m_str = (char *) unescaped;

> +  m_owned = false;

> +

> +  if (unescaped == NULL || *unescaped == 0)

> +    return;


Is NULL a valid input here?  If so, please add test coverage for that
to the selftest.

The "*unescaped == 0" is presumably a micro-optimization for the
strlen(escaped) == 0 case, as it appears the code already handles this
case.

> +  len = strlen (unescaped);

> +  escaped = NULL;

> +  new_i = 0;

> +

> +  for (i = 0; i < len; i++)

> +    {

> +      char c = unescaped[i];

> +

> +      if (! ISCNTRL (c))

> +	{

> +	  if (escaped)

> +	    escaped[new_i++] = c;

> +	  continue;

> +	}

> +

> +      if (c != '\n' || ! pp_is_wrapping_line (global_dc->printer))

> +	{

> +	  if (escaped == NULL)

> +	    {

> +	      /* We only allocate space for a new string if we

> +		 actually encounter a control character that

> +		 needs replacing.  */

> +	      escaped = (char *) xmalloc (len * 2 + 1);

> +	      strncpy (escaped, unescaped, i);

> +	      new_i = i;

> +	    }

> +

> +	  escaped [new_i++] = '\\';


Some of the spacing in this function looks a bit odd to my eyes, but
you're much more familiar with the GNU standards than me, and I don't
want to be too nitpicky  (e.g. the space before an array access, and
the space after '!').

> +

> +	  switch (c)

> +	    {

> +	    case '\a': escaped[new_i++] = 'a'; break;

> +	    case '\b': escaped[new_i++] = 'b'; break;

> +	    case '\f': escaped[new_i++] = 'f'; break;

> +	    case '\n': escaped[new_i++] = 'n'; break;

> +	    case '\r': escaped[new_i++] = 'r'; break;

> +	    case '\t': escaped[new_i++] = 't'; break;

> +	    case '\v': escaped[new_i++] = 'v'; break;

> +	    default:   escaped[new_i++] = '?'; break;

> +	    }

> +	}

> +      else if (escaped)

> +	escaped[new_i++] = c;

> +    }

> +

> +  if (escaped)

> +    {

> +      escaped[new_i] = 0;

> +      m_str = escaped;

> +      m_owned = true;

> +    }

> +}

> +

>  /* Warn about a use of an identifier which was marked deprecated.  */

>  void

>  warn_deprecated_use (tree node, tree attr)

>  {

> -  const char *msg;

> +  class escaped_string msg;


"class" is redundant here.


>    if (node == 0 || !warn_deprecated_decl)

>      return;

> @@ -12442,9 +12530,7 @@

>      attr = lookup_attribute ("deprecated", attr);

>  

>    if (attr)

> -    msg = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr)));

> -  else

> -    msg = NULL;

> +    msg.escape (TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr))));

>    bool w;

>    if (DECL_P (node))

> @@ -12451,7 +12537,7 @@

>      {

>        if (msg)

>  	w = warning (OPT_Wdeprecated_declarations,

> -		     "%qD is deprecated: %s", node, msg);

> +		     "%qD is deprecated: %s", node, (const char *) msg);

>        else

>  	w = warning (OPT_Wdeprecated_declarations,

>  		     "%qD is deprecated", node);

> @@ -12478,7 +12564,7 @@

>  	    {

>  	      if (msg)

>  		w = warning (OPT_Wdeprecated_declarations,

> -			     "%qE is deprecated: %s", what, msg);

> +			     "%qE is deprecated: %s", what, (const char *) msg);

>  	      else

>  		w = warning (OPT_Wdeprecated_declarations,

>  			     "%qE is deprecated", what);

> @@ -12487,7 +12573,7 @@

>  	    {

>  	      if (msg)

>  		w = warning (OPT_Wdeprecated_declarations,

> -			     "type is deprecated: %s", msg);

> +			     "type is deprecated: %s", (const char *) msg);

>  	      else

>  		w = warning (OPT_Wdeprecated_declarations,

>  			     "type is deprecated");

> @@ -12501,7 +12587,7 @@

>  	    {

>  	      if (msg)

>  		warning (OPT_Wdeprecated_declarations, "%qE is deprecated: %s",

> -			 what, msg);

> +			 what, (const char *) msg);

>  	      else

>  		warning (OPT_Wdeprecated_declarations, "%qE is deprecated", what);

>  	    }

> @@ -12509,7 +12595,7 @@

>  	    {

>  	      if (msg)

>  		warning (OPT_Wdeprecated_declarations, "type is deprecated: %s",

> -			 msg);

> +			 (const char *) msg);

>  	      else

>  		warning (OPT_Wdeprecated_declarations, "type is deprecated");

>  	    }

> @@ -14548,6 +14634,43 @@

>    check_strip_nops (wrapped_int_var, int_var);

>  }

>  

> +/* Check that string escaping works correctly.  */

> +

> +static void

> +test_escaped_strings (void)

> +{

> +  int saved_cutoff;

> +  escaped_string msg;

> +

> +  msg.escape ("");

> +  ASSERT_STREQ ("", (const char *) msg);

> +

> +  msg.escape ("foobar");

> +  ASSERT_STREQ ("foobar", (const char *) msg);

> +

> +  /* Ensure that we have -fmessage-length set to 0.  */

> +  saved_cutoff = pp_line_cutoff (global_dc->printer);

> +  pp_line_cutoff (global_dc->printer) = 0;

> +

> +  msg.escape ("foo\nbar");

> +  ASSERT_STREQ ("foo\\nbar", (const char *) msg);

> +

> +  msg.escape ("\a\b\f\n\r\t\v");

> +  ASSERT_STREQ ("\\a\\b\\f\\n\\r\\t\\v", (const char *) msg);

> +

> +  /* Now repeat the tests with -fmessage-length set to 5.  */

> +  pp_line_cutoff (global_dc->printer) = 5;

> +

> +  msg.escape ("foo\nbar");

> +  ASSERT_STREQ ("foo\nbar", (const char *) msg);

> +

> +  msg.escape ("\a\b\f\n\r\t\v");

> +  ASSERT_STREQ ("\\a\\b\\f\n\\r\\t\\v", (const char *) msg);

> +

> +  /* Restore the original message length setting.  */

> +  pp_line_cutoff (global_dc->printer) = saved_cutoff;

> +}

> +

>  /* Run all of the selftests within this file.  */

>  

>  void

> @@ -14558,6 +14681,7 @@

>    test_labels ();

>    test_vector_cst_patterns ();

>    test_location_wrappers ();

> +  test_escaped_strings ();

>  }

>  

>  } // namespace selftest

> Index: gcc/doc/extend.texi

> ===================================================================

> --- gcc/doc/extend.texi	(revision 257519)

> +++ gcc/doc/extend.texi	(working copy)

> @@ -2563,6 +2563,9 @@

>  The @code{deprecated} attribute can also be used for variables and

>  types (@pxref{Variable Attributes}, @pxref{Type Attributes}.)

>  

> +The message attached to the attribute is affected by the setting of

> +the @option{-fmessage-length} option.

> +

>  @item error ("@var{message}")

>  @itemx warning ("@var{message}")

>  @cindex @code{error} function attribute

> @@ -6077,6 +6080,9 @@

>  types (@pxref{Common Function Attributes},

>  @pxref{Common Type Attributes}).

>  

> +The message attached to the attribute is affected by the setting of

> +the @option{-fmessage-length} option.

> +

>  @item nonstring

>  @cindex @code{nonstring} variable attribute

>  The @code{nonstring} variable attribute specifies that an object or member

> @@ -7035,11 +7041,16 @@

>  deprecated.  Line 5 has no warning because T3 is explicitly

>  deprecated.  Similarly for line 6.  The optional @var{msg}

>  argument, which must be a string, is printed in the warning if

> -present.

> +present.  Control characters in the string will be replaced with

> +spaces, and if the @option{-fmessage-length} option is set to 0 (its

> +default value) then any newline characters will also be replaced.

>  

>  The @code{deprecated} attribute can also be used for functions and

>  variables (@pxref{Function Attributes}, @pxref{Variable Attributes}.)

>  

> +The message attached to the attribute is affected by the setting of

> +the @option{-fmessage-length} option.

> +

>  @item designated_init

>  @cindex @code{designated_init} type attribute

>  This attribute may only be applied to structure types.  It indicates

> @@ -22372,7 +22383,9 @@

>  @cindex pragma, diagnostic

>  

>  Prints @var{string} as a compiler message on compilation.  The message

> -is informational only, and is neither a compilation warning nor an error.

> +is informational only, and is neither a compilation warning nor an

> +error.  Newlines can be included in the string by using the @samp{\n}

> +escape sequence.

>  

>  @smallexample

>  #pragma message "Compiling " __FILE__ "..."

> @@ -22392,6 +22405,37 @@

>  prints @samp{/tmp/file.c:4: note: #pragma message:

>  TODO - Remember to fix this}.

>  

> +@item #pragma GCC error @var{message}

> +@cindex pragma, diagnostic

> +Generates an error message.  This pragma @emph{is} considered to

> +indicate an error in the compilation, and it will be treated as such.

> +

> +Newlines can be included in the string by using the @samp{\n}

> +escape sequence.  They will be displayed as newlines even if the

> +@option{-fmessage-length} option is set to zero.

> +

> +The error is only generated if the pragma is present in the code after

> +pre-processing has been completed.  It does not matter however if the

> +code containing the pragma is unreachable:

> +

> +@smallexample

> +#if 0

> +#pragma GCC error "this error is not seen"

> +#endif

> +void foo (void)

> +@{

> +  return;

> +#pragma GCC error "this error is seen"

> +@}

> +@end smallexample

> +

> +@item #pragma GCC warning @var{message}

> +@cindex pragma, diagnostic

> +This is just like @samp{pragma GCC error} except that a warning

> +message is issued instead of an error message.  Unless

> +@option{-Werror} is in effect, in which case this pragma will generate

> +an error as well.

> +

>  @end table

>  

>  @node Visibility Pragmas

> Index: gcc/doc/invoke.texi

> ===================================================================

> --- gcc/doc/invoke.texi	(revision 257519)

> +++ gcc/doc/invoke.texi	(working copy)

> @@ -3496,6 +3496,11 @@

>  done; each error message appears on a single line.  This is the

>  default for all front ends.

>  

> +Note - this option also affects the display of the @samp{#error} and

> +@samp{#warning} pre-processor directives, and the @samp{deprecated}

> +function/type/variable attribute.  It does not however affect the

> +@samp{pragma GCC warning} and @samp{pragma GCC error} pragmas.

> +

>  @item -fdiagnostics-show-location=once

>  @opindex fdiagnostics-show-location

>  Only meaningful in line-wrapping mode.  Instructs the diagnostic messages

> --- /dev/null	2018-02-08 08:21:34.475648704 +0000

> +++ gcc/testsuite/gcc.c-torture/compile/pr84195.c	2018-02-07 17:20:13.494587052 +0000

> @@ -0,0 +1,17 @@

> +/* { dg-options "-Wdeprecated-declarations" } */

> +

> +/* Check that MSG is printed without the escape characters being interpreted.

> +   Especially the newlines.

> +

> +   Note - gcc's behaviour is inconsistent in this regard as #error and

> +   #warning will also display control characters as escape sequences,

> +   whereas #pragma GCC error and #pragma GCC warning will perform the

> +   control operations of the control characters.  */

> +   

> +#define MSG "foo\n\t\rbar"

> +

> +int f (int i __attribute__ ((deprecated (MSG))))

> +{

> +  return 0 ? i : 0; /* { dg-warning "'i' is deprecated: foo.n.t.rbar" } */

> +}


Thanks for updating/fixing the docs.

Other than the nitpicks above, this looks good.

I believe I can approve this with my "diagnostics messages" maintainer
hat on, but given that we're deep in stage 4 and this isn't a
regression, please queue it up for gcc 9 stage 1 (unless there's a
pressing need to get this into gcc 8, in which case talk to the release
managers!)

Dave
Trevor Saunders Feb. 16, 2018, 12:14 p.m. | #11
On Thu, Feb 15, 2018 at 04:04:45PM -0500, David Malcolm wrote:
> On Fri, 2018-02-09 at 13:01 +0000, Nick Clifton wrote:

> > +class escaped_string

> > +{

> > + public:

> > +  escaped_string () { m_owned = false; m_str = NULL; };

> > +  ~escaped_string () { if (m_owned) free (m_str); }

> > +  operator const char *() const { return (const char *) m_str; }

> > +  void escape (const char *);

> > + private:

> > +  char * m_str;

> > +  bool   m_owned;

> > +};

> 

> I'd hoped that instead of this we could have an escape_string function

> return an instance of a class that has responsibility for the "if

> (ownership) free" dtor, but I don't think C++89 supports that (I had a

> go at implementing it, but I think we'd need C++11's move semantics,

> rather than just the NRVO).


I think either gcc::unique_ptr does what you want here, or you could
use the same trick of having a copy ctor that takes a non constant
reference to implement this in C++98.

Thanks!

Trev
Nick Clifton Feb. 16, 2018, 12:43 p.m. | #12
Hi David,

  Attached is a revised version of the patch which I hope addresses all 
  of your (very helpful) comments on the v3 patch.

  OK to apply once the sources are back on stage 1 ?

Cheers
  Nick

gcc/ChangeLog
2018-02-09  Nick Clifton  <nickc@redhat.com>

	PR 84195
	* tree.c (escaped_string): New class.  Converts an unescaped
	string into its escaped equivalent.
	(warn_deprecated_use): Use the new class to convert the
	deprecation message, if present.
	(test_escaped_strings): New self test.
	(test_c_tests): Add test_escaped_strings.
	* doc/extend.texi (deprecated): Add a note that the
	deprecation message is affected by the -fmessage-length
	option, and that control characters will be escaped.
	(#pragma GCC error): Document this pragma.
	(#pragma GCC warning): Likewise.
	* doc/invoke.texi (-fmessage-length): Document this option's
	effect on the #warning and #error preprocessor directives and
	the deprecated attribute.
	
gcc/testsuite/ChangeLog
2018-02-09  Nick Clifton  <nickc@redhat.com>

	PR 84195
	* gcc.c-torture/compile/pr84195.c: New test.
David Malcolm Feb. 16, 2018, 3:33 p.m. | #13
On Fri, 2018-02-16 at 12:43 +0000, Nick Clifton wrote:
> Hi David,

> 

>   Attached is a revised version of the patch which I hope addresses

> all 

>   of your (very helpful) comments on the v3 patch.

> 

>   OK to apply once the sources are back on stage 1 ?


Yes.

Thanks!
Dave

> Cheers

>   Nick

> 

> gcc/ChangeLog

> 2018-02-09  Nick Clifton  <nickc@redhat.com>

> 

> 	PR 84195

> 	* tree.c (escaped_string): New class.  Converts an unescaped

> 	string into its escaped equivalent.

> 	(warn_deprecated_use): Use the new class to convert the

> 	deprecation message, if present.

> 	(test_escaped_strings): New self test.

> 	(test_c_tests): Add test_escaped_strings.

> 	* doc/extend.texi (deprecated): Add a note that the

> 	deprecation message is affected by the -fmessage-length

> 	option, and that control characters will be escaped.

> 	(#pragma GCC error): Document this pragma.

> 	(#pragma GCC warning): Likewise.

> 	* doc/invoke.texi (-fmessage-length): Document this option's

> 	effect on the #warning and #error preprocessor directives and

> 	the deprecated attribute.

> 	

> gcc/testsuite/ChangeLog

> 2018-02-09  Nick Clifton  <nickc@redhat.com>

> 

> 	PR 84195

> 	* gcc.c-torture/compile/pr84195.c: New test.
David Malcolm Feb. 16, 2018, 3:38 p.m. | #14
On Fri, 2018-02-16 at 07:14 -0500, Trevor Saunders wrote:
> On Thu, Feb 15, 2018 at 04:04:45PM -0500, David Malcolm wrote:

> > On Fri, 2018-02-09 at 13:01 +0000, Nick Clifton wrote:

> > > +class escaped_string

> > > +{

> > > + public:

> > > +  escaped_string () { m_owned = false; m_str = NULL; };

> > > +  ~escaped_string () { if (m_owned) free (m_str); }

> > > +  operator const char *() const { return (const char *) m_str; }

> > > +  void escape (const char *);

> > > + private:

> > > +  char * m_str;

> > > +  bool   m_owned;

> > > +};

> > 

> > I'd hoped that instead of this we could have an escape_string

> > function

> > return an instance of a class that has responsibility for the "if

> > (ownership) free" dtor, but I don't think C++89 supports that (I

> > had a

> > go at implementing it, but I think we'd need C++11's move

> > semantics,

> > rather than just the NRVO).

> 

> I think either gcc::unique_ptr does what you want here, 


That would involve a second heap allocation; as-is, the object lives on
the stack.

> or you could

> use the same trick of having a copy ctor that takes a non constant

> reference to implement this in C++98.


Maybe.  I was trying not to bash Nick over the head too much with C++
here :)  If you want to implement it, go for it (I can think of
possible uses for it in the driver, for fixing leaks there when used by
libgccjit).

> Thanks!

> 

> Trev
Martin Sebor Feb. 16, 2018, 4:19 p.m. | #15
On 02/16/2018 05:43 AM, Nick Clifton wrote:
> Hi David,

>

>   Attached is a revised version of the patch which I hope addresses all

>   of your (very helpful) comments on the v3 patch.

>

>   OK to apply once the sources are back on stage 1 ?


Thanks (also) for adding the documentation!

I have just one minor observation/suggestion for the escaped_string
class.  Feel free not to change anything, it's just a possible gotcha
if the class were to become more widely used than it is now.

Since the class manages a resource it should ideally make sure it
doesn't try to release the same resource multiple times.  I.e., its
copy constructor and assignment operators should either "do the right
thing" (whatever you think that is) or be made inaccessible (or declared 
deleted in C++ 11).

For example:

   {
     escaped_string a;
     a.escape ("foo\nbar");

     escaped_string b (a);
     // b destroys its m_str
     // double free in a's destructor here
   }

(Some day GCC will have a warning to help detect this pitfall.)

Martin

>

> Cheers

>   Nick

>

> gcc/ChangeLog

> 2018-02-09  Nick Clifton  <nickc@redhat.com>

>

> 	PR 84195

> 	* tree.c (escaped_string): New class.  Converts an unescaped

> 	string into its escaped equivalent.

> 	(warn_deprecated_use): Use the new class to convert the

> 	deprecation message, if present.

> 	(test_escaped_strings): New self test.

> 	(test_c_tests): Add test_escaped_strings.

> 	* doc/extend.texi (deprecated): Add a note that the

> 	deprecation message is affected by the -fmessage-length

> 	option, and that control characters will be escaped.

> 	(#pragma GCC error): Document this pragma.

> 	(#pragma GCC warning): Likewise.

> 	* doc/invoke.texi (-fmessage-length): Document this option's

> 	effect on the #warning and #error preprocessor directives and

> 	the deprecated attribute.

> 	

> gcc/testsuite/ChangeLog

> 2018-02-09  Nick Clifton  <nickc@redhat.com>

>

> 	PR 84195

> 	* gcc.c-torture/compile/pr84195.c: New test.

>
Nick Clifton Feb. 20, 2018, 12:48 p.m. | #16
Hi Martin,


> Since the class manages a resource it should ideally make sure it

> doesn't try to release the same resource multiple times.  I.e., its

> copy constructor and assignment operators should either "do the right

> thing" (whatever you think that is) or be made inaccessible (or declared deleted in C++ 11).

> 

> For example:

> 

>   {

>     escaped_string a;

>     a.escape ("foo\nbar");

> 

>     escaped_string b (a);

>     // b destroys its m_str

>     // double free in a's destructor here

>   }


I am not sure that this can happen.  First of the escaped_string
class does not have constructor that accepts a char* argument.
(Maybe in C++ this is done automatically ?  My C++-fu is very weak).

Secondly the m_owned private field is only set to true when
the m_str field is set to a string allocated by the particular
instance of the class, and memory is only freed by the destructor 
if m_owned is true.  

So even this should work:

  {
    escaped_string a,b;

    a.escape ("foo\nbar");
    b.escape ((const char *) a);
  }

The destructor for B will not free any memory, even though its
m_str field has been set to the same string as A, because its
m_owned field will be set to FALSE.

Cheers
  Nick
Martin Sebor Feb. 21, 2018, 12:07 a.m. | #17
On 02/20/2018 05:48 AM, Nick Clifton wrote:
> Hi Martin,

>

>

>> Since the class manages a resource it should ideally make sure it

>> doesn't try to release the same resource multiple times.  I.e., its

>> copy constructor and assignment operators should either "do the right

>> thing" (whatever you think that is) or be made inaccessible (or declared deleted in C++ 11).

>>

>> For example:

>>

>>   {

>>     escaped_string a;

>>     a.escape ("foo\nbar");

>>

>>     escaped_string b (a);

>>     // b destroys its m_str

>>     // double free in a's destructor here

>>   }

>

> I am not sure that this can happen.  First of the escaped_string

> class does not have constructor that accepts a char* argument.

> (Maybe in C++ this is done automatically ?  My C++-fu is very weak).


It doesn't have a converting ctor and the above example doesn't
use one.  It uses the implicitly defined copy constructor to
create a copy of A in B.

>

> Secondly the m_owned private field is only set to true when

> the m_str field is set to a string allocated by the particular

> instance of the class, and memory is only freed by the destructor

> if m_owned is true.


Right, and implicitly-defined copy constructor copies all the
fields so both the original and the copy wind up pointing to
the same allocated chunk and both with m_owned set to true.

The same thing happens when an object of the calss is assigned
to another, thanks to the implicitly-defined copy assignment
operator.

> So even this should work:

>

>   {

>     escaped_string a,b;

>

>     a.escape ("foo\nbar");

>     b.escape ((const char *) a);

>   }


Yes, this works but this wouldn't:

   {
     escaped_string a, b;

     a.escape ("foo\nbar");
     b = a;
   }   // double free in a's dtor here

What also wouldn't work right is the converse:

   {
     escaped_string a, b;

     a.escape ("foo\nbar");
     a = b;   // a.m_str leaks here (it's reset by the assignment)
   }

Your patch works correctly because it doesn't use the copy
constructor or the copy assignment operator.  So this is
only a hypothetical concern, if a change in the code led
to one of the copy functions being called.

Hope this makes sense.

Martin

>

> The destructor for B will not free any memory, even though its

> m_str field has been set to the same string as A, because its

> m_owned field will be set to FALSE.

>

> Cheers

>   Nick

>
Jeff Law May 2, 2018, 5 p.m. | #18
On 02/16/2018 05:43 AM, Nick Clifton wrote:
> Hi David,

> 

>   Attached is a revised version of the patch which I hope addresses all 

>   of your (very helpful) comments on the v3 patch.

> 

>   OK to apply once the sources are back on stage 1 ?

> 

> Cheers

>   Nick

> 

> gcc/ChangeLog

> 2018-02-09  Nick Clifton  <nickc@redhat.com>

> 

> 	PR 84195

> 	* tree.c (escaped_string): New class.  Converts an unescaped

> 	string into its escaped equivalent.

> 	(warn_deprecated_use): Use the new class to convert the

> 	deprecation message, if present.

> 	(test_escaped_strings): New self test.

> 	(test_c_tests): Add test_escaped_strings.

> 	* doc/extend.texi (deprecated): Add a note that the

> 	deprecation message is affected by the -fmessage-length

> 	option, and that control characters will be escaped.

> 	(#pragma GCC error): Document this pragma.

> 	(#pragma GCC warning): Likewise.

> 	* doc/invoke.texi (-fmessage-length): Document this option's

> 	effect on the #warning and #error preprocessor directives and

> 	the deprecated attribute.

> 	

> gcc/testsuite/ChangeLog

> 2018-02-09  Nick Clifton  <nickc@redhat.com>

> 

> 	PR 84195

> 	* gcc.c-torture/compile/pr84195.c: New test.

The docs still say "Control characters in the string will be replaced
with spaces", but they're being escaped now.  That needs to be fixed.


I note you overload the cast operator in your new class.  Why not just
use an accessor?  Was this style something someone else suggested?



Onward to the nits :-)

+  char *m_str;
+  bool  m_owned;

I don't think we try to line up variable definitions like that.


+             escaped = (char *) xmalloc (len * 2 + 1);
I believe that generally we want a slower growth pattern.  You'll find
that we regularly do something like len * 3 / 2 + 1.  Seems wise here too.





+void
+escaped_string::escape (const char * unescaped)
No need for the space between the * and unescaped.

+         switch (c)
+           {
+           case '\a': escaped[new_i++] = 'a'; break;
+           case '\b': escaped[new_i++] = 'b'; break;
+           case '\f': escaped[new_i++] = 'f'; break;
+           case '\n': escaped[new_i++] = 'n'; break;
+           case '\r': escaped[new_i++] = 'r'; break;
+           case '\t': escaped[new_i++] = 't'; break;
+           case '\v': escaped[new_i++] = 'v'; break;
+           default:   escaped[new_i++] = '?'; break;
+           }
I believe our coding standards would have the statements on new lines
rather than on the line with the case label.


So I think the biggest question here is the cast overload vs just using
an accessor.

Jeff
Jakub Jelinek May 2, 2018, 5:04 p.m. | #19
On Wed, May 02, 2018 at 11:00:20AM -0600, Jeff Law wrote:
> +             escaped = (char *) xmalloc (len * 2 + 1);

> I believe that generally we want a slower growth pattern.  You'll find

> that we regularly do something like len * 3 / 2 + 1.  Seems wise here too.


We don't want to use xmalloc in these cases, but XNEWVEC (char, ...).

	Jakub
Nick Clifton May 3, 2018, 9:55 a.m. | #20
Hi Jeff,

  Thanks for the review.

> The docs still say "Control characters in the string will be replaced

> with spaces", but they're being escaped now.  That needs to be fixed.


Done.

> I note you overload the cast operator in your new class.  Why not just

> use an accessor?  Was this style something someone else suggested?


Yup.  My C++ foo is very weak, so I asked Martin for help, and that
was he suggested.  Is this method wrong ?



> Onward to the nits :-)


:-)

> +  char *m_str;

> +  bool  m_owned;

> 

> I don't think we try to line up variable definitions like that.


OK. Fixed.

> +             escaped = (char *) xmalloc (len * 2 + 1);

> I believe that generally we want a slower growth pattern.  You'll find

> that we regularly do something like len * 3 / 2 + 1.  Seems wise here too.


Also done.

> +void

> +escaped_string::escape (const char * unescaped)

> No need for the space between the * and unescaped.


Et tu Jeff ?  Then fall Nick. :-)  [Fixed]


> +           case '\v': escaped[new_i++] = 'v'; break;

> +           default:   escaped[new_i++] = '?'; break;

> +           }

> I believe our coding standards would have the statements on new lines

> rather than on the line with the case label.


Sorted.

> So I think the biggest question here is the cast overload vs just using

> an accessor.


OK, so demonstrating my ignorance: what is the accessor solution to this problem ?

Cheers
  Nick

PS.  Revised patch attached in case the current solution of casting the operators is OK.
Jeff Law May 30, 2018, 11:11 p.m. | #21
On 05/03/2018 03:55 AM, Nick Clifton wrote:
> Hi Jeff,

> 

>   Thanks for the review.

> 

>> The docs still say "Control characters in the string will be replaced

>> with spaces", but they're being escaped now.  That needs to be fixed.

> 

> Done.

> 

>> I note you overload the cast operator in your new class.  Why not just

>> use an accessor?  Was this style something someone else suggested?

> 

> Yup.  My C++ foo is very weak, so I asked Martin for help, and that

> was he suggested.  Is this method wrong ?

As is so often the case, there's more than one way to do everything.
Overloading operators is allowed, but it's generally not my first choice.

> 

>> So I think the biggest question here is the cast overload vs just using

>> an accessor.

> 

> OK, so demonstrating my ignorance: what is the accessor solution to this problem ?

So an accessor is just a member function to retrieve the data.  You
could embed the cast there to convert.    So instead of:


+  operator const char *() const { return (const char *) m_str; }

You have something like:

const char *get (void) { return (const char *) m_str; }


And instead of casting at the use site, you just call the get() method.

I think your overloaded cast is simple and unsurprising (which are
essentially the requirements for using an operator overload in our
conventions).  So I can live with the overload.

For reference:

https://gcc.gnu.org/codingconventions.html#Over_Oper

> 

> PS.  Revised patch attached in case the current solution of casting the operators is OK.

OK.  Please install.

Thanks,
jeff
Martin Jambor June 18, 2018, 2:04 p.m. | #22
Hi Nick,

On Fri, Feb 16 2018, Nick Clifton wrote:
> Hi David,

>

>   Attached is a revised version of the patch which I hope addresses all 

>   of your (very helpful) comments on the v3 patch.

>

>   OK to apply once the sources are back on stage 1 ?

>

> Cheers

>   Nick

>

> gcc/ChangeLog

> 2018-02-09  Nick Clifton  <nickc@redhat.com>

>

> 	PR 84195

> 	* tree.c (escaped_string): New class.  Converts an unescaped

> 	string into its escaped equivalent.

> 	(warn_deprecated_use): Use the new class to convert the

> 	deprecation message, if present.

> 	(test_escaped_strings): New self test.

> 	(test_c_tests): Add test_escaped_strings.

> 	* doc/extend.texi (deprecated): Add a note that the

> 	deprecation message is affected by the -fmessage-length

> 	option, and that control characters will be escaped.

> 	(#pragma GCC error): Document this pragma.

> 	(#pragma GCC warning): Likewise.

> 	* doc/invoke.texi (-fmessage-length): Document this option's

> 	effect on the #warning and #error preprocessor directives and

> 	the deprecated attribute.

> 	


I'm getting a bootstrap failure:

/home/mjambor/gcc/trunk/src/gcc/tree.c:12457:20: error: cast from type ‘const char*’ to type ‘char*’ casts away qualifiers [-Werror=cast-qual]
   m_str = (char *) unescaped;

that I believe is caused by this patch, because the warning complains
about the newly added code:

> --- gcc/tree.c	(revision 257653)

> +++ gcc/tree.c	(working copy)

> @@ -12416,11 +12416,101 @@

> 


...

> 

> +/* PR 84195: Replace control characters in "unescaped" with their

> +   escaped equivalents.  Allow newlines if -fmessage-length has

> +   been set to a non-zero value.  This is done here, rather than

> +   where the attribute is recorded as the message length can

> +   change between these two locations.  */

> +

> +void

> +escaped_string::escape (const char * unescaped)

> +{

> +  char *escaped;

> +  size_t i, new_i, len;

> +

> +  if (m_owned)

> +    free (m_str);

> +

> +  m_str = (char *) unescaped;


specifically about the line above, and the complaint seems quite valid.

Sorry for bad news,

Martin
Nick Clifton June 18, 2018, 4:29 p.m. | #23
Hi Martin,

> I'm getting a bootstrap failure:


*sigh* yes - my bad.  Fortunately a patch has already been posted:

  https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01039.html

And I think that it has now been approved.

Cheers
  Nick
Jason Merrill June 18, 2018, 5:56 p.m. | #24
On Fri, Feb 16, 2018 at 11:19 AM, Martin Sebor <msebor@gmail.com> wrote:
> On 02/16/2018 05:43 AM, Nick Clifton wrote:

>>

>> Hi David,

>>

>>   Attached is a revised version of the patch which I hope addresses all

>>   of your (very helpful) comments on the v3 patch.

>>

>>   OK to apply once the sources are back on stage 1 ?

>

>

> Thanks (also) for adding the documentation!

>

> I have just one minor observation/suggestion for the escaped_string

> class.  Feel free not to change anything, it's just a possible gotcha

> if the class were to become more widely used than it is now.

>

> Since the class manages a resource it should ideally make sure it

> doesn't try to release the same resource multiple times.  I.e., its

> copy constructor and assignment operators should either "do the right

> thing" (whatever you think that is) or be made inaccessible (or declared

> deleted in C++ 11).

>

> For example:

>

>   {

>     escaped_string a;

>     a.escape ("foo\nbar");

>

>     escaped_string b (a);

>     // b destroys its m_str

>     // double free in a's destructor here

>   }

>

> (Some day GCC will have a warning to help detect this pitfall.)


-Wdeprecated-copy, which I recently added to -Wall, will warn if such
a class is actually copied.  Though I'm not sure it will stay in
-Wall, or keep that name.

Jason

Patch

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 257389)
+++ gcc/tree.c	(working copy)
@@ -12436,6 +12436,39 @@ 
   else
     msg = NULL;
 
+  char * new_msg = NULL;
+  if (msg)
+    {
+      unsigned int i; /* FIXME: Do we need to check to see if msg is longer
+			 than 2^32 ?  */
+
+      /* PR 84195: Sanitize the message:
+	 
+	 If -fmessage-length is set to 0 then replace newline characters with
+	 spaces.  Replace other non-printing ASCII characters with spaces too.
+
+	 We do this check here, rather than where the deprecated attribute is
+	 recorded because the setting of -fmessage-length can be changed
+	 between those two locations.  */
+      for (i = 0; i < strlen (msg); i++)
+	{
+	  char c = msg[i];
+	  
+	  if (! ISCNTRL (c))
+	    continue;
+
+	  if (c != '\n' || ! pp_is_wrapping_line (global_dc->printer))
+	    {
+	      if (new_msg == NULL)
+		new_msg = xstrdup (msg);
+	      new_msg [i] = ' ';
+	    }
+	}
+
+      if (new_msg)
+	msg = new_msg;
+    }
+
   bool w;
   if (DECL_P (node))
     {
@@ -12505,6 +12538,8 @@ 
 	    }
 	}
     }
+
+  free (new_msg);
 }
 
 /* Return true if REF has a COMPONENT_REF with a bit-field field declaration