PR preprocessor/84517 allow double-underscore macros after string literals

Message ID CAH6eHdTfKxbucGxh7FGjRcWeNKhAqfQQO+k0pC7CjAaDKfy8Vw@mail.gmail.com
State New
Headers show
Series
  • PR preprocessor/84517 allow double-underscore macros after string literals
Related show

Commit Message

Jonathan Wakely Feb. 27, 2018, 2:41 p.m.
Since the fix for PR c++/80955 any suffix on a string literal that
begins with an underscore is assumed to be a user-defined literal
suffix, not a macro. This assumption is invalid for a suffix beginning
with two underscores, because such names are reserved and can't be used
for UDLs anyway. Checking for exactly one underscore restores support
for macro expansion in cases like "File: "__FILE__ or "Date: "__DATE__
(which are formally ill-formed but accepted with a warning, as a
conforming extension).

gcc/testsuite:

        PR preprocessor/84517
        * g++.dg/cpp0x/udlit-macros.C: Expect a warning for ""__FILE__.

libcpp:

        PR preprocessor/84517
        * lex.c (is_macro_not_literal_suffix): New function.
        (lex_raw_string, lex_string): Use is_macro_not_literal_suffix to
        decide when to issue -Wliteral-suffix warnings.


Tested powerpc64le-linux, OK for trunk?
commit 9781a25354e07503690259d7fa95e9dd459c51b2
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Feb 22 19:59:38 2018 +0000

    PR preprocessor/84517 allow double-underscore macros after string literals
    
    Since the fix for PR c++/80955 any suffix on a string literal that
    begins with an underscore is assumed to be a user-defined literal
    suffix, not a macro. This assumption is invalid for a suffix beginning
    with two underscores, because such names are reserved and can't be used
    for UDLs anyway. Checking for exactly one underscore restores support
    for macro expansion in cases like "File: "__FILE__ or "Date: "__DATE__
    (which are formally ill-formed but accepted with a warning, as a
    conforming extension).
    
    gcc/testsuite:
    
            PR preprocessor/84517
            * g++.dg/cpp0x/udlit-macros.C: Expect a warning for ""__FILE__.
    
    libcpp:
    
            PR preprocessor/84517
            * lex.c (is_macro_not_literal_suffix): New function.
            (lex_raw_string, lex_string): Use is_macro_not_literal_suffix to
            decide when to issue -Wliteral-suffix warnings.

Comments

Tim Song Feb. 27, 2018, 4:49 p.m. | #1
On Tue, Feb 27, 2018 at 9:41 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> Since the fix for PR c++/80955 any suffix on a string literal that

> begins with an underscore is assumed to be a user-defined literal

> suffix, not a macro. This assumption is invalid for a suffix beginning

> with two underscores, because such names are reserved and can't be used

> for UDLs anyway.


[lex.name]/3 reserves all identifiers containing double underscore,
but the spaceless one-token form does not actually use such an
identifier.

See the (equally reserved) _Bq example in [over.literal]/8:

double operator""_Bq(long double);                  // OK: does not
use the reserved identifier _­Bq ([lex.name])
double operator"" _Bq(long double);                 // uses the
reserved identifier _­Bq ([lex.name])
Jonathan Wakely Feb. 27, 2018, 4:59 p.m. | #2
On 27 February 2018 at 16:49, Tim Song <t.canens.cpp@gmail.com> wrote:
> On Tue, Feb 27, 2018 at 9:41 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:

>> Since the fix for PR c++/80955 any suffix on a string literal that

>> begins with an underscore is assumed to be a user-defined literal

>> suffix, not a macro. This assumption is invalid for a suffix beginning

>> with two underscores, because such names are reserved and can't be used

>> for UDLs anyway.

>

> [lex.name]/3 reserves all identifiers containing double underscore,

> but the spaceless one-token form does not actually use such an

> identifier.

>

> See the (equally reserved) _Bq example in [over.literal]/8:

>

> double operator""_Bq(long double);                  // OK: does not

> use the reserved identifier _­Bq ([lex.name])

> double operator"" _Bq(long double);                 // uses the

> reserved identifier _­Bq ([lex.name])


I know, but GCC doesn't implement the rule accurately. I reported PR
80955 because GCC's UDL parsing meant we reject valid programs. The
fix for that bug was a bit of a hack, simply adding a special case so
that a suffix starting with an underscore is never expanded as a
macro, which makes the above examples do the right thing. But that
introduces a regression where we no longer accept ""__FILE__ (because
it starts with an underscore), where previously that was accepted as
an extension, just like "%"PRIu64 is accepted (with a warning).

After the hack for 80955 we still do the wrong thing for:

#define foo
int operator""foo();

But that can't appear in a valid program, so we get away with it. But
it's still a hack.

My patch doesn't try to change how we parse operator""_Bq it just
ensures we accept ""__FILE__ and similar cases. I think my patch means
we reject:

int operator""__X(unsigned long long);

But that also can't appear in a valid program, so again we get away with it.
Jonathan Wakely Feb. 27, 2018, 5:13 p.m. | #3
On 27 February 2018 at 16:59, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> On 27 February 2018 at 16:49, Tim Song <t.canens.cpp@gmail.com> wrote:

>> On Tue, Feb 27, 2018 at 9:41 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:

>>> Since the fix for PR c++/80955 any suffix on a string literal that

>>> begins with an underscore is assumed to be a user-defined literal

>>> suffix, not a macro. This assumption is invalid for a suffix beginning

>>> with two underscores, because such names are reserved and can't be used

>>> for UDLs anyway.

>>

>> [lex.name]/3 reserves all identifiers containing double underscore,

>> but the spaceless one-token form does not actually use such an

>> identifier.

>>

>> See the (equally reserved) _Bq example in [over.literal]/8:

>>

>> double operator""_Bq(long double);                  // OK: does not

>> use the reserved identifier _­Bq ([lex.name])

>> double operator"" _Bq(long double);                 // uses the

>> reserved identifier _­Bq ([lex.name])

>

> I know, but GCC doesn't implement the rule accurately. I reported PR

> 80955 because GCC's UDL parsing meant we reject valid programs. The

> fix for that bug was a bit of a hack, simply adding a special case so

> that a suffix starting with an underscore is never expanded as a

> macro, which makes the above examples do the right thing. But that

> introduces a regression where we no longer accept ""__FILE__ (because

> it starts with an underscore), where previously that was accepted as

> an extension, just like "%"PRIu64 is accepted (with a warning).

>

> After the hack for 80955 we still do the wrong thing for:

>

> #define foo

> int operator""foo();

>

> But that can't appear in a valid program, so we get away with it. But

> it's still a hack.

>

> My patch doesn't try to change how we parse operator""_Bq it just

> ensures we accept ""__FILE__ and similar cases. I think my patch means

> we reject:

>

> int operator""__X(unsigned long long);

>

> But that also can't appear in a valid program, so again we get away with it.


Sorry for being inaccurate, what I mean is the problem case where a
macro clashes with a UDL can't appear (because defining the macro
would be invalid):

#define __X
int operator""__X(unsigned long long);

However, that doesn't help for this case:

int operator""__FILE__(unsigned long long);

This is allowed as a UDL, and doesn't use a reserved identifier
(because it doesn't define __FILE__, the implementation does), but
would fail after my patch because __FILE__ gets incorrectly expanded
as a macro. I don't have much sympathy for anybody defining such a
UDL.
Jason Merrill Feb. 27, 2018, 6:11 p.m. | #4
OK.

On Tue, Feb 27, 2018 at 9:41 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> Since the fix for PR c++/80955 any suffix on a string literal that

> begins with an underscore is assumed to be a user-defined literal

> suffix, not a macro. This assumption is invalid for a suffix beginning

> with two underscores, because such names are reserved and can't be used

> for UDLs anyway. Checking for exactly one underscore restores support

> for macro expansion in cases like "File: "__FILE__ or "Date: "__DATE__

> (which are formally ill-formed but accepted with a warning, as a

> conforming extension).

>

> gcc/testsuite:

>

>         PR preprocessor/84517

>         * g++.dg/cpp0x/udlit-macros.C: Expect a warning for ""__FILE__.

>

> libcpp:

>

>         PR preprocessor/84517

>         * lex.c (is_macro_not_literal_suffix): New function.

>         (lex_raw_string, lex_string): Use is_macro_not_literal_suffix to

>         decide when to issue -Wliteral-suffix warnings.

>

>

> Tested powerpc64le-linux, OK for trunk?

Patch

diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-macros.C b/gcc/testsuite/g++.dg/cpp0x/udlit-macros.C
index fb518281811..7ef324b7e04 100644
--- a/gcc/testsuite/g++.dg/cpp0x/udlit-macros.C
+++ b/gcc/testsuite/g++.dg/cpp0x/udlit-macros.C
@@ -16,7 +16,7 @@  int operator""_ID(const char*, size_t) { return 0; }
 int main()
 {
   long i64 = 123;
-  char buf[100];
+  char buf[] = "xxxxxx"__FILE__;      // { dg-warning "invalid suffix on literal" }
   sprintf(buf, "%"PRId64"abc", i64);  // { dg-warning "invalid suffix on literal" }
   return strcmp(buf, "123abc")
 	 + ""_zero
diff --git a/libcpp/lex.c b/libcpp/lex.c
index 92c62517a4d..37c365a3560 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -1630,6 +1630,21 @@  is_macro(cpp_reader *pfile, const uchar *base)
   return !result ? false : (result->type == NT_MACRO);
 }
 
+/* Returns true if a literal suffix does not have the expected form
+   and is defined as a macro.  */
+
+static bool
+is_macro_not_literal_suffix(cpp_reader *pfile, const uchar *base)
+{
+  /* User-defined literals outside of namespace std must start with a single
+     underscore, so assume anything of that form really is a UDL suffix.
+     We don't need to worry about UDLs defined inside namespace std because
+     their names are reserved, so cannot be used as macro names in valid
+     programs.  */
+  if (base[0] == '_' && base[1] != '_')
+    return false;
+  return is_macro (pfile, base);
+}
 
 /* Lexes a raw string.  The stored string contains the spelling, including
    double quotes, delimiter string, '(' and ')', any leading
@@ -1900,10 +1915,8 @@  lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base,
     {
       /* If a string format macro, say from inttypes.h, is placed touching
 	 a string literal it could be parsed as a C++11 user-defined string
-	 literal thus breaking the program.
-	 Try to identify macros with is_macro. A warning is issued.
-	 The macro name should not start with '_' for this warning. */
-      if ((*cur != '_') && is_macro (pfile, cur))
+	 literal thus breaking the program.  */
+      if (is_macro_not_literal_suffix (pfile, cur))
 	{
 	  /* Raise a warning, but do not consume subsequent tokens.  */
 	  if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)
@@ -2031,10 +2044,8 @@  lex_string (cpp_reader *pfile, cpp_token *token, const uchar *base)
     {
       /* If a string format macro, say from inttypes.h, is placed touching
 	 a string literal it could be parsed as a C++11 user-defined string
-	 literal thus breaking the program.
-	 Try to identify macros with is_macro. A warning is issued.
-	 The macro name should not start with '_' for this warning. */
-      if ((*cur != '_') && is_macro (pfile, cur))
+	 literal thus breaking the program.  */
+      if (is_macro_not_literal_suffix (pfile, cur))
 	{
 	  /* Raise a warning, but do not consume subsequent tokens.  */
 	  if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)