warn for strlen of arrays with missing nul (PR 86552)

Message ID a0d277c4-41c5-61a8-0284-5b1b245e2c74@gmail.com
State New
Headers show
Series
  • warn for strlen of arrays with missing nul (PR 86552)
Related show

Commit Message

Martin Sebor July 19, 2018, 8:08 p.m.
In the discussion of my patch for pr86532 Bernd noted that
GCC silently accepts constant character arrays with no
terminating nul as arguments to strlen (and other string
functions).

The attached patch is a first step in detecting these kinds
of bugs in strlen calls by issuing -Wstringop-overflow.
The next step is to modify all other handlers of built-in
functions to detect the same problem (not part of this patch).
Yet another step is to detect these problems in arguments
initialized using the non-string form:

   const char a[] = { 'a', 'b', 'c' };

This patch is meant to apply on top of the one for bug 86532
(I tested it with an earlier version of that patch so there
is code in the context that does not appear in the latest
version of the other diff).

Martin

Comments

Martin Sebor July 25, 2018, 11:38 p.m. | #1
Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html

The fix for bug 86532 has been checked in so this enhancement
can now be applied on top of it (with only minor adjustments).

On 07/19/2018 02:08 PM, Martin Sebor wrote:
> In the discussion of my patch for pr86532 Bernd noted that

> GCC silently accepts constant character arrays with no

> terminating nul as arguments to strlen (and other string

> functions).

>

> The attached patch is a first step in detecting these kinds

> of bugs in strlen calls by issuing -Wstringop-overflow.

> The next step is to modify all other handlers of built-in

> functions to detect the same problem (not part of this patch).

> Yet another step is to detect these problems in arguments

> initialized using the non-string form:

>

>   const char a[] = { 'a', 'b', 'c' };

>

> This patch is meant to apply on top of the one for bug 86532

> (I tested it with an earlier version of that patch so there

> is code in the context that does not appear in the latest

> version of the other diff).

>

> Martin

>
Martin Sebor July 30, 2018, 7:17 p.m. | #2
Attached is an updated version of the patch that handles more
instances of calling strlen() on a constant array that is not
a nul-terminated string.

No other functions except strlen are explicitly handled yet,
and neither are constant arrays with braced-initializer lists
like const char a[] = { 'a', 'b', 'c' };  I am testing
an independent solution for those (bug 86552).  Once those
are handled the warning will be able to detect those as well.

Tested on x86_64-linux.

On 07/25/2018 05:38 PM, Martin Sebor wrote:
> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html

>

> The fix for bug 86532 has been checked in so this enhancement

> can now be applied on top of it (with only minor adjustments).

>

> On 07/19/2018 02:08 PM, Martin Sebor wrote:

>> In the discussion of my patch for pr86532 Bernd noted that

>> GCC silently accepts constant character arrays with no

>> terminating nul as arguments to strlen (and other string

>> functions).

>>

>> The attached patch is a first step in detecting these kinds

>> of bugs in strlen calls by issuing -Wstringop-overflow.

>> The next step is to modify all other handlers of built-in

>> functions to detect the same problem (not part of this patch).

>> Yet another step is to detect these problems in arguments

>> initialized using the non-string form:

>>

>>   const char a[] = { 'a', 'b', 'c' };

>>

>> This patch is meant to apply on top of the one for bug 86532

>> (I tested it with an earlier version of that patch so there

>> is code in the context that does not appear in the latest

>> version of the other diff).

>>

>> Martin

>>

>
PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:

	PR tree-optimization/86552
	* builtins.h (warn_string_no_nul): Declare..
	(c_strlen): Add argument.
	* builtins.c (warn_string_no_nul): New function.
	(fold_builtin_strlen): Add argument.  Detect missing nul.
	(fold_builtin_1): Adjust.
	(string_length): Add argument and use it.
	(c_strlen): Same.
	(expand_builtin_strlen): Detect missing nul.
	* expr.c (string_constant): Add arguments.  Detect missing nul
	terminator and outermost declaration it's missing in.
	* expr.h (string_constant): Add argument.
	* fold-const.c (c_getstr): Change argument to bool*, rename
	other arguments.
	* fold-const-call.c (fold_const_call): Detect missing nul.
	* gimple-fold.c (get_range_strlen): Add argument.
	(get_maxval_strlen): Adjust.
	* gimple-fold.h (get_range_strlen): Add argument.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86552
	* gcc.dg/warn-string-no-nul.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index aa3e0d8..f4924d5 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
 static rtx expand_builtin_expect (tree, rtx);
 static tree fold_builtin_constant_p (tree);
 static tree fold_builtin_classify_type (tree);
-static tree fold_builtin_strlen (location_t, tree, tree);
+static tree fold_builtin_strlen (location_t, tree, tree, tree);
 static tree fold_builtin_inf (location_t, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
@@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
   return n;
 }
 
+/* For a call expression EXP to a function that expects a string argument,
+   issue a diagnostic due to it being a called with an argument NONSTR
+   that is a character array with no terminating NUL.  */
+
+void
+warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
+{
+  loc = expansion_point_location_if_in_system_header (loc);
+
+  bool warned;
+  if (exp)
+    {
+      if (!fndecl)
+	fndecl = get_callee_fndecl (exp);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%K%qD argument missing terminating nul",
+			   exp, fndecl);
+    }
+  else
+    {
+      gcc_assert (fndecl);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%qD argument missing terminating nul",
+			   fndecl);
+    }
+
+  if (warned && DECL_P (nonstr))
+    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
+}
+
 /* Compute the length of a null-terminated character string or wide
    character string handling character sizes of 1, 2, and 4 bytes.
    TREE_STRING_LENGTH is not the right way because it evaluates to
@@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
    accesses.  Note that this implies the result is not going to be emitted
    into the instruction stream.
 
+   When ARR is non-null and the string is not properly nul-terminated,
+   set *ARR to the declaration of the outermost constant object whose
+   initializer (or one of its elements) is not nul-terminated.
+
    The value returned is of type `ssizetype'.
 
    Unfortunately, string_constant can't access the values of const char
    arrays with initializers, so neither can we do so here.  */
 
 tree
-c_strlen (tree src, int only_value)
+c_strlen (tree src, int only_value, tree *arr /* = NULL */)
 {
   STRIP_NOPS (src);
+
+  /* Used to detect non-nul-terminated strings in subexpressions
+     of a conditional expression.  When ARR is null, point it at
+     one of the elements for simplicity.  */
+  tree arrs[] = { NULL_TREE, NULL_TREE };
+  if (!arr)
+    arr = arrs;
+
   if (TREE_CODE (src) == COND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
     {
-      tree len1, len2;
-
-      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
-      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
+      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
+      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
       if (tree_int_cst_equal (len1, len2))
-	return len1;
+	{
+	  *arr = arrs[0] ? arrs[0] : arrs[1];
+	  return len1;
+	}
     }
 
   if (TREE_CODE (src) == COMPOUND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
-    return c_strlen (TREE_OPERAND (src, 1), only_value);
+    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
 
   location_t loc = EXPR_LOC_OR_LOC (src, input_location);
 
   /* Offset from the beginning of the string in bytes.  */
   tree byteoff;
-  src = string_constant (src, &byteoff);
-  if (src == 0)
-    return NULL_TREE;
+  /* Set if array is nul-terminated, false otherwise.  */
+  bool nulterm;
+  src = string_constant (src, &byteoff, &nulterm, arr);
+  if (!src)
+    {
+      *arr = arrs[0] ? arrs[0] : arrs[1];
+      return NULL_TREE;
+    }
+
+  /* Clear *ARR when the string is nul-terminated.  It should be
+     of no interest to callers.  */
+  if (nulterm)
+    *arr = NULL_TREE;
 
   /* Determine the size of the string element.  */
   unsigned eltsize
@@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
 	maxelts = maxelts / eltsize - 1;
       }
 
+  /* Unless the caller is prepared to handle it by passing in a non-null
+     ARR, fail if the terminating nul doesn't fit in the array the string
+     is stored in (as in const char a[3] = "123";  */
+  if (!arr && maxelts < strelts)
+    return NULL_TREE;
+
   /* PTR can point to the byte representation of any string type, including
      char* and wchar_t*.  */
   const char *ptr = TREE_STRING_POINTER (src);
@@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
       offsave = fold_convert (ssizetype, offsave);
       tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
 				      build_int_cst (ssizetype, len * eltsize));
-      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
+				     offsave);
       return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
 			      build_zero_cst (ssizetype));
     }
@@ -690,7 +750,7 @@ c_strlen (tree src, int only_value)
      Since ELTOFF is our starting index into the string, no further
      calculation is needed.  */
   unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
-				maxelts - eltoff);
+				strelts - eltoff);
 
   return ssize_int (len);
 }
@@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target,
 
   struct expand_operand ops[4];
   rtx pat;
-  tree len;
   tree src = CALL_EXPR_ARG (exp, 0);
   rtx src_reg;
   rtx_insn *before_strlen;
@@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target,
   unsigned int align;
 
   /* If the length can be computed at compile-time, return it.  */
-  len = c_strlen (src, 0);
+  tree array;
+  tree len = c_strlen (src, 0, &array);
   if (len)
-    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    {
+      if (array)
+	{
+	  /* Array refers to the non-nul terminated constant array
+	     whose length is attempted to be computed.  */
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    }
 
   /* If the length can be computed at compile-time and is constant
      integer, but there are side-effects in src, evaluate
      src for side-effects, then return len.
      E.g. x = strlen (i++ ? "xfoo" + 1 : "bar");
      can be optimized into: i++; x = 3;  */
-  len = c_strlen (src, 1);
-  if (len && TREE_CODE (len) == INTEGER_CST)
+  len = c_strlen (src, 1, &array);
+  if (len)
     {
-      expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
-      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+      if (array)
+	{
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+
+      if (TREE_CODE (len) == INTEGER_CST)
+	{
+	  expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
+	  return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+	}
     }
 
   align = get_pointer_alignment (src) / BITS_PER_UNIT;
@@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
   return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
 }
 
-/* Fold a call to __builtin_strlen with argument ARG.  */
+/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
 
 static tree
-fold_builtin_strlen (location_t loc, tree type, tree arg)
+fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
 {
   if (!validate_arg (arg, POINTER_TYPE))
     return NULL_TREE;
   else
     {
-      tree len = c_strlen (arg, 0);
-
+      tree arr = NULL_TREE;
+      tree len = c_strlen (arg, 0, &arr);
       if (len)
-	return fold_convert_loc (loc, type, len);
+	{
+	  if (loc == UNKNOWN_LOCATION && EXPR_HAS_LOCATION (arg))
+	    loc = EXPR_LOCATION (arg);
+
+	  /* To avoid warning multiple times about non-nul-terminated
+	     strings only warn if their length has been determined
+	     and it's being folded.  */
+	  if (arr)
+	    warn_string_no_nul (loc, NULL_TREE, fndecl, arr);
+
+	  return fold_convert_loc (loc, type, len);
+	}
 
       return NULL_TREE;
     }
@@ -9175,7 +9264,7 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
       return fold_builtin_classify_type (arg0);
 
     case BUILT_IN_STRLEN:
-      return fold_builtin_strlen (loc, type, arg0);
+      return fold_builtin_strlen (loc, fndecl, type, arg0);
 
     CASE_FLT_FN (BUILT_IN_FABS):
     CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 2e0a2f9..73b0b0b 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -58,7 +58,7 @@ extern bool get_pointer_alignment_1 (tree, unsigned int *,
 				     unsigned HOST_WIDE_INT *);
 extern unsigned int get_pointer_alignment (tree);
 extern unsigned string_length (const void*, unsigned, unsigned);
-extern tree c_strlen (tree, int);
+extern tree c_strlen (tree, int, tree * = NULL);
 extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
 extern void expand_builtin_update_setjmp_buf (rtx);
@@ -103,7 +103,7 @@ extern bool target_char_cst_p (tree t, char *p);
 
 extern internal_fn associated_internal_fn (tree);
 extern internal_fn replacement_internal_fn (gcall *);
-
+extern void warn_string_no_nul (location_t, tree, tree, tree);
 extern tree max_object_size ();
 
 #endif /* GCC_BUILTINS_H */
diff --git a/gcc/expr.c b/gcc/expr.c
index de6709d..edbd7f8 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11271,10 +11271,14 @@ is_aligning_offset (const_tree offset, const_tree exp)
 /* Return the tree node if an ARG corresponds to a string constant or zero
    if it doesn't.  If we return nonzero, set *PTR_OFFSET to the (possibly
    non-constant) offset in bytes within the string that ARG is accessing.
+   If NULTERM is non-null, consider valid even sequences of characters that
+   aren't nul-terminated strings.  In that case, set NULTERM if ARG refers
+   to such a sequence and clear it otherwise.
    The type of the offset is sizetype.  */
 
 tree
-string_constant (tree arg, tree *ptr_offset)
+string_constant (tree arg, tree *ptr_offset, bool *nulterm /* = NULL */,
+		 tree *decl /* = NULL */)
 {
   tree array;
   STRIP_NOPS (arg);
@@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset)
 	return NULL_TREE;
 
       tree offset;
-      if (tree str = string_constant (arg0, &offset))
+      if (tree str = string_constant (arg0, &offset, nulterm, decl))
 	{
 	  /* Avoid pointers to arrays (see bug 86622).  */
 	  if (POINTER_TYPE_P (TREE_TYPE (arg))
@@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset)
   if (TREE_CODE (array) == STRING_CST)
     {
       *ptr_offset = fold_convert (sizetype, offset);
+      if (nulterm)
+	*nulterm = true;
+      if (decl)
+	*decl = NULL_TREE;
       return array;
     }
 
@@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
   if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
     return NULL_TREE;
 
+  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
+
+  /* When ARG refers to an aggregate (of arrays) try to determine
+     the size of the character array within the aggregate.  */
+  tree ref = arg;
+  tree reftype = TREE_TYPE (arg);
+
+  if (TREE_CODE (ref) == MEM_REF)
+    {
+      ref = TREE_OPERAND (ref, 0);
+      if (TREE_CODE (ref) == ADDR_EXPR)
+	{
+	  ref = TREE_OPERAND (ref, 0);
+	  reftype = TREE_TYPE (ref);
+	}
+    }
+  else
+    while (TREE_CODE (ref) == ARRAY_REF)
+      {
+	reftype = TREE_TYPE (ref);
+	ref = TREE_OPERAND (ref, 0);
+      }
+
+  if (TREE_CODE (ref) == COMPONENT_REF)
+    reftype = TREE_TYPE (ref);
+
+  while (TREE_CODE (reftype) == ARRAY_TYPE)
+    {
+      tree next = TREE_TYPE (reftype);
+      if (TREE_CODE (next) == INTEGER_TYPE)
+	{
+	  if (tree size = TYPE_SIZE_UNIT (reftype))
+	    if (tree_fits_uhwi_p (size))
+	      array_elts = tree_to_uhwi (size);
+	  break;
+	}
+
+      reftype = TREE_TYPE (reftype);
+    }
+
+  if (decl)
+    *decl = array;
+
   /* Avoid returning a string that doesn't fit in the array
      it is stored in, like
      const char a[4] = "abcde";
@@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
   unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
   length = string_length (TREE_STRING_POINTER (init), charsize,
 			  length / charsize);
-  if (compare_tree_int (array_size, length + 1) < 0)
+  if (nulterm)
+    *nulterm = array_elts > length;
+  else if (array_elts <= length)
     return NULL_TREE;
 
   *ptr_offset = offset;
diff --git a/gcc/expr.h b/gcc/expr.h
index cf047d4..e630979 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -288,7 +288,7 @@ expand_normal (tree exp)
 
 /* Return the tree node and offset if a given argument corresponds to
    a string constant.  */
-extern tree string_constant (tree, tree *);
+extern tree string_constant (tree, tree *, bool * = NULL, tree * = NULL);
 
 /* Two different ways of generating switch statements.  */
 extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, profile_probability);
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 06a42060..849a443 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -1199,9 +1199,14 @@ fold_const_call (combined_fn fn, tree type, tree arg)
   switch (fn)
     {
     case CFN_BUILT_IN_STRLEN:
-      if (const char *str = c_getstr (arg))
-	return build_int_cst (type, strlen (str));
-      return NULL_TREE;
+      {
+	bool nulterm;
+	if (const char *str = c_getstr (arg, NULL, &nulterm))
+	  if (nulterm)
+	    return build_int_cst (type, strlen (str));
+
+	return NULL_TREE;
+      }
 
     CASE_CFN_NAN:
     CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NAN):
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index b318fc77..ecbc38c 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -14577,23 +14577,23 @@ fold_build_pointer_plus_hwi_loc (location_t loc, tree ptr, HOST_WIDE_INT off)
 /* Return a pointer P to a NUL-terminated string representing the sequence
    of constant characters referred to by SRC (or a subsequence of such
    characters within it if SRC is a reference to a string plus some
-   constant offset).  If STRLEN is non-null, store stgrlen(P) in *STRLEN.
-   If STRSIZE is non-null, store in *STRSIZE the size of the array
-   the string is stored in; in that case, even though P points to a NUL
-   terminated string, SRC need not refer to one.  This can happen when
-   SRC refers to a constant character array initialized to all non-NUL
-   values, as in the C declaration: char a[4] = "1234";  */
+   constant offset).  If STRSIZE is non-null, store the size of the string
+   literal in *STRSIZE, including any embedded or terminating nuls.  If
+   If NULLTERM is non-null, set *NULLTERM if the referenced string is
+   guaranteed to contain a terminating NUL.  Otherwise clear it.  This
+   can happen in the case of valid C declarations such as:
+   const char a[3] = "123";  */
 
 const char *
-c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
-	  unsigned HOST_WIDE_INT *strsize /* = NULL */)
+c_getstr (tree src, unsigned HOST_WIDE_INT *strsize /* = NULL */,
+	  bool *nulterm /* = NULL */)
 {
   tree offset_node;
 
-  if (strlen)
-    *strlen = 0;
+  if (strsize)
+    *strsize = 0;
 
-  src = string_constant (src, &offset_node);
+  src = string_constant (src, &offset_node, nulterm);
   if (src == 0)
     return NULL;
 
@@ -14606,47 +14606,42 @@ c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
 	offset = tree_to_uhwi (offset_node);
     }
 
-  /* STRING_LENGTH is the size of the string literal, including any
-     embedded NULs.  STRING_SIZE is the size of the array the string
+  /* STRING_SIZE is the size of the string literal, including any
+     embedded NULs.  ARRAY_SIZE is the size of the array the string
      literal is stored in.  */
-  unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src);
-  unsigned HOST_WIDE_INT string_size = string_length;
+  unsigned HOST_WIDE_INT string_size = TREE_STRING_LENGTH (src);
+  unsigned HOST_WIDE_INT array_size = string_size;
   tree type = TREE_TYPE (src);
   if (tree size = TYPE_SIZE_UNIT (type))
     if (tree_fits_shwi_p (size))
-      string_size = tree_to_uhwi (size);
+      array_size = tree_to_uhwi (size);
+
+  const char *string = TREE_STRING_POINTER (src);
 
-  if (strlen)
+  if (strsize)
     {
-      /* Compute and store the length of the substring at OFFSET.
+      /* Compute and store the size of the substring at OFFSET.
 	 All offsets past the initial length refer to null strings.  */
-      if (offset <= string_length)
-	*strlen = string_length - offset;
+      if (offset <= string_size)
+	*strsize = string_size - offset;
       else
-	*strlen = 0;
+	*strsize = 0;
     }
 
-  const char *string = TREE_STRING_POINTER (src);
-
-  if (string_length == 0
-      || offset >= string_size)
+  if (string_size == 0
+      || offset >= array_size)
     return NULL;
 
-  if (strsize)
-    {
-      /* Support even constant character arrays that aren't proper
-	 NUL-terminated strings.  */
-      *strsize = string_size;
-    }
-  else if (string[string_length - 1] != '\0')
+  if (!nulterm && string[string_size - 1] != '\0')
     {
-      /* Support only properly NUL-terminated strings but handle
-	 consecutive strings within the same array, such as the six
-	 substrings in "1\0002\0003".  */
+      /* When NULTERM is null, support only properly nul-terminated
+	 strings but handle consecutive strings within the same array,
+	 such as the six substrings in "1\0002\0003".  Otherwise, let
+	 the caller deal with non-nul-terminated arrays.  */
       return NULL;
     }
 
-  return offset <= string_length ? string + offset : "";
+  return offset <= string_size ? string + offset : "";
 }
 
 /* Given a tree T, compute which bits in T may be nonzero.  */
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 1b9ccc0..a58a4a2 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -188,7 +188,7 @@ extern tree const_unop (enum tree_code, tree, tree);
 extern tree const_binop (enum tree_code, tree, tree, tree);
 extern bool negate_mathfn_p (combined_fn);
 extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL,
-			     unsigned HOST_WIDE_INT * = NULL);
+			     bool * = NULL);
 extern wide_int tree_nonzero_bits (const_tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index c3fa570..9eefb37 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1275,11 +1275,13 @@ gimple_fold_builtin_memset (gimple_stmt_iterator *gsi, tree c, tree len)
    Set *FLEXP to true if the range of the string lengths has been
    obtained from the upper bound of an array at the end of a struct.
    Such an array may hold a string that's longer than its upper bound
-   due to it being used as a poor-man's flexible array member.  */
+   due to it being used as a poor-man's flexible array member.
+   Clear *NULTERM if ARG refers to a constant array that is known
+   not be nul-terminated.  */
 
 static bool
 get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
-		  int fuzzy, bool *flexp)
+		  int fuzzy, bool *flexp, bool *nulterm)
 {
   tree var, val = NULL_TREE;
   gimple *def_stmt;
@@ -1301,7 +1303,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	      if (TREE_CODE (aop0) == INDIRECT_REF
 		  && TREE_CODE (TREE_OPERAND (aop0, 0)) == SSA_NAME)
 		return get_range_strlen (TREE_OPERAND (aop0, 0),
-					 length, visited, type, fuzzy, flexp);
+					 length, visited, type, fuzzy, flexp,
+					 nulterm);
 	    }
 	  else if (TREE_CODE (TREE_OPERAND (op, 0)) == COMPONENT_REF && fuzzy)
 	    {
@@ -1329,13 +1332,18 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	    return false;
 	}
       else
-	val = c_strlen (arg, 1);
+	{
+	  tree arr;
+	  val = c_strlen (arg, 1, &arr);
+	  if (val && arr)
+	    *nulterm = false;
+	}
 
       if (!val && fuzzy)
 	{
 	  if (TREE_CODE (arg) == ADDR_EXPR)
 	    return get_range_strlen (TREE_OPERAND (arg, 0), length,
-				     visited, type, fuzzy, flexp);
+				     visited, type, fuzzy, flexp, nulterm);
 
 	  if (TREE_CODE (arg) == ARRAY_REF)
 	    {
@@ -1477,7 +1485,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
             || gimple_assign_unary_nop_p (def_stmt))
           {
             tree rhs = gimple_assign_rhs1 (def_stmt);
-	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp);
+	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp,
+				     nulterm);
           }
 	else if (gimple_assign_rhs_code (def_stmt) == COND_EXPR)
 	  {
@@ -1486,7 +1495,7 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 
 	    for (unsigned int i = 0; i < 2; i++)
 	      if (!get_range_strlen (ops[i], length, visited, type, fuzzy,
-				     flexp))
+				     flexp, nulterm))
 		{
 		  if (fuzzy == 2)
 		    *maxlen = build_all_ones_cst (size_type_node);
@@ -1513,7 +1522,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
             if (arg == gimple_phi_result (def_stmt))
               continue;
 
-	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp))
+	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp,
+				   nulterm))
 	      {
 		if (fuzzy == 2)
 		  *maxlen = build_all_ones_cst (size_type_node);
@@ -1545,19 +1555,27 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
    and false if PHIs and COND_EXPRs are to be handled optimistically,
    if we can determine string length minimum and maximum; it will use
    the minimum from the ones where it can be determined.
-   STRICT false should be only used for warning code.  */
+   STRICT false should be only used for warning code.
+   When non-null, clear *NULTERM if ARG refers to a constant array
+   that is known not be nul-terminated.  Otherwise set it to true.  */
 
 bool
-get_range_strlen (tree arg, tree minmaxlen[2], bool strict)
+get_range_strlen (tree arg, tree minmaxlen[2], bool strict /* = false */,
+		  bool *nulterm /* = NULL */)
 {
   bitmap visited = NULL;
 
   minmaxlen[0] = NULL_TREE;
   minmaxlen[1] = NULL_TREE;
 
+  bool nultermbuf;
+  if (!nulterm)
+    nulterm = &nultermbuf;
+  *nulterm = true;
+
   bool flexarray = false;
   if (!get_range_strlen (arg, minmaxlen, &visited, 1, strict ? 1 : 2,
-			 &flexarray))
+			 &flexarray, nulterm))
     {
       minmaxlen[0] = NULL_TREE;
       minmaxlen[1] = NULL_TREE;
@@ -1576,7 +1594,7 @@ get_maxval_strlen (tree arg, int type)
   tree len[2] = { NULL_TREE, NULL_TREE };
 
   bool dummy;
-  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy))
+  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy, NULL))
     len[1] = NULL_TREE;
   if (visited)
     BITMAP_FREE (visited);
@@ -3496,12 +3514,14 @@ static bool
 gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
+  tree arg = gimple_call_arg (stmt, 0);
 
   wide_int minlen;
   wide_int maxlen;
 
   tree lenrange[2];
-  if (!get_range_strlen (gimple_call_arg (stmt, 0), lenrange, true)
+  bool nulterm;
+  if (!get_range_strlen (arg, lenrange, true, &nulterm)
       && lenrange[0] && TREE_CODE (lenrange[0]) == INTEGER_CST
       && lenrange[1] && TREE_CODE (lenrange[1]) == INTEGER_CST)
     {
@@ -3523,6 +3543,10 @@ gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
 
   if (minlen == maxlen)
     {
+      if (!nulterm)
+	warn_string_no_nul (gimple_location (stmt), NULL_TREE,
+			    gimple_call_fndecl (stmt), arg);
+
       lenrange[0] = force_gimple_operand_gsi (gsi, lenrange[0], true, NULL,
 					      true, GSI_SAME_STMT);
       replace_call_with_value (gsi, lenrange[0]);
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 04e9bfa..fe11728 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -25,7 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
 extern tree canonicalize_constructor_val (tree, tree);
 extern tree get_symbol_constant_value (tree);
-extern bool get_range_strlen (tree, tree[2], bool = false);
+extern bool get_range_strlen (tree, tree[2], bool = false, bool * = NULL);
 extern tree get_maxval_strlen (tree, int);
 extern void gimplify_and_update_call_from_tree (gimple_stmt_iterator *, tree);
 extern bool fold_stmt (gimple_stmt_iterator *);
diff --git a/gcc/testsuite/gcc.dg/warn-string-no-nul.c b/gcc/testsuite/gcc.dg/warn-string-no-nul.c
new file mode 100644
index 0000000..838528f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/warn-string-no-nul.c
@@ -0,0 +1,239 @@
+/* PR tree-optimization/86552 - missing warning for reading past the end
+   of non-string arrays
+   { dg-do compile }
+   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+const char a[5] = "12345";   /* { dg-message "declared here" } */
+
+int i0 = 0;
+int i1 = 1;
+
+void sink (int, ...);
+
+#define CONCAT(a, b)   a ## b
+#define CAT(a, b)      CONCAT(a, b)
+
+#define T(str)					\
+  __attribute__ ((noipa))			\
+  void CAT (test_, __LINE__) (void) {		\
+    sink (strlen (str));			\
+  } typedef void dummy_type
+
+T (a);                /* { dg-warning "argument missing terminating nul" }  */
+T (&a[0]);            /* { dg-warning "nul" }  */
+T (&a[0] + 1);        /* { dg-warning "nul" }  */
+T (&a[1]);            /* { dg-warning "nul" }  */
+T (&a[i0]);           /* { dg-warning "nul" }  */
+T (&a[i0] + 1);       /* { dg-warning "nul" }  */
+
+
+const char b[][5] = { /* { dg-message "declared here" } */
+  "12", "123", "1234", "54321"
+};
+
+T (b[0]);
+T (b[1]);
+T (b[2]);
+T (b[3]);             /* { dg-warning "nul" }  */
+T (b[i0]);
+
+T (&b[2][1]);
+T (&b[2][1] + 1);
+T (&b[2][i0]);
+T (&b[2][1] + i0);
+
+T (&b[3][1]);         /* { dg-warning "nul" }  */
+T (&b[3][1] + 1);     /* { dg-warning "nul" }  */
+T (&b[3][i0]);        /* { dg-warning "nul" }  */
+T (&b[3][1] + i0);    /* { dg-warning "nul" }  */
+T (&b[3][i0] + i1);   /* { dg-warning "nul" }  */
+
+T (i0 ? "" : b[0]);
+T (i0 ? "" : b[1]);
+T (i0 ? "" : b[2]);
+T (i0 ? "" : b[3]);               /* { dg-warning "nul" }  */
+T (i0 ? b[0] : "");
+T (i0 ? b[1] : "");
+T (i0 ? b[2] : "");
+T (i0 ? b[3] : "");               /* { dg-warning "nul" }  */
+
+T (i0 ? "1234" : b[3]);           /* { dg-warning "nul" }  */
+T (i0 ? b[3] : "1234");           /* { dg-warning "nul" }  */
+
+T (i0 ? a : b[3]);                /* { dg-warning "nul" }  */
+T (i0 ? b[0] : b[2]);
+T (i0 ? b[2] : b[3]);             /* { dg-warning "nul" }  */
+T (i0 ? b[3] : b[2]);             /* { dg-warning "nul" }  */
+
+T (i0 ? b[0] : &b[3][0] + 1);     /* { dg-warning "nul" }  */
+T (i0 ? b[1] : &b[3][1] + i0);    /* { dg-warning "nul" }  */
+
+/* It's possible to detect the missing nul in the following two
+   expressions but GCC doesn't do it yet.  */
+T (i0 ? &b[3][1] + i0 : b[2]);    /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
+T (i0 ? &b[3][i0] : &b[3][i1]);   /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
+
+
+struct A { char a[5], b[5]; };
+
+const struct A s = { "1234", "12345" };
+
+T (s.a);
+T (&s.a[0]);
+T (&s.a[0] + 1);
+T (&s.a[0] + i0);
+T (&s.a[1]);
+T (&s.a[1] + 1);
+T (&s.a[1] + i0);
+
+T (s.b);              /* { dg-warning "nul" }  */
+T (&s.b[0]);          /* { dg-warning "nul" }  */
+T (&s.b[0] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[0] + i0);     /* { dg-warning "nul" }  */
+T (&s.b[1]);          /* { dg-warning "nul" }  */
+T (&s.b[1] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[1] + i0);     /* { dg-warning "nul" }  */
+
+struct B { struct A a[2]; };
+
+const struct B ba[] = {
+  { { { "123", "12345" }, { "12345", "123" } } },
+  { { { "12345", "123" }, { "123", "12345" } } },
+  { { { "1", "12" },      { "123", "1234" } } },
+  { { { "123", "1234" },  { "12345", "12" } } }
+};
+
+T (ba[0].a[0].a);
+T (&ba[0].a[0].a[0]);
+T (&ba[0].a[0].a[0] + 1);
+T (&ba[0].a[0].a[0] + i0);
+T (&ba[0].a[0].a[1]);
+T (&ba[0].a[0].a[1] + 1);
+T (&ba[0].a[0].a[1] + i0);
+
+T (ba[0].a[0].b);           /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].b);
+T (&ba[0].a[1].b[0]);
+T (&ba[0].a[1].b[0] + 1);
+T (&ba[0].a[1].b[0] + i0);
+T (&ba[0].a[1].b[1]);
+T (&ba[0].a[1].b[1] + 1);
+T (&ba[0].a[1].b[1] + i0);
+
+
+T (ba[1].a[0].a);           /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[1].a[0].b);
+T (&ba[1].a[0].b[0]);
+T (&ba[1].a[0].b[0] + 1);
+T (&ba[1].a[0].b[0] + i0);
+T (&ba[1].a[0].b[1]);
+T (&ba[1].a[0].b[1] + 1);
+T (&ba[1].a[0].b[1] + i0);
+
+T (ba[1].a[1].a);
+T (&ba[1].a[1].a[0]);
+T (&ba[1].a[1].a[0] + 1);
+T (&ba[1].a[1].a[0] + i0);
+T (&ba[1].a[1].a[1]);
+T (&ba[1].a[1].a[1] + 1);
+T (&ba[1].a[1].a[1] + i0);
+
+T (ba[1].a[1].b);           /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + i0);  /* { dg-warning "nul" }  */
+
+
+T (ba[2].a[0].a);
+T (&ba[2].a[0].a[0]);
+T (&ba[2].a[0].a[0] + 1);
+T (&ba[2].a[0].a[0] + i0);
+T (&ba[2].a[0].a[1]);
+T (&ba[2].a[0].a[1] + 1);
+T (&ba[2].a[0].a[1] + i0);
+
+T (ba[2].a[0].b);
+T (&ba[2].a[0].b[0]);
+T (&ba[2].a[0].b[0] + 1);
+T (&ba[2].a[0].b[0] + i0);
+T (&ba[2].a[0].b[1]);
+T (&ba[2].a[0].b[1] + 1);
+T (&ba[2].a[0].b[1] + i0);
+
+T (ba[2].a[1].a);
+T (&ba[2].a[1].a[0]);
+T (&ba[2].a[1].a[0] + 1);
+T (&ba[2].a[1].a[0] + i0);
+T (&ba[2].a[1].a[1]);
+T (&ba[2].a[1].a[1] + 1);
+T (&ba[2].a[1].a[1] + i0);
+
+
+T (ba[3].a[0].a);
+T (&ba[3].a[0].a[0]);
+T (&ba[3].a[0].a[0] + 1);
+T (&ba[3].a[0].a[0] + i0);
+T (&ba[3].a[0].a[1]);
+T (&ba[3].a[0].a[1] + 1);
+T (&ba[3].a[0].a[1] + i0);
+
+T (ba[3].a[0].b);
+T (&ba[3].a[0].b[0]);
+T (&ba[3].a[0].b[0] + 1);
+T (&ba[3].a[0].b[0] + i0);
+T (&ba[3].a[0].b[1]);
+T (&ba[3].a[0].b[1] + 1);
+T (&ba[3].a[0].b[1] + i0);
+
+T (ba[3].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0]);	    /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1]);	    /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[3].a[1].b);
+T (&ba[3].a[1].b[0]);	
+T (&ba[3].a[1].b[0] + 1);
+T (&ba[3].a[1].b[0] + i0);
+T (&ba[3].a[1].b[1]);	
+T (&ba[3].a[1].b[1] + 1);
+T (&ba[3].a[1].b[1] + i0);
+
+
+T (i0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
+T (i0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
+
+T (i0 ? &ba[0].a[0].a[0] : &ba[3].a[1].a[0]);   /* { dg-warning "nul" }  */
+T (i0 ? &ba[3].a[1].a[1] :  ba[0].a[0].a);      /* { dg-warning "nul" }  */
+
+T (i0 ? ba[0].a[0].a : ba[0].a[1].b);
+T (i0 ? ba[0].a[1].b : ba[0].a[0].a);

Patch

PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:

	PR tree-optimization/86552
	* builtins.c (warn_string_no_nul): New function.
	(string_length): Add argument and use it.
	(c_strlen): Same.
	(expand_builtin_strlen): Detect missing nul.
	(fold_builtin_1): Adjust.
	* builtins.h (c_strlen): Add argument.
	* expr.c (string_constant): Add arguments.  Detect missing nul
	terminator and outermost declaration it's missing in.
	* expr.h (string_constant): Add argument.
	* fold-const.c (c_getstr): Revert test.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86552
	* gcc.dg/warn-string-no-nul.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 03cf012..9885c4b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -150,7 +150,7 @@  static tree stabilize_va_list_loc (location_t, tree, int);
 static rtx expand_builtin_expect (tree, rtx);
 static tree fold_builtin_constant_p (tree);
 static tree fold_builtin_classify_type (tree);
-static tree fold_builtin_strlen (location_t, tree, tree);
+static tree fold_builtin_strlen (location_t, tree, tree, tree);
 static tree fold_builtin_inf (location_t, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
@@ -550,6 +550,36 @@  string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
   return n;
 }
 
+/* For a call expression EXP to a function that expects a string argument,
+   issue a diagnostic due to it being a called with an argument NONSTR
+   that is a character array with no terminating NUL.  */
+
+static void
+warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
+{
+  loc = expansion_point_location_if_in_system_header (loc);
+
+  bool warned;
+  if (exp)
+    {
+      if (!fndecl)
+	fndecl = get_callee_fndecl (exp);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%K%qD argument missing terminating nul",
+			   exp, fndecl);
+    }
+  else
+    {
+      gcc_assert (fndecl);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%qD argument missing terminating nul",
+			   fndecl);
+    }
+
+  if (warned && DECL_P (nonstr))
+    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
+}
+
 /* Compute the length of a null-terminated character string or wide
    character string handling character sizes of 1, 2, and 4 bytes.
    TREE_STRING_LENGTH is not the right way because it evaluates to
@@ -567,13 +597,17 @@  string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
    accesses.  Note that this implies the result is not going to be emitted
    into the instruction stream.
 
+   When ARR is non-null and the string is not properly nul-terminated,
+   set *ARR to the declaration of the outermost constant object whose
+   initializer (or one of its elements) is not nul-terminated.
+
    The value returned is of type `ssizetype'.
 
    Unfortunately, string_constant can't access the values of const char
    arrays with initializers, so neither can we do so here.  */
 
 tree
-c_strlen (tree src, int only_value)
+c_strlen (tree src, int only_value, tree *arr /* = NULL */)
 {
   STRIP_NOPS (src);
   if (TREE_CODE (src) == COND_EXPR
@@ -581,24 +615,31 @@  c_strlen (tree src, int only_value)
     {
       tree len1, len2;
 
-      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
-      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
+      len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arr);
+      len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arr);
       if (tree_int_cst_equal (len1, len2))
 	return len1;
     }
 
   if (TREE_CODE (src) == COMPOUND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
-    return c_strlen (TREE_OPERAND (src, 1), only_value);
+    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
 
   location_t loc = EXPR_LOC_OR_LOC (src, input_location);
 
   /* Offset from the beginning of the string in bytes.  */
   tree byteoff;
-  src = string_constant (src, &byteoff);
-  if (src == 0)
+  /* Set if array is nul-terminated, false otherwise.  */
+  bool nulterm;
+  src = string_constant (src, &byteoff, &nulterm, arr);
+  if (!src)
     return NULL_TREE;
 
+  /* Clear *ARR when the string is nul-terminated.  It should be
+     of no interest to callers.  */
+  if (nulterm && arr)
+    *arr = NULL_TREE;
+
   /* Determine the size of the string element.  */
   unsigned eltsize
     = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (src))));
@@ -650,7 +691,8 @@  c_strlen (tree src, int only_value)
       offsave = fold_convert (ssizetype, offsave);
       tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
 				      build_int_cst (ssizetype, len * eltsize));
-      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
+				     offsave);
       return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
 			      build_zero_cst (ssizetype));
     }
@@ -690,7 +732,7 @@  c_strlen (tree src, int only_value)
      Since ELTOFF is our starting index into the string, no further
      calculation is needed.  */
   unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
-				maxelts - eltoff);
+				strelts - eltoff);
 
   return ssize_int (len);
 }
@@ -2855,7 +2897,6 @@  expand_builtin_strlen (tree exp, rtx target,
 
   struct expand_operand ops[4];
   rtx pat;
-  tree len;
   tree src = CALL_EXPR_ARG (exp, 0);
   rtx src_reg;
   rtx_insn *before_strlen;
@@ -2864,20 +2905,37 @@  expand_builtin_strlen (tree exp, rtx target,
   unsigned int align;
 
   /* If the length can be computed at compile-time, return it.  */
-  len = c_strlen (src, 0);
+  tree array;
+  tree len = c_strlen (src, 0, &array);
   if (len)
-    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    {
+      if (array)
+	{
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    }
 
   /* If the length can be computed at compile-time and is constant
      integer, but there are side-effects in src, evaluate
      src for side-effects, then return len.
      E.g. x = strlen (i++ ? "xfoo" + 1 : "bar");
      can be optimized into: i++; x = 3;  */
-  len = c_strlen (src, 1);
-  if (len && TREE_CODE (len) == INTEGER_CST)
+  len = c_strlen (src, 1, &array);
+  if (len)
     {
-      expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
-      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+      if (array)
+	{
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+
+      if (TREE_CODE (len) == INTEGER_CST)
+	{
+	  expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
+	  return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+	}
     }
 
   align = get_pointer_alignment (src) / BITS_PER_UNIT;
@@ -8238,19 +8296,27 @@  fold_builtin_classify_type (tree arg)
   return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
 }
 
-/* Fold a call to __builtin_strlen with argument ARG.  */
+/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
 
 static tree
-fold_builtin_strlen (location_t loc, tree type, tree arg)
+fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
 {
   if (!validate_arg (arg, POINTER_TYPE))
     return NULL_TREE;
   else
     {
-      tree len = c_strlen (arg, 0);
-
+      tree arr = NULL_TREE;
+      tree len = c_strlen (arg, 0, &arr);
       if (len)
-	return fold_convert_loc (loc, type, len);
+	{
+	  /* To avoid warning multiple times about non-nul-terminated
+	     strings only warn if their length has been determined
+	     and it's being folded.  */
+	  if (arr)
+	    warn_string_no_nul (loc, NULL_TREE, fndecl, arr);
+
+	  return fold_convert_loc (loc, type, len);
+	}
 
       return NULL_TREE;
     }
@@ -9158,7 +9224,7 @@  fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
       return fold_builtin_classify_type (arg0);
 
     case BUILT_IN_STRLEN:
-      return fold_builtin_strlen (loc, type, arg0);
+      return fold_builtin_strlen (loc, fndecl, type, arg0);
 
     CASE_FLT_FN (BUILT_IN_FABS):
     CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
diff --git a/gcc/builtins.h b/gcc/builtins.h
index c922904..9446c09 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -57,7 +57,7 @@  extern unsigned int get_object_alignment (tree);
 extern bool get_pointer_alignment_1 (tree, unsigned int *,
 				     unsigned HOST_WIDE_INT *);
 extern unsigned int get_pointer_alignment (tree);
-extern tree c_strlen (tree, int);
+extern tree c_strlen (tree, int, tree * = NULL);
 extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
 extern void expand_builtin_update_setjmp_buf (rtx);
diff --git a/gcc/expr.c b/gcc/expr.c
index 79ead3d..79bcbbe 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11271,10 +11271,14 @@  is_aligning_offset (const_tree offset, const_tree exp)
 /* Return the tree node if an ARG corresponds to a string constant or zero
    if it doesn't.  If we return nonzero, set *PTR_OFFSET to the (possibly
    non-constant) offset in bytes within the string that ARG is accessing.
+   If NULTERM is non-null, consider valid even sequences of characters that
+   aren't nul-terminated strings.  In that case, set NULTERM if ARG refers
+   to such a sequence and clear it otherwise.
    The type of the offset is sizetype.  */
 
 tree
-string_constant (tree arg, tree *ptr_offset)
+string_constant (tree arg, tree *ptr_offset, bool *nulterm /* = NULL */,
+		 tree *decl /* = NULL */)
 {
   tree array;
   STRIP_NOPS (arg);
@@ -11335,7 +11339,7 @@  string_constant (tree arg, tree *ptr_offset)
 	return NULL_TREE;
 
       tree offset;
-      if (tree str = string_constant (arg0, &offset))
+      if (tree str = string_constant (arg0, &offset, nulterm, decl))
 	{
 	  tree type = TREE_TYPE (arg1);
 	  *ptr_offset = fold_build2 (PLUS_EXPR, type, offset, arg1);
@@ -11357,12 +11361,10 @@  string_constant (tree arg, tree *ptr_offset)
       if (TREE_CODE (TREE_TYPE (array)) != ARRAY_TYPE)
 	return NULL_TREE;
 
-      while (TREE_CODE (chartype) == ARRAY_TYPE
-	     || TREE_CODE (chartype) == POINTER_TYPE)
-	chartype = TREE_TYPE (chartype);
+      gcc_assert (TREE_CODE (chartype) == POINTER_TYPE);
 
-      if (TREE_CODE (chartype) != INTEGER_TYPE)
-	return NULL;
+      while (TREE_CODE (chartype) != INTEGER_TYPE)
+	chartype = TREE_TYPE (chartype);
 
       /* Set the non-constant offset to the non-constant index scaled
 	 by the size of the character type.  */
@@ -11374,6 +11376,8 @@  string_constant (tree arg, tree *ptr_offset)
   if (TREE_CODE (array) == STRING_CST)
     {
       *ptr_offset = fold_convert (sizetype, offset);
+      if (decl)
+	*decl = NULL_TREE;
       return array;
     }
 
@@ -11420,6 +11424,38 @@  string_constant (tree arg, tree *ptr_offset)
   if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
     return NULL_TREE;
 
+  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
+
+  /* When ARG refers to an aggregate (of arrays) determine the size
+     of the character array within the aggregate.  */
+  tree ref = arg;
+  tree reftype = TREE_TYPE (arg);
+  while (TREE_CODE (ref) == ARRAY_REF)
+    {
+      reftype = TREE_TYPE (ref);
+      ref = TREE_OPERAND (ref, 0);
+    }
+
+  if (TREE_CODE (ref) == COMPONENT_REF)
+    reftype = TREE_TYPE (ref);
+
+  while (TREE_CODE (reftype) == ARRAY_TYPE)
+    {
+      tree next = TREE_TYPE (reftype);
+      if (TREE_CODE (next) == INTEGER_TYPE)
+	{
+	  if (tree size = TYPE_SIZE_UNIT (reftype))
+	    if (tree_fits_uhwi_p (size))
+	      array_elts = tree_to_uhwi (size);
+	  break;
+	}
+
+      reftype = TREE_TYPE (reftype);
+    }
+
+  if (decl)
+    *decl = array;
+
   /* Avoid returning a string that doesn't fit in the array
      it is stored in, like
      const char a[4] = "abcde";
@@ -11430,7 +11466,9 @@  string_constant (tree arg, tree *ptr_offset)
      but not to strlen().  */
   unsigned HOST_WIDE_INT length
     = strnlen (TREE_STRING_POINTER (init), TREE_STRING_LENGTH (init));
-  if (compare_tree_int (array_size, length + 1) < 0)
+  if (nulterm)
+    *nulterm = array_elts > length;
+  else if (array_elts <= length)
     return NULL_TREE;
 
   *ptr_offset = offset;
diff --git a/gcc/expr.h b/gcc/expr.h
index cf047d4..e630979 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -288,7 +288,7 @@  expand_normal (tree exp)
 
 /* Return the tree node and offset if a given argument corresponds to
    a string constant.  */
-extern tree string_constant (tree, tree *);
+extern tree string_constant (tree, tree *, bool * = NULL, tree * = NULL);
 
 /* Two different ways of generating switch statements.  */
 extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, profile_probability);
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 15bbf95..b318fc77 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -14638,8 +14638,7 @@  c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
 	 NUL-terminated strings.  */
       *strsize = string_size;
     }
-  else if (string_size < string_length
-	   || string[string_length - 1] != '\0')
+  else if (string[string_length - 1] != '\0')
     {
       /* Support only properly NUL-terminated strings but handle
 	 consecutive strings within the same array, such as the six
diff --git a/gcc/testsuite/gcc.dg/warn-string-no-nul.c b/gcc/testsuite/gcc.dg/warn-string-no-nul.c
new file mode 100644
index 0000000..e470ade
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/warn-string-no-nul.c
@@ -0,0 +1,200 @@ 
+/* PR tree-optimization/86552 - missing warning for reading past the end
+   of non-string arrays
+   { dg-do compile }
+   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+const char a[5] = "12345";   /* { dg-message "declared here" } */
+
+int i0 = 0;
+
+void sink (int, ...);
+
+#define CONCAT(a, b)   a ## b
+#define CAT(a, b)      CONCAT(a, b)
+
+#define T(str)					\
+  __attribute__ ((noipa))			\
+  void CAT (test_, __LINE__) (void) {		\
+    sink (strlen (str));			\
+  } typedef void dummy_type
+
+T (a);                /* { dg-warning "argument missing terminating nul" }  */
+T (&a[0]);            /* { dg-warning "nul" }  */
+T (&a[0] + 1);        /* { dg-warning "nul" }  */
+T (&a[1]);            /* { dg-warning "nul" }  */
+T (&a[i0]);           /* { dg-warning "nul" }  */
+
+
+const char b[][5] = { /* { dg-message "declared here" } */
+  "12", "123", "1234", "54321"
+};
+
+T (b[0]);
+T (b[1]);
+T (b[2]);
+T (b[3]);             /* { dg-warning "nul" }  */
+
+T (&b[2][1]);
+T (&b[2][1] + 1);
+T (&b[2][i0]);
+T (&b[2][1] + i0);
+
+T (&b[3][1]);         /* { dg-warning "nul" }  */
+T (&b[3][1] + 1);     /* { dg-warning "nul" }  */
+T (&b[3][i0]);        /* { dg-warning "nul" }  */
+T (&b[3][1] + i0);    /* { dg-warning "nul" }  */
+
+
+struct A { char a[5], b[5]; };
+
+const struct A s = { "1234", "12345" };
+
+T (s.a);
+T (&s.a[0]);
+T (&s.a[0] + 1);
+T (&s.a[0] + i0);
+T (&s.a[1]);
+T (&s.a[1] + 1);
+T (&s.a[1] + i0);
+
+T (s.b);              /* { dg-warning "nul" }  */
+T (&s.b[0]);          /* { dg-warning "nul" }  */
+T (&s.b[0] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[0] + i0);     /* { dg-warning "nul" }  */
+T (&s.b[1]);          /* { dg-warning "nul" }  */
+T (&s.b[1] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[1] + i0);     /* { dg-warning "nul" }  */
+
+struct B { struct A a[2]; };
+
+const struct B ba[] = {
+  { { { "123", "12345" }, { "12345", "123" } } },
+  { { { "12345", "123" }, { "123", "12345" } } },
+  { { { "1", "12" },      { "123", "1234" } } },
+  { { { "123", "1234" },  { "12345", "12" } } }
+};
+
+T (ba[0].a[0].a);
+T (&ba[0].a[0].a[0]);
+T (&ba[0].a[0].a[0] + 1);
+T (&ba[0].a[0].a[0] + i0);
+T (&ba[0].a[0].a[1]);
+T (&ba[0].a[0].a[1] + 1);
+T (&ba[0].a[0].a[1] + i0);
+
+T (ba[0].a[0].b);           /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].b);
+T (&ba[0].a[1].b[0]);
+T (&ba[0].a[1].b[0] + 1);
+T (&ba[0].a[1].b[0] + i0);
+T (&ba[0].a[1].b[1]);
+T (&ba[0].a[1].b[1] + 1);
+T (&ba[0].a[1].b[1] + i0);
+
+
+T (ba[1].a[0].a);           /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[1].a[0].b);
+T (&ba[1].a[0].b[0]);
+T (&ba[1].a[0].b[0] + 1);
+T (&ba[1].a[0].b[0] + i0);
+T (&ba[1].a[0].b[1]);
+T (&ba[1].a[0].b[1] + 1);
+T (&ba[1].a[0].b[1] + i0);
+
+T (ba[1].a[1].a);
+T (&ba[1].a[1].a[0]);
+T (&ba[1].a[1].a[0] + 1);
+T (&ba[1].a[1].a[0] + i0);
+T (&ba[1].a[1].a[1]);
+T (&ba[1].a[1].a[1] + 1);
+T (&ba[1].a[1].a[1] + i0);
+
+T (ba[1].a[1].b);           /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + i0);  /* { dg-warning "nul" }  */
+
+
+T (ba[2].a[0].a);
+T (&ba[2].a[0].a[0]);
+T (&ba[2].a[0].a[0] + 1);
+T (&ba[2].a[0].a[0] + i0);
+T (&ba[2].a[0].a[1]);
+T (&ba[2].a[0].a[1] + 1);
+T (&ba[2].a[0].a[1] + i0);
+
+T (ba[2].a[0].b);
+T (&ba[2].a[0].b[0]);
+T (&ba[2].a[0].b[0] + 1);
+T (&ba[2].a[0].b[0] + i0);
+T (&ba[2].a[0].b[1]);
+T (&ba[2].a[0].b[1] + 1);
+T (&ba[2].a[0].b[1] + i0);
+
+T (ba[2].a[1].a);
+T (&ba[2].a[1].a[0]);
+T (&ba[2].a[1].a[0] + 1);
+T (&ba[2].a[1].a[0] + i0);
+T (&ba[2].a[1].a[1]);
+T (&ba[2].a[1].a[1] + 1);
+T (&ba[2].a[1].a[1] + i0);
+
+
+T (ba[3].a[0].a);
+T (&ba[3].a[0].a[0]);
+T (&ba[3].a[0].a[0] + 1);
+T (&ba[3].a[0].a[0] + i0);
+T (&ba[3].a[0].a[1]);
+T (&ba[3].a[0].a[1] + 1);
+T (&ba[3].a[0].a[1] + i0);
+
+T (ba[3].a[0].b);
+T (&ba[3].a[0].b[0]);
+T (&ba[3].a[0].b[0] + 1);
+T (&ba[3].a[0].b[0] + i0);
+T (&ba[3].a[0].b[1]);
+T (&ba[3].a[0].b[1] + 1);
+T (&ba[3].a[0].b[1] + i0);
+
+T (ba[3].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0]);	    /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1]);	    /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[3].a[1].b);
+T (&ba[3].a[1].b[0]);	
+T (&ba[3].a[1].b[0] + 1);
+T (&ba[3].a[1].b[0] + i0);
+T (&ba[3].a[1].b[1]);	
+T (&ba[3].a[1].b[1] + 1);
+T (&ba[3].a[1].b[1] + i0);