[PR84969] Don't reorder builtin memsets if they set different rhs values

Message ID DB6PR0802MB2504475DBF9C4380CE47E0D9E7AB0@DB6PR0802MB2504.eurprd08.prod.outlook.com
State New
Headers show
Series
  • [PR84969] Don't reorder builtin memsets if they set different rhs values
Related show

Commit Message

Bin Cheng March 20, 2018, 6:18 p.m.
Hi,
As noted in PR84969, fuse_memset_builtins breaks dependence between different memsets.
Specifically, it reorders different builtin memset partitions though it doesn't merge
them in the end.  This simple patch fixes this wrong code issue by checking if any two
builtin memsets set the same rhs value or not.  Note we don't need to bother if two
memsets intersect with each other or not.

Of course, this would miss opportunity merging S1/S3 in below case:
  memset(p+12, 0, 12);   //<-----S1
  memset(p+17, 1, 10);
  memset(p, 0, 12);      //<-----S3
In my opinion, this should be resolved in a more general way maximizing parallelism
as well as merging opportunities when sorting partitions into topological order from
dependence graph, which isn't GCC8 task.

Bootstrap and test on x86_64 and AArch64 ongoing.  Okay if no failures?

Thanks,
bin

2018-03-20  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/84969
	* tree-loop-distribution.c (fuse_memset_builtins): Don't reorder
	builtin memset partitions if they set differnt rhs values.

gcc/testsuite
2018-03-20  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/84969
	* gcc.dg/tree-ssa/pr84969.c: New test.

Comments

Martin Liška March 20, 2018, 8:08 p.m. | #1
On 03/20/2018 07:18 PM, Bin Cheng wrote:
> Hi,

> As noted in PR84969, fuse_memset_builtins breaks dependence between different memsets.

> Specifically, it reorders different builtin memset partitions though it doesn't merge

> them in the end.  This simple patch fixes this wrong code issue by checking if any two

> builtin memsets set the same rhs value or not.  Note we don't need to bother if two

> memsets intersect with each other or not.

> 

> Of course, this would miss opportunity merging S1/S3 in below case:

>    memset(p+12, 0, 12);   //<-----S1

>    memset(p+17, 1, 10);

>    memset(p, 0, 12);      //<-----S3

> In my opinion, this should be resolved in a more general way maximizing parallelism

> as well as merging opportunities when sorting partitions into topological order from

> dependence graph, which isn't GCC8 task.

> 

> Bootstrap and test on x86_64 and AArch64 ongoing.  Okay if no failures?

> 

> Thanks,

> bin

> 

> 2018-03-20  Bin Cheng  <bin.cheng@arm.com>

> 

> 	PR tree-optimization/84969

> 	* tree-loop-distribution.c (fuse_memset_builtins): Don't reorder

> 	builtin memset partitions if they set differnt rhs values.

> 

> gcc/testsuite

> 2018-03-20  Bin Cheng  <bin.cheng@arm.com>

> 

> 	PR tree-optimization/84969

> 	* gcc.dg/tree-ssa/pr84969.c: New test.

> 


Thanks Bin.

I can confirm it fixes regression tests of postgres w/ -O3.

Martin
Richard Biener March 21, 2018, 9:28 a.m. | #2
On Tue, Mar 20, 2018 at 7:18 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,

> As noted in PR84969, fuse_memset_builtins breaks dependence between different memsets.

> Specifically, it reorders different builtin memset partitions though it doesn't merge

> them in the end.  This simple patch fixes this wrong code issue by checking if any two

> builtin memsets set the same rhs value or not.  Note we don't need to bother if two

> memsets intersect with each other or not.

>

> Of course, this would miss opportunity merging S1/S3 in below case:

>   memset(p+12, 0, 12);   //<-----S1

>   memset(p+17, 1, 10);

>   memset(p, 0, 12);      //<-----S3

> In my opinion, this should be resolved in a more general way maximizing parallelism

> as well as merging opportunities when sorting partitions into topological order from

> dependence graph, which isn't GCC8 task.

>

> Bootstrap and test on x86_64 and AArch64 ongoing.  Okay if no failures?


OK.

Richard.

> Thanks,

> bin

>

> 2018-03-20  Bin Cheng  <bin.cheng@arm.com>

>

>         PR tree-optimization/84969

>         * tree-loop-distribution.c (fuse_memset_builtins): Don't reorder

>         builtin memset partitions if they set differnt rhs values.

>

> gcc/testsuite

> 2018-03-20  Bin Cheng  <bin.cheng@arm.com>

>

>         PR tree-optimization/84969

>         * gcc.dg/tree-ssa/pr84969.c: New test.

>

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84969.c b/gcc/testsuite/gcc.dg/tree-ssa/pr84969.c
new file mode 100644
index 0000000..e15c3d9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84969.c
@@ -0,0 +1,57 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-distribute-patterns" } */
+
+static void
+__attribute__((noipa, noinline))
+foo (char **values, int ndim, char *needquotes, int *dims)
+{
+  int i;
+  int j = 0;
+  int k = 0;
+  char *retval = (char *)__builtin_malloc(1000); 
+  char *p = retval;
+  char *tmp;
+
+  int indx[111];
+
+#define APPENDSTR(str)	(__builtin_strcpy(p, (str)), p += __builtin_strlen(p))
+#define APPENDCHAR(ch)	(*p++ = (ch), *p = '\0')
+
+	APPENDCHAR('{');
+	for (i = 0; i < ndim; i++)
+		indx[i] = 0;
+	do
+	{
+		for (i = j; i < ndim - 1; i++)
+			APPENDCHAR('{');
+
+			APPENDSTR(values[k]);
+		k++;
+
+		for (i = ndim - 1; i >= 0; i--)
+		{
+			indx[i] = (indx[i] + 1) % dims[i];
+			if (indx[i])
+			{
+				APPENDCHAR(',');
+				break;
+			}
+			else
+				APPENDCHAR('}');
+		}
+		j = i;
+	} while (j != -1);
+
+	if (__builtin_strcmp (retval, "{{{0,1},{2,3}}}") != 0)
+	  __builtin_abort ();
+}
+
+int main()
+{
+  char* array[4] = {"0", "1", "2", "3"};
+  char f[] = {0, 0, 0, 0, 0, 0, 0, 0};
+  int dims[] = {1, 2, 2};
+  foo (array, 3, f, dims);
+
+  return 0;
+}
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 67f27ba..5e327f4 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -2569,6 +2569,7 @@  fuse_memset_builtins (vec<struct partition *> *partitions)
 {
   unsigned i, j;
   struct partition *part1, *part2;
+  tree rhs1, rhs2;
 
   for (i = 0; partitions->iterate (i, &part1);)
     {
@@ -2586,6 +2587,12 @@  fuse_memset_builtins (vec<struct partition *> *partitions)
 	      || !operand_equal_p (part1->builtin->dst_base_base,
 				   part2->builtin->dst_base_base, 0))
 	    break;
+
+	  /* Memset calls setting different values can't be merged.  */
+	  rhs1 = gimple_assign_rhs1 (DR_STMT (part1->builtin->dst_dr));
+	  rhs2 = gimple_assign_rhs1 (DR_STMT (part2->builtin->dst_dr));
+	  if (!operand_equal_p (rhs1, rhs2, 0))
+	    break;
 	}
 
       /* Stable sort is required in order to avoid breaking dependence.  */
@@ -2617,8 +2624,8 @@  fuse_memset_builtins (vec<struct partition *> *partitions)
 	  i++;
 	  continue;
 	}
-      tree rhs1 = gimple_assign_rhs1 (DR_STMT (part1->builtin->dst_dr));
-      tree rhs2 = gimple_assign_rhs1 (DR_STMT (part2->builtin->dst_dr));
+      rhs1 = gimple_assign_rhs1 (DR_STMT (part1->builtin->dst_dr));
+      rhs2 = gimple_assign_rhs1 (DR_STMT (part2->builtin->dst_dr));
       int bytev1 = const_with_all_bytes_same (rhs1);
       int bytev2 = const_with_all_bytes_same (rhs2);
       /* Only merge memset partitions of the same value.  */